Best Practices for Accessible Media

Correct grammar, punctuation, and spelling, together with speaker identifications and sound descriptions, are as important for clarity in audio transcription as voice intonation and good enunciation are in oral speech.

To provide quality transcripts, please follow the Style Guide by Casting Word.

To provide quality captions, please follow the Captioning Style Guidelines by DCMP.

Podcasts and videos are pre-recorded media, so their final transcriptions are expected to be 100% error-free and:

  • synchronized with audio (meaning no delay – for video captions),
  • equal in content (verbatim) to that of the audio (plus speaker identifications and sound descriptions), and
  • accessible and readily available along with audio (meaning that those who need it can enjoy it at the same time as everyone else).

To provide quality real-time captioning services for live events and webinars, please hire only professionally trained CART (Communication Access Realtime Translation) writers who can meet the minimum expected speed of 225 wpm with at least 98% accuracy. Nobody else can meet those expectations.

The table below, from AST (Automatic Sync Technologies), compares typical error rates for trained stenographers, student workers, and speech recognition systems.

Source                  Typical Error Rate   Result
Trained Stenographer    0.5% to 1%           No problems
Student transcriber     Variable             Expect to be worse than stenographer
Speech Rec: trained     3% to 5+%            Varies from acceptable to poor
Speech Rec: untrained   20% to 40%           Unintelligible

The presentation by AST (Automatic Sync Technologies) explains in detail why it is important to provide good quality captions and why it is better to create a transcript from scratch than to edit one created by a machine.

According to AST, “Analysis on comprehension and attention focus indicates that with an error rate greater than 10%, readers are less able to comprehend the main concepts and facts presented.”
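One common way to put a number on transcript quality is word error rate: the word-level edit distance between a trusted reference transcript and the candidate transcript, divided by the number of words in the reference. The sketch below assumes that definition (the source does not say how AST measured its rates); the function name and the sample sentences are illustrative only, and punctuation is deliberately not normalized.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return substitutions + insertions + deletions, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sample sentences, not taken from any real transcript.
    reference = "the quick brown fox jumps over the lazy dog today"
    machine = "the quick brown fox jumped over lazy dog today"
    rate = word_error_rate(reference, machine)
    # 0.10 mirrors the 10% comprehension threshold quoted above.
    print(f"error rate: {rate:.1%}, above 10% threshold: {rate > 0.10}")
```

A check like this only works when a verified reference transcript exists, so it is better suited to spot-checking a captioning vendor or tool than to replacing human review.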

Captioning Style Guidelines

It is not enough to just create captions for videos; it is also important to style them properly so that they are easy to read and understand. Captioning Key, developed by DCMP (Described and Captioned Media Program) and NAD (National Association of the Deaf), provides guidelines for captioning:

  • Quality Captioning: Accurate, consistent, clear, readable, equal captions.
  • Text: Case, font, line division, and caption placement.
  • Language Mechanics: “Language mechanics incorporate the proper use of spelling, grammar, punctuation, capitalization, and other factors deemed necessary for high-quality captioned media. Rules included in these guidelines are primarily those which are unique to captioning and speech-to-text.”
  • Presentation Rate: “The presentation rate is the number of captioned words per minute that are displayed onscreen.”
  • Sound Effects: “Sound effects are sounds other than music, narration, or dialogue. They are captioned if it is necessary to the understanding and/or enjoyment of the media.”
  • Speaker Identifications: “Establishing the identity of both onscreen and offscreen speakers is vital for clarity. When names are unknown, be as specific as possible in providing a label.”
  • Synchronization: “Captions should closely match the original audio. Maintaining the textual unity with picture and sound ensures clarity, and can be especially important to hard of hearing viewers.”
  • Special Considerations: Intonation, Play on Words, and No Audio; Foreign Language, Dialect, Slang, and Phonetics; Music.
  • Appendices: Spelling out numbers, dates, time, periods of time, fractions, percentages, dollar amounts, measurements.
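The presentation rate defined in the list above can be checked per caption with simple arithmetic: words in the caption, times 60, divided by the seconds the caption stays onscreen. The following is a minimal sketch; the `Cue` class, the function names, and the `max_wpm` limit are hypothetical, and the actual limit should be taken from the Captioning Key itself rather than from this example.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption: start and end times in seconds, plus the displayed text."""
    start: float
    end: float
    text: str


def presentation_rate_wpm(cue: Cue) -> float:
    """Captioned words per minute for a single cue."""
    duration = cue.end - cue.start
    if duration <= 0:
        raise ValueError("cue must end after it starts")
    return len(cue.text.split()) * 60.0 / duration


def cues_over_limit(cues: list[Cue], max_wpm: float) -> list[Cue]:
    """Return cues displayed faster than max_wpm (caller-supplied limit)."""
    return [cue for cue in cues if presentation_rate_wpm(cue) > max_wpm]


if __name__ == "__main__":
    # Five words shown for two seconds: 5 * 60 / 2 = 150 wpm.
    sample = Cue(start=0.0, end=2.0, text="Welcome to the show everyone")
    print(presentation_rate_wpm(sample))
```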

Joe Clark’s article on multimedia accessibility also provides detailed information on how to style captions.

Difference between captions and subtitles

  • Captions convey both spoken language and sound effects (for deaf and hard of hearing).
  • Subtitles convey only translated spoken language (for viewers who do not understand the original language).

While deaf people enjoy watching subtitled foreign movies with hearing people, they may miss important sound descriptions, speaker identifications, and what is said in a language native to hearing viewers. For example, in a French movie with English subtitles any conversation in English may not be captioned.

To give a better experience to those with hearing loss, captions are the best choice.

General rules for text alternatives to audio/video

  • For content that is audio only, adding a transcript should suffice.
  • For a video, both captions and a transcript should be provided.

It is easier to follow a video with embedded captions than to go back and forth between a video and a transcript. A transcript may be useful in case a video or podcast does not play on a computer or a mobile phone. Some deaf-blind users cannot follow captions and rely on transcripts. Also, it is faster for anyone (even blind users using screen readers) to skim a transcript than to listen to an entire video. So it is best to provide both captions and transcripts for videos, which gives users more options.

Two ways of adding a transcript

  • As a link, placed above or below the video/audio, to the transcript on a separate HTML page.
  • As a text display just below the video/audio.

It is advised to post a transcript as an HTML page, not as a PDF or in some other file format, unless those formats are supplemental to the HTML version.

Two ways of adding captions to a video

  • Open captions are a permanent part of the video and cannot be turned off. This enables videos to run in any player without technical problems.
  • Closed captions display the captioned text only when it is desired. They can be used as external files. This format enables and increases content search and indexing.
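To show what an external closed-caption file can look like, the sketch below writes cues in WebVTT, one widely supported text format for caption files. WebVTT is an assumption here, since the source does not name a format, and the function names and sample cues are made up. Because the file is plain text, its content is also easy to search and index.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Build WebVTT file content from (start_seconds, end_seconds, text) cues.

    Cue text must not contain blank lines or the string "-->",
    which WebVTT reserves for cue timing lines.
    """
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample cues with a sound description and a speaker identification.
    content = build_webvtt([
        (0.0, 2.5, "[door slams]"),
        (2.5, 6.0, "NARRATOR: Welcome back."),
    ])
    with open("captions.vtt", "w", encoding="utf-8") as f:
        f.write(content)
```

In an HTML page, a file like this is typically attached to a video with a `<track>` element, which lets the viewer turn the captions on or off.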

For accessibility, SEO and ROI reasons, closed captions would be the best choice.