While it’s great that more people are interested in captions and transcripts, they often ask me if auto captions, or ASR (automated speech recognition), are a great accessibility solution. I would say no, it’s not the best practice, especially for professional content. But it also depends.
I wrote an article about ASR a few years ago. Speech technologies may have gotten better, but they still cannot replace human captioners.
The video explains why quality speech to text is important. It shows that even if auto captions may be accurate, they may be hard to follow without proper formatting. It’s something that can be done only by humans.
Captioning is more than just adding words. It’s an art that is best done by experts who are familiar with captioning best practices. It’s important that captions and transcripts are formatted properly for optimal reading and understanding.
Reading auto captions is like reading a book that has many grammatical mistakes and poor formatting: it may be hard to understand. You wouldn’t be okay with that book, right? Then why would you expect deaf people to be okay with auto captions?
Bad captions are not better than nothing. It may not be a big deal for those of you who can fall back on good hearing if poor captions don’t make sense to you. However, bad captions are very taxing on our deaf brains the same way bad audio is difficult for those with good hearing to follow.
I suggest you try turning off the sound and reading auto captions for at least an hour. Yes, an hour, not a minute. If it is difficult for you to read auto captions, imagine how we deaf people deal with auto captions for videos and virtual meetings on a regular basis – especially during the pandemic. Imagine listening to bad audio all day every day.
When someone complains about bad audio, it’s fixed right away. Yet when we complain about bad captions, we are told that it’s better than nothing. We don’t tell you to put up with bad audio but you tell us to put up with bad captions. It’s not acceptable. Good quality captions are as critical as good quality audio.
International organizations of deaf and hard of hearing people made a joint statement about ASR. They say that while they encourage research and development of ASR (with the involvement of deaf and hard of hearing people), they do not yet consider ASR mature enough to replace human captioners.
Now you may wonder: if quality captioning is so important, why have deaf and hard of hearing people recently petitioned Zoom to offer auto captions for free, as some other video platforms do? Yes, it may be confusing.
As a deaf person and a professional accessibility consultant, I would say that any media platform definitely needs to offer auto captions as a standard software feature. I personally would want that, too. It doesn’t mean, however, that they can be used in every situation.
Auto captions are like a spell check feature. Spell check does not replace the need for professional text editors. It may be good enough if you want to send an email without typos, for example. However, it would not be enough if you are writing a book – you would need to hire an editor.
Or take the example of auto translation. It may be okay when you read someone’s messages or social media posts in a language other than your own, or when you want to get the gist of an online article in a foreign language. However, if an organization wants to be reputable, its content needs to have professional translations.
The same goes for ASR.
For example, suppose you want to communicate with your family, friends, or coworkers via video or in person. In that case, turning on auto captions or using an ASR app on a mobile device would be a great solution. These are informal conversations, so you can ask people to clarify the parts that don’t make sense to you. ASR can also help deaf people communicate with people wearing masks.
Auto captions are usually better when one person is speaking clearly into a microphone than when there’s a group of people talking fast and over each other. If you want to create a podcast transcript, for example, using ASR may be a great start. It allows you to auto generate text while you speak and then to clean it up afterwards and to add final touches.
For professional content, however, using professional services to create captions and transcripts would be the best solution. It not only offers optimal readability of content for deaf users, but also improves your business brand.
Last, but not least, there’s a big difference between real-time and post-production captions.
While I may be okay with auto captions for some live events (depending on how clear the speakers are), I would not watch any post-production media that doesn’t have accurate and well-formatted captions and transcripts.
There are many options for properly captioning and transcribing aural information – using auto tools, professional services, or both. It depends on the situation and the type of content.
The best way to ensure that deaf people have a good experience is to ask them directly. Whenever possible, use professional services.
I find it funny that when I ask people to type what they say for me, they worry about the accuracy of their writing. Yet when they share information verbally with the public, they don’t seem to worry about the accuracy of the speech to text output. Not only that, the experience of hearing people reading transcriptions is not the same as the experience of deaf people who depend on them on a regular basis. That’s why it’s important to listen to deaf people.
In conclusion, I would say that ASR is not the best accessibility practice for professional content. However, it may be a valuable tool for informal conversations and – if used right – for certain live events and as a starting point for creating post-production captions and transcripts.
Do you have more questions or need consulting services? Contact me – I look forward to hearing from you.