Regardless of how a transcript is produced (by typing manually, outsourcing to a vendor, or using speech recognition), it is important to follow best practices so that the result is a good-quality, verbatim transcript with speaker identifications and sound descriptions.
Transcripts are enough for podcasts, but not for videos, which also need to be captioned.
Typing by hand is the most time-consuming way to create a transcript, but it may be the most cost-effective if the audio is not too long and you know transcription best practices.
Hiring a vendor is a great option if you don’t have time and need good-quality transcripts for long audio components. You can also ask a professional vendor for transcripts with timestamps and then convert them into good-quality captions for videos on your own.
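As a rough illustration of converting a timestamped transcript into captions, here is a minimal Python sketch that writes segments in the widely used SubRip (SRT) caption format. The input layout (a list of start time, end time, and text) and the function names are assumptions made for this example; a vendor's actual delivery format may differ and would need its own parsing step.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def transcript_to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple layout is a hypothetical input format chosen for this sketch.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, caption text, blank line.
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        cues.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    example = [
        (0.0, 2.5, "[upbeat music]"),
        (2.5, 5.0, "SPEAKER 1: Welcome to the show."),
    ]
    print(transcript_to_srt(example))
```

A script like this only handles the file format. Caption line length, reading speed, and timing still need to be checked by a person against captioning best practices.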
Using speech recognition software may sound like an exciting option, but this technology is far from perfect. To get accurate results, a person must train the software and deliver the audio in a very clear voice, in an environment with minimal background noise. Many multimedia presentations fail to meet these requirements, so speech recognition cannot be fully automatic. If used, its output needs to be reviewed and corrected by a human.
The video below shows an example of why speech recognition is not accurate and may cause comprehension problems if its output is not corrected by a human.
Do you have more questions or need customized solutions?