On-site, face-to-face events have been limited or difficult to access since 2020 due to the global epidemiological situation. This has posed, and continues to pose, challenges for companies and institutions, leading to the increased use of videoconferencing and the broadcasting of conferences, training sessions, and similar events via online live streams. This enables location-independent communication and thus replaces “face-to-face communication” on site. However, online communication requires different preconditions for successful information transfer and brings new challenges, especially for persons with impairments. These aspects are often not taken into consideration when switching from “on site” to online.
Persons with hearing impairments often have difficulty following the content of online meetings or live streams: whether due to poor video transmission or presentations with a reduced or missing speaker video, lip-reading becomes difficult or even impossible. Live transcription of the spoken presentation can therefore be very helpful.
AI enables live transcription without much effort
Subtitles with synchronous translation, or so-called live transcriptions (closed captioning), are suitable for circumventing the difficulties that arise when lip-reading from video images. There are two options: manual closed captioning, where a person in the meeting types what is said into a separate window, or an automatic solution with AI.
How automatic transcription works
Automatic transcription solutions link acoustic sounds to words in a digital language model – similar to a digital dictionary. When a sound has multiple possible matches – for example, due to slurred pronunciation – the software examines the overall context, assigns a probability to each candidate word, and selects the word it considers the most likely match. This analysis is driven by deep learning algorithms.
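To make the idea concrete, here is a minimal, purely illustrative sketch in Python: the candidate words, acoustic scores, and the tiny stand-in “language model” are all made-up values, not output from a real engine.

```python
# Toy illustration of how an engine might disambiguate a slurred sound.
# All words and scores below are invented for demonstration purposes;
# real systems use large neural models rather than lookup tables.

# How well each candidate word matches the acoustic signal.
acoustic_scores = {"whether": 0.40, "weather": 0.38, "wetter": 0.22}

# How plausible each candidate is after the preceding words
# (a stand-in for the digital language model described above).
context = "the forecast says the"
context_scores = {"whether": 0.05, "weather": 0.90, "wetter": 0.05}

# Combine both signals and pick the most probable word.
combined = {w: acoustic_scores[w] * context_scores[w] for w in acoustic_scores}
best_word = max(combined, key=combined.get)

print(f"After '{context} ...': {best_word}")  # -> weather
```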
Most automatic transcription solutions are designed for post-production. Services of this type run your audio file through automatic transcription software and send you the result as text. The processing typically takes place in the cloud, but there are also on-premises speech-to-text solutions. However, such post-production solutions are not suitable for live events such as (video) conferences, court hearings, or sporting events.
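As a rough sketch of such a post-production workflow, the following snippet uploads a finished recording and polls until the transcript is ready; the endpoint, authentication scheme, and response fields are hypothetical placeholders, not any specific vendor’s API.

```python
# Hypothetical post-production workflow: upload a finished audio file,
# wait for the cloud service to process it, then fetch the transcript.
# The URL and JSON fields are invented placeholders for illustration.
import time

import requests

API_URL = "https://api.example-transcription.com/v1"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}    # hypothetical auth

# Submit the whole recording as one job.
with open("keynote_recording.wav", "rb") as f:
    job = requests.post(f"{API_URL}/jobs", headers=HEADERS,
                        files={"audio": f}).json()

# Poll until processing is finished. Fine for archived material,
# but far too slow for a live event.
while True:
    status = requests.get(f"{API_URL}/jobs/{job['id']}",
                          headers=HEADERS).json()
    if status["state"] == "done":
        print(status["transcript"])
        break
    time.sleep(5)
```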
AI-driven live transcription offers better options here, and it works relatively simply: imagine a speaker on stage giving a keynote speech. The microphone they speak into is connected to a laptop or another device running cloud-based automatic transcription software. Everything the speaker says is sent to the cloud as audio. In the cloud, AI natural language processing technology matches the incoming sounds with words in a digital language model. The software then sends the text back to be displayed on a monitor for anyone to read along. The amounts of data the software uploads and downloads are very small, so the whole process happens very quickly.
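The following sketch shows this loop in Python, assuming a hypothetical websocket streaming endpoint; real providers each define their own streaming protocol and message format.

```python
# Sketch of the live loop described above: small audio chunks go up to the
# cloud, partial transcripts come back almost immediately. The URL and the
# plain-text reply format are hypothetical; real streaming APIs differ.
import asyncio

import websockets  # pip install websockets

STREAM_URL = "wss://api.example-transcription.com/v1/stream"  # hypothetical

async def stream_to_captions(audio_chunks):
    async with websockets.connect(STREAM_URL) as ws:
        for chunk in audio_chunks:     # e.g. 100 ms of PCM audio per chunk
            await ws.send(chunk)       # uploads are tiny, so latency stays low
            caption = await ws.recv()  # partial transcript for the display
            print(caption)

# In a real setup the chunks would come straight from the microphone driver:
# asyncio.run(stream_to_captions(microphone_chunks()))
```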
The advantages of automatic transcription: accuracy and convenience
AI-based transcription works well when conditions are optimal. Background noise, poor acoustics, strong accents and dialects, specialized vocabulary, and inferior recording equipment can all affect the accuracy of AI speech-to-text transcription.
However, with the constant optimization of the neural networks that drive speech recognition technology, machine transcription is getting better every day. With some transcription solutions, you can already ensure before an event that potentially difficult accents or dialects are recognized more effectively than they would be by human transcription. With other solutions, it is possible to add words and terms to the system’s dictionary to improve recognition. This feature is invaluable for events where foreign words, jargon, and technical language are used.
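As one concrete example of such a custom dictionary, Google Cloud Speech-to-Text (one of the providers mentioned below) accepts “phrase hints” that bias recognition toward expected terms; this sketch assumes the google-cloud-speech Python client and valid cloud credentials.

```python
# Biasing recognition toward event-specific vocabulary, using Google Cloud
# Speech-to-Text phrase hints as one example (pip install google-cloud-speech;
# requires Google Cloud credentials). Other providers offer similar features.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="de-DE",
    # Names and jargon the recognizer should prefer when sounds are ambiguous.
    speech_contexts=[speech.SpeechContext(phrases=["aiconix", "closed captioning"])],
)

with open("soundcheck.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```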
The accuracy advantage of AI doesn’t end there, however. Speech recognition solutions can also analyze context to eliminate ambiguity in word usage. In addition, with live machine-driven transcription software, a human editor can correct the output on the fly in a live editing view, so small errors can still be fixed quickly.
In addition, it’s not always possible to bring in human help for live captioning or subtitling in general. Perhaps you have scheduled a meeting on short notice and would like to provide participants with a transcript for review. Maybe there are multiple conferences happening at the same time and no transcription professional with the right skills is available.
With AI-based transcription, you don’t have to worry about that. You can quickly set up the automated transcription solution and it will do its job. You can also test the system in advance of an event to check the accuracy of speech recognition or customize it to recognize industry-specific words or dialects.
Automatic transcription solutions are also more flexible, with many supporting multiple languages at once and offering simultaneous translation into other languages.
There are now numerous AI-powered transcription solutions on the market, such as those from Google, Microsoft, and Amazon, all of which do a good job in certain fields and languages.
But what if you had access to a variety of providers on a single platform, with the option of not having to decide on one solution at all, instead always getting the best result from all providers delivered to you?
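A simplified, entirely hypothetical sketch of that “best of all providers” idea might look like this; the provider interfaces and confidence scores are invented for illustration and do not represent aiconix’s actual implementation.

```python
# Hypothetical multi-provider selection: send the same audio to several
# engines and keep the transcript with the highest confidence score.
# The interfaces below are invented for illustration only.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class TranscriptResult:
    provider: str
    text: str
    confidence: float  # 0.0 (unsure) .. 1.0 (certain)

def best_transcript(
    audio: bytes,
    providers: Dict[str, Callable[[bytes], TranscriptResult]],
) -> TranscriptResult:
    """Run every provider on the same audio, return the most confident result."""
    results = [transcribe(audio) for transcribe in providers.values()]
    return max(results, key=lambda r: r.confidence)

# Usage sketch: providers = {"google": ..., "microsoft": ..., "amazon": ...}
# winner = best_transcript(audio_bytes, providers)
```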
Eugen Gross, CEO & Founder of aiconix, and his team have developed various digital solutions that use Artificial Intelligence to convert speech into text, creating transcripts of audio files as well as (live) subtitles in videos for people with hearing impairments. At the same time, these subtitles can be automatically translated into other languages, making the audio-visual content accessible to an international audience.
Eugen Gross: “We want to show with our solutions that AI applications can contribute to inclusion and improved communication for people with hearing impairments and foreign language speakers.”
Some public institutions are already taking steps toward digitalization and accessibility together with aiconix, without tying up a lot of resources in manual work.