Understanding Spoken subtitles

Why and how video players should follow the EN 301 549 standard and offer spoken subtitles when the main audio is difficult to hear or understand.

What are spoken subtitles?

With spoken subtitles (also known as spoken captions or audio subtitles), the subtitles can be presented as speech in addition to text.

A practical example

The following video has spoken subtitles.

Click this button if you need to activate them:

Icon: the letter T with three arcs above it

In the player settings, you can choose whether the video should pause while subtitles are spoken, or whether subtitles should be spoken "on top of" the main soundtrack. Using the volume control of the video player, you should be able to adjust the balance between the spoken subtitles and the original sound.

Why do users need spoken subtitles?

This is a reasonable question. After all, the content of subtitles IS already spoken!

First of all, there are several reasons why some people cannot understand speech in the main soundtrack, or find it very difficult to do so.

When is speech hard to understand?

When the main soundtrack is in a foreign language. (Translated subtitles are very common in smaller countries, while larger countries often provide a dubbed translation instead.)
Accents, dialects, sociolects and idiolects sometimes make speech difficult to understand, even if you know the language.
Speech disorders can make some speech difficult to understand.
Sound quality problems, such as too little difference in volume between speech and background sounds.

Ok, but why not just read the subtitles?

Why are text subtitles not enough?

People who are blind cannot read visual subtitles
Many partially sighted people either cannot see the subtitles or find it difficult to do so
Some persons with dyslexia find it hard to read the subtitles
Some people cannot read quickly enough
Many young children have not yet learnt how to read
Reading text subtitles can make it difficult to focus on other visual content at the same time

An unknown legal requirement?

Since 2021, clause 7.1.5 Spoken subtitles in the accessibility standard EN 301 549 has required us to "provide a spoken output of the available captions" when publishing video with sound. Because it is one of the clauses that are not inherited from the better-known WCAG standard, few people probably know about it.

An unfortunate exception

Captions that are not "programmatically determinable" are excepted from the requirement. A note indicates that open captions (where the characters are bitmaps in the video imagery) are an example of this. For users who need spoken captions this is unfortunate, because it puts many videos out of scope. However, thanks to ever-increasing computing power, even open captions may soon be considered programmatically determinable, for example by means of optical character recognition (OCR).

Hard to understand?

As with most clauses in EN 301 549, the wording is compact and not very easy to read. For the WCAG-based clauses there are Understanding documents, but for other clauses it is hard to find explanations and examples. (This is one of the reasons we wrote the article you are now reading.)

The EN 301 549 is almost EU law

EU legislation such as the Web Accessibility Directive and the European Accessibility Act points to this standard. Yet three years on, there still aren't many video players that can present subtitles as audio themselves, and only some players work well together with commonly available accessibility features for spoken subtitles.

Do all EU public sector videos break the law?

No, not all. If the combination of video content, video player and the users' tools (features in the operating system, assistive technologies) can meet the requirement, then the videos are OK. But you have to make sure it works in practice.

What solutions are there?

There are several ways subtitles can be spoken:

Text-to-speech rendered as sound by the user's screen reader software. (A great advantage of this option is that it can also render Braille output. But the option is only available for users who have a screen reader and know how to use it, and it requires that the closed captions are encoded in a suitable way. Under Windows, the JAWS screen reader can do this, for example. On iOS, you can go to Settings > Accessibility > VoiceOver > Verbosity and ask for spoken captions, but it doesn't seem to work with all media players. For example, not with YouTube, as far as I can tell.)
Extra audio track containing a recording of somebody reading the subtitles. (A human voice can give a great user experience, and volume and timing can be optimized to match the original sound. However, production is expensive and this method is not common.)
Extra audio track containing a text-to-speech rendered as sound on the server side. (This should be an interesting option, and is being used by some national broadcasting corporations for TV. But I have not yet seen it on the web.)
Text-to-speech rendered as sound by the user agent (e.g. web browser), based on text files such as VTT or SRT (commonly used for closed captions). An example of this option is the video on this page. The video player used here - the open-source AblePlayer - is one of the few players that can present audio subtitles rendered from text subtitles (using a feature developed for text-based audio descriptions).
Text-to-speech rendered as sound by the user agent (e.g. web browser), based on OCR of open captions. (Quite computationally intensive, and very unusual in a web context.)
Text-to-speech rendered as sound by software (accessibility feature of the operating system or plug-in) on the user's computer, based on subtitle text files. (This is a simple and attractive solution, but it only works if the software can access the subtitle text. There are plug-ins for Chrome that can read the subtitles of YouTube videos, for example.)
There are probably more solutions as well. (Please tell me if you know of one!)
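
To make the user agent option (text-to-speech based on a VTT file) a bit more concrete, here is a minimal TypeScript sketch of the first half of the job: turning WebVTT text into timed cues that a player could hand to a speech engine. It is not how AblePlayer or any other player does it; the names Cue, toSeconds, parseVtt and cueAt are hypothetical and chosen only for illustration, and the parser deliberately ignores cue settings, styling and other parts of the WebVTT format.

```typescript
// One subtitle cue: start and end in seconds, plus the text to be spoken.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Convert a WebVTT timestamp ("mm:ss.ttt" or "hh:mm:ss.ttt") to seconds.
function toSeconds(timestamp: string): number {
  const parts = timestamp.trim().split(":").map(Number);
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Simplified parser (an assumption-laden sketch, not a full WebVTT
// implementation): blocks without a timing line (header, NOTE, STYLE)
// are skipped, and cue settings after the end time are dropped.
function parseVtt(vtt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line.trim() !== "");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue;
    const [startRaw, rest] = lines[timingIndex].split("-->");
    const endRaw = rest.trim().split(/\s+/)[0];
    const text = lines
      .slice(timingIndex + 1)
      .join(" ")
      .replace(/<[^>]+>/g, "") // strip markup such as <v Speaker> or <i>
      .trim();
    if (text !== "") {
      cues.push({ start: toSeconds(startRaw), end: toSeconds(endRaw), text });
    }
  }
  return cues;
}

// Return the cue that is active at a given playback time, if any.
function cueAt(cues: Cue[], time: number): Cue | undefined {
  return cues.find((cue) => time >= cue.start && time < cue.end);
}
```

In a browser, a player could call cueAt as playback progresses and pass the text of each newly active cue to the Web Speech API, for instance with speechSynthesis.speak(new SpeechSynthesisUtterance(cue.text)). Whether a suitable voice is available for the subtitle language depends on the user's browser and operating system, so this needs to be tested in practice.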

Recommendations

Always produce subtitles for your videos.
Preferably, use closed captions for your subtitles, because users are more likely to be able to have closed captions spoken than open captions.
Ask your video player provider to support at least one type of text-based spoken subtitles, if they haven't already! Those who develop and sell video player software need to know about the legal requirements, user needs and technical possibilities. They should experiment in order to produce excellent UX for spoken subtitles.
In practice, the mixing of spoken captions and original sound can be a challenge. Ideally, the user should be given the ability to adjust the balance between them.
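
As a rough illustration of such a balance control, the TypeScript sketch below derives two gain values from a master volume and a balance setting. The function name mixGains, the MixGains shape and the 0 to 1 balance scale are assumptions made for this example; real players may well mix the two sources differently.

```typescript
// Gains (0 to 1) for the original soundtrack and the spoken subtitles.
interface MixGains {
  original: number;
  spoken: number;
}

// Hypothetical balance control:
//   balance 0   = only the original sound
//   balance 0.5 = both at the master volume
//   balance 1   = only the spoken subtitles
function mixGains(masterVolume: number, balance: number): MixGains {
  const clamp = (value: number): number => Math.min(1, Math.max(0, value));
  const volume = clamp(masterVolume);
  const position = clamp(balance);
  return {
    // The original sound is attenuated only when balance goes above 0.5.
    original: volume * Math.min(1, 2 * (1 - position)),
    // The spoken subtitles are attenuated only when balance is below 0.5.
    spoken: volume * Math.min(1, 2 * position),
  };
}
```

In a browser, the two results could for example be applied to the volume property of the video element and to the volume property of a SpeechSynthesisUtterance, both of which take values from 0 to 1. A player that pauses the video while subtitles are spoken would not need the balance at all.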

Acknowledgements

Many thanks to Henrik Götesson, policy manager for digital inclusion at the Swedish Association of the Visually Impaired (SRF), for insightful contributions, especially regarding the role of screen reading software and the importance of making Braille rendering possible!