Working with HTML5 video: Adding captions and subtitles

In this content kit we'll teach the basics of adding captions and subtitles (and other text track types) to HTML5 <video> using the <track> element, and look at support possibilities in legacy browsers. Learners will gain new knowledge of HTML and JavaScript related to this area, and how subtitles and captions are defined and used on the Web.

Technology level: HTML5 <video> is well-supported across modern browsers (going back to Internet Explorer 9); <track> is also fairly well supported (all modern browsers, going back to Internet Explorer 10).

Please file any issues you find against this content kit's GitHub repo.

Versioning information

Content kit v0.9: last significant update 13th April 2015. This content kit is published under the Mozilla Public License, version 2.0.

What should the presenter have?

It would be helpful to have a good level of HTML knowledge — including <video> and <track> elements, and HTML5 element fallbacks — and JavaScript/DOM knowledge.

What should the audience have?

Learning objectives

After you present or teach this content kit, your audience will:

Links to resources

Project resources overview

Supporting docs/references

Presentation setup

Presenting about HTML5 video subtitles and captions is fairly simple — you just need the slides and demo materials, downloaded locally if possible so network connectivity is not a problem. Just running the presentation without a code walkthrough or workshop should take about 45 minutes.

Demo setup

Active learning

The slides include a marker — starting with "Code time" — that links through to the relevant code version at each point. These are good places to present the demo!

At these points you should click the link (and get any audience members who are following along on their own computers to do the same), then pause briefly so that everyone can play with the code and see what's happening for themselves. The markers also show you where to include each step of the tutorial walkthrough. The tutorial sections include notes to show which slide number each section corresponds to. The source code has a separate directory for each stage of the tutorial/code walkthrough, showing what the code should look like at that stage.

If you want to show a detailed code walkthrough, allow another 15-20 minutes, and follow the steps provided in the tutorial.

If your viewers have computers available and you want them to follow along with the tutorial in a workshop type situation, allow an additional 40 minutes for experimentation and Q&A/troubleshooting. At the end of this session it is a good idea to have a "sharing" session so that the audience can share anything interesting they've created, give each other feedback, and ask questions.

To make this process as seamless as possible, you (and your attendees) should have the slides open in one browser window, the code result and tutorial open in another browser window, and the code open in your text editor. This way you can easily switch between your slides and your coding environment, if/when the time comes to do some more live coding.

Frequently asked questions (FAQs)

Why do we need different sources? Why can't browsers just support the same video formats?
Because of patents and competition. Microsoft and Apple are among the patent holders for the H.264 video format, which is usually delivered in the MP4 container format. Google, Mozilla and Opera aren't, so they directly support the WebM format, which is not patent-encumbered, instead. You should have at least these two formats available, possibly more if you want to support older browsers. Read Media formats supported by the HTML audio and video elements for exhaustive detail.
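In markup, the usual way to offer several formats is to list them as alternatives inside the <video> element and let the browser choose. The same decision can be sketched in JavaScript using the standard canPlayType() method. The helper name pickPlayableSource, the file names and the stand-in media object below are assumptions made for illustration; they are not part of the kit's demo code.

```javascript
// Hypothetical helper: return the first candidate the element says it can play.
// `media` is anything exposing canPlayType(), e.g. a <video> element.
function pickPlayableSource(media, candidates) {
  for (const candidate of candidates) {
    // canPlayType() returns "", "maybe" or "probably"; "" means "cannot play".
    if (media.canPlayType(candidate.type) !== "") {
      return candidate;
    }
  }
  return null; // nothing playable: offer a download link instead, for example
}

// Illustrative file names, not taken from the kit's demo.
const candidateSources = [
  { src: "video/example.mp4", type: "video/mp4" },
  { src: "video/example.webm", type: "video/webm" },
];

// Stand-in for a browser that only plays WebM, so the sketch also runs
// outside a browser. In a page you would pass a real <video> element.
const webmOnlyMedia = {
  canPlayType: (type) => (type === "video/webm" ? "maybe" : ""),
};

const chosenSource = pickPlayableSource(webmOnlyMedia, candidateSources);
console.log(chosenSource ? chosenSource.src : "no playable source");
```

In a real page you would pass document.querySelector("video") as the first argument. Listing the formats in markup is normally simpler; the script version is mainly useful when sources are chosen dynamically.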
Why was the older (better established) SRT format dropped in favour of WebVTT?
Because SRT was only really for subtitles, whereas video text tracks encompass subtitles plus a wide range of other uses. In addition, WebVTT allows you to add rudimentary styling to text tracks.
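To give a feel for what a WebVTT file contains, here is a minimal sketch that builds one from a list of cues. The function names formatTimestamp and buildWebVTT, and the cue text, are made up for this illustration. The output shape (a WEBVTT header line, then blank-line-separated cues with hh:mm:ss.ttt timings) follows the WebVTT format, and the <i> tag in the second cue is an example of the rudimentary styling mentioned above.

```javascript
// Convert a time in seconds to a WebVTT timestamp (hh:mm:ss.ttt).
function formatTimestamp(totalSeconds) {
  // Work in whole milliseconds to avoid floating-point drift.
  const totalMs = Math.round(totalSeconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Build the text of a WebVTT file from cues shaped like { start, end, text }.
function buildWebVTT(cues) {
  const blocks = cues.map(
    (cue) =>
      `${formatTimestamp(cue.start)} --> ${formatTimestamp(cue.end)}\n${cue.text}`
  );
  // The file starts with "WEBVTT"; blocks are separated by blank lines.
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

// Illustrative cues only.
const exampleCues = [
  { start: 0.5, end: 2, text: "Hello, and welcome." },
  { start: 2.5, end: 5, text: "<i>Music plays</i>" },
];

const vttText = buildWebVTT(exampleCues);
console.log(vttText);
```

The resulting text can be saved with a .vtt extension and referenced from a <track> element.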
Why is JavaScript needed to reliably display captions?
Modern browsers implement the HTMLMediaElement textTracks property that provides access to all the text tracks associated with the video, and the associated APIs that allow further manipulation of these tracks. However, default browser styling of text tracks is not very reliable, so you are advised to handle it yourself.
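As a rough sketch of what handling it yourself can look like, the helper below walks the textTracks list and sets each track's mode, showing the first track that matches a language and hiding the rest. The helper name showTrackForLanguage and the stand-in video object are assumptions made for illustration. The textTracks, language and mode properties, and the "showing", "hidden" and "disabled" mode values, are part of the standard text track API.

```javascript
// Hypothetical helper: show the first track matching `language`, hide the others.
// Returns true if a matching track was found.
function showTrackForLanguage(media, language) {
  let shown = false;
  // textTracks is array-like (length plus numeric indexes), so use an indexed loop.
  for (let i = 0; i < media.textTracks.length; i++) {
    const track = media.textTracks[i];
    if (!shown && track.language === language) {
      track.mode = "showing";
      shown = true;
    } else {
      track.mode = "hidden";
    }
  }
  return shown;
}

// Stand-in for a <video> element with two tracks, so the sketch also runs
// outside a browser. In a page you would pass the real element instead.
const fakeVideo = {
  textTracks: [
    { language: "en", mode: "disabled" },
    { language: "de", mode: "disabled" },
  ],
};

const foundGerman = showTrackForLanguage(fakeVideo, "de");
console.log(
  foundGerman,
  fakeVideo.textTracks.map((t) => `${t.language}:${t.mode}`).join(", ")
);
```

In a page this might be called from a custom subtitles menu, for example showTrackForLanguage(document.querySelector("video"), "en") when the user picks English.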
Is there a reliable tool that speeds up subtitle/caption creation and translation?
Yes. There are a few services available, but we'd recommend Amara (formerly Universal Subtitles).