In this tutorial we will guide you through the process of adding subtitles and captions to HTML5 video, looking at some of the problems that currently exist, and solutions to those problems.
Below we will build up a simple demo. You can see the source code, and also view it live. You'll notice that the source code has different directories — these correspond to the different stages of the tutorial, allowing you to both check what your code should look like after each stage, and start the tutorial at any stage if you don't wish to go right from the beginning.
At this point, download the content kit so you have the demo code available when working through the tutorial — see the demo directory.
This demo purely concentrates on text tracks, therefore we've not added much in terms of CSS or additional HTML/JavaScript to get in the way. If you want to style your video page up after completing the tutorial, please go ahead!
Note: Thanks to Ian Devlin for letting us use some of his code as the basis for the demo in this tutorial.
Note: This section relates to Slide 5 of the slideshow.
Let's begin by inspecting the start state of the demo. At this point we have a simple HTML5 video in our page, and not much else:
<video controls preload="metadata">
  <source src="../video/sintel-short.mp4" type="video/mp4">
  <source src="../video/sintel-short.webm" type="video/webm">
  <p>It appears that your browser doesn't support HTML5 video. Here's a
  <a href="../video/sintel-short.mp4">direct link to the video instead</a>.</p>
</video>
There is not much to see here. We have set the preload attribute value to metadata, which hints that the browser should fetch only the video's metadata up front (meaning not too much data is downloaded immediately, but we have access to useful data like the video's length), included the default browser controls using the controls attribute, and added a fallback paragraph that is displayed if the browser doesn't support HTML5 video.
The two <source> elements provide a choice of different video formats for cross-browser support.
Note: This section relates to Slide 8 of the slideshow.
Now let's move on to adding some text tracks to our video. Open up your start index.html file, and add the following lines below the <source> elements:
<track label="English" kind="subtitles" srclang="en" src="vtt/sintel-subtitles-en.vtt" default>
<track label="Deutsch" kind="subtitles" srclang="de" src="vtt/sintel-subtitles-de.vtt">
<track label="Español" kind="subtitles" srclang="es" src="vtt/sintel-subtitles-es.vtt">
The <track> elements associate text tracks with the video. The attributes are as follows:
default: Sets the default text track to display (only one <track> element per video can have this), which can be overridden by JavaScript or user preferences.
kind: Indicates what kind of text track each track is. Here we only have subtitles, but you can also have captions, descriptions and other types.
label: A user-readable title for the text track, which is displayed by the browser when listing available text tracks.
src: The URL of the text track file.
srclang: A language code indicating to the browser what language the text tracks are written in.
This sounds like it all makes sense, and it does, in terms of the spec definition. Unfortunately, browsers don't currently do a very good job with their default text track UX.
Behavior varies from browser to browser: a browser may show only the track that has default set on it and show nothing if no track has it, and some browsers don't provide a menu to switch tracks.
It is therefore a good idea to implement your own custom menu using JavaScript. You'll see how in the next section.
Try testing your code now by double-clicking your index.html file. Be warned that if you are using Chrome/Opera for testing, you'll need to run your code through a local web server (such as Python SimpleHTTPServer); otherwise you may get an error message in the console about the text tracks being blocked from loading when they are loaded via file://.
Let's look at the contents of one of our .vtt (video text track) files before we move on. HTML5 video originally used .srt (SubRip Text) files to provide text tracks, but these were replaced by .vtt because .srt is only really for subtitles, whereas there are lots of different types of text track you might want to use.
Open one of the files inside your code's vtt directory in a text editor. You'll see entries like this:
WEBVTT

0
00:00:00.000 --> 00:00:12.000
[Test]

NOTE This is a comment and must be preceded by a blank line

1
00:00:18.700 --> 00:00:21.500
This blade has a dark past.

2
00:00:22.800 --> 00:00:26.800
It has shed much innocent blood.
The file must start with WEBVTT. We then include separate text track blocks (cues), separated from each other by blank lines. In our files each block starts with a number that identifies it; this identifier line is optional, but numbering the blocks in ascending order keeps the file easy to follow.
The second line of each block is a timestamp range, indicating what time the text track should start being shown, and what time it should disappear again. The start and end timestamps are in the format hh:mm:ss.ttt (note the period before the milliseconds), allowing for very precise times. All digits must be filled in, so for example you can't write 50 milliseconds as 50; you'd need to include a leading zero: 050.
The third line onwards (you can include multiple lines in each block) is the text that you actually want to display.
Note: There is a lot more than this available in WebVTT syntax. See Step 4: captions.
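The timestamp rules described above can be checked mechanically. The following is a minimal sketch and not part of the demo code: parseVttTimestamp and formatVttTimestamp are hypothetical helper names, and the pattern assumes the hh:mm:ss.ttt form with an optional hours part.

```javascript
// Hypothetical helpers (not part of the demo) for WebVTT timestamps.
// Assumes the form hh:mm:ss.ttt, where the hours part may be omitted.
const TIMESTAMP = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/;

// Returns the timestamp as a whole number of milliseconds, or null if malformed.
function parseVttTimestamp(text) {
  const match = TIMESTAMP.exec(text);
  if (match === null) {
    return null;
  }
  const hours = match[1] === undefined ? 0 : Number(match[1]);
  const minutes = Number(match[2]);
  const seconds = Number(match[3]);
  const millis = Number(match[4]);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

// Formats a non-negative whole number of milliseconds as hh:mm:ss.ttt.
function formatVttTimestamp(totalMillis) {
  const pad = (value, width) => String(value).padStart(width, '0');
  const millis = totalMillis % 1000;
  const totalSeconds = Math.floor(totalMillis / 1000);
  const seconds = totalSeconds % 60;
  const minutes = Math.floor(totalSeconds / 60) % 60;
  const hours = Math.floor(totalSeconds / 3600);
  return pad(hours, 2) + ':' + pad(minutes, 2) + ':' + pad(seconds, 2) + '.' + pad(millis, 3);
}
```

Returning whole milliseconds rather than fractional seconds avoids floating-point rounding when timestamps are compared. A value with a colon before the milliseconds, or with fewer than three millisecond digits, is rejected with null.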
Note: This section relates to Slide 16 of the slideshow.
Let's add some proper interactivity to our text tracks that works across browsers. You can see the finished version of this code in the step3 directory in the source code, if you need to check it out.
Note: In a real project you'd probably hide the browser's default controls and create a complete custom control set, as shown in Video player styling basics. Here however we just wanted to focus on the basics of text tracks.
First of all, add the following HTML below your </video> closing tag:
<form>
  <select name="select">
  </select>
</form>
This will act as our simple menu for selecting the different text tracks we want to display. Next, insert a <script></script> element just above the closing </body> tag to put your JavaScript in (or link to a separate script file if you wish).
Now add the following inside your script element:
var video = document.querySelector('video');
var select = document.querySelector('select');
This simply grabs a reference to the elements we want to manipulate using JavaScript.
Next, add the following below your first two lines of JavaScript.
function hideTracks() {
  // Set every text track on the video to hidden
  for (var i = 0; i < video.textTracks.length; i++) {
    video.textTracks[i].mode = 'hidden';
  }
}
hideTracks();
Here we are creating a function that loops through all the text tracks available on our video (you can grab an array-like list of all available text tracks using video.textTracks), and sets their mode properties to hidden, meaning that any currently showing text tracks will be hidden (to show a text track you'd set its mode to showing). We then run the function to make sure we start the video in a clean state.
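Because the loop only relies on a length and numeric indexes, it works on any array-like value. As a sketch under that assumption, the hypothetical setAllTrackModes helper below takes the track list as a parameter, which makes it possible to try the logic with plain stand-in objects outside the browser. Neither the helper nor the stand-in objects are part of the demo.

```javascript
// Hypothetical variant of hideTracks() that takes the track list as a parameter.
// Any array-like value works: video.textTracks in a page, or plain objects in a test.
function setAllTrackModes(textTracks, mode) {
  for (let i = 0; i < textTracks.length; i++) {
    textTracks[i].mode = mode;
  }
  return textTracks.length; // number of tracks that were updated
}

// Stand-in objects playing the role of text tracks (an assumption for illustration).
const fakeTracks = { 0: { mode: 'showing' }, 1: { mode: 'disabled' }, length: 2 };
const updated = setAllTrackModes(fakeTracks, 'hidden');
```

In the page itself, the equivalent call would be setAllTrackModes(video.textTracks, 'hidden').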
Next, add the following block at the bottom of your other JavaScript:
// Create the "off" option
var tracksOff = document.createElement('option');
tracksOff.setAttribute('value','off');
tracksOff.textContent = 'Tracks off';
select.appendChild(tracksOff);

// Create one option per text track
for (var i = 0; i < video.textTracks.length; i++) {
  var curTrack = video.textTracks[i];
  var addTrackOpt = document.createElement('option');
  addTrackOpt.setAttribute('value',curTrack.kind + '-' + curTrack.language);
  addTrackOpt.textContent = curTrack.label + ' ' + curTrack.kind;
  select.appendChild(addTrackOpt);
}

// Switch tracks whenever the selection changes
select.addEventListener('change',function() {
  trackChange(select.value);
});
First of all, we create an <option> element called tracksOff, give it a value of off and text content of Tracks off, and then append it to our HTML as a child of our <select> element. This creates our 'off' option, to turn off any text tracks that are currently showing.
Then we loop through our text tracks again. This time, in each loop we store a reference to the current text track in curTrack (to make writing subsequent code shorter), create a new <option> element, and give it a value and text content based on the current track's kind, language and label properties. We then add each <option> element to the <select> element.
The final part of this code adds an event listener to our <select> element so that every time its value is changed, it runs a function called trackChange(), passing it the current select value. We'll look at this function in the next section. For now, save and refresh, and have a look at the generated code in your browser dev tools, to help you understand what the last section of code did. It will look something like this:
<select name="select">
<option value="off">
Tracks off
</option>
<option value="subtitles-en">
English subtitles
</option>
<option value="subtitles-de">
Deutsch subtitles
</option>
<option value="subtitles-es">
Español subtitles
</option>
</select>
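The mapping from tracks to option values and labels can also be looked at on its own, separately from the DOM calls. The following is a sketch only: buildTrackOptions is a hypothetical helper name, and demoTracks is made-up stand-in data with the same kind, language and label properties that the loop reads from each text track.

```javascript
// Hypothetical helper: computes the value and text of each <option> from a track list.
function buildTrackOptions(textTracks) {
  // The first entry is always the "off" option
  const options = [{ value: 'off', text: 'Tracks off' }];
  for (let i = 0; i < textTracks.length; i++) {
    const track = textTracks[i];
    options.push({
      value: track.kind + '-' + track.language,
      text: track.label + ' ' + track.kind
    });
  }
  return options;
}

// Stand-in data with the properties the loop reads (an assumption for illustration).
const demoTracks = [
  { kind: 'subtitles', language: 'en', label: 'English' },
  { kind: 'subtitles', language: 'de', label: 'Deutsch' }
];
const demoOptions = buildTrackOptions(demoTracks);
```

Keeping this step free of DOM calls makes it straightforward to check that the generated values match what trackChange() later expects.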
Now we'll add that trackChange() function to the code and look at what it does. Add the following, just below the hideTracks() function:
function trackChange(value) {
  if(value === 'off') {
    hideTracks();
  } else {
    hideTracks();
    // value looks like "kind-language", e.g. "subtitles-en"
    var splitValue = value.split('-');
    for (var i = 0; i < video.textTracks.length; i++) {
      if(video.textTracks[i].kind === splitValue[0]) {
        if(video.textTracks[i].language === splitValue[1]) {
          video.textTracks[i].mode = 'showing';
        }
      }
    }
  }
}
The argument the function takes is the value of the <select> element after a new option has been selected in it. The first if block checks whether that value is off. If so, we just run the hideTracks() function to hide any active subtitles.
If the value isn't off, the else block is run. First, the hideTracks() function is run, because we don't want to have multiple tracks shown at the same time.
Next, we split the value at the "-" character, to get an array of two values: the first is the track kind, and the second is the track language. Remember how when we generated the select menu in the first place, we generated the values from the kind and language and put a "-" in the middle of them, e.g. subtitles-en? Here we are going in the opposite direction.
Next we have a for loop with two nested ifs. In each loop iteration, if the current text track's kind is equal to the kind from the select value, we then test to see if the current text track's language is equal to the language from the select value. If that's also true, then we've found the correct text track and we display it by setting its mode to showing.
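One thing to watch with splitting at every "-" is a language code that contains a hyphen of its own, such as pt-BR: the second array element would then be only pt, and no track would match. The demo's languages (en, de, es) are not affected. The sketch below shows one possible way around this by splitting at the first hyphen only; parseTrackValue and showMatchingTrack are hypothetical names, not part of the demo, and the variant shows only the first matching track.

```javascript
// Hypothetical helper: splits a menu value such as "subtitles-en" at the FIRST hyphen only,
// so that a language code containing its own hyphen (e.g. "pt-BR") stays intact.
function parseTrackValue(value) {
  const dash = value.indexOf('-');
  if (dash === -1) {
    return null; // e.g. "off", which names no track
  }
  return { kind: value.slice(0, dash), language: value.slice(dash + 1) };
}

// Hypothetical variant of trackChange() that takes the track list as a parameter.
// Hides every track, then shows the first one matching the value; returns it, or null.
function showMatchingTrack(textTracks, value) {
  const wanted = parseTrackValue(value);
  let shown = null;
  for (let i = 0; i < textTracks.length; i++) {
    const track = textTracks[i];
    track.mode = 'hidden';
    if (wanted !== null && shown === null &&
        track.kind === wanted.kind && track.language === wanted.language) {
      track.mode = 'showing';
      shown = track;
    }
  }
  return shown;
}
```

Splitting at the first hyphen is safe here because none of the track kinds used in this tutorial (subtitles, captions, descriptions) contains a hyphen.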
Save your code and try it out again.
Note: This section relates to Slide 26 of the slideshow.
So far we have only added subtitles to our video, but we should keep in mind that there are other types of text tracks to consider. Subtitles are generally for the use of people who can hear the audio dialog, but can't understand the language it is spoken in. They only include the words that are being spoken and are not positioned.
Captions, on the other hand, are generally for the use of people who are deaf or hard of hearing. They tend to include information on who is speaking each line of dialog, and the lines are often positioned near the character to further aid recognition of this. In addition, captions tend to include information to describe any music that plays, sound effects that occur, etc.
Open your index.html file and add the following line below the first <track> element:
<track label="English" kind="captions" srclang="en" src="vtt/sintel-captions-en.vtt">
Now try refreshing your example — you should now have a fourth option in your select menu — English captions. Choose this one and observe how it differs from the English subtitles.
Instead of having no subtitles selected by default, it might be nice to have the English subtitles selected by default; this is pretty easy to achieve. In the for loop that populates the select menu, add the if block shown at the end of the loop body:
for (var i = 0; i < video.textTracks.length; i++) {
  var curTrack = video.textTracks[i];
  var addTrackOpt = document.createElement('option');
  addTrackOpt.setAttribute('value',curTrack.kind + '-' + curTrack.language);
  addTrackOpt.textContent = curTrack.label + ' ' + curTrack.kind;
  select.appendChild(addTrackOpt);
  // Select the English subtitles by default
  if(curTrack.language === 'en' && curTrack.kind === 'subtitles') {
    addTrackOpt.setAttribute('selected','selected');
    trackChange(select.value);
  }
}
Here, we check whether the language is English and the track kind is "subtitles". If both conditions are true, we set the selected attribute on that particular <option> element, and run the trackChange() function to set the English subtitles to play by default.
Open up your sintel-captions-en.vtt file — you'll see that the cues in this file have some extra information available, for example:
1
00:00:18.700 --> 00:00:21.500 line:20% align:end
<c.man><b>Man</b>: This blade has a dark past. </c>
On the second line, after the cue timings, we can see cue settings: optional instructions used to position where the cue text will be displayed over the video.
A setting's name and value are separated by a colon. The settings are case sensitive, so use lower case as shown. There are five cue settings, discussed in the sections below.
vertical indicates that the text will be displayed vertically rather than horizontally, such as in some Asian languages.
| Setting | Meaning |
|---|---|
| vertical:rl | writing direction is right to left |
| vertical:lr | writing direction is left to right |
line specifies where text appears vertically. If vertical is set, line specifies where text appears horizontally.
The value can be a line number or a percentage.
| | vertical omitted | vertical:rl | vertical:lr |
|---|---|---|---|
| line:0 | top | right | left |
| line:-1 | bottom | left | right |
| line:0% | top | right | left |
| line:100% | bottom | left | right |
position specifies where the text will appear horizontally. If vertical is set, position specifies where the text will appear vertically.
| | vertical omitted | vertical:rl | vertical:lr |
|---|---|---|---|
| position:0% | left | top | top |
| position:100% | right | bottom | bottom |
size specifies the width of the text area. If vertical is set, size specifies the height of the text area.
| | vertical omitted | vertical:rl | vertical:lr |
|---|---|---|---|
| size:100% | full width | full height | full height |
| size:50% | half width | half height | half height |
align specifies the alignment of the text. Text is aligned within the space given by the size cue setting if it is set.
| | vertical omitted | vertical:rl | vertical:lr |
|---|---|---|---|
| align:start | left | top | top |
| align:middle | centred horizontally | centred vertically | centred vertically |
| align:end | right | bottom | bottom |
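To make the name:value structure of cue settings concrete, here is a minimal sketch that turns a settings string into a plain object. parseCueSettings is a hypothetical helper, not part of the demo, and it assumes that settings are separated by spaces or tabs. It does not check that a name is one of the five settings described above or that a value is allowed.

```javascript
// Hypothetical helper: turns a cue settings string into a plain object.
// Assumes settings are separated by spaces or tabs and written as name:value.
function parseCueSettings(text) {
  const settings = {};
  for (const part of text.trim().split(/[ \t]+/)) {
    const colon = part.indexOf(':');
    if (colon <= 0 || colon === part.length - 1) {
      continue; // skip anything that is not name:value
    }
    settings[part.slice(0, colon)] = part.slice(colon + 1);
  }
  return settings;
}

// The settings from the caption example in this tutorial
const cueSettings = parseCueSettings('line:20% align:end');
```

For the caption example, cueSettings.line holds the string 20% and cueSettings.align holds the string end.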
Let's have a look at our caption example again:
1
00:00:18.700 --> 00:00:21.500 line:20% align:end
<c.man><b>Man</b>: This blade has a dark past. </c>
You can see that the text cue has some special tags marking it up. Text cues can be styled via CSS Extensions.
The ::cue pseudo-element is the key to targeting individual text track cues for styling, as it matches any defined cue. Have a look in the style/style.css file and you'll see some simple rulesets like this:
video::cue { white-space: pre; }
video::cue(.man) { color: yellow; }
The first rule conserves whitespace on all cues, whereas subsequent rules apply different colours to the text spoken by specific characters.
There are only a handful of CSS properties that can be applied to a text cue:
color
opacity
visibility
text-decoration
text-shadow
background shorthand properties
outline shorthand properties
font shorthand properties, including line-height
white-space
The different available tags are as follows:
The timestamp tag specifies a timestamp to show a subset of the cue text at a slightly later time than the start timestamp specifies. The timestamp must be greater than the cue's start timestamp, greater than any previous timestamp in the cue payload, and less than the cue's end timestamp.
The active text is the text between the timestamp and the next timestamp or to the end of the payload if there is not another timestamp in the payload. Any text before the active text in the payload is previous text. Any text beyond the active text is future text. This enables karaoke style captions.
1
00:16.500 --> 00:18.500
When the moon <00:17.500>hits your eye

1
00:00:18.500 --> 00:00:20.500
Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie

1
00:00:20.500 --> 00:00:21.500
That's <00:00:21.000>amore
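The way timestamp tags divide a cue payload into timed pieces can be sketched in a few lines. splitKaraokeCue below is a hypothetical helper, not part of the demo or of any browser API; it assumes well-formed timestamp tags and leaves any other tags in the text untouched. Each segment records the timestamp tag that precedes it, or null for text that appears before the first tag and is therefore active from the cue's start time.

```javascript
// Hypothetical helper: splits a cue payload at its timestamp tags.
// Each segment holds the timestamp that precedes it (null before the first tag).
function splitKaraokeCue(payload) {
  const pattern = /<((?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3})>/g;
  const segments = [];
  let start = null;
  let lastIndex = 0;
  let match;
  while ((match = pattern.exec(payload)) !== null) {
    const text = payload.slice(lastIndex, match.index);
    if (text !== '') {
      segments.push({ start, text });
    }
    start = match[1];
    lastIndex = pattern.lastIndex;
  }
  const rest = payload.slice(lastIndex);
  if (rest !== '') {
    segments.push({ start, text: rest });
  }
  return segments;
}

// The second cue from the example above
const karaokeSegments = splitKaraokeCue('Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie');
```

For this cue the result has four segments: "Like a " with no timestamp, then "big-a ", "pizza " and "pie", each carrying the timestamp at which it becomes the active text.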
The following tags require opening and closing tags (e.g. <b>text</b>).
The class tag (<c></c>) styles the contained text using a CSS class.
<c.classname>text</c>
The italics tag (<i></i>) italicizes the contained text.
<i>text</i>
The bold tag (<b></b>) bolds the contained text.
<b>text</b>
The underline tag (<u></u>) underlines the contained text.
<u>text</u>
The ruby tag (<ruby></ruby>) is used with ruby text tags to display ruby characters (i.e. small annotative characters above other characters).
<ruby>WWW<rt>World Wide Web</rt>oui<rt>yes</rt></ruby>
The ruby text tag (<rt></rt>) is used with ruby tags to display ruby characters (i.e. small annotative characters above other characters).
<ruby>WWW<rt>World Wide Web</rt>oui<rt>yes</rt></ruby>
Similar to the class tag, the voice tag (<v></v>) is also used to style the contained text using CSS.
<v Bob>text</v>
That rounds off our tutorial on text tracks. If you look in the final directory, you'll find a slightly improved version that contains a simple JavaScript library, captionator, which provides a JavaScript implementation of the necessary APIs to allow the example to work in older browsers.
Note: There are a few services available for speeding up the process of developing subtitles/captions, but we'd recommend checking out Amara — UniversalSubtitles.