Title: Adding Media Overlays Source: EPUB Zone Published: January 2015.
Media overlays are a great feature for children’s ebooks because they highlight words as they are narrated. Although the process can be tedious, it is fairly straightforward and involves three types of files: SMIL, XHTML (or HTML), and OPF.
For an overview of media overlays, read EPUBZone’s “EPUB 3 Media Overlays.”
The SMIL files are the key, because they break up the audio narration times, signaling when to highlight which words.
But before working on the SMIL files, it’s best to get the audio narration times. The easiest way to do this is by using the free audio editor and recorder, Audacity. After downloading the program, you can either record your own audio or import a previously recorded audio file.
For this post, I’m using a children’s book I wrote and narrated as an example, called Apple’s Adventures. The ebook already has all the images, text, fonts, and styles in place.
The first step in Audacity is to label each word. Using the Label Track in Audacity is the quickest way to get the begin and end times for each word in the story. To add the Label Track, click on Tracks > Add New > Label Track.
The Label Track will appear below the audio. Press play, and add a label at the beginning of every word. You only need to worry about the beginning of each word, because you can use the begin time of a new word for the end time of the previous word. One keyboard shortcut you can use to add labels is Ctrl B (for PC) or Cmd B (for Mac).
Label each word. To keep things simple, I used p#w#, where p is the page number and w is the word number.
After you finish adding labels to all the words, you will want to export the Label Track.
Audacity will export all the start and end times associated with the labels to a txt file. You may notice that the start and end times are the same. For the SMIL files, you will want to use the start time for p1w1 to begin the audio for the first word and the start time for p1w2 to end the audio for the first word. Then use the start time for p1w2 to begin the audio for the second word and the start time for p1w3 to end the audio for the second word, and so on.
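If you would rather not pair up the times by hand, the step above can be scripted. The following is a minimal Python sketch, not part of the original workflow. It assumes each exported line has the form start, tab, end, tab, label, and it assumes you supply the end time of the last word yourself (the hypothetical audio_end parameter), since there is no following label to borrow it from. The first two sample times come from this post; the third is made up for illustration.

```python
from typing import List, NamedTuple


class Clip(NamedTuple):
    label: str    # fragment identifier, e.g. "p1w1"
    begin: float  # becomes clipBegin, in seconds
    end: float    # becomes clipEnd, in seconds


def labels_to_clips(label_text: str, audio_end: float) -> List[Clip]:
    """Pair each label's start time with the next label's start time.

    Assumes tab-separated lines: start, end, label.
    audio_end (an assumption of this sketch) closes the last word.
    """
    rows = []
    for line in label_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, _end, label = line.split("\t")[:3]
        rows.append((label.strip(), float(start)))

    clips = []
    for i, (label, begin) in enumerate(rows):
        # The end of this word is the start of the next one.
        end = rows[i + 1][1] if i + 1 < len(rows) else audio_end
        clips.append(Clip(label, begin, end))
    return clips


# Sample export: two times from this post plus one invented value.
SAMPLE = (
    "7.358141\t7.358141\tp1w1\n"
    "7.905533\t7.905533\tp1w2\n"
    "8.400000\t8.400000\tp1w3\n"
)
clips = labels_to_clips(SAMPLE, audio_end=9.0)
for clip in clips:
    print(clip.label, clip.begin, clip.end)
```

Each resulting clip carries the label plus the two values needed for one word in a SMIL file.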
Now that all the prep work is done, you will need to open a text editor, such as TextWrangler, to work on the SMIL, X/HTML, and OPF files.
There should be a SMIL file for each page with narration. Each SMIL file opens with a smil root element (in the http://www.w3.org/ns/SMIL namespace, with version="3.0") followed by an opening body tag, and closes with the </body> and </smil> tags.
In the body tags, insert the following for each word on the page:
<par id="id1"><text src="../page01.xhtml#p1w1"/><audio clipBegin="7.358141" clipEnd="7.905533" src="../Audio/Apples_Adventures.mp3"/></par>
Note that audio files can also be MP4 (AAC) rather than MP3.
The attribute values (the par id, the text src, the clip times, and the audio src) will need to be changed to convey the information in your book. Below is an example of how to implement the code for a whole page:
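One way to produce a whole page of par elements is to generate the SMIL file from a list of timings. The Python sketch below is an illustration only. The opening smil and body tags follow the EPUB 3 Media Overlays specification rather than being copied from the original post, the function name build_smil is hypothetical, and the last time value (8.4) is invented for the example.

```python
SMIL_HEAD = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">\n'
    '<body>\n'
)
SMIL_TAIL = '</body>\n</smil>\n'


def build_smil(page_href, audio_href, clips):
    """Return one SMIL document as a string.

    clips is a list of (fragment_id, clip_begin, clip_end) tuples,
    with times in seconds. The hrefs are inserted as-is, so this
    sketch assumes they contain no characters that need XML escaping.
    """
    pars = []
    for n, (fragment, begin, end) in enumerate(clips, start=1):
        pars.append(
            f'<par id="id{n}">'
            f'<text src="{page_href}#{fragment}"/>'
            f'<audio clipBegin="{begin:.6f}" clipEnd="{end:.6f}" '
            f'src="{audio_href}"/>'
            f'</par>\n'
        )
    return SMIL_HEAD + "".join(pars) + SMIL_TAIL


doc = build_smil(
    "../page01.xhtml",
    "../Audio/Apples_Adventures.mp3",
    [("p1w1", 7.358141, 7.905533), ("p1w2", 7.905533, 8.4)],
)
print(doc)
```

The par ids simply count up (id1, id2, and so on), matching the single-word example above.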
The reading system will use the OEBPS folder as the root folder, and move forward to subfolders from there. Therefore, to make sure all files are connected and will work smoothly in the EPUB, it’s easiest to have all the XHTML or HTML files loose in the OEBPS folder. The audio file can be in a subfolder of the OEBPS folder. In this case, I’ve put it in a folder called Audio. Syntax is important, so I made sure to keep the capitalization when referencing it in my SMIL files.
For longer works, where you want to highlight full paragraphs, it’s helpful to also use the seq element to help represent structures such as sections or lists. The seq element has media that is rendered sequentially. But since this is a children’s book, I’ve decided to only use the par element, which has media that is rendered in parallel.
The same par id can be used in multiple SMIL documents, but it must be unique within each document. The fragment identifier (after the # in the text src line), however, must always be unique. In this case, I used the same p#w# format for the fragment identifiers.
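Duplicate ids are easy to introduce when copying and pasting, so a quick check can help. The Python sketch below is a rough illustration, not a validator: it uses a simple regular expression rather than a real XML parser, and the function name duplicate_ids is hypothetical.

```python
import re
from collections import Counter


def duplicate_ids(markup):
    """Return the id values that appear more than once in a markup string.

    Only matches attributes written exactly as id="..." after whitespace,
    which is enough for hand-written SMIL or XHTML like the examples here.
    """
    ids = re.findall(r'\sid="([^"]+)"', markup)
    return sorted(value for value, count in Counter(ids).items() if count > 1)


page = (
    '<span id="p1w1">A</span> '
    '<span id="p1w1">B</span> '
    '<span id="p1w2">C</span>'
)
print(duplicate_ids(page))
```

An empty list means no id was repeated within the text you passed in.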
The clipBegin and clipEnd times come from the Audacity Label Track. Again, these times signal the time it takes to read each word. Breaking it up this way allows the highlighting of individual words.
Next, you will need to add some code to each XHTML or HTML file that displays words. Each word should be wrapped in a span tag whose id matches the fragment identifier used in the SMIL file, such as <span id="p1w1">word</span>, where the id and the word change for every span.
The id on each span is what ties a word to its specific times in the SMIL file, which is why it’s important to have unique identifiers for each word.
Here is an example of what an entire XHTML page would look like:
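Wrapping every word by hand gets tedious on a full page, so the spans can also be generated. The Python sketch below assumes the p#w# naming used in this post and numbers words in reading order. The function name wrap_words and the sample sentence are invented for illustration and are not taken from the book.

```python
from html import escape


def wrap_words(text, page_number):
    """Wrap each whitespace-separated word in <span id="p#w#">...</span>.

    Punctuation stays attached to its word, and &, < and > are escaped
    so the result is safe to paste into an XHTML file.
    """
    spans = []
    for n, word in enumerate(text.split(), start=1):
        safe_word = escape(word, quote=False)
        spans.append(f'<span id="p{page_number}w{n}">{safe_word}</span>')
    return " ".join(spans)


# Made-up sentence, used only to show the output shape.
line = wrap_words("Apple went outside.", page_number=1)
print(line)
```

The output can be placed inside the paragraph tag for that page, as long as the word numbers line up with the labels exported from Audacity.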
Last, you will need to add all this information to your OPF file, or else none of this will work. The OPF states which files are included.
If you’d like, you can add extra information in the metadata section:
<meta property="media:narrator">Sabrina Ricci</meta>
The media:duration line gives the length of the entire audio narration, the media:narrator line names the narrator, and the media:active-class line defines the name of the CSS class that dictates the color of the highlighted word. You can also include more granular metadata by specifying the media:duration of each SMIL file. To do that, use a refines attribute that points at the manifest id of that SMIL file:
<meta property="media:duration" refines="#audio1">0:00:08</meta>
You will also need to add media-overlay="audio1" to the end of each manifest item that references an X/HTML page containing narration. You can change “audio1” to any name, but make sure the same name is used as the id of the manifest item for that page’s SMIL file. Also make sure all the syntax is correct (capitalization matches, etc.). See below for examples of what the manifest looks like when referencing all the audio narration elements.
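To show how the pieces refer to each other, here is a small Python sketch that prints a pair of manifest items and a matching media:duration line. It is an illustration under assumptions: the smil/ folder name and the page01 id are hypothetical, and the helper names are invented. The media types are the standard ones for XHTML and SMIL.

```python
def format_duration(seconds):
    """Format seconds as h:mm:ss, the style used in the example above."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def manifest_items(page_id, page_href, smil_id, smil_href):
    """Return the two manifest items for one narrated page.

    The page item's media-overlay value must equal the SMIL item's id.
    """
    return (
        f'<item id="{page_id}" href="{page_href}" '
        f'media-type="application/xhtml+xml" media-overlay="{smil_id}"/>\n'
        f'<item id="{smil_id}" href="{smil_href}" '
        f'media-type="application/smil+xml"/>'
    )


def duration_meta(smil_id, seconds):
    """Return the per-file media:duration line for the metadata section."""
    return (
        f'<meta property="media:duration" refines="#{smil_id}">'
        f'{format_duration(seconds)}</meta>'
    )


# "smil/page01.smil" and "page01" are assumed names for this example.
items = manifest_items("page01", "page01.xhtml", "audio1", "smil/page01.smil")
print(items)
print(duration_meta("audio1", 8))
```

The audio file itself also needs its own manifest item (with media type audio/mpeg for an MP3), which this sketch leaves out.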
As a fun bonus, you can choose which color the text should change to as it’s highlighted. I believe the default color is blue, but you can change it to whatever you’d like. Here is some CSS code that will change the highlighted words to red:
You can change red to any other CSS color.
The current version of epubcheck is 3.0.1, which you can find at https://code.google.com/p/epubcheck/.
You can also use the EPUB Validator at IDPF if your EPUB is under 10 MB.
If, after building your EPUB, you run it through epubcheck and find a long list of errors, don’t worry. There is some support for Media Overlays, but you may still see some errors for fallbacks.
A new epubcheck is under development at GitHub (https://github.com/idpf/epubcheck), but it is currently in alpha stage.
If you sideload the EPUB to your iPad or another reading system that supports media overlays, you will be able to play your narration.