How to create captions/subtitles for video and audio in WebVTT, SRT, DFXP format

Updated: 15 March 2019. This tutorial shows how easy it is to create caption/subtitle files in WebVTT, SRT and DFXP format for videos and audio. Depending on the media player you use, you can even provide subtitles in several languages for the same media. Subtitle files are basically plain text files that you can write in Notepad (Win) or SimpleText (Mac).

Don’t use a text editor that applies formatting (changing the appearance of text), because the files will then contain hidden information that prevents the subtitles from appearing in your video or audio. The file has to be saved in UTF-8 format, which both text editors mentioned above provide.
For Mac users: although you can format text in SimpleText, don’t do it!
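If you are not sure whether your editor saved plain UTF-8 text, a small script can check the file before you test it in a player. This is a minimal sketch assuming Python 3.9+. The function name check_plain_utf8 is hypothetical, and the RTF test only catches files saved as Rich Text Format (for example TextEdit’s default format), not every kind of hidden formatting.

```python
from pathlib import Path


def check_plain_utf8(path: str) -> list[str]:
    """Return a list of problems found in a subtitle file; an empty list means it looks fine."""
    data = Path(path).read_bytes()
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 (first bad byte at offset {exc.start})")
    # Rich Text Format files start with "{\rtf"; a formatting editor may have saved one.
    if data.lstrip().startswith(b"{\\rtf"):
        problems.append("file is RTF (formatted text), not plain text")
    return problems
```

Calling it with the path of your caption file, such as mycaption.vtt, returns an empty list when neither problem was found.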

How do we start?

Ideally, you start from a script in which the spoken text is already written. Then you just need to copy the spoken text into your plain text editor and adapt it to the format of your choice.
However, many people create videos in which they simply talk about a subject in a natural way. In this case, you will have to play your video or audio and write down everything that is said, preferably dividing it into short sentences, because you want to avoid many lines of text. In this first pass, you do not need to worry about timing; we cover that further down.

Here is an example of text in its initial state:

Lost Corners consists of charcoal paintings with pastel on paper and canvas.
The series shows landmarks, places and objects which we are so used to that we do not really see them anymore.
We would only notice if they would disappear, when it is too late.
Most of the charcoal paintings have a desolate atmosphere, enhanced by the limited range of color, consisting mostly of tinted grays, with the deep black of charcoal.
While showing rather inactive scenery with no humans in sight, there is nevertheless a suggestion of life.
Somehow these places look as if action can start any minute.
Soon a car might drive by…
a crane starts working…
a railway bridge may tumble… and click into place.
However, some of the artworks do contain people.
Etc…

When you have created a list of spoken text like that, it is time to determine the timing of each line of text.

Timing and text

It will be beneficial to study how subtitling is done on DVDs. That way, you get a feel for how to cut your text into small chunks that can be read quickly as the video or audio progresses. Writing a subtitle file is easy, but getting the timing right requires trial and error. After all, subtitling is a profession in its own right: TV stations and movie producers hire professionals to do the work, so you are best off learning from them by studying actual cases.

For instance, line 4 of my example above is rather long. It requires three lines of text superimposed on the video. Not only does it take time to read, it also covers a lot of the picture.

Ideally, it would be better to cut it up into four parts, like this:

Most of the charcoal paintings have
a desolate atmosphere,

enhanced by the limited range of color,

consisting mostly of tinted grays,

with the deep black of charcoal.

But this is not always possible. Whether you have time to split a sentence depends on how fast the speaker talks. After all, you cannot replace a line of text with another before the viewer has had a chance to read it. In practice, it is often a trade-off between line length and how long you can leave a line on screen: the longer a line of text is, the longer it has to remain on screen.
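The trade-off can be estimated with simple arithmetic: divide the number of characters by a reading speed. The sketch below assumes a reading speed of 15 characters per second, a hypothetical value chosen for illustration and not a figure from DCMP or any other standard; adjust it to your audience. The function names are hypothetical too.

```python
# Assumed reading speed in characters per second: a hypothetical value, not a standard.
READING_SPEED_CPS = 15.0


def min_display_seconds(caption: str, cps: float = READING_SPEED_CPS) -> float:
    """Shortest time a caption should stay on screen so it can be read at `cps`."""
    chars = len(" ".join(caption.split()))   # collapse line breaks and repeated spaces
    return chars / cps


def is_readable(caption: str, start: float, end: float,
                cps: float = READING_SPEED_CPS) -> bool:
    """True if the caption is shown at least as long as it takes to read it."""
    return end - start >= min_display_seconds(caption, cps)


caption = "that we do not really see them anymore."
print(round(min_display_seconds(caption), 1))   # prints 2.6 (39 characters / 15 per second)
print(is_readable(caption, 26.5, 27.5))         # prints False
```

With this assumed rate, the one-second slot (26.5 to 27.5) that the timing list later in this tutorial gives to that caption would be too short to read comfortably.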

According to the DCMP (Described and Captioned Media Program), lines shouldn’t exceed 32 characters, but in practice this is often impossible without shortening the text, i.e. making it different from what is actually said. In principle there is nothing wrong with that, as long as you don’t change the meaning of what is being said. Quite a few people take the liberty of summarizing long phrases, which makes the captions easier to follow.
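A first rough split at the 32-character limit can be automated with Python’s standard textwrap module. This is a sketch, not a replacement for judgment: the helper name wrap_caption is hypothetical, and a purely character-based split ignores grammar, so you may still want to move a break so that a phrase is not torn apart.

```python
import textwrap

MAX_CHARS_PER_LINE = 32  # the DCMP guideline mentioned above


def wrap_caption(text: str, width: int = MAX_CHARS_PER_LINE) -> list[str]:
    """Split text into lines of at most `width` characters, breaking only at spaces."""
    return textwrap.wrap(text, width=width,
                         break_long_words=False, break_on_hyphens=False)


sentence = ("Most of the charcoal paintings have a desolate atmosphere, "
            "enhanced by the limited range of color, consisting mostly of "
            "tinted grays, with the deep black of charcoal.")
for line in wrap_caption(sentence):
    print(len(line), line)
```

For line 4 of the example, this produces six lines, and the third one ends in “range of”, separating “of” from “color”: exactly the kind of break a human editor would move.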

When you have cut your text into readable chunks, it is time to note down the times. Every subtitling format, whether WebVTT, SRT or DFXP, requires two settings:

  1. The time a text line (caption) should appear
  2. The time it should disappear

You can find out what needs to show up and when by playing your video or audio and watching the clock in the control bar of your media player. Most media players have this feature. If yours does not, you may want to download QuickTime from Apple. It is available for Windows and Mac.

Play your media and jot down the times the texts have to appear and when to end. You should end up with something like this:

00:14 - 00:20.5
Lost Corners consists of charcoal paintings
with pastel on paper and canvas.

00:21.2 - 00:26.5
The series shows landmarks, places and
objects which we are so used to

00:26.5 - 00:27.5
that we do not really see them anymore.

00:27.8 - 00:31
We would only notice if they would disappear,
when it is too late.
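If you keep your notes in a file, turning them into numbers of seconds makes the later conversion mechanical. A minimal sketch assuming Python 3.9+ and notes written as a start and end time separated by “ - ” (for example 00:21.2 - 00:26.5); the function names parse_note_time and parse_note_range are hypothetical.

```python
def parse_note_time(note: str) -> float:
    """Convert a handwritten time such as '00:21.2' (minutes:seconds) to seconds.
    An hours field, as in '1:02:03.5', is accepted as well."""
    seconds = 0.0
    for field in note.strip().split(":"):
        seconds = seconds * 60 + float(field)   # each field is worth 60 of the next one
    return seconds


def parse_note_range(line: str) -> tuple[float, float]:
    """Split a note line such as '00:21.2 - 00:26.5' into (start, end) in seconds."""
    start, end = (parse_note_time(part) for part in line.split(" - "))
    if end <= start:
        raise ValueError(f"end time must come after start time: {line!r}")
    return start, end


print(parse_note_range("00:21.2 - 00:26.5"))   # prints (21.2, 26.5)
```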

Note that you can divide seconds into fractions. How this is written in the subtitle file depends on the format you select: some formats use milliseconds, others tenths of a second. Below you will find how to translate your list into the three formats, using the example above:
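In code, the difference between the three notations comes down to how the same moment is written. This is a sketch assuming the time is already in seconds; format_time and the labels “vtt”, “srt” and “dfxp” are hypothetical names, and the DFXP branch copies the minutes:seconds.tenths style of the DFXP example in this tutorial, so it assumes media shorter than an hour.

```python
def format_time(seconds: float, fmt: str) -> str:
    """Write a time in seconds in the notation used for 'vtt', 'srt' or 'dfxp' in this tutorial."""
    if fmt == "dfxp":
        tenths = int(seconds * 10 + 0.5)          # round half up to a tenth of a second
        minutes, tenths = divmod(tenths, 600)     # 600 tenths in a minute
        secs, tenth = divmod(tenths, 10)
        return f"{minutes:02d}:{secs:02d}.{tenth}"
    if fmt not in ("vtt", "srt"):
        raise ValueError(f"unknown format: {fmt!r}")
    ms = round(seconds * 1000)                    # whole milliseconds avoid float drift
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    separator = "," if fmt == "srt" else "."      # SRT uses a comma, WebVTT a dot
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


for fmt in ("vtt", "srt", "dfxp"):
    print(fmt, format_time(20.5, fmt))
# vtt 00:00:20.500
# srt 00:00:20,500
# dfxp 00:20.5
```

The DFXP branch rounds halves up, so 31.25 seconds becomes 00:31.3 as in the DFXP example; Python’s built-in round() would give 00:31.2 here, because it rounds halves to the even digit.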

WebVTT example

This format uses hours, minutes, seconds and milliseconds, written like this:

00:00:00.000 where the 3 zeros at the end are the milliseconds.

WEBVTT

00:00:14.000 --> 00:00:20.500
Lost Corners consists of charcoal paintings
with pastel on paper and canvas.

00:00:21.000 --> 00:00:26.500
The series shows landmarks, places and objects
which we are so used to

00:00:26.600 --> 00:00:27.500
that we do not really see them anymore.

00:00:27.800 --> 00:00:31.250
We would only notice if they would disappear,
when it is too late.

When you are finished, save the file with a .vtt extension, like mycaption.vtt
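If you generate the file with a script instead of typing it, the script can also catch the mistakes that most often stop WebVTT captions from showing: a timing line that does not use the exact --> arrow or uses a comma instead of a dot, and a blank line inside a caption, which ends the cue early. A minimal sketch assuming Python 3.9+ and timestamps already written as text; write_vtt is a hypothetical name, and these checks cover only those mistakes, not the full WebVTT specification.

```python
import re
from pathlib import Path

VTT_TIME = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"          # [hh:]mm:ss.ttt, hours optional
TIMING = re.compile(rf"^{VTT_TIME} --> {VTT_TIME}$")


def write_vtt(path: str, cues: list[tuple[str, str, str]]) -> None:
    """Write (start, end, text) cues to a UTF-8 WebVTT file after basic checks."""
    blocks = ["WEBVTT"]                                # required first line of a WebVTT file
    for start, end, text in cues:
        timing = f"{start} --> {end}"                  # two hyphens and '>', not a dash
        if not TIMING.match(timing):
            raise ValueError(f"bad timing line: {timing!r}")
        if any(not line.strip() for line in text.splitlines()):
            raise ValueError(f"blank line inside cue text: {text!r}")
        blocks.append(f"{timing}\n{text}")
    # One blank line separates the header and each cue.
    Path(path).write_text("\n\n".join(blocks) + "\n", encoding="utf-8")


if __name__ == "__main__":
    write_vtt("mycaption.vtt", [
        ("00:00:14.000", "00:00:20.500",
         "Lost Corners consists of charcoal paintings\nwith pastel on paper and canvas."),
    ])
```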

SRT example

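The setup is basically the same as for WebVTT, except that there is no WEBVTT header, each caption is preceded by a sequence number, and the milliseconds are separated from the seconds by a comma instead of a dot: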

1
00:00:14,000 --> 00:00:20,500
Lost Corners consists of charcoal paintings
with pastel on paper and canvas.

2
00:00:21,000 --> 00:00:26,500
The series shows landmarks, places and objects
which we are so used to

3
00:00:26,600 --> 00:00:27,500
that we do not really see them anymore.

4
00:00:27,800 --> 00:00:31,250
We would only notice if they would disappear,
when it is too late.

When you are finished, save the file with a .srt extension, like mycaption.srt
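The sequence numbers are the part of SRT that is easiest to get wrong by hand, so they are a good candidate for generating. A minimal sketch assuming Python 3.9+ and timestamps already written as hh:mm:ss,mmm; build_srt is a hypothetical name. Because the numbers are derived from the order of the cues, inserting a caption in the middle never requires renumbering by hand.

```python
import re

SRT_TIME = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}$")   # hh:mm:ss,mmm with a comma


def build_srt(cues: list[tuple[str, str, str]]) -> str:
    """Number (start, end, text) cues from 1 and separate them with one blank line."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        for stamp in (start, end):
            if not SRT_TIME.match(stamp):
                raise ValueError(f"cue {number}: {stamp!r} is not hh:mm:ss,mmm")
        if any(not line.strip() for line in text.splitlines()):
            raise ValueError(f"cue {number}: blank line inside the text")
        # Number, timing line and text go on consecutive lines, with no blank line between.
        blocks.append(f"{number}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

To save the result, you could write the returned string to mycaption.srt with pathlib’s Path.write_text, passing encoding="utf-8".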

DFXP example

This format is more complicated than WebVTT and SRT. It requires a namespace declaration at the top of the file, but you can simply copy the example below and adapt it.
This format uses hours, minutes, seconds and tenths of a second, written like this:

<p begin="00:14.1" end="00:20.5"> where the single digit after the dot represents tenths of a second.

<tt xmlns="http://www.w3.org/2006/10/ttaf1">
<body>
<div>
<p begin="00:14" end="00:20.5">Lost Corners consists of charcoal paintings with pastel on paper and canvas.</p>
<p begin="00:21.2" end="00:27.5">The series shows landmarks, places and objects which we are so used to that we do not really see them anymore.</p>
<p begin="00:27.8" end="00:31.3">We would only notice if they would disappear, when it is too late.</p>
<p begin="00:35" end="00:45">Most of the charcoal paintings have a desolate atmosphere, enhanced by</p>
</div>
</body>
</tt>

When you are finished, save the file with a .dfxp extension, like mycaption.dfxp. You may also use mycaption.xml, since this is basically an XML file.
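Because DFXP is XML, characters such as & and < in a caption must be escaped, which is easy to forget when editing by hand. Building the file with Python’s standard xml.etree.ElementTree takes care of that. A minimal sketch assuming Python 3.9+ and times already written as text in the style of the example above; build_dfxp is a hypothetical name, and the output has no XML declaration, just like the example.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/2006/10/ttaf1"   # the namespace used in the DFXP example above


def build_dfxp(cues: list[tuple[str, str, str]]) -> str:
    """Build a DFXP document from (begin, end, text) cues with times written as text."""
    ET.register_namespace("", TT_NS)          # write xmlns="..." instead of an ns0: prefix
    tt = ET.Element(f"{{{TT_NS}}}tt")
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")
    for begin, end, text in cues:
        p = ET.SubElement(div, f"{{{TT_NS}}}p", begin=begin, end=end)
        p.text = text                         # ElementTree escapes &, < and > in the text
    return ET.tostring(tt, encoding="unicode")


print(build_dfxp([("00:14", "00:20.5",
                   "Lost Corners consists of charcoal paintings with pastel on paper and canvas.")]))
```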

Test your captions/subtitles

It will take some tweaking to get the times right, so you need to test. If you do not have a media player for your site yet, you may want to download FlowPlayer or JW Player. For audio, JW Player is your best bet, as it lets you set the height of the audio player to leave room for captions. See this tutorial on how to use JW Player:
Embedding an audio with poster image, watermark and subtitles, using JW Player 5.10

NOTE: From JW Player 7.4 onward, you need to use the .vtt format because of the iPad. SRT gives unexpected results in full screen.
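If you already have SRT files, converting them to VTT is mechanical: add the WEBVTT header, drop the sequence numbers and replace the comma before the milliseconds with a dot. A minimal sketch assuming Python 3.9+ and well-formed SRT input with blank lines between cues; srt_to_vtt is a hypothetical name. WebVTT would also accept the numbers as cue identifiers, but dropping them keeps the output closest to the WebVTT example in this tutorial.

```python
import re

SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header, drop cue numbers, use a dot before milliseconds."""
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", text):          # cues are separated by blank lines
        if not block.strip():
            continue
        lines = block.split("\n")
        if lines[0].strip().isdigit():                 # SRT sequence number
            lines = lines[1:]
        if lines:
            lines[0] = SRT_TIME.sub(r"\1.\2", lines[0])    # 00:00:14,000 -> 00:00:14.000
            cues.append("\n".join(lines))
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```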

If you have a video, YouTube is an alternative. When you upload your video to YouTube, you can add a subtitle file, so you can check whether the times are correct. If they need tweaking (and trust me, they will), adapt the file and test again until you get it right.
Be patient: in the beginning it takes a lot of time to get used to how it works, but after a while you get a feel for timing and it becomes easier.
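When testing shows that all captions are early or late by the same amount, shifting every timestamp at once saves re-timing each cue. A minimal sketch assuming Python 3.9+ and WebVTT timestamps written in full (hh:mm:ss.ttt); shift_vtt is a hypothetical name. It cannot fix captions that are off by different amounts, which still need tweaking by hand, and it would also shift a full-form timestamp that happens to appear inside caption text.

```python
import re

STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})")


def shift_vtt(vtt_text: str, offset: float) -> str:
    """Move every hh:mm:ss.ttt timestamp by `offset` seconds (negative = earlier), never below 0."""
    def shift(match: re.Match) -> str:
        h, m, s, ms = (int(g) for g in match.groups())
        total = (h * 3600 + m * 60 + s) * 1000 + ms + round(offset * 1000)
        total = max(0, total)
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
    return STAMP.sub(shift, vtt_text)


print(shift_vtt("00:00:14.000 --> 00:00:20.500", 0.5))   # prints 00:00:14.500 --> 00:00:21.000
```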

See also How to fix subtitle problems in foreign languages to troubleshoot typical display errors.

26 thoughts on “How to create captions/subtitles for video and audio in WebVTT, SRT, DFXP format”

  1. Thank you. I love how you explain this. Do you know any good tools to take the srt and make it a dfxp or do I have to write my own program to do that?

    • Hi Miriam, so far I have not found anything that does a good job. That doesn’t mean it does not exist, but there is no open-source tool at the time of writing.

  2. Hi there, I’ve been trying to find free software to edit the text on a Mac… any suggestions?

    thank you!

  3. I teach an ESL class and am creating my own subtitles for a video we will watch in our next class. I didn’t realize that the subtitles (SRT format) would be gone from the notepad the next day. I saved everything and there’s an SRT subtitle file that has 1/3 of the video subtitles finished, so that didn’t get erased, but now I’m concerned about the next subtitles. I can’t remember what number I was on. Can I start over from 1?

    • Hi Mary, I don’t think you need to start all over. What I would do is this:
      1. Open the subtitles file.
      2. Play the video and stop at the last sentence that was subtitled.
      3. From there, you can add new text in the subtitle file, sentence by sentence as the video goes.
      I hope this helps?

  4. Hi Rudolf,
    How can I automatically convert this format to SRT?

    From: 00:00:20.25 --> 00:02:10.25
    To: 00:00:20,250 --> 00:02:10,250 (srt)

    Ideally I don’t want to do this manually, therefore any advice you can give, it would be much appreciated. Thank you.

  5. Hi Linda,

    You could try http://subtitleconverter.net/welcome.jsp
    You select the format of your existing file, then upload it and select the srt option.
    The only error it makes is that it glues everything together, so you need to add line breaks afterward, but that is easy compared with re-doing everything by hand. I hope that helps?

  6. Thank you. I have tried it, but there seems to be a loading error in the browser, so it cannot be converted.
    I will try again later. Thank you for your help.

  7. Hi, I uploaded a video on http://subtitle-horse.com/ and then I added subtitles to it, but when I had to save it, they gave me the option to save it in VTT, SBV, timedText, SRT and Encore. I just wanted to add subtitles to the video and download it in mobile format, but I don’t understand all these formats :/

    • Hi Manny,

      You can use either VTT or SRT, but it depends on the player you are using. If you use JW Player 6 or 7, VTT is best; otherwise use SRT.

  8. Captions shouldn’t really go beyond 32 characters per line. In addition, you should pay attention to the grammar involved, e.g., try not to break up prepositional phrases so that they are split between different lines or captions. Shorter captions and well-crafted caption lines will help make your captions easy to read. Check out the Captioning Key guidelines at http://captioningkey.org to learn more about creating good captions.

  9. I have a question about the statement “Depending on the media player you use, you can even provide subtitles in different languages for the same media.” Does that mean that I can write the captions in English and it will automatically translate the text to a specified language? I’m trying to figure out if I will manually have to translate all of the spoken text myself?

    • Hi Michael,
      No, it means that you can upload subtitles in various languages and link to them (with a separate file for each language). You need to create them yourself; at the time of writing, there is no player that automatically translates subtitles.

  10. Hi,
    I’d like to know if it’s possible to do this with TextEdit (Mac). I tried, but it didn’t work out, and I don’t know what I’m doing wrong.

    • Hi Lila,
      TextEdit (current version) does indeed sometimes add invisible formatting. Good that you could solve the issue with Atom. 🙂

  11. I extract CC text from .ts files using CCExtractor and /Clumpco. I check them with Subtitle Edit. Quite regularly I find errors and an errant , usually at the end of a line.
    I am curious as to
    a) how do these errors occur?
    b) how can these errors be quickly eradicated?
    Thanks for any help

    • Hi Genevieve,
      Can you show an example of such an error? And are they always the same errors or does it vary?
      When there are repetitive errors, you can remove them easily with the Find and Replace function in your text editor (Notepad.exe, SimpleText, … Not MS Word).

  12. Rudolf, ta for the response:
    is a common and repetitive error
    a single at the end of a line is less common.

    Are these errors related to colour changes?

    Re editing them: I use Subtitle Edit for the job… but the single escapes scrutiny.
    Genevieve

  13. Hi again Genevieve,
    I presume it probably has something to do with the contrast between text and background, yes.
    But I never extracted hardcoded subtitles from a video myself, so this is only an educated guess.
    (The character you mean seems to be deleted by the comment box)
    As suggested earlier, you can use the find and replace function afterwards in a text editor that doesn’t do any character formatting.
    Wouldn’t that work for you?

  14. Hi, I’m writing this in 2018.

    The format for the WEBVTT above needs a slight tweak to work.

    E.g.

    WEBVTT

    00:00.000 --> 00:03.000
    It’s 2018

    00:03.000 --> 00:07.000

    Hope this helps 🙂

    PS: make sure you save the file as a .vtt file type; don’t just change its name.
