SMPTE-TT (Society of Motion Picture and Television Engineers Timed Text)

SMPTE-TT is a standard for XML captions and subtitles, created by the Society of Motion Picture and Television Engineers (SMPTE). It is largely based on TTML, but adds some features that TTML did not support.

Since the SMPTE charter includes standards development, the group closely followed the development of TTML, which was created by a working group within the World Wide Web Consortium (W3C).

The SMPTE ultimately concluded that the TTML standard addressed many of the needs of the TV and film industries, but that its feature set was not sufficient to address all of them. In particular, it lacked support for bitmap images (needed for certain European caption formats), binary payloads (needed to support the existing CEA-608 and CEA-708 standards for live broadcast), and the ability to adapt the presentation style based on the playback device.

To accommodate these additional requirements, the SMPTE extended the TTML standard. The new standard, called SMPTE-TT, encompasses all of the capabilities and requirements of TTML as well as the new extensions.

SMPTE-TT is popular in the broadcast television industry because the FCC designated it as a “safe harbor interchange and delivery format” for online captioning. SMPTE-TT files end with the file extension .xml.

More Information

To avoid duplication, this page discusses only the SMPTE-TT-specific extensions to TTML, not TTML itself. To learn more about TTML or XML captions in general, see the pages on those topics.

SMPTE-TT’s Extensions to TTML

SMPTE-TT adds three extensions to TTML:

  • Features for bitmap images (needed for certain European caption formats).
  • The ability to carry binary data (needed to support the existing CEA-608 and CEA-708 standards for live broadcast).
  • The ability to render the caption in multiple ways. The SMPTE wanted to preserve the look and feel of legacy captions in some cases and, in other cases, to take full advantage of the enhanced presentation features afforded by TTML. Therefore, a new feature was needed to indicate which mode is in use.

Each of these three features is discussed in detail below.

Bitmap Images

In order to support certain bitmap-based subtitle formats, such as Digital Video Broadcasting (DVB) Subtitles, SMPTE-TT provides the ability to define bitmap images. The <smpte:image> element is used to define pre-rendered images which can be displayed at any particular time through the smpte:backgroundImage style attribute. The <smpte:image> elements are nested inside of TTML <metadata> elements as shown below:

<head>
  <metadata>
    <smpte:image xml:id="SMPTE_logo16" imagetype="PNG" encoding="Base64">
      iVBORw0KGgoAAAANSUhEUgAAAGQAAABHCAMAAADGBBL+AAAAGXRFWHRTb
      ...
    </smpte:image>
  </metadata>
...
</head>

In the example above, a PNG image with the ID “SMPTE_logo16” is defined inline. The imagetype attribute defines the bitmap format. Currently, PNG is the only supported format. The encoding attribute specifies how the binary data of the bitmap is encoded into text. In this case, Base64 is the encoding scheme. The bitmap data sits between the opening and closing <smpte:image> tags.
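As a rough illustration of how a consumer might read such inline images, the Python sketch below collects every Base64-encoded <smpte:image> element and decodes its payload. The function name extract_inline_images and the sample document are hypothetical, and the sample payload is only the 8-byte PNG file signature rather than a complete image.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the examples on this page.
TTML_NS = "http://www.w3.org/ns/ttml"
SMPTE_NS = "http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def extract_inline_images(document):
    """Return {xml:id: decoded bytes} for every Base64 <smpte:image>."""
    root = ET.fromstring(document)
    images = {}
    for image in root.iter(f"{{{SMPTE_NS}}}image"):
        if image.get("encoding", "").lower() != "base64":
            continue  # this sketch only handles Base64 payloads
        # The payload may be wrapped across lines, so drop all whitespace.
        payload = "".join((image.text or "").split())
        images[image.get(XML_ID)] = base64.b64decode(payload)
    return images


# Hypothetical document; the payload is just the PNG signature bytes.
SAMPLE = f"""
<tt xmlns="{TTML_NS}" xmlns:smpte="{SMPTE_NS}">
  <head>
    <metadata>
      <smpte:image xml:id="sig" imagetype="PNG" encoding="Base64">
        iVBORw0K
        Ggo=
      </smpte:image>
    </metadata>
  </head>
</tt>
"""

images = extract_inline_images(SAMPLE)
print(images["sig"])
```

The decoded bytes could then be written to a .png file or handed to an image library.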

Once an image is defined, it can be referenced from within the body:

<body>
  <div region="logoArea" smpte:backgroundImage="#SMPTE_logo16"/>
  <div>
     <p>SMPTE Logo</p>
  </div>
</body>

External images can also be used:

<body>
  <div
    region="logoArea"
    smpte:backgroundImage="http://www.smpte.org/sites/default/files/favicon.png"/>
  <div>
    <p>SMPTE Logo</p>
  </div>
</body>
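The two reference styles above (a “#id” fragment pointing at an inline image, or a URL pointing at an external one) can be told apart with a simple check. The following is a minimal sketch, not a complete renderer; the function name list_background_images and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
SMPTE_NS = "http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt"
BACKGROUND_IMAGE = f"{{{SMPTE_NS}}}backgroundImage"


def list_background_images(document):
    """Return (kind, target) pairs for each smpte:backgroundImage attribute.

    kind is "inline" for a "#id" reference (target is the id) and
    "external" for any other value (target is the URI as written).
    """
    root = ET.fromstring(document)
    results = []
    for element in root.iter():  # document order
        reference = element.get(BACKGROUND_IMAGE)
        if reference is None:
            continue
        if reference.startswith("#"):
            results.append(("inline", reference[1:]))
        else:
            results.append(("external", reference))
    return results


# Hypothetical document combining both reference styles.
SAMPLE = f"""
<tt xmlns="{TTML_NS}" xmlns:smpte="{SMPTE_NS}">
  <body>
    <div region="logoArea" smpte:backgroundImage="#SMPTE_logo16"/>
    <div region="logoArea"
         smpte:backgroundImage="http://www.smpte.org/sites/default/files/favicon.png"/>
  </body>
</tt>
"""

references = list_background_images(SAMPLE)
print(references)
```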

Binary Data

Some legacy caption formats, such as CEA-608 and CEA-708, use a binary encoding scheme. That is, they use computer codes that are not human-readable. Since SMPTE-TT is an XML-based format, it cannot contain pure binary data. There are two ways to solve this problem:

  1. Decode the format and translate it into the equivalent semantic elements in SMPTE-TT.
  2. “Tunnel” the untranslated data through SMPTE-TT by re-encoding the binary data into a textual representation suitable for inclusion into XML.

The first method provides more flexibility, in that the resulting caption file can be used with modern rendering software/devices. However, the second method may be necessary in cases where the ultimate destination is a legacy endpoint (e.g. a television) that does not support TTML.

Binary data is carried using a <smpte:data> element nested in a TTML <metadata> element. For example:

<metadata>
  <smpte:data
    encoding="BASE64"
    datatype="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt#cea608"
    begin="00:00:02:05">
    lCCAgJSugICRWICAkTeAgJQvgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA
    gICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA
    ...
  </smpte:data>
</metadata>

One benefit of encapsulating the <smpte:data> element inside of a TTML <metadata> element is that it will be successfully parsed and ignored by TTML processors that do not support SMPTE-TT.

In the above example, the encoding attribute specifies that BASE64 is the scheme by which the binary data is re-encoded into a textual format. BASE64 is the most common encoding scheme for this purpose. The datatype attribute specifies the original format of the binary payload, in this case CEA-608. The begin attribute specifies the beginning presentation time, relative to frame 0, for which the encoded payload (data) applies. The value is a timecode in hours:minutes:seconds:frames form, so this data applies to the section of the video beginning at two seconds and five frames. The data itself is placed between the opening and closing <smpte:data> tags.
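To make the tunneling idea concrete, here is a minimal Python sketch that reads <smpte:data> elements and recovers the original bytes. The function name read_data_elements is hypothetical, and the sample payload is only the first eight characters of the example payload above, so the decoded bytes are a fragment and not a usable caption stream.

```python
import base64
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
SMPTE_NS = "http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt"


def read_data_elements(document):
    """Decode every BASE64 <smpte:data> element found in the document."""
    root = ET.fromstring(document)
    decoded = []
    for data in root.iter(f"{{{SMPTE_NS}}}data"):
        if data.get("encoding", "").upper() != "BASE64":
            continue  # other encodings are out of scope for this sketch
        payload = "".join((data.text or "").split())
        datatype = data.get("datatype", "")
        decoded.append({
            # The text after '#' in the datatype URI names the source format.
            "format": datatype.rpartition("#")[2],
            "begin": data.get("begin"),
            "payload": base64.b64decode(payload),
        })
    return decoded


# Truncated sample: only the first 8 Base64 characters of the payload.
SAMPLE = f"""
<metadata xmlns="{TTML_NS}" xmlns:smpte="{SMPTE_NS}">
  <smpte:data
    encoding="BASE64"
    datatype="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt#cea608"
    begin="00:00:02:05">
    lCCAgJSu
  </smpte:data>
</metadata>
"""

records = read_data_elements(SAMPLE)
print(records[0]["format"], records[0]["begin"], records[0]["payload"].hex())
```

A legacy endpoint would receive the decoded bytes unchanged; interpreting them is left to a CEA-608 decoder.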

Translation Modes

SMPTE-TT caption files can either be created directly or translated from an existing source (legacy format). The remainder of this discussion deals with the latter, and in particular, the way in which functionality is preserved when translating from legacy formats to SMPTE-TT.

In order for users to correctly interpret captions that have been translated from a legacy format, the display features, timing, layout, and styling of the original format should be preserved as much as possible, especially where such features convey meaning to the viewer.

In some cases, it may be desirable to preserve the original presentation and render the captions exactly as they would be seen in the legacy format, while in other cases, the preferred approach may be to replace legacy features with the equivalent new presentation capabilities afforded by the underlying TTML standard. In either case, the intent of the legacy features is preserved.

In order to accommodate either presentation style, the standard defines two modes of translation: Preserved and Enhanced. The translation mode is recorded in a new element, <smpte:information> (see below). Each translated SMPTE-TT file must specify one of these two modes to be used exclusively for the entire document.

Preserved Mode

If Preserved mode is specified, that means that during the translation process, all legacy features (e.g., CEA-608 pop-on, roll-up, etc.) were converted such that the resulting TTML rendering is as accurate a representation of the original rendering as possible, given the constraints of the formats. If any presentation features of the original file could not be translated exactly into an SMPTE-TT feature, this must be indicated in the header of the file using the <smpte:information> element.

Enhanced Mode

If Enhanced mode is specified, that means that during the translation process, legacy features were translated in a way that conveys the semantic information of the legacy data, but the result may differ from the exact original presentation. For example, fonts, coloring, and certain visual effects may not have been duplicated. However, consistency of styling must be maintained. In other words, all content styled the same way in the original source must be styled the same way in the translated document, although the translated style may differ from the original one.

In Enhanced mode, the exact positioning of captions was not necessarily preserved, but may have been mapped to approximately equivalent areas (e.g., upper third, lower third). Furthermore, the exact timing may not have been preserved either. In the case of CEA-608 (broadcast TV), each frame may carry up to two caption characters. An exact replication of the rendering would add the characters to the screen, frame by frame, consistent with how they would have been rendered in the original source. However, in Enhanced mode, this rendering does not need to be preserved, and the characters can be aggregated into caption groups, so long as each caption group appears on the screen at the time the first character in that group would have appeared on the screen in the original source.
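The aggregation described above can be sketched in a few lines of Python. The grouping rule used here (start a new group whenever the frame number jumps by more than max_gap_frames) is an assumption made for illustration and is not taken from the standard; the function name and the sample events are likewise hypothetical. What the sketch does preserve is the stated constraint: each group begins at the frame of its first character.

```python
def aggregate_characters(events, max_gap_frames=1):
    """Merge per-frame character pairs into caption groups.

    events: iterable of (frame_number, characters), sorted by frame.
    A new group starts when the frame number jumps by more than
    max_gap_frames (an assumed rule, chosen only for this sketch).
    Each group keeps the frame of its first character as its begin time.
    """
    groups = []
    last_frame = None
    for frame, characters in events:
        if last_frame is None or frame - last_frame > max_gap_frames:
            groups.append([frame, characters])  # begin a new group
        else:
            groups[-1][1] += characters  # extend the current group
        last_frame = frame
    return [(begin, text) for begin, text in groups]


# Hypothetical input: up to two characters arriving per frame.
events = [
    (100, "HE"), (101, "LL"), (102, "O"),
    (150, "WO"), (151, "RL"), (152, "D"),
]
groups = aggregate_characters(events)
print(groups)
```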

The <smpte:information> Element

Only one <smpte:information> element can be present, and it must be a child of the <head> element. The following predefined attributes should be specified:

  • origin: Specifies a unique URI that identifies the source format. For example, “http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt#cea608” would indicate that the source format was CEA-608. If the file is original work (i.e., not translated from any legacy format), the value “NONE” should be specified. Otherwise, a specific value must be specified.
  • mode: The Translation Mode (discussed above). Must be “Preserved” or “Enhanced.”
  • threshold: Indicates the minimum duration in seconds of events/effects in the source that can be ignored. In other words, it’s the duration of a “low pass filter” used by the translation software to ignore temporary states.
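As an illustration of the threshold idea, the sketch below drops source events that are shorter than a given duration. The event representation, the function name drop_short_events, and the choice to keep events whose duration exactly equals the threshold are all assumptions made for this example.

```python
def drop_short_events(events, threshold_seconds):
    """Keep only events that last at least threshold_seconds.

    events: iterable of (begin_seconds, end_seconds, label) tuples.
    """
    return [
        event for event in events
        if event[1] - event[0] >= threshold_seconds
    ]


# Hypothetical source events: a brief flicker and a real caption.
events = [
    (0.0, 0.05, "flicker"),
    (1.0, 3.5, "caption"),
]
kept = drop_short_events(events, threshold_seconds=0.1)
print(kept)
```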

Other attributes can be used so long as they are prefixed by a defined namespace. For example:

<smpte:information
  xmlns:m608="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt#cea608"
  origin="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt#cea608"
  mode="Preserved"
  m608:channel="CC1"
  m608:programName="Six O'clock News"
  m608:captionService="F1C1CC"
/>

In the above example, captions from a television newscast were translated from CEA-608 using Preserved mode, so the captions will appear as close as possible to how they did on TV.
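A consumer might read this element roughly as follows. This Python sketch checks the rules described above (at most one element, located in <head>, with mode set to “Preserved” or “Enhanced”) and separates the predefined attributes from namespaced extension attributes. The function name read_information and the shape of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
SMPTE_NS = "http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt"
VALID_MODES = {"Preserved", "Enhanced"}


def read_information(document):
    """Return the <smpte:information> attributes, or None if absent."""
    root = ET.fromstring(document)
    head = root.find(f"{{{TTML_NS}}}head")
    found = [] if head is None else head.findall(f"{{{SMPTE_NS}}}information")
    if not found:
        return None
    if len(found) > 1:
        raise ValueError("more than one <smpte:information> element")
    info = found[0]
    mode = info.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"unexpected mode: {mode!r}")
    # ElementTree reports namespaced attributes as "{namespace-uri}name".
    extensions = {k: v for k, v in info.attrib.items() if k.startswith("{")}
    return {
        "origin": info.get("origin"),
        "mode": mode,
        "threshold": info.get("threshold"),
        "extensions": extensions,
    }


# Hypothetical document built around a shortened version of the example.
SAMPLE = f"""
<tt xmlns="{TTML_NS}" xmlns:smpte="{SMPTE_NS}">
  <head>
    <smpte:information
      xmlns:m608="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt#cea608"
      origin="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt#cea608"
      mode="Preserved"
      m608:channel="CC1"/>
  </head>
</tt>
"""

info = read_information(SAMPLE)
print(info["mode"], info["extensions"])
```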

Example

The following is an example of the SMPTE-TT file you would get if you ordered Speechpad’s Standard Captions for a video. SMPTE-TT is just one of many formats you can download once the captions have been created. You could then use the SMPTE-TT file to allow various players and video hosting services to present captions with your video (see compatibility list below).

<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="http://www.w3.org/ns/ttml"
	xmlns:smpte="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt"
	xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
	xmlns:tts="http://www.w3.org/ns/ttml#styling"
	xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
	xmlns:m608="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt#cea608"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt http://www.smpte-ra.org/schemas/2052-1/2010/smpte-tt.xsd"
	xml:lang="en" ttp:frameRate="24" ttp:frameRateMultiplier="1000 1001" ttp:timeBase="media">
	<head>
		<metadata>
			<title/>
			<language>en_US</language>
			<region>US</region>
			<guid/>
			<emailid>support@speechpad.com</emailid>
		</metadata>
		<styling>
			<style xml:id="basic"
				tts:color="white"
				tts:backgroundColor="black"
				tts:fontFamily="monospace"
				tts:fontSize="20%"
				tts:lineHeight="8%"
				tts:fontWeight="normal" />
		</styling>
	</head>
	<body style="basic">
		<div>
			<p begin="00:00:03:10" end="00:00:06:04">In this lesson, we're going to<br />be talking about finance. And</p>
			<p begin="00:00:06:04" end="00:00:10:00">one of the most important aspects<br />of finance is interest.</p>
			<p begin="00:00:10:00" end="00:00:13:16">When I go to a bank or some<br />other lending institution</p>
			<p begin="00:00:13:16" end="00:00:17:17">to borrow money, the bank is happy<br />to give me that money. But then I'm</p>
			<p begin="00:00:17:22" end="00:00:21:12">going to be paying the bank for the<br />privilege of using their money. And that</p>
			<p begin="00:00:21:16" end="00:00:26:11">amount of money that I pay the bank is<br />called interest. Likewise, if I put money</p>
			<p begin="00:00:26:15" end="00:00:31:05">in a savings account or I purchase a<br />certificate of deposit, the bank just</p>
			<p begin="00:00:31:07" end="00:00:35:19">doesn't put my money in a little box<br />and leave it there until later. They take</p>
			<p begin="00:00:35:19" end="00:00:40:20">my money and lend it to someone<br />else. So they are using my money.</p>
			<p begin="00:00:40:20" end="00:00:44:10">The bank has to pay me for the privilege<br />of using my money.</p>
			<p begin="00:00:44:10" end="00:00:48:17">Now what makes banks<br />profitable is the rate</p>
			<p begin="00:00:48:17" end="00:00:53:08">that they charge people to use the bank's<br />money is higher than the rate that they</p>
			<p begin="00:00:53:12" end="00:01:00:17">pay people like me to use my money. The<br />amount of interest that a person pays or</p>
			<p begin="00:01:00:19" end="00:01:06:15">earns is dependent on three things. It's<br />dependent on how much money is involved.</p>
			<p begin="00:01:06:20" end="00:01:11:07">It's dependent upon the rate of interest<br />being paid or the rate of interest being</p>
			<p begin="00:01:11:12" end="00:01:17:22">charged. And it's also dependent upon<br />how much time is involved. If I have</p>
			<p begin="00:01:17:22" end="00:01:22:18">a loan and I want to decrease the amount<br />of interest that I'm going to pay, then</p>
			<p begin="00:01:22:19" end="00:01:28:01">I'm either going to have to decrease how<br />much money I borrow, I'm going to have</p>
			<p begin="00:01:28:05" end="00:01:32:10">to borrow the money over a shorter period<br />of time, or I'm going to have to find a</p>
			<p begin="00:01:32:14" end="00:01:37:07">lending institution that charges a lower<br />interest rate. On the other hand, if I</p>
			<p begin="00:01:37:07" end="00:01:41:12">want to earn more interest on my<br />investment, I'm going to have to invest</p>
			<p begin="00:01:41:12" end="00:01:46:21">more money, leave the money in the<br />account for a longer period of time, or</p>
			<p begin="00:01:46:21" end="00:01:49:23">find an institution that will pay<br />me a higher interest rate.</p>
		</div>
	</body>
</tt>
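The begin and end values in this file are timecodes in hours:minutes:seconds:frames form, and the root element declares ttp:frameRate="24" with ttp:frameRateMultiplier="1000 1001". The Python sketch below shows one way to turn such a timecode into seconds. It assumes that the hours, minutes, and seconds fields count whole seconds and that only the frame field is divided by the effective frame rate (frame rate times multiplier); some tools interpret timecodes differently, so treat this as an illustration only. The function name timecode_to_seconds is hypothetical.

```python
from fractions import Fraction


def timecode_to_seconds(timecode, frame_rate, multiplier=(1, 1)):
    """Convert "HH:MM:SS:FF" to seconds as an exact Fraction.

    The effective frame rate is frame_rate * numerator / denominator,
    mirroring ttp:frameRate and ttp:frameRateMultiplier.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    effective_rate = Fraction(frame_rate) * Fraction(*multiplier)
    whole_seconds = 3600 * hours + 60 * minutes + seconds
    return whole_seconds + Fraction(frames) / effective_rate


# First caption of the example: begin="00:00:03:10" at 24 * 1000/1001 fps.
begin = timecode_to_seconds("00:00:03:10", 24, (1000, 1001))
print(float(begin))
```

Using Fraction keeps the arithmetic exact; convert to float only when a player API needs it.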

Compatibility

The SMPTE-TT file format is supported by video players, streaming platforms, authoring tools, and editing software, including:

  • Google YouTube (CEA-608 support only)
  • Netflix
  • Amazon Video
  • Crackle
  • Microsoft Media Platform’s Player Framework
  • Yahoo
  • AOL
  • Brightcove
  • Open DCP
  • Adobe Premiere
  • Open Source Media Framework (OSMF)
  • Apple HTTP Live Streaming (HLS)
  • Flowplayer
  • SubtitlePlus
  • Subtitle Edit

Speechpad Supports SMPTE-TT Captions

SMPTE-TT captions are available with either of Speechpad’s captioning services: Standard Captions or Premium Captions.

If your project requires a specialized version of SMPTE-TT captions or any other caption format, please contact us, and we’d be happy to assist you.

Learn more about Speechpad’s other XML-based caption formats.