Collectively, these XML-based caption formats have gained significant traction within established industries such as broadcast television and film. Despite their successful adoption within these domains, support for them in the web community has been mixed, and XML standards for captions and subtitles are currently competing with WebVTT, a newer standard adopted by HTML5.
The best way to understand how we wound up with so many different flavors of XML subtitles/captions is to review the history of how the standards were developed.
Captions for broadcast video have been around since the early 1970s. As media presentation methods have evolved, so have caption formats. The emergence of web video sharply accelerated this evolution. By the turn of the century, there were dozens of caption formats, and no de facto standard had emerged. It seemed that each video playback or editing application had its own new way of storing captions.
It seems counter-intuitive that the simple task of displaying text on a screen at a particular time would give rise to so many formats. However, the openness of the web fostered unbridled innovation, and this simple requirement quickly grew as new features were introduced: positioning and layout information, styling, animation, metadata about the video and its content, hyperlinks, and so on. The result was a mess of incompatible formats.
In early 2003, the World Wide Web Consortium (W3C) finally recognized and addressed the problem. The Timed Text Working Group (TTWG) was chartered to develop an XML-based format for representing streamable text synchronized with other timed media, such as audio and video. The format was designed to incorporate all the functionality of existing formats and thereby become the standard interchange format between applications.
The TTWG realized that the standard needed to address the requirements of three major groups:
The group foresaw that one standard would not be sufficient to satisfy the needs of all these groups, so they created a framework that allowed multiple standards to be built on top of one base standard. The base was called Timed Text Markup Language (TTML). It defined the full set of features needed for captioning. Narrower standards could then be defined as named subsets of those base features. These subsets are called profiles.
To address the needs of web developers, three profiles were defined: DFXP Presentation, DFXP Transform, and DFXP Full. DFXP stands for “Distribution Format Exchange Profile.” The DFXP Presentation profile is used by video players to render captions, and the DFXP Transform profile is used in video editing to convert to and from other caption formats. The DFXP Full profile includes all the features defined in the base standard.
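To make the idea of a profile more concrete, here is a minimal sketch in Python of how a tool might read a TTML document, check which profile it declares, and pull out the timed cues. The sample document, the function names, and the cue tuple layout are illustrative assumptions, not part of any standard or product. The sketch also assumes every cue uses the `hh:mm:ss.fff` clock-time form, although TTML allows other time expressions.

```python
import xml.etree.ElementTree as ET

# Namespaces used by TTML 1.0 documents.
TTML_NS = "http://www.w3.org/ns/ttml"
TTP_NS = "http://www.w3.org/ns/ttml#parameter"

# Hypothetical sample document declaring the DFXP Presentation profile.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    ttp:profile="http://www.w3.org/ns/ttml/profile/dfxp-presentation"
    xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">Hello, world.</p>
      <p begin="00:00:04.000" end="00:00:06.000">Captions are timed text.</p>
    </div>
  </body>
</tt>
"""


def clock_to_seconds(value):
    """Convert an hh:mm:ss.fff clock time to seconds (assumed format)."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def declared_profile(ttml_text):
    """Return the ttp:profile attribute of the root element, or None."""
    root = ET.fromstring(ttml_text)
    return root.get(f"{{{TTP_NS}}}profile")


def parse_cues(ttml_text):
    """Return a list of (begin_seconds, end_seconds, text) tuples."""
    root = ET.fromstring(ttml_text)
    cues = []
    for p in root.iter(f"{{{TTML_NS}}}p"):
        text = "".join(p.itertext()).strip()
        cues.append((clock_to_seconds(p.get("begin")),
                     clock_to_seconds(p.get("end")),
                     text))
    return cues


if __name__ == "__main__":
    print(declared_profile(SAMPLE))
    for begin, end, text in parse_cues(SAMPLE):
        print(f"{begin:.3f} -> {end:.3f}: {text}")
```

A player would use the declared profile to decide whether it supports everything the document may contain, while a converter could feed the extracted cues into a writer for another caption format.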
The Society of Motion Picture and Television Engineers (SMPTE) is an international organization established with the goal of advancing moving-imagery education and engineering across the communications, technology, media, and entertainment industries, that is, the TV and film industries. Part of its charter is to develop standards for all aspects of motion imaging, ensuring that content is seen and heard in the highest possible quality on any display screen. As such, it took a keen interest in the new TTML standard produced by the TTWG.
SMPTE concluded that the TTML standard addressed many of the needs of the TV and film industries, but that its feature set was not sufficient to address all of them. In particular, it lacked:
Thankfully, rather than create a new standard, SMPTE decided to extend the TTML standard to meet its requirements. It did so by defining three new features/extensions: #image for bitmaps, #data for binary data, and #information for presentation mode. It also created a new profile called SMPTE-TT, which encompassed the DFXP Full profile as well as the new extensions.
TTML is gaining international traction among large, established players in the broadcast, film, and web industries.
As mentioned above, the Society of Motion Picture and Television Engineers (SMPTE) uses TTML as the basis for SMPTE-TT. Following the enactment of the Twenty-First Century Communications and Video Accessibility Act (CVAA) in October 2010, the U.S. Federal Communications Commission (FCC) designated SMPTE-TT as a “safe harbor interchange and delivery format” for online captioning. The W3C has also introduced a newer profile, Internet Media Subtitles and Captions 1.0 (IMSC1), intended for use as an interchange format across subtitle and caption delivery applications worldwide.
Other broadcasting organizations have followed suit. The European Broadcasting Union (EBU) created the EBU-TT profile. The Japanese Association of Radio Industries and Businesses (ARIB) created the ARIB-TTML profile. The Digital Entertainment Content Ecosystem (DECE) defined a profile of SMPTE-TT to deliver captions and subtitles in the UltraViolet™ digital media format.
Live broadcasters such as FOX, CNN, ABC, CBS, NBC, and PBS have also adopted TTML-based captions for rebroadcast and simulcast applications over the Internet.
TTML has seen broad adoption among major web companies as well. Microsoft uses DFXP for Silverlight, Expression Studio, and other media streaming technologies and tools. The company has also proposed a “Simple Delivery Profile for Closed Captions (US)” with the goal of establishing a minimum level of interoperability between TTML and legacy caption formats employed in US markets, such as CEA-608 and CEA-708.
Adobe was one of the early adopters of TTML. Since 2007, the company has implemented support for DFXP throughout its product line, including Flash, Premiere Pro, Adobe Connect, Adobe TV, Open Source Media Framework (OSMF), and Adobe Media Server.
In addition, FlowPlayer, Panopto, VLC, JW Player, and Subtitle Edit all support TTML.
Most major video hosting and streaming delivery platforms support some form of TTML, generally DFXP. These companies include: Brightcove, Limelight Networks, Ooyala, Kaltura (MediaSpace), and Akamai.
Streaming portals such as YouTube, Yahoo, AOL, Vimeo, Dailymotion, and YouView all support the DFXP caption format. Major streaming video services such as Netflix and Amazon Video require captions to be submitted in TTML-based formats as well.
If your project requires a specialized version of XML-based captions, please contact us, and we’d be happy to discuss it with you.
Learn more about Speechpad’s XML caption formats.