Caption This: Why Subtitling Is Big Business Amid the Content Boom

Closed captioning was once a niche service used mostly by deaf and hard-of-hearing viewers, but subtitles have exploded in the streaming age. In a 2021 survey, deaf-led charity Stagetext found that 80 percent of 18- to 24-year-olds use subtitles some or all of the time when watching TV on any device, even though only 10 percent of those surveyed were deaf or hard of hearing. It also found that just 23 percent of those ages 56 to 75 use captions, despite that group's higher rate of hearing loss.

This cultural shift has coincided with more to watch than ever before, particularly content that streams around the world rather than airing only on U.S. broadcast television. That has created a massive task for the companies behind film and TV captioning. “Everything has changed in the past 10 years,” says Heather York, vp marketing and government affairs for Vitac, the largest captioning company in North America.


The business itself reflects that scale: studios and streamers rarely have in-house subtitlers, so they outsource the work to third-party companies — fueling a U.S. captioning services market valued at nearly $170 million, part of a broader transcription market valued as high as $30 billion and growing at a compound annual rate of 7 percent. Some studios are hands-on in the captioning process, while others ask producers to deliver content with subtitling already complete. And while a handful of the major Hollywood captioning companies keep highly trained employees on staff, the industry these days is mostly made up of freelance subtitlers across the globe.

Companies like Rev are leading that charge, with 70,000 to 75,000 international freelancers actively doing transcription work each year. Anyone can apply, with the workforce ranging from “college students through stay-at-home parents, or people who have been in the industry for years, up to retirees,” says Rev vp operations Pat Krouse. “It gives everyone that flexibility to come in and say, ‘I want to just do this for some side spending money that supplements my income and I’m just going to do a few jobs till I hit that number.’ Versus there are other people who say, ‘I’m really good at this and I want to treat this as a full-time job, but where I’m my own boss.’”

Rev pays per minute of output with varying rates based on difficulty; it offers compensation of $0.54 to $1.10 for each minute of runtime for captioning and $1.50 to $3 a minute for foreign-language subtitlers, paying a higher rate for complex jobs like those with multiple speakers talking over one another or those dealing with complicated topics like legal or medical content, where the language can be very specific and requires prior knowledge. The company also lets freelancers select content that is particularly interesting or relevant to them. Says Krouse, “We have people, for example, who love doing documentaries because they’re like, ‘I feel like I’m learning something and I’m getting paid to do it,’ ” or “people who may love reality TV and that’s their guilty pleasure. They can do the unscripted shows and get to watch from there.”
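Rev’s per-minute model means pay is tied to a job’s runtime, not hours worked. A minimal sketch of that math, using the rate ranges quoted above (the function and variable names here are our own, purely for illustration):

```python
# Illustrative payout math for per-minute caption work, using the rate
# ranges quoted above. Names are hypothetical, not Rev's actual system.

def payout(minutes_of_runtime: float, rate_per_minute: float) -> float:
    """Pay is based on the content's runtime at a per-minute rate."""
    return round(minutes_of_runtime * rate_per_minute, 2)

# A 45-minute documentary at the top English captioning rate ($1.10/min)
print(payout(45, 1.10))  # 49.5
# The same runtime as a complex foreign-language subtitling job ($3.00/min)
print(payout(45, 3.00))  # 135.0
```

The gap between those two numbers is why complex, specialized jobs — legal, medical, overlapping speakers — command the higher end of the scale.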

Several other major Hollywood caption companies also embrace the freelance model, particularly for non-English subtitling and translation, but at Vitac — whose clients include Fox, CNBC and Bravo — English-language employees are kept on staff and put through rigorous training. York says training for captioning prerecorded content can take up to four months, and about six months for those working on live content. “A lot of people who apply don’t get accepted, and then [many] who start the training don’t finish,” she adds of the live caption process.

Part of that training is learning each studio’s very specific subtitle requirements, or style guide. Netflix’s style guide, which is public, includes rules like a limit of 42 characters per line, a set reading speed of up to 20 characters per second for adult shows (up to 17 for children’s programs) and an emphasis that “dialogue must never be censored.” One exec says Disney’s guide is so detailed that it specifies which words are to be used for R2-D2’s beeps or the sound of a moving lightsaber; those words are then consistently maintained throughout Star Wars programming.
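The Netflix limits cited above — 42 characters per line, a reading speed of up to 20 characters per second for adult programming — are the kind of rules a subtitler (or a QC tool) checks on every caption event. A minimal sketch of such a check, assuming only the two public limits named here (real delivery specs cover far more):

```python
# Illustrative check of a subtitle event against two of Netflix's
# public style-guide limits mentioned above. Real specs are far more
# detailed; this is a sketch, not a delivery-compliance tool.

def within_spec(lines, duration_seconds, max_chars=42, max_cps=20):
    """Return True if no line exceeds max_chars and the overall
    reading speed stays at or under max_cps characters per second."""
    if any(len(line) > max_chars for line in lines):
        return False
    total_chars = sum(len(line) for line in lines)
    return total_chars / duration_seconds <= max_cps

# A two-line caption shown on screen for two seconds
print(within_spec(["I have a bad feeling", "about this."], 2.0))  # True
```

Children’s programming would simply swap in the stricter reading-speed cap (up to 17 characters per second) mentioned above.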

Outside of that, captioning can be a very creative process, which leads to some comical back-and-forth with producers. York recalls that on one project, “We described an actor as bald and we got in trouble for that. They were bald!”

Capital Captions owner Jodene Antoniou notes, “Sometimes you’ll describe a sound and the client will say, ‘I don’t think that’s what that sound was.’ I did one and I said ‘funky music’ or something, and then the client said, ‘I don’t think that was funky music.’ They wanted to use a different word for it.”

When possible, captioners stay on a specific show for consistency’s sake, so that creative choices match from episode to episode.

“There is a certain level of subjectivity to it. … [It’s] not just having the ability to capture and transcribe the dialogue and the audio, but you also need to be very well-versed in pop culture,” says Deluxe senior vp Magda Jagucka. “Knowledge of music, knowledge of history is very important as well, cultural aspects of anything that the movie depicts.”

Studios also have different preferences for just how detailed they want captions to be. Notes Antoniou, “The general rule is that you caption the sound description if it’s relevant to what’s going on: If you don’t have it, that you’re missing something by not having that sound there. So if a door clicks open, but you can see the door, you don’t really need to say the door’s clicked open because you can see it has.” Some streamers, however, want every single sound to be transcribed and also dictate things like how to caption swear words, where they want captions to be physically placed on the screen and how they want speakers identified.

Though the FCC doesn’t regulate captioning on streaming like it does broadcast TV, York notes that anything that originally aired on TV must be captioned when it’s delivered on a web platform, for example Friends and Seinfeld. The streamers have also realized it’s good business to caption, she adds, as it allows their content to be fed around the world.

Live captioning is an entirely different animal, and those workers use steno machines, like those used in courtrooms, to create hyperfast captions. “We clocked Rachel Maddow at 270 words a minute, so she’s probably one of the fastest [talkers] out there,” says York, joking, “Only the best captioners can do MSNBC primetime.”

To prep for live events like awards shows, captioners are given an advance script containing everything from the teleprompter — except the names of the winners. When people ad-lib or give acceptance speeches, the captioners are working from scratch. Explains York, “The person gets up and thanks [someone with] a very complicated name. We take a guess at it, but we’re going to spell it wrong. That’s bound to happen.”

The uncertainty of live TV led to a viral moment at the Grammys in February, when Bad Bunny took the stage for a mashup of his hits “El Apagón” and “Después de la Playa.” As he sang in Spanish, his lyrics weren’t translated into closed captions — instead, the words “singing in non-English” were displayed for those who had captions turned on during the performance. When the Puerto Rican star later won best música urbana album, the words “speaking in non-English” were displayed in parts of his speech that were in Spanish. The incident led CBS president and CEO George Cheeks to address the issue, noting, “A bilingual (English- and Spanish-language) real-time live captioner should have been utilized and the words used on the screen were insensitive to many.” The captions were later corrected for both the Grammys’ West Coast feed and next-day streaming on Paramount+, and a source notes that Spanish- and English-language captioners were on deck for the awards shows in the weeks that followed.

Bad Bunny went viral after his Grammys performance was captioned “singing in non-English.” It’s since been fixed.

The global nature of content has also changed things for the subtitling industry, as streamers now often ask for subtitles in nine languages — including French, Italian, German and Spanish — before their shows drop, and put greater emphasis on their non-English-language projects.

“We’ve seen the change in strategy, not only in globalizing English content, but we’re seeing that streamers are actually investing and promoting content from specific regions,” says Jagucka. “International content is now becoming even more popular than your traditional English content, and so that creates a new challenge for us as service providers, because we’ve got to pivot with our workflows, with our resources. That process to bring non-English original content to global audiences requires multiple translation and adaptation steps.” And social media has provided another shake-up, as many content creators across Instagram, TikTok and YouTube now caption their posts.

“That really is a huge boom,” says Krouse, explaining that individual creators are often interested in getting subtitles in anywhere from five to 15 languages in order to expand their follower base.

Antoniou says social media subtitles can be particularly tricky “because of the different aspect ratios for the videos. You need to work to different kinds of specs for them. So if a video is on Facebook, it’s a standard resolution and you’ll have quite long subtitles. If it goes on Instagram, then it’s usually watched on a phone and it’s that long shape, so the subtitles actually need to be really short. That creates an extra challenge to work with.”

Though some are done by hand, social media subtitles are frequently created by artificial intelligence because it’s cheaper. Today, AI is often used to give a first pass at transcription, with human editors then going through to make corrections.

“There’s definitely a big discourse that’s happening right now around AI. We believe that technology really is serving the human talent, so the way we have approached AI is to help both translators and adapters speed up the work where possible,” says Jagucka. Of the current limitations, she continues, “There’s a lot of nuance, and the audio-visual translation isn’t really just based on text. When you’re thinking about AI, it goes through that textual base, but translators get our cues from the sound, from the visual, from the picture, from the tonality of the dialogue and the actors acting, as well.” Others note that AI can’t describe sound effects or song lyrics and frequently gets tripped up on content with multiple speakers.

“AI is really helpful where it speeds up humans and makes their jobs easier, moving from a pure typist to an editor and a proofreader, and eventually a summarizer,” says Krouse. “It makes humans focus on higher value things, as opposed to just pure typing work. So [AI] is definitely going to be important, but more in the sense of helping humans be better at their jobs than anything.”

This story first appeared in the June 21 issue of The Hollywood Reporter magazine.
