Published on Medium.com

We watch TV with subtitles a lot these days. It’s like reading television.
[rustling leaves]

Actually, we watch a lot of closed captioned (CC) TV. In addition to the already-subtitled productions (translating non-English films) we started by turning on closed-captioning for those highly-accented British period pieces or procedurals (where you know it’s English but just can’t get in the rhythm).
[footsteps approaching]

Intended for the hearing challenged (which, increasingly, we are becoming!) the optional closed captions (as opposed to open captions, which are automatically visible) add audio track descriptions in addition to dialog; they classify soundtrack music (the captioners are not even close to running out of adjectives, see [dramatic musical sting]* below) and pick up distant noise, or off-screen sounds, turning it all into words. In italics.
[dog barking in distance]

CC’s are one way for couples to avoid a life of DVR/Streaming rewinds because one or the other didn’t catch something. It’s better than both wearing headphones, which is a bit more isolation than any of us need at the moment. There are now, in cinemas (but soon at home?) closed caption glasses! Words in green float in front of the wearer (at an adjustable distance!) to narrate the action and dialog.
[clapping]

Half of what we watch is already subtitled, which can lead to a funny competition downscreen between the CC’s and subtitles; like alternative fact sets, or quarreling kids describing the same misbehavior to a parent, or an Altman film (and now I really need to see how captions handle a room of people all talking at the same time).
[indistinct chatter]

We hear actors’ voices and, independently, read their words; but these two streams of data operate as if on different frequencies. Magically, in one’s memory actor and words rejoin; it feels like it was all in English (even when it wasn’t), seamlessly merged and comprehensible. But it was born of a deconstructed viewing, reassembled by one’s brain.
[breaking glass]

Because all sound is uniformly displayed (shouting doesn’t provoke BIG TYPE SIZE nor does whispering reduce it) and all sounds, those close and far and those not even discernible, are dutifully transcribed, the information is flattened. Yet there is an attempt to distinguish variants (again, see list below of just a few musical subtitles*). A dichotomy develops between what you are seeing/hearing and what you are told you are seeing/hearing. Background dialog, not meant to be anything but atmospheric, is reported as clearly as foreground dialog.
[ripping paper]

Reading TV sets up a friendly competition between your left and right brain halves. You negotiate between what you are seeing (right) and what you are reading (left). The paraphrase, ‘are you going to believe these subtitles or your lying eyes?’ (and is ‘paraphrase’ even the right word?) might be ‘are you going to believe your left brain or your lying right brain?!’.
[squishy noises]

Is the bilateral competition multitasking, or does it prove that there is no such thing as multitasking? Because one can miss the action while reading, I suspect it’s rapid toggling between tasks, in this case reading and watching, left and right. Reading, the most interior act vs. watching what is essentially a play, a communal activity. What sounds like it should produce dissonance results instead in harmony, in part because the brain lobes are wired to work together.
[sudden overlapping shouts]

Subtitles can constitute an entire novel, or at least a screenplay, in the most stripped-down “just the facts ma’am”, Hemingway-styled delivery possible. American Sign Language (ASL) translations, on the other hand (yes, pun), are like picture-in-picture TV viewing or interpretive dances responding to a text. Think Jules Feiffer’s many cartoons titled ‘A Dance to Spring’ which are shorthand for stagey histrionics (an admittedly childish view of an important visual language).
[crumpling paper]

The question American Sign Language raises is why a second right-brain language (ASL) is competing with the main right brain show (the visual action, or slide show, or horse race…whatever it is) when we could all benefit from closed captioning…you know, like reading TV? The overload must feel like trying to speak on a phone producing an echo of your own voice; conversation just grinds to a halt.
[door slams]

I once gave a talk with Michael Bierut when we were young partners at Pentagram. We were at a lectern in an auditorium that was large and varied enough to have an ASL interpreter standing just below the stage in front of us. Michael intentionally added a “fuck” to the talk and quickly leaned over the lectern to watch the translation! Adding an entirely new visual language once made sense when there were no alternatives, but when will technology overtake this, just as traffic lights replaced policemen directing traffic at intersections?
[wind whistling]

Carin Goldberg, designer and teacher of graphic design at SVA in New York (and my wife), had a non-hearing student who required two ASL interpreters to accompany him to each class; one would fatigue before the 3-hour class ended, so they alternated. Is it the physical work or the mental calculations that exhaust them so much faster than the originator of the words? Subtitles MUST be less expensive (and less distracting) than two translators for each student in need. And could it not be a phone app (in fact, isn’t the phone already a transcription device)?
[pencil scribbling]

The first TV CC’s were on Julia Child’s “The French Chef” in 1972 (and it would be sad to miss her warbling) but the first test of CC’s was on ABC’s “The Mod Squad”!
I would love to watch Linc say (and this is real dialog):
Linc: “There are those who look at things and ask why, I dream of things that never were and ask why not?…Who said that?”
Leila: “A great man, who died for it” was the answer, also spoken by an African American, actress Judy Pace.
[bird warbling]

I knew this as a Robert F. Kennedy quote from his time running for president in 1968, but apparently JFK referenced it in a speech and Ted Kennedy used it in his eulogy for his older brother. And all of them cribbed it from George Bernard Shaw, so maybe Linc was really asking a nuanced question, not just being portrayed as unknowing.
[car door slams]

It might be, except that this Mod Squad moment was from a 1968 show aired October 1, barely 100 days after RFK was shot and killed, and the show was occasionally filmed at the hotel where he was shot. The Mod Squad was incorporating RFK’s legacy into the dialog almost instantly. This was just a month before the 1968 election (in which a living Robert Kennedy might have overturned history), rivaled only by the 2016/2020 elections in the foreshadowing of a nation in precipitous decline.
[gunshot]

Real time, on-the-run, live transcription CC on TV started 10 years after the Mod Squad test employing banks of court reporters able to transcribe 225 wpm. But even transcription services touted these days as non-AI (Artificial Intelligence, i.e. actual human beings doing the translation) are prone to fault; as I was writing this a friend was correcting a raw transcript that read “before he went all homo” when the words were actually “when he went all Pomo” as in “postmodern”! Big difference.
[audience gasps]

Maybe that’s why the BBC uses a ‘respeaker’ when feeding sports commentary into voice recognition; a respeaker is someone who repeats, with careful enunciation, the exact dialog to ensure accurate AI transcription. Unlike digital copies, and more like Xerox copies, each re-transmission of language degrades it a bit, until the original and mutation begin to diverge into parallel worlds of meaning.
[echoing footsteps]

Simultaneous translation at the United Nations is a booming industry employing more than 100, and that’s just interpreters; translators (who translate the written word) are a separate group. The interpreters WFH now, but normally work in soundproof booths with teams of 2 or 3 to ‘tag team’ the instantaneous alchemy from one language to another.
[hushed murmuring]

There are 6 official interpretable languages at the UN: English, French, Spanish, Russian, Arabic and Chinese. That leaves out a lot of languages (che casino!) that have to be first translated into one of the big six before they can be re-interpreted into the other 5 official languages…all in real time. The rule is no more than two language leaps before distribution; precision of language is the currency of the UN. As I said, it’s an industry, all happening in real time, managing distortion while seeking precise translated meaning.
[microphone feedback]

We’re streaming a series titled “Our Time is Now” in Sweden, where it is made, but for some reason called “The Restaurant” in English. Yes, there is a restaurant at the center of the series, but WTF? That’s all they could come up with?
[silverware rattling]

Movie titles are worse:
Mr. Cat Poop (As Good As It Gets: China)
Urban Neurotic (Annie Hall: Germany)
Big Liar (Nixon: China)
Meetings and Failures in Meetings (Lost in Translation: Portugal)
Please, Do Not Touch the Older Women (: Italy)
[crowd laughs]

There’s even a version of The Sixth Sense in China titled “He’s a Ghost”, which, if true, is the best spoiler imaginable; why bother even seeing it after you’ve heard the title?
My suggestions for re-titling movies and books:
The Red Pony Dies
Kevin Spacey is Keyser Soze
Rosebud, the Sled
So that should save some time…
[laughter in distance]

Google Translate absorbed the visual translator Word Lens back in 2014. Word Lens was that magical app that translated signs and text, anything written that the phone camera focused on, in real time, into one’s own language. It could be a street sign in China or a page of text in German, but it was (and still is) magic. It may be a simulacrum of multilingual thinking (the cinematic version) and it makes all of us feel invincible, able to access the whole world. Google Translate has handled audio translation since 2015. And Siri has been around since 2011, steadily improving (I am told; I rarely use it).
[horses approaching]

There is no lack of tech (for the literate) to blur the language barrier, but nothing is the same in translation. There is always the ‘telephone’ effect; the game where a simple message after being whispered from ear to ear to ear becomes completely mangled (like Pomo/homo, but usually much worse). It’s the accumulation of tiny revisions suffering from ‘compound interest’. The more complex the data, the more distortion is likely.
[no audible dialog]

The alternate reality our brains create when faced with disparate messages is instructive. Millennials, the ones I know and likely lots of other generations, text more than call. When I ask my son ‘do you speak with them anymore?’ and he answers ‘yes’, I ask one more question, ‘phone or text?’ and the answer is invariably ‘text’. It is the same sensation to many. It’s like (for older generations) leaving a long voicemail and trying to remember later if you actually spoke together or just ‘monologued’ the message. It feels virtually the same.
[tap dancing sounds]

Subtracting sound can be equally dramatic. We’ve eaten more than once at SingleThread in Healdsburg, where Kyle and Katina Connaughton have created one of the most refined dining experiences imaginable. Everything, from the four different ceramic wall tiles in the bathrooms (from Kyle’s four work venues; France, Japan, Britain and California) to the loomed hangings referencing DNA arrays, to the knives forged from parts of a vintage 1968 Volkswagen, is layered with meaning and thoughtful design.
[car backfires]

The open kitchen (with sliding door acting as stage curtain) has almost magically edited out noise (pots, shouting, clanking, bells, commotion) to create a kind of quiet ballet. The mood is calm, wait staff walks instead of running, and the only sound you hear is the hiss or crackle of food being cooked. Other than training the staff, the trick is covering all the stainless steel work surfaces with thick silicone mats. Drop something on the counter and it makes zero noise. Silent transparency; all visible and all nearly soundless.
[dishes breaking in kitchen]

The dialectic between the left and right brain, between watching and reading, between sound and silence, between truth and fiction, is the fulcrum upon which our understanding of the world balances. We are accustomed to more and more data these days, but less and less data can be equally stimulating.

It all depends on how reading sounds to you.


*Like naming colors for Revlon, or Benjamin Moore, the varieties of musical description are endless (or need to be).
Here is a small selection from a single movie:

[dramatic musical sting]
[dark music]
[eerie music]
[emotional music]
[epic music]
[fantastical music]
[intense musical buildup]
[intense percussive music]
[ominous music]
[uplifting music]
[whimsical music]
[brooding dramatic music]
[brooding music]
[dramatic music]
[gentle music]
[pensive orchestration]
[rousing music]
[sinister tone]
[soft dramatic music]
[soft music]
[soft rousing music]
[soft tense music]
[somber music]
[somber orchestration]
[suspenseful music]
[tense music]