Computational Video Editing may replace Assistant Editors

Eric Escobar writes on Film Independent about his trip to SIGGRAPH 2017 and the one technology that blew his mind: Computational Video Editing.

Three researchers from Stanford University and one from Adobe demonstrated a system that:

“automatically selects the most appropriate clip from one of the input takes, for each line of dialogue, based on a user-specified set of film-editing idioms. Our system starts by segmenting the input script into lines of dialogue and then splitting each input take into a sequence of clips time-aligned with each line. Next it labels the script and the clips with high-level structural information (e.g., emotional sentiment of dialogue, camera framing of clip, etc.). After this pre-process, our interface offers a set of basic idioms that users can combine in a variety of ways to build custom editing styles. Our system encodes each basic idiom as a Hidden Markov Model that relates editing decisions to the labels extracted in the pre-process. For short scenes (< 2 minutes, 8-16 takes, 6-27 lines of dialogue) applying the user-specified combination of idioms to the pre-processed inputs generates an edited sequence in 2-3 seconds.”
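To make the Hidden Markov Model idea a little more concrete, here is a minimal sketch of how one clip per line of dialogue could be chosen with the Viterbi algorithm. It is not the researchers' code: the `Clip` fields, the two simplified idioms and the use of additive costs (in place of the probabilities a real HMM would use) are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clip:
    take: int              # which take the clip was cut from
    line: int              # which line of dialogue it covers
    framing: str           # hypothetical label: "wide" or "close"
    speaker_visible: bool  # hypothetical label from the pre-process

def emission_cost(clip):
    # Simplified "Speaker visible" idiom: penalise clips that hide the speaker.
    return 0.0 if clip.speaker_visible else 1.0

def transition_cost(prev, cur):
    # Simplified "Avoid jump cuts" idiom (an assumption, not the paper's
    # definition): penalise a cut to a different take with the same framing.
    if prev.take != cur.take and prev.framing == cur.framing:
        return 1.0
    return 0.0

def best_edit(clips_per_line):
    """Viterbi search: clips_per_line[i] holds the candidate clips for line i."""
    # best[j] = (total cost, path) of the cheapest sequence ending in candidate j
    best = [(emission_cost(c), [c]) for c in clips_per_line[0]]
    for candidates in clips_per_line[1:]:
        new_best = []
        for cur in candidates:
            cost, path = min(
                ((pc + transition_cost(pp[-1], cur), pp) for pc, pp in best),
                key=lambda t: t[0],
            )
            new_best.append((cost + emission_cost(cur), path + [cur]))
        best = new_best
    return min(best, key=lambda t: t[0])

# A made-up scene: three lines of dialogue, two takes (wide and close).
scene = [
    [Clip(0, 0, "wide", True),  Clip(1, 0, "close", False)],
    [Clip(0, 1, "wide", False), Clip(1, 1, "close", True)],
    [Clip(0, 2, "wide", False), Clip(1, 2, "close", True)],
]
cost, path = best_edit(scene)
chosen_takes = [clip.take for clip in path]
```

Because each decision only depends on the previous clip, the search grows linearly with the number of lines, which is at least consistent with the 2-3 second figure quoted above.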

That’s right. Three seconds. For a 90-second scene. Versus 90 minutes for a human. If my math is correct (90 minutes is 5,400 seconds, and 5,400 / 3 = 1,800), that makes this system 1,800 times as fast, or 180,000% of a human's speed!

The idioms, from the research notes:

  • Avoid jump cuts
  • Change zoom gradually
  • Emphasize character
  • Intensify emotion
  • Mirror position
  • Peaks and valleys
  • Performance fast/slow
  • Performance loud/quiet
  • Short lines
  • Speaker visible
  • Start wide
  • Zoom consistent
  • Zoom in/out

Editors combine a number of these idioms and weight them to generate different assemblies of the rushes, automatically.
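As a rough illustration of how weighting idioms can produce different assemblies from the same rushes, the sketch below scores every possible assembly of a tiny made-up scene and keeps the cheapest one. The clip labels, the three simplified idiom functions and the brute-force search are assumptions for the sake of the example; the system described above uses Hidden Markov Models rather than enumeration.

```python
from itertools import product

# Hypothetical labels per line and take: (framing, speaker_visible).
CLIPS = [
    [("wide", True),  ("close", False)],  # line 0: take 0, take 1
    [("wide", False), ("close", True)],   # line 1
    [("wide", False), ("close", True)],   # line 2
]

def start_wide(seq):
    # Penalise an assembly that does not open on a wide shot.
    return 0.0 if seq[0][0] == "wide" else 1.0

def speaker_visible(seq):
    # One penalty point per clip in which the speaker is hidden.
    return sum(0.0 if visible else 1.0 for _, visible in seq)

def zoom_consistent(seq):
    # One penalty point per cut that changes the framing.
    return sum(1.0 for a, b in zip(seq, seq[1:]) if a[0] != b[0])

IDIOMS = {
    "start_wide": start_wide,
    "speaker_visible": speaker_visible,
    "zoom_consistent": zoom_consistent,
}

def assemble(weights):
    """Return the take chosen for each line under a weighted mix of idioms."""
    best_takes, best_cost = None, float("inf")
    for takes in product(*(range(len(line)) for line in CLIPS)):
        seq = [CLIPS[i][t] for i, t in enumerate(takes)]
        cost = sum(w * IDIOMS[name](seq) for name, w in weights.items())
        if cost < best_cost:
            best_takes, best_cost = takes, cost
    return best_takes

# Three "editing styles" built from the same idioms with different weights.
dialogue_driven = assemble({"speaker_visible": 1.0, "zoom_consistent": 0.1})
steady = assemble({"zoom_consistent": 1.0, "speaker_visible": 0.1})
steady_wide_open = assemble(
    {"zoom_consistent": 1.0, "speaker_visible": 0.1, "start_wide": 1.0}
)
```

With these made-up labels, the three styles pick three different sequences of takes, which is the point: the footage stays the same and only the weights change.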

Of course, editors will then proceed to polish these rough cuts, tweaking the edits and finessing the sound.

My take: This promises to take the tedium out of editing and let editors focus on truly being creative. Eric envisions a client-side version of this in which every viewer’s version of a film is custom-generated for them, based on their favourite editing style. That may be going a little too far, but what I find fascinating about this system is that it starts with the script, once again highlighting how crucial the script is.