Artificial Intelligence vs. Video Editors

With Transcriptive, our new tool for doing automated transcriptions, we’ve dived into the world of A.I. headfirst. So I’m pretty familiar with the state of the industry right now. We’ve been neck deep in it for the last year.

A.I. is definitely changing how editors get transcripts and search video for content. Transcriptive demonstrates that pretty clearly with text. Searching via object recognition is also already happening. But what about actual video editing?

One of the problems A.I. has is finishing. Going the last 10% if you will. For example, speech-to-text engines, at best, have an accuracy rate of about 95% or so. This is about on par with the average human transcriptionist. For general purpose recordings, human transcriptionists SHOULD be worried.

But for video editing, there are some differences, which are good news. First, and most importantly, errors tend to be cumulative. If a computer is going to edit a video, at the very least it needs to do the transcription and it needs to recognize the imagery. (We’ll ignore other considerations like style, emotion and story for the moment.) Speech recognition is at best 95% accurate, and object recognition is worse. The more layers of A.I. you have, the more those errors will usually multiply (although in some cases there might be improvement). While it’s possible automation will be able to produce a decent rough cut, these errors make it difficult to see automation replacing editors on most of the types of videos that pros are typically employed for.
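To put rough numbers on the "errors multiply" point, here is a minimal sketch. It assumes each layer fails independently of the others, which is a simplification. The 95% speech figure is the best case mentioned above; the 90% object recognition figure is purely hypothetical, chosen only because it is "worse" than 95%.

```python
def combined_accuracy(*stage_accuracies):
    """Chance that every stage is right, assuming stages fail independently."""
    result = 1.0
    for accuracy in stage_accuracies:
        result *= accuracy
    return result


speech = 0.95   # best-case speech-to-text accuracy from the text
objects = 0.90  # hypothetical value; the text only says it is worse than speech

two_stage = combined_accuracy(speech, objects)
print(f"speech + object recognition: {two_stage:.1%}")
```

Under those assumptions, two layers that each look decent on their own are right together noticeably less often than either one alone, and every extra layer drags the number down further.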

Secondly, if the videos are being done for humans, frequently the humans don’t know what they want. Or at least they’re not going to be able to communicate it in such a way that a computer will understand and be able to make changes. If you’ve used Alexa or Echo, you can see how well A.I. understands humans. In lots of situations, especially literal ones (find me the best restaurant), it works fine. In lots of other situations, not so much.

Many times as an editor, the direction you get from clients is subtle or you have to read between the lines and figure out what they want. It’s going to be difficult to get A.I.s to take the way humans usually describe what they want, figure out what they actually want and make those changes.

Third… then you get into the whole issue of emotion and storytelling, which I don’t think A.I. will do well anytime soon. The Economist recently had an amusing article where it let an A.I. write the article. The result is here. Very good at mimicking the style of the Economist but when it comes to putting together a coherent narrative… ouch.

It’s Not All Good News

There are already phone apps that do basic automatic editing. These are more for consumers that want something quick and dirty. For most of the types of work professional editors get paid for, it’s unlikely that the apps I’ve seen will replace humans any time soon. Although, I can see how the tech could be used to create rough cuts and the like.

Also, for some types of videos, wedding or music videos perhaps, you can make a pretty solid case that A.I. will be able to put something together soon that looks reasonably professional.

You need training material for neural networks to learn how to edit videos. Thanks to YouTube, Vimeo and the like, there is an abundance of training material. Do a search for ‘wedding video’ on YouTube. You get 52,000,000 results. 2.3 million people get married in the US every year. Most of the videos from those weddings are online. I don’t think finding a few hundred thousand of those that were done by a professional will be difficult. It’s probably trivial actually.

Same with music videos. There IS enough training material for the A.I.s to learn how to do generic editing for many types of videos.

For people that want to pay $49.95 to get their wedding video edited, that option will be there. Probably within a couple years. Have your guests shoot video, upload it and you’re off and running. You’ll get what you pay for, but for some people it’ll be acceptable. Remember, A.I. is very good at mimicking. So the end result will be a very cookie cutter wedding video. However, since many wedding videos are pretty cookie cutter anyways… at the low end of the market, an A.I. edited video may be all ‘Bridezilla on A Budget’ needs. And besides, who watches these things anyways?

Let The A.I Do The Grunt Work, Not The Editing

The losers in the short term may be assistant editors. Many of the tasks A.I. is good for… transcribing, searching for footage, etc… are now typically given to assistants. However, it may simply change the types of tasks assistant editors are given. There’s a LOT of metadata that needs to be entered and wrangled.

While A.I. is already showing up in many aspects of video production, it feels like having it actually do the editing is quite a ways off.  I can see creating A.I. tools that help with editing: Rough cut creation, recommending color corrections or B roll selection, suggesting changes to timing, etc. But there’ll still need to be a person doing the edit.


Speeding Up De-flickering of Time Lapse Sequences in Premiere

Time lapse is always challenging… you’ve got a high resolution image sequence that can seriously tax your system. Add Flicker Free on top of that… where we’re analyzing up to 21 of those high resolution images… and you can really slow a system down. So I’m going to go over a few tips for speeding things up in Premiere or another video editor.

First off, turn off Render Maximum Depth and Maximum Quality. Maximum Depth is not going to improve the render quality unless your image sequence is HDR and the format you’re saving it to supports 32-bit images. If it’s just a normal RAW or JPEG sequence, it won’t make much of a difference. Render Maximum Quality may make a bit of difference but it will likely be lost in whatever compression you use. Do a test or two to see if you can tell the difference (it does improve scaling) but I rarely can.

RAW: If at all possible you should shoot your time lapses in RAW. There are some serious benefits which I go over in detail in this video: Shooting RAW for Time Lapse. The main benefit is that Adobe Camera RAW automatically removes dead pixels. It’s a big f’ing deal and it’s awesome. HOWEVER… once you’ve processed them in Adobe Camera RAW, you should convert the image sequence to a movie or JPEG sequence (using very little compression). It will make processing the time lapse sequence (color correction, effects, deflickering, etc.) much, much faster. RAW is awesome for the first pass, after that it’ll just bog your system down.

Nest, Pre-comp, Compound… whatever your video editing app calls it, use it. Don’t apply Flicker Free or other de-flickering software to the original, super-high resolution image sequence. Apply it to whatever your final render size is… HD, 4K, etc.

Why? Say you have a 6000×4000 image sequence and you need to deliver an HD clip. If you apply effects to the 6000×4000 sequence, Premiere will have to process roughly TWELVE times the number of pixels it would have to process if you applied them to HD resolution footage. 24 million pixels vs. about 2 million pixels per frame. This can result in a HUGE speed difference when it comes time to render.
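If you want to check the math, here is a quick sketch of the arithmetic. It assumes HD means 1920×1080 and uses the 21-image analysis window mentioned at the top of this post as a worst case. How many pixels an effect actually reads per frame is an assumption made for illustration, not a description of Flicker Free’s internals.

```python
def pixels(width, height):
    """Number of pixels in a single frame."""
    return width * height


source = pixels(6000, 4000)  # original time lapse images
hd = pixels(1920, 1080)      # assumed delivery size (HD)
ratio = source / hd

# Worst case from the text: up to 21 images analyzed per output frame.
window = 21

print(f"{source:,} vs. {hd:,} pixels per frame ({ratio:.1f}x)")
print(f"{source * window:,} vs. {hd * window:,} pixels across {window} frames")
```

The ratio stays the same however many frames get analyzed, but the absolute number of pixels being pushed around gets big fast at the original resolution.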

How do you Nest?

This is Premiere-centric, but the concept applies to After Effects (pre-compose) or FCP (compound) as well. (The rest of this blog post will be explaining how to Nest. If you already understand everything I’ve said, you’re good to go!)

First, take your original image sequence (for example, 6000×4000 pixels) and put it into an HD sequence. Scale the original footage down to fit the HD sequence.

[Image: Hi-res images inside an HD sequence]

The reason for this is that we want to control how Premiere applies Flicker Free. If we apply it to the 6000×4000 images, Premiere will apply FF and then scale the image sequence. That’s the order of operations. It doesn’t matter if Scale is set to 2%. Flicker Free (and any other effect) will be applied to the full 6000×4000 image.

So… we put the big, original images into an HD sequence and do any transformations (scaling, adjusting the position and rotating) here. This usually includes stabilization… although if you’re using Warp Stabilizer you can make a case for doing that to the HD sequence. That’s beyond the scope of this tutorial, but here’s a great tutorial on Warp Stabilizer and Time Lapse Sequences.

Next, we take our HD time lapse sequence and put that inside a different HD sequence. You can do this manually or use the Nest command.

[Image: Apply Flicker Free to the HD sequence, not the 6000×4000 images]

Now we apply Flicker Free to our HD time lapse sequence. That way FF will only have to process the 1920×1080 frames. The original 6000×4000 images are hidden in the HD sequence. To Flicker Free it just looks like HD footage.

Voila! Faster rendering times!

So, to recap:

  • Turn off Render Maximum Depth and Maximum Quality
  • Shoot RAW, but apply Flicker Free to a JPEG sequence/Movie
  • Apply Flicker Free to the final output resolution, not the original resolution

Those should all help your rendering times. Flicker Free still takes some time to render, and none of the above will make it real time. However, it should speed things up and make the render times more manageable if you’re finding them to be really excessive.

Flicker Free is available for Premiere Pro, After Effects, Final Cut Pro, Avid, Resolve, and Assimilate Scratch. It costs $149. You can download a free trial of Flicker Free here.

Getting transcripts for Premiere Multicam Sequences

Using Transcriptive with multicam sequences is not a smooth process and doesn’t really work. We’re working on a solution, but it’s tricky due to Premiere’s limitations.

However, while we sort that out, here’s a workaround that is pretty easy to implement. Here are the steps:

1- Take the clip with the best audio and drop it into its own sequence.
2- Transcribe that sequence with Transcriptive.
3- Now replace that clip with the multicam clip.

4- Voila! You have a multicam sequence with a transcript. Edit the transcript and clip as you normally would.

This is not a permanent solution and we hope to make it much more automatic to deal with Premiere’s multicam clips. In the meantime, this technique will let you get transcripts for multicam clips.

Thanks to Todd Drezner at Cohn Creative for suggesting this workaround.

Wherein Jim Tierney rants and opines about After Effects, Premiere Pro, Final Cut Pro, and other nonsense