Tag Archives: AI

Free NAB 2024 Exhibits Pass, Come See Beauty Box 6.0

NAB is three months away! We have a booth right next to Adobe in South Hall Lower, SL7043, so we’ll be easy to find.

If you haven’t registered yet, you can use this code for a free exhibits pass: NS9892

Go here to register for NAB: https://registration.experientevent.com/ShowNAB241/

We’ve got a new version of Beauty Box, our AI based digital makeup and retouching plugin, coming soon! It’ll be released before NAB, but we’ll definitely be showing that off. The big new feature uses AI to whiten teeth. We’ve always used AI in other ways in Beauty Box, but this is a much more sophisticated version. We’ve put a lot of work into it and we’re pretty excited about it.

There will also be a couple other announcements/release around April, so stayed tune for that. If you haven’t chatted with us lately, come by the booth and say hello. You can learn about all the new stuff. Including Data Storyteller which we released a few months ago and won a Best of NAB 2023 from Videomaker!

Tom Cruise and The Deepfake End of The World

Any time I hear people freaking out about A.I., for good or bad, I’m skeptical. So much that happens in the world of A.I. is hype that the technology may never live up to, much less live up to it now, that you have to take it with a grain of salt.

So it goes with the great Tom Cruise deepfake.

It took a good VFX artist two months to turn footage of a Tom Cruise impersonator into the videos that people are freaking out about. The Verge has a good article on it.

This is an awesome demo reel for the VFX artist involved. It’s very well done.

But it doesn’t make an awesome case for the technology disrupting the world as we know it. It needed raw footage of someone that looked and acted like Cruise. It then took months to clean up the results of the A.I. modifying the Cruise look-a-like.

(It does make an interesting case for the tech to be used in VFX, but that’s something different)

This isn’t a ‘one-click’ or even a ‘whole-bunch-of-clicks’ technology. It’s a ‘shit-ton-of-work’ technology and given they had the footage of the Cruise look-a-like you can make an argument that it could’ve been done in less time with traditional rotoscoping and compositing.

Anyways, the fear and consternation has gotten ahead of itself. We’ve had the ability to put people in photos they weren’t in for a long time. We’ve figured out how to deal with it. (not to mention, it’s STILL very difficult to cut someone out of one picture and put them in another one without it being detectable… and Photoshop is, what, 30 years old now?)

It’s good to consider the implications but we’re a long, long way from anyone being able to do this to any video.

Why do we have the lowest transcription costs?

We occasionally get questions from customers asking why we charge .04/min ($2.40/hr) for transcription (if you pre-pay), when some competitors charge .25/min or even .50/min. Is it lower accuracy? Are you selling our data?

No and no. Ok, but why?

Transcriptive and PowerSearch work best when all your media has transcripts attached to it. Our goal is to make Transcriptive as useful as possible. We hope the less you have to think about the cost of the transcripts, the more media you’ll transcribe… resulting in making Transcriptive and PowerSearch that much more powerful.

The Transcriptive-AI service is equal to, or better, than what other services are using. We’re not tied to one A.I. and we’re constantly evaluating the different A.I. services. We use whatever we think is currently state-of-the-art.  Since we do such a high volume we get good pricing from all the services, so it doesn’t really matter which one we use.

Do we make a ton of money on transcribing? No.

The services that charge .25/min (or whatever) are probably making a fair amount of money on transcribing. We’re all paying about .02/min or less. Give or take, that’s the wholesale/volume price.

If you’re getting your transcripts for free… those transcripts are probably being used for training, especially if the service is keeping track of the edits you make (e.g. YouTube, Otter, etc.). Transcriptive is not sending your edits back to the A.I. service. That’s the important bit if you’re going to train the A.I. Without the corrected version, the A.I. doesn’t know what it got wrong and can’t learn from it.

So, for us, it all comes down to making Transcriptive.com, the Transcriptive Premiere Pro panel, and PowerSearch as useful as possible. To do so, we want the most accurate transcripts and we want them to be as low cost as possible. We know y’all have a LOT of footage. We’d rather reduce the barriers to you transcribing all of it.

So… if you’re wondering how we can justify charging .04/min for transcripts, that’s the reason. It enables all the other cool features of Transcriptive and PowerSearch. Hopefully that’s a win for everyone.

A.I. Speech-to-Text: How to make sure your data isn’t being used for training

We get a fair number of questions from Transcriptive users that are concerned the A.I. is going to use their data for training.

First off, in the Transcriptive preferences, if you select ‘Delete transcription jobs from server’ your data is deleted immediately. This will delete everything from the A.I. service’s servers and from the Digital Anarchy servers. So that’s an easy way of making sure your data isn’t kept around and used for anything.

However, generally speaking, the A.I. services don’t get more accurate with user submitted data. Partially because they aren’t getting the ‘positive’ or corrected transcript.

When you edit your transcript we aren’t sending the corrections back to the A.I. (some services are doing this… e.g. if you correct YouTube’s captions, you’re training their A.I.)

So the audio by itself isn’t that useful. What the A.I. needs to learn is the audio file, the original transcript AND the corrected transcript. So even if you don’t have the preference checked, it’s unlikely your audio file will be used for training.

This is great if you’re concerned about security BUT it’s less great if you really WANT the A.I. to learn. For example, I don’t know how many videos I’ve submitted over the last 3 years saying ‘Digital Anarchy’. And still to this day I get: Dugal Accusatorial (seriously), Digital Ariki, and other weird stuff. A.I. is great when it works, but sometimes… it definitely does not work. And people want to put this into self-driving cars? Crazy talk right there.

 If you want to help the A.I. out, you can use the Speech-to-Text Glossary (click the link for a tutorial). This still won’t train the A.I., but if the A.I. is uncertain about a word, it’ll help it select the right one.

How does the glossary work? The A.I. analyzes a word sound and then comes up with possible words for that sound. Each word gets a ‘confidence score’. The one with the highest score is the one you see in your transcript. In the case above, ‘Ariki’ might have had a confidence of .6 (out 0 to 1, so .6 is pretty low) and ‘Anarchy’ might have been .53. So my transcript showed Ariki. But if I’d put Anarchy into the Glossary, then the A.I. would have seen the low confidence score for Ariki and checked if the alternatives matched any glossary terms.

So the Glossary can be very useful with proper names and the like.

But, as mentioned, nothing you do in Transcriptive is training the A.I. The only thing we’re doing with your data is storing it and we’re not even doing that if you tell us not to.

It’s possible that we will add the option in the future to submit training data to help train the A.I. But that’ll be a specific feature and you’ll need to intentionally upload that data.

Artificial Intelligence vs. Video Editors

With Transcriptive, our new tool for doing automated transcriptions, we’ve dove into the world of A.I. headfirst. So I’m pretty familiar with where the state of industry is right now. We’ve been neck deep in it for the last year.

A.I. is definitely changing how editors get transcripts and search video for content. Transcriptive demonstrates that pretty clearly with text.  Searching via object recognition is something that also is already happening. But what about actual video editing?

One of the problems A.I. has is finishing. Going the last 10% if you will. For example, speech-to-text engines, at best, have an accuracy rate of about 95% or so. This is about on par with the average human transcriptionist. For general purpose recordings, human transcriptionists SHOULD be worried.

But for video editing, there are some differences, which are good news. First, and most importantly, errors tend to be cumulative. So if a computer is going to edit a video, at the very least, it needs to do the transcription and it needs to recognize the imagery. (we’ll ignore other considerations like style, emotion, story for the moment) Speech recognition is at best 95%, object recognition is worse. The more layers of AI you have, usually those errors will multiply (in some cases there might be improvement though) . While it’s possible automation will be able to produce a decent rough cut, these errors make it difficult to see automation replacing most of the types of videos that pro editors are typically employed for.

Secondly, if the videos are being done for humans, frequently the humans don’t know what they want. Or at least they’re not going to be able to communicate it in such a way that a computer will understand and be able to make changes. If you’ve used Alexa or Echo, you can see how well A.I. understands humans. Lots of situations, especially literal ones (find me the best restaurant), it works fine, lots of other situations, not so much.

Many times as an editor, the direction you get from clients is subtle or you have to read between the lines and figure out what they want. It’s going to be difficult to get A.I.s to take the way humans usually describe what they want, figure out what they actually want and make those changes.

Third… then you get into the whole issue of emotion and storytelling, which I don’t think A.I. will do well anytime soon. The Economist recently had an amusing article where it let an A.I. write the article. The result is here. Very good at mimicking the style of the Economist but when it comes to putting together a coherent narrative… ouch.

It’s Not All Good News

There are already phone apps that do basic automatic editing. These are more for consumers that want something quick and dirty. For most of the type of stuff professional editors get paid for, it’s unlikely what I’ve seen from the apps will replace humans any time soon. Although, I can see how the tech could be used to create rough cuts and the like.

Also, for some types of videos, wedding or music videos perhaps, you can make a pretty solid case that A.I. will be able to put something together soon that looks reasonably professional.

You need training material for neural networks to learn how to edit videos. Thanks to YouTube, Vimeo and the like, there is an abundance of training material. Do a search for ‘wedding video’ on YouTube. You get 52,000,000 results. 2.3 million people get married in the US every year. Most of the videos from those weddings are online. I don’t think finding a few hundred thousand of those that were done by a professional will be difficult. It’s probably trivial actually.

Same with music videos. There IS enough training material for the A.I.s to learn how to do generic editing for many types of videos.

For people that want to pay $49.95 to get their wedding video edited, that option will be there. Probably within a couple years. Have your guests shoot video, upload it and you’re off and running. You’ll get what you pay for, but for some people it’ll be acceptable. Remember, A.I. is very good at mimicking. So the end result will be a very cookie cutter wedding video. However, since many wedding videos are pretty cookie cutter anyways… at the low end of the market, an A.I. edited video may be all ‘Bridezilla on A Budget’ needs. And besides, who watches these things anyways?

Let The A.I Do The Grunt Work, Not The Editing

The losers in the short term may be assistant editors. Many of the tasks A.I. is good for… transcribing, searching for footage, etc.. is now typically given to assistants. However, it may simply change the types of tasks assistant editors are given. There’s a LOT of metadata that needs to be entered and wrangled.

While A.I. is already showing up in many aspects of video production, it feels like having it actually do the editing is quite a ways off.  I can see creating A.I. tools that help with editing: Rough cut creation, recommending color corrections or B roll selection, suggesting changes to timing, etc. But there’ll still need to be a person doing the edit.