Posted on

Tacotron 2 or Human; Can You Tell the Difference?

When Apple introduced the new Siri back in June of 2017, I was pretty impressed. In fact, I was so impressed that I wrote a blog post on the subject and noted that voice actors had better pay attention. (Have You Heard The New Siri? As A Voice Actor, You Better Listen!)

Text-to-speech technology is something I’ve been keeping an eye on because I believe it could one day have a profound impact on my income as a voice actor. I’ve already seen it happen in certain areas of my business. In fact, just the other day I had a conversation with an eLearning company that said they prefer computer-generated voices over voice actors because they’re more convenient.

Are you familiar with Moore’s Law? Around 1970 it became a common term for describing the growth of computer processing power. Based on an observation by Intel co-founder Gordon Moore, it states that the number of transistors on a computer chip, and with it processing power, will double roughly every two years. For the most part, this has held true.

If it holds for computer processing power, one can’t help but wonder whether it could also apply to the advancement of related technologies, such as A.I.


Google released some audio samples recently that are ear-opening.

The engineers at Google have been working very hard on a new text-to-speech system currently called “Tacotron 2.”

Here’s what they have to say about how it compares to a human voice: “Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech.”

Mean Opinion Score is a fancy term from telecommunications for how true to life something sounds: listeners rate each sample on a scale from 1 (bad) to 5 (excellent), and the MOS is the average of those ratings. Based on the results Google is sharing, Tacotron 2 sounds darn near as real as it gets right now.

In addition to sounding realistic, it can also pick up on context! For example, it can tell the difference between the noun “present” and the verb “present.”

Could text-to-speech replace voice actors in the near future? Listen to this!

The Future is Coming Fast

If you thought the new Siri sounded great back in June, the samples of Tacotron 2 are going to blow your mind and make you nervous all at the same time.

In my opinion, this technology isn’t all that far from becoming legitimate competition in a field like eLearning.

I don’t know about you, but that’s a concerning prospect for me, since eLearning represents a significant chunk of my voice over income.

Hear the samples for yourself here – Tacotron 2 Audio Samples

What do you think?


Have You Heard The New Siri? As A Voice Actor, You Better Listen!

Yesterday, during the WWDC Keynote, Apple unveiled the new Siri.

Did you hear it?

Last year, while moderating a panel discussion at a voice over conference, J Michael Collins asked each of the panelists for their thoughts on the state of voice over in the next 5-10 years. I said that I was keeping an eye on text-to-speech and the technology powering virtual assistants like Siri or Cortana.

A couple of years ago, I was consistently making money voicing real estate virtual tours. Then one day, one of my regular clients stopped calling. They had started using a computer-generated voice. It sounded absolutely awful… but to their point… it was free!

I can’t compete with free.

The Rise of Text-to-Speech

My point on that panel was that as text-to-speech technology improves, I fully expect more and more people to start using it as an inexpensive alternative to hiring a professional voice actor.

Will it ever replace the human voice entirely? Hopefully not while I’m still working.

Is it something we need to keep an eye on as an industry in the coming years? Absolutely!

Yesterday, when Apple played the new Siri, that point hit home even harder.

The new and improved Siri, which will be part of the upcoming OS release this fall, can speak with basic expressiveness. Thanks to advances in machine learning, Siri can now convey a level of emotion.

That’s scary stuff for voice actors!

As Text-to-Speech technology improves, voice actors need to pay attention!

To me, that says the advancement in this technology is inevitable.

Is it going to start popping up in national commercial campaigns? Doubtful.

Could this technology be deployed in a genre like eLearning? I think it could!

As I said, we’re not there yet. But it may be closer than we think… or closer than we want to think about!

Check out this clip to hear the new Siri and share your thoughts in the comments below.

The New Siri Voice WWDC 2017