OpenAI says it ran a small-scale test of its new voice cloning product Voice Engine with a few select partners. The results show promising applications for the tech, but safety concerns may keep it from being released.
OpenAI says that Voice Engine can clone a person’s voice from a single 15-second recording. The tool can then generate “natural-sounding speech that closely resembles the original speaker.”
Once cloned, Voice Engine can turn text inputs into audible speech using “emotive and realistic voices.” The tool’s capability makes exciting applications possible but raises serious safety issues too.
Promising use cases
OpenAI started testing Voice Engine late last year to see how a small group of select participants could use the tech.
Examples of how test partners used Voice Engine include:
- Adaptive teaching – Age of Learning used Voice Engine to provide reading assistance to children, create voice-over content for learning material, and provide personalized verbal responses to interact with students.
- Translating content – HeyGen used Voice Engine for video translation so product marketing and sales demos could reach a wider market. The translated audio retains the person’s native accent. So, when a native French speaker’s audio is translated into English you’d still hear their French accent.
- Providing wider social services – Dimagi trains health workers in remote settings. It used Voice Engine to give training and interactive feedback to health workers in underserved languages.
- Supporting non-verbal people – Livox enables non-verbal people to communicate using alternative communication devices. Voice Engine allows these people to choose a voice that best represents them rather than something that sounds more robotic.
- Helping patients recover their voice – Lifespan piloted a program offering Voice Engine to people with speech impairments due to cancer or neurologic conditions.
Voice Engine isn’t the first AI voice cloning tool, but the samples in OpenAI’s blog post suggest it represents the state of the art and may even be better than ElevenLabs’ offering.
Here’s just one example of the natural inflection and emotive characteristics it can generate.
OpenAI just launched Voice Engine,
It uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
Reference and Generated audio is very close and hard to differentiate.
More details in pic.twitter.com/tJRrCO2WZP (AshutoshShrivastava, @ai_for_success, March 29, 2024)
Safety concerns
OpenAI said it was impressed with the use cases test participants came up with, but that more safety measures would need to be in place before the company decided on “whether and how to deploy this technology at scale.”
OpenAI says technology that can accurately reproduce someone’s voice “has serious risks, which are especially top of mind in an election year.” Fake Biden robocalls and the fake video of Senate candidate Kari Lake are cases in point.
In addition to the clear restrictions in its general usage policies, the participants in the trial had to have “explicit and informed consent from the original speaker” and were not allowed to build a product that enabled people to create their own voices.
OpenAI says it implemented other safety measures, including an audio watermark and “proactive monitoring” of how Voice Engine is used, but it didn’t explain exactly how either works.
What’s next?
Will the rest of us get to play around with Voice Engine? It’s unlikely, and maybe that’s a good thing. The potential for malicious use is huge.
OpenAI is already recommending that institutions like banks phase out voice authentication as a security measure.
Voice Engine has an embedded audio watermark, but OpenAI says more work is needed to identify when audiovisual content is AI-generated.
Even if OpenAI decides not to release Voice Engine, others will. The days of being able to trust your eyes and ears are history.
The post OpenAI says Voice Engine might be too risky to release appeared first on DailyAI.