{"id":628310,"date":"2023-04-12T09:50:12","date_gmt":"2023-04-12T14:50:12","guid":{"rendered":"https:\/\/news.sellorbuyhomefast.com\/index.php\/2023\/04\/12\/ai-can-clone-your-favorite-podcast-hosts-voice\/"},"modified":"2023-04-12T09:50:12","modified_gmt":"2023-04-12T14:50:12","slug":"ai-can-clone-your-favorite-podcast-hosts-voice","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2023\/04\/12\/ai-can-clone-your-favorite-podcast-hosts-voice\/","title":{"rendered":"AI Can Clone Your Favorite Podcast Host\u2019s Voice"},"content":{"rendered":"<div data-testid=\"ArticlePageChunks\">\n<div data-journey-hook=\"client-content\" data-testid=\"BodyWrapper\">\n<p><span>One day this<\/span> year, you\u2019ll start listening to a podcast and realize something is a little off. The host, whose voice is familiar to you, will sound different. Sentences may be stilted or some words will have an odd tone. And so you\u2019ll ask, <em>Is this actually the host talking or their AI voice clone?<\/em><\/p>\n<p>Just as artificial intelligence has proven adept at generating lifelike images, effective videos, and cogent text, similar technologies can convincingly mimic the voices of podcast hosts, content creators, and other media professionals. A new set of tools from a growing list of startups is expected to hasten AI\u2019s conquest of our audio feeds.<\/p>\n<p>Our ears are already familiar with computer-generated speech. Artificial voices are\u00a0<a href=\"https:\/\/www.wired.com\/story\/spotify-ai-dj\/\">playing DJ<\/a> and answering your\u00a0<a href=\"https:\/\/www.theverge.com\/2023\/2\/22\/23609915\/samsung-ai-voice-clone-bixby-text-call-service-korean\">phone calls<\/a>. 
Technologists have cloned the voices of <a href=\"https:\/\/www.wired.com\/story\/these-hidden-deepfakes-anthony-bourdain-movie\/\">celebrities<\/a> <a href=\"https:\/\/www.wired.com\/story\/simpsons-voice-actors-ai-deepfakes\/\">alive<\/a> and <a href=\"https:\/\/www.wired.com\/story\/andy-warhol-diaries-artificial-intelligence-voice\/\">dead<\/a> and reconstructed the voices of those who have\u00a0<a data-offer-url=\"https:\/\/www.rogerebert.com\/roger-ebert\/finding-my-own-voice\" href=\"https:\/\/www.rogerebert.com\/roger-ebert\/finding-my-own-voice\" rel=\"nofollow noopener\" target=\"_blank\">lost their ability to speak<\/a> due to illness. Someday soon, AI-powered speech tools will be able to bring back the voices of our\u00a0<a data-offer-url=\"https:\/\/twitter.com\/chheplo\/status\/1644768774430793728?s=46&#038;t=ZqraHSbDFWmhlxFrLXJiNw\" href=\"https:\/\/twitter.com\/chheplo\/status\/1644768774430793728?s=46&#038;t=ZqraHSbDFWmhlxFrLXJiNw\" rel=\"nofollow noopener\" target=\"_blank\">dead relatives<\/a>.<\/p>\n<p>When it comes to producing podcasts, machines have proven\u00a0<a href=\"https:\/\/www.theverge.com\/22672123\/ai-voice-clone-synthesis-deepfake-applications-vergecast\">able to lend a hand<\/a> in the editing room. Editing services like\u00a0<a data-offer-url=\"https:\/\/www.descript.com\/\" href=\"https:\/\/www.descript.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Descript<\/a> offer machine learning features that clean up an audio recording of human speech by removing awkward pauses and filler words such as \u201cum\u201d and \u201clike.\u201d<\/p>\n<p>Lately, even more options are emerging to take care of the really messy part of making a podcast: the talking. Descript offers a feature called Overdub, which creates a virtual voice that can be used in production editing. 
If a host mispronounces somebody\u2019s name or gets a date wrong, a producer can task the robot with saying it correctly, then paste in the correction.<\/p>\n<p>Newer tools go even further. In January, Podcastle, a startup that offers a suite of podcasting software, released an AI-powered voice cloning tool called Revoice that can create a digital simulacrum of a human host. The company is positioning Revoice as a way for producers to create any aspect of an audio production\u2014from ad reads to voiceovers to\u00a0<a href=\"https:\/\/www.wired.com\/story\/apple-spotify-audiobook-narrators-ai-contract\/\">audiobooks<\/a>\u2014just by typing in the words they want the virtual version of the host to say.<\/p>\n<p>Creating a digital copy of your voice takes a bit of work. While some AI services can emulate voices by studying audio clips of the person talking, Podcastle requires users to read off a script of around 70 phrases, selected to capture a variety of mouth movements and phonemes. The process takes 30 to 45 minutes, depending on how particular you are about getting the intonations right.<\/p>\n<p>\u201cThe idea was always that it should be very close to your original voice,\u201d Podcastle CEO Artavazd Yeritsyan says of the resulting voice clone. \u201cNot a beautification or making your voice even better than it is, but very accurate in how you pronounce the words.\u201d<\/p>\n<p>It\u2019s a lofty goal, but voice AI doesn\u2019t always sound quite as melodious as an actual human voice would. 
The tone (at least in my experimentation) comes across as monotonous and robotic, with weird stutters and synthetic artifacts throughout.<\/p>\n<\/div>\n<div data-journey-hook=\"client-content\" data-testid=\"BodyWrapper\">\n<p>I&#8217;ll show you an example, starting with my actual speaking voice.<\/p>\n<figure data-testid=\"IframeEmbed\">\n<div data-testid=\"IframeEmbedContainer\">\n<p>Here\u2019s a clip of audio from a recent episode of WIRED\u2019s <a href=\"https:\/\/www.wired.com\/tag\/gadget-lab-podcast\/\" target=\"_blank\" rel=\"noopener\"><em>Gadget Lab<\/em><\/a> podcast, where I went on the show to complain about <a href=\"https:\/\/www.wired.com\/story\/gadget-lab-podcast-584\/\" target=\"_blank\" rel=\"noopener\">phones being too good<\/a>. (Credit: WIRED)<\/p>\n<\/div>\n<\/figure>\n<p>Next, my simulation.<\/p>\n<figure data-testid=\"IframeEmbed\">\n<div data-testid=\"IframeEmbedContainer\">\n<p>This second clip was made in Revoice. I transcribed the same words I spoke on the show and put them through the AI voice clone software. (Credit: Podcastle)<\/p>\n<\/div>\n<\/figure>\n<p>Those imperfections in rhythm and inflection are inevitable, says Vijay Balasubramaniyan. He\u2019s CEO of the company\u00a0<a data-offer-url=\"https:\/\/www.pindrop.com\/\" href=\"https:\/\/www.pindrop.com\/\" rel=\"nofollow noopener\" target=\"_blank\">Pindrop<\/a>, which analyzes voices in audio and phone calls to prevent fraud. \u201cYour voice is something that\u2019s developed over 10,000 years of evolution,\u201d he says. 
\u201cSo you\u2019ve developed certain things that are very hard for machines to replicate.\u201d<\/p>\n<p>Audio AI may feel only slightly more realistic than\u00a0<a href=\"https:\/\/www.wired.com\/story\/text-to-video-ai-generators-filmmaking-hollywood\/\">AI video<\/a> at the moment, but the results from the current set of tools are good enough to make security experts nervous. There are very good reasons you\u2019d want to\u00a0<a href=\"https:\/\/www.wired.com\/story\/voice-recognition-privacy-speech-changer\/\">hide your voice<\/a> for the sake of security and privacy: your voice can be used to authenticate your identity, and machines can determine identifying factors like your age, ethnicity, gender, and economic status just by listening to you speak.<\/p>\n<\/div>\n<div data-journey-hook=\"client-content\" data-testid=\"BodyWrapper\">\n<p>Balasubramaniyan says voice AI services need to offer security on par with that of other companies that store personal data, like financial or medical information.<\/p>\n<p>\u201cYou have to ask the company, \u2018how is my AI voice going to be stored? Are you actually storing my recordings? Are you storing it encrypted? Who has access to it?\u2019\u201d Balasubramaniyan says. \u201cIt is a part of me. It is my intimate self. I need to protect it just as well.\u201d<\/p>\n<p>Podcastle says the voice models are end-to-end encrypted and that the company doesn\u2019t keep any recordings after creating the model. Only the account holder who recorded the voice clips can access them. Podcastle also doesn\u2019t allow other audio to be uploaded or analyzed on Revoice. In fact, the person creating a copy of their voice has to record the lines of prewritten text directly into Revoice\u2019s app. They can\u2019t just upload a prerecorded file.<\/p>\n<p>\u201cYou are the one giving permission and creating the content,\u201d Podcastle\u2019s Yeritsyan says. 
\u201cWhether it\u2019s artificial or original, if this is not a deepfaked voice, it\u2019s this person\u2019s voice and he put it out there. I don\u2019t see issues.\u201d<\/p>\n<p>Podcastle hopes that restricting Revoice to a consenting person\u2019s own cloned voice will discourage people from making themselves say anything too horrible. Currently, the service doesn\u2019t have any content moderation or restrictions on specific words or phrases. Yeritsyan says it is up to whatever service or outlet publishes the audio\u2014like Spotify, Apple Podcasts, or YouTube\u2014to police the content that gets pushed onto their platforms.<\/p>\n<p>\u201cThere are huge moderation teams on any social platforms or any streaming platform,\u201d Yeritsyan says. \u201cSo that\u2019s their job to not let anyone else use the fake voice and create something stupid or something not ethical and publish it there.\u201d<\/p>\n<p>Even if the very thorny issue of voice deepfakes and nonconsensual AI clones is addressed, it\u2019s still unclear whether people will accept a computerized clone as a stand-in for a human.<\/p>\n<p>At the end of March, the comedian Drew Carey used another voice AI service, <a data-offer-url=\"https:\/\/beta.elevenlabs.io\/\" href=\"https:\/\/beta.elevenlabs.io\/\" rel=\"nofollow noopener\" target=\"_blank\">ElevenLabs<\/a>, to release a whole episode of a radio show that was read by his voice clone. For the most part, people\u00a0<a href=\"https:\/\/www.engadget.com\/drew-carey-made-a-radio-show-with-ai-fans-werent-pleased-143014038.html\">hated it<\/a>. Podcasting is an intimate medium, and the distinct human connection you feel when listening to people have a conversation or tell stories is easily lost when the robots step to the microphone.<\/p>\n<p>But what happens when the technology advances to the point that you can\u2019t tell the difference? 
Does it matter that it\u2019s not really your favorite podcaster in your ear? Cloned AI speech has a ways to go before it\u2019s indistinguishable from human speech, but it\u2019s surely catching up quickly. Just a year ago, AI-generated images looked cartoonish, and now they\u2019re realistic enough to fool millions into thinking the Pope had some\u00a0<a href=\"https:\/\/www.wired.com\/story\/pope-coat-artificial-intelligence-internet-trust\/\">kick-ass new outerwear<\/a>. It\u2019s easy to imagine AI-generated audio will have a similar trajectory.<\/p>\n<p>There\u2019s also another very human trait driving interest in these AI-powered tools: laziness. AI voice tech\u2014assuming it gets to the point where it can accurately mimic real voices\u2014will make it easy to do quick edits or retakes without having to get the host back into a studio.<\/p>\n<p>\u201cUltimately, the creator economy is going to win,\u201d Balasubramaniyan says. \u201cNo matter how much we think about the ethical implications, it\u2019s going to win out because you\u2019ve just made people\u2019s lives simple.\u201d<\/p>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/www.wired.com\/story\/ai-podcasts-podcastle-revoice-descript\/\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><br \/>\n Boone Ashworth<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One day this year, you\u2019ll start listening to a podcast and realize something is a little off. The host, whose voice is familiar to you, will sound different. Sentences may be stilted or some words will have an odd tone. 
And so you\u2019ll ask, Is this actually the host talking or their AI voice clone?<\/p>\n","protected":false},"author":1,"featured_media":628311,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28832,528,46],"tags":[],"class_list":{"0":"post-628310","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-clone","8":"category-favorite","9":"category-technology"},"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/628310","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=628310"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/628310\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/628311"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=628310"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=628310"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=628310"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}