On November 3, 2016, at its Adobe MAX conference, Adobe announced that it had created an application that can reproduce human speech. The project, code-named VoCo, is what Adobe is calling "Photoshop for audio." VoCo allows a user to manipulate a piece of audio simply by editing lines of text. Words can be rearranged to change sentences or inserted where they were never spoken.
VoCo works by sampling a large amount of voice data, about 20 minutes of recorded speech. It then breaks that data down into phonemes, the distinct sounds that make up a particular spoken language. Next, it builds a voice model of the speaker from cadence, stresses, quirks and other artifacts. Finally, to reproduce speech, VoCo either finds the word in the 20-minute sample or builds it from phonemes in the raw data.
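The word-or-phoneme choice described above can be pictured with a small toy sketch. Everything in it is an assumption made for illustration: the names `RECORDED_WORDS`, `LEXICON`, `plan_word` and `plan_sentence` are hypothetical, the phoneme spellings are invented, and plain strings stand in for audio clips. It is not Adobe's code or API, and it leaves out the voice model entirely.

```python
# Toy illustration only: all names and data here are hypothetical,
# not Adobe's implementation.

# Words that appear in the recorded sample (stand-ins for audio clips).
RECORDED_WORDS = {"hello", "and", "welcome", "to", "the", "show"}

# Made-up mini lexicon: phoneme spellings for words we may have to build.
LEXICON = {
    "podcast": ["P", "AA", "D", "K", "AE", "S", "T"],
    "show": ["SH", "OW"],
}


def plan_word(word):
    """Return the list of segments used to produce one word."""
    key = word.lower()
    if key in RECORDED_WORDS:
        # Preferred path: reuse the word as the speaker actually said it.
        return [("word", key)]
    if key in LEXICON:
        # Fallback path: assemble the word from individual phonemes.
        return [("phoneme", p) for p in LEXICON[key]]
    raise KeyError(f"no recording or phoneme spelling for {word!r}")


def plan_sentence(text):
    """Plan every word of an edited line of text, in order."""
    return [plan_word(w) for w in text.split()]


plan = plan_sentence("Welcome to the podcast")
```

Here "welcome", "to" and "the" are each planned as a single recorded word, while "podcast", which is absent from the toy sample, is planned as a sequence of phonemes.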
While users can make edits entirely from the raw data, it is suggested that copying and pasting existing words sounds more realistic. Right now, VoCo is only a prototype, and Adobe is considering "watermarking and detection" to prevent fraudulent use. Adobe intends VoCo to be used for editing podcasts, commercial voice-overs, and audiobook recordings.