
Restoring Voices—and Identity—with Neuroengineering
Decoding facial and muscle signals to restore authentic speech
Lee Miller vividly recalls the day in 2021 when he met a woman who had lost the function of her vocal cords. In hoarse, whispering tones she explained how her voice had been instrumental to her vocation. Losing it, she said, undercut her life’s purpose. He had to listen carefully to hear her faint words, but the lesson “was really powerful.”
"Our voice is so important to our sense of identify and empowerment,” said Miller, who is a professor of neurobiology, physiology, and behavior in the College of Biological Sciences, a professor of otolaryngology and head and neck surgery at the School of Medicine and Technical Director at the Center for Mind and Brain.
Now, Miller is working to restore original voices to those who have lost them, partly by adapting technology for interpreting gestures and controlling robotic limbs.
Every year, nearly one million people worldwide are diagnosed with head and neck cancer. Many of them lose the ability to speak intelligibly after surgical removal of, or radiation damage to, the larynx, mouth, and tongue. These people can learn to speak again using devices that emit artificial sounds, which they can shape into words. But their new voices are often weak, mechanical, or distressingly unfamiliar.
Miller and his collaborators are developing a system that could one day restore a person’s unique, original voice.
Decoding the mind
Miller has spent 25 years studying how biological signals, such as the sound of a voice, travel from person to person and from one brain area to another.
Simply perceiving another person’s speech in a crowded room is surprisingly challenging.
“The brain has to focus on the needle in the haystack,” he said. It must screen out a mountain of irrelevant signals, such as background noise, music, echoes, and other people’s voices.
If engineers can learn to isolate that tiny relevant signal from all of the noise, as the brain does, they could accomplish some amazing things, said Miller.
He is working on a project to record electromyographic (EMG) signals from a person’s skin, generated by the muscle contractions of movements such as reaching out or clenching a fist, and to decode them into digital instructions for controlling a robotic arm. Doing this might one day allow astronauts to repair equipment on the outside of a space station without undertaking a potentially risky spacewalk.
Miller has worked with the company Meta to use EMG signals to recognize and interpret a person’s gestures, so they can interact with computers using natural body language — rather than a clunky mouse and keyboard.

The difficulty is that EMG signals often vary from person to person, depending on age, skin characteristics, body weight, and other factors. These biological signals also produce mountains of data every second, which computers must process quickly.
“We have only a limited amount of time,” said Miller, “perhaps only 50 milliseconds, before the computer causes a delay, which would make real-time interpretations impossible.”
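To make that time budget concrete, here is a back-of-the-envelope sketch in Python. The article gives only the 50-millisecond target, so the channel count and sampling rate below are hypothetical assumptions, but they show how quickly raw EMG values pile up against a real-time deadline.

```python
# Back-of-the-envelope arithmetic for the real-time budget. The article
# gives only the 50 ms target; the channel count and sampling rate are
# hypothetical but typical for surface-EMG arrays.
n_channels = 16          # hypothetical electrode count
sample_rate_hz = 2_000   # hypothetical sampling rate per channel
budget_s = 0.050         # the 50 ms window Miller describes

raw_values_per_second = n_channels * sample_rate_hz
raw_values_per_decision = int(raw_values_per_second * budget_s)

print(raw_values_per_second)    # 32000 raw values arriving every second
print(raw_values_per_decision)  # 1600 values to digest per 50 ms decision
```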
Miller and his Ph.D. student, Harsha Gowda (in the Electrical and Computer Engineering Graduate Group), solved this by using only tiny bits of the incoming signals while ignoring everything else. Rather than tracing the chaotic ups and downs of each EMG electrode on a person’s arm, Gowda employed a strategy that simply measures signal relationships between various pairs of electrodes.
These simplified signal representations turn out to be “very well-behaved,” said Miller. “You can actually see the different gestures by just glancing at the data.” And unlike the noisy signals from individual electrodes, they don’t vary from one person to another. “So now we have a gesture decoder that works for everybody,” he said, regardless of differing body types, skin types, and ages.
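The team’s gesture-decoding paper listed under Media Resources frames these pairwise relationships as channel covariance matrices analyzed on a Riemannian manifold. The minimal Python sketch below shows the core of that idea, summarizing a window of multi-channel EMG by how its electrodes co-vary rather than by their raw waveforms; the electrode count, window length, and sampling rate are illustrative assumptions, not values from the article.

```python
import numpy as np

def pairwise_emg_features(window: np.ndarray) -> np.ndarray:
    """Summarize one window of multi-channel EMG by the relationships
    between electrode pairs instead of by the raw waveforms.

    window: (n_channels, n_samples) array of band-passed EMG.
    Returns the upper triangle of the channel covariance matrix,
    a compact feature vector for gesture classification.
    """
    centered = window - window.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / window.shape[1]  # (n_channels, n_channels)
    rows, cols = np.triu_indices(cov.shape[0])
    return cov[rows, cols]

# Hypothetical setup: 16 electrodes, one 50 ms window at 2 kHz sampling.
rng = np.random.default_rng(0)
window = rng.standard_normal((16, 100))            # stand-in EMG window
features = pairwise_emg_features(window)
print(features.shape)                              # (136,) pairwise terms
```

Because covariance captures how channels move together rather than their absolute amplitudes, a decoder built on these features is less sensitive to the per-person differences in skin, tissue, and electrode contact that distort individual channels, consistent with the article’s claim that the same decoder works across body types.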
Restoring voice
Miller became interested in applying these lessons to speech during a 2021 visit with Peter Belafsky, a professor of otolaryngology at the School of Medicine and director of the UC Davis Center for Voice and Swallowing. It was at Belafsky’s clinic that he met the woman whose voice had been instrumental to her vocation, along with others who had lost their voices. Hearing their stories “was profoundly motivating,” said Miller.
Miller embarked on the Silent Speech project in 2022, collaborating with Belafsky as well as Sergey Stavisky, an assistant professor of neurological surgery, and David Brandman, a professor of neurological surgery, both at the School of Medicine.
Miller and Gowda began the project by working with healthy volunteers, using EMG electrodes to record the movements of their mouth and face muscles during speech. They then paired these simplified EMG signals with simultaneously recorded audio to train a computer to match each person’s EMG patterns to their speech sounds. The result is tailored, computer-generated speech rendered in the unique tones of the person’s own voice.
“We don’t need that much data to clone the person’s voice,” said Miller. In his experience, it requires only about five minutes of speech combined with that person’s EMG signals.
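The article does not say what model maps EMG patterns to speech sounds, so the sketch below stands in with a deliberately simple choice: a multi-output ridge regression from windowed EMG features to acoustic frames. The shapes assume 50-millisecond windows, under which five minutes of paired data yields roughly 6,000 training examples; the final voice-cloning step, a vocoder trained on the person’s own recordings, is only noted in a comment.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical shapes: 50 ms windows mean 5 minutes of paired data is
# about 6,000 examples. The 136 EMG features per window match the
# pairwise-covariance sketch above; 80 mel bins is a common choice.
n_windows, n_emg_feats, n_mel = 6_000, 136, 80
rng = np.random.default_rng(0)
X = rng.standard_normal((n_windows, n_emg_feats))  # stand-in EMG features
Y = rng.standard_normal((n_windows, n_mel))        # stand-in speech frames

# Learn the per-person mapping from EMG patterns to speech sounds.
model = Ridge(alpha=1.0).fit(X, Y)

# At run time, EMG from silent articulation is mapped to predicted
# frames; a vocoder trained on the person's voice sample would then
# render those frames as audio in their own voice (not shown here).
pred_frames = model.predict(X[:10])
print(pred_frames.shape)  # (10, 80)
```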
With the support of a UC Davis STAIR (Science Translation and Innovative Research) grant, Miller and his colleagues have now embarked on the next step: using this system to restore the voices of people who no longer have functional larynxes. For those individuals, recording a natural voice is no longer possible, so the team stitches together meaningful samples from other sources, such as family videos. One patient had recorded an audio diary to capture a record of his voice in the weeks before his larynx was surgically removed.
“It was a very personal choice that he made, preserving a memento of his voice that he knew he was about to lose,” said Miller. “It was very special that he shared those recordings with us.” They turned out to be a perfect trove of raw material for digitally recreating his voice. Miller’s team is now pairing those recordings with EMG and video of the man’s face, which they recorded as he spoke the same words silently, without his larynx. [Audio example: restored “silent speech” from an individual who no longer has a voice due to laryngectomy.]
Miller envisions that this system might one day run on a smartphone. The person would move their mouth to speak silently into their phone, as though on a video call. The phone would simultaneously record EMG signals and video of their face, combining these with a sample of the person’s voice to create natural-sounding speech.
Engineering this system so that it works outside of the laboratory for a wide array of people could take several years, said Miller. Even so, he said, “Ultimately, we want this to work easily for anybody.”
Seed funding for this project was generously provided by the Mike and Renee Child Family Fund for the Center for Mind and Brain in the fall of 2022. Subsequent funding came from the Center for Information Technology Research in the Interest of Society (CITRIS) and the Banatao Institute, and from a close collaboration with the technology consultancy Accenture, led by Adolfo Ramirez-Aristizabal at Accenture Labs San Francisco.
Media Resources
- Douglas Fox is a freelance science writer based in the Bay Area.
- Miller Lab
- Geometry of orofacial neuromuscular signals: speech articulation decoding using surface electromyography (arXiv, 2024)
- Topology of surface electromyogram signals: hand gesture decoding on Riemannian manifolds (Journal of Neural Engineering, 2024)