Engineers translate brain signals directly into speech
Advance marks critical step toward brain-computer interfaces that hold immense promise for those with limited or no ability to speak
Date:
January 29, 2019
Source:
The Zuckerman Institute at Columbia University
Summary:
In a scientific first, neuroengineers have created a system that translates thought into intelligible, recognizable speech. This breakthrough, which harnesses the power of speech synthesizers and artificial intelligence, could lead to new ways for computers to communicate directly with the brain.
In a scientific first, Columbia neuroengineers have created a system that translates thought into intelligible, recognizable speech. By monitoring someone's brain activity, the technology can reconstruct the words a person hears with unprecedented clarity. This breakthrough, which harnesses the power of speech synthesizers and artificial intelligence, could lead to new ways for computers to communicate directly with the brain. It also lays the groundwork for helping people who cannot speak, such as those living with amyotrophic lateral sclerosis (ALS) or recovering from stroke, regain their ability to communicate with the outside world.
These findings were published today in Scientific Reports.
"Our voices help connect us to our friends, family and the world around us, which is why losing the power of one's voice due to injury or disease is so devastating," said Nima Mesgarani, PhD, the paper's senior author and a principal investigator at Columbia University's Mortimer B. Zuckerman Mind Brain Behavior Institute. "With today's study, we have a potential way to restore that power. We've shown that, with the right technology, these people's thoughts could be decoded and understood by any listener."
Decades of research have shown that when people speak -- or even imagine speaking -- telltale patterns of activity appear in their brain. Distinct (but recognizable) patterns of signals also emerge when we listen to someone speak, or imagine listening. Experts, trying to record and decode these patterns, see a future in which thoughts need not remain hidden inside the brain -- but instead could be translated into verbal speech at will.
But accomplishing this feat has proven challenging. Early efforts to decode brain signals by Dr. Mesgarani and others focused on simple computer models that analyzed spectrograms, which are visual representations of sound frequencies.
But because this approach has failed to produce anything resembling intelligible speech, Dr. Mesgarani's team turned instead to a vocoder, a computer algorithm that can synthesize speech after being trained on recordings of people talking.
"This is the same technology used by Amazon Echo and Apple Siri to give verbal responses to our questions," said Dr. Mesgarani, who is also an associate professor of electrical engineering at Columbia's Fu Foundation School of Engineering and Applied Science.
To teach the vocoder to interpret brain activity, Dr. Mesgarani teamed up with Ashesh Dinesh Mehta, MD, PhD, a neurosurgeon at Northwell Health Physician Partners Neuroscience Institute and co-author of today's paper. Dr. Mehta treats epilepsy patients, some of whom must undergo regular surgeries.
"Working with Dr. Mehta, we asked epilepsy patients already undergoing brain surgery to listen to sentences spoken by different people, while we measured patterns of brain activity," said Dr. Mesgarani. "These neural patterns trained the vocoder."
Next, the researchers asked those same patients to listen to speakers reciting digits from 0 to 9, while recording brain signals that could then be run through the vocoder. The sound produced by the vocoder in response to those signals was analyzed and cleaned up by neural networks, a type of artificial intelligence that mimics the structure of neurons in the biological brain.
The end result was a robotic-sounding voice reciting a sequence of numbers. To test the accuracy of the recording, Dr. Mesgarani and his team tasked individuals to listen to the recording and report what they heard.
"We found that people could understand and repeat the sounds about 75% of the time, which is well above and beyond any previous attempts," said Dr. Mesgarani. The improvement in intelligibility was especially evident when comparing the new recordings to the earlier, spectrogram-based attempts. "The sensitive vocoder and powerful neural networks represented the sounds the patients had originally listened to with surprising accuracy."
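The listening test described above can be illustrated with a small scoring sketch. This is not the study's actual evaluation code; the function name `intelligibility` and the digit lists are hypothetical, and the sketch simply assumes that each reconstructed digit is scored as either correctly or incorrectly reported by a listener.

```python
def intelligibility(spoken, reported):
    """Fraction of items a listener reported correctly (0.0 to 1.0)."""
    if len(spoken) != len(reported):
        raise ValueError("spoken and reported must have the same length")
    if not spoken:
        return 0.0
    # Count positions where the listener's report matches the spoken digit.
    correct = sum(1 for s, r in zip(spoken, reported) if s == r)
    return correct / len(spoken)


# Hypothetical example: digits played to a listener and what they wrote down.
spoken = [3, 7, 0, 9, 1, 4, 6, 2]
reported = [3, 7, 0, 5, 1, 4, 8, 2]
score = intelligibility(spoken, reported)
print(f"Intelligibility: {score:.0%}")
```

In this made-up example the listener gets six of eight digits right, which happens to match the roughly 75% figure quoted by Dr. Mesgarani.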
Dr. Mesgarani and his team plan to test more complicated words and sentences next, and they want to run the same tests on brain signals emitted when a person speaks or imagines speaking. Ultimately, they hope their system could be part of an implant, similar to those worn by some epilepsy patients, that translates the wearer's thoughts directly into words.
"In this scenario, if the wearer thinks 'I need a glass of water,' our system could take the brain signals generated by that thought, and turn them into synthesized, verbal speech," said Dr. Mesgarani. "This would be a game changer. It would give anyone who has lost their ability to speak, whether through injury or disease, the renewed chance to connect to the world around them."
Hassan Akbari, Bahar Khalighinejad, Jose L. Herrero, Ashesh D. Mehta, Nima Mesgarani. Towards reconstructing intelligible speech from the human auditory cortex. Scientific Reports, 2019; 9 (1) DOI: 10.1038/s41598-018-37359-z
The Zuckerman Institute at Columbia University. "Engineers translate brain signals directly into speech: Advance marks critical step toward brain-computer interfaces that hold immense promise for those with limited or no ability to speak." ScienceDaily. ScienceDaily, 29 January 2019.
Hundreds of vets have tried out an experimental new treatment that could change how the world addresses mental disorders.
Tony didn’t know what to expect when he walked into the Brain Treatment Center in San Diego, California, last spring. The former Navy SEAL only knew that he needed help. His service in Iraq and Afghanistan was taking a heavy toll on his mental and physical wellbeing. He had trouble concentrating and remembering, and was given to explosive bursts of anger. “If somebody cut me off driving, I was ready to kill ’em at the drop of a hat,” he said. And after he got into a fistfight on the side of a California road, his son looking on from the car, he decided he was willing to try anything, even an experimental therapy that created an electromagnetic field around his brain.
What Tony and several other former U.S. Special Operations Forces personnel received at the Newport Brain Research Laboratory, located at the Center, was a new treatment for brain disorders, one that might just revolutionize brain-based medicine. Though the FDA clinical trials to judge its efficacy and risks are ongoing, the technique could help humanity deal with a constellation of its most common mental disorders (depression, anxiety, aggressiveness, attention deficit, and others) and do so without drugs. And if its underpinning theory proves correct, it could be among the biggest breakthroughs in the treatment of mental health since the invention of the EEG a century ago.
At the lab, Tony (whose name has been changed to protect his identity) met Dr. Erik Won, president and CEO of the Newport Brain Research Laboratory, the company that’s innovating Magnetic EEG/ECG-guided Resonant Therapy, or MeRT. Won’s team strapped cardiac sensors on Tony and placed an electroencephalography cap on his skull to measure his brain’s baseline electrical activity. Then came the actual therapy. Placing a flashlight-sized device by Tony’s skull, they induced an electromagnetic field that sent a small burst of current to his brain. Over the course of 20 minutes, they moved the device around his cranium, delivering jolts that, at their most aggressive, felt like a firm finger tapping.
For Tony, MeRT’s effects were obvious and immediate. He walked out of the first session to a world made new. “Everything looked different,” he told me. “My bike looked super shiny.”
He began to receive MeRT five times a week, with each session lasting about an hour including waiting room time, and quickly noticed a change in his energy. “I was super boosted,” he said. His mood changed as well.
Today, he admits that he still has moments of frustration but says that anger is no longer his “go-to emotion.” He’s developed the ability to cope. He still wants help with his memory, but his life is very different. He’s taken up abstract painting and welding, two hobbies he had no interest in at all before the therapy. He’s put in a new kitchen. Most importantly, his sleep is very different: better.
Tony’s experience was similar to those of five other special-operations veterans who spoke with Defense One. All took part in a double-blind randomized clinical trial that sought to determine how well MeRT treats Persistent Post-Concussion Symptoms and Post-Traumatic Stress Disorder, or PTSD. Five out of the six were former Navy SEALs.
In many ways, SEALs represent the perfect test group for experimental brain treatment. They enter the service in superb health and then embark on a course of training that heightens mental and physical strength and alertness. Then come their actual jobs, which involve a lot of “breaching”: getting into a place that the enemy is trying to keep you out of. It could be a compound in Abbottabad, Pakistan, or every single door in that compound. Breaching is so central to SEAL work that it’s earned them the nickname “door kickers.” But it often involves not so much kicking as explosives at closer-than-comfortable range. “I got blown up a lot in training,” says Tony, and a lot afterwards as well. Put those two factors together and you have a population with a high functioning baseline but with a lot of incidents of persistent post-concussive syndrome, often on top of heavy combat-related PTSD and other forms of trauma.
One by one, these former SEALs found their way to Won’s lab. One — let’s call him Bill — sought to cure his debilitating headaches. Another, Ted, a SEAL trainer, had no severe symptoms but wanted to see whether the therapy could improve his natural physical state and performance. A fourth, Jim, also a former SEAL, suffered from severe inability to concentrate, memory problems, and low affect, which was destroying his work performance. “I was forcing myself to act normal,” Jim said. “I didn’t feel like I was good at anything.”
Yet another, a former member of the Air Force Security Forces named Cathy, had encountered blasts and a “constant sound of gunfire” during her deployments to Iraq and Afghanistan. She suffered from memory problems, depression, anger, bouts of confusion, and migraines so severe she had to build a darkroom in her house.
Like Cathy, the rest had difficulty sleeping. Even Ted, who had no severe PTSD-related problems, reported that he “slept like crap,” before the treatment began.
All said that they saw big improvements after a course of therapy that ran five days a week for about four weeks. Bill reported that his headaches were gone, as did Cathy, who said her depression and mood disorders had lessened considerably. Jim’s memory and concentration improved so dramatically that he had begun pursuing a second master’s degree and won a spot on his college’s football team. Ted said he was feeling “20 years younger” physically and found himself better able to keep pace with the younger SEALs he was training. All of it, they say, was a result of small, precisely delivered pops of electricity to the brain. Jim said the lab had also successfully treated back and limb pain by targeting the peripheral nervous system with the same technique.
Inside the Brain Treatment Center in San Diego, the location of the Newport Brain Research Lab, a wall displays paintings by patients who have undergone MeRT therapy. The tone, mood, and control in the paintings evolve as the patient continues through the treatment.
Won, a former U.S. Navy Flight Surgeon, and his team have treated more than 650 veterans using MeRT. The walls of the lab are adorned with acrylic paintings from veterans who have sought treatment. The colors, themes, and objects in the paintings evolve, becoming brighter, more optimistic, some displaying greater motor control, as the painter progresses through the therapy.
The lab is about one-third of the way through a double-blind clinical trial that may lead to FDA approval, and so Won was guarded in what he could say about the results of their internal studies. But he said that his team had conducted a separate randomized trial on 86 veterans. After two weeks, 40 percent saw changes in their symptoms; after four weeks, 60 percent did, he said.
“It’s certainly not a panacea,” said Won. “There are people with residual symptoms, people that struggle…I would say the responses are across the board. Some sleep better. Some would say, very transformative.” (Won doesn’t even categorize the treatment as “curing,” as that has a very specific meaning in neurology and mental health, so much as “helping to treat.”)
Won believes the question might ultimately be not “Does MeRT work?” but “What therapies can it replace?”
“I think, in the future, there will be a discussion about whether this should be first-line management. What can we do to address the functional issues at play? There’s a whole lot of science to do before we get there,” he said.
Your Brain is Electric
The idea that electricity, properly administered, could treat illness goes back to 1743 when a German physician named Johann Gottlob Kruger of the University of Halle successfully treated a harpsichordist with arthritis via electrical stimulation to the hand. John Wesley, the father of Methodism, also experimented with electricity as a therapeutic and declared it “The nearest an Universal medicine of any yet known in the world.”
But the idea remained mostly an idea with no real science to back it up, until the 20th century.
Enter Hans Berger, a German scientist who wanted to show that human beings were capable of telepathy via an unseen force he referred to as “psychic energy.” He believed this energy derived from an invisible relationship between blood flow, metabolism, emotion, and the sensation of pain and thought that if he could find physical evidence that psychic energy existed, perhaps humanity could learn to control it.
To test his theory, he needed a way to record the brain’s electrical activity. In 1924, he applied a galvanometer, a tool originally built to measure the heart’s electrical activity, to the skull of a young brain-surgery patient. The galvanometer was essentially a string of silver-coated quartz filament flanked by magnets. The filament would move as it encountered electromagnetic activity, which could be graphed. Berger discovered that the brain produced electrical oscillations at varying strengths. He dubbed the larger ones, of 8 to 12 Hz, the alpha waves, the smaller ones beta waves, and named the graphing of these waves an electroencephalogram, or EEG.
Berger’s telepathy theories never panned out, but the EEG became a healthcare staple, used to detect abnormal brain activity, predict potential seizures, and more.
The separate notion that electricity could be used to treat mental disorder entered wide medical practice with the invention of electroconvulsive therapy, or ECT, in Italy in the 1930s. ECT, more commonly called shock therapy, used electricity to induce a seizure in the patient. Its use spread rapidly across psychiatry as it seemed to not only ameliorate depression but also to temporarily pacify patients who suffered from psychosis and other disorders. Before long, doctors in mental institutions were prescribing it commonly to subdue troublesome patients and even as a “cure” for homosexuality. The practice soon became associated with institutional cruelty.
In the 1990s, a handful of researchers, working independently of one another, realized that electricity at much lower voltages could be used to help with motor function in Parkinson’s patients and as an aid for depression. But there was a big difference between their work and that of earlier practitioners of ECT: they used magnetic fields rather than jolts of electricity. This allowed them to activate brain regions without sending high currents through the skull. Seizures, it seemed, weren’t necessary.
In 2008, researchers began to experiment with what was then called transcranial magnetic stimulation to treat PTSD. Since then, it’s been approved as a treatment for depression. Won and his colleagues don’t use EEG the way doctors do when they’re looking for something simple and easy to spot, like potential signs of a seizure or head trauma. Instead, Won uses EEG/ECG biometrics to find the subject’s baseline frequency, essentially the “normal” state to return her or him to, and also to precisely target the areas of the brain that will respond to stimulation in the right way.
YOU Have a Signature. Your Signature is YOU
No two people experience mental health disorders in the same way. Some PTSD sufferers have memory problems; others, depression; still others, uncontrollable anger. But people who are diagnosed with depression are more likely to suffer from another, separate mental health issue, such as anxiety, attention deficit, or something else.
A data visualization of brain electrical activity mapped via EEG. Courtesy of the Newport Brain Research Lab
The theory that underpins MeRT posits that many of these problems share a common origin: a person’s brain has lost the beat of its natural information-processing rhythm, what Won calls the “dominant frequency.”
Your dominant frequency is how many times per second your brain pulses alpha waves. “We’re all somewhere between 8 and 13 hertz. What that means is that we encode information 8 to 13 times per second. You’re born with a signature. There are pros and cons to all of those. If you’re a slower thinker, you might be more creative. If you’re faster, you might be a better athlete,” Won says.
Navy SEALs tend to have higher-than-average dominant frequencies, around 11 or 13 Hz. But physical and emotional trauma can disrupt that, causing the back of the brain and the front of the brain to emit electricity at different rates. The result: lopsided brain activity. MeRT seeks to detect arrhythmia, find out which regions are causing it, and nudge the off-kilter ones back onto the beat.
“Let’s just say in the left dorsal lateral prefrontal cortex, towards the front left side of the brain, if that’s cycling at 2 hertz, where we are 3 or 4 standard deviations below normal, you can pretty comfortably point to that and say that these neurons aren’t firing correctly. If we target that area and say, ‘We are going to nudge that area back to, say, 11 hertz,’ some of those symptoms may improve,” says Won. “In the converse scenario, in the right occipital parietal lobe where, if you’ve taken a hit, you may be cycling too fast. Let’s say it’s 30 hertz. You’re taking in too much information, oversampling your environment. And if you’re only able to process it using executive function 11 times per second, that information overload might manifest as anxiety.”
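To make the idea of a "dominant frequency" and a reading that sits several standard deviations below normal more concrete, here is a minimal sketch in Python. It is not MeRT's method, which the article does not describe in detail. It assumes the dominant frequency can be approximated as the strongest spectral peak of a single-channel recording, and the signal, the sampling rate, and the normative mean and standard deviation (10 Hz and 2 Hz) are all made-up illustration values.

```python
import math


def dominant_frequency(samples, sample_rate_hz, band=(1.0, 40.0)):
    """Return the frequency (Hz) carrying the most spectral power inside `band`."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    best_freq, best_power = None, -1.0
    # Plain discrete Fourier transform, evaluated only at bins inside the band.
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n
        if not (band[0] <= freq <= band[1]):
            continue
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq


def z_score(value, norm_mean, norm_sd):
    """How many standard deviations `value` lies from a reference mean."""
    return (value - norm_mean) / norm_sd


# Synthetic two-second "recording": a strong 11 Hz rhythm plus a weaker 2 Hz one.
rate = 128  # samples per second (assumed)
signal = [
    math.sin(2 * math.pi * 11 * i / rate) + 0.3 * math.sin(2 * math.pi * 2 * i / rate)
    for i in range(2 * rate)
]
peak = dominant_frequency(signal, rate)

# Hypothetical normative values, chosen only to mirror Won's "2 hertz" example.
z = z_score(2.0, norm_mean=10.0, norm_sd=2.0)
print(f"dominant frequency: {peak} Hz, z-score of a 2 Hz region: {z}")
```

A real pipeline would use a fast Fourier transform, filter out artifacts, and compare each brain region against norms drawn from actual population data. The sketch only shows the arithmetic behind "cycling at 2 hertz" being several standard deviations below an assumed norm.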
If the theory behind MeRT is true, it could explain, at least partially, why a person may suffer from many mental-health symptoms: anxiety, depression, attention deficits, etc. The pharmaceutical industry treats them with separate drugs, but they all may have a similar cause, and thus be treatable with one treatment. That, anyway, is what Won’s preliminary results are suggesting.
“You don’t see these type of outcomes with psychopharma or these other types of modalities, so it was pretty exciting,” he said.
There are lots of transcranial direct stimulation therapies out there, with few results to boast of. What distinguishes MeRT from other attempts to treat mental disorders with electrical fields is the use of EEG as a guide. It’s the difference between trying to fix something with the aid of a manual versus just winging it.
If the clinical trials bear out and the FDA approves of MeRT as an effective treatment for concussion and/or PTSD, many more people will try it. The dataset will grow, furthering the science. If that happens, the world will soon know whether or not there is a better therapeutic for mood and sleep disorders than drugs; and a huge portion of the pharmaceutical industry will wake up to earth-changing news.
But there’s more. Won believes that MeRT may have uses for nominally healthy brains, such as improving attention, memory, and reaction time, as Ted discovered. It’s like the eyesight thing, the sudden, stark visual clarity. “These were unexpected findings, but we’re hearing it enough that we want to do more studies.”
Performance enhancement is “not something that we’re ardently chasing,” says Won. “Our core team is about saving lives. But so many of our veterans are coming back asking.”
Already, there’s evidence to suggest that it could work. “What we’ve noticed in computerized neuro-psych batteries is that reaction times improve. Complex cognitive processing tasks can improve both in terms of speed to decision and the number of times you are right versus wrong. Those are all things we want to quantify and measure with good science,” he says.
What is one person’s therapy, in the years ahead, could be another person’s mental health regimen. Signs of that future are already here. Like so many strange portents, their origin is the tech field.
More and more high-level executives, including some at technology companies, are seeking Won out, asking to be strapped in and zapped for a few weeks. “That’s been a recent evolution. There’s a company” (he declined to name it) “a lot of programmers, engineers, etc. … One of their C-suite members got treatment and was so blown away that they want all of their key team members to get it…They’re ruthlessly competitive…They want an edge.”
Mark Zuckerberg and his pediatrician wife Priscilla Chan have sold close to 30 million shares of Facebook to fund an ambitious biomedical research project, called the Chan Zuckerberg Initiative (CZI), with a goal of curing all disease within a generation. A less publicized component of that US$5 billion program includes work on brain-machine interfaces, devices that essentially translate thoughts into commands. From a report: One recent project is a wireless brain implant that can record, stimulate and disrupt the movement of a monkey in real time. In a paper published in the highly cited scientific journal Nature on Monday, researchers detail a wireless brain device implanted in a primate that records, stimulates, and modifies its brain activity in real time, sensing a normal movement and stopping it immediately. Those researchers are part of the Chan Zuckerberg Biohub, a non-profit medical research group within the CZI. Scientists refer to the interference as "therapy" because it is designed to be used to treat diseases like epilepsy or Parkinson's by stopping a seizure or other disruptive motion just as it starts.
"Our device is able to monitor the primate's brain while it's providing the therapy so you know exactly what's happening," Rikky Muller, a co-author of the new study, told Business Insider. A professor of computer science and engineering at the University of California, Berkeley, Muller is also a Biohub investigator. The applications of brain-machine interfaces are far-reaching: while some researchers focus on using them to help assist people with spinal cord injuries or other illnesses that affect movement, others aim to see them transform how everyone interacts with laptops and smartphones. Both a division at Facebook formerly called Building 8 as well as an Elon Musk-founded company called Neuralink have said they are working on the latter.
My publication at Siggraph in '98 proposed conversion of
brainwave algorithms to create imagery and sound, plus AI to drive the imagery for guiding VR subject
towards a target brainwave state for deepening immersion into virtual
environments.
I've recently been writing ideas surrounding
creation of an image-based language to introduce a new communication
paradigm.
1. Matching mind images to an image/ video bank
2. Mapping image/impression-based communication forms which can be triggered by voice command
Highlights from the article (below):
Her vision is broad and sweeping: it runs from a new generation of
extremely high-resolution, affordable MRI machines for early detection
of cancer, heart disease, and more, to a far-out time (or maybe not so
far-out) when machines can read people’s minds and people can
communicate—with each other and maybe even with animals—via thoughts.
The idea “leverages the tools of our times,” Jepsen says, citing
advances in everything from physics to optoelectronics to consumer
electronics to big data and A.I. that can be combined to shrink the
size, improve the functionality, and lower the cost of MRI. “I could no
longer wait. I’m still writing up the patents. But I am incredibly
excited to strike off on this direction,” she says.
“My big bet is we can use that manufacturing infrastructure to create
the functionality of a $5 million MRI machine in a consumer electronics
price-point wearable. And the implications of that are so big.” She
says every doctor’s office in the world could afford these wearable
devices and use them for early detection of neurodegenerative disease,
cancer, cardiovascular disease, internal bleeding, blood clots, and
more.
I had long planned a phone call with Mary Lou Jepsen for this
afternoon—a prep session for a chat I will be doing with her a week from
Monday night at Xconomy’s Napa Summit, where she is the featured dinner speaker. It was to be a normal prep chat until I got to work this morning and learned that CNET, Engadget, and Tech Insider
had all reported that the technology visionary was planning to leave
her post as executive director of engineering for Facebook and Oculus,
to focus on a new startup. It turned out she had talked about her plans
last night during a keynote speech at the Women of Vision Awards banquet
in Santa Clara, CA—and the media outlets had all seized on the news.
“I was actually really surprised anybody picked that up,” Jepsen told
me (showing she doesn’t fully understand what a big deal she is). So I
took advantage of the call to ask her more. Some of our talk was off the
record, but much of it was on the record, including quite a bit about
her new plans and the thinking behind them.
Her vision is broad and sweeping: it runs from a new generation of
extremely high-resolution, affordable MRI machines for early detection
of cancer, heart disease, and more, to a far-out time (or maybe not so
far-out) when machines can read people’s minds and people can
communicate—with each other and maybe even with animals—via thoughts.
The idea “leverages the tools of our times,” Jepsen says, citing
advances in everything from physics to optoelectronics to consumer
electronics to big data and A.I. that can be combined to shrink the
size, improve the functionality, and lower the cost of MRI. “I could no
longer wait. I’m still writing up the patents. But I am incredibly
excited to strike off on this direction,” she says.
The startup, whose name has not previously been released as far as I
can tell, is called Open Water (it could also be OpenWater, “not sure
yet…either is OK for now,” she says). “Peter Gabriel gave me the name.
He is a great advisor,” Jepsen says. In particular, she was inspired by
this article he wrote for Edge.org, called Open Water–The Internet of Visible Thought, in which he credited Jepsen for introducing him “to the potential of brain reading devices.”
Jepsen says she can’t talk about funding and more specific plans for
Open Water yet, and that she will remain at Facebook until August. But
here are some highlights of what she could say:
“What I try to do is make things that everybody knows are utterly,
completely impossible—I try to make them possible,” Jepsen sums up. She
does that by leveraging what she calls her “strange background” that
encompasses physics, computer science, media technology, art, electrical
engineering, and more. “That all comes together for me.” Indeed, you
can find more in this companion piece on that background,
which includes stints at Google X, One Laptop per Child (which she
co-founded), the MIT Media Lab, Intel, her own startups, and more.
In the case of Open Water, part of her motivation is her own health.
“I’m a brain tumor survivor,” she says. She had surgery to remove a
brain tumor in 1995, and since then has taken pills “twice a day every
day for the last 21 years to stay alive.” That has led her to read a lot
on the side about neuroscience—and think about how to advance the
field.
Part of the idea behind Open Water involves taking things at “the
hairy edge of what physics can do,” Jepsen says, and then “using my
substantial capability in consumer electronics” to make them possible at
consumer electronics price points. She says there is a huge potential
in the manufacturing plants in Asia that are primarily used to make
OLEDs, LCDs, and such. Jepsen adds that these consumer electronics
manufacturers have been mostly focused on smartphones for the past
decade or so. But, she says, we’ve reached saturation in mobile phones,
and sales are declining. “What I see,” she says, are “the subcomponent
makers being really hungry for what the new, new thing is.”
“My big bet is we can use that manufacturing infrastructure to create
the functionality of a $5 million MRI machine in a consumer electronics
price-point wearable. And the implications of that are so big.” She
says every doctor’s office in the world could afford these wearable
devices and use them for early detection of neurodegenerative disease,
cancer, cardiovascular disease, internal bleeding, blood clots, and
more.
“It’s such a big idea, it’s what I wanted to do for a decade. It’s
why I went to MIT [Media Lab]. It’s why I went to Google,” she says. “It
turned out that Google really needed me to do some other stuff that was
way more important to Google at the time. I’ve been incubating this
since 2005…and I clearly see how to do it and how to realize it in a few
short years.”
One factor in advancing her idea was work published about five years
ago by a group led by Jack Gallant at U.C. Berkeley, Jepsen says. The
research group used a functional magnetic resonance imaging scanner to
track blood flow and oxygen flow and image the brains of people shown
hundreds of hours of videos. You can read more about it here,
but the main point Jepsen stressed to me was that the work (and
subsequent work) has produced a library or database of sorts of how
brains react to different images. A computer using artificial
intelligence can then use such a database to basically look at MRI brain
images in real time and interpret what people are thinking about or
reacting to. This ability has been demonstrated at dozens of labs to
gauge the brain’s reactions to words, music, math equations, and more,
she says. But the resolution is poor and the process is expensive,
requiring people to lie still in big chambers inside a huge magnet.
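The "library or database" idea Jepsen describes can be sketched as a simple lookup: compare a newly observed activity pattern against stored reference patterns and report the closest match. The sketch below is a toy illustration, not the Gallant lab's method. The four-number "patterns", the labels, and the choice of cosine similarity are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def decode(pattern, library):
    """Return the library label whose reference pattern best matches `pattern`."""
    return max(library, key=lambda label: cosine_similarity(pattern, library[label]))


# Hypothetical reference patterns: average response per stimulus category.
library = {
    "face": [0.9, 0.1, 0.2, 0.0],
    "house": [0.1, 0.8, 0.1, 0.3],
    "text": [0.0, 0.2, 0.9, 0.4],
}

# A new, noisy observation that should resemble the "house" pattern.
observed = [0.2, 0.7, 0.0, 0.4]
guess = decode(observed, library)
print(f"best match: {guess}")
```

Real brain images contain many thousands of measurements per scan and considerable noise, which is one reason the resolution and cost problems Jepsen mentions matter so much for this kind of decoding.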
“I was really struck by that, so I started thinking this is great,
but we need to up the resolution,” she says. “It’s in my head, I’ve got
this plan. I’ve got these inventions that I’m working on, and my next
step is to let myself pursue it full time.”
It is easy to see the power of these ideas to help make MRI far more
affordable and accessible. But for Jepsen, that is just Phase One. She
talks about the ability to image human thoughts in new ways, for
instance, by helping stroke sufferers who can’t talk find a new way to
communicate via their thoughts. Or for amputees to harness their
thoughts to move prosthetics more naturally.
And then she goes a step or two farther. “Can you imagine a movie
director waking up with an image of a new scene in her head, and just
being able to dump her dream” into a computer, she says. “It could be so
much more efficient than the way we do it now.” For musicians, she
muses, this could be “a way to get the music out of your head.”
But that’s not all. “Maybe we can communicate with animals, maybe we
can scan animal brains and see what images they are thinking of,” Jepsen
says. “So little is known. Dolphins are supposed to be really
smart—maybe we can collaborate with them.”
It all sounds pretty far-out, I know, and she says so, too. But given
how long Jepsen has had these ideas in her head—and how much work has
been done in brain-machine interfaces—perhaps the world is finally ready
to receive her thoughts.
Reconstructing visual experiences from brain activity evoked by natural movies
Shinji Nishimoto, An T. Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu & Jack L. Gallant (Current Biology, 2011).
Quantitative modeling of human brain activity can provide crucial
insights about cortical representations and can form the basis for brain
decoding devices. Recent functional magnetic resonance imaging (fMRI)
studies have modeled brain activity elicited by static visual patterns
and have reconstructed these patterns from brain activity. However,
blood oxygen level-dependent (BOLD) signals measured via fMRI are very
slow, so it has been difficult to model brain activity elicited by
dynamic stimuli such as natural movies. Here we present a new
motion-energy encoding model that largely overcomes this limitation. The
model describes fast visual information and slow hemodynamics by
separate components. We recorded BOLD signals in occipitotemporal visual
cortex of human subjects who watched natural movies and fit the model
separately to individual voxels. Visualization of the fit models reveals
how early visual areas represent the information in movies. To
demonstrate the power of our approach, we also constructed a Bayesian
decoder by combining estimated encoding models with a sampled natural
movie prior. The decoder provides remarkable reconstructions of the
viewed movies. These results demonstrate that dynamic brain activity
measured under naturalistic conditions can be decoded using current fMRI
technology.
Frequently asked questions about this work
Could you give a simple outline of the experiment?
The goal of the experiment was to design a process for decoding
dynamic natural visual experiences from human visual cortex. More
specifically, we sought to use brain activity measurements to
reconstruct natural movies seen by an observer. First, we used
functional magnetic resonance imaging (fMRI) to measure brain activity
in visual cortex as a person looked at several hours of movies. We then
used these data to develop computational models that could predict the
pattern of brain activity that would be elicited by any arbitrary movies
(i.e., movies that were not in the initial set used to build the
model). Next, we used fMRI to measure brain activity elicited by a
second set of movies that were completely distinct from the first set.
Finally, we used the computational models to process the elicited brain
activity, in order to reconstruct the movies in the second set. This is
the first demonstration that dynamic natural visual
experiences can be recovered from very slow brain activity recorded by
fMRI.
Can you give an intuitive explanation of movie reconstruction?
As you move through the world or you watch a movie, a dynamic,
ever-changing pattern of activity is evoked in the brain. The goal of
movie reconstruction is to use the evoked activity to recreate the movie
you observed. To do this, we create encoding models that describe how
movies are transformed into brain activity, and then we use those models
to decode brain activity and reconstruct the stimulus.
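The decode-by-model idea can be illustrated with a minimal Python sketch. It is an illustration, not the study's code. The function names, the linear voxel model, the Gaussian noise assumption, and the choice to average the top-scoring clips are all assumptions made here, and hemodynamic delay is ignored for brevity. Each candidate clip from a library of natural movies (standing in for the prior) is scored by how closely its predicted brain response matches the measured one.

```python
import numpy as np

def log_likelihood(observed, predicted, noise_sd=1.0):
    """Gaussian log-likelihood (up to a constant) of the observed voxel
    responses under each row of predicted responses."""
    resid = predicted - observed
    return -0.5 * np.sum(resid ** 2, axis=-1) / noise_sd ** 2

def reconstruct(observed, prior_clips, prior_features, weights, top_k=3):
    """Score every clip in the prior library and average the top_k clips.

    prior_clips:    (n_clips, height, width) candidate stimuli
    prior_features: (n_clips, n_features) filter outputs for each clip
    weights:        (n_features, n_voxels) hypothetical fitted encoding weights
    """
    predicted = prior_features @ weights            # (n_clips, n_voxels)
    scores = log_likelihood(observed, predicted)    # one score per clip
    best = np.argsort(scores)[::-1][:top_k]         # highest scores first
    return prior_clips[best].mean(axis=0), best

# Toy data: clip 7 is the one that was "viewed".
rng = np.random.default_rng(0)
prior_clips = rng.random((50, 8, 8))
prior_features = rng.normal(size=(50, 6))
weights = rng.normal(size=(6, 20))
observed = prior_features[7] @ weights + 0.01 * rng.normal(size=20)

movie, best = reconstruct(observed, prior_clips, prior_features, weights, top_k=1)
print("best-matching clip index:", best[0])
```

With a larger top_k the output is a blend of several plausible clips, which trades sharpness for robustness to measurement noise.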
Can you explain the encoding model and how it was fit to the data?
To understand our encoding model, it is most useful to think of the
process of perception as one of filtering the visual input in order to
extract useful information. The human visual cortex consists of billions
of neurons. Each neuron can be viewed as a filter that takes a visual
stimulus as input, and produces a spiking response as output. In early
visual cortex these neural filters are selective for simple features
such as spatial position, motion direction and speed. Our motion-energy
encoding model describes this filtering process. Currently the best
method for measuring human brain activity is fMRI. However, fMRI does
not measure neural activity directly, but rather measures hemodynamic
changes (i.e. changes in blood flow, blood volume and blood oxygenation)
that are caused by neural activity. These hemodynamic changes take
place over seconds, so they are much slower than the changes that can
occur in natural movies (or in the individual neurons that filter those
movies). Thus, it has previously been thought impossible to decode
dynamic information from brain activity recorded by fMRI. To overcome
this fundamental limitation we use a two stage encoding model. The first
stage consists of a large collection of motion-energy filters that span
the same range of positions, motion directions and speeds as the underlying
neurons. This stage models the fast responses in the early visual
system. The output from the first stage of the model is fed into a
second stage that describes how neural activity affects hemodynamic
activity in turn. The two stage processing allows us to model the
relationship between the fine temporal information in the movies and the
slow brain activity signals measured using fMRI. Functional MRI records
brain activity from small volumes of brain tissue called voxels (here
each voxel was 2.0 x 2.0 x 2.5 mm). Each voxel represents the pooled
activity of hundreds of thousands of neurons. Therefore, we do not model
each voxel as a single motion-energy filter, but rather as a bank of
thousands of such filters. In practice fitting the encoding model to
each voxel is a straightforward regression problem. First, each movie is
processed by a bank of nonlinear motion-energy filters. Next, a set of
weights is found that optimally maps the filtered movie (now represented
as a vector of about 6,000 filter outputs) into measured brain activity.
(Linear summation is assumed in order to simplify fitting.)
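The two-stage fit described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published implementation: the filter outputs are random stand-ins for real motion-energy features, the hemodynamic response is a toy gamma-shaped kernel, and a small ridge penalty is used as one plausible form of the regression. All function names are hypothetical.

```python
import numpy as np

def hrf_kernel(n_taps=8, peak=4.0):
    """Toy hemodynamic response: a gamma-like bump that peaks `peak`
    samples after the event (an assumed shape, not a measured one)."""
    t = np.arange(n_taps, dtype=float)
    h = (t ** peak) * np.exp(-t)
    return h / h.sum()

def apply_hemodynamics(features, kernel):
    """Stage 2: causally convolve each filter channel with the kernel."""
    n_time = features.shape[0]
    cols = [np.convolve(features[:, j], kernel)[:n_time]
            for j in range(features.shape[1])]
    return np.stack(cols, axis=1)

def fit_voxel_weights(features, bold, kernel, ridge=1e-3):
    """Regress one voxel's BOLD signal on the slowed filter outputs.
    The ridge penalty is an assumption; the text only says 'regression'."""
    X = apply_hemodynamics(features, kernel)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ bold)

def predict_bold(features, weights, kernel):
    """Predict a voxel's response to new filter outputs (linear summation)."""
    return apply_hemodynamics(features, kernel) @ weights

# Toy example: 5 filter channels stand in for the ~6,000 used per voxel.
rng = np.random.default_rng(0)
kernel = hrf_kernel()
features = rng.normal(size=(200, 5))      # stage 1 outputs (stand-in)
true_w = rng.normal(size=5)
bold = predict_bold(features, true_w, kernel) + 0.01 * rng.normal(size=200)

weights = fit_voxel_weights(features, bold, kernel)
print("max weight error:", np.abs(weights - true_w).max())
```

Keeping the fast filtering and the slow hemodynamic stage separate is what lets a single linear regression per voxel relate frame-level movie content to a signal that changes over seconds.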
How accurate is the decoder?
A good decoder should produce a reconstruction that a neutral
observer judges to be visually similar to the viewed movie. However, it
is difficult to quantify human judgments of visual similarity. In this
paper we use similarity in the motion-energy domain. That is, we
quantify how much of the spatially localized motion information in the
viewed movie was reconstructed. The accuracy of our reconstructions is
far above chance.
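One way to make "far above chance" concrete is sketched below in Python. The exact metric is not spelled out here, so Pearson correlation between feature vectors and a shuffle-based chance level are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np

def feature_correlation(recon_features, true_features):
    """Pearson correlation between two feature arrays, flattened."""
    a = recon_features.ravel() - recon_features.mean()
    b = true_features.ravel() - true_features.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def chance_level(recon_features, true_features, n_shuffles=200, seed=0):
    """Null distribution: correlate the reconstruction with the true
    features after shuffling their order in time."""
    rng = np.random.default_rng(seed)
    n_time = true_features.shape[0]
    return np.array([
        feature_correlation(recon_features,
                            true_features[rng.permutation(n_time)])
        for _ in range(n_shuffles)
    ])

# Toy data: rows are time points, columns are motion-energy channels.
rng = np.random.default_rng(1)
true_features = rng.normal(size=(100, 10))
recon_features = true_features + rng.normal(size=(100, 10))  # noisy recon

score = feature_correlation(recon_features, true_features)
null = chance_level(recon_features, true_features)
print("score:", round(score, 3), "largest chance value:", round(null.max(), 3))
```

A reconstruction counts as above chance in this sketch when its score exceeds nearly all of the shuffled scores.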
Other studies have attempted reconstruction before. How is your study different?
Previous studies showed that it is possible to reconstruct static
visual patterns (Thirion et al., 2006 Neuroimage; Miyawaki et al., 2008
Neuron), static natural images (Naselaris et al., 2009 Neuron) or
handwritten digits (van Gerven et al. 2010 Neural Computation). However,
no previous study has produced reconstructions of dynamic natural
movies. This is a critical step toward obtaining reconstructions of
internal states such as imagery, dreams and so on.
Why is this finding important?
From a basic science perspective, our paper provides the first
quantitative description of dynamic human brain activity during
conditions simulating natural vision. This information will be important
to vision scientists and other neuroscientists. Our study also
represents another important step in the development of brain-reading
technologies that could someday be useful to society. Previous
brain-reading approaches could only decode static information. But most
of our visual experience is dynamic, and these dynamics are often the
most compelling aspect of visual experience. Our results will be crucial
for developing brain-reading technologies that can decode dynamic
experiences.
How many subjects did you run? Is there any chance that they could have cheated?
We ran three subjects for the experiments in this paper, all
co-authors. There are several technical considerations that made it
advantageous to use authors as subjects. It takes several hours to
acquire sufficient data to build an accurate motion-energy encoding
model for each subject, and naive subjects find it difficult to stay
still and alert for this long. Authors are motivated to be good
subjects, so their data are of high quality. These high quality data
enabled us to build detailed and accurate models for each individual
subject. There is no reason to think that the use of authors as subjects
weakens the validity of the study. The experiment focuses solely on the
early part of the visual system, and this part of the brain is not
heavily modulated by intention or prior knowledge. The movies used to
develop encoding models for each subject and those used for decoding
were completely separate, and there is no plausible way that a subject
could have changed their own brain activity in order to improve
decoding. Many fMRI studies use much larger groups of subjects, but they
collect much less data on each subject. Such studies tend to average
over a lot of the individual variability in the data, and the results
provide a poor description of brain activity in any individual subject.
What are the limits on brain decoding?
Decoding performance depends on the quality of brain activity
measurements. In this study we used functional MRI (fMRI) to measure
brain activity. (Note that fMRI does not actually measure the activity
of neurons. Instead, it measures blood flow consequent to neural
activity. However, many studies have shown that the blood flow signals
measured using fMRI are generally correlated with neural activity.) fMRI
has relatively modest spatial and temporal resolution, so much of the
information contained in the underlying neural activity is lost when
using this technique. fMRI measurements are also quite variable from
trial-to-trial. Both of these factors limit the amount of information
that can be decoded from fMRI measurements. Decoding also depends
critically on our understanding of how the brain represents information,
because this will determine the quality of the computational model. If
the encoding model is poor (i.e., if it does a poor job of prediction)
then the decoder will be inaccurate. While our computational models of
some cortical visual areas perform well, they do not perform well when
used to decode activity in other parts of the brain. A better
understanding of the processing that occurs in parts of the brain beyond
visual cortex (e.g. parietal cortex, frontal cortex) will be required
before it will be possible to decode other aspects of human experience.
What are the future applications of this technology?
This study was not motivated by a specific application, but was aimed
at developing a computational model of brain activity evoked by dynamic
natural movies. That said, there are many potential applications of
devices that can decode brain activity. In addition to their value as a
basic research tool, brain-reading devices could be used to aid in
diagnosis of diseases (e.g., stroke, dementia); to assess the effects of
therapeutic interventions (drug therapy, stem cell therapy); or as the
computational heart of a neural prosthesis. They could also be used to
build a brain-machine interface.
Could this be used to build a brain-machine interface (BMI)?
Decoding visual content is conceptually related to the work on
neural-motor prostheses being undertaken in many laboratories. The main
goal in the prosthetics work is to build a decoder that can be used to
drive a prosthetic arm or other device from brain activity. Of course
there are some significant differences between sensory and motor systems
that impact the way that a BMI system would be implemented in the two
systems. But ultimately, the statistical frameworks used for decoding in
the sensory and motor domains are very similar. This suggests that a
visual BMI might be feasible.
At some later date when the technology is developed further, will it be possible to decode dreams, memory, and visual imagery?
Neuroscientists generally assume that all mental processes have a
concrete neurobiological basis. Under this assumption, as long as we
have good measurements of brain activity and good computational models
of the brain, it should be possible in principle to decode the visual
content of mental processes like dreams, memory, and imagery. The
computational encoding models in our study provide a functional account
of brain activity evoked by natural movies. It is currently unknown
whether processes like dreaming and imagination are realized in the
brain in a way that is functionally similar to perception. If they are,
then it should be possible to use the techniques developed in this paper
to decode brain activity during dreaming or imagination.
At some later date when the technology is developed further, will it
be possible to use this technology in detective work, court cases,
trials, etc?
The potential use of this technology in the legal system is
questionable. Many psychology studies have now demonstrated that
eyewitness testimony is notoriously unreliable. Witnesses often have
poor memory, but are usually unaware of this. Memory tends to be biased
by intervening events, inadvertent coaching, and rehearsal (prior
recall). Eyewitnesses often confabulate stories to make logical sense of
events that they cannot recall well. These errors are thought to stem
from several factors: poor initial storage of information in memory;
changes to stored memories over time; and faulty recall. Any
brain-reading device that aims to decode stored memories will inevitably
be limited not only by the technology itself, but also by the quality
of the stored information. After all, an accurate read-out of a faulty
memory only provides misleading information. Therefore, any future
application of this technology in the legal system will have to be
approached with extreme caution.
Will we be able to use this technology to insert images (or movies) directly into the brain?
Not in the foreseeable future. There is no known technology that
could remotely send signals to the brain in a way that would be
organized enough to elicit a meaningful visual image or thought.
Does this work fit into a larger program of research?
One of the central goals of our research program is to build
computational models of the visual system that accurately predict brain
activity measured during natural vision. Predictive models are the gold
standard of computational neuroscience and are critical for the
long-term advancement of brain science and medicine. To build a
computational model of some part of the visual system, we treat it as a
“black box” that takes visual stimuli as input and generates brain
activity as output. A model of the black box can be estimated using
statistical tools drawn from classical and Bayesian statistics, and from
machine learning. Note that this reverse-engineering approach is
agnostic about the specific way that brain activity is measured. One
good way to evaluate these encoding models is to construct a corresponding
decoding model, and then assess its performance in a specific task such
as movie reconstruction.
Why is it important to construct computational models of the brain?
The brain is an extremely complex organ and many convergent
approaches are required to obtain a full understanding of its structure
and function. One way to think about the problem is to consider three
different general goals of research in systems/computational
neuroscience. (1) The first goal is to understand how the brain is
divided into functionally distinct modules (e.g., for vision, memory,
etc.). (2) The second goal, contingent on the first, is to determine the
function of each module. One classical approach for investigating the
function of a brain circuit is to characterize neural responses at a
quantitative computational level that is abstracted away from many of
the specific anatomical and biophysical details of the system. This
helps make tractable a problem that would otherwise seem overwhelmingly
complex. (3) The third goal, contingent on the first two, is to
understand how these specific computations are implemented in neural
circuitry. A byproduct of this model-based approach is that it has many
specific applications, as described above.
Can you briefly explain the function of the parts of the brain examined here?
The human visual system consists of several dozen distinct cortical
visual areas and sub-cortical nuclei, arranged in a network that is both
hierarchical and parallel. Visual information comes into the eye and is
there transduced into nerve impulses. These are sent on to the lateral
geniculate nucleus and then to primary visual cortex (area V1). Area V1
is the largest single processing module in the human brain. Its function
is to represent visual information in a very general form by
decomposing visual stimuli into spatially localized elements. Signals
leaving V1 are distributed to other visual areas, such as V2 and V3.
Although the function of these higher visual areas is not fully
understood, it is believed that they extract relatively more complicated
information about a scene. For example, area V2 is thought to represent
moderately complex features such as angles and curvature, while
high-level areas are thought to represent very complex patterns such as
faces. The encoding model used in our experiment was designed to
describe the function of early visual areas such as V1 and V2, but was
not meant to describe higher visual areas. As one might expect, the
model does a good job of decoding information in early visual areas but
it does not perform as well in higher areas.
Are there any ethical concerns with this type of research?
The current technology for decoding brain activity is relatively
primitive. The computational models are immature, and in order to
construct a model of someone’s visual system they must spend many hours
in a large, stationary magnetic resonance scanner. For this reason it is
unlikely that this technology could be used in practical applications
any time soon. That said, both the technology for measuring brain
activity and the computational models are improving continuously. It is
possible that decoding brain activity could have serious ethical and
privacy implications downstream in, say, the 30-year time frame. As an
analogy, consider the current debates regarding availability of genetic
information. Genetic sequencing is becoming cheaper by the year, and it
will soon be possible for everyone to have their own genome sequenced.
This raises many issues regarding privacy and the accessibility of
individual genetic information. The authors believe strongly that no one
should be subjected to any form of brain-reading process involuntarily,
covertly, or without complete informed consent.