My SIGGRAPH '98 publication proposed algorithms for converting brainwaves into
imagery and sound, with AI driving that imagery to guide a VR subject
toward a target brainwave state and deepen immersion in virtual
environments.
I've recently been writing up ideas around creating an image-based language
to introduce a new communication paradigm:
1. Matching mind images to an image/video bank (a rough sketch follows this list)
2. Mapping image/impression-based communication forms that can be triggered by voice command
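As a rough sketch of idea 1, the snippet below matches a hypothetical brain-decoded feature vector against a feature-indexed image/video bank by cosine similarity. The function names, feature dimensionality, and bank size are illustrative assumptions, not part of any existing system.

```python
# Hypothetical sketch: match a decoded "mind image" feature vector to entries
# in an image/video bank by cosine similarity. All names and shapes are
# illustrative assumptions, not a real decoding pipeline.
import numpy as np

def match_to_bank(decoded_features: np.ndarray,
                  bank_features: np.ndarray,
                  top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k bank entries most similar to the decoded vector."""
    # Normalize so the dot product becomes cosine similarity.
    q = decoded_features / np.linalg.norm(decoded_features)
    b = bank_features / np.linalg.norm(bank_features, axis=1, keepdims=True)
    similarities = b @ q
    return np.argsort(similarities)[::-1][:top_k]

# Toy usage: a bank of 10,000 clips, each described by a 512-dimensional feature vector.
rng = np.random.default_rng(0)
bank = rng.standard_normal((10_000, 512))
decoded = rng.standard_normal(512)  # stands in for a brain-decoded embedding
print(match_to_bank(decoded, bank, top_k=3))
```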
Highlights from the article (below):
Her vision is broad and sweeping: it runs from a new generation of
extremely high-resolution, affordable MRI machines for early detection
of cancer, heart disease, and more, to a far-out time (or maybe not so
far-out) when machines can read people’s minds and people can
communicate—with each other and maybe even with animals—via thoughts.
The idea “leverages the tools of our times,” Jepsen says, citing
advances in everything from physics to optoelectronics to consumer
electronics to big data and A.I. that can be combined to shrink the
size, improve the functionality, and lower the cost of MRI. “I could no
longer wait. I’m still writing up the patents. But I am incredibly
excited to strike off on this direction," she says.
"My big bet is we can use that manufacturing infrastructure to create
the functionality of a $5 million MRI machine in a consumer electronics
price-point wearable. And the implications of that are so big.” She
says every doctor’s office in the world could afford these wearable
devices and use them for early detection of neurodegenerative disease,
cancer, cardiovascular disease, internal bleeding, blood clots, and
more.
I had long planned a phone call with Mary Lou Jepsen for this
afternoon—a prep session for a chat I will be doing with her a week from
Monday night at Xconomy’s Napa Summit, where she is the featured dinner speaker. It was to be a normal prep chat until I got to work this morning and learned that CNET, Engadget, and Tech Insider
had all reported that the technology visionary was planning to leave
her post as executive director of engineering for Facebook and Oculus,
to focus on a new startup. It turned out she had talked about her plans
last night during a keynote speech at the Women of Vision Awards banquet
in Santa Clara, CA—and the media outlets had all seized on the news.
“I was actually really surprised anybody picked that up,” Jepsen told
me (showing she doesn’t fully understand what a big deal she is). So I
took advantage of the call to ask her more. Some of our talk was off the
record, but much of it was on the record, including quite a bit about
her new plans and the thinking behind them.
Her vision is broad and sweeping: it runs from a new generation of
extremely high-resolution, affordable MRI machines for early detection
of cancer, heart disease, and more, to a far-out time (or maybe not so
far-out) when machines can read people’s minds and people can
communicate—with each other and maybe even with animals—via thoughts.
The idea “leverages the tools of our times,” Jepsen says, citing
advances in everything from physics to optoelectronics to consumer
electronics to big data and A.I. that can be combined to shrink the
size, improve the functionality, and lower the cost of MRI. “I could no
longer wait. I’m still writing up the patents. But I am incredibly
excited to strike off on this direction,” she says.
The startup, whose name has not previously been released as far as I
can tell, is called Open Water (it could also be OpenWater, “not sure
yet…either is OK for now,” she says). “Peter Gabriel gave me the name.
He is a great advisor,” Jepsen says. In particular, she was inspired by
this article he wrote for Edge.org, called Open Water–The Internet of Visible Thought, in which he credited Jepsen for introducing him “to the potential of brain reading devices.”
Jepsen says she can’t talk about funding and more specific plans for
Open Water yet, and that she will remain at Facebook until August. But
here are some highlights of what she could say:
“What I try to do is make things that everybody knows are utterly,
completely impossible—I try to make them possible,” Jepsen sums up. She
does that by leveraging what she calls her “strange background” that
encompasses physics, computer science, media technology, art, electrical
engineering, and more. “That all comes together for me.” Indeed, you
can find more in this companion piece on that background,
which includes stints at Google X, One Laptop per Child (which she
co-founded), the MIT Media Lab, Intel, her own startups, and more.
In the case of Open Water, part of her motivation is her own health.
“I’m a brain tumor survivor,” she says. She had surgery to remove a
brain tumor in 1995, and since then has taken pills “twice a day every
day for the last 21 years to stay alive.” That has led her to read a lot
on the side about neuroscience—and think about how to advance the
field.
Part of the idea behind Open Water involves taking things at “the
hairy edge of what physics can do,” Jepsen says, and then “using my
substantial capability in consumer electronics” to make them possible at
consumer electronics price points. She says there is a huge potential
in the manufacturing plants in Asia that are primarily used to make
OLEDs, LCDs, and such. Jepsen adds that these consumer electronics
manufacturers have been mostly focused on smartphones for the past
decade or so. But, she says, we’ve reached saturation in mobile phones,
and sales are declining. “What I see,” she says, are “the subcomponent
makers being really hungry for what the new, new thing is.”
“My big bet is we can use that manufacturing infrastructure to create
the functionality of a $5 million MRI machine in a consumer electronics
price-point wearable. And the implications of that are so big.” She
says every doctor’s office in the world could afford these wearable
devices and use them for early detection of neurodegenerative disease,
cancer, cardiovascular disease, internal bleeding, blood clots, and
more.
“It’s such a big idea, it’s what I wanted to do for a decade. It’s
why I went to MIT [Media Lab]. It’s why I went to Google,” she says. “It
turned out that Google really needed me to do some other stuff that was
way more important to Google at the time. I’ve been incubating this
since 2005…and I clearly see how to do it and how to realize it in a few
short years.”
One factor in advancing her idea was work published about five years
ago by a group led by Jack Gallant at U.C. Berkeley, Jepsen says. The
research group used a functional magnetic resonance imaging scanner to
track blood flow and oxygen flow and image the brains of people shown
hundreds of hours of videos. You can read more about it here,
but the main point Jepsen stressed to me was that the work (and
subsequent work) has produced a library or database of sorts of how
brains react to different images. A computer using artificial
intelligence can then use such a database to basically look at MRI brain
images in real time and interpret what people are thinking about or
reacting to. This ability has been demonstrated at dozens of labs to
gauge the brain’s reactions to words, music, math equations, and more,
she says. But the resolution is poor and the process is expensive,
requiring people to lie still in big chambers inside a huge magnet.
“I was really struck by that, so I started thinking this is great,
but we need to up the resolution,” she says. “It’s in my head, I’ve got
this plan. I’ve got these inventions that I’m working on, and my next
step is to let myself pursue it full time.”
It is easy to see the power of these ideas to help make MRI far more
affordable and accessible. But for Jepsen, that is just Phase One. She
talks about the ability to image human thoughts in new ways, for
instance, by helping stroke sufferers who can’t talk find a new way to
communicate via their thoughts. Or for amputees to harness their
thoughts to move prosthetics more naturally.
And then she goes a step or two farther. “Can you imagine a movie
director waking up with an image of a new scene in her head, and just
being able to dump her dream” into a computer, she says. “It could be so
much more efficient than the way we do it now.” For musicians, she
muses, this could be “a way to get the music out of your head.”
But that’s not all. “Maybe we can communicate with animals, maybe we
can scan animal brains and see what images they are thinking of,” Jepsen
says. “So little is known. Dolphins are supposed to be really
smart—maybe we can collaborate with them.”
It all sounds pretty far-out, I know, and she says so, too. But given
how long Jepsen has had these ideas in her head—and how much work has
been done in brain-machine interfaces—perhaps the world is finally ready
to receive her thoughts.
Reconstructing visual experiences from brain activity evoked by natural movies
Shinji Nishimoto, An T. Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu & Jack L. Gallant (Current Biology, 2011).
Quantitative modeling of human brain activity can provide crucial
insights about cortical representations and can form the basis for brain
decoding devices. Recent functional magnetic resonance imaging (fMRI)
studies have modeled brain activity elicited by static visual patterns
and have reconstructed these patterns from brain activity. However,
blood oxygen level-dependent (BOLD) signals measured via fMRI are very
slow, so it has been difficult to model brain activity elicited by
dynamic stimuli such as natural movies. Here we present a new
motion-energy encoding model that largely overcomes this limitation. The
model describes fast visual information and slow hemodynamics by
separate components. We recorded BOLD signals in occipitotemporal visual
cortex of human subjects who watched natural movies and fit the model
separately to individual voxels. Visualization of the fit models reveals
how early visual areas represent the information in movies. To
demonstrate the power of our approach, we also constructed a Bayesian
decoder by combining estimated encoding models with a sampled natural
movie prior. The decoder provides remarkable reconstructions of the
viewed movies. These results demonstrate that dynamic brain activity
measured under naturalistic conditions can be decoded using current fMRI
technology.
Frequently asked questions about this work
Could you give a simple outline of the experiment?
The goal of the experiment was to design a process for decoding
dynamic natural visual experiences from human visual cortex. More
specifically, we sought to use brain activity measurements to
reconstruct natural movies seen by an observer. First, we used
functional magnetic resonance imaging (fMRI) to measure brain activity
in visual cortex as a person looked at several hours of movies. We then
used these data to develop computational models that could predict the
pattern of brain activity that would be elicited by any arbitrary movies
(i.e., movies that were not in the initial set used to build the
model). Next, we used fMRI to measure brain activity elicited by a
second set of movies that were completely distinct from the first set.
Finally, we used the computational models to process the elicited brain
activity, in order to reconstruct the movies in the second set of
movies. This is the first demonstration that dynamic natural visual
experiences can be recovered from very slow brain activity recorded by
fMRI.
Can you give an intuitive explanation of movie reconstruction?
As you move through the world or you watch a movie, a dynamic,
ever-changing pattern of activity is evoked in the brain. The goal of
movie reconstruction is to use the evoked activity to recreate the movie
you observed. To do this, we create encoding models that describe how
movies are transformed into brain activity, and then we use those models
to decode brain activity and reconstruct the stimulus.
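To make the decode step concrete, here is a minimal sketch in the spirit of this approach: assuming per-voxel encoding weights have already been fit on training movies, candidate clips drawn from a large library are ranked by how well their predicted brain activity matches the measured activity. This is a simplified, correlation-based stand-in for the paper's Bayesian decoder; the array names and shapes are assumptions.

```python
# Hedged sketch of decoding by ranking candidate clips from a "prior" library.
# Assumes a linear encoding model (weights) has already been fit; shapes and
# names are illustrative only.
import numpy as np

def rank_candidates(measured_activity, candidate_features, encoding_weights, top_k=10):
    """
    measured_activity:  (n_voxels,) activity pattern evoked by the unknown clip
    candidate_features: (n_candidates, n_features) motion-energy features of library clips
    encoding_weights:   (n_features, n_voxels) fitted linear encoding model
    Returns indices of the top_k candidates whose predicted activity best
    correlates with the measured activity.
    """
    predicted = candidate_features @ encoding_weights            # (n_candidates, n_voxels)
    pz = predicted - predicted.mean(axis=1, keepdims=True)
    pz /= predicted.std(axis=1, keepdims=True)
    mz = (measured_activity - measured_activity.mean()) / measured_activity.std()
    scores = (pz * mz).mean(axis=1)                              # per-candidate correlation
    return np.argsort(scores)[::-1][:top_k]

# Toy usage: the clip that actually evoked the (noisy) activity should rank near the top.
rng = np.random.default_rng(0)
weights = rng.standard_normal((200, 50))         # 200 features -> 50 voxels
library = rng.standard_normal((1000, 200))       # 1,000 candidate clips
measured = library[42] @ weights + 0.1 * rng.standard_normal(50)
print(rank_candidates(measured, library, weights)[:3])
```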
Can you explain the encoding model and how it was fit to the data?
To understand our encoding model, it is most useful to think of the
process of perception as one of filtering the visual input in order to
extract useful information. The human visual cortex consists of billions
of neurons. Each neuron can be viewed as a filter that takes a visual
stimulus as input, and produces a spiking response as output. In early
visual cortex these neural filters are selective for simple features
such as spatial position, motion direction and speed. Our motion-energy
encoding model describes this filtering process. Currently the best
method for measuring human brain activity is fMRI. However, fMRI does
not measure neural activity directly, but rather measures hemodynamic
changes (i.e. changes in blood flow, blood volume and blood oxygenation)
that are caused by neural activity. These hemodynamic changes take
place over seconds, so they are much slower than the changes that can
occur in natural movies (or in the individual neurons that filter those
movies). Thus, it has previously been thought impossible to decode
dynamic information from brain activity recorded by fMRI. To overcome
this fundamental limitation we use a two-stage encoding model. The first
stage consists of a large collection of motion-energy filters that span
a range of positions, motion directions, and speeds, just as the underlying
neurons do. This stage models the fast responses in the early visual
system. The output from the first stage of the model is fed into a
second stage that describes how neural activity affects hemodynamic
activity in turn. This two-stage processing allows us to model the
relationship between the fine temporal information in the movies and the
slow brain activity signals measured using fMRI. Functional MRI records
brain activity from small volumes of brain tissue called voxels (here
each voxel was 2.0 x 2.0 x 2.5 mm). Each voxel represents the pooled
activity of hundreds of thousands of neurons. Therefore, we do not model
each voxel as a single motion-energy filter, but rather as a bank of
thousands of such filters. In practice fitting the encoding model to
each voxel is a straightforward regression problem. First, each movie is
processed by a bank of nonlinear motion-energy filters. Next, a set of
weights is found that optimally map the filtered movie (now represented
as a vector of about 6,000 filter outputs) into measured brain activity.
(Linear summation is assumed in order to simplify fitting.)
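A minimal sketch of that voxel-wise fitting step, assuming the movie has already been passed through a motion-energy filter bank: each voxel's response is modeled as a weighted linear sum of the filter outputs, with the weights found by ridge regression. The regularization value and toy dimensions below are placeholders, and the filter bank itself is not implemented.

```python
# Minimal sketch: fit linear weights mapping motion-energy filter outputs to
# measured voxel responses with closed-form ridge regression. The filter bank
# is not implemented here; random features stand in for it.
import numpy as np

def fit_voxel_weights(filter_outputs: np.ndarray,
                      voxel_responses: np.ndarray,
                      ridge_lambda: float = 1.0) -> np.ndarray:
    """
    filter_outputs:  (n_timepoints, n_features) filtered-movie design matrix
    voxel_responses: (n_timepoints, n_voxels) measured BOLD responses
    Returns (n_features, n_voxels) weights, one column per voxel.
    """
    X, Y = filter_outputs, voxel_responses
    # Closed-form ridge solution: (X^T X + lambda * I)^{-1} X^T Y
    gram = X.T @ X + ridge_lambda * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

# Toy usage with made-up sizes (the paper uses roughly 6,000 filter outputs per voxel).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 600))   # 500 timepoints, 600 filter outputs
Y = rng.standard_normal((500, 100))   # 100 voxels
W = fit_voxel_weights(X, Y)
print(W.shape)                        # (600, 100)
```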
How accurate is the decoder?
A good decoder should produce a reconstruction that a neutral
observer judges to be visually similar to the viewed movie. However, it
is difficult to quantify human judgments of visual similarity. In this
paper we use similarity in the motion-energy domain. That is, we
quantify how much of the spatially localized motion information in the
viewed movie was reconstructed. The accuracy of our reconstructions is
far above chance.
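As an illustration of that evaluation idea, the sketch below scores a reconstruction by correlating its motion-energy features with those of the viewed clip, and compares the score with a chance baseline obtained from randomly paired clips. The shapes and the use of plain Pearson correlation are assumptions rather than the paper's exact procedure.

```python
# Illustrative sketch: similarity in the motion-energy domain, with a shuffled
# baseline as a chance reference. Shapes are assumptions.
import numpy as np

def motion_energy_similarity(viewed: np.ndarray, reconstructed: np.ndarray) -> float:
    """Pearson correlation between two (n_timepoints, n_features) feature arrays."""
    return float(np.corrcoef(viewed.ravel(), reconstructed.ravel())[0, 1])

def chance_level(viewed: np.ndarray, library: np.ndarray,
                 n_samples: int = 1000, seed: int = 0) -> float:
    """Average similarity to randomly drawn library clips, as a chance baseline."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, library.shape[0], size=n_samples)
    return float(np.mean([motion_energy_similarity(viewed, library[i]) for i in idx]))
```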
Other studies have attempted reconstruction before. How is your study different?
Previous studies showed that it is possible to reconstruct static
visual patterns (Thirion et al., 2006 Neuroimage; Miyawaki et al., 2008
Neuron), static natural images (Naselaris et al., 2009 Neuron) or
handwriting digits (van Gerven et al. 2010 Neural Computation). However,
no previous study has produced reconstructions of dynamic natural
movies. This is a critical step toward obtaining reconstructions of
internal states such as imagery, dreams and so on.
Why is this finding important?
From a basic science perspective, our paper provides the first
quantitative description of dynamic human brain activity during
conditions simulating natural vision. This information will be important
to vision scientists and other neuroscientists. Our study also
represents another important step in the development of brain-reading
technologies that could someday be useful to society. Previous
brain-reading approaches could only decode static information. But most
of our visual experience is dynamic, and these dynamics are often the
most compelling aspect of visual experience. Our results will be crucial
for developing brain-reading technologies that can decode dynamic
experiences.
How many subjects did you run? Is there any chance that they could have cheated?
We ran three subjects for the experiments in this paper, all
co-authors. There are several technical considerations that made it
advantageous to use authors as subjects. It takes several hours to
acquire sufficient data to build an accurate motion-energy encoding
model for each subject, and naive subjects find it difficult to stay
still and alert for this long. Authors are motivated to be good
subjects, so their data are of high quality. These high-quality data
enabled us to build detailed and accurate models for each individual
subject. There is no reason to think that the use of authors as subjects
weakens the validity of the study. The experiment focuses solely on the
early part of the visual system, and this part of the brain is not
heavily modulated by intention or prior knowledge. The movies used to
develop encoding models for each subject and those used for decoding
were completely separate, and there is no plausible way that a subject
could have changed their own brain activity in order to improve
decoding. Many fMRI studies use much larger groups of subjects, but they
collect much less data on each subject. Such studies tend to average
over a lot of the individual variability in the data, and the results
provide a poor description of brain activity in any individual subject.
What are the limits on brain decoding?
Decoding performance depends on the quality of brain activity
measurements. In this study we used functional MRI (fMRI) to measure
brain activity. (Note that fMRI does not actually measure the activity
of neurons. Instead, it measures blood flow consequent to neural
activity. However, many studies have shown that the blood flow signals
measured using fMRI are generally correlated with neural activity.) fMRI
has relatively modest spatial and temporal resolution, so much of the
information contained in the underlying neural activity is lost when
using this technique. fMRI measurements are also quite variable from
trial-to-trial. Both of these factors limit the amount of information
that can be decoded from fMRI measurements. Decoding also depends
critically on our understanding of how the brain represents information,
because this will determine the quality of the computational model. If
the encoding model is poor (i.e., if it does a poor job of prediction)
then the decoder will be inaccurate. While our computational models of
some cortical visual areas perform well, they do not perform well when
used to decode activity in other parts of the brain. A better
understanding of the processing that occurs in parts of the brain beyond
visual cortex (e.g. parietal cortex, frontal cortex) will be required
before it will be possible to decode other aspects of human experience.
What are the future applications of this technology?
This study was not motivated by a specific application, but was aimed
at developing a computational model of brain activity evoked by dynamic
natural movies. That said, there are many potential applications of
devices that can decode brain activity. In addition to their value as a
basic research tool, brain-reading devices could be used to aid in
diagnosis of diseases (e.g., stroke, dementia); to assess the effects of
therapeutic interventions (drug therapy, stem cell therapy); or as the
computational heart of a neural prosthesis. They could also be used to
build a brain-machine interface.
Could this be used to build a brain-machine interface (BMI)?
Decoding visual content is conceptually related to the work on
neural-motor prostheses being undertaken in many laboratories. The main
goal in the prosthetics work is to build a decoder that can be used to
drive a prosthetic arm or other device from brain activity. Of course
there are some significant differences between sensory and motor systems
that impact the way that a BMI system would be implemented in the two
systems. But ultimately, the statistical frameworks used for decoding in
the sensory and motor domains are very similar. This suggests that a
visual BMI might be feasible.
At some later date when the technology is developed further, will it be possible to decode dreams, memory, and visual imagery?
Neuroscientists generally assume that all mental processes have a
concrete neurobiological basis. Under this assumption, as long as we
have good measurements of brain activity and good computational models
of the brain, it should be possible in principle to decode the visual
content of mental processes like dreams, memory, and imagery. The
computational encoding models in our study provide a functional account
of brain activity evoked by natural movies. It is currently unknown
whether processes like dreaming and imagination are realized in the
brain in a way that is functionally similar to perception. If they are,
then it should be possible to use the techniques developed in this paper
to decode brain activity during dreaming or imagination.
At some later date when the technology is developed further, will it
be possible to use this technology in detective work, court cases,
trials, etc?
The potential use of this technology in the legal system is
questionable. Many psychology studies have now demonstrated that
eyewitness testimony is notoriously unreliable. Witnesses often have
poor memory, but are usually unaware of this. Memory tends to be biased
by intervening events, inadvertent coaching, and rehearsal (prior
recall). Eyewitnesses often confabulate stories to make logical sense of
events that they cannot recall well. These errors are thought to stem
from several factors: poor initial storage of information in memory;
changes to stored memories over time; and faulty recall. Any
brain-reading device that aims to decode stored memories will inevitably
be limited not only by the technology itself, but also by the quality
of the stored information. After all, an accurate read-out of a faulty
memory only provides misleading information. Therefore, any future
application of this technology in the legal system will have to be
approached with extreme caution.
Will we be able to use this technology to insert images (or movies) directly into the brain?
Not in the foreseeable future. There is no known technology that
could remotely send signals to the brain in a way that would be
organized enough to elicit a meaningful visual image or thought.
Does this work fit into a larger program of research?
One of the central goals of our research program is to build
computational models of the visual system that accurately predict brain
activity measured during natural vision. Predictive models are the gold
standard of computational neuroscience and are critical for the
long-term advancement of brain science and medicine. To build a
computational model of some part of the visual system, we treat it as a
“black box” that takes visual stimuli as input and generates brain
activity as output. A model of the black box can be estimated using
statistical tools drawn from classical and Bayesian statistics, and from
machine learning. Note that this reverse-engineering approach is
agnostic about the specific way that brain activity is measured. One
good way to evaluate these encoding models is to construct a corresponding
decoding model, and then assess its performance in a specific task such
as movie reconstruction.
Why is it important to construct computational models of the brain?
The brain is an extremely complex organ and many convergent
approaches are required to obtain a full understanding of its structure
and function. One way to think about the problem is to consider three
different general goals of research in systems/computational
neuroscience. (1) The first goal is to understand how the brain is
divided into functionally distinct modules (e.g., for vision, memory,
etc.). (2) The second goal, contingent on the first, is to determine the
function of each module. One classical approach for investigating the
function of a brain circuit is to characterize neural responses at a
quantitative computational level that is abstracted away from many of
the specific anatomical and biophysical details of the system. This
helps make tractable a problem that would otherwise seem overwhelmingly
complex. (3) The third goal, contingent on the first two, is to
understand how these specific computations are implemented in neural
circuitry. A byproduct of this model-based approach is that it has many
specific applications, as described above.
Can you briefly explain the function of the parts of the brain examined here?
The human visual system consists of several dozen distinct cortical
visual areas and sub-cortical nuclei, arranged in a network that is both
hierarchical and parallel. Visual information comes into the eye and is
there transduced into nerve impulses. These are sent on to the lateral
geniculate nucleus and then to primary visual cortex (area V1). Area V1
is the largest single processing module in the human brain. Its function
is to represent visual information in a very general form by
decomposing visual stimuli into spatially localized elements. Signals
leaving V1 are distributed to other visual areas, such as V2 and V3.
Although the function of these higher visual areas is not fully
understood, it is believed that they extract relatively more complicated
information about a scene. For example, area V2 is thought to represent
moderately complex features such as angles and curvature, while
high-level areas are thought to represent very complex patterns such as
faces. The encoding model used in our experiment was designed to
describe the function of early visual areas such as V1 and V2, but was
not meant to describe higher visual areas. As one might expect, the
model does a good job of decoding information in early visual areas but
it does not perform as well in higher areas.
Are there any ethical concerns with this type of research?
The current technology for decoding brain activity is relatively
primitive. The computational models are immature, and in order to
construct a model of someone’s visual system, that person must spend many hours
in a large, stationary magnetic resonance scanner. For this reason it is
unlikely that this technology could be used in practical applications
any time soon. That said, both the technology for measuring brain
activity and the computational models are improving continuously. It is
possible that decoding brain activity could have serious ethical and
privacy implications downstream in, say, the 30-year time frame. As an
analogy, consider the current debates regarding availability of genetic
information. Genetic sequencing is becoming cheaper by the year, and it
will soon be possible for everyone to have their own genome sequenced.
This raises many issues regarding privacy and the accessibility of
individual genetic information. The authors believe strongly that no one
should be subjected to any form of brain-reading process involuntarily,
covertly, or without complete informed consent.