
\input preamble
\addbibresource{pobam_.bib}

\title{Physical Principles for Scalable Neural Recording}

\author[1,2]{\ \lift{$\jointfirst\,$}Adam~H.~Marblestone\rlap{,}}
\author[3]{\lift{$\jointfirst\,$}Bradley~M.~Zamft\rlap{,}}
\author[3,4]{Yael~G.~Maguire\rlap{,}}
\author[5]{Mikhail~G.~Shapiro\rlap{,}}
\author[6]{Thaddeus~R.~Cybulski\rlap{,}}
\author[6]{Joshua~I.~Glaser\rlap{,}}
\author[7]{Dario~Amodei\rlap{,}}
\author[3]{P.~Benjamin~Stranges\rlap{,}}
\author[3]{Reza~Kalhor\rlap{,}}
\author[1,8,9]{David~A.~Dalrymple\rlap{,}}
\author[10]{Dongjin~Seo\rlap{,}}
\author[10]{Elad~Alon\rlap{,}}
\author[10]{Michel~M.~Maharbiz\rlap{,}}
\author[10,11]{Jose~M.~Carmena\rlap{,}}
\author[10]{Jan~M.~Rabaey\rlap{,}}
\author[$\jointlast$9,12]{Edward~S.~Boyden\rlap{,}}
\author[$\jointlast$1,2,3]{George~M.~Church\rlap{,}}
\author[$\jointlast$13,14]{Konrad~P.~Kording}

\affil[$\jointfirst$]{Joint first authors}
\affil[$\jointlast$]{Joint last authors}

\newcommand\et{{\em \&}}

\definecolor{deemph}{gray}{0.48}
\affil[1]{Biophysics {Program,} Harvard {Univ., Boston,~MA~02115, USA}}
\affil[2]{Wyss Institute {for Biologically Inspired Engineering at} Harvard {Univ., Boston,~MA~02115, USA}}
\affil[3]{{ Dept.\ of Genetics,} Harvard Medical School{, Boston,~MA~02115, USA}}
\affil[4]{Plum Labs LLC, Cambridge,~MA{~02142, USA}}
\affil[5]{Division of Chemistry and Chemical Engineering, California Institute of Technology{, Pasadena,~CA~91125, USA}}
\affil[6]{{Interdepartmental} Neuroscience {Program,} Northwestern Univ.{, Chicago,~IL~60611, USA}}
\affil[7]{Department of Radiology, Stanford University, Palo Alto,~CA~94305, USA}
\affil[8]{Nemaload, San Francisco, CA{~94107, USA}}
\affil[9]{Media Lab{oratory,} Massachusetts Institute of Technology{, Cambridge,~MA~02139, USA}}
\affil[10]{{Dept.\ of} Electrical Engineering and Computer Sciences{, Univ. of California at} Berkeley{, Berkeley,~CA~94720, USA}}
\affil[11]{Helen Wills Neuroscience Institute, Univ. of California at Berkeley, Berkeley,~CA~94720, USA}
\affil[12]{{Depts.\ of} Brain and Cognitive Sciences {\et\ of} Biological Engineering, Massachusetts Institute of Technology{, Cambridge,~MA~02139, USA}}
\affil[13]{{Depts.\ of} Physical Medicine and Rehabilitation {\et\ of} Physiology, Northwestern Univ.{\ Feinberg} School of Medicine{, Chicago,~IL~60611,~USA}}
\affil[14]{{Sensory Motor Performance Program,} The Rehabilitation Institute of Chicago{, Chicago,~IL~60611,~USA}}

\renewcommand{\maketitlehookc}{{\small\raggedright Correspondence to: \texttt{adam.h.marblestone\,\textnormal{(at)}\,\,gmail.com}}}

\begin{document}
%\fontfamily{ugm}\selectfont
\maketitle
\pagestyle{plain}
\thispagestyle{empty}
%\captionwidth{0.8\linewidth}
%\changecaptionwidth

\begin{fquote}[Freeman Dyson][Imagined Worlds][1997]To understand in depth what is going on in a brain, we need tools that can fit inside or between neurons and transmit reports of neural events to receivers outside. We need observing instruments that are local, nondestructive and noninvasive, with rapid response, high band-width and high spatial resolution\ldots\hfill\ There is no law of physics that declares such an observational tool to be impossible.\end{fquote}

\begin{abstract}
\noindent
Simultaneously measuring the activities of all neurons in a mammalian brain at millisecond resolution is a challenge beyond the limits of existing techniques in neuroscience.
Entirely new approaches may be required, motivating an analysis of the fundamental physical constraints on the problem.
We outline the physical principles governing brain activity mapping using optical, electrical, magnetic resonance, and molecular modalities of neural recording.
Focusing on the mouse brain, we analyze the scalability of each method, concentrating on the limitations imposed by spatiotemporal resolution, energy dissipation, and volume displacement.
Based on this analysis, all existing approaches require orders of magnitude improvement in key parameters.
Electrical recording is limited by the low multiplexing capacity of electrodes and their lack of intrinsic spatial resolution,
optical methods are constrained by the scattering of visible light in brain tissue,
magnetic resonance is hindered by the diffusion and relaxation timescales of water protons,
and the implementation of molecular recording is complicated by the stochastic kinetics of enzymes.
Understanding the physical limits of brain activity mapping may provide insight into opportunities for novel solutions.
For example, unconventional methods for delivering electrodes may enable unprecedented numbers of recording sites,
embedded optical devices could allow optical detectors to be placed within a few scattering lengths of the measured neurons,
and new classes of molecularly engineered sensors might obviate cumbersome hardware architectures.
We also study the physics of powering and communicating with microscale devices embedded in brain tissue and find that, while radio-frequency electromagnetic data transmission suffers from a severe power--bandwidth tradeoff, communication via infrared light or ultrasound may allow high data rates due to the possibility of spatial multiplexing.
The use of embedded local recording and wireless data transmission would only be viable, however, given major improvements to the power efficiency of microelectronic devices.
\end{abstract}

\section{Introduction}
Neuroscience depends on monitoring the electrical activities of neurons within functioning brains~\cite{alivisatos2012brain, bansal2012decoding, gerhard2013successful} and has advanced through steady improvements in the underlying observational tools. The number of neurons simultaneously recorded using wired electrodes, for example, has doubled every seven years since the 1950s, currently allowing electrical observation of hundreds of neurons at sub-millisecond timescales~\cite{stevenson11}. Recording techniques have also diversified: activity-dependent optical signals from neurons endowed with fluorescent indicators can be measured by photodetectors, and radio-frequency emissions from excited nuclear spins allow the construction of magnetic resonance images modulated by activity-dependent contrast mechanisms.
Ideas for alternative methods have been proposed, including the direct recording of neural activities into information-bearing biopolymers~\cite{zamft12,glaser13,kording11a}.

Each modality of neural recording has characteristic advantages and disadvantages.
Multi-electrode arrays enable the recording of \num{\ca 250} neurons at sub-millisecond temporal resolutions.
Optical microscopy can currently record \num{\ca 100000} neurons at a \SI{1.25}{\second} timescale in behaving larval zebrafish using light-sheet illumination~\cite{ahrens13}, or hundreds to thousands of neurons at a \SI{\ca100}{\milli\second} timescale in behaving mice using a 1-photon fiber scope~\cite{ziv13}.
Magnetic resonance imaging (MRI) allows non-invasive whole brain recordings at a \SI{1}{\second} timescale, but is far from single neuron spatial resolution, in part due to the use of hemodynamic contrast.
Finally, molecular recording devices have been proposed for scalable physiological signal recording but have not yet been demonstrated in neurons~\cite{zamft12,glaser13,kording11a}.

\autoref{fig:modalities} illustrates the recording modalities studied here.
While further development of these methods promises to be a crucial driver for future neuroscience research \cite{NeuroscienceThinksBigAndCollaboratively}, their fundamental scaling limits are not immediately obvious. Furthermore, inventing new technologies for scalable neural recording requires a quantitative understanding of the engineering problems that such technologies must solve, a landscape of constraints which should inform design decisions.

\begin{figure}[htbp]
\caption{Four generalized neural recording modalities. (a) \emph{Extracellular electrical recording} probes the voltage due to nearby neurons. (b) \emph{Optical microscopy} detects light emission from activity-dependent indicators. In two-photon laser scanning microscopy, shown here, an excitation beam at 2$\times$ the peak excitation wavelength of the fluorescent indicator is scanned across the sample, while an integrating detector captures the emitted fluorescence. (c) \emph{Magnetic resonance imaging} detects radio-frequency magnetic induction signals from aqueous protons, after weak thermal alignment of the proton spins by a static magnetic field. A resonant radio-frequency pulse tips the spins into a plane perpendicular to the static field, causing the net magnetization to precess. The resulting signals are affected by the local chemical and magnetic environment, which can be altered dynamically by imaging agents in response to neural activity. Activity-dependent contrast agents are necessary to transduce neural activity into an MRI readout, whereas current functional MRI methods rely on blood oxygenation signals which cannot reach single-neuron resolution. (d) \emph{Molecular recording} devices have been proposed, in which a ``ticker tape'' record of neural activity is encoded in the monomer sequence of a biomolecular polymer---a form of nano-scale local data storage. This could be achieved by coupling correlates of neural activity to the nucleotide misincorporation probabilities of a DNA or RNA polymerase as it replicates or transcribes a known DNA strand.}
\centering
\includegraphics[width=0.8\linewidth]{figs/Fig1.eps}
\label{fig:modalities}
\end{figure}

Our analysis is predicated on assumptions that enable us to estimate scaling limits.
These include assumptions about basic properties of the brain, which are treated in \anref{sec:constraints}, as well as those pertaining to the required measurement resolution and the limits to which a neural recording method may perturb brain tissue, which are treated in \anref{sec:challenges}.
Together, these considerations form the basis for our estimates of the prospects for scaling of neural recording technologies.
We analyze four modalities of brain activity mapping---electrical, optical, magnetic resonance and molecular---in light of these assumptions, and conclude with a discussion on opportunities for new developments.

Importantly, our assumptions, analyses and the conclusions thereof are intended as \emph{first approximations and are subject to debate}.
We anticipate that as much can be learned from where our logic breaks down as from where it succeeds, and from methods to work around the limits imposed by our assumptions.

\section{Basic Constraints}
\label{sec:constraints}

\paragraph{Mouse brain}
The mouse brain contains \num{\ca 7.5e7} neurons in a volume of \SI{\ca 420}{\milli\meter\cubed}~\cite{vincent10} and weighs about \SI{0.5}{\gram}.
The packing density of neurons varies widely between brain regions. In what follows, we use a cell density of $\rho_{\text{neurons}} \approx \SI{92000}{\per\milli\meter\cubed}$, as measured for mouse cortex \cite{braitenberg1991anatomy}. This corresponds roughly to one neuron per \SI{22}{\micro\meter} voxel. The density of cortical synapses, on the other hand, approaches \SI{1e9}{\per\milli\meter\cubed}, i.e., one synapse per \SI{1}{\micro\meter\cubed} voxel. For comparison, the human brain has roughly \num{8e10} neurons~\cite{azevedo09} in a volume of \SI{1200}{\centi\meter\cubed}~\cite{allen02}.
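
As a rough consistency check on these figures (a sketch using only the numbers quoted above, and assuming that each neuron occupies its own cubic voxel), the voxel edge length $d$ follows from the density as
\[d \approx \rho_{\text{neurons}}^{-1/3} = \left(\SI{92000}{\per\milli\meter\cubed}\right)^{-1/3} \approx \SI{0.022}{\milli\meter} = \SI{22}{\micro\meter}.\]
Dividing the whole-brain neuron count by the brain volume instead gives a mean density of $\num{7.5e7} / \SI{420}{\milli\meter\cubed} \approx \SI{1.8e5}{\per\milli\meter\cubed}$, or $d \approx \SI{18}{\micro\meter}$, so the cortical value is probably best read as an order-of-magnitude figure rather than a whole-brain average.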

The human brain consumes \SI{\ca 15}{\watt} of power (performing, at synapses, a rough equivalent of at least \num{1e17} floating point computational operations per second on that power budget, according to one definition~\cite{sarpeshkar10}, although the analogy with digital computers should not be taken literally).
Because power consumption scales approximately linearly with the number of neurons~\cite{houzel11}, the mouse brain is expected to utilize \SI{\ca 15}{\milli\watt}. For comparison, the metabolic rate of the \SIrange{\ca 20}{30}{\gram} mouse is \SIrange{\ca 200}{600}{\milli\watt} depending on its degree of physical activity~\cite{speakman13}.
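
The linear scaling can be made explicit (an illustrative calculation from the neuron counts quoted above, not an independent measurement):
\[P\sub{mouse} \approx \SI{15}{\watt} \times \frac{\num{7.5e7}}{\num{8e10}} \approx \SI{14}{\milli\watt},\]
where $P\sub{mouse}$ is a label introduced only for this estimate. This is roughly \SIrange{2}{7}{\percent} of the whole-animal metabolic rate of \SIrange{200}{600}{\milli\watt}.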

\paragraph{Neural activities}
Action potentials (spikes) last \SI{\ca 2}{\milli\second}.
The rate of neuronal spiking is highly variable. Some authors have assumed an average rate of \SI{5}{\hertz}~\cite{sarpeshkar10, harris2012synaptic}, but certain neurons spike at \SI{500}{\hertz} or faster~\cite{gittis10}, while many neurons spike much more slowly. For example, cerebellar granule cells, which make up half of the neurons in the brain, have spontaneous firing rates of \SI{\ca 0.5}{\hertz} \cite{chadderton2004integration}. In neocortex, one analysis estimated 0.16 spikes per second per neuron (in primate) as energetically sustainable \cite{lennie2003cost}. There may be as much as a two-fold change in metabolism and hence firing rate across brain states \cite{howarth2012updated}. Certain neurons (possibly up to 90$\%$ for some neuron types in some brain areas) may be effectively silent \cite{Shoham2006, Barth2012345}, e.g., spiking less than once every ten seconds. Some studies have attempted to measure the \emph{distribution} of neural firing rates in various cortical areas (as opposed to just the average rate), and have observed that these distributions are often long-tailed: a small minority of the neurons fires a majority of the spikes \cite{roxin2011distribution, oconnor2010neural, hromadka2008sparse, shafi2007variability}. 

While these estimates of typical firing rates are useful numbers to have in mind, in what follows we aim to sample all neurons at \SI{1}{\kilo\hertz} rates (or higher for techniques requiring observation of detailed spike waveforms). This choice is informed by several factors. First, measuring spike \emph{timing} with millisecond precision is relevant for understanding network function, due to the possibilities for timing codes, spike-timing dependent plasticity mechanisms, and other effects relying on temporally-precise spiking patterns \cite{markram2011history, Babadi2013, Taillefumier27032013, Gire2013416}. In this regard, it is also important for a recording method to maintain precise temporal phasing between measurements at different brain locations: activity measurements should be locked to precise global clocks, perhaps with a tolerable phase imprecision between any two measurements in the range of $\frac{1}{2\pi} \times \SI{1}{\milli\second} \approx \SIrange{100}{200}{\micro\second}$. Furthermore, the activities of neurons can be highly correlated locally or across large networks \cite{schneidman2006weak}, suggesting that local activity sensors may be subjected to high instantaneous total firing rates due to simultaneously-active neurons.

\paragraph{Absorption and scattering of radiation}

\begin{figure}[htbp]
\caption{%
Penetration depth (attenuation length) of electromagnetic radiation in water vs. wavelength (data from \cite{jonasz07}).
The approximate diameter of the mouse brain is shown as a black dashed line.
Inset: approximate tissue model based on Mie scattering theory and water absorption. Absorption length of water~\cite{kou93} (blue), approximate tissue scattering length in a simple Mie scattering model (red) and the resulting attenuation length (green) of infrared light (inset reproduced from \cite{horton13}, with permission).}
\label{fig:attenuation}
\centering
\includegraphics[width=0.65\linewidth]{figs/Fig2.eps}
\end{figure}

All existing methods of neural recording utilize electromagnetic waves, from the near-DC frequencies of wired electrical recordings (\SI{\ca 1}{\kilo\hertz}) to the radio-frequencies of wireless electronics and fMRI (MHz--GHz) to visible light in optical approaches (\SI{\ca 500}{\tera\hertz}).
These electromagnetic waves are attenuated in brain tissue by absorption and scattering.
As an approximation to the electromagnetic absorption by brain tissue, we treat the absorption by water, the brain's main constituent (\SIrange{68}{80}{\percent} by mass in humans \cite{dobbing73,fatouros99}).
At visible and near-IR wavelengths, scattering dominates absorption: absorption lengths are in the \SI{\ca 1}{\milli\meter} range, while scattering lengths are \SIrange{\ca 25}{200}{\micro\meter}~\cite{Wilt2009}. The combined effect of absorption and scattering is measured by the attenuation length, the distance over which the signal strength is reduced by a factor of $1/e$ along a path.
\autoref{fig:attenuation} shows the absorption length of water~\cite{kou93}, and the attenuation length in a Mie scattering model (from \cite{horton13}) intended to approximate the scattering properties of cortical tissue (and see \cite{gabriel1996} for tissue skin depth measurements in the \SI{10}{\hertz} to \SI{100}{\giga\hertz} range).
This gives a preliminary indication of which wavelengths can be used to measure deep-brain signals with external detectors. Note that the attenuation length is only one of several relevant metrics: for example, scattering not only causes signal attenuation, but also causes noise and impairs signal separation, so the magnitude of the scattering is a key figure of merit.
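
To see how the two length scales combine, a minimal sketch is to assume that absorption and scattering remove unscattered (ballistic) light as independent exponential processes, so that their rates add:
\[\frac{1}{\ell\sub{att}} = \frac{1}{\ell\sub{abs}} + \frac{1}{\ell\sub{scat}}.\]
Here $\ell\sub{att}$, $\ell\sub{abs}$ and $\ell\sub{scat}$ are labels introduced for this illustration only. With $\ell\sub{abs} \approx \SI{1}{\milli\meter}$ and $\ell\sub{scat} \approx \SIrange{25}{200}{\micro\meter}$, this gives $\ell\sub{att} \approx \SIrange{24}{170}{\micro\meter}$: under this assumption, the attenuation length at visible and near-IR wavelengths is set almost entirely by scattering.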

\section{Challenges for Brain Activity Mapping}
\label{sec:challenges}
Any activity mapping technology must extract the required information without disrupting normal neuronal activity.
As such, we consider three primary challenges: spatiotemporal resolution and informational throughput, energy dissipation, and volume displacement.

\subsection{Spatiotemporal Resolution and Informational Throughput}

A sampling rate of \SI{1}{\kilo\hertz} is necessary to capture the fastest trains of action potentials at single-spike resolution.
A minimal data rate of \num{7.5e10} bits processed per second is then required to record 1 bit per mouse neuron at \SI{1}{\kilo\hertz}.

In electrical recording, higher sampling rates (e.g. \SIrange{10}{40}{\kilo\hertz}) are often necessary to distinguish neurons based on spike shapes when each electrode monitors multiple neurons.
More fundamentally, one bit per neuron sampling at \SI{1}{\kilo\hertz} would likely not be sufficient to reliably distinguish spikes above noise: transmitting \SI{\ca 10}{\bit} samples at \SI{\ca 10}{\kilo\hertz} (full waveform) or \SIrange{\ca 10}{20}{\bit} time-stamps upon spike detection would be more realistic.
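
For concreteness, these two cases can be written out using the neuron count of \num{7.5e7} (a back-of-the-envelope sketch that assumes one sampled channel per neuron and ignores protocol overhead and compression):
\[R\sub{binary} = \num{7.5e7} \times \SI{1}{\bit} \times \SI{1}{\kilo\hertz} = \SI{7.5e10}{\bit\per\second},\]
\[R\sub{waveform} \approx \num{7.5e7} \times \SI{10}{\bit} \times \SI{10}{\kilo\hertz} = \SI{7.5e12}{\bit\per\second}.\]
The symbols $R\sub{binary}$ and $R\sub{waveform}$ are introduced here only for illustration. Under these assumptions, full-waveform sampling of every neuron would require roughly two orders of magnitude more throughput than one-bit sampling.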

Conversely, it may be possible to locally compress measurements of a spike train before transmission.
The degree of compressibility of neural activity data is related to the variability in the distribution of neural responses (e.g., such a distribution may be defined across time bins or repeated stimulus presentations)~\cite{strong98}.
In the blowfly \textit{Calliphora vicina}, the entropy of spike trains has been measured to be up to \SI{\ca 180}{\bit\per\second}, and the information about a stimulus encoded by a spike train was as high as \SI{\ca 90}{\bit\per\second}~\cite{strong98}. Extrapolating from fly to mouse, this would suggest that a compression factor of 5$\times$--10$\times$ should be possible, relative to a \SI{1000}{\bit\per\second} raw binary sampling. 

As a na\"{\i}ve estimate of the entropy as a function of firing rate, one can write the entropy $H$ in \si{\bit\per\second}, assuming \SI{1}{\milli\second} long spikes and $f=\SI{1000}{\hertz}$ sampling rate, as
\[H \approx \left(-P\sub{spike}\cdot\log_2\!\left(P\sub{spike}\right) - \left(1 - P\sub{spike}\right)\cdot\log_2\!\left(1 - P\sub{spike}\right)\right) \cdot f\]

where $P\sub{spike}$ is the probability of spiking during the sampling interval (average firing rate/$f$).
For an average firing rate of \SI{5}{\hertz}, $P\sub{spike}=0.005$ and $H\approx\SI{45}{\bit\per\second}$, corresponding to a compression factor of $\ca 20\times$.
However, at \SI{500}{\hertz} average firing rate, $P\sub{spike}=0.5$ with $H\approx\SI{1000}{\bit\per\second}$, i.e., there is no compressibility. 
Therefore, compression could conceivably reduce the data transmission burden for activity mapping by 1--2 orders of magnitude, depending on the neurons and activity regimes under consideration. Note that these compressibility calculations have assumed that firing patterns are independent across cells; they represent the temporal compressibility of the spike train from each cell, treated individually. Patterns across cells could conceivably be compressed by a much larger amount, to the extent that there is redundancy between cells. Nevertheless, we use \SI{1}{\bit\per neuron\per\milli\second} or \SI{100}{\giga\bit\per\second} as a ``minimal whole brain data rate'' in what follows.
In many cases, this likely constitutes a lower bound on the data rate required in practice.
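
As a worked instance of the na\"{\i}ve entropy estimate above (a sketch under the same assumption of independent \SI{1}{\milli\second} sampling bins), take an average firing rate of \SI{5}{\hertz}, so that $P\sub{spike} = 0.005$:
\[H \approx \left(0.005 \times 7.64 + 0.995 \times 0.0072\right) \times \SI{1000}{\hertz} \approx \left(0.0382 + 0.0072\right) \times \SI{1000}{\hertz} \approx \SI{45}{\bit\per\second},\]
using $-\log_2 0.005 \approx 7.64$ and $-\log_2 0.995 \approx 0.0072$. The ratio of the raw sampling rate to this entropy, $\SI{1000}{\bit\per\second} / \SI{45}{\bit\per\second} \approx 22$, is the compression factor of order $20\times$.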

\subsection{Energy Dissipation}

Brain tissue can sustain local temperature increases ($\Delta T$) of \SI{\ca 2}{\celsius} without severe damage over a timescale of hours. Indeed, changes of this magnitude may occur naturally in rats in response to varying activity levels \cite{Wolf2008}.
Assuming that the brain is receiving a constant power influx $P\sub{delivered}$ and that the local thermal transport properties of mouse brains are similar to those of humans, we can approximate the temperature change in deep-brain tissue as a function of the applied power \cite{sotero11, Lazzi2005}:
\[\frac{\od T}{\od t} = \left.\left(P\sub{delivered} + P\sub{metabolic} - \rho\sub{blood} C\sub{blood}\,f\sub{blood} \Delta T\right)\right/C\sub{tissue}\]
where $P\sub{metabolic} = \SI{0.0116}{\watt\per\gram}$ is the power per unit mass of basal metabolism, $C\sub{tissue} \approx \SI{3.7}{\joule\per\kelvin\per\gram} \approx 0.88\cdot C\sub{water}$ is the specific heat capacity of brain tissue, $\rho\sub{blood}=\SI{1.05}{\gram\per\centi\meter\cubed}$ is the density of blood, $C\sub{blood} = \SI{3.9}{\joule\per\kelvin\per\gram}$ is the specific heat capacity of blood, $f\sub{blood} = \SI{9.3e-9}{\meter\cubed\per\gram\per\second}$ is the volume flow rate of blood, and $\Delta T$ is the temperature difference between the brain tissue and the blood (at \SI{37}{\celsius}).
A steady-state temperature increase ($\od T/\od t = 0$) of \SI{2}{\celsius} corresponds to dissipation of \SI{\ca 40}{\milli\watt} per \SI{500}{\milli\gram} mouse brain.
Therefore, a recording technique should not dissipate more than \SI{\ca 40}{\milli\watt} of power in a mouse brain at steady state.
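
The \SI{40}{\milli\watt} figure can be reproduced from the parameters above. As a sketch, set $\od T/\od t = 0$ and interpret $\Delta T$ as the temperature rise attributable to $P\sub{delivered}$ alone (i.e., assume that basal metabolism is already balanced at the baseline temperature). The heat removed by blood flow per unit temperature rise is then
\[\rho\sub{blood}\, C\sub{blood}\, f\sub{blood} \approx \SI{1.05}{\gram\per\centi\meter\cubed} \times \SI{3.9}{\joule\per\kelvin\per\gram} \times \SI{9.3e-3}{\centi\meter\cubed\per\gram\per\second} \approx \SI{0.038}{\watt\per\kelvin\per\gram},\]
so that, for $\Delta T = \SI{2}{\kelvin}$ and a \SI{0.5}{\gram} brain,
\[P\sub{delivered} \approx \SI{0.038}{\watt\per\kelvin\per\gram} \times \SI{2}{\kelvin} \times \SI{0.5}{\gram} \approx \SI{38}{\milli\watt},\]
consistent with the rounded value of \SI{\ca 40}{\milli\watt}.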

This estimate of the power dissipation limit in mouse brains, based on such a simplified model of the brain's thermal transport mechanisms, is likely an under-estimate of the actual maximum steady-state power dissipation.  Radiative heat loss was ignored here since infrared light emitted by deep-brain tissue is quickly re-absorbed by nearby tissue. We have also ignored cooling due to flows in the cerebrospinal ventricles \cite{smith2010brain} and in the glymphatic system \cite{iliff2012paravascular}. We have further assumed that conductive heat loss from the brain surface is negligible compared to the heat extracted volumetrically by blood flow. While this may hold true locally in deep brain voxels and over short timescales (e.g., \SI{<1}{\minute}), further work (e.g., a whole-head model \cite{Lazzi2005, sukstanskii2004analytical}) is needed to define the true limits of sustained volumetric heat production by neural recording systems distributed throughout the mouse brain. Indeed, the characteristic length scale of temperature inhomogeneities in the brain is on the order of millimeters \cite{sukstanskii2006theoretical}, whereas heat exchange with the flowing blood dampens the effects of local perturbations over longer length scales. For large brains, this means that sources and sinks of heat exert only local thermal effects; for a mouse brain on the scale of \SI{<10}{\milli\meter}, however, surface and volumetric effects likely combine to influence temperature changes at any site in the brain \cite{sukstanskii2007theoretical}. Experimentally, increasing the temperature gradient at the brain surface, via a cranial window exposed to ambient air at \SI{\ca 25}{\celsius} (i.e., the common craniotomy technique used to access mouse neocortex), has been shown to dysregulate brain temperature down to a depth of several millimeters \cite{kalmbach2012brain}. 
For the above reasons, our estimates of the brain's capacity for heat dissipation should be treated only as first approximations.

Higher power levels, compared to the maximum steady state power, may be introduced into brains transiently.
According to the above equation, if a neural recorder dissipates \SI{\ca 40}{\milli\watt} per \SI{500}{\milli\gram} mouse brain, then the brain approaches the steady-state temperature in \SIrange{2}{3}{\minute}, making shorter experiments potentially feasible. This is in agreement with the estimate from \cite{sukstanskii2006theoretical} of a \SI{\ca 1}{\minute} time constant for brain temperature changes, as well as with experimental measurements showing similar time constants for temperature variations resulting from sustained neural stimulation \cite{mcelligott1967localized, trubel2005regional}. Increasing convective heat loss from the brain by increasing blood flow (e.g. via increased heart rate) or cooling the brain (volumetrically or via its surface \cite{sukstanskii2007theoretical}), the blood, the cerebrospinal fluid (CSF), or the whole animal~\cite{polderman2004}, could increase the allowable transient or steady-state power dissipation.
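
The \SIrange{2}{3}{\minute} figure can be checked against the same model using the parameter values quoted above. As a sketch, treat $\Delta T$ as the rise caused by a constant $P\sub{delivered}$ switched on at $t = 0$; the temperature equation then has the solution
\[\Delta T(t) = \Delta T\sub{ss}\left(1 - e^{-t/\tau}\right), \qquad \tau = \frac{C\sub{tissue}}{\rho\sub{blood}\, C\sub{blood}\, f\sub{blood}} \approx \frac{\SI{3.7}{\joule\per\kelvin\per\gram}}{\SI{0.038}{\watt\per\kelvin\per\gram}} \approx \SI{97}{\second},\]
where $\Delta T\sub{ss}$ (the steady-state rise) and $\tau$ are symbols introduced for this illustration. Under these assumptions, the temperature rise reaches about \SI{70}{\percent} of its steady-state value after \SI{2}{\minute} and about \SI{85}{\percent} after \SI{3}{\minute}.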

There are also limits on the power density of radiation applied to brain tissue.
For radio-frequency electromagnetic radiation, the specific absorption rate (SAR) limit on the power density exposed to human tissue is \SI{\ca 10}{\milli\watt\per\centi\meter\squared} \cite{IEEE_RF_standard}, while for ultrasound (which couples less strongly to dissipative loss mechanisms in tissue) the SAR limit is up to 72$\times$ higher \cite{FDA_ultrasound_standard}.
The power density limits for visible and near-IR light exposures are also in the \SIrange{\ca 10}{100}{\milli\watt\per\centi\meter\squared} range for \SI{\ca 1}{\milli\second} long exposures, decreasing as the exposure time lengthens (based on the IEC 60825 formulas~\cite{iec60825}).

High local power dissipation (transient or steady-state) can modify the electrical properties of excitable membranes, altering neuronal activity patterns.
For example, heating of cell membranes and of the surrounding solution by millisecond-long optical pulses leads to changes in membrane electrical capacitance mediated by the ionic double layer~\cite{shapiro12}.
Slower temperature changes (on a scale of seconds) resulting from RF radiation lead to accelerated ion channel and transporter kinetics~\cite{shapiro13}.
Both of these effects are appreciable when the temperature changes are on the order of \SIrange{1}{10}{\degreeCelsius}.

For comparison with current practice, common guidelines for chronic heat exposure from biomedical implants \cite{Wolf2008} use upper limits of \SI{2}{\celsius} temperature change, \SI{40}{\milli\watt\per\centi\meter\squared} heat flux from the surface of implanted brain machine interface (BMI) hardware, and an SAR limit of \[\frac{\sigma E^2}{2 \rho} < \SI{1.6}{\milli\watt\per\gram}\] for electromagnetic energy absorbed by tissue, where $E$ is the peak electric field amplitude of the applied radiation, $\sigma \approx \SI{0.18}{\siemens\per\meter}$ is the electrical conductivity of grey matter and $\rho \approx \SI{1}{\gram\per\centi\meter\cubed}$ is the tissue density \cite{Lazzi2005} (this corresponds to an irradiance of $\epsilon_0 c E^2 / 2 \approx \SI{2.4}{\milli\watt\per\centi\meter\squared}$). A 96-channel BMI system demonstrated in living brains dissipated an areal power density approaching \SI{40}{\milli\watt\per\centi\meter\squared}~\cite{rizk2009}.
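
The quoted irradiance follows from the SAR limit in two steps (a sketch assuming a plane wave and the free-space relation between field amplitude and intensity). Solving the SAR inequality for the peak field, with $\rho = \SI{1000}{\kilogram\per\meter\cubed}$ and a limit of \SI{1.6}{\watt\per\kilogram}, gives
\[E\sub{max} = \sqrt{\frac{2 \rho \times \SI{1.6}{\watt\per\kilogram}}{\sigma}} \approx \sqrt{\frac{2 \times 1000 \times 1.6}{0.18}}\;\si{\volt\per\meter} \approx \SI{133}{\volt\per\meter},\]
where $E\sub{max}$ is a label introduced for this illustration. Evaluating the irradiance at this field amplitude then gives
\[\frac{\epsilon_0 c E^2}{2} \approx \frac{\left(\SI{8.85e-12}{\farad\per\meter}\right)\left(\SI{3.0e8}{\meter\per\second}\right)\left(\SI{133}{\volt\per\meter}\right)^2}{2} \approx \SI{24}{\watt\per\meter\squared} = \SI{2.4}{\milli\watt\per\centi\meter\squared}.\]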

\subsection{Sensitivity to Volume Displacement}

To prevent damage to the brain, we assume that a recording technique should not displace \SI{> 1}{\percent} of the brain's volume. \emph{The appropriate damage threshold is not yet established, however, so this constitutes a first guess}. It is possible to insert large numbers of probes throughout multiple brain areas without compromising function. In rats, 96 electrodes of \SI{50}{\micro\meter} diameter were simultaneously inserted across four forebrain structures (cortex, thalamus, hippocampus and putamen)~\cite{Ribeiro2004}. In rhesus macaque, 704 electrodes of diameter \SI{50}{\micro\meter} and average depth \SI{2.5}{\milli\meter} were chronically implanted in cortex~\cite{Nicolelis2003}. Note, however, that the total volume displacement in these experiments was below \SI{0.1}{\percent}, and below \SI{0.01}{\percent}, respectively.  Furthermore, these studies used a low density of electrodes. Thus, detailed limits on the amount and density of inserted material are unknown.
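
To give a sense of scale for the \SI{1}{\percent} assumption, consider a purely hypothetical case in which cylindrical electrodes with the dimensions quoted above (\SI{50}{\micro\meter} diameter, \SI{2.5}{\milli\meter} inserted length) were placed in a \SI{420}{\milli\meter\cubed} mouse brain:
\[V\sub{budget} = 0.01 \times \SI{420}{\milli\meter\cubed} = \SI{4.2}{\milli\meter\cubed}, \qquad V\sub{electrode} = \pi \left(\SI{25}{\micro\meter}\right)^2 \times \SI{2.5}{\milli\meter} \approx \SI{4.9e-3}{\milli\meter\cubed},\]
so that $V\sub{budget} / V\sub{electrode} \approx 860$ such shafts would exhaust the budget. The symbols $V\sub{budget}$ and $V\sub{electrode}$ are introduced only for this estimate, which ignores tissue compression, vascular damage and the geometry of insertion.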

Furthermore, the nature of the volume displacement is important---sheets of instrumentation that sever long-range connectivity, for example, would disrupt normal brain function regardless of the degree of volume displacement.
Conversely, higher volume displacement might be possible if introduced gradually, or during early development, insomuch as the brain can adapt without disrupting natural computation. One important consideration in this regard would be the disruption of blood circulation by inserted material; a high density of implanted material in a brain region could cause stroke due to widespread vascular damage. Recent studies have defined in microscopic detail the complete vascular network of the mouse cortex using high-throughput histology~\cite{Kleinfeld2013}; this type of information could be used to enumerate key vascular pathways which could be spared from damage. To apply this in a particular animal, however, would require a non-destructive method to image the vasculature at a similar resolution; otherwise, only a broad statistical view can be obtained, since the detailed vascular geometry will vary from animal to animal. 

Secondary effects like glial scarring may also pose obstacles to the long-term implantation of large numbers of probes \cite{polikov2005response, ward2009toward}, although methods are being developed to alleviate this \cite{taub2012bioactive, reichert2008molecular, reichert2010indwelling}. In the context of electrical recording, the impact of glial scarring may vary depending on geometry. For example, the recording sites at the tip of a Utah or Duke multi-electrode array are typically viable in chronic recordings of up to 18 months in primates \cite{Nicolelis2003, suner2005reliability}, whereas in array formats with multiple electrodes along each shaft, such as the Michigan array, chronic recordings of up to 4 months have been reported in rats \cite{vetter2004chronic}. Differences in recording lifetime may be due to differences in the pattern of glial encapsulation of the contacts.

\section{Evaluation of Modalities}

We next evaluate neural recording technologies with respect to the above challenges, using the mouse brain as a model system.
\autoref{table:strategies} lists the modalities studied, the assumptions made, the analysis strategies applied, and the conclusions derived.

\begin{table}[htbp]
\caption{Summary of modalities, models, assumptions and conclusions}
\label{table:strategies}
\footnotesize
\tabulinesep=1mm
\newcommand{\iskip}{\par\vspace{3pt}}
\begin{tabu} to\linewidth{>{\itshape}X[2,l]X[2.5]X[4,l]X[5]}
\toprule
\rowfont[C]{\upshape\bfseries\small}
Modality & Analysis Strategy & Assumptions & Conclusions \\
\cmidrule[0.4pt](lr){1-1}
\cmidrule[0.4pt](lr){2-3}
\cmidrule[0.4pt](lr){4-4}

Extracellular electrical recording &
Compute minimal number of recorders based on max distance from recorder to recorded neuron 
\iskip Compute channel capacity limits to spike sorting
&
{Decay profile of extracellular voltage
\iskip Approximate noise levels at recording site}
&
{Maximum recording distance $r\sub{max}\approx\SIrange{100}{200}{\micro\meter}$ from electrode to neuron measured
\iskip $\ca\num{1e5}$ recording sites are required per mouse brain at current noise levels assuming perfect spike sorting
\iskip $\ca\num{1e6}$ recording sites are required at current noise levels at the physical limits of spike sorting
\iskip $\ca\num{1e7}$ recording sites are required using current spike sorting algorithms}
\\

Implanted electrical recorders &
Compute power dissipation of electronic devices that digitally sample neuronal activity &
Physical limit: $\left.\kb T\ln\left(2\right)\right/\si{\bit}$ erased \iskip
Practical limit: $\left.\ca 10 \kb T\right/\si{\bit}$ processed \iskip
Current CMOS digital circuits: $\left.\num{>1e5}\kb T\right/\si{\bit}$ processed &
Requires 2--3 orders of magnitude increase in the power efficiency of electronics relative to current devices to scale to whole-brain simultaneous recordings \iskip
Minimalist architectures could be developed to reduce local data processing overhead
\\

Wireless data transmission &
Compute tradeoff between power dissipation and channel bandwidth using information theory &
Transmitter must supply enough power to overcome noise and path loss &
Transmission at optical or near-optical frequencies is needed to achieve sufficient single-channel data rates using electromagnetic radiation. Radio-frequency (RF) electromagnetic transmission of whole-brain activity data draws excessive power due to bandwidth constraints
\iskip Bandwidth cannot be split over multiple independent RF channels, but IR light or ultrasound may allow spatial multiplexing
\\

Optical imaging &
Relate the scattering and absorption lengths of optical wavelengths in brain tissue to signal-to-noise ratios for optical imaging &
Approximate values of scattering and absorption lengths as a function of wavelength &
Light scattering imposes severe constraints, but strategies exist which could negate the effects of scattering, such as implantable optics, infrared indicators, signal modulation, and online inversion of the scattering matrix
\\

Multi-photon optical imaging &
Compute minimum total excitation light power to excite multi-photon transitions from indicators within each neuron in every imaging frame &
Approximate values of multi-photon cross-sections \iskip
Pulse durations similar to those currently used in multi-photon imaging &
Whole-brain multi-photon excitation will over-heat the brain except in very short experiments, unless ultra-high-cross-section indicators are used
\\

Beam scanning microscopies &
Calculate device and indicator parameters necessary for fast beam repositioning and signal detection &
Fast optical phase modulators could reposition beams at \SI{\ca 1}{\giga\hertz} switching rates \iskip
Fluorescence lifetimes in the \SIrange{0.1}{1.0}{\nano\second} range &
Beam repositioning time limits the speed of current systems but these are far from the physical limits \iskip
Fluorescence lifetimes of indicators constrain design of ultra-fast scanning microscopies
\\

Magnetic resonance imaging &
Calculate spatial and temporal resolution of MRI based on spin relaxation times and spin diffusion &
Proton MRI using tissue water \iskip
Approximate $T_1$ and $T_2$ relaxation times and self-diffusion times for tissue water &
Proton MRI is limited by the $T_1$ relaxation time of water to \SI{\ca 100}{\milli\second} temporal resolution and by the self-diffusion of water to spatial resolutions of \SI{\ca 40}{\micro\meter}. $T_1$ pre-mapping could allow $T_2$ contrast on a \SI{\ca 10}{\milli\second} timescale. Achieving these limits for functional imaging requires going beyond BOLD contrast
\\

Ultrasound &
Calculate spatial resolution, signal strength and bandwidth limits on ultrasound imaging &
Speed of sound in brain \iskip
Attenuation length of ultrasound in brain &
Attenuation of ultrasound by brain tissue and bone may be prohibitive at the \SI{\ca 100}{\mega\hertz} frequencies needed for single-cell resolution ultrasound imaging \iskip
Ultrasound may be viable for spatially multiplexed data transmission from embedded devices~\cite{Seo2013}
\\

Molecular recording &
Compute metabolic load and volume constraint for rapid synthesis of large nucleic acid polymers \iskip
Evaluate temporal resolution in simulated experiments using kinetic models~\cite{glaser13} &
Polymerase biochemical parameter ranges \iskip
Metabolic requirements of genome replication &
Molecular recording devices appear to fall within physical limits but their development poses multiple major challenges in synthetic biology \iskip
Synchronization or time-stamping mechanisms are required for temporal resolution to approach the millisecond scale

\\\bottomrule
\end{tabu}
\end{table}

\subsection{Electrical Recording}

In the oldest strategy for neural recording, an electrode is used to measure the local voltage at a recording site, which conveys information about the spiking activity of one or more nearby neurons.
The number of recording sites may be smaller than the number of neurons recorded since each recording site may detect signals from multiple neurons. As a note for practitioners, we use the term ``electrode'' interchangeably with the terms ``recording site'' or ``contact'', meaning a point-like voltage sensing node: many multi-electrode arrays in common use (e.g., the Duke and Utah arrays) are conductive only at the tip, whereas other designs (such as the Michigan array) have multiple contacts along the shaft. Each shaft in a Michigan array would thus constitute multiple ``electrodes'' or ``recording sites'' in our parlance.
Traditional electrical recording techniques keep active devices such as amplifiers outside the skull and therefore do not pose a heat dissipation challenge; this may change if amplifiers are brought closer to the signal sources to reduce noise.

Slowly varying (e.g., $< \SI{300}{\hertz}$) extracellular potentials (LFPs) \cite{reimann2013biophysically, buzsaki2012origin} on the order of \SIrange{0.1}{1}{\milli\volt}, and fields \cite{anastassiou2010effect} on the order of \SIrange{1}{10}{\milli\volt\per\milli\meter}, are generated by neural activity. While LFPs can be filtered from the higher-frequency signals associated with extracellular voltage spikes, these and other effects necessitate maintaining precise potential references (i.e., ground levels) for voltage measurements distributed widely across the brain.

\subsubsection{Spatiotemporal Resolution}

\paragraph{Limits assuming perfect spike sorting}
We begin with an idealized estimate of the number of electrodes required to record from the entire mouse brain, neglecting the difficulty of assigning observed spikes to specific cells (spike sorting), and focusing only on what is needed to detect spikes from every neuron on at least one electrode. The key variable here is the 
maximum distance between an extracellular electrical recorder and a neuron from which it records spikes.
In a first approximation, this is determined by two factors: the decay of the signal with distance from the spiking neuron and the background noise level at the recording site.
We assume that for an electrode to reliably detect the signal from a given neuron, the magnitude of that neuron's signal must be larger than the electrode's noise level. Note, however, that knowledge of spike shape distributions could potentially be used to extract low-amplitude spikes from noise.

The peak signals of spikes from neurons immediately adjacent to an electrode are in the \SIrange{0.1}{1.0}{\milli\volt} range and scale roughly as $e^{-r/r_0}$, where $r$ is the distance from the cell surface and the $1/e$ falloff distance, $r_0$, has been experimentally measured at \SI{\ca 28}{\micro\meter} in both salamander retina~\cite{segev04} and cat cortex~\cite{gray95}, and computed at \SI{\ca 18}{\micro\meter} in a biophysically realistic simulation~\cite{gold07, anastassiou20132}.
However, this decay is strongly influenced by the detailed geometry of neuronal currents and the properties of the extracellular space (e.g., its inhomogeneity, which may lead to a frequency-dependent falloff of the extracellular potential \cite{bedard2004modeling}), making analytical calculation of the decay rate difficult (at large distances, a much slower $1/r^2$ dipole falloff is expected).

Several sources of background noise enter the recordings.
Johnson noise, which arises from thermal fluctuations in the electrode, is \[V\sub{johnson} = \left(4\kb T Z \BW\right)^{1/2}\]
which for physiological temperature, electrodes of impedance $Z = \SI{0.5}{\mega\ohm}$, and $\BW = \SI{10}{\kilo\hertz}$ bandwidth is $V\sub{johnson} \approx \SI{9}{\micro\volt}$.
The recordings are also affected by interference from other neurons, which has been reported to exceed the Johnson noise, and is non-stationary due to changes in the cells' firing properties~\cite{sahani99}. 
The noise and interference from these sources realistically produce \SIrange{>10}{20}{\micro\volt} of voltage fluctuations~\cite{camunas13}.
Current recording setups thus have signal to interference-plus-noise ratios (SINRs) of \num{<100}, where the SINR is defined as the ratio of the peak voltage from immediately adjacent neurons to the voltage fluctuation floor of the electrode.
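
As a rough numerical check of the Johnson noise figure, taking physiological temperature to be $T \approx \SI{310}{\kelvin}$ (an assumed value, not stated explicitly above) together with the impedance and bandwidth quoted above gives
\[V\sub{johnson} = \left(4\kb T Z \BW\right)^{1/2} \approx \left(4 \times \SI{1.38e-23}{\joule\per\kelvin} \times \SI{310}{\kelvin} \times \SI{0.5}{\mega\ohm} \times \SI{10}{\kilo\hertz}\right)^{1/2} \approx \SI{9}{\micro\volt}\]
Since $V\sub{johnson}$ scales as $Z^{1/2}$, halving the electrode impedance would lower this contribution by only a factor of $\sqrt{2}$.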

A limit on the maximum recording distance is the distance at which the signal from the farthest neuron falls below the noise floor, $r\sub{max} \approx r\sub{0}\ln(\text{SINR})$. For $\text{SINR} \approx 100$, $r\sub{max} \approx \SI{130}{\micro\meter}$. For comparison, recent experimental data from multi-site silicon probes has shown few detectable neurons beyond \SI{\ca 100}{\micro\meter} and none detectable beyond \SI{160}{\micro\meter}~\cite{du11}. 
Recordings in the hippocampal CA1 region could not detect spikes from cells farther than \SI{140}{\micro\meter} from the electrode tip \cite{Henze2000}, even after averaging over observations triggered on an intracellularly recorded spike; in hippocampus, this corresponds to a detection volume containing approximately 1000 neurons \cite{Buzaki2004}. Furthermore, in many studies (in monkeys, rats and mice) using multi-electrode arrays with \SIrange{150}{300}{\micro\meter} inter-electrode spacings, no neuron is seen by more than one electrode \cite{Wessberg2000, Carmena2003, Koralek2012, Jin2010}.

Due to the steep local falloff, even improving the SINR by a factor of 10 only extends the maximal recording distance to $r\sub{max} \approx \SI{190}{\micro\meter}$. Packing the brain into equal-sized cubes of side length $d = \frac{2\sqrt{3}}{3}r\sub{max} \approx \SI{150}{\micro\meter}$, with one recording site per cube, gives $N >$ \num{130000} electrodes for whole-brain recording using recording sites with $r\sub{max} \approx \SI{130}{\micro\meter}$. Note that $N$ varies as the inverse third power of $r\sub{max}$ and is therefore highly sensitive to variations in the assumed maximal recording distance; the number of required recorders can range from \num{38000} to \num{210000} as $r\sub{max}$ varies from \SI{190}{\micro\meter} to \SI{110}{\micro\meter}.
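
These numbers can be reproduced with a short calculation, given here as a sketch under stated assumptions rather than a precise derivation. The cube side length follows if the recording site is taken to sit at the center of its cube (an assumption made here for illustration), so that the farthest point of the cube, a corner at half the space diagonal, lies exactly at $r_{\text{max}}$:
\[\frac{\sqrt{3}}{2}\,d = r_{\text{max}} \quad\Rightarrow\quad d = \frac{2\sqrt{3}}{3}\,r_{\text{max}}\]
With $r_0 \approx \SI{28}{\micro\meter}$, the maximum recording distance is
\[r_{\text{max}} \approx r_0\ln(\text{SINR}) \approx \SI{129}{\micro\meter}\ \ (\text{SINR} = 100), \qquad r_{\text{max}} \approx \SI{193}{\micro\meter}\ \ (\text{SINR} = 1000)\]
and because $N \propto d^{-3} \propto r_{\text{max}}^{-3}$,
\[\frac{N(\SI{193}{\micro\meter})}{N(\SI{129}{\micro\meter})} \approx \left(\frac{129}{193}\right)^{3} \approx 0.30, \qquad \frac{N(\SI{110}{\micro\meter})}{N(\SI{129}{\micro\meter})} \approx \left(\frac{129}{110}\right)^{3} \approx 1.6\]
Applied to $N \approx \num{130000}$, these ratios give roughly \num{39000} and \num{210000} recording sites, close to the range quoted above.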

These calculations, by assuming perfect spike sorting, greatly underestimate the required number of electrodes in practice.
First, signals from the weakest cells are far weaker than those from the strongest cells and the signals from some cells decay much faster than others~\cite{gray95}.
Second, because of neuronal synchronization, the local noise produced by nearby neurons may sometimes be large.
Third, spike waveforms can vary over the course of a recording session \cite{fee1996variability, stratton2012action}.
Finally, with many neurons per electrode or at high firing rates, spikes from detectable neurons will often temporally overlap, making spike sorting difficult.

\begin{figure}[htbp]
\caption{
The voltage signal to interference-plus-noise ratio (SINR) for neurons immediately adjacent to the recording site sets an approximate upper bound on the distance, $r\sub{max}$, between the recording site and the farthest neuron it can sense (blue), due to the exponential falloff of the voltage SINR with distance.
Assuming at least one electrode per cube of edge length $\frac{2\sqrt{3}}{3}r\sub{max}$ in turn limits the number of neurons per recording site (gold), the total number of recording sites (red) and the maximal diameter of wiring consistent with \SI{<1}{\percent} total brain volume displacement (turquoise).
SINR values for current recording setups are \num{<1e2}.
In practice, the number of neurons per electrode distinguishable by current spike sorting algorithms is only \num{\ca 10}, with an estimated information theoretic limit of \num{\ca 100}, so these curves \emph{greatly under-estimate} the number of electrodes which would be required based on realistic spike sorting approaches in a pure voltage-sensing scenario.
}
\label{fig:snrlimits}
\centering
\includegraphics[width=0.7\textwidth]{figs/Fig3.eps}
\end{figure}

\paragraph{Limits from spike sorting}

The previous calculations have assumed that any spike which is visible above the noise on at least one electrode can be detected and correctly assigned to a particular cell, i.e., that the problem of spike sorting can be solved perfectly.  
However, perfect spike sorting is far beyond current algorithmic capabilities and in fact may not be possible in principle. 

To achieve the scenario described above, with $N=\num{130000}$ recording sites per mouse brain, would require each electrode to sort spikes from all 
$\frac{4}{3} \pi r_{\text{max}}^3 \rho_{\text{neurons}}$ neurons in a sphere of radius $r_{\text{max}} \approx \SI{130}{\micro\meter}$ surrounding the recording site, where $\rho_{\text{neurons}} \approx \SI{92000}{\per\milli\meter\cubed}$ is the density of neurons. This assigns $\ca \num{800}$ neurons to a single electrode. Roughly half (i.e., \num{400}) of these neurons will lie at $>\SI{100}{\micro\meter}$ distance from the electrode, and their signals on the electrode will therefore have voltage SINRs of $ < \num{100} e^{-\SI{100}{\micro\meter}/\SI{28}{\micro\meter}} \approx \num{2.8}$, assuming as above that extracellular spike amplitudes decay exponentially in space.
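
The neuron counts in this estimate follow directly from the sphere volume, shown here as a quick check with lengths expressed in millimeters:
\[\frac{4}{3}\pi r_{\text{max}}^3\,\rho_{\text{neurons}} \approx \frac{4}{3}\pi \left(\SI{0.13}{\milli\meter}\right)^3 \times \SI{92000}{\per\milli\meter\cubed} \approx \num{850}\]
which is consistent with the $\ca\num{800}$ neurons quoted above. The fraction of these neurons lying beyond \SI{100}{\micro\meter} is
\[1 - \left(\frac{\SI{100}{\micro\meter}}{\SI{130}{\micro\meter}}\right)^{3} \approx 0.54\]
i.e., roughly half, under the simplifying assumption of uniform neuron density within the sphere.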
  
Electrical recording can be viewed as a data transmission problem, with the electrode playing the role of a communication channel (see section 4.4). According to the Shannon capacity theorem~\cite{cover06}, the information capacity $C$ of a single analog channel (with additive white Gaussian noise) is \[C = \BW \log_2 (1 + S/N)\] where $\BW$ is the bandwidth, $S$ is the signal power (proportional to the square of the voltage), and $N$ is the noise power. Here the bandwidth is $\BW \approx \SI{10}{\kilo\hertz}$, and the ratio of peak signal power to noise power of a single spike for the outer 400 cells is no more than $\num{2.8}^{\num{2}}$, or $\num{0.5}\times\num{2.8}^{\num{2}}$ using the RMS signal power instead of the peak. With \num{400} cells emitting \SI{2}{\milli\second} spikes at \SI{5}{\hertz}, there will be an average of \num{4} cells spiking at a time, for $S/N \approx 0.5 \times 4\times \num{2.8}^{\num{2}} \approx \num{15.7}$ counting the signal power from all the spikes. The channel capacity is then $C \approx \SI{40}{\kilo\bit\per\second}$. This represents the maximum amount of information (e.g., about which neuron spiked when) that the population of spiking neurons can transmit via the electrode which measures them. To transmit \emph{uniquely identifiable} signals from all \num{400} neurons at millisecond temporal precision, however, requires $\SI{1}{\kilo\bit\per\second} \times 400 = \SI{400}{\kilo\bit\per\second}$, which is $\ca\num{10}\times$ the channel capacity and is therefore not achievable. Even with optimal temporal compression of $\ca \SI{5}{\hertz}$ spikes (see section 2), we would need to transmit $\ca \SI{400}{\kilo\bit\per\second}/20 = \SI{20}{\kilo\bit\per\second}$, which is less than the channel capacity and thus possible in principle, but barely so. Furthermore, the channel capacity given here is an overestimate, since 2.8 is an upper bound on the SINR of the outer cells.
On the other hand, note that the use of a nominal \SI{5}{\hertz} average firing rate here (in the estimates of signal to noise ratio and of temporal compressibility) greatly oversimplifies the distribution of firing rates across neurons, as discussed in section 2 above, so this analysis can only be treated as a first approximation.
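
For reference, the steps of the capacity estimate can be collected in one place (a restatement of the numbers above, not an independent result):
\[S/N \approx 0.5 \times \left(\num{400} \times \SI{2}{\milli\second} \times \SI{5}{\hertz}\right) \times \num{2.8}^{2} \approx \num{15.7}\]
\[C = \BW \log_2\left(1 + S/N\right) \approx \SI{10}{\kilo\hertz} \times \log_2\left(\num{16.7}\right) \approx \SI{40}{\kilo\bit\per\second}\]
The uncompressed requirement of \SI{400}{\kilo\bit\per\second} exceeds this capacity by about a factor of ten, whereas the compressed requirement of \SI{20}{\kilo\bit\per\second} falls below it by only about a factor of two.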

Based on these rough estimates, perfect spike sorting may not be possible at $\ca \num{800}$ neurons per electrode, in a sphere of radius \SI{130}{\micro\meter} surrounding a recording site, and at the noise levels typical of current electrodes. In essence, there may not be enough room on the electrode's voltage trace to discriminate such a large number of weak, noisy signals. Note that these information-theoretic limits still apply even if it is possible to resolve temporally overlapping spikes. In fact, the channel capacity is what ultimately limits the ability of a spike sorting algorithm to resolve such overlapping spikes.

To see the regime in which spike sorting becomes feasible, suppose that each electrode is only responsible for spike sorting from the population of $\ca \num{100}$ neurons nearest to the electrode, i.e., in a sphere of radius $r \approx \SI{64}{\micro\meter}$, assuming the \SI{92000}{\per\milli\meter\cubed} cell density from mouse cortex. The outermost \SI{50}{\percent} of these neurons are then positioned $>\SI{50}{\micro\meter}$ from the recording site. For these outermost \num{50} neurons, the voltage SINR is $ < \num{100} e^{-\SI{50}{\micro\meter}/\SI{28}{\micro\meter}} \approx \num{17}$ and $S/N < 0.5 \times \num{17}^{\num{2}} \times (\SI{2}{\milli\second} \times \SI{5}{\hertz} \times 50) \approx \num{72.3}$. The channel capacity is therefore $< \SI{62}{\kilo\bit\per\second}$, whereas \SI{50}{\kilo\bit\per\second} is needed for signal transmission from \num{50} neurons without temporal compression versus \SI{\ca 2.5}{\kilo\bit\per\second} with temporal compression. Even 100 neurons per electrode may therefore still be close to the limits of information transmission through the noisy channel corresponding to a single electrode.
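
The same steps can be written out for the 100-neuron case, again only as a restatement of the figures in the text. The radius follows from inverting the sphere volume,
\[r = \left(\frac{3 \times \num{100}}{4\pi\,\rho_{\text{neurons}}}\right)^{1/3} \approx \SI{64}{\micro\meter}\]
and the capacity bound from
\[S/N < 0.5 \times \num{17}^{2} \times 0.5 \approx \num{72}, \qquad C < \SI{10}{\kilo\hertz} \times \log_2\left(\num{73}\right) \approx \SI{62}{\kilo\bit\per\second}\]
where the second factor of $0.5 = \SI{2}{\milli\second} \times \SI{5}{\hertz} \times 50$ is the mean number of outer neurons spiking at any instant.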

In practice these limits are likely to be highly optimistic, since the set of spikes emerging from a neuronal population is far from an optimally designed code from the perspective of multiplexed signal transmission through a voltage-sensing electrode: the waveforms for different neurons are similarly-shaped rather than orthogonal, the spikes emitted by a given neuron vary somewhat in amplitude and exhibit shape fluctuations (signal-dependent noise), and it is not known in advance what the characteristic signal from each neuron looks like (or even how many neurons there are). 

Indeed, current practice is far from the above information-theoretic limits. At present, spike sorting algorithms operating on data from large-scale (\num{250}--\num{500} electrodes), densely spaced (\SI{\ca 30}{\micro\meter}), 2D multi-electrode arrays can reliably identify and distinguish spikes from nearly all of the \num{200}--\num{300} retinal ganglion cells~\cite{marre12,pillow13} in a small patch of retina, and can also infer approximate cell locations through spatial triangulation of spike amplitudes. This represents a roughly $1:1$ ratio of cells to electrodes. Electrodes with up to $4$ single units can be found in chronically implanted multi-electrode arrays (in both mouse and primate) \cite{costa2004differential, Nicolelis2003}, where the electrodes are sparse, although the average yield of cells per electrode is closer to $1:1$; if only electrodes with at least one cell are counted, the average rises to $\ca 1.5$--$1.7$ cells per electrode. Optimistically, simulations of neural activity suggest that \num{5}--\num{10} neurons per electrode may be distinguishable using current spike sorting algorithms~\cite{pedreira12,sahani99,camunas13}. A limit of $\ca\num{10}$ neurons per electrode would imply $N=\num{7.5e6}$ electrodes to record from all neurons in the mouse brain, which could be accomplished by positioning recording sites on a cubic lattice with \SI{\ca 40}{\micro\meter} edge length.

Future algorithmic improvements could enable sorting from more than $\ca\num{10}$ cells per electrode, but this becomes increasingly challenging. One simple estimate of a reasonable practical limit, for the regime of many neurons per electrode, would be the largest number of neurons that can be sorted without requiring the frequent resolving of temporally overlapping spikes: if the average neuron fires at \SI{\ca 5}{\hertz} and spikes last \SI{\ca 2}{\milli\second}, then at most roughly \num{100} neurons per electrode can be sorted without requiring overlaps to be resolved.  Note that while some present-day algorithms can successfully resolve overlapping spikes~\cite{marre12,pillow13,segev04,ge2011,prentice2011}, they typically do so only in the case where electrodes are densely spaced and any given spike appears on many electrodes, such that spatial information can be used to resolve the overlap. Resolving overlaps when spikes appear on only one or a few channels is more difficult due to noise and spike-shape variation.
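
The figure of roughly \num{100} neurons can be seen from the expected number of neurons spiking simultaneously on one electrode, $n f \tau$, where $n$ is the number of neurons, $f$ the mean firing rate and $\tau$ the spike duration (symbols introduced here for illustration only). Requiring this expected number to stay at or below one gives
\[n \le \frac{1}{f\tau} = \frac{1}{\SI{5}{\hertz} \times \SI{2}{\milli\second}} = \num{100}\]
This is a mean-value argument; with randomly timed spikes, some overlaps would still occur below this threshold.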

Overall,  $\ca\num{100}$ cells per electrode may be taken as a rough estimate of the limits of spike sorting, and would imply $N = \num{750000}$ electrodes and an edge spacing of \SI{\ca 80}{\micro\meter} if a cubic lattice of recording sites were used.  However, we should not exclude the possibility of game-changers which could alter the nature of the recorded data to improve the available information. For instance, CCD cameras could be attached to multi-electrode arrays to aid in the identification and localization of cells, or directional information on the source of spikes could be obtained at each recording site, for example by measuring the directions of gradients in voltage. Systems that capture such additional information could circumvent the above information-theoretic limits and improve spike sorting.
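
The lattice spacings quoted for the two spike-sorting scenarios can be recovered, approximately, from the volume available per recording site. The sketch below assumes a mouse brain volume of $V\sub{brain} \approx \SI{420}{\milli\meter\cubed}$, a value that is not stated in this section and is used here only for illustration:
\[a = \left(\frac{V\sub{brain}}{N}\right)^{1/3} \approx \SI{38}{\micro\meter}\ \ (N = \num{7.5e6}), \qquad a \approx \SI{82}{\micro\meter}\ \ (N = \num{7.5e5})\]
Because $a \propto N^{-1/3}$, a tenfold reduction in the number of recording sites increases the lattice spacing by only a factor of about $2.2$.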

\subsubsection{Volume Displacement}

We require \SI{<1}{\percent} total volume displacement from $N$ recorders.
Wires from each electrode must make it to the surface of the brain, which implies an average length $\ell\approx\SI{4}{\milli\meter}$ for the mouse brain (depending on assumptions about the wiring geometry).

As a rough approximation, consider each recorder to produce a volume displacement associated with a single cylindrical wire, with length $\ell$ and radius $r$.
Thus $r$ must satisfy \[\pi r^2\ell N\sub{min,rd} < 0.01V\sub{brain}\]
Using $N\sub{min,rd} =$ \num{38000} or \num{210000} recording sites (lower and upper limits from the perfect spike sorting case above) and $\ell\approx\SI{4}{\milli\meter}$ requires wires of diameter $2r \approx \SI{6.0}{\micro\meter}$ or \SI{2.5}{\micro\meter}, respectively (i.e., radii of \SI{\ca 3.0}{\micro\meter} or \SI{\ca 1.3}{\micro\meter}).
Alternatively, if $\num{7.5e6}$ electrodes must be used (current spike sorting case from above), the required wire radius is \SI{\ca 200}{\nano\meter}.
While these dimensions are readily achievable using lithographic fabrication, producing \emph{isolated} wires of such dimensions at scale would be challenging (perhaps suggesting the use of wire bundles).
Still, volume constraints per se are unlikely to fundamentally limit whole-mouse-brain electrical recording even in the most pessimistic scenario.
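
Solving the volume constraint for the wire radius makes the scaling explicit:
\[r < \left(\frac{0.01\,V\sub{brain}}{\pi\,\ell\,N\sub{min,rd}}\right)^{1/2}\]
As an illustration, assuming $V\sub{brain} \approx \SI{420}{\milli\meter\cubed}$ (a value not stated in this section) and $\ell \approx \SI{4}{\milli\meter}$, the case $N\sub{min,rd} = \num{7.5e6}$ gives
\[r < \left(\frac{\SI{4.2}{\milli\meter\cubed}}{\pi \times \SI{4}{\milli\meter} \times \num{7.5e6}}\right)^{1/2} \approx \SI{210}{\nano\meter}\]
consistent with the \SI{\ca 200}{\nano\meter} radius quoted above. Since $r \propto N\sub{min,rd}^{-1/2}$, a tenfold change in the number of recording sites changes the allowable wire radius by only a factor of about three.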

\autoref{fig:snrlimits} illustrates the above considerations as a function of the electrode SINR.

\subsubsection{Implanting Electrodes in the Brain}

There are several technology options for introducing many electrodes into a brain.
For example, flexible nanowire electrodes could, in theory, be threaded through the capillary network~\cite{llinas05}.
Capillaries are present in the brain at a density of \SIrange{2500}{3000}{\per\milli\meter\cubed}~\cite{schmidt89}, which equates to one capillary per \SI{73}{\micro\meter}, with each neuron lying within \SI{\ca 200}{\micro\meter} of a capillary~\cite{loffredo08}. The minimum capillary diameter is as small as \SIrange{3}{4}{\micro\meter}, although the average diameter is \SI{\ca 8}{\micro\meter}, comparable to the non-deformed size of red blood cells \cite{Freitas1999}. Blocking a significant fraction of capillaries could lead to stroke or to unacceptable levels of tissue necrosis/liquefaction.
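
The quoted capillary spacing corresponds to the cube root of the volume per capillary. Treating capillaries as points on a regular lattice (a simplification made here for illustration, since capillaries are extended vessels), the two ends of the density range give
\[\left(\frac{1}{\SI{2500}{\per\milli\meter\cubed}}\right)^{1/3} \approx \SI{74}{\micro\meter}, \qquad \left(\frac{1}{\SI{3000}{\per\milli\meter\cubed}}\right)^{1/3} \approx \SI{69}{\micro\meter}\]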

The cerebrospinal ventricles may also provide a convenient location for recording hardware. Furthermore, neural tissues could be grown around pre-fabricated electrode arrays~\cite{jadhav12}, or silicon probe arrays with many nano-fabricated recording sites per probe~\cite{du11} could be inserted into the brain.

Mechanical forces during insertion and retraction of silicon and tungsten microelectrodes from brain tissue have been measured in rat cortex at \SI{\ca 1}{\milli\newton} for electrodes of \SI{\ca 25}{\micro\meter} radius~\cite{jensen03}.
These forces are of the same order as the Euler buckling force $F$ of a \SI{2}{\milli\meter} long cylindrical tungsten rod of $r=$\SI{5}{\micro\meter} radius
\[F=\frac{\pi^2 E I}{(K L)^2} \approx \SI{0.5}{\milli\newton}\]
where $E=\SI{411}{\giga\pascal}$ is the elastic modulus of tungsten, $I=(\pi/4)r^4$ is the area moment of inertia (second moment of area) of the circular wire cross-section, $L\approx\SI{2}{\milli\meter}$ is the length of the wire, and $K$ is the column effective length factor which depends on the boundary conditions and is set to $K=1$ here for simplicity.
This suggests that it may be possible to push structures of \SI{<10}{\micro\meter} diameter into brain tissue (see \cite{najafi90} for related calculations). It might be advantageous to pull rather than push wires into the brain (e.g., using applied fields, or perhaps even cellular oxen~\cite{Weibel23082005} to carry the wires), since the thinnest wires could withstand tension forces much higher than the compressive force at which they buckle (although there may also be ways to circumvent buckling, e.g., via rapid vibration).

\subsubsection{Conclusions and Future Directions}

Electrical recording has the advantage of high temporal resolution, but the large number of required recording sites poses challenges for delivery mechanisms.
Ongoing innovations in electrical recording that could be leveraged for dramatic scaling include the development of highly multiplexed probes, multilayer lithography for routing electrical traces, novel methods to implant large numbers of electrodes, smaller electrode impedances to reduce the Johnson noise, amplifiers with lower input-referred noise levels, spike sorting algorithms capable of handling temporally overlapping spikes and adaptively modeling the noise, and hybrid systems integrating electrical recording with implantable optics or other methods.

One challenge for a purely-electrical recording paradigm pertains to the ability to relate the measured electrical signals to specific cells within a circuit.
As the set of neurons recorded by each electrode grows to encompass a large volume around the electrode, it will become more difficult to attribute the recorded spikes to particular neurons.
Furthermore, given the complex geometries of neuronal processes, it is not obvious how to determine the spatial position or layout of a neuron from its electrical signature on a nearby electrode. A given electrode will be positioned near the axons or dendrites of some neurons, and near the cell bodies of other neurons, complicating data interpretation. If the spatial density of recording sites is increased such that many electrodes sample the same neuron, however, this could enable imaging of neuronal morphology and signal propagation via voltage signals across multiple electrodes \cite{bakkum2013tracking}. Currently, extracellular electrical recording also does not allow extraction of molecular information on the cells being recorded, although intracellular electrophysiological recording methods (e.g., \cite{Kodandaramaiah2012}) might enable this for a limited number of cells.

\subsection{Optical Recording}

Optical techniques measure activity-dependent light emissions from neurons, typically generated by fluorescent indicator proteins, although activity-dependent bioluminescent emissions are an emerging possibility.
Current genetically encoded calcium indicators can only distinguish spikes below \SIrange{\ca 50}{100}{\hertz} firing rates without averaging~\cite{Smetters99} due to slow intra-molecular kinetics and indicator saturation at high firing rates, although significant improvements in speed are ongoing \cite{sun2013fast}. Intracellular calcium rises and drops can occur within \SI{1}{\milli\second} and \SIrange{10}{100}{\milli\second} respectively \cite{higley2008calcium}, which sets the ultimate speed limit for calcium imaging. The field of genetically-encoded high-speed fluorescent voltage indicators is also advancing quickly~\cite{Barnett2012, Kralj2012,gong2013enhanced, Storace2013, cao2013genetically, akemann2013two} and these may find particular use in monitoring sub-threshold events \cite{scanziani2009electrophysiology}.

\subsubsection{Spatiotemporal Resolution}

\paragraph{Multiplexing strategies}
For optical approaches, the light originating from the activity of each neuron must be separated from emissions originating from other points in the brain: this can be accomplished in many ways, leading to a variety of architectures for 3D imaging.
\emph{Epi-fluorescence microscopy} images a plane in the specimen (i.e., with depth of field $\text{DOF} = \frac{2n \lambda}{NA^2}$, where $n$ is the refractive index, $\lambda$ is the wavelength and $NA$ is the numerical aperture of the imaging system \cite{quirin2013instantaneous}) onto a spatially-resolved two-dimensional detector (e.g., a CCD camera). 
The focal plane is then scanned in order to reconstruct 3D images; because the entire 3D volume is illuminated during image acquisition, out-of-focus neurons cause background emissions. 
\emph{Light sheet imaging} is similar to epi-fluorescence imaging, except that only neurons near the focal plane are illuminated, reducing out-of-focus noise. Unfortunately, this requires transparent brains~\cite{ahrens13}.
Volumetric imaging can also be performed in a single snapshot using \emph{lightfield microscopes} \cite{levoy2009recording, broxton2013wave}, which capture the directions of incoming light rays, trading in-plane resolution for axial resolution, or by using multi-focus microscopes \cite{abrahamsson2012fast}. 
In \emph{multi-photon microscopy}, nonlinearities result in fluorescence excitation occurring only near the focal point of the excitation laser, which is scanned across the sample. 
In \emph{confocal scanning microscopy}, only photons from a point of interest are measured due to geometric constraints (e.g., pinholes). 
Alternatively, 3D imaging can be performed via \emph{wavefront coding}, which extends the depth of field by creating an axially-independent point-spread function using known optical aberrations, in combination with computational deconvolution \cite{dowski1995extended}. 
With a known 3D pattern of excitation light, wavefront coding can be applied to 3D fluorescence microscopy without scanning using a 2D detector array \cite{quirin2013instantaneous}. 
Emerging, alternative strategies rely on \emph{tagging} emissions from different sources with distinguishable modulation patterns \cite{wang12, diebold2013digitally, ducros2013encoded, yin2006frequency, wu2006frequency}, or precisely controlling and tracking the timing of light emissions \cite{cheng2011simultaneous}.
Optical techniques thus achieve signal separation by multiplexing spatially (e.g., direct imaging) or temporally (e.g., beam scanning), or often by a combination of the two.
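
To give a sense of scale for the depth-of-field expression quoted above, consider purely illustrative values that are not taken from the text: a water-immersion objective with $n \approx 1.33$ and $NA = 0.5$, imaging at $\lambda = \SI{0.5}{\micro\meter}$. Then
\[\text{DOF} = \frac{2n\lambda}{NA^2} \approx \frac{2 \times 1.33 \times \SI{0.5}{\micro\meter}}{0.5^{2}} \approx \SI{5}{\micro\meter}\]
Under these assumed parameters, each focal plane would span a depth smaller than a typical cell body, which is why the focal plane must be scanned to cover a 3D volume.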

While optics might seem to require a number of photodetectors comparable to the number of neurons (or a similar number of sampling events in the time domain, e.g., for scanning microscopies), new developments suggest ways of imaging with fewer elements.
For example, compressive sensing or ghost imaging techniques based on random mask projections~\cite{wakin06,studer12,tian11,sun13} might allow a smaller number of photodetectors to be used.
In an illustrative case, an imaging system may be constructed simply from a single photodetector and a transmissive LCD screen presenting a series of random binary mask patterns~\cite{huang13}, where the number of required mask patterns is much smaller than the number of image pixels due to a compressive reconstruction.

\paragraph{Effects of light scattering}
Single-photon techniques limit imaging to a depth of a few scattering lengths at the excitation and emission wavelengths of activity indicators: up to \SIrange{\ca 1}{2}{\milli\meter} for certain infrared wavelengths \cite{horton13,kobat09,Kobat2011} vs. a few hundred microns for visible wavelengths \cite{Wilt2009}.
Activity-dependent dyes are currently available only in the visible spectrum; indicators operating in the infrared (see \cite{filonov11,shcherbakova13, Shcherbo2009} for far-red fluorescent proteins) could improve imaging depth.

Multi-photon excitation takes advantage of the deeper penetration of infrared light.
Two or more infrared photons may together excite a fluorophore with an excitation peak in the visible range, leading to the emission of a visible photon.
If only one neuron is illuminated with sufficient intensity to generate multi-photon excitation, all photons captured by the detector originate from that neuron, regardless of the scattering of the outgoing light. Hence, the emission pathway is limited less by scattering than by absorption.
This has resulted in imaging at \SI{>1}{\milli\meter} depth \cite{horton13,kobat09, Kobat2011}.

There are at least five options for overcoming visible light scattering to enable signal separation from deep-brain neurons~\cite{alivisatos2012brain, alivisatos13}:

\begin{enumerate}
\item Infrared light can excite multi-photon fluorescence in an excitation-scanning architecture.
\item Fluorophores with both excitation and emission wavelengths in the infrared could be developed.
\item If the precise form of the scattering is known, it can be corrected for. Emerging techniques based on beam shaping allow transmission of focused light through random scattering media by inverting the scattering matrix~\cite{conkey12}.
Because the scattering properties change over time, this must be done quickly, possibly faster than the imaging frame rate, necessitating high-speed wavefront modulation. This can currently be achieved with digital micro-mirror devices (DMDs), but not with the phase-only spatial light modulators (SLMs) that are used to prevent power losses in the excitation pathways for nonlinear microscopies, although GHz switching of phase-only modulators appears feasible in principle \cite{alivisatos13}. High speed focusing through turbid media is also achievable using all-optical feedback in a laser cavity~\cite{Nixon2013}, and it is even possible to measure the scattering matrix non-invasively \cite{Chaigne2013} using a photo-acoustic technique, or via all-optical approaches based on speckle correlation \cite{bertolotti2012non}. Similar techniques are available for incoherent light \cite{katz2012looking}.

When using short optical pulses, scattering can lead to temporal distortions that degrade the peak light intensity at a focal spot. The \SI{<100}{\femto\second} pulse durations used in two-photon microscopy, for example, are comparable to the time it takes light to travel \SI{30}{\micro\meter} in vacuum. Fortunately, wavefront shaping techniques can correct for scattering-induced temporal distortions as well \cite{mccabe2011spatio, katz2011focusing}.
\item Light sources and/or detectors could be positioned close to the measured neurons, necessitating the use of embedded optical devices.
This could be done using optical fiber \cite{mahalati13} and/or waveguide \cite{zorzos10,zorzos12} technologies, which are developing rapidly.
For example, single-mode fiber cables can support \SI{>1}{\tera\bit\per\second} data rates \cite{ono1998key, bozinovic2013terabit} with low light loss over hundreds of kilometers \cite{miya1979ultimate}. It is possible to directly image through gradient index of refraction (GRIN) lenses \cite{murray12} or optical fibers \cite{mahalati13,kang10,flusberg05}, which provides one way to multiplex multiple observed neurons per fiber.
\item Light emissions from distinct locations can be tagged with distinguishable time-domain modulation patterns, and the emission time-series for each source can later be decoded from the summed signal resulting from scattering \cite{wang12, diebold2013digitally, ducros2013encoded, yin2006frequency, wu2006frequency, cheng2011simultaneous}. For example, ultrasound encoding \cite{wang12, judkewitz2013speckle}, which frequency-tags light emissions from a known location via a mechanical Doppler shift of the emitter \cite{mahan98}, provides a generic mechanism to sidestep problems of elastic optical scattering, although it requires distinguishing MHz frequency modulations in THz light waves (part per million frequency discrimination). Radio-frequency tagging of light emissions via a digitally synthesized optical approach is also an option and may be applicable to combatting the problem of emission scattering in deep-tissue, multi-point, multi-photon imaging \cite{diebold2013digitally}.
\end{enumerate}

\paragraph{Speed of beam scanning}
The speed of scanning microscopes is currently limited by beam repositioning times (\SI{\ca 0.1}{\micro\second} for spinning disk~\cite{mahalati13,kang10,flusberg05}, \SI{\ca 3}{\micro\second} for piezo-controlled linear scan mirrors, \SI{\ca 10}{\micro\second} for acousto-optic deflectors~\cite{vucinic07}, \SI{\ca 8}{\kilo\hertz} line scans for resonant galvanometer mirrors). The \SI{10}{\micro\second} repositioning time for acousto-optic deflectors is set by the speed of sound in the deflector crystal, while scanning mirrors and spinning disks are limited by inertia.
Note that \SI{0.1}{\micro\second} repositioning time for current spinning-disk confocal techniques would require 10 seconds per frame for whole mouse brain imaging with a single scanned beam $\left(\SI{1e-7}{\second\per site}\times\SI{1e8}{sites\per brain}\right)$. There is therefore a need for a \num{1e4} fold improvement in beam repositioning time and/or beam parallelization in order to achieve \SI{1}{\kilo\hertz} imaging frame rates for whole mouse brains.

One strategy to implement parallelization would exploit (yet to be developed) fast, high-resolution phase modulator arrays to arbitrarily re-shape coherent optical wavefronts for multisite holographic multi-photon excitation in 3D \cite{alivisatos13, papagiakoumou2010scanless, vaziri2012reshaping}. With fast phase modulation (e.g., \SI{\ca 1}{\giga\hertz}), beating each excitation spot at a different frequency could allow a single detector to probe multiple sites in parallel, despite arbitrarily-large scattering of the outgoing light \cite{alivisatos13}. Emerging optical techniques may provide alternative means to implement similar strategies \cite{diebold2013digitally}. Temporal multiplexing of excitation pulses at distinct locations (e.g., via few-nanosecond beam delays) also allows parallelization of the excitation beam while combatting scattering ambiguity of the emitted light \cite{cheng2011simultaneous}. Furthermore, temporal focusing techniques in two-photon microscopy (depth-dependent pulse duration) can excite an entire plane or line within the sample \cite{oron2005scanningless, tal2005improved, sela2013ultra, packer2013targeting}, as well as arbitrary patterns of points \cite{papagiakoumou2010scanless}, potentially allowing fast axial scanning (somewhat analogous to light-sheet techniques used with transparent samples). This method intrinsically corrects for scattering of the excitation light \cite{papagiakoumou2013functional}, although not of the emission light. Like other multi-photon techniques, however, all these methods remain highly dissipative, as discussed below.

Fluorescence lifetimes in the \SIrange{0.1}{1}{\nano\second} range~\cite{striker99} ultimately constrain the design of scanning fluorescence microscopies. A delay of \SI{0.1}{\nano\second} per mouse neuron per frame corresponds to only \SI{100}{\hertz} frame rate without parallelization, implying that parallelization into at least $10$ to $100$ beams is essential. The fluorescence lifetime also limits the achievable modulation frequencies in beat-frequency-multiplexed parallelization strategies \cite{diebold2013digitally}, bit lengths in encoded strategies \cite{ducros2013encoded}, and temporal offsets in temporally-multiplexed strategies \cite{cheng2011simultaneous}, suggesting that parallelization of detectors may be necessary in a strongly scattering environment. Depending on the degree of parallelization, which constrains the achievable dwell times given a fixed frame rate, photon counts may also become a limiting factor for high-speed scanning in some approaches.
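
As a rough check of these numbers, the following sketch assumes that each site must be occupied for about one fluorescence lifetime per frame; the symbols $N$ (number of sites, taken as \num{1e8}), $\tau$ (lifetime), $f\sub{max}$ and $n\sub{beams}$ are introduced here only for illustration:
\[f\sub{max} \approx \frac{1}{N\tau} = \frac{1}{\num{1e8}\times\SI{0.1}{\nano\second}} = \SI{100}{\hertz}, \qquad n\sub{beams} \approx \frac{\SI{1}{\kilo\hertz}}{f\sub{max}} = \SI{1}{\kilo\hertz}\times N\tau .\]
For $\tau$ between \SI{0.1}{\nano\second} and \SI{1}{\nano\second}, this gives $n\sub{beams}$ between 10 and 100 at a \SI{1}{\kilo\hertz} frame rate.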

\paragraph{Diffraction}
Using the small angle approximation, the diffraction-limited angular resolution of an aperture is $\theta \approx \frac{\Delta x}{y} \approx \frac{\lambda}{D}$, where $\Delta x$ is the spacing which must be resolved, $y$ is the imaging depth, $\lambda$ is the wavelength, and $D$ is the aperture diameter. Thus distinguishing neurons which are \SI{10}{\micro\meter} apart and at a depth of \SI{10}{\milli\meter} requires a lens aperture $D$ of $> \SI{1}{\milli\meter}$ when $\lambda \approx \SI{1}{\micro\meter}$. Diffraction therefore does not appear to be a limiting factor for cellular resolution imaging, except in the context of microscale apertures that might find use in embedded optics approaches.
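
Rearranging the small-angle relation gives the minimum aperture directly. This is a sketch under the same approximations, ignoring order-unity prefactors such as the factor of 1.22 in the Rayleigh criterion for a circular aperture:
\[D \approx \frac{\lambda\, y}{\Delta x} = \frac{\SI{1}{\micro\meter}\times\SI{10}{\milli\meter}}{\SI{10}{\micro\meter}} = \SI{1}{\milli\meter}.\]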

\subsubsection{Energy Dissipation}

Light that does not leave the brain is ultimately dissipated as heat.
The total light power requirements for optical measurement of neuronal activity using fluorescent indicators depend on factors including
fluorophore quantum efficiency,
absorption cross-section,
activity-dependent change in fluorescence,
background fluorescence,
labeling density,
activation kinetics,
detector noise,
scattering and absorption lengths,
and others. Unfortunately, many of these variables are unknown or highly dependent on particular experimental parameters.

A statistical analysis of photon count requirements for spike detection (in the context of calcium imaging) can be found in~\cite{wilt13}, which derived a relationship between the number of background photon counts ($N\sub{bg}$) and the number of signal photon counts required for high fidelity spike detection given photon shot noise. This scales roughly as $N\sub{signal} > 3\sqrt{2N\sub{bg}}$, even at low absolute photon count rates.
While this analysis governs the number of detected photons, the number of emitted photons will be higher due to losses.
In one example using two-photon excitation, \SI{5}{\percent} of the emitted photons were captured by the photodetector~\cite{kim99}. One implication of photon shot noise is that faster-responding indicators (e.g., voltage indicators which respond in near-real-time to the membrane potential) must be brighter.
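
To illustrate the scaling, consider a hypothetical background of $N\sub{bg} = 100$ detected photons per time bin (a value chosen here purely for illustration, not taken from~\cite{wilt13}):
\[N\sub{signal} > 3\sqrt{2\times 100} \approx 42\ \text{detected photons}.\]
At the \SI{5}{\percent} collection efficiency quoted above, this would correspond to roughly $42/0.05 \approx 850$ emitted signal photons per bin.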

\paragraph{Multi-photon excitation}

Multi-photon experiments rely on short laser pulses with high peak light intensities at a focused excitation spot to excite nonlinear transitions~\cite{kim99}.
This imposes an experimentally relevant physical limit: at least one excitation pulse of sufficient intensity per neuron per frame is required in order to excite multi-photon fluorescence during each frame.
Assuming \SI{1}{\kilo\hertz} frame rate and \SI{0.1}{\nano\joule} pulses \cite{cheng2011simultaneous}, delivering only one pulse per neuron per frame would dissipate roughly \SI{10}{\watt} $\left(\num{1e8}\times\SI{1}{\kilo\hertz}\times\SI{0.1}{\nano\joule}\right)$ in the mouse brain, which is clearly prohibitive.
This is a lower bound because, in general, more than one excitation pulse per neuron per frame may be required to excite detectable fluorescence (e.g., one reference reported 12 pulses per spot~\cite{kim99}).
For three-photon excitation, the situation will be even worse as higher peak light intensities are required to excite three-photon fluorescence. 

Could the single-pulse energy be reduced while maintaining efficient two-photon excitation? The number of two-photon (2P) transitions excited per fluorophore per pulse is $n_a = F^2 C / t$, where $F$ is the number of photons per pulse per area in units of \si{photon\per\centi\meter\squared}, $C$ is the two-photon cross-section in units of \si{\centi\meter\tothe{4}\second\per photon}, and $t$ is the pulse duration in seconds.
This can be approximated as
\[n_a = \left(\frac{\frac{E}{h c / \lambda}}{\left(\frac{\lambda}{2 \left(\NA\right)}\right)^2}\right)^2 \frac{C}{t} = \left(\frac{4E\left(\NA\right)^2}{h c \lambda}\right)^2 \frac{C}{t}\]
where $\NA$ is the numerical aperture of the focusing optics, $E$ is the pulse energy and $\lambda$ is the stimulation wavelength.
For a 2P experiment with \SI{100}{\femto\second}, \SI{0.1}{\nano\joule} pulses, assuming a 2P cross section~\cite{masters06, drobizhev2011} of \SI{1e-48}{\centi\meter\tothe{4}\second\per photon} (i.e., 100 Goeppert-Mayer units \cite{Goeppert-Mayer1931}, comparable to that of DsRed2 \cite{drobizhev2011}), $\lambda=\SI{900}{\nano\meter}$ and $\NA=1.0$, $n_a \approx \frac{1}{2}$.
Thus, a few pulses are likely necessary and sufficient to excite 2P fluorescence by each fluorophore within the focal spot. With a 2P cross section above \SI{1e-47}{\centi\meter\tothe{4}\second\per photon} (1000 Goeppert-Mayer units, higher than that of any fluorescent protein that we are aware of \cite{drobizhev2011}), one could reduce the pulse energy by an order of magnitude (and hence $n_a$ by two orders of magnitude) while maintaining $n_a > \frac{1}{20}$, i.e., one in twenty fluorophores excited by each pulse. Reducing the pulse energy much further might lead to unacceptably low excitation levels. Alternatively, shorter pulse durations could increase the light intensity, and hence 2P excitation probability, at fixed pulse energy. 
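
For reference, the estimate $n_a \approx \frac{1}{2}$ can be reproduced step by step. This is a sketch using $hc \approx \SI{1.99e-25}{\joule\meter}$ and the parameter values quoted in the text, with the second line evaluated in the units defined above for $F$, $C$ and $t$:
\[F \approx \frac{4E\left(\NA\right)^2}{hc\lambda} = \frac{4\times\SI{1e-10}{\joule}}{\SI{1.99e-25}{\joule\meter}\times\SI{9e-7}{\meter}} \approx \SI{2.2e21}{photon\per\meter\squared} = \SI{2.2e17}{photon\per\centi\meter\squared},\]
\[n_a = \frac{F^2 C}{t} \approx \frac{\left(\num{2.2e17}\right)^2\times\num{1e-48}}{\num{1e-13}} \approx 0.5 .\]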

Quantum dots can have 2P cross sections much higher than those of fluorescent proteins:  water-soluble cadmium selenide–zinc sulfide quantum dots have been reported with 2P cross sections of 47000 Goeppert-Mayer units and are compatible with in-vivo imaging \cite{Larson30052003}. These would allow excitation efficiencies of $n_a > \frac{1}{20}$ at \si{\pico\joule} pulse energies, bringing whole-brain 2P imaging into the \SI{\ca 100}{\milli\watt} range. Thus, the use of quantum dots or other ultra-bright multi-photon indicators could be decisive for supporting the energetic feasibility of multi-photon methods at whole brain scale; there are also plausible strategies for coupling quantum dot fluorescence to neuronal voltage \cite{Marshall2013}. However, some quantum dots have long fluorescence lifetimes~\cite{Dahan2001}, which may constrain scan speed.
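
Because $n_a \propto E^2 C$ at fixed pulse duration and focusing, the pulse energy needed to reach a target $n_a$ can be estimated by scaling from the fluorescent-protein case ($n_a \approx \frac{1}{2}$ at \SI{0.1}{\nano\joule} and 100 Goeppert-Mayer units). The following is only a rough sketch; it assumes that the reported quantum dot cross section applies unchanged in tissue, and $E\sub{QD}$ and $P$ are labels introduced here for illustration:
\[E\sub{QD} \approx \SI{0.1}{\nano\joule}\times\sqrt{\frac{1/20}{1/2}\times\frac{100}{47000}} \approx \SI{1.5}{\pico\joule}, \qquad P \approx \num{1e8}\times\SI{1}{\kilo\hertz}\times\SI{1.5}{\pico\joule} \approx \SI{150}{\milli\watt},\]
which is of the same order as the \SI{\ca 100}{\milli\watt} figure quoted above.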

For comparison to current practice, in a typical multi-photon experiment on mice, \SI{\ca 50}{\milli\watt} of time-averaged laser power at the sample was used with a dwell time of \SI{\ca 3}{\micro\second}~\cite{wilson07}, corresponding to \SI{\ca 150}{\nano\joule} energy dissipation per spot per frame. This dwell time would allow imaging only \num{\ca 300} neurons at millisecond resolution with a single scanned excitation beam. The average excitation power here is likely already close both to whole-brain thermal dissipation limits, and to photo-damage limits for pulsed two-photon excitation \cite{hopt2001highly,konig1997cellular}.

\subsubsection{Bioluminescence}
To work around the requirement for large amounts of excitation light, bioluminescent rather than fluorescent activity indicators could be used \cite{naumann2010monitoring, martin2007vivo, martin2008vivo}.
Consider a hypothetical activity-dependent bioluminescent indicator emitting at \SI{\ca 1700}{\nano\meter} (IR), in order to evade light scattering.
As a crude estimate, assuming that 100 photons must be collected by the detector per neuron per \SI{1}{\milli\second} frame, and \SI{1}{\percent} light collection efficiency by the detector relative to the emitted photons, \SI{\ca 100}{\micro\watt} of bioluminescent photon emission is required for the entire mouse brain (using $E_{\text{photon}} = hc/\lambda$).
This would be feasible from the perspective of heat dissipation.
By contrast, in a 1-photon fluorescent scenario, if 100 excitation photons must be delivered into the brain to generate a single fluorescent emission photon, the power requirement becomes \SI{10}{\milli\watt}, which is on the threshold of the steady-state heat dissipation limit.
Therefore, bioluminescent indicators could potentially circumvent problems of heat dissipation even in the 1-photon case.
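
The photon budget behind these estimates can be written out explicitly. This is a sketch assuming \num{1e8} neurons, as used elsewhere in this section, with $P\sub{biolum}$ a label introduced here for illustration:
\[\frac{100}{0.01}\ \text{photons per neuron per frame}\times\SI{1}{\kilo\hertz}\times\num{1e8}\ \text{neurons} = \num{1e15}\ \text{photons per second},\]
\[P\sub{biolum} \approx \num{1e15}\ \text{photons per second}\times\frac{hc}{\SI{1700}{\nano\meter}} \approx \num{1e15}\times\SI{1.2e-19}{\joule}\ \text{per second} \approx \SI{120}{\micro\watt}.\]
Taking the excitation photon energy to be comparable to the emission photon energy, a hundredfold excitation overhead in the 1-photon fluorescent case brings this to the order of \SI{10}{\milli\watt}.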

The widely used bioluminescent protein firefly luciferase is \SI{\ca 80}{\percent} efficient in converting ATP hydrolysis coupled with luciferin oxidation into photon production, yielding \num{\ca 0.8} photons per ATP-luciferin pair consumed~\cite{seliger60}, and has \SI{\ca 90}{\percent} energetic efficiency in converting free energy to light production.
Heat dissipation associated with the luciferase biochemistry itself is therefore not a significant overhead relative to the \SI{100}{\micro\watt} of emitted photons calculated above. 
In the same scenario, however, each neuron would consume \num{\ca 6e8} additional ATP molecules per minute in order to power the bioluminescence, which is within the limits of cellular aerobic respiration rates (\SI{\ca 1}{\femto\mole\ O\textsubscript{2}} per minute per cell~\cite{molter09}, with \num{\ca 30} ATP per 6 O\textsubscript{2}, hence \num{3e9} molecules ATP synthesized per minute from ADP via glucose oxidation), but not by a large margin.
Transient increases in metabolic rate are possible: energy dissipation more than doubles in the mouse during high physical activity~\cite{speakman13}.
Therefore, whole-brain activity-dependent bioluminescence, at speeds high enough to achieve millisecond frame rates, may be metabolically taxing for the cell but is nevertheless plausible as a light generation strategy. Note that we have not treated the energy required to bio-synthesize the luciferin compound, which may create additional overhead (though conceivably luciferin could be provided exogenously).
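
The metabolic budget above can be reproduced with a short calculation, sketched here under the same assumptions (\num{1e4} emitted photons per neuron per \SI{1}{\milli\second} frame, \num{\ca 0.8} photons per ATP, and Avogadro's number $N\sub{A} \approx \num{6e23}$ per mole):
\[\text{demand} \approx \frac{\num{1e4}\times\num{6e4}\ \text{photons per minute}}{0.8\ \text{photons per ATP}} \approx \num{7.5e8}\ \text{ATP per minute},\]
\[\text{supply} \approx \SI{1e-15}{\mole}\times N\sub{A}\times\frac{30}{6} \approx \num{3e9}\ \text{ATP per minute}.\]
The demand is therefore a factor of roughly 4 to 5 below the estimated supply; the \num{\ca 6e8} figure quoted above corresponds to one ATP per emitted photon.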

\subsubsection{Conclusions and Future Directions}

Scattering of visible light in the brain creates a problem of signal-separation from deep-brain neurons.
Multi-photon techniques, which scan an infrared excitation beam, can work around this scattering problem.
However, current multi-photon techniques using fluorescent protein indicators, when applied at whole brain scale, would dissipate too much power to avoid thermal damage to brain tissue.
Systems (such as plasmonic nano-antennas~\cite{blanchard11} or subwavelength metallic gratings~\cite{Harats11}) that could locally excite multi-photon fluorescence without the need for high-energy laser pulses could conceivably ameliorate this issue. 
Importantly, quantum dots show promise as ultra-bright multi-photon indicators, if they can be targeted to neurons and optimized in terms of fluorescence lifetime.
New methods besides multi-photon techniques could also work around the scattering of visible light in the brain.
For example, fluorophores or bio-luminescent proteins could be developed which operate at infrared wavelengths.
A compelling example from nature is the black dragonfish, which generates far red light (\SI{\ca 705}{\nano\meter}) via a multi-step bioluminescent process (using this light to see in deep ocean waters)~\cite{widder84,campbell87}.
A large set of activity indicators with distinguishable colors, generated through a combinatorial genetic recombination mechanism such as BrainBow~\cite{livet07}, could also improve signal separation.
Targeting, via protein tags, of activity indicators to specific locations --- such as the axon, soma, soma and proximal dendrites, distal dendrites, pre-synaptic terminals, post-synaptic terminals, or intact synapses --- could also aid in signal discrimination \cite{arnold2007polarized, el2001polarized, correa2009rapid, jacobs2003soma, vacher2008localization, boeckers2005c, feinberg2008gfp, yamagata2012transgenic}. In addition, implanted optical devices, which place emitters and detectors within a few scattering lengths of the neurons being probed, could potentially obviate the negative effects of scattering and allow visible-wavelength indicators to be used without a need for multi-photon excitation. In principle, excitation and detection do not need to make use of the same modality. For example, photoacoustic microscopy \cite{filonov12} uses pulsed laser excitation to drive ultrasonic emission, leading to optical absorption contrast. Such asymmetric techniques impose fundamentally different requirements from pure-optical techniques relative to fluorophore properties, required light intensities and other parameters.

\subsection{Embedded Active Electronics}

The preceding sections have assumed that electrical or optical signals from the recorded neurons are shuttled out of the brain before digitization and storage, but it is also conceivable to develop embedded electronic systems that locally digitize and then store or transmit (e.g., wirelessly) measurements of the activities of nearby neurons.
This could allow for shorter wires in electrical recording approaches, and for shorter light path lengths in optical recording approaches, as well as for more facile (e.g., non-surgical) delivery mechanisms for the recording hardware.

Integrated circuits have shrunk to a remarkable degree: in about 3 years, following the Moore's law trajectory, it will likely be possible to fit the equivalent of Intel's original 4004 micro-processor in a \SI{10 x 10}{\micro\meter} chip area.
Functional wirelessly powered radio-frequency identification (RFID) chips as small as \SI{50}{\micro\meter} in diameter have been developed~\cite{Usami2007} and tags with chip-integrated antennas function at the \SI{400}{\micro\meter} scale~\cite{ImpinjMonzaFive}.
Integrated neural sensors including analog front ends are also scaling to unprecedented form factors: a \SI{250 x 450}{\micro\meter} wireless implant -- including the antenna, but not including a \SI{\ca 1}{\milli\meter} electrode shank used to separate signal from ground -- draws only \SI{2.5}{\micro\watt} per recording channel~\cite{biederman13}. The system operates at \SI{\ca 1}{\milli\meter} range in air, powered by a transmitter generating \SI{\ca 50}{\milli\watt} of transmitted power.
Note that for a single such embedded recording device, the heat dissipation constraint is set not by the device's own dissipation (\SI{10}{\micro\watt} for four recording channels) but rather by the RF specific absorption rate limit associated with the \SI{50}{\milli\watt} transmit power.

Possibilities may exist for non-surgical delivery of embedded electronics to the brain: remarkably, cells such as macrophages (\SI{\ca 13}{\micro\meter} in size) can engulf structures up to at least \SI{20}{\micro\meter} in diameter~\cite{cannon92} and have been studied as potential delivery vehicles for nano-particle drugs~\cite{Kadiu11}, suggesting that they might be used to deliver tiny microchips. T-cells and other immune cells can trans-migrate across the blood brain barrier \cite{Engelhardt2006} and ghost cells (membranes purged of their contents) engineered to encapsulate synthetic cargo \cite{Cinti2011} can fuse with neurons \cite{Hikawa1989162}. It might even be possible to engineer such cell-based delivery vehicles to form electrical gap junctions \cite{Spruston2001669} with neurons or to act as local biochemical sensors \cite{nguyen2009vivo}.

The real-time transmission bandwidth requirements for neural recording could be significantly reduced if the goal is only to take a ``snapshot'' of neural activity patterns over a limited period of time, but this would require a large amount of local storage. For example, flash memory can store $> \SI{10}{\mega\bit}$ of data in a device \SI{100}{\micro\meter} on a side: a 64-gigabyte microSD card with \SI{1.5}{\centi\meter\squared} area corresponds to 34 megabits per $(\SI{100}{\micro\meter})^2$ area.
Even denser forms of memory storage are under development and could perhaps be used in a one-time-write mode in the context of neural recording long before they become commercially viable for use as rewritable media in the electronics industry.
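
The flash-density figure quoted above follows from a simple ratio, sketched here assuming decimal gigabytes (\num{8e9} bits each) and storage spread uniformly over the card area:
\[\frac{64\times\num{8e9}\ \text{bit}}{\SI{1.5}{\centi\meter\squared}/\left(\SI{100}{\micro\meter}\right)^2} = \frac{\num{5.12e11}\ \text{bit}}{\num{1.5e4}} \approx \num{3.4e7}\ \text{bit} = \SI{34}{\mega\bit}.\]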

Here we consider the power dissipation associated with embedded electronic recording devices, as well as the constraints on possible methods to power them.
In the next section, we describe how physics constrains the data transmission rates from such devices.

\subsubsection{Power Requirements for Recording}

Any embedded system needs to process data, in preparation for either local storage or wireless transmission.
Physics defines hard limits on the required power consumption associated with data processing (neglecting the possibility of reversible logic architectures~\cite{bennett73}), arising from the entropy cost for erasing a bit of information~\cite{landauer61}:
\[E\sub{Landauer} = \ln(2)\ \kb T\approx \SI{3e-21}{\joule\per\bit} \tag{the Landauer limit}\]
Ambitious yet physically realistic values for beyond-CMOS logic lie in the tens of $\kb T$ per bit processed~\cite{yablonovitch08}.
At \SI{40}{$\kb T$\per\bit}, recording raw voltage waveforms at a minimal \SI{1}{\kilo\bit\per\second\per neuron} (e.g., \SI{1}{\kilo\hertz} sampling rate, 1 bit processed per neuron per sample) would in principle require as little as \SI{\ca 16}{\nano\watt} for the whole mouse brain. While this leaves \num{>1e6}-fold more room (energetically) for increased data processing (more required bit flips per second), or energetic inefficiency of the switching device (greater dissipation per bit), realistic devices in the near term may in fact require this much overhead, if not more.
This necessitates a more detailed consideration of limiting factors for today's microelectronic devices.
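
For concreteness, the \SI{\ca 16}{\nano\watt} estimate can be reproduced as follows. This is a sketch assuming $\kb T \approx \SI{4.1e-21}{\joule}$ (i.e., $T \approx \SI{300}{\kelvin}$) and an aggregate rate of $\num{1e8}\times\SI{1}{\kilo\bit\per\second} = \SI{1e11}{\bit\per\second}$:
\[P \approx 40\,\kb T\times\SI{1e11}{\bit\per\second} \approx 40\times\SI{4.1e-21}{\joule\per\bit}\times\SI{1e11}{\bit\per\second} \approx \SI{1.6e-8}{\watt}.\]
Relative to the \SI{40}{\milli\watt} approximate maximal allowed power dissipation used in this paper, this leaves a factor of about $\SI{40}{\milli\watt}/\SI{16}{\nano\watt} \approx \num{2.5e6}$.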

In the context of electrical recording, the first step that must be performed by an embedded neural recording device is digitization of the voltage waveform.
Until \si{\milli\volt}-scale switching devices are developed (see discussion below), it is necessary to amplify the \SIrange{\ca 10}{100}{\micro\volt} spike potential in order to drive digital switching events in downstream gates.
During this sub-threshold amplification step, a CMOS (or BJT) device will dissipate static power (associated with a bias current). 
Importantly, in order to decrease the input-referred voltage noise of this amplification process, it is necessary to increase the bias current and hence the static power dissipation.
For a simple differential transistor amplifier, the minimal bias current scales as
\[I\sub{d} = \frac{\pi}{2} \frac{4\kb T}{V\sub{noise}^2} \frac{\kb T}{q} \BW\]
where $V\sub{noise}$ is the input-referred voltage noise of the amplifier and $q$ is the electron charge.
For an extracellular recording with $\BW = \SI{10}{\kilo\hertz}$ and $V\sub{noise} = \SI{10}{\micro\volt}$, this implies a minimal bias current $I\sub{d}\approx\SI{60}{\nano\ampere}$ or a minimal static power of $\left(I\sub{d} V\sub{dd}\right)\approx\SI{6e-8}{\watt}$ at $V\sub{dd}\approx\SI{1}{\volt}$ operating voltage.
Assuming 10 neurons per recording channel, there are then 7.5 million recording channels for a mouse brain, which gives a power dissipation associated with signal amplification of \SI{\ca 500}{\milli\watt}.
Note that realistic analog front ends (which are subject to $1/f$ noise and require multiple gain stages) draw 6$\times$--10$\times$ greater bias current, quantified by the noise efficiency factor (NEF)~\cite{steyaert87}, to achieve the same input-referred noise levels.
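
The bias-current estimate can be checked numerically. This is a sketch assuming $\kb T \approx \SI{4.1e-21}{\joule}$ and $\kb T/q \approx \SI{26}{\milli\volt}$ (room-temperature values; body temperature changes the result by only a few percent), with $P\sub{amp}$ a label introduced here for illustration:
\[I\sub{d} \approx \frac{\pi}{2}\times\frac{4\times\SI{4.1e-21}{\joule}}{\left(\SI{10}{\micro\volt}\right)^2}\times\SI{26}{\milli\volt}\times\SI{10}{\kilo\hertz} \approx \SI{7e-8}{\ampere},\]
in line, to within rounding, with the $\approx\SI{60}{\nano\ampere}$ quoted above. Summing the corresponding \SI{\ca 6e-8}{\watt} per channel over all channels gives
\[P\sub{amp} \approx \num{7.5e6}\times\SI{6e-8}{\watt} \approx \SI{0.45}{\watt},\]
i.e., roughly \SI{500}{\milli\watt}.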

Local on-chip digital computation also incurs an energy cost.
Current CMOS digital circuits consume 5--6 orders of magnitude~\cite{tucker11,koomey11,yablonovitch08,tucker11b} more energy per switching event (\SI{\ca 1}{\femto\joule\per\bit} including charging of the wires~\cite{tucker11}) compared to the Landauer limit (e.g., for a digital CMOS inverter, and ignoring the static power associated with the leakage current).
This corresponds to a \SI{\ca 1}{\femto\farad} total load capacitance at \SI{1}{\volt} operating voltage. For \SI{100}{\giga\hertz} switching rates ($\SI{1e8}{neurons} \times \SI{1}{\kilo\hertz}$) as above, this corresponds to \SIrange{0.01}{0.1}{\milli\watt}.
Realistic architectures, however, will incur overhead in the number of switching events required to store, compress and/or transmit neural signals, likely bringing the power consumption into an unacceptable range (e.g., \num{1000} bits processed per sample would be \SI{100}{\milli\watt} here).
To take a concrete example, commercial RFID tags consume \SI{\ca 10}{\micro\watt}~\cite{rfidsheet}.
At a chip rate of \SI{256}{\kilo\bit\per\second} (with a Miller encoding of 2), this yields \SI{7.8e-11}{\joule\per\bit}, which is \num{\ca 10} orders of magnitude higher than the Landauer limit.
Applying current RFID technology to whole mouse brain recording at \SI{1}{\kilo\bit\per\second\per neuron} would thus draw \SI{\ca 8}{\watt} of power.
Therefore, at least 2--3 orders of magnitude reduction in power consumption will be necessary in order to apply embedded electronics for whole-brain neural recording.
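
The RFID numbers above follow from the quoted tag power and data rate. In this sketch the Miller factor of 2 is taken to halve the effective data rate, and $E\sub{bit}$ and $P$ are labels introduced here for illustration:
\[E\sub{bit} \approx \frac{\SI{10}{\micro\watt}}{\SI{256}{\kilo\bit\per\second}/2} \approx \SI{7.8e-11}{\joule\per\bit}, \qquad P \approx \SI{7.8e-11}{\joule\per\bit}\times\SI{1e11}{\bit\per\second} \approx \SI{8}{\watt}.\]
Compared with the \SI{40}{\milli\watt} approximate maximal allowed power dissipation, this is a factor of about 200, or slightly more than two orders of magnitude.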

Until recently, the energy efficiency of digital computing has scaled on an exponential improvement curve~\cite{koomey11}.
This was a consequence of Moore's law and Dennard scaling, where both the capacitance of each transistor and its associated interconnect, as well as the operating voltages, were reducing with the device dimensions.
Unfortunately, issues related to device variability and the 3D structures needed to maintain the on-to-off current ratio have largely stopped the reduction in effective capacitance per device; current devices are stuck at \SIrange{\ca 100}{200}{\atto\farad} for a minimum sized transistor.
Furthermore, the exponential increase in leakage current that comes along with the scaling of the threshold voltage in this scenario has precluded substantial further decreases in voltage at a given performance level.
Indeed, for the past several technology generations (since about 2005), CMOS devices have operated at a supply voltage of \SI{\ca 1}{\volt}.

While neural signal processing does not demand very stringent transistor speeds and so reductions below \SI{\ca 1}{\volt} are certainly feasible, a fundamental limitation in scaling the supply voltage still remains.
Specifically, CMOS has a well-defined minimum-energy per bit and an associated minimum-energy operating voltage that is defined by the tradeoff between static (leakage) and dynamic (switching) energy:
as the operating voltage is decreased, the capacitive switching energy decreases, but the off-to-on current ratio, $I\sub{off}/I\sub{on}$, increases exponentially, increasing the energy associated with leakage (this effect is independent of the threshold voltage in the sub-threshold regime).
For practical circuits, the supply voltage that leads to this minimum energy is on the order of \SIrange{300}{500}{\milli\volt}, and thus supply voltage scaling will at most provide 3$\times$--10$\times$ improvement in energy over today's designs.
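
The quoted range can be motivated by the quadratic dependence of dynamic switching energy on supply voltage. This is a sketch assuming $E \propto C V\sub{dd}^2$ at fixed capacitance and neglecting leakage, so it gives an upper bound on the gain:
\[\frac{E\left(\SI{1}{\volt}\right)}{E\left(V\sub{dd}\right)} \approx \left(\frac{\SI{1}{\volt}}{V\sub{dd}}\right)^2 \approx 4\ \text{at}\ V\sub{dd} = \SI{500}{\milli\volt}, \qquad \approx 11\ \text{at}\ V\sub{dd} = \SI{300}{\milli\volt}.\]
Leakage energy near the minimum-energy point reduces the achievable gain somewhat below these values.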

Thus, a paradigm shift in microelectronic hardware is needed to reduce power by several orders of magnitude if we are to approach the physical limits.
Developing a switching device operating in the \si{\milli\volt} range, rather than the \SI{1}{\volt} range of current transistors, would allow $\left(\SI{1}{\volt}/\SI{1}{\milli\volt}\right)^2=\num{1e6}$ fold reduction in power consumption~\cite{yablonovitch08}.
Electronic circuits constructed using analog techniques~\cite{sarpeshkar98}, which sometimes rely on bio-inspired computational architectures, show promise for reducing energy costs by up to five orders of magnitude~\cite{rapoport09,sarpeshkar98,mandal07}, depending on the nature of the computation and the required level of precision.

\begin{figure}[htbp]
\caption{
Energy cost of elementary operations across a variety of recording and data transmission modalities, expressed in units of the thermal energy (left axis) and as a power assuming \SI{100}{\giga\hertz} switching rate (right axis). The Landauer limit of $\kb T \ln 2$ sets the minimum energy associated with a logically irreversible bit flip. The practical limit will likely lie in the tens of $\kb T$ per bit \cite{yablonovitch08}, comparable to the free energy release for hydrolysis of a single ATP molecule (or addition of a single nucleotide to DNA or RNA). The energy of a single infrared photon is \SI{\ca 50}{$\kb T$}. Single gates in current CMOS chips dissipate \SIrange{\ca 1e5}{1e6}{$\kb T$} per switching event, including the capacitive charging of the wires interconnecting the gates (red curve). The switching energy for the gate, not including wires, is \num{\ca 100}$\times$ lower (blue curve). The power efficiency of CMOS has been on an exponential improvement trend due to the miniaturization of components according to Moore's law (data re-digitized from~\cite{tucker11}), although power efficiency gains have slowed recently. Current RFID chips compute and communicate at \SIrange{\ca 1e9}{1e10}{$\kb T$} (\SI{>10}{\pico\joule}) per bit transmitted, while the total energy cost per floating point operation in a 2010 laptop was \SI{\ca 1e12}{$\kb T$}. The power associated with a minimal low-noise CMOS analog front end for signal amplification corresponds to \SI{\ca 500}{\milli\watt} at whole mouse brain scale. A single two-photon laser pulse at \SI{0.1}{\nano\joule} pulse energy corresponds to \SI{\ca 1e10}{$\kb T$}. For comparison, the \SI{40}{\milli\watt} approximate maximal allowed power dissipation, according to \anref{sec:constraints} above, corresponds to a per-bit energy of \SI{\ca 1e8}{$\kb T$} at the minimal \SI{100}{\giga\bit\per\second} bit rate.
}
\label{fig:cmos}
\centering
\includegraphics[width=0.76\textwidth]{figs/Fig4.eps}
\end{figure}

\autoref{fig:cmos} shows the power consumption per bit processed for several technology classes as well as the corresponding total power consumption required for whole brain readout, assuming a minimal whole-brain bit rate of \SI{100}{\giga\bit\per\second}.
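
As a rough check of the scale in \autoref{fig:cmos}, the per-bit energy budget follows from dividing the allowed power by the bit rate. The body temperature $T \approx \SI{310}{\kelvin}$ used to evaluate $\kb T$ is an assumption made here for illustration:
\[
E\sub{bit} = \frac{\SI{40}{\milli\watt}}{\SI{100}{\giga\bit\per\second}} = \SI{0.4}{\pico\joule}\ \text{per bit},
\qquad
\frac{E\sub{bit}}{\kb T} \approx \frac{\SI{4e-13}{\joule}}{\SI{4.3e-21}{\joule}} \approx \num{1e8}.
\]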

\subsubsection{Powering Embedded Devices}

Embedded systems need power, which could be supplied via electromagnetic or acoustic energy transfer, or could be harvested from the local environment in the brain.

There are two key regimes for wireless electromagnetic power transfer: non-linear device rectification and photovoltaics.
If the single-photon energy is sufficient to allow electrons to move from the valence to the conduction band---that is, $\text{band gap} < h\nu/q$, where $q$ is the electron charge, $h$ is Planck's constant, and $\nu$ is the frequency of the photon---a photovoltaic effect can occur.
Otherwise, electromagnetic energy is converted to voltage by an antenna and non-linear device rectification may occur.

When photon energies are much lower than the band gap, power conversion is governed by the total RF power and by the impedances of the antenna and the rectifier, rather than by the individual photon energy.
For a monochromatic RF source, there is no thermodynamic or quantum limit to the RF to DC conversion efficiency, other than the resistive losses and threshold voltages for a semiconductor process.
For rectification, when the input voltage to the rectifier is much higher than a semiconductor process threshold, conversion efficiencies of \SI{85}{\percent} have been achieved~\cite{sun02}.
At low input voltages relative to the semiconductor process threshold, efficiencies as high as \SI{25}{\percent} at a \SI{2}{\micro\watt} load have been achieved (see \cite{mandal07} for an analysis of power efficiency).
Ultimately, rectification improvements are dependent on the same improvements which will be needed for next-generation low-power computing: \si{\milli\volt} scale switching devices (promising research directions include tunnel FETs~\cite{ionescu11}, electromechanical relays~\cite{liu12} and other options). 

While efficient rectification is thus not a fundamental issue, capturing sufficient RF energy in the first place becomes increasingly challenging as microchips become smaller and more deeply embedded in tissue. Wireless electromagnetic power transfer imposes range constraints due to the loss in power density with distance.
For directional power transfer, placing the receiver at the edge of the transmitter's near field (the Rayleigh distance $\frac{D^2}{4\lambda}$ where $D$ is the transmitter aperture) has advantages in terms of energy capture efficiency~\cite{ozeri10}, whereas for omni-directional antennas it is advantageous to place the receiver as close as possible to the transmitter. If embedded chips are oriented randomly with respect to the transmitter, the radiation patterns of their antennas cannot be highly directional, i.e., their gains $G_r$ (a measure of directionality) must be close to one. In the far field, this lack of directionality limits power capture by the antenna (due to antenna reciprocity~\cite{gershenfeld2000physics}): the maximal power $P_A$ available to the chip is \[P_A = \frac{G_r P\sub{rad} \lambda^2}{4\pi}\] where $P\sub{rad}$ is the power density of radiation around the antenna, $\lambda$ is the wavelength and $G_r \approx 1$ for a non-directional antenna \cite{mandal07}.

It may be possible to power devices with pure magnetic fields (which are highly penetrant) via near-field (non-radiative) inductive coupling, which is widely used in systems ranging from biomedical implants to electric toothbrushes, or conceivably by using magneto-electric materials~\cite{Kitagawa2010, Priya2009, Yue2012, Fiebig2005}. For the case of simple inductive coupling, however, the tiny cross-sections of micro-devices limit the amount of power which can be captured: a loop of \SI{10}{\micro\meter} diameter in an applied field of \SI{1}{\tesla} switching at \SI{1000}{\hertz} produces an induced electromotive force of only \SI{0.1}{\micro\volt}. Assuming a copper loop (\SI{\ca 17}{\nano\ohm\meter} resistivity) with \SI{1 x 1}{\micro\meter} cross-section and \SI{40}{\micro\meter} length (around the outer edge of the chip) gives a power ($V^2/R$) of only \SI{\ca 15}{\femto\watt} associated with the induced current. In general, the use of coupled high-$Q$ resonators can increase the range and efficiency of near-field electromagnetic power transfer by orders of magnitude \cite{Karalis2008} compared to non-resonant inductive power transfer and may be particularly relevant for implanted devices \cite{Ho2013}. Unfortunately, at the \SI{\ca 10}{\micro\meter} length scale, the achievable on-chip inductances and capacitances are severely limited, which restricts the operating range of any resonant device to high frequencies $\left( f\sub{resonant} = \left(2\pi\sqrt{LC}\right)^{-1} \right)$ which will be attenuated by tissue. Electromagnetic near-field power transfer through tissue to ultra-miniaturized microchips may thus be inefficient, again due to low capture efficiency of the applied fields by tiny device cross-sections.
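
The inductive coupling estimates in the preceding paragraph can be reproduced as follows. This is a sketch that assumes the induced electromotive force is approximated as $\mathcal{E} \approx A B f$ for a field of amplitude $B$ switching at rate $f$ through a loop of area $A$ (a sinusoidal drive would give a value larger by a factor of $2\pi$); the symbols $\mathcal{E}$, $\rho$, $l$ and $a$ (resistivity, wire length and wire cross-section) are introduced here only for illustration:
\[
A = \pi \left(\SI{5}{\micro\meter}\right)^2 \approx \SI{7.9e-11}{\meter\squared},
\qquad
\mathcal{E} \approx A B f \approx \left(\SI{7.9e-11}{\meter\squared}\right)\left(\SI{1}{\tesla}\right)\left(\SI{1000}{\hertz}\right) \approx \SI{0.08}{\micro\volt}
\]
\[
R = \frac{\rho\, l}{a} = \frac{\left(\SI{17}{\nano\ohm\meter}\right)\left(\SI{40}{\micro\meter}\right)}{\left(\SI{1}{\micro\meter}\right)^2} \approx \SI{0.68}{\ohm},
\qquad
P = \frac{\mathcal{E}^2}{R} \approx \frac{\left(\SI{0.1}{\micro\volt}\right)^2}{\SI{0.68}{\ohm}} \approx \SI{15}{\femto\watt}
\]
where the last step uses the electromotive force rounded to \SI{0.1}{\micro\volt}.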

Alternatively, if the photon energy is above the silicon band gap ($\lambda < \frac{h c}{qV\sub{th}} \approx \SI{1.1}{\micro\meter}$ or less for silicon), the chip is essentially acting as a photovoltaic cell.
There is no thermodynamic or quantum limit to the conversion efficiency of light to DC electrical power for monochromatic sources, other than resistive losses and dark currents in the material (\SI{86}{\percent} in GaAs for example~\cite{bett08}). Again, however, capturing sufficient light becomes difficult for tiny devices.
To supply \SI{10}{\micro\watt} (typical of current wirelessly-powered RFID chips) photovoltaically to a \SI{10 x 10}{\micro\meter} (cell sized) chip at \SI{34}{\percent} photovoltaic efficiency requires a light intensity of \SI{\ca 300}{\kilo\watt\per\meter\squared} at the chip, which is prohibitive. Furthermore, in the use of infrared light for photovoltaics, the penetration of the photons through tissue is decreased compared to radio frequencies.
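
The intensity quoted above follows from dividing the required electrical power by the chip area and the conversion efficiency (written here as $\eta$, a symbol introduced only for this check):
\[
I = \frac{P}{\eta A} = \frac{\SI{10}{\micro\watt}}{0.34 \times \left(\SI{10}{\micro\meter}\right)^2}
= \frac{\SI{1e-5}{\watt}}{0.34 \times \SI{1e-10}{\meter\squared}}
\approx \SI{2.9e5}{\watt\per\meter\squared} \approx \SI{300}{\kilo\watt\per\meter\squared}
\]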

Piezoelectric harvesting of ultrasound energy by micro-devices is a possibility \cite{Seo2013}. The efficiency of electrical harvesting of mechanical strain energy in piezoelectrics can be above \SI{30}{\percent} for materials with high electromechanical coupling coefficients (e.g., PZT)~\cite{safari08, xu12}. The losses in the piezoelectric transduction process are well described by models such as the KLM model~\cite{krimholtz70,castillo03}. 

An alternative to wireless energy transmission is the local harvesting of biochemical energy carriers. Implanted neural recording devices could conceivably be powered by free glucose, the main energy source used by the brain itself.
The theoretical maximum thermodynamic efficiency for a fuel cell in aqueous solution is equal to that of the hydrogen fuel cell: $\Delta G^0/\Delta H^0 = \SI{83}{\percent}$ at \SI{25}{\degreeCelsius}.
Furthermore, if glucose is only oxidized to gluconic acid, the Coulombic (electron extraction) efficiency is at most \SI{8.33}{\percent}~\cite{rapoport12}, which bounds the thermodynamic efficiency.
The blood glucose concentration in rats has been measured at \SI{\ca 7.6}{\milli\Molar}, with an extracellular glucose concentration in the brain of \SI{\ca 2.4}{\milli\Molar}~\cite{silver94}.
A hypothetical highly miniaturized neural recorder with a device area of \SI{25 x 25}{\micro\meter} and efficiency of \SI{80}{\percent}, processing a blood flow rate of \SI{\ca 1}{\milli\meter\per\second}~\cite{ivanov81} could extract $(\SI{80}{\percent})(\SI{7.6}{\milli\Molar})(\SI{25}{\micro\meter})^2(\SI{1}{\milli\meter\per\second})(\SI{2880}{\kilo\joule\per\mole})\approx \SI{11}{\micro\watt}$, which is sufficient for low-power devices such as RFID chips~\cite{cho05}.
Unfortunately, current non-microbial glucose fuel cells obtain only \SI{\ca 180}{\micro\watt\per\centi\meter\squared} peak power and \SI{\ca 3.4}{\micro\watt\per\centi\meter\squared} steady state power~\cite{rapoport12}.
Thus there is a need for \num{1e4}- and \num{1e6}-fold improvements in peak and steady state power densities, respectively, for non-microbial glucose fuel cells to power brain-embedded electronics of the complexity of today's RFID chips (or for the corresponding decrease in power requirements, as emphasized above).
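
The figures in this paragraph can be checked step by step. The sketch below assumes that the glucose flux is the product of concentration, device area and flow speed, and that complete oxidation of glucose transfers 24 electrons, compared with 2 for oxidation to gluconic acid:
\[
\frac{2}{24} \approx \SI{8.33}{\percent}
\]
\[
P \approx 0.80 \times \left(\SI{7.6}{\mole\per\meter\cubed}\right)\left(\SI{25e-6}{\meter}\right)^2\left(\SI{1e-3}{\meter\per\second}\right)\left(\SI{2.88e6}{\joule\per\mole}\right) \approx \SI{1.1e-5}{\watt} = \SI{11}{\micro\watt}
\]
\[
\frac{P}{A} \approx \frac{\SI{11}{\micro\watt}}{\SI{6.25e-6}{\centi\meter\squared}} \approx \SI{1.8}{\watt\per\centi\meter\squared}
\]
Dividing this required power density by the reported \SI{\ca 180}{\micro\watt\per\centi\meter\squared} peak and \SI{\ca 3.4}{\micro\watt\per\centi\meter\squared} steady state values gives factors of roughly \num{1e4} and \num{5e5}, respectively, the latter being of order \num{1e6}.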

\subsubsection{Conclusions and Future Directions}
The power consumption of today's microelectronic devices is more than six orders of magnitude higher than the physical limit for irreversible computing, and 2--3 orders of magnitude higher than would be permissible for use in whole brain millisecond resolution activity mapping, even under favorable assumptions on the required switching rates and neglecting both the power associated with noise rejection in the analog front end and the CMOS leakage current.
Thus, the first priority is to reduce the power consumption associated with embedded electronics.
In principle, methods such as infrared light photovoltaics, RF harvesting via diode rectification, or glucose fuel cells, could supply power to embedded neural recorders, but again, significant improvements in the power efficiency of electronics are necessary to enable this.
Other potential energy harvesting strategies include materials/enzymes harnessing local biological gradients such as in voltage, osmolarity, or temperature.
An analysis of the energy transduction potential of each of these systems is beyond the scope of this discussion.
Fortunately, with many orders of magnitude potential for improvement before physical limits are reached, we may expect that embedded nano-electronic devices will emerge as an energetically viable neural interfacing option at some point in the future.

\subsection{Embedded Devices: Information Theory}

Most recording methods envisioned thus far rely on the real-time transmission of neural activity data out of the brain.
Physics and information theory impose fundamental limits on this process, including a minimum power consumption required to transmit data through a medium.
The most basic of these results hold irrespective of whether the data transmission is wired or wireless, and regardless of the particular physical medium (optical, electrical, acoustic) used as the information carrier.

A communication ``channel'' is a set of transmitters and receivers that share access to a single physical medium with fixed bandwidth.
The bandwidth is the range of frequencies present in the time-varying signals used to transmit information.
In wireless communications, information is transmitted by modulating a carrier wave.
To allow modulation, the frequency of the carrier wave must be higher than the bandwidth: for example, a \SI{400}{\tera\hertz} visible light wave may be modulated at a \SI{100}{\giga\hertz} rate.
The physical medium underlying a channel could be a wire (with a bandwidth set by its capacitive RC time constant), an optical fiber, free space electromagnetic waves over a certain frequency range, or other media.

As a concrete example, consider a police department with \num{100} officers, each possessing a hand-held radio.
The radios transmit vocalizations by modulating an \SI{80}{\mega\hertz} carrier wave at \SI{\ca 10}{\kilo\hertz}.
This constitutes a single shared communications channel with \SI{10}{\kilo\hertz} bandwidth.
Simultaneously, the fire department may communicate via a separate channel, also with a bandwidth of \SI{\ca 10}{\kilo\hertz}, by modulating a \SI{90}{\mega\hertz} carrier wave.
The channels are separate because modulation introduced into one does not affect the other.
If the neighboring town's police department makes the mistake of also operating at an \SI{80}{\mega\hertz} carrier frequency, then they share a channel and conflicts will arise.

\subsubsection{Power Requirements for Single-Channel Data Transmission}

We first treat the case in which there is a single channel for transmitting data out of the brain. As discussed above in the context of electrical spike sorting, the Shannon Capacity Theorem~\cite{cover06} sets the maximal bit rate for a channel (assuming additive white Gaussian noise) to
\[R\sub{max} = \BW \log_2 \left(1 + \SNR\right)\]
where $\BW$ is the channel bandwidth and $\SNR$ is the signal-to-noise ratio.
If there is only thermal noise, then $\SNR = P/(N_0 \BW)$, where $N_0 = \kb T$ is the thermal noise power spectral density (in \si{\watt\per\hertz}) and $P=(\pathloss)P_0$ is the received power, i.e., the transmitted power $P_0$ weakened by the path loss \pathloss.
Therefore the transmitted power $P_0$ is lower-bounded:
\[P_0 > \kb T\ \BW\ \frac{2^{R\sub{max}/\BW}-1}{\pathloss}\]
as shown in \autoref{fig:rfpower} (bottom).
In a minimal model of a transmitter-receiver system, there thus exists a tradeoff between the required signal power and the bandwidth of the carrier radiation, due to the thermal noise floor, even in the absence of path loss ($\pathloss = 1$).
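
To illustrate how steep this tradeoff is, the bound can be evaluated for $R\sub{max} = \SI{100}{\giga\bit\per\second}$ with no path loss. The temperature $T \approx \SI{310}{\kelvin}$ (so that $\kb T \approx \SI{4.3e-21}{\joule}$) and the two example bandwidths are assumptions chosen here for illustration:
\[
\BW = \SI{3}{\giga\hertz}:\quad P_0 > \kb T\ \BW \left(2^{33.3}-1\right) \approx \left(\SI{4.3e-21}{\joule}\right)\left(\SI{3e9}{\hertz}\right)\left(\num{1.1e10}\right) \approx \SI{0.14}{\watt}
\]
\[
\BW = \SI{10}{\giga\hertz}:\quad P_0 > \kb T\ \BW \left(2^{10}-1\right) \approx \left(\SI{4.3e-21}{\joule}\right)\left(\SI{1e10}{\hertz}\right)\left(\num{1023}\right) \approx \SI{44}{\nano\watt}
\]
In the limit of very large bandwidth the bound tends to $\kb T\, R\sub{max} \ln 2 \approx \SI{0.3}{\nano\watt}$, i.e., $\kb T \ln 2$ per transmitted bit.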

Path loss reduces the fraction of the transmitted power that reaches the detector.
Using the above equation, we can calculate, as a function of bandwidth, the power necessary to transmit a target whole-brain bit rate of \SI{100}{\giga\bit\per\second} through a medium with path loss dependent on the carrier wavelength, as shown in \autoref{fig:rfpower} (top).

\begin{figure}[htbp]
\caption{%
Power requirements imposed by information theory on data transmission through a single (additive white Gaussian noise) channel with carrier frequency $\nu$ (an upper bound on the bandwidth), given thermal noise and path loss.
Bottom: absorption length of water as a function of frequency (blue), minimal power to transmit data at \SIlist{100;1000;10000}{\giga\bit\per\second} (green) as a function of frequency, assuming thermal noise but no path loss.
Top: minimal power to transmit data at \SIlist{100;1000;10000}{\giga\bit\per\second} as a function of frequency, assuming thermal noise and a path loss corresponding to the attenuation by water absorption over a distance of \SI{2}{\milli\meter}.
While formulated for a single channel, at certain wavelengths (e.g., RF) these factors also constrain multiplexed data transmissions between many transmitters and many receivers, depending on capacity of the system for spatial multiplexing.
Horizontal dashed lines: \SI{40}{\milli\watt}, the approximate maximal whole-brain power dissipation in steady state.
}
\label{fig:rfpower}
\centering
\includegraphics[width=0.78\textwidth]{figs/Fig5.eps}
\end{figure}

For RF wavelengths, the radiation penetrates deeply but the achievable data rates are low without excessive power consumption, due to the limited bandwidth.
For wavelengths intermediate between RF and infrared, the penetration depth is low and power must be expended to combat these losses, despite the high carrier bandwidth.
Only in the infrared and visible ranges do the tradeoffs between power, bandwidth and penetration depth allow transmission of \SI{>100}{\giga\bit\per\second} out of the brain through a single channel without unacceptable power consumption.

The analysis above has ignored the effects of noise sources other than thermal noise, but many additional noise sources will increase the amount of power needed to transmit data, via a decrease in the SNR at fixed input power.
For optical transmission in the brain, the noise is dominated by time-correlated ``speckle noise'' below \SI{200}{\kilo\hertz}, which arises mostly from local blood flow~\cite{carp11}.
This correlated noise, which cannot be filtered by simple averaging, could be avoided by modulating optical signals at frequencies above \SI{200}{\kHz}.

\subsubsection{Spatially Multiplexed Data Transmission}

As discussed above, transmitting information through a single channel imposes direct limits on bit rate, carrier frequency and input power.
However, it is conceivable to divide the data transmission burden over many independent channels, i.e., over many pairs of transmitters and receivers, each operating at lower bandwidth (e.g., at radio frequencies).
Indeed, this would be optimal in a scenario where many embedded devices measure and then transmit the activities of nearby neurons.
As a concrete example of such ``spatial multiplexing,'' an effective capacity of \SI{1}{\tera\bit\per\second} could conceivably be obtained by splitting the data over \num{1000} transmitter-receiver pairs each operating at \SI{1}{\giga\bit\per\second}, with the transmitters arranged in a \num{10 x 10 x 10} grid.
Importantly, in order to exceed the above limits for single-channel data transmission, it must be possible for these transmitter receiver pairs to share the same bandwidth and operate simultaneously without conflicts, for example by modulating distinguishable carrier waves or by transferring data over separate wires.
The conditions under which this may occur, however, can be counter-intuitive.
For example, for antennas to operate independently, they must be spaced apart from one another by roughly a wavelength.
For \SI{10}{\giga\hertz} microwaves, the wavelength is \SI{\ca 3}{\centi\meter}, so no more than a handful of microwave transmitters (e.g., operating at frequencies in the \SI{100}{\giga\hertz}--\SI{1}{\tera\hertz} range) can co-occupy the mouse brain while operating independently.
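
The wavelengths involved follow from $\lambda = c/\nu$. The values below are free-space wavelengths; the shortening of the wavelength by the permittivity of tissue is neglected here as a simplifying assumption:
\[
\lambda(\SI{10}{\giga\hertz}) = \frac{\SI{3e8}{\meter\per\second}}{\SI{1e10}{\hertz}} = \SI{3}{\centi\meter},
\qquad
\lambda(\SI{100}{\giga\hertz}) = \SI{3}{\milli\meter},
\qquad
\lambda(\SI{1}{\tera\hertz}) = \SI{0.3}{\milli\meter}
\]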

Even with many non-independent transmitters co-occupying the brain and operating simultaneously over the same frequency spectrum, it may be possible under some conditions to ``factor out'' the effects of the coupling and allow an increase in channel capacity relative to the single-channel result.
To treat such scenarios, a generalization of Shannon's capacity theorem to multi-input-multi-output (MIMO) channels has shown that the maximal total data rate is
\[R\sub{max} = \BW \cdot \log_2\left| \mat{I} + ( \SNR ) \mat{H}\mat{H}^* \right|\]
where $\mat{I}$ is the identity matrix, $|\cdot|$ denotes the matrix determinant, $\mat{H}$ is the ($M \times N$ for $N$ transmitters and $M$ receivers) channel matrix giving the coupling between the vector of transmitted signals and the vector of received signals and $\mat{H}^*$ denotes the matrix adjoint of $\mat{H}$ \cite{tulino04}. The vector of received signals is then $\vec{y}=\mat{H}\vec{x}+\vec{n}$ where $\vec{x}$ is the vector of transmitted signals and $\vec{n}$ is a noise vector. Any matrix can be written as $\mat{H}=\mat{U}\mat{\Sigma}\mat{V}^*$ where $\mat{U}$ and $\mat{V}$ are unitary matrices, and $\mat{\Sigma}$ is a diagonal matrix whose elements are the \emph{singular values} $\lambda_i$. One can re-write the above equation as
\[R\sub{max} = \BW \cdot \sum_{i=1}^{\min(M,N)} \log_2\left(1+\SNR \cdot \lambda_i^2\right)\]
If the matrix $\mat{H}$ is of full rank, then the capacity for the multi-channel system can increase over the single-input-single-output (SISO) result by $\min(M,N)$ times~\cite{shiu00}.
Note that the rank of the matrix corresponds to the number of non-zero singular values, so an analysis of the singular values of channel matrices can inform us about the multiplexing capacity of the channel.
Furthermore, this multiplexing capacity can in principle be achieved even when the transmitters are not in communication with each other, which could potentially be important for scenarios involving many brain embedded transmitters~\cite{spencer04}.
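
As a minimal illustration of the role of the singular values, consider two hypothetical $2 \times 2$ channel matrices (chosen here for illustration, not taken from any measured system). For two fully decoupled transmitter-receiver pairs, $\mat{H} = \mat{I}$, so $\lambda_1 = \lambda_2 = 1$ and
\[
R\sub{max} = 2\, \BW \log_2\left(1 + \SNR\right)
\]
which is twice the single-channel result. For two fully correlated pairs,
\[
\mat{H} = \left(\begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array}\right),
\qquad
\mat{H}\mat{H}^* = \left(\begin{array}{cc} 2 & 2 \\ 2 & 2 \end{array}\right),
\qquad
\lambda_1 = 2,\ \lambda_2 = 0
\]
so that
\[
R\sub{max} = \BW \log_2\left(1 + 4 \cdot \SNR\right)
\]
In this rank-one case the second transmitter only raises the effective SNR inside the logarithm, rather than adding a second independent data stream.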

Transmission through a medium with negligible scattering is the simplest situation to analyze.
In this case, evaluating the matrix $\mat{H}$ requires knowledge of the transmitter-transmitter, transmitter-receiver, and receiver-receiver distances, as well as the orientations and radiation patterns of the antennas (e.g., high gain antennas will have a highly directional radiation pattern).
Depending on these factors, the beam from each transmitter will spread to impinge upon multiple receivers and the effective number of spatially independent beams will be reduced.
With transmitter-transmitter and receiver-receiver distances larger than the wavelength, and highly directional antennas with appropriately chosen orientations, it is possible to increase the channel capacity linearly with $\min(M,N)$.

Random scattering, in a coherent disordered medium where the mean free-path $\ell$ is much larger than the wavelength $\lambda$ and much smaller than the size of the disordered medium, is another condition where the matrix $\mat{H}$ is a random scattering matrix of full rank~\cite{moustakas00,popoff10}.
Intuitively, for the case of two transmitters and two receivers separated by a disordered medium larger than the mean free path:
if transmitter 1 is at least a mean-free path from transmitter 2 (or potentially as close as a few wavelengths~\cite{berkovits91}), the path from transmitter 1 to receiver 1 and the path from transmitter 2 to receiver 2 would be uncorrelated with respect to one another (in terms of physical path, phase, amplitude fluctuations, and other properties).
The rank of the matrix $\mat{H}$ would then be 2.
Devising a code on the transmitter such that the receivers can distinguish between these two uncorrelated streams results in a doubling of the capacity, rather than simply averaging the noise floor, which would provide only a logarithmic capacity gain due to the increased SNR.

Thus, contrary to intuition, a high degree of random scattering can potentially be useful for data transmission, by enabling spatial multiplexing of channels.
This idea has been demonstrated experimentally in the context of ultrasound transmissions~\cite{derode03}.
Biological tissue in the infrared range is well described as such a random scattering medium (e.g., mean free path \SI{\ca 200}{\micro\meter} at \SI{\ca 800}{\nano\meter} \emph{in vivo}).
Therefore infrared light could be used for spatially multiplexed data transmission out of the brain.
At wavelengths $\lambda$ comparable to critical brain dimensions in the mouse, however, an insufficient number of scattering events will occur to create multiple independent pathways for $N$ transmitters.
Mathematically, the matrix $\mat{H}$ will have one highly dominant singular value and a number of much smaller remaining terms, such that the signals appearing at a receiver from two separate transmitters will be highly linearly dependent, differing by only a small phase angle.
Therefore, there will be no capacity gain from multiple transmitters, and distinct transmitters will effectively share a single channel (reducing to the SISO result).

Little is known about the biological interaction with electromagnetic fields at wavelengths much shorter than the critical brain dimensions but beyond the infrared, approximately \SI{100}{\giga\hertz} (\SI{\ca 3}{\milli\meter}) to \SI{100}{\tera\hertz} (\SI{\ca 3}{\micro\meter}) in the mouse.
If multiple scattering occurs and the absorption is low, this may also be a regime conducive to MIMO communications~\cite{bakopoulos09}.
Efficiently generating and processing radiation in this regime by embedded devices is an outstanding problem, however.
The so-called ``THz-gap''~\cite{tonouchi07} exists because, moving towards higher frequencies starting from DC electronics, parasitic capacitances and passive losses limit the maximum frequency at which a field-effect transistor (FET) may oscillate, while, moving downward in frequency starting from optics, the band-gaps of opto-electronic devices limit the minimum frequency at which quantum transitions occur.
Thus there is no high-power, low-cost, portable, room temperature \si{THz} source available.
Advances in \si{THz} light generation, e.g., through the use of tunneling transistors, could be enabling.

\subsubsection{Ultrasound as a Data Transmission Modality}
An important caveat to these conclusions on wireless data transfer occurs if we consider the use of ultrasound rather than electromagnetic radiation.
Because the speed of sound is dramatically slower than that of light, the wavelength of \SI{10}{\mega\hertz} ultrasound is only \SI{\ca 150}{\micro\meter} (approximating the speed of sound in brain as the speed of sound in water, \SI{\ca 1500}{\meter\per\second}).
Thus, many \SI{10}{\mega\hertz} ultrasound transmitters/receivers could be placed inside a mouse brain while maintaining their spatial separation above the wavelength, and a linear scaling of the MIMO channel capacity with the number of devices is likely possible in this regime, assuming that appropriate antenna gains and orientations can be achieved inside brain tissue. Beam orientation could present a challenge if micro-devices are oriented randomly after implantation.
With an attenuation of \SI{0.5}{\dB\per\centi\meter\per\mega\hertz}~\cite{hoskins10}, the attenuation at \SI{10}{\mega\hertz} is only \SI{5}{\dB\per\centi\meter}.
Thus ultrasound-based transmission of power and data from embedded recording devices may be viable \cite{Seo2013}.
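
These numbers follow directly from $\lambda = c/f$ and from the linear frequency scaling of the attenuation coefficient (written here as $\alpha$, a symbol introduced only for this check):
\[
\lambda = \frac{\SI{1500}{\meter\per\second}}{\SI{10}{\mega\hertz}} = \SI{150}{\micro\meter},
\qquad
\alpha = \left(\SI{0.5}{\dB\per\centi\meter\per\mega\hertz}\right)\left(\SI{10}{\mega\hertz}\right) = \SI{5}{\dB\per\centi\meter}
\]
which corresponds to a power loss factor of $10^{0.5} \approx 3.2$ per centimeter of tissue.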

In contrast, direct imaging of neural activity by ultrasound (e.g., using contrast agents which create local variations in tissue elastic modulus or density) may be more difficult.
While the theoretical (diffraction-limited) and currently practical resolutions of \SI{100}{\mega\hertz} ultrasound are \SI{\ca 15}{\micro\meter} and \SIrange{15}{60}{\um}~\cite{foster00}, respectively, at these frequencies power is attenuated by brain tissue with a coefficient of \SI{\ca 50}{\dB\per\centi\meter}~\cite{hoskins10} (\num{1e5}-fold attenuation per cm), which imposes a penetration limit (e.g., for measurements with a dynamic range of \SI{80}{\dB}~\cite{foster00}).
Attenuation of ultrasound by bone is stronger still, at \SI{22}{\dB\per\cm\per\MHz}~\cite{hoskins10}.
Attenuation could therefore limit the use of ultrasound as a high-resolution neural recording modality in direct imaging modes, but multiplexed transmission of lower-frequency ultrasound from embedded devices could sidestep this issue.
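
The contrast with the \SI{10}{\mega\hertz} case can be made explicit, assuming the same linear frequency scaling of the attenuation coefficient:
\[
\left(\SI{0.5}{\dB\per\centi\meter\per\mega\hertz}\right)\left(\SI{100}{\mega\hertz}\right) = \SI{50}{\dB\per\centi\meter},
\qquad
10^{50/10} = \num{1e5}\ \text{per cm}
\]
As a rough illustration only (the measurement geometry is an assumption here), a dynamic range of \SI{80}{\dB} would be exhausted after a total acoustic path of $80/50 = \SI{1.6}{\centi\meter}$, or a depth of about \SI{0.8}{\centi\meter} if the signal must traverse the tissue twice, as in pulse-echo imaging.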

\subsubsection{Conclusions and Future Directions}

Physics and information theory impose a tradeoff between bandwidth and power consumption in sending data through any communication channel.
Considering only thermal noise and no path loss, achieving \SI{100}{\giga\bit\per\second} data rates through a single channel necessitates either a bandwidth above a few \si{\giga\hertz} or a transmitted power above \SI{\ca 100}{\milli\watt}, the latter of which may be prohibitive from a heat dissipation perspective if the signals are to be generated by dissipative microelectronic devices.
Researchers have proposed to use thousands or millions of tiny~\cite{gomez10} wireless transmitters embedded in the brain to transmit local neural activity measurements to an external receiver via microwave radiation~\cite{dyson09}.
However, based on the above power-bandwidth tradeoff, this will require a bandwidth above a few \si{\giga\hertz}.
At the corresponding carrier frequencies, the penetration depth of the microwave radiation drops significantly, requiring increased power to combat the resulting signal loss.
While one might hope that multiple independent channels could be multiplexed inside the brain, reducing the bandwidth and power requirements for each individual channel, the long wavelengths of microwave radiation compared to the mouse brain diameter suggest that such channels cannot be independent, as is confirmed by an analysis of the multi-input-multi-output (MIMO) channel capacity for this scenario.
Therefore, radio-frequency electromagnetic transmission of whole brain activity data from embedded devices does not appear to be a viable option for brain activity mapping.

On the other hand, an analysis of the channel capacity for IR transmissions in a diffusive medium suggests that, because of its high frequency and decent penetration depth, infrared radiation may provide a viable substrate for transmitting activity data from embedded devices.
For example, data could be transmitted via modulating the multiple-scattering speckle pattern of infrared light by varying the backscatter from an embedded optical device, such as an LCD pixel \cite{komanduri2008reflective}, in an activity-dependent fashion.
Because the speckle pattern is sensitive to the motion of a single scatterer~\cite{berkovits91, pappu2002physical}, coherent multiple scattering could effectively act as an optical amplifier and as a means to create independent communication pathways. Furthermore, multiplexed data transmission via ultrasound is likely possible because of its short wavelength in tissue at reasonable carrier frequencies.
It may also be of interest to explore network architectures~\cite{Bush2011} in which data is transmitted at low transmit power over short distances via local hops between neighboring nodes capable of signal restoration.

\subsection{Magnetic Resonance Imaging}

Magnetic resonance imaging (MRI) uses the resonant behavior of nuclear spins in a magnetic field to non-invasively probe the spatiotemporally varying chemical and magnetic properties of tissues.
Although originally conceived as a means to image anatomy, MRI can be used to observe neural activity provided that correlates of such activity are reflected in dynamic changes in local chemistry or magnetism.

In an MRI study, a strong static field ($B = \SIrange{1}{15}{\tesla}$) is applied to polarize nuclear spins (usually \textsuperscript{1}H), causing them to resonate at a field-dependent Larmor frequency \[f = \frac{\gamma}{2\pi} B\] where $\gamma$ is the gyromagnetic ratio of the nucleus (e.g., \textsuperscript{1}H has a gyromagnetic ratio of \SI{267.522e6}{\radian\per\second\per\tesla} \cite{codata10} and therefore resonates at \SI{42.577}{\mega\hertz} in a \SI{1}{\tesla} field).
To obtain positional information, spatial field gradients are applied such that nuclei at different positions in the sample resonate at slightly different frequencies.
Sequences of RF pulses and gradients are then applied to the sample, eliciting resonant emissions that contain information about spins' local chemical environment, magnetic field anisotropy and various other properties.
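
The quoted resonance frequency follows from the Larmor relation when $\gamma$ is taken in angular units:
\[
f = \frac{\gamma}{2\pi} B = \frac{\SI{267.522e6}{\radian\per\second\per\tesla}}{2\pi} \times \SI{1}{\tesla} \approx \SI{42.577}{\mega\hertz}
\]
At the upper end of the field range considered here, $B = \SI{15}{\tesla}$, the same relation gives $f \approx \SI{639}{\mega\hertz}$.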

Most functional studies rely on dynamic changes in two forms of relaxation experienced by RF-excited spins.
The first form results from energy dissipation through interactions with other species (e.g. other spins or unpaired electrons), causing the spins to recover their lowest energy state on a timescale, $T_1$, of \SIrange{100}{1000}{\milli\second}~\cite{rooney07}.
The second form of relaxation reflects the dephasing of spin signals in a given sampling volume (voxel) over a timescale, $T_2$, of \SIrange{10}{100}{\milli\second}~\cite{deichmann95} due to non-uniform Larmor frequencies caused, e.g., by the presence of local magnetic field inhomogeneities.

In blood-oxygen level dependent \cite{ogawa1990oxygenation} functional MRI (BOLD-fMRI), the most widely used form of neural MR imaging, increased neural activity in a given brain region alters the vascular concentration of paramagnetic deoxy-hemoglobin, which affects local magnetic field homogeneity and thereby alters $T_2$.
Although the existence of this paramagnetic reporter of oxygen metabolism is fortuitous, it provides only an indirect readout of neural activity \cite{sirotin2009anticipatory, logothetis2008we, jukovskaya2011does}, with spatial and temporal resolution limited by the dynamics of blood flow in the brain's capillary network (\SIrange{1}{2}{\second}).
The spatial point-spread function of the hemodynamic BOLD response is in the \SI{1}{\milli\meter} range, although sub-millimeter measurements, revealing cortical laminar and columnar features, have been obtained by filtering out the signals from larger blood vessels \cite{bandettini2009functional}.
A significant area of current and future work is aimed at developing new molecular reporters that can be introduced into the brain to transduce aspects of neural signaling such as calcium spikes and neurotransmitter release into MRI-detectable magnetic or chemical signals~\cite{shapiro10,koretsky12,hsieh12}, as described in section 4.5.3, below.

\subsubsection{Spatiotemporal Resolution}

The temporal resolution of MRI is limited by the dynamics of spin relaxation. For sequential MR signal acquisitions to be fully independent, spins must be allowed to recover their equilibrium magnetization on the timescale of $T_1$ (\SIrange{100}{1000}{\milli\second}).
However, if local $T_1$ is static, pre-mapping it could enable temporally varying $T_2$ effects to be observed at refresh rates on the faster $T_2$ timescale (\SIrange{10}{100}{\milli\second})~\cite{deichmann95}.
It may also be possible to detect events that occur on a timescale shorter than $T_1$ and $T_2$, if the magnitude of the resulting change in spin dynamics overcomes the lack of independence between acquisitions.
Note that these limitations on the repetition time of the underlying pulse sequence are not eliminated by ``fast'' pulse sequences such as echo-planar imaging (EPI)~\cite{stehling91} and fast low-angle shot (FLASH)~\cite{haase86} or by the use of multiple detector coils~\cite{wiesinger06}.
These techniques accelerate the acquisition of 2D and 3D images, but still require spins to be prepared for readout.

The spatial resolution of current MRI techniques is limited by the diffusion of water molecules during the acquisition time~\cite{glover02}, since contrast at scales below the diffusion length will be attenuated by diffusion.
The RMS distance of a water molecule from its origin, after diffusing in 3D for a time $T\sub{acq}$, is
\[d\sub{rms} = \sqrt{6D\sub{water}T\sub{acq}}\]
where $D\sub{water}=\SI{2300}{\micro\meter\squared\per\second}$ is the self-diffusion coefficient of water.
For $T\sub{acq}\approx\SI{100}{\milli\second}$, $d\sub{rms}\approx\SI{37}{\micro\meter}$, which sets the approximate spatial resolution.
For ultra-short acquisitions at $T\sub{acq}\approx\SI{10}{\milli\second}$, $d\sub{rms}\approx\SI{12}{\micro\meter}$.
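
The same relation can be inverted to estimate, as a rough sketch, the acquisition time compatible with a target diffusion-limited resolution. The target $d\sub{rms}=\SI{10}{\micro\meter}$ used here is an illustrative assumption, chosen to be comparable to the \SI{10}{\um} neuronal length scale used in the volume estimates later in this paper:
\[T\sub{acq} = \frac{d\sub{rms}^2}{6D\sub{water}} = \frac{\left(\SI{10}{\micro\meter}\right)^2}{6\times\SI{2300}{\micro\meter\squared\per\second}} \approx \SI{7}{\milli\second}\]
For comparison, substituting $T\sub{acq}=\SI{100}{\milli\second}$ into the forward expression gives $\sqrt{6\times\SI{2300}{\micro\meter\squared\per\second}\times\SI{0.1}{\second}}=\sqrt{\SI{1380}{\micro\meter\squared}}\approx\SI{37}{\micro\meter}$, as quoted above.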

More technically, as described above, MRI uses field gradients to encode spatial positions in the RF frequency (wavenumber) components of the emitted radiation.
The quality of the reconstruction of frequency space thus limits the achievable spatial resolution.
The sampling interval of the detector $\Delta t$, and the field gradient $G$, determine the wavenumber increment as
\[\Delta k = \gamma G \Delta t\]
The spatial resolution (here considering only one dimension) is then given by~\cite{glover02}:
\[\Delta x\sub{$k$-space} = \frac{\pi}{\frac{T\sub{acq}}{\Delta t} \Delta k} = \frac{\pi}{T\sub{acq} \gamma G}\]
Note that it is the gradient field, not the polarizing field $B_0$, which determines the resolution. For a gradient field of \SI{100}{\milli\tesla\per\meter} and an acquisition time of \SI{100}{\milli\second},
\[\Delta x\sub{$k$-space} = \frac{\pi}{\left(\SI{100}{\milli\second}\right)\left(\SI{267}{\MHz\per\tesla}\right)\left(\SI{100}{\milli\tesla\per\meter}\right)}\approx \SI{1.17}{\micro\meter}\]

Due to relaxation, however, the emissions from a spin at a given position do not constitute a pure tone with a well-defined frequency. Instead, each spin exhibits a frequency spread, which gives rise to another limit on the spatial resolution~\cite{glover02}:

\[\Delta x\sub{relaxation} = \frac{2}{\gamma G T_2^*}\]
where $T_2^*$ is the shortest relaxation time. Assuming $T_2^*=\SI{5}{\milli\second}$ and $G=\SI{100}{\milli\tesla\per\meter}$ gives
\[\Delta x\sub{relaxation}\approx \SI{15}{\micro\meter}\]
Therefore, for water protons, the resolution limit is set by diffusion over \SI{\ca 100}{\milli\second} acquisition timescales, rather than by $k$-space sampling or relaxation. For other spin species (e.g., those with a lower diffusion rate), it may be possible to achieve resolutions limited by frequency discrimination.
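
A rough way to see how the $k$-space and diffusion limits trade off is to equate the two expressions above and solve for the acquisition time $T\sub{cross}$ at which they coincide. This is only a sketch: it ignores relaxation and SNR, keeps the same assumed $G=\SI{100}{\milli\tesla\per\meter}$, and the symbol $T\sub{cross}$ is introduced here for convenience:
\[\frac{\pi}{T\sub{cross}\,\gamma G} = \sqrt{6D\sub{water}T\sub{cross}} \quad\Longrightarrow\quad T\sub{cross} = \left(\frac{\pi}{\gamma G}\right)^{2/3}\left(6D\sub{water}\right)^{-1/3}\]
With $\pi/(\gamma G)\approx\SI{1.17e-7}{\meter\second}$ and $6D\sub{water}=\SI{1.38e-8}{\meter\squared\per\second}$, this gives $T\sub{cross}\approx\SI{10}{\milli\second}$, at which both limits are approximately \SI{12}{\micro\meter}. Under these assumptions, shorter acquisitions are limited by $k$-space sampling and longer ones by diffusion, so that stronger gradients would be needed to benefit from acquisitions much shorter than \SI{10}{\milli\second}.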

Notably, there exists a practical trade-off between spatial resolution, temporal resolution, and sensitivity (SNR). In particular, to achieve high spatial resolution, it is necessary to densely sample $k$-space.
Fast sampling sequences such as FLASH and EPI achieve speed by sampling each point of $k$-space using less signal and often at a lower resolution.
Even at high field strengths (\SI{11.7}{\tesla}), this tradeoff results in practical EPI-fMRI with a spatial resolution of \SI{150 x 150 x 500}{\micro\meter} and a temporal resolution of \SI{200}{\milli\second}~\cite{yu12}.
Achieving much higher spatial resolutions requires longer acquisitions and/or lower temporal sampling.
For example, achieving a \SI{20}{\micro\meter} anatomical resolution in MRI of \emph{Drosophila} embryos required 54 minutes for a small field of view of \SI{2.5 x 2.5 x 5}{\milli\meter}~\cite{null08}.
Furthermore, the flies were administered paramagnetic gadolinium chelates to shorten $T_1$ and thereby the acquisition time.
Separately, frame intervals of \SI{50}{\milli\second} (i.e., 20 frames per second) have been obtained for dynamic imaging of the human heart, but this required the use of strong priors to reduce data collection requirements~\cite{zhang10}.

\subsubsection{Energy Dissipation}

Energy is dissipated into the brain when the excited spins relax to their equilibrium magnetization in the applied field.
The energy associated with this relaxation is of order the Zeeman energy:
\[\Delta E\sub{Zeeman} = \frac{\gamma}{2\pi} h B_0\]
To obtain an upper bound on the heat dissipation of MRI, we first assume that the brain is entirely water, that every proton spin is initially aligned by the field and then excited by the RF pulse, and that all spins relax during a $T_1$ relaxation time of \SI{\ca 600}{\ms}.
In this scenario, even an applied field of as high as \SI{\ca 200}{\tesla} would generate dissipation within the \SI{\ca 50}{\milli\watt} energy dissipation limit.
In reality, the energy dissipation is 4--5 orders of magnitude smaller, because only a tiny fractional excess of the spins are initially aligned by the field (\num{\ca 1e-5} for fields on the order of \SI{1}{\tesla}).
Therefore, thermal dissipation associated with spin excitation in MRI is unlikely to cause problems unless field strengths much greater than the largest currently used fields (\SI{\ca 20}{\tesla}) are invoked, or spins with much higher gyromagnetic ratios are used.
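
To make these orders of magnitude concrete (a sketch assuming a body temperature of $T\sub{body}=\SI{310}{\kelvin}$, a value introduced here rather than taken from the text), the Zeeman energy of a proton at $B_0=\SI{1}{\tesla}$ is
\[\Delta E\sub{Zeeman} = h\times\SI{42.577}{\mega\hertz} \approx \SI{2.8e-26}{\joule}\]
which is small compared with the thermal energy $k\sub{B}T\sub{body}\approx\SI{4.3e-21}{\joule}$:
\[\frac{\Delta E\sub{Zeeman}}{k\sub{B}T\sub{body}}\approx \num{7e-6}\]
The equilibrium fractional excess of aligned spins is of this order (approximately $\Delta E\sub{Zeeman}/2k\sub{B}T\sub{body}$ in the high-temperature limit) and scales linearly with $B_0$, which is why only a minute fraction of the spins contributes to the dissipation.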

Practically, the main energy consideration in MRI is the absorption by tissues of RF energy applied during imaging pulse sequences and the switching of magnetic field gradients.
Such absorption is often calculated through numerical solutions of Maxwell's equations, taking into account the precise geometry, tissue properties and applied fields for a particular experimental setup~\cite{collins04}.
The typical specific absorption rate (SAR) is well under \SI{10}{\watt\per\kilogram} (or \SI{5}{\milli\watt} per \SI{500}{\milli\gram}), and is restricted by the FDA to less than \SI{3}{\watt\per\kilogram} for human studies.

\subsubsection{Imaging Agents}

All the preceding discussion about spatiotemporal resolution presumes the existence of local time-varying signals (e.g., changes in $T_1$ or $T_2$) corresponding to the dynamics of neural activity.
The hemodynamic BOLD response is the most prominent such signal; its limitations are discussed above. Several studies have worked towards direct detection of the minute (e.g., \SI{\ca 0.2}{\nano\tesla}) magnetic fields associated with action potentials through their effects on MRI phase or magnitude contrast \cite{Bodurka2002, petridou2006direct}, but reliably detecting these fields above the physiological noise will likely require novel strategies \cite{witzel2008stimulus, halpern2010magnetic}. Estimates of the feasibility of these methods have been complicated by the lack of a realistic model for the local distribution of neuronal currents. MRI detection of the mechanical displacement of active neurons due to the Lorentz force in an applied magnetic field~\cite{roth2009mechanical} has also been explored, as has the detection of activity-dependent changes in the diffusion of tissue water \cite{tsurugizawa2013water,le2006direct}, possibly due to neuronal or glial~\cite{kitaura2009activity} cell swelling \cite{isokawa2005n, holthoff1996intrinsic}, although strongly diffusion-weighted scans may have disadvantages in terms of SNR \cite{jasanoff2007bloodless}. Manganese influx through voltage-gated calcium channels \cite{lin1997manganese, van2002vivo} generates MRI contrast, but exhibits slow uptake kinetics and even slower efflux, such that manganese monotonically accumulates in the neurons over time. Conceivably, over-expression of manganese efflux pumps such as the iron transporter ferroportin \cite{madejczyk2012iron} could allow time-dependent activity imaging using manganese contrast.

In the past 15 years, efforts have been undertaken to develop chemical and biomolecular imaging agents that can be introduced into the brain to produce MRI-detectable signals corresponding to specific aspects of neural function (analogously to fluorescent dyes and proteins). One critical advantage of using genetically encoded indicators would be the ability to target these indicators to specific cell types \cite{madisen2009robust, luo2008genetic} and/or cellular compartments \cite{arnold2007polarized, el2001polarized, correa2009rapid, jacobs2003soma, vacher2008localization, boeckers2005c, feinberg2008gfp, yamagata2012transgenic}. Notable examples of engineered molecular MRI contrast agents include $T_1$ and $T_2$ sensors of calcium~\cite{atanasijevic06,li99} and a $T_1$ sensor of neurotransmitter release~\cite{shapiro10}.
Depending on their mode of action, these imaging agents can provide temporal resolutions ranging from \SI{10}{\milli\second} to \SI{10}{\second}~\cite{shapiro06}.
However, a major current limitation for fast agents is the requirement that they be present in tissues at \si{\micro\Molar} concentrations, posing major challenges for delivery and genetic expression. Model organisms lacking hemoglobin (e.g., the blowfly), and hence lacking a hemodynamic BOLD response (as is also the case for ex-vivo brain slices), may be particularly useful for in-vivo testing of novel activity-dependent contrast mechanisms, and specialized setups have been constructed to perform MRI at near-cellular spatial resolution in this context (though still requiring several hours to generate whole-brain anatomical images at this resolution) \cite{jasanoff2002vivo}.

\autoref{fig:mriresolution} shows the achievable temporal resolution for various classes of activity-dependent MRI contrast agents as well as the spatial resolution limit due to water proton diffusion.

\begin{figure}[htbp]
\caption{Key factors determining the spatiotemporal resolution of dynamic MRI. (a) Temporal resolution and contrast agent concentration allowing \SI{>5}{\percent} contrast, for different classes of dynamic MRI contrast agent (reproduced from~\cite{shapiro06}, with permission). (b) Diffusion-limited spatial resolution for water proton MRI as a function of temporal resolution.}
\label{fig:mriresolution}
\centering
\includegraphics[width=0.78\textwidth]{figs/Fig6.eps}
\end{figure}

\subsubsection{Conclusions and Future Directions}

Moving beyond hemodynamic contrast is crucial for improving the spatiotemporal resolution of fMRI, and several avenues may be available for doing so, especially through the use of novel molecular contrast agents and/or genetic engineering. 
More fundamentally, current MRI techniques rely on the excitation of proton spins in water: owing to the low polarization and long $T_1$ relaxation times of proton spins, this limits imaging to \SI{>100}{\ms} timescales unless SNR is severely compromised.
There is also a spatial resolution limit of tens of microns over these timescales due to water's fast diffusion. Methods which couple neural activity to non-diffusible, highly polarized spins could, in principle, ameliorate this situation.

\subsection{Molecular Recording}

An alternative to electrical, optical or MRI recording is the local storage of data in molecular substrates.
Each neuron could be engineered to write a record of its own time-varying electrical activities onto a biological macromolecule, allowing off-line extraction of data after the experiment.
Such systems could, in principle, be genetically encoded, and would thus naturally record from all neurons at the same time.

One proposed implementation of such a ``molecular ticker tape'' would utilize an engineered DNA polymerase with a Ca\textsuperscript{2$+$}-sensitive or membrane-voltage-sensitive error-rate~\cite{zamft12} to record time-varying neural activities onto DNA~\cite{glaser13} as patterns of nucleotide misincorporations relative to a known template DNA strand (for alternative local recording techniques see~\cite{friedland09,bonnet13}).
The time-varying signal would later be recovered by DNA sequencing and subsequent statistical analysis~\cite{glaser13}.
DNA polymerases found in nature can add up to \num{\ca 1000} nucleotides per second~\cite{kelman95}, and certain non-replicative polymerases such as DNA polymerase iota have error rates of \SI{>70}{\percent} on template T bases~\cite{frank07}.
Similar strategies could be implemented using RNA polymerases or potentially using other enzyme/hetero-polymer systems.

\subsubsection{Spatiotemporal Resolution}

Polymerases proceed along their template DNA strands in a stochastic, thermally driven fashion; thus, polymerases that are initially synchronized will de-phase with respect to one another over time, occupying a range of positions on their respective templates at the time when a neural impulse occurs. The rate of this de-phasing is a key parameter governing the temporal resolution of molecular recording. By averaging over many simultaneously replicated templates, it is theoretically possible to associate variations in nucleotide misincorporation rate with the times at which these variations occurred, and thus to obtain temporally resolved recordings of the cation concentration \cite{glaser13}.

An analysis of the projected temporal resolution of molecular ticker tapes as a function of polymerase biochemical parameters can be found in~\cite{glaser13}.
This work suggests that molecular ticker tapes require synchronization mechanisms if they are to record at \SI{<10}{\ms} temporal resolution for durations longer than seconds, even when \num{10000} templates per cell are recorded simultaneously, unless engineered polymerases with kinetic parameters beyond the limits of those found in nature can be developed.
Recording at lower temporal resolutions, however, appears feasible using naturalistic biochemical parameters, even in the absence of synchronization mechanisms.
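
The scale of the de-phasing problem can be illustrated with a deliberately simplified model, which is an assumption introduced here and not the analysis of~\cite{glaser13}. If each polymerase steps as a Poisson process with mean rate $r$, then after a time $t$ its position along the template has mean $rt$ and standard deviation $\sqrt{rt}$ nucleotides, so that a given template position corresponds to a time known only to within roughly
\[\sigma_t \approx \frac{\sqrt{rt}}{r} = \sqrt{\frac{t}{r}}\]
For $r=\SI{1}{\kHz}$, this gives $\sigma_t\approx\SI{32}{\milli\second}$ after $t=\SI{1}{\second}$ and $\sigma_t\approx\SI{100}{\milli\second}$ after $t=\SI{10}{\second}$ for a single unsynchronized polymerase. Real polymerase kinetics may differ substantially from this idealization, and averaging over many templates improves the estimate; the sketch is intended only to indicate why recording at \SI{<10}{\ms} resolution over long durations is demanding without synchronization.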

The development of mechanisms to improve synchronization of the ensemble of polymerases within each cell, or to encode time-stamps into the synthesized DNA (e.g., molecular clocks), could improve temporal resolution and decrease the number of required template strands per neuron. Mutation-based molecular clocks over evolutionary timescales are widely used in the field of phylogenetics \cite{Ochman1987}, and new tools from synthetic biology \cite{Elowitz2000} and optogenetics or thermogenetics \cite{Bernstein2012} also suggest strategies for building molecular clocks on faster timescales. As a sketch of one possible synchronization mechanism, optogenetic methods (e.g., similar to \cite{konermann2013optical}) could be used to halt, and thus re-phase, a sub-population of polymerases at a light-dependent pause site in the template DNA, while another sub-population of polymerases reads through this pause site to maintain temporal continuity of recording; the second population could then be re-synchronized at an orthogonal light-dependent pause site while the first population reads through. Alternatively, some form of optogenetics could be used to directly write bit strings encoding time stamps into the synthesized DNA. These strategies would require one or two sufficiently strong global clock signals to be optically broadcast to all neurons. The optics involved would be comparatively simple: for instance, this could be done using far fewer optical fibers than would be required for fiber-based activity readout. Alternatively, if the brain could be flash-frozen at a precisely known time, this could serve as a global time-stamp corresponding to the termination of DNA synthesis (e.g., the DNA 3' end).

Spatial resolution for molecular recording would naturally reach the single cell level. To determine which nucleic acid tape originated from which neuron, static cell-specific DNA barcoding could be used \cite{zador12} to associate the synthesized DNA strands with nodes in a topological connectome map obtained via DNA sequencing.  Fluorescence in-situ DNA sequencing (FISSEQ) \cite{Lee2013InSitu} on serially-sectioned or intact tissue (fixed post-mortem) \cite{chung2013structural} could be used to obtain explicit geometric information.

\subsubsection{Energy Dissipation}
% swap paragraphs?
\paragraph{Nucleotide metabolism}
DNA polymerization imposes a metabolic load on the cell.
Replication of the 3 billion bp human genome takes approximately eight hours in normally dividing cells, which equates to a nucleotide incorporation rate of \SI{\ca 100}{\kHz}.
Therefore, in order not to exceed the metabolic rates associated with normal genome replication, molecular ticker tapes operating at \SI{1}{\kHz} polymerization speed~\cite{kelman95} would be limited to approximately 100 simultaneously replicated templates per cell.
Even more recordings would be possible for RNA ticker tapes.
The mammalian cell polymerizes at least \num{1e11} NTPs per 16-hour cell cycle \cite{jackson00}.
Therefore, \num{\ca 1700} RNA ticker tapes, each operating at \SI{1}{\kHz}, could be placed in a cell before generating a metabolic impact equal to that of the cell's baseline transcription rate.
While these comparisons to baseline physiological levels are reasonable guidelines, it is likely that a neuron can support higher metabolic loads associated with larger numbers of templates.
The maximal rate of ATP production by neuronal aerobic respiration is \SI{\ca 5}{\femto\mole} per minute (see the section on bio-luminescence), corresponding to \num{\ca 3e9} ATP molecules per minute. Assuming \num{\ca 1} ATP equivalent consumed per nucleotide incorporation, if neuronal metabolism were entirely dedicated to polymerization, it could support the incorporation of up to \num{\ca 3e9} nucleotides per minute, or \num{\ca 5e4} simultaneously replicated DNA templates at \SI{1}{\kHz}.
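
The two baseline rates used for the genome replication and transcription comparisons above follow from simple division (a worked check using only the figures quoted in this paragraph):
\[\frac{\num{3e9}}{\SI{8}{\hour}} = \frac{\num{3e9}}{\SI{28800}{\second}} \approx \SI{1.0e5}{\per\second}, \qquad \frac{\num{1e11}}{\SI{16}{\hour}} = \frac{\num{1e11}}{\SI{57600}{\second}} \approx \SI{1.7e6}{\per\second}\]
Dividing each by a per-template polymerization rate of \SI{1}{\kHz} gives approximately 100 DNA templates and approximately 1700 RNA templates per cell, respectively.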

\paragraph{Power dissipation}
Normal DNA and RNA synthesis do not produce problematic energy dissipation, and molecular ticker tapes will likewise not be highly dissipative, at least in the regime where nucleic acid polymerization rates do not exceed those associated with genome replication or transcription.

\subsubsection{Volume Displacement}

The nucleus of a neuron occupies \SI{\ca 6}{\percent} of a neuron's volume ($(\SI{4}{\um})^3/(\SI{10}{\um})^3$).
Ticker tapes operating at \SI{1}{\kHz} with \num{10000} simultaneously replicated templates could record for \num{300} seconds before the total length of DNA synthesized equals the human genome length.
In the case of RNA polymerase II-based transcription, \SI{2.75}{\hour} of recording by \num{10000} recorders is required to reach the net transcript length in the cell.
Therefore, with appropriate mechanisms to fold/pack the nucleic acids generated by molecular ticker tapes, they would not impose unreasonable requirements on cellular volume displacement over minutes to hours.
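
These durations follow from dividing a reference nucleic acid length $L$ by the aggregate synthesis rate $Nr$ of $N$ templates, each extended at rate $r$ (a worked check; the symbols $L$, $N$, $r$ and $t\sub{rec}$ are introduced here for convenience):
\[t\sub{rec} = \frac{L}{N r}, \qquad t\sub{rec}^{\mathrm{DNA}} = \frac{\num{3e9}}{\num{1e4}\times\SI{1e3}{\per\second}} = \SI{300}{\second}, \qquad t\sub{rec}^{\mathrm{RNA}} \approx \frac{\num{1e11}}{\num{1e4}\times\SI{1e3}{\per\second}} = \SI{1e4}{\second}\]
The RNA estimate assumes that the net transcript length can be taken as the \num{1e11} NTPs polymerized per cell cycle~\cite{jackson00}; this is an assumption about how the figure above was obtained, and it gives a little under three hours, in line with that figure.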

\subsubsection{Conclusions and Future Directions}

Molecular recording of neural activity has the advantages of inherent scalability, single-cell precision, and low energy and volume footprints.
Making molecular recording work at temporal resolutions approaching \SI{1}{\kHz}, however, will require multiple new developments in synthetic biology, including protein engineering to create a fast polymerase (\SI{>1}{\kHz}) that strongly couples proxies for neural activity to nucleotide incorporation probabilities.
Synchronization mechanisms would likely be required to perform molecular recording at single-spike temporal resolution.
An attractive potential payoff for molecular approaches to activity mapping is the prospect of seamlessly combining---within a single brain---the readout of activity patterns with the readout of structural connectome barcodes~\cite{zador12, mishchenko2010optical}, transcriptional profiles \cite{Lee2013InSitu} (e.g., to determine cell type) or other (epi-)genetic signatures \cite{sanjana2012activity} which are accessible via high-throughput nucleic acid sequencing.

\section{Discussion}

We have analyzed the physical constraints on scalable neural recording for selected modalities of measurement, data storage, data transmission and power harvesting. Each analysis is based on assumptions -- about the brain, device physics, or system architecture -- which may be violated.
Understanding these assumptions can point towards strategies to work around them, and in some cases we have suggested possible directions for such workarounds.
Even valid assumptions about natural brains may be subject to modification through synthetic biology or external perturbation.
For example, methods for rapidly removing heat from the brain could work around our assumptions about its natural cooling capacity, supporting a range of highly dissipative recording modalities. Likewise, assumptions about the necessary bandwidth for data transmission could be relaxed if some information is stored locally and read out after the fact. 

In some cases, theoretical extensions of our first-order analyses could reveal important insights. The power-bandwidth tradeoffs identified in section 4.4 for electromagnetic data transmission may place limits on the informational throughput of fMRI, for example, or a realistic simulation of heat fluxes in the brain could reveal the true limits of power dissipation. In many other cases, new experiments will be required to move beyond crude estimates of feasibility.

The analysis of physical limits illustrates challenges and opportunities for technology development. While the opportunities can only be touched upon here, and some directions have been treated elsewhere \cite{Dean2013, alivisatos13, alivisatos2012brain}, we anticipate further analyses which could explore design spaces in detail. Here we briefly summarize a sampling of new directions suggested by our analysis.

\paragraph{Electrical recording} The signal to noise ratio for a voltage sensing electrode imposes limits on the number of neurons per electrode from which signals can be detected and spike-sorted, likely requiring roughly one electrode per \num{100} neurons. To go beyond this, pure voltage sensing nodes could be augmented with the ability to directionally resolve distinct sources. For example, the 3D motion of a charged nanoparticle in an electric field, or of a dielectric nanoparticle in an electric field gradient, could be monitored at each recording site \cite{WoodPersonalCommunication}.

\paragraph{Optical recording} While light scattering creates severe limitations on optical imaging, embedded optical microscopies could overcome these limits. Embedded optical imaging systems with high signal multiplexing capacity would be desirable, to minimize the required number and size of implanted optical probes. 

One option might be to use time-of-flight information to multiplex many sensor readouts into a single optical fiber: this could potentially be realized using time-domain reflectometry techniques, commonly used to determine the positions of defects in optical fibers, coupled to neural activity sensors arranged along the fiber, which would modulate the fiber's local absorption or backscatter \cite{WoodPersonalCommunication}. Time-domain reflectometry techniques have already reached \SI{40}{\micro\meter} resolution \cite{Lamy1981fi}.

Alternatively, novel fluorescent or bio-luminescent activity indicators could in principle relax the limits associated with light scattering, either by enabling efficient two-photon excitation at lower light dosages, or through all-infrared imaging schemes. Infrared bio-luminescence may be a particularly high-value target.

\paragraph{Delivery} For both embedded optical and electrical recording strategies, new delivery mechanisms will be needed to scale to whole mammalian brains. Many of the basic parameters for scalable delivery mechanisms are still unknown. For example, can a large number of ultra-thin nano-wire electrodes or optical fibers be delivered via the capillary network? Can cells such as macrophages engulf ultra-miniaturized microchips and transport them into brain tissue? Can the blood brain barrier be locally opened (e.g., using ultrasonic stimulation \cite{hynynen2005local}) to allow targeted delivery of recording probes?

\paragraph{Intrinsic signals} The ideal technique would not require exogenous contrast agents or genetically encoded indicators, instead relying on signals intrinsic to neurophysiology. Neurons exhibit few-nanometer-scale \cite{iwasa1980swelling} membrane displacements (e.g., in response to Maxwell stresses from large local electric field variations) during the action potential \cite{oh2012label}. These can be measured using optical interferometry \cite{fang2004noncontact}, but in principle they could also be monitored acoustically (and related activity-associated membrane swellings have been directly observed by atomic force microscopy \cite{kim2007mechanical} in cultured neurons). Sensors could be embedded in or around tissue to transduce the resulting acoustic vibrations into an electrical or optical readout. This could potentially allow recording at larger distances than the \SI{\ca 130}{\micro\meter} maximum recording radius for a voltage sensing node. Other intrinsic signals include changes in refractive index associated with neural activity, which will modulate the reflection and scattering of light \cite{stepnoski1991noninvasive}. These intrinsic changes in optical properties can be measured with optical coherence tomography (OCT) \cite{lazebnik2003functional}. Local metabolic and hemodynamic signatures are also detectable optically, such as hemoglobin oxygenation (e.g., via functional near-infrared spectroscopy \cite{hoshi2003functional}) and the partial pressure of oxygen \cite{parpaleix2013imaging, lecoq2011simultaneous}. For minimal invasiveness, diffuse optical tomography uses near-infrared light (\SIrange{600}{950}{\nano\meter}), which passes sufficiently readily through the skin and skull to allow imaging of hemodynamics in cortex \cite{hillman2007optical, joseph2006diffuse, huppert200914}, although currently with limited spatial and temporal resolution.

\paragraph{Data transmission through diffusive media} Unlike radio-frequency electromagnetics, infrared wavelengths may allow spatially multiplexed data transmissions from embedded recording devices, creating multiple independent channels by taking advantage of the stochasticity of light paths in strongly scattering tissue. Alternatively, techniques are emerging to dynamically measure and invert the optical scattering matrix of a turbid medium, using pure-optical or hybrid techniques.

\paragraph{Ultrasound} Certain frequency ranges of ultrasound exhibit potentially favorable combinations of wavelength (spatial resolution), bandwidth (frequency) and attenuation compared to radio-frequency electromagnetics. Ultrasound could be used as a mechanism for powering and communicating with embedded local recording chips \cite{Seo2013}. Novel indicators \cite{ShapiroGasNanostructures} would likely need to be developed to perform neural activity imaging using pure ultrasound. Hybrid techniques such as photo-acoustic~\cite{filonov12} or ultrasound-encoded optical~\cite{wang12} microscopies are also of interest.

\paragraph{Molecular recording}  For local recording, molecular recording devices could sidestep power constraints on embedded electronics, at the cost of increased engineering complexity. For molecular recording to become practical at temporal resolutions approaching the millisecond scale, sophisticated protein and viral engineering would likely be required to create a high-speed polymerase-based recorder operating in the neuronal cytoplasm. This would also necessitate molecular synchronization or time-stamping mechanisms to maintain phasing between multiple polymerases within a single cell, as well as between different cells.

On the other hand, molecular recording devices operating at slower timescales (e.g., seconds) could perhaps be engineered via more conservative combinations of known mechanisms, such as CREB-mediated signaling to the nucleus \cite{Deisseroth2012} or nuclear-localized calcium sensing \cite{schrodel2013brain}. In either case, the nucleic acid strands resulting from such molecular recorders could be space-stamped with cell-specific viral connectome barcodes \cite{zador12} for later readout by bulk sequencing. Alternatively, the ticker tapes could be read within their anatomical contexts by in-situ sequencing, i.e., nucleic acid sequencing performed inside intact tissue \cite{Lee2013InSitu}.

\paragraph{Combining static and dynamic datasets} Combining dynamic activity information with static structural or molecular information could allow these datasets to disambiguate one another. For example, a diversity of colors for fluorescent activity indicators (i.e., a form of BrainBow~\cite{livet07} calcium imaging) could ease requirements on spatial separation of optical signals, and the color pattern across cells could be mapped post-mortem at single-cell resolution using in-situ microscopy. Generalizing further, in-situ sequencing enables the extraction of vast quantities of molecular data from fixed tissue, in effect allowing observations with a palette of $4^N$ colors, where $N$ is the length of the nucleic acid polymer. It may be possible to harness this exponential informational resource to enhance the readout of dynamic activity information as well, e.g., through molecular recording.
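
As a rough illustration of this scaling (the neuron counts used here, approximately \num{1e8} for a mouse brain and \num{1e11} for a human brain, are round-number assumptions), the sequence length needed to give each of $M$ neurons a distinct ``color'' satisfies
\[N \geq \log_4 M = \frac{\log_{10} M}{\log_{10} 4}\]
so that $N=14$ suffices for $M=\num{1e8}$ (since $4^{14}\approx\num{2.7e8}$) and $N=19$ suffices for $M=\num{1e11}$ (since $4^{19}\approx\num{2.7e11}$). If labels are assigned at random rather than uniquely, somewhat longer sequences would be needed to keep collisions rare.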

\paragraph{MRI} Current MRI is limited by its reliance on intrinsic hemodynamic contrast mechanisms and on rapidly diffusing aqueous protons. Indicators coupling neural activity to spin relaxation rates are being developed to move beyond hemodynamic contrast. Novel excitation and detection schemes that could sensitize MRI to fast, local, intrinsically activity-dependent mechanisms (e.g., cell swelling, neuronal magnetic fields), while filtering out the slower BOLD response, are also of interest and should initially be tested in organisms or slice preparations lacking hemodynamic responses. Detailed computational models of neuronal currents within a tissue voxel (e.g., in the spirit of \cite{reimann2013biophysically}), and of the resulting mechanical and chemical changes, could be useful for evaluating potential new methods. In principle, MRI could also abandon the use of water protons as the signal sources, although this would pose significant implementation challenges.

\paragraph{Readout methods} New signal processing frameworks such as compressive sensing could reduce bandwidth requirements and inspire new microscope designs exploiting computational imaging principles \cite{raskar2009computational, velten2012recovering, kim2010next, pnevmatikakissparse}. Fast readout mechanisms \cite{lauxtermann2001mega} applied to giga-pixel arrays (e.g., the 3.2 giga-pixel CCD camera planned for the Large Synoptic Survey Telescope, which will have \SI{\ca 1}{\second} readout time) might be adapted to large-scale electrical or optical recording methods. Linear photodiode arrays can achieve \SI{70}{\kilo\hertz} line readout rates \cite{PSeriesLinearArrayImager}, and many such linear arrays could be read out in parallel. Optoelectronic methods that convert between time, space and frequency representations of signals \cite{goda2012high, goda2009serial, goda2008amplified, goda2009theory, mahjoubfar2011high, tsia2010performance, goda2013dispersive} could inspire designs for even faster readouts (e.g., \SI{\ca 10}{\mega\hertz} frame rates have been demonstrated in brightfield imaging). Although these methods are not directly compatible with fluorescence measurements due to their use of spectral dispersion, related ideas (e.g., beat frequency multiplexing) may enable fluorescence microscopy at rates above those of CCD-based imaging \cite{diebold2013digitally, ducros2013encoded}, limited ultimately by fluorescence lifetimes, while also exhibiting favorable properties with respect to scattering.
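
To give a sense of scale for the readout figures quoted above (a rough sketch; the line length of $1024$ pixels is an assumed, illustrative value and not a specification of any particular device), the corresponding pixel throughputs are
\[
  R_{\mathrm{CCD}} \approx \frac{3.2 \times 10^{9}\ \mathrm{pixels}}{1\ \mathrm{s}} = 3.2 \times 10^{9}\ \mathrm{pixels/s},
  \qquad
  R_{\mathrm{line}} \approx 7 \times 10^{4}\ \mathrm{lines/s} \times 1024\ \mathrm{pixels/line} \approx 7.2 \times 10^{7}\ \mathrm{pixels/s} .
\]
Under these assumptions, on the order of $R_{\mathrm{CCD}} / R_{\mathrm{line}} \approx 45$ linear arrays read out in parallel would match the aggregate pixel rate of the giga-pixel camera.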

\paragraph{Alternative modalities} X-ray imaging has been used on live cells~\cite{moosmann13} and might find use in neural recording if suitable contrast agents could be devised.
X-rays interact with electron shells via photoelectric absorption and Compton scattering, and with the band structure of materials.
X-ray phosphors utilize substitutions in an ionic lattice to generate visible or UV light emission upon X-ray absorption~\cite{issler95}.
In principle, some of these mechanisms could be engineered as neural activity sensors, e.g., in an absorption-contrast mode suitable for tomographic reconstruction~\cite{larabell04}. While tissue damage due to ionizing radiation would ultimately be prohibitive (e.g., on a timescale of minutes \cite{WoodPersonalCommunication}), very brief experiments might still be possible.

Likewise, electron spin resonance (ESR) operates, at a given magnetic field, at a $\ca$660$\times$ higher Larmor frequency than proton MRI, which increases the thermal polarization of the spins.
Due to Pauli exclusion, use of this technique requires an indicator with unpaired electrons. These can be found in nitrogen vacancy diamond nano-crystals~\cite{horowitz12} (nano-diamonds), which are also sensitive to voltage~\cite{dolde11} and to magnetic fields~\cite{Hall2012}, and are amenable to optical control and fluorescent readout of the spin state (although the 2P cross-section of the $(N-V)^-$ center appears to be relatively low~\cite{Tse-Luen2007}).
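
The frequency and polarization scaling can be made explicit with a short estimate (a sketch assuming equal applied field $B$ and spin-$1/2$ ensembles in thermal equilibrium at temperature $T$; the symbols $\gamma_e$, $\gamma_p$, $T$ and $P$ are introduced here for illustration only). With Larmor frequency $\omega = \gamma B$,
\[
  \frac{\omega_e}{\omega_p} = \frac{\gamma_e}{\gamma_p}
  \approx \frac{28.02\ \mathrm{GHz/T}}{42.58\ \mathrm{MHz/T}} \approx 658,
  \qquad
  P = \tanh\!\left(\frac{\hbar \gamma B}{2 k_B T}\right) \approx \frac{\hbar \gamma B}{2 k_B T},
\]
so that, in the high-temperature limit, the equilibrium polarization grows in proportion to the Larmor frequency. The ratio realized in practice depends on the fields actually used, since ESR is often performed at lower fields than MRI.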

\paragraph{Hybrid systems} New mergers of input, sensing, and readout modalities can work around complex engineering constraints. Electrical or acoustic sensors could be used with optical \cite{sadek2010wiring} (e.g., fiber) or ultrasonic readouts and power supplies. An MRI machine could interact with embedded electrical circuits powered by neural activity \cite{JasanoffInductorsGrant}. Linking electrical recording with embedded optical microscopies or other spatially resolved methods could circumvent the limits of purely electrical spike sorting. Optical techniques such as holography or 4D light fields could generalize to ultrasound or microwave implementations. Consideration of analogies and synergies between fields suggests a combinatorial space of possibilities.

Our goal here has not been to pick winning technologies (which may not yet have been conceived), but to aid a multi-disciplinary community of researchers in analyzing the problem. 
The challenge of observing the real-time operation of entire mammalian brains requires a return to first principles, and a fundamental reconsideration of the architectures of neural recording systems.
We hope that knowledge of the constraints governing scalable neural recording will enable the invention of entirely new, transformative approaches.

\section{Acknowledgments}

We thank K. Esvelt for helpful discussions on bioluminescent proteins; D. Boysen for help on the fuel cell calculations; R.~Tucker and E.~Yablonovitch (\url{http://www.e3s-center.org}) for helpful discussions on the energy efficiency of CMOS; C.~Xu and C.~Schaffer for data on optical attenuation lengths; T. Dean and the participants in his CS379C course at Stanford/Google, including Chris Uhlik and Akram Sadek, for helpful discussions and informative content in the discussion notes (\url{http://www.stanford.edu/class/cs379c/}); and L.~Wood, R.~Koene, S.~Rezchikov, A.~Bansal, J.~Lovelock, A.~Payne, R.~Barish, N.~Donoghue, J.~Pillow, W.~Shih, P.~Yin and J.~Hewitt for helpful discussions and feedback on earlier drafts.

A.~Marblestone is supported by a Fannie and John Hertz Foundation fellowship.
D.~Dalrymple is supported by the Thiel Foundation.
K.~Kording is funded in part by the Chicago Biomedical Consortium with support from the Searle Funds at The Chicago Community Trust.
E.~Boyden is supported by the National Institutes of Health (NIH), the National Science Foundation, the MIT
McGovern Institute and Media Lab, the New York Stem Cell Foundation Robertson Investigator
Award, the Human Frontier Science Program, and the Paul Allen Distinguished Investigator in
Neuroscience Award.
B.~Stranges, B.~Zamft, R.~Kalhor and G.~Church acknowledge support from the Office of Naval Research and the NIH Centers of Excellence in Genomic Science.
M.~Shapiro is supported by the Miller Research Institute, the Burroughs~Wellcome Career~Award~at~the~Scientific Interface and the W.M. Keck Foundation.

\printbibliography[notsubtype=hide]

\end{document}