Ivanova, Schrimpf, Anzellotti, Zaslavsky, Fedorenko, and Isik (pp2021) discuss whether mapping models should be linear or nonlinear. This paper is part of a Cognitive Computational Neuroscience 2020 Generative Adversarial Collaboration, with the goal to resolve an important controversy in the field.
The authors usefully define the term mapping model in contradistinction to models of brain function. A mapping model specifies the mapping between a model of brain function (some brain-representational model) and brain-activity measurements. A mapping model can relate brain-activity measurements to different types of brain-representational model: (1) descriptions of the stimuli, (2) descriptions of behavioral responses, (3) activity measurements in other brains or other brain regions, or (4) the units in some layer of a neural network model. Moreover, mapping models can operate in either direction: from the measured brain activity to the features of the representational model (decoding model) or from the model features to the measured brain activity (encoding model). Figures 1 and 2 of the paper very clearly lay out these important distinctions.
To begin addressing the question of which mapping models should be used, the authors consider three desiderata: (1) predictive accuracy, (2) interpretability, and (3) biological plausibility. Predictive accuracy tends to favor more complex and nonlinear models (assuming we have enough data for fitting), whereas simpler and linear models may be easier to interpret in general. Biological plausibility would appear to be irrelevant if the mapping model is not considered a model of brain function. However, in the context of an encoding model, for example, we may want the mapping model to capture physiological processes such as the hemodynamics and nonphysiological processes such as the averaging in voxels, neither of which may be considered part of the brain-computational process that is the ultimate target of our investigation.
The authors make many reasonable points about linear and nonlinear mapping models and conclude by suggesting that rather than the linear/nonlinear distinction, we should consider more general notions of the complexity of the mapping model. They suggest that researchers consider a range of possible mapping models and estimate their complexity. They discuss three measures of complexity: the number of parameters, the minimum description length, and the amount of fitting data needed for a model to achieve a given level of predictive accuracy.
The paper makes a good contribution by beginning a broader discussion about mapping models and putting the pieces of the puzzle on the table. However, a problem is that the arguments are not developed in the context of clearly defined research goals. The three desiderata (predictive accuracy, interpretability, and biological plausibility) are referred to as “goals” in the paper and further differentiated in Fig. 3:
- predictive accuracy
  - compare competing feature sets
  - decode features from neural data
  - build maximally accurate models of brain activity
- interpretability
  - examine individual features
  - test representational geometry
  - interpret feature sets
- biological plausibility
  - incorporate physiological properties of the measurements
  - simulate downstream neural readout
A lot of thought clearly went into this structure, which serves to enable insights at a more general level about the mapping model: for all cases where we desire biological plausibility, interpretability, or predictive accuracy. However, the cost of this abstraction is too great. Arguments for particular choices of mapping model are compelling only in the context of more specifically defined research goals that actually motivate researchers to conduct studies.
Neither the three top-level desiderata, nor the more specific objectives really capture the goals that motivate researchers. We don’t do studies to achieve “predictive accuracy”. Rather our goal may be to adjudicate among different computational models that implement hypotheses about brain information processing. The models’ predictive accuracy is used as a performance statistic to inferentially compare the models.
The goal to compare brain-computational models, for example, is difficult to localize in the list. It is related to “comparing competing feature sets”, “building accurate model of brain activity”, “biological plausibility”, and “testing representational geometry”, but each of these captures only part of the goal to test brain-computational models.
On a similar note, I would argue that “decoding features” is not a research goal. The relevant research goal could be defined as “testing a brain region for the presence of particular information” or “testing whether particular information is explicitly encoded in a brain region”.
It would help to start with research goals that really capture scientists’ motivation for conducting studies that use mapping models, and then to discuss the merits of particular choices of mapping model in each of these contexts. Some research goals are: testing if certain information is present in a region, testing if it is present in a particular format, adjudicating among representational models, and adjudicating among brain-computational models. Starting with these would make it easier for the reader to follow, and would enable the authors to make some of the arguments already made (e.g. that testing for the presence of information can benefit from nonlinear decoders) more compellingly. It might also lead to additional insights.
An important question is how this CCN Generative Adversarial Collaboration (GAC) can lead to progress beyond this position paper. One topic for further study is the suggestion made at the end that a variety of mapping models should be considered and compared in terms of their complexity and predictive accuracy. This suggestion seems potentially important, but would need (1) careful motivation in the context of particular research goals and (2) more research that develops and validates methods for actually exploring the space of mapping models with flexible regularization. This could be the basis for the aim of the GAC to lead to new research that resolves some challenge or controversy.
Specific comments
Is it that simple? Linear mapping models in cognitive neuroscience
When I read the title, I want to ask back: Is what exactly that simple? What is it? I might interpret the question in the context of the research goal I most care about (adjudicating among brain-computational theories). In that context, I guess, I’m on team linear. (I want to confine nonlinearities to the brain-computational model.) But the vagueness entailed by the absence of explicit research goals starts right there in the title.
If the features are pixels, the answer might be different than if the features are semantic stimulus descriptors (e.g. nonlinear for pixels, linear for semantic features if we are looking for their explicit representation in the brain). If the brain responses are single-cell recordings, the answer might be different than if the brain responses are fMRI voxels (in the latter case, we may want the mapping model to capture averaging within voxels). If the goal is to reveal whether particular information is present in a brain region, we might want to use a nonlinear decoding analysis. If the goal is to reveal whether particular information is explicitly encoded in the sense of linear decodability, we might want to use a linear decoding analysis. If the goal is to test a brain-computational model of perception, the answer will depend on whether the mapping model is supposed to serve solely the purpose of mapping model representations to brain representations, or whether it is supposed to be interpreted as part of the brain-computational model (i.e. whether we intend to use the brain-activity data to learn parameters of the computation we are modeling).
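The difference between testing for the presence of information and testing for linear decodability can be illustrated with a toy example. The sketch below is hypothetical and not taken from the paper: it assumes two noise-free response channels and an XOR-like stimulus label, and it fits and evaluates on the same four points, which is only acceptable because the example contains no noise. The names `R`, `y`, and `fit_predict` are introduced here for illustration.

```python
import numpy as np

# Hypothetical responses of two channels to four conditions (rows),
# with a binary stimulus label that follows an XOR pattern.
R = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])

def fit_predict(features, targets):
    """Least-squares readout with an intercept; returns fitted predictions."""
    design = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return design @ weights

# Linear decoder: uses the two channels as they are.
linear_pred = fit_predict(R, y)

# Nonlinear decoder: adds the product of the two channels as a feature.
R_nonlinear = np.column_stack([R, R[:, 0] * R[:, 1]])
nonlinear_pred = fit_predict(R_nonlinear, y)

linear_is_uninformative = np.allclose(linear_pred, 0.5)
nonlinear_is_exact = np.allclose(nonlinear_pred, y)
print(linear_is_uninformative, nonlinear_is_exact)
```

In this toy case the label information is fully present in the responses, yet the linear readout cannot separate the two classes. Which of the two analyses is appropriate depends on which of the two research goals is being pursued.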
Figure 1 is great, because it usefully lays out a number of different scenarios in which mapping models are commonly used. These scenarios each require separate discussion. It might be useful to include a table with a row for each combination of research goal, domain, and data. Given this essential context, we can have a useful discussion about the pros and cons of linear and nonlinear mapping models with particular priors on their parameters.
“1:1 mapping”, “perfect features”
A linear mapping is much more general than a 1:1 mapping. Which of these is meant here? The term “perfect features” is used as though it’s clear how it is to be defined. But that’s exactly the question to be addressed: Should we require the brain-computational model units to be related to neural responses by a 1:1 mapping, an orthogonal linear transform (which would imply matching geometries), a sparse linear transform, a general linear transform, a particular nonlinear transform, or any nonlinear transform (which would imply merely that the model encodes the information present in the neural population)?
3.1.3. Build accurate models of brain data. Finally, some researchers are trying to build accurate models of the brain that can replace experimental data or, at least, reduce the need for experiments by running studies in silico (e.g., Jain et al., 2020; Kell et al., 2018; Yamins et al., 2014).
“Building models of data” may describe a frequent activity. But I’d say it should be motivated by some larger goal (such as testing a theory). It’s also unclear how models can or why they should replace data when the purpose of the latter is to test the former.
3.2.2. Test representational geometry: […] do features X, generated by a known process, accurately describe the space of neural responses Y? Thus, the feature set becomes a new unit of interpretation, and the linearity restriction is placed primarily to preserve the overall geometry of the feature space. For instance, the finding that convolutional neural networks and the ventral visual stream produce similar representational spaces (Yamins et al., 2014) allows us to infer that both processes are subject to similar optimization constraints (Richards et al., 2019). That said, mapping models that probe the representational geometry of the neural response space do not have to be linear, as long as they correspond to a well-specified hypothesis about the relationship between features and data.
This doesn’t make sense to me. A linear mapping does not in general preserve the representational geometry. Only a particular class of linear mappings (orthogonal linear transformations) preserves the geometry (distances and inner products, and thus angles).
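The point about geometry can be checked numerically. The following is a minimal sketch on made-up random features, not an analysis from the paper: it compares pairwise Euclidean distances before and after an orthogonal transform and before and after a general linear transform. The matrices `Q` and `A` and the helper `pairwise_distances` are illustrative assumptions.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))  # 6 hypothetical stimuli x 4 model features

# Orthogonal transform: the Q factor of a QR decomposition of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# General linear transform: stretches the first feature axis by a factor of 10.
A = np.diag([10.0, 1.0, 1.0, 1.0])

D_original = pairwise_distances(X)
D_orthogonal = pairwise_distances(X @ Q)
D_general = pairwise_distances(X @ A)

orthogonal_preserves = np.allclose(D_original, D_orthogonal)
general_preserves = np.allclose(D_original, D_general)
print(orthogonal_preserves, general_preserves)
```

The orthogonal transform leaves all pairwise distances unchanged, whereas the stretching transform, although linear, distorts them.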
If a mapping model achieves good predictivity, we can say that a given set of features is reflected in the neural signal. In contrast, if a powerful mapping model trained on a large set of data achieves poor predictivity, it provides strong evidence that a given feature set is not represented in the neural data.
Absence of evidence is not evidence of absence. “Poor predictivity” doesn’t provide “strong evidence” that the neural population doesn’t encode what we fail to find in the data.
3.3. Biological plausibility. In addition to prediction accuracy and interpretability-related considerations, biological plausibility can also be a factor in deciding on the space of acceptable feature-brain mappings. We discuss two goals related to biological plausibility: simulating linear readout and accounting for physiological mechanisms affecting measurement.
Figure 2 suggests that the mapping model is not part of the brain model, so why does biological plausibility matter?
Even a relatively ‘constrained’ linear classifier can read out many features from the data, many of them biologically implausible (e.g., voxel-level ‘biases’ that allow orientation decoding in V1 using fMRI; Ritchie et al., 2019).
If a linear readout from voxels is possible, then a linear readout from neurons should definitely be possible. What does it mean to say the decoded features are biologically implausible? (Many of the other points in this section seem important and solid, though.)
Even with infinite data, certain measurement properties might force us to use a particular mapping class. For instance, Nozari et al. (2020) show that fMRI resting state dynamics are best modeled with linear mappings and suggest that fMRI’s inevitable spatiotemporal signal averaging might be to blame (although see Anzellotti et al., 2017, for contrary evidence).
Do Nozari et al. have “infinite data”? I also don’t understand what’s meant by saying “resting state dynamics are best modeled with linear mappings”. Are we talking about linear dynamics or linear mapping models? What is the mapping from and to?
3.3.2. Incorporate physiological mechanisms affecting measurement
It’s not just physiological mechanisms, but also other components of the measurement process. For example, the local averaging in fMRI voxels may be accounted for by averaging of the units of a neural network model, which can be achieved in the framework of linear encoding models.
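To make the voxel-averaging point concrete, here is a minimal sketch assuming a hypothetical model layer with 12 units, where each simulated voxel pools 4 contiguous units with equal weight. The pooling scheme, the sizes, and the variable names are illustrative assumptions. The sketch shows only that local averaging can be written as a fixed weight matrix inside a linear encoding model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_units, units_per_voxel = 5, 12, 4
n_voxels = n_units // units_per_voxel

# Hypothetical model-unit activations (stimuli x units).
unit_activity = rng.standard_normal((n_stimuli, n_units))

# Averaging weights: each voxel pools one contiguous group of units equally.
W = np.zeros((n_units, n_voxels))
for v in range(n_voxels):
    start = v * units_per_voxel
    W[start:start + units_per_voxel, v] = 1.0 / units_per_voxel

# Linear encoding model: predicted voxel responses (stimuli x voxels).
voxel_pred = unit_activity @ W

# The same result computed by explicitly averaging within each group of units.
explicit_average = unit_activity.reshape(
    n_stimuli, n_voxels, units_per_voxel
).mean(axis=2)

averaging_is_linear = np.allclose(voxel_pred, explicit_average)
print(averaging_is_linear)
```

In practice the weights would be unknown and would have to be fitted, possibly with a prior favoring nonnegative, spatially local weights.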
Dear Niko,
Thank you for a comprehensive and helpful review. We have gone through several rounds of discussion within our team to improve and refine the paper. We have resubmitted our manuscript and are looking forward to further discussion.
We are sharing our response to the review here for anyone interested:
https://docs.google.com/document/d/1ucc3jJUjyK6PtVA9t3nbDsNG_8u7hRkm1DN_gD3Wgyo/edit?usp=sharing
We have also updated our manuscript preprint:
https://www.biorxiv.org/content/10.1101/2021.04.02.438248v2
Best,
Anna Ivanova on behalf of all authors