Different categorical divisions become prominent at different latencies in the human ventral visual representation

 

[Below is my secret peer review of Cichy, Pantazis & Oliva (2014). The review below applies to the version as originally submitted, not to the published version that the link refers to. Several of the concrete suggestions for improvements below were implemented in revision. Some of the more general remarks on results and methodology remain relevant and will require further studies to completely resolve. For a brief summary of the methods and results of this paper, see Mur & Kriegeskorte (2014).]

This paper describes an excellent project, in which Cichy et al. analyse the representational dynamics of object vision using human MEG and fMRI on a set of 96 object images whose representation in cortex has previously been studied in monkeys and humans. The previous studies provide a useful starting point for this project. However, the use of MEG in humans and the combination of MEG and fMRI enable the authors to characterise the emergence of categorical divisions at a level of detail that has not previously been achieved. The general approaches of MEG-decoding and MEG-RSA pioneered by Thomas Carlson et al. (2013) are taken to a new level here by using a richer set of stimuli (Kiani et al. 2007; Kriegeskorte et al. 2008). The experiment is well designed and executed, and the general approach to analysis is well-motivated and sound. The results are potentially of interest to a broad audience of neuroscientists. However, the current analyses lack some essential inferential components that are necessary to give us full confidence in the results, and I have some additional concerns that should be addressed in a major revision as detailed below.

 

MAJOR POINTS

(1) Confidence intervals and inference for decoding-accuracy and RDM-correlation time courses and peak-latency effects

Several key inferences depend on comparing decoding accuracies or RDM correlations as a function of time, but the reliability of these estimates is not assessed. The paper also currently gives no indication of the reliability of the peak latency estimates. Latency comparisons are not supported by statistical inference. This makes it difficult to draw firm conclusions. While the descriptive analyses presented are very interesting and I suspect that most of the effects the authors highlight are real, it would be good to have statistical evidence for the claims. For example, I am not confident that the animate-inanimate category division peaks at 285 ms. This peak is quite small and on top of a plateau. Moreover, the time the category clustering index reaches the plateau (140 ms) appears more important. However, interpretation of this feature of the time course, as well, would require some indication of the reliability of the estimate.

I am also not confident that the RDM-correlation between the MEG and V1-fMRI data really has a significantly earlier peak than the RDM-correlation between the MEG and IT-fMRI data. This confirms our expectations, but it is not a key result. Things might be more complicated. I would rather see an unexpected result of a solid analysis than an expected result of an unreliable analysis.

Ideally, adding 7 more subjects would allow random effects analyses. All time courses could then be presented with error margins (across subjects, supporting inference to the population by treating subjects as a random-effect dimension). This would also lend additional power to the fixed-effects inferential analyses.

However, if the cost of adding 7 subjects is considered too great, I suggest extending the approach of bootstrap resampling of the image set. This would provide reliability estimates (confidence intervals) for all accuracy estimates and peak latencies and support testing peak-latency differences. Importantly, the bootstrap resampling would simulate the variability of the estimates across different stimulus samples (from a hypothetical population of isolated object images of the categories used here). It would, thus, provide some confidence that the results are not dependent on the image set. Bootstrap resampling each category separately would ensure that all categories are equally represented in each resampling.
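To illustrate, here is a minimal numpy sketch of the stratified bootstrap I have in mind. The array layout and names (`acc` as an images × images × time stack of pairwise decoding accuracies) are assumptions for illustration, not the authors' actual data structures:

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_bootstrap_peak_ci(acc, times, category_labels,
                                 n_boot=500, alpha=0.05):
    """Percentile confidence interval for the peak latency of the mean
    decoding-accuracy time course, bootstrap-resampling images within
    each category.

    acc: (n_images, n_images, n_times) pairwise decoding accuracies
    times: (n_times,) time axis in ms
    category_labels: (n_images,) category index per image
    """
    peaks = np.empty(n_boot)
    for b in range(n_boot):
        # resample images with replacement, separately per category, so
        # every category keeps its original number of exemplars
        idx = np.concatenate([
            rng.choice(np.flatnonzero(category_labels == c),
                       size=(category_labels == c).sum(), replace=True)
            for c in np.unique(category_labels)])
        sub = acc[np.ix_(idx, idx)]                # resampled matrix stack
        off_diag = ~np.eye(len(idx), dtype=bool)   # drop self-pairs
        mean_tc = sub[off_diag].mean(axis=0)       # (n_times,) time course
        peaks[b] = times[np.argmax(mean_tc)]
    return np.percentile(peaks, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

The same resampled matrices can feed every other statistic (clustering indices, MEG-fMRI RDM correlations), so one resampling loop yields confidence intervals for all of them, and peak-latency differences can be tested from the bootstrap distribution of the difference.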

In addition, I suggest enlarging the temporal sliding window in order to stabilise the time courses, which look a little wiggly and might otherwise give unstable estimates of magnitudes and latencies across bootstrap samples – e.g. the 285 ms animate-inanimate discrimination peak. This will smooth the time courses appropriately and increase the power. A simple approach would be to use bigger time steps as well, e.g. 10- or 20-ms bins. This would provide more power under Bonferroni correction across time. Alternatively, the false-discovery rate could be used to control false positives. This would work equally well for overlapping temporal windows (e.g. a 20-ms window with 1-ms steps).
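A sketch of the two ingredients (binning into coarser windows and Benjamini-Hochberg FDR control), again with illustrative names and numpy only:

```python
import numpy as np

def bin_timecourse(tc, times, width=20):
    """Average a fine-grained time course into non-overlapping bins of
    `width` ms; returns binned values and bin-centre times."""
    edges = np.arange(times[0], times[-1] + width, width)
    which = np.digitize(times, edges) - 1
    bins = np.unique(which)
    binned = np.array([tc[which == b].mean() for b in bins])
    centres = edges[bins] + width / 2
    return binned, centres

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg procedure: boolean mask of the p-values that
    survive FDR control at level q (valid under the positive dependence
    expected for overlapping sliding windows)."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if below.any():
        # largest i with p_(i) <= q*i/m; all smaller ranks also pass
        keep[order[:np.flatnonzero(below).max() + 1]] = True
    return keep
```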

 

(2) Testing linear separability of categories

The present version of the analyses uses averages of pairwise stimulus decoding accuracies. The decoding accuracies serve as measures of representational discriminability (a particular representational distance measure). This is fine and interpretable. The average between-category minus the average within-category discriminability is a measure of clustering, which is in a sense a stronger result than linear decodability. However, it would be good to see whether linear decoding of each category division reveals additional or earlier effects. While your clustering index essentially implies linear separability, the opposite is not true. For example, two category regions arranged like stacked pancakes could be perfectly linearly discriminable while having no significant clustering (i.e. no difference between the within- and between-category discriminabilities of image pairs). Like this:

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Each number indexes a category and each repetition represents an exemplar. The two lines illustrate the pancake situation. If the vertical separation of the pancakes is negligible, they are perfectly linearly discriminable, despite a negligible difference between the average within-category and the average between-category distance. It would be interesting to see these linear decoding analyses performed using either independent response measurements for the same stimuli or an independent (held-out) set of stimuli as the test set. This would more profoundly address different aspects of the representational geometry.

For the same images in the test set, pairwise discriminability in a high-dimensional space strongly suggests that any category dichotomy can be linearly decoded. Intuitively, we might expect the classifier to generalise well to the extent that the categories cluster (within distances < between distances) – but this need not be the case (e.g. the pancake scenario might also afford good generalization to new exemplars).
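A toy simulation makes the pancake point concrete: with Euclidean distance standing in for representational discriminability, the clustering index is near zero while a simple linear readout, tested on held-out exemplars, is essentially perfect. (All names and parameters here are for illustration only.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Two thin parallel "pancakes": wide scatter along 49 dimensions,
# separated only by a small offset on dimension 0.
n, d, gap = 100, 50, 0.2
X1 = rng.normal(size=(n, d)); X1[:, 0] = +gap / 2 + 0.01 * rng.normal(size=n)
X2 = rng.normal(size=(n, d)); X2[:, 0] = -gap / 2 + 0.01 * rng.normal(size=n)
X = np.vstack([X1, X2])
y = np.r_[np.ones(n), -np.ones(n)]

# Clustering index: average between- minus average within-category distance
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
iu = np.triu_indices(n, 1)
within = np.r_[D[:n, :n][iu], D[n:, n:][iu]].mean()
between = D[:n, n:].mean()
clustering = between - within          # near zero: no category clustering

# Least-squares linear readout, tested on a held-out half of the exemplars
train = np.r_[0:n:2, n:2 * n:2]
test = np.r_[1:n:2, n + 1:2 * n:2]
w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
accuracy = np.mean(np.sign(X[test] @ w) == y[test])  # near 1: separable
```

The clustering index is swamped by the shared within-pancake scatter, yet the categories generalise perfectly to new exemplars across the linear boundary.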

 

(3) Circularity of peak-categorical MDS arrangements

The peak MDS plots are circular in that they serve to illustrate exactly the effect (e.g. animate-inanimate separation) that the time point has been chosen to maximise. This circularity could easily be removed by selecting the time point for each subject based on the other subjects’ data. The accuracy matrices for the selected time points could then be averaged across subjects for MDS.
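A sketch of the leave-one-subject-out selection (for simplicity I use the overall mean accuracy as the selection statistic; in practice it would be the categorical effect of interest, e.g. the animacy clustering index):

```python
import numpy as np

def noncircular_peak_matrix(acc):
    """acc: (n_subjects, n_images, n_images, n_times) pairwise decoding
    accuracies. For each subject, select the peak time point from the
    OTHER subjects' mean time course, then average the selected matrices
    across subjects, yielding a non-circular input matrix for MDS."""
    n_subj = acc.shape[0]
    selected = []
    for s in range(n_subj):
        others = np.delete(np.arange(n_subj), s)
        tc = acc[others].mean(axis=(0, 1, 2))  # group time course without s
        selected.append(acc[s, :, :, np.argmax(tc)])
    return np.mean(selected, axis=0)
```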

 

(4) Test individual-level match of MEG and fMRI

It is great that fMRI and MEG data were acquired in the same participants. This invites an analysis of whether individual idiosyncrasies in object processing are consistently reflected in both fMRI and MEG. One way to investigate this would be to correlate single-subject RDMs between MEG and fMRI, within and between subjects. If the within-subject MEG-fMRI RDM correlation is greater (at any time point), then MEG and fMRI consistently reflect individual differences in object processing.
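For concreteness, a sketch of the within- versus between-subject comparison, assuming vectorised upper-triangle RDMs as input (with a numpy-only Spearman that ignores ties, for illustration):

```python
import numpy as np

def spearman(a, b):
    """Spearman correlation via ranks (ties ignored for simplicity)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

def individual_match(meg_rdms, fmri_rdms):
    """meg_rdms, fmri_rdms: (n_subjects, n_pairs) vectorised RDMs from
    matched subjects (MEG at one time point; fMRI from one ROI).
    Returns the mean within-subject and mean between-subject Spearman
    correlation; within > between indicates that the two modalities
    consistently reflect individual differences."""
    n = meg_rdms.shape[0]
    r = np.array([[spearman(meg_rdms[i], fmri_rdms[j])
                   for j in range(n)] for i in range(n)])
    within = np.diag(r).mean()
    between = r[~np.eye(n, dtype=bool)].mean()
    return within, between
```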

 

MINOR POINTS

Why average trials before decoding? Decoding accuracy then loses its meaning as an absolute measure. Single-trial decoding accuracy and information in bits would be interesting to see and useful to compare to later studies.

The stimulus-label randomisation test for category clustering (avg(between)-avg(within)) is fine. However, the bootstrap test as currently implemented might be problematic.

“Under the null hypothesis, drawing samples with replacement from D0 and calculating their mean decoding accuracy daDo, daempirial should be comparable to daD0. Thus, assuming that D has N labels (e.g. 92 labels for the whole matrix, or 48 for animate objects), we draw N samples with replacement and compute the mean daD0 of the drawn samples.”

I understand the motivation for this procedure and my intuition is that this test is likely to work, so this is a minor point. However, subtracting the mean empirical decoding accuracy might not be a valid way of simulating the null hypothesis. Accuracy is a bounded measure and its distribution is likely to be wider under the null than under H1. The test is likely to be valid, because under H0 the simulation will be approximately correct. However, to test if some statistic significantly exceeds some fixed value by bootstrapping, you don’t need to simulate the null hypothesis. Instead, you simulate the variability of the estimate and obtain a 95%-confidence interval by bootstrapping. If the fixed value falls outside the interval (which is only going to happen in about 5% of the cases under H0), then the difference is significant. This seems to me a more straightforward and conventional test and thus preferable. (Note that this uses the opposite tail of the distribution and is not equivalent here because the distribution might not be symmetrical.)
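A sketch of the percentile-bootstrap alternative (illustrative names; resampling here is across the pairwise accuracies, but the same logic applies to resampling stimuli):

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_ci_vs_chance(pair_accuracies, chance=50.0,
                           n_boot=10000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean decoding
    accuracy. Rather than simulating the null hypothesis, we simulate
    the variability of the estimate; the effect is significant
    (one-tailed) if the chance level falls below the interval."""
    a = np.asarray(pair_accuracies, float)
    means = np.array([rng.choice(a, size=a.size, replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return (lo, hi), bool(chance < lo)
```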

Fig. 1: Could use two example images to illustrate the pairwise classification.

It might be good here to see real data in the RDM on the upper right (for one time point) to illustrate.

“non-metric scaling (MDS)”: add multidimensional

Why non-metric? Metric MDS might more accurately represent the relative representational distances.

“The results (Fig. 2a, middle) indicate that information about animacy arises early in the object processing chain, being significant at 140 ms, with a peak at 285 ms.”
I would call that late – relative to the rise of exemplar discriminability.

“We set the confidence interval at p=”
This is not the confidence interval, this is the p value.

“To our knowledge, we provide here the first multivariate (content-based) analysis of face- and body-specific information in EEG/MEG.”

The cited work by Carlson et al. (2012) shows very similar results – but for fewer images.

“Similarly, it has not been shown previously that a content-selective investigation of the modulation of visual EEG/MEG signals by cognitive factors beyond a few everyday categories is possible24,26.”
Carlson et al. (2011, 2012; refs 22,23) do show similar results.

I don’t understand what you mean by “cognitive factors” here.

Fig S3d
What are called “percentiles”, are not percentiles: multiply by 100.

“For example, a description of the temporal dynamics of face representation in humans might be possible with a large and rich parametrically modulated stimulus set as used in monkey electrophysiology 44.”
Should cite Freiwald (43), not Shepard (44) here.

 

LANGUAGE, LOGIC, STYLE, AND GRAMMAR

Although the overall argument is very compelling, there were a number of places in the manuscript where I came across weaknesses of logic, style, and grammar. The paper also had quite a lot of typos. I list some of these below, to illustrate, but I think the rest of the text could use more work as well to improve precision and style.

One stylistic issue is that the paper switches between present and past tense without proper motivation (general statement versus report of procedures used in this study).

Abstract: “individual image classification is resolved within 100ms”

Who’s doing the classification here? The brain or the researchers? Also: discriminating two exemplars is not classification (except in a pattern-classifier sense). So this is not a good way to state this. I’d say individual images are discriminated by visual representations within 100 ms.

“Thus, to gain a detailed picture of the brain’s [] in object recognition it is necessary to combine information about where and when in the human brain information is processed.”

Some phrase missing there.

“Characterizing both the spatial location and temporal dynamics of object processing

demands innovation in neuroimaging techniques”

A location doesn’t require characterisation, only specification. But object processing does not take place in one location. (Obviously, the reader may guess what you mean — but you’re not exactly rewarding him or her for reading closely here.)

“In this study, using a similarity space that is common to both MEG and fMRI,”
Style. I don’t know how a similarity space can be common to two modalities. (Again, the meaning is clear, but please state it clearly nevertheless.)

“…and show that human MEG responses to object correlate with the patterns of neuronal spiking in monkey IT”
grammar.

“1) What is the time course of object processing at different levels of categorization?”
Does object processing at a given level of categorisation have a single time course? If not, then this doesn’t make sense.

“2) What is the relation between spatially and temporally resolved brain responses in a content-selective manner?”
This is a bit vague.

“The results of the classification (% decoding accuracy, where 50% is chance) are stored in a 92 × 92 matrix, indexed by the 92 conditions/images images.”
images repeated

“Can we decode from MEG signals the time course at which the brain processes individual object images?”
Grammatically, you can say that the brain processes images “with” a time course, not “at” a time course. In terms of content, I don’t know what it means to say that the brain processes image X with time course Y. One neuron or region might respond to image X with time course Y. The information might rise and fall according to time course Y. Please say exactly what you mean.

“A third peek at ~592ms possibly indicates an offset-response”
Peek? Peak.

“Thus, multivariate analysis of MEG signals reveal the temporal dynamics of visual content processing in the brain even for single images.”
Grammar.

“This initial result allows to further investigate the time course at which information about membership of objects at different levels of categorization is decoded, i.e. when the subordinate, basic and superordinate category levels emerge.”
Unclear here what decoding means. Are you suggesting that the categories are encoded in the images and the brain decodes them? And you can tell when this happens by decoding? This is all very confusing. 😉

“Can we determine from MEG signals the time course at which information about membership of an image to superordinate-categories (animacy and naturalness) emerges in the brain?”
Should say “time course *with* which”. However, all the information is there in the retina. So it doesn’t really make sense to say that the information emerges with a time course. What is happening is that category membership becomes linearly decodable, and thus in a sense explicit, according to this time course.

“If there is information about animacy, it should be mirrored in more decoding accuracy for comparisons between the animate and inanimate division than within the average of the animate and inanimate division.”

more decoding accuracy -> greater decoding accuracy

“within the average” Bad phrasing.

“utilizing the same date set in monkey electrophysiology and human MRI”
–> stimulus set

“Boot-strapping labels tests significance against chance”
Determining significance is by definition a comparison of the effect estimate against what would be expected by chance. It therefore doesn’t make sense to “test significance against chance”.

“corresponding to a corresponding to a p=2.975e-5 for each tail”
Redundant repetition.

“Given thus [?] a fMRI dissimilarity matrices [grammar] for human V1 and IT each, we calculate their similarity (Spearman rank-order correlation) to the MEG decoding accuracy matrices over time, yielding 2nd order relations (Fig. 4b).”

“We recorded brain responses with fMRI to the same set of object images used in the MEG study *and the same participants*, adapting the stimulation paradigm to the specifics of fMRI (Supplementary Fig. 1b).”
Grammar

“The effect is significant in IT (p<0.001), but not in V1 (p=0.04). Importantly, the effect is significantly larger in IT than in V1 (p<0.001).”

p=0.04 is also significant, isn’t it? This is very similar to Kriegeskorte et al. (2008, Fig. 5A), where the animacy effect was also very small, but significant in V1.

“boarder” -> “border”

 


The selfish scientist’s guide to preprint posting

Preprint posting is the right thing to do for science and society. It enables us to share our results earlier, speeding up the pace of science. It also enables us to catch errors earlier, minimising the risk of alerting the world to our findings (through a high-impact publication) before the science is solid. Importantly, preprints ensure long-term open access to our results for scientists and for the public. Preprints can be rapidly posted for free on arXiv and bioRxiv, enabling instant open access.

Confusingly for any newcomer to science who is familiar with the internet, scientific journals don’t provide open access to papers in general. They restrict access with paywalls and only really publish (in the sense of making publicly available) a subset of papers. The cost of access is so high that even institutions like Harvard and the UK’s Medical Research Council (MRC) cannot afford to pay for general access to all the relevant scientific literature. For example, as MRC employees, members of my lab do not have access to the Journal of Neuroscience, because our MRC Unit, the Cognition and Brain Sciences Unit in Cambridge, cannot afford to subscribe to it. The University of Cambridge pays more than one million pounds in annual subscription fees to Elsevier alone, a single major publishing company, as do several other UK universities. Researchers who are not at well-funded institutions in rich countries are severely restricted in their access to the literature and cannot fully participate in science under the present system.

Journals administer peer review and provide pretty layouts and in some cases editing services. Preprints complement journals, enabling us to read about each other’s work as soon as it’s written up and without paywall restrictions. With the current revival of interest in preprints (check out ASAPbio), more and more scientists choose to post their papers as preprints.

All major journals including Nature, Science, and most high-impact field-specific journals support the posting of preprints. Preprint posting is in the interest of journals because they, too, would like to avoid publication of papers with errors and false claims. Moreover, the early availability of the results boosts early citations and thus the journal’s impact factor. Check out Wikipedia’s useful overview of journal preprint policies. For detailed information on each journal’s precise preprint policy, refer to the excellent ROMEO website at the University of Nottingham’s SHERPA project on the future of scholarly communication (thanks to Carsten Allefeld for pointing this out).

All the advantages of preprints for science and society are well and good. However, we also need to think about ourselves. Does preprint posting mean that we give away our results to competitors, potentially suffering a personal cost for the common good? What is the selfish scientist’s best move to advance her personal impact and career? There is a risk of getting scooped. However, this risk can be reduced by not posting too early. It turns out that posting a preprint, in addition to publication in a journal, is advisable from a purely selfish perspective, because it brings the following benefits to the authors:

  • Open access: Preprints guarantee open access, enhancing the impact and ultimate citation success of our work. This is a win for the authors personally, as well as for science and society.
  • Errors caught: Preprints help us catch errors before wider reception of the work. Again this is a major benefit not only to science, society, and journals, but also to the authors, who may avoid having to correct or retract their work at a later stage.
  • Earlier citation: Preprints grant access to our work earlier, leading to earlier citation. This is beneficial to our near-term citation success, thus improving our bibliometrics and helping our careers — as well as boosting the impact factor of the journal where the paper appears.
  • Preprint precedence: Finally, preprints can help establish the precedence of findings. A preprint is part of the scientific record and, though the paper still awaits peer review, it can help establish scientific precedence. This boosts the long-term citation count of the paper.

In computer science, math, and physics, reading preprints is already required to stay abreast of the literature. The life sciences will follow this trend. As brain scientists working with models from computer science, we read preprints and, if we judge them to be of high quality and relevance, we cite them.

My lab came around to routine preprint posting for entirely selfish reasons. Our decision was triggered by an experience that drove home the power of preprints. A competing lab had posted a paper closely related to one of our projects as a preprint. We did not post preprints at the time, but we cited their preprint in the paper on our project. Our paper appeared before theirs in the same journal. Although we were first, by a few months, with a peer-reviewed journal paper, they were first with their preprint. Moreover, our competitors could not cite us, because we had not posted a preprint and their paper had already been finalised when ours appeared. Appropriately, they took precedence in the citation graph – with us citing them, but not vice versa.

Posting preprints doesn’t only have advantages. It is also risky. What if another group reads the preprint, steals the idea, and publishes it first in a high-impact journal? This could be a personal catastrophe for the first author, with the credit for years of original work diminished to a footnote in the scientific record. Dishonorable scooping of this kind is not unheard of. Even if we believe that our colleagues are all trustworthy and outright stealing is rare, there is a risk of being scooped by honorable competitors. Competing labs are likely to be independently working on related issues. Seeing our preprint might help them improve their ongoing work; and they may not feel the need to cite our preprint for the ideas it provided. Even if our competitors do not take any idea from our preprint, just knowing that our project is ready to enter the year-long (or multiple-year) publication fight might motivate them to accelerate progress with their competing project. This might enable them to publish first in a journal.

The risk of being scooped and the various benefits vary as a function of the time of preprint posting. If we post at the time of publication in a journal, the risk of being scooped is zero and the benefit of OA remains. However, the other benefits grow with earlier posting. How do benefits and costs trade off, and what is the optimal time for posting a preprint?

As illustrated in the figure below, this selfish scientist believes that the optimal posting time for his lab is around the time of the first submission of the paper. At this point, the risk of being scooped is small, while the benefits of preprint precedence and early citation are still substantial. I therefore encourage the first authors in my lab to post at the time of first submission. Conveniently, this also minimises the extra workload required for the posting of the preprint. The preprint is the version of the paper to be submitted to a journal, so no additional writing or formatting is required. Posting a preprint takes less than half an hour.

I expect that as preprints become more widely used, incentives will shift. Preprints will more often be cited, enhancing the preprint-precedence and early-citation benefits. This will shift the selfish scientist’s optimal time of preprint posting to an earlier point, where an initial round of responses can help improve the paper before a journal vets it for a place in its pages. For now, we post at the time of the first submission.

 

 

[Figure: preprint benefits and costs as a function of posting time]

Benefits and costs to the authors of posting preprints as a function of the time of posting. This figure considers the benefits and costs of posting a preprint at a point in time ranging from a year before (-1) to a year after (1, around the time of appearance in a journal) initial submission (0). The OA benefit (green) of posting a preprint is independent of the time of posting. This benefit is also available by posting the preprint after publication of the paper in a journal. The preprint-precedence (blue) and early-citation (cyan) benefits grow by an equal amount with every month prior to journal publication that the paper is out as a preprint. This is based on the assumption that the rest of the scientific community, acting independently, is chipping away at the novelty and citations of the paper at a constant rate. When the paper is published in a journal (assumed at 1 year after initial submission), the preprint no longer accrues these benefits, so the lines reach 0 benefit at the time of the journal publication. Finally, the risk of being scooped (red) is large when the preprint is posted long before initial submission. At the time of submission, it is unlikely that a competitor starting from scratch can publish first in a journal. However, there is still the risk that competitors who were already working on related projects accelerate these and achieve precedence in terms of journal publication as a result. The sum total (black) of the benefits and the cost associated with the risk of being scooped peaks slightly before the time of the first submission to a journal. The figure serves to illustrate my own rationale for posting around the time of the first submission of a paper to a journal. It is not based on objective data, but on subjective estimation of the costs and benefits for a typical paper from my own lab.