How can we incentivize Post-Publication Peer Review?

Open review of “Post-Publication Peer Review for Real”
by Koki Ikeda, Yuki Yamada, and Kohske Takahashi (pp2020)


Our system of pre-publication peer review is a relic of the age when the only way to disseminate scientific papers was through print media. Back then the peer evaluation of a new scientific paper had to precede its publication, because printing (on actual paper, if you can believe it) and distributing the physical copies were expensive. Only a small selection of papers could be made accessible to the entire community.

Now the web enables us to make any paper instantly accessible to the community at negligible cost. However, we’re still largely stuck with pre-publication peer review, despite its inherent limitations: evaluation is confined to a small number of preselected reviewers, who operate in isolation from, and without the scrutiny of, the community.

People familiar with the web who have considered, from first principles, how a scientific peer review system should be designed tend to agree that it’s better to make a new paper publicly available first, so the community can take note of the work and a broader set of opinions can contribute to the evaluation. Post-publication peer review also enables us to make the evaluation transparent: Peer reviews can be open responses to a new paper. Transparency promises to improve reviewers’ motivation to be objective, especially if they choose to sign and take responsibility for their reviews.

We’re still using the language of a bygone age, whose connotations make it hard to see the future clearly:

  • A paper today is no longer made of paper — but let’s stick with this one.
  • A preprint is not something that necessarily precedes the publication in print media. A better term would be “published paper”.
  • The term publication is often used to refer to a journal publication. However, preprints now constitute the primary publications. First, a preprint is published in the real sense: the sense of having been made publicly available. This is in contrast to a paper in Nature, say, which is locked behind a paywall, and thus not quite actually published. Second, the preprint is the primary publication in that it precedes the later appearance of the paper in a journal.

Scientists are now free to use the arXiv and other repositories (including bioRxiv and PsyArXiv) to publish papers instantly. In the near future, peer review could be an open and open-ended process. Of course papers could still be revised and might then need to be re-evaluated. Depending on the course of the peer evaluation process, a paper might become more visible within its field, and perhaps even to a broader community. One way this could happen is through its appearance in a journal.

The idea of post-publication peer review has been around for decades. Visions for open post-publication peer review have been published. Journals and conferences have experimented with variants of open and post-publication peer review. However, the idea has yet to revolutionize the scientific publication system.

In their new paper entitled “Post-publication Peer Review for Real”, Ikeda, Yamada, and Takahashi (pp2020) argue that the lack of progress with post-publication peer review reflects a lack of motivation among scientists to participate. They then present a proposal to incentivize post-publication peer review by making reviews citable publications published in a journal. Their proposal has the following features:

  • Any scientist can submit a peer review on any paper within the scope of the journal that publishes the peer reviews (the target paper could be published either as a preprint or in any journal).
  • Peer reviews undergo editorial oversight to ensure they conform to some basic requirements.
  • All reviews for a target paper are published together in an appealing and readable format.
  • Each review is a citable publication with a digital object identifier (DOI). This provides a new incentive to contribute as a peer reviewer.
  • The reviews are to be published as a new section of an existing “journal with high transparency”.

Ikeda et al.’s key point, that peer reviews should be citable publications, is solid. This is important both to provide an incentive to contribute and to properly integrate peer reviews into the crystallized record of science. Making peer reviews citable publications would be a transformative and potentially revolutionary step.

The authors are inspired by the model of Behavioral and Brain Sciences (BBS), an important journal that publishes theoretical and integrative perspective and review papers as target articles, together with open peer commentary. The “open” commentary in BBS is very successful, in part because it is quite carefully curated by editors (at the cost of making it arguably less than entirely “open” by modern standards).

BBS was founded by Stevan Harnad, an early visionary and reformer of scientific publishing and peer review. Harnad remained editor-in-chief of BBS until 2002. He explored in his writings what he called “scholarly skywriting”, imagining a scientific publication system that combines elements of what is now known as open-notebook science and research blogging with preprints, novel forms of peer review, and post-publication peer commentary.

If I remember correctly, Harnad drew a bold line between peer review (a pre-publication activity intended to help authors improve and editors select papers) and peer commentary (a post-publication activity intended to evaluate the overall perspective or conclusion of a paper in the context of the literature).

I am with Ikeda et al. in believing that the lines between peer review and peer commentary ought to be blurred. Once we accept that peer review must be post-publication and part of a process of community evaluation of new papers, the pre-publication stage of peer review falls away. A peer review, then, becomes a letter to both the community and the authors, and can serve any combination of a broader set of functions:

  • to explain the paper to a broader audience or to an audience in an adjacent field,
  • to critique the paper at the technical and conceptual level and possibly question its conclusions,
  • to relate it to the literature,
  • to discuss its implications,
  • to help the authors improve the paper in revision, for example by suggesting additional experiments or analyses and improvements to the exposition of the argument in text and figures.

An example of this new form is the peer review you are reading now. I review only papers that have preprints and publish my peer reviews on this blog. This review is intended for both the authors and the community. The authors’ public posting of a preprint indicates that they are ready for a public response.

Admittedly, there is a tension between explaining the key points of the paper (which is essential for the community, but not for the authors) and giving specific feedback on particular aspects of the writing and figures (which can help the authors improve the paper, but may not be of interest to the broader community). However, it is easy to relegate detailed suggestions to the final section, which anyone looking only to understand the big picture can choose to skip.

Importantly, the reviewer’s judgment of the argument presented and how the paper relates to the literature is of central interest to both the authors and the community. Detailed technical criticism may not be of interest to every member of the community, but is critical to the evaluation of the claims of a paper. It should be public to provide transparency and will be scrutinized by some in the community if the paper gains high visibility.

A deeper point is that a peer review should speak to the community and to the authors in the same voice: in a constructive and critical voice that attempts to make sense of the argument and to understand its implications and limitations. There is something right, then, about merging peer review and peer commentary.

While reading Ikeda et al.’s review of the evidence that scientists lack motivation to engage in post-publication peer review, I asked myself what motivates me to do it. Open peer review enables me to:

  • more deeply engage the papers I review and connect them to my own ideas and to the literature,
  • more broadly explore the implications of the papers I review and start bigger conversations in the community about important topics I care about,
  • have more legitimate power (the power of a compelling argument publicly presented in response to the claims publicly presented in a published paper),
  • have less illegitimate power (the power of anonymous judgment in a secretive process that decides about publication of someone else’s work),
  • take responsibility for my critical judgments by subjecting them to public scrutiny,
  • make progress with my own process of scientific insight,
  • help envision a new form of peer review that could prove positively transformative.

In sum, open post-publication peer review, to me, is an inherently more meaningful activity than closed pre-publication peer review. I think there is plenty of motivation for open post-publication peer review, once people overcome their initial uneasiness about going transparent. A broader discussion of researcher motivations for contributing to open post-publication peer review is here.

That said, citability and DOIs are essential, and so are the collation and readability of the peer reviews of a target paper. I hope Ikeda et al. will pursue their idea of publishing open post-publication peer reviews in a journal. Gradually, and then suddenly, we’ll find our way toward a better system.


Suggestions for improvements

(1) The proposal raises some tricky questions that the authors might want to address:

  • Which existing “journal with high transparency” should this be implemented in?
  • Should it really be a section in an existing journal or a new journal (e.g. the “Journal of Peer Reviews in Psychology”)?
  • Are the peer reviews published immediately as they come in, or in bulk once there is a critical mass?
  • Are new reviews of a target paper to be added on an ongoing basis in perpetuity?
  • How are the target papers to be selected? Should their status as preprints or journal publications make any difference?
  • Why do we need to stick with the journal model? Couldn’t commentary sections on preprint servers solve the problem more efficiently — if they were reinvented to provide each review also as a separate PDF with beautiful and professional layout, along with figure and LaTeX support and, critically, citability and DOIs?

Consider addressing some of these questions to make the proposal more compelling. In particular, it seems attractive to find an efficient solution linked to preprint servers to cover large parts of the literature. Can the need for editorial work be minimized and the critical incentive provided through beautiful layout, citability, and DOIs?


(2) Cite and discuss some of Stevan Harnad’s contributions. Some of the ideas in this edited collection of visions for post-publication peer review may also be relevant.


A recent large-scale survey reported that 98% of researchers who participated in the study agreed that the peer-review system was important (or extremely important) to ensure the quality and integrity of science. In addition, 78.8% answered that they were satisfied (or very satisfied) with the current review system (Publon, 2018). It is probably true that peer-review has been playing a significant role to control the quality of academic papers (Armstrong, 1997). The latter result, however, is rather perplexing, since it has been well known that sometimes articles could pass through the system without their flaws being revealed (Hopewell et al., 2014), results could not be reproduced reliably (e.g. Open Science Collaboration, 2015), decisions were said to be no better than a dice roll (Lindsey, 1988; Neff & Olden, 2006), and inter-reviewer agreement was estimated to be very low (Bornmann et al., 2010).

(3) Consider disentangling the important pieces of evidence in the above passage a little more. “Perplexing” seems the wrong word here: Peer review can be simultaneously the best way to evaluate papers and imperfect. It would be good to separate mere evidence that mistakes happen (which appears unavoidable), from the stronger criticism that peer review is no better than random evaluations. A bit more detail on the cited results suggesting it is no better than random would be useful. Is this really a credible conclusion? Does it require qualifications?


The low reliability across reviewers is especially disturbing and raises serious concerns about the effectiveness of the system, because we now have empirical data showing that inter-rater agreement and precision could be very high, and they robustly predict the replicability of previous studies, when the information about others’ predictions are shared among predictors (Botvinik-Nezer et al., 2020; Camerer et al., 2016, 2018; Dreber et al., 2015; Forsell et al., 2019). Thus, the secretiveness of the current system could be the unintended culprit of its suboptimality.

(4) Consider revising the above passage. Inter-reviewer agreement is an important metric to consider. However, even zero correlation between reviewers’ ratings does not imply that the reviews are random. Reviewers may focus on different criteria. For example, if one reviewer judged primarily the statistical justification of the claims and another primarily the quality of the writing, the correlation between their ratings could be zero. However, the average rating would still be a useful indicator of quality. Averaging ratings in this context does not serve merely to reduce the noise in the evaluations; it also serves to compromise between different weightings of the criteria of quality.
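This point is easy to illustrate with a toy simulation (all numbers hypothetical): two reviewers who weight entirely different quality criteria produce uncorrelated ratings, yet the average of their ratings still tracks overall quality.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # number of papers rated

# Two independent quality criteria per paper
stats_quality = rng.normal(size=n)     # statistical justification of the claims
writing_quality = rng.normal(size=n)   # quality of the writing
true_quality = (stats_quality + writing_quality) / 2

# Reviewer A weights only statistics; reviewer B weights only writing
rating_a = stats_quality + 0.3 * rng.normal(size=n)
rating_b = writing_quality + 0.3 * rng.normal(size=n)

r_reviewers = np.corrcoef(rating_a, rating_b)[0, 1]
r_average = np.corrcoef((rating_a + rating_b) / 2, true_quality)[0, 1]

print(f"correlation between reviewers: {r_reviewers:.2f}")     # near zero
print(f"average rating vs. overall quality: {r_average:.2f}")  # high
```

Zero inter-reviewer agreement here reflects different weightings of valid criteria, not random judgment.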

Interaction among reviewers that enables them to adjust their judgments can fundamentally enhance the review process. However, inter-rater agreement is an ambiguous measure, when the ratings are not independent.


However, BBS commentary is different from them in terms of that it employs an “open” system so that anyone can submit the commentary proposal at will (although some commenters are arbitrarily chosen by the editor). This characteristic makes BBS commentary much more similar to PPPR than other traditional publications.

(5) Consider revising. Although BBS commentaries are a form of post-publication evaluation, they are nothing like traditional papers (they are typically much briefer statements of perspective on a target paper) and also quite distinct in form and content from peer reviews. I think Stevan Harnad made this point somewhere.


Next and most importantly, the majority of researchers find no problem with their incentives to submit an article as a BBS commentary, because they will be considered by many researchers and institutes to be equivalent to a genuine publication and can be listed on one’s CV. Therefore, researchers have strong incentives to actively publish their reviews on BBS.

(6) Consider revising. It’s a citable publication, yes. However, it’s in a minor category, nowhere near a primary research paper or a full review or perspective paper.


There seem to be at least two reasons for this uniqueness. Firstly, BBS is undoubtedly one of the most prestigious journals in psychology and its related areas, with a 17.194 impact factor for the year 2018. Secondly, the commentaries are selected by the editor before publication, so their quality is guaranteed at least to some extent. Critically, no current PPPR has the features comparable to these in BBS.

(7) Consider revising. While this is true for BBS, I don’t see how a journal of peer reviews that is open to all articles within a field, including preprints, as the target papers could replicate the prestige of BBS. This passage doesn’t seem to help the argument in favor of the new system as currently proposed. However, you might revise the proposal. For example, I could imagine a “Journal of Peer Commentary in Psychology” applying the BBS model to editorially selected papers of broad interest.


To summarize, we might be able to create a new and better PPPR system by simply combining the advantages of BBS commentary – (1) strong incentive for commenters and (2) high readability – with those of the current PPPRs – (3) unlimited target selection and (4) unlimited commentary accumulation. In the next section, we propose a possible blueprint for the implementation of these ideas, especially with a focus on the first two, because the rest has already been realized in the current media.

(8) Consider revising. The first two points seem in strong tension with the second two. A strong incentive to review requires highly visible target publications, which isn’t possible if target selection is unlimited. High readability also appears compromised when reviews come in over a long period and there is no limit to their number. This should at least be discussed.

Among the features that seem critical to the successful implementation of PPPR, strong incentives for commenters is probably the most important factor. We speculated that BBS has achieved this goal by providing the commentaries a status equivalent to a standard academic paper. Furthermore, this is probably realized by the journal’s two unique characteristics: its academic prestige and the selection of commentaries by the editor. Based on these considerations, we propose the following plans for the new PPPR system.

(9) Consider revising. As discussed above, the commentaries do not quite have “equivalent” status to a standard academic paper.


Is the radial orientation-preference map in V1 an artefact of “vignetting”?

[I6 R8]

The orientation of a visual grating can be decoded from fMRI response patterns in primary visual cortex (Kamitani & Tong 2005, Haynes & Rees 2005). This was surprising because fMRI voxels in these studies are 3 mm wide in each dimension and thus average over many columns of neurons that respond to different orientations. Since then, many studies have sought to clarify why fMRI orientation decoding works so well.

The first explanation given was that even though much of the contrast of the neuronal orientation signals might cancel out in the averaging within each voxel, any given voxel might retain a slight bias toward certain orientations if it didn’t sample all the columns exactly equally (Kamitani & Tong 2005, Boynton 2005). By integrating the evidence across many slightly biased voxels with a linear decoder, it should then be possible to guess, better than chance, the orientation of the stimulus.
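The logic of this explanation can be sketched in a toy simulation (all parameters hypothetical, not fit to data): each voxel carries only a tiny random orientation bias, far weaker than the measurement noise, yet a linear decoder pooling across many voxels discriminates the two grating orientations above chance.

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_trials = 100, 200

# Tiny random orientation bias per voxel (from uneven sampling of columns)
bias = rng.normal(scale=0.1, size=n_voxels)

labels = rng.integers(0, 2, size=n_trials)             # 0: one grating, 1: the other
signal = np.outer(2 * labels - 1, bias)                # biased mean response per trial
data = signal + rng.normal(size=(n_trials, n_voxels))  # strong measurement noise

# Split into training and test runs; simple mean-difference linear decoder
train, test = slice(0, 100), slice(100, 200)
w = (data[train][labels[train] == 1].mean(axis=0)
     - data[train][labels[train] == 0].mean(axis=0))
pred = (data[test] @ w > 0).astype(int)
accuracy = (pred == labels[test]).mean()
print(f"decoding accuracy: {accuracy:.2f}")  # above chance (0.5)
```

With per-voxel biases an order of magnitude weaker than the noise, no single voxel is informative on its own, but the decoder’s pooled estimate is.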

Later work explored how random orientation biases might arise in the voxels. If each voxel directly sampled the orientation columns (computing an average within its cuboid boundaries), then decoding success should depend very sensitively on the alignment of the voxels between training and test sets. A shift of the voxel grid on the scale of the width of an orientation column would change the voxel biases and abolish decoding success. Several groups have argued that the biases might arise at the level of the vasculature (Gardner et al. 2009, Kriegeskorte et al. 2009). This would make the biases enabling orientation decoding less sensitive to slight shifts of the voxel grid. Moreover, if voxels reflected signals sampled through the fine-grained vasculature, then it would be the vasculature, not the voxel grid, that determines to what extent different spatial frequencies of the underlying neuronal activity patterns are reflected in the fMRI patterns (Kriegeskorte et al. 2009).

Another account (Op de Beeck 2010, Freeman et al. 2011) proposed that decoding may rely exclusively on coarse-scale spatial patterns of activity. In particular, Freeman, Brouwer, Heeger, and Merriam (2011) argued that radial orientations (those aligned with a line that passes through the point of fixation) are over-represented in the neural population. If this were the case, then a grating would elicit a coarse-scale response pattern across its representation in V1, in which the neurons representing edges pointing (approximately) at fixation are more strongly active. There is indeed evidence from multiple studies for a nonuniform representation of orientations in V1 (Furmanski & Engel 2000, Sasaki et al. 2006, Serences et al. 2009, Mannion et al. 2010), perhaps reflecting the nonuniform probability distribution of orientation in natural visual experience. The over-representation of radial orientations might help explain the decodability of gratings. However, opposite-sense spirals (whose orientations are balanced about the radial orientation) are also decodable (Mannion et al. 2009, Alink et al. 2013). This might be due to a simultaneous over-representation of vertical orientations (Freeman et al. 2013, but see Alink et al. 2013).

There’s evidence in favor of a contribution to orientation decoding of both coarse-scale (Op de Beeck 2010, Freeman et al. 2011, Freeman et al. 2013) and fine-scale components of the fMRI patterns (e.g. Shmuel et al. 2010, Swisher et al. 2010, Alink et al. 2013, Pratte et al. 2016, Alink et al. 2017).

Note that both coarse-scale and fine-scale pattern accounts suggest that voxels have biases in favor of certain orientations. An entirely novel line of argument was introduced to the debate by Carlson (2014).

Carlson (2014) argued, on the basis of simulation results, that even if every voxel sampled a set of filters uniformly representing all orientations (i.e. without any bias), the resulting fMRI patterns could still reflect the orientation of a grating confined to a circular annulus (as standardly used in the literature). The reason lies in “the interaction between the stimulus region and the empty background” (Carlson 2014), an effect of the relative orientations of the grating and the edge of the aperture (the annulus within which the grating is visible). Carlson’s simulations showed that the average response of a uniform set of Gabor orientation filters is larger where the aperture edge is orthogonal to the grating. He also showed that the effect does not depend on whether the aperture edge is hard or soft (fading contrast). Because the voxels in this account have no biases in favor of particular orientations, Carlson aptly referred to his account as an “unbiased” perspective.

The aperture edge adds edge energy. The effect is strongest when the edge is orthogonal to the carrier grating orientation. We can understand this in terms of the Fourier spectrum. Whereas a sine grating has a concentrated representation in the 2D Fourier amplitude spectrum, the energy is more spread out when an aperture limits the extent of the grating, with the effect depending on the relative orientations of grating and edge.
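This spectral intuition is easy to check numerically. The sketch below (hypothetical image and parameters, not the actual stimuli used in the studies) compares a vertical grating truncated by an edge parallel to its stripes with one truncated by an orthogonal edge, and measures how much spectral energy leaks away from the grating’s own orientation.

```python
import numpy as np

N = 256
x = np.arange(N) / N
X, Y = np.meshgrid(x, x)                  # X varies along columns, Y along rows
grating = np.sin(2 * np.pi * 16 * X)      # vertical stripes, 16 cycles per image

edge_parallel = grating * (X < 0.5)       # vertical edge, parallel to the stripes
edge_orthogonal = grating * (Y < 0.5)     # horizontal edge, orthogonal to the stripes

def off_axis_energy(img):
    """Fraction of spectral energy at orientations away from the carrier axis."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    freqs = np.fft.fftshift(np.fft.fftfreq(N))
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    angle = np.degrees(np.arctan2(np.abs(fy), np.abs(fx)))  # 0 deg = carrier axis
    F[N // 2, N // 2] = 0                                   # discard DC
    return F[angle > 10].sum() / F.sum()

print(off_axis_energy(edge_parallel))     # ~0: energy spreads only along the carrier axis
print(off_axis_energy(edge_orthogonal))   # larger: energy leaks to other orientations
```

The orthogonal edge injects energy at orientations away from the grating’s, which even a perfectly uniform population of orientation-selective filters would pick up — consistent with Carlson’s account.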

For an intuition on how this kind of thing can happen, consider a particularly simple scenario, where a coarse rectangular grating is limited by a sharp aperture whose edge is orthogonal to the grating. V1 cells with small receptive fields will respond to the edge itself as well as to the grating. When edge and grating are orthogonal, the widest range of orientation-selective V1 cells is driven. However, the effect is present also for sinusoidal gratings and soft apertures, where contrast fades gradually, e.g. according to a raised half-cosine.

An elegant new study by Roth, Heeger, and Merriam (pp2018) now follows up on the idea of Carlson (2014) with fMRI at 3T and 7T. Roth et al. refer to the interaction between the edge and the content of the aperture as “vignetting” and use apertures composed of either multiple annuli or multiple radial rays. These finer-grained apertures spread the vignetting effect all throughout the stimulated portion of the visual field and so are well suited to demonstrating the effect on fMRI patterns.

Roth et al. present simulations (Figure 1), following Carlson (2014) and assuming that every voxel uniformly samples all orientations. They confirm Carlson’s account and show that the grating stimuli the group used earlier in Freeman et al. (2011) are expected to produce a stronger response to the radial parts of the grating, where the aperture edge is orthogonal to the grating, even without any over-representation of radial orientations by the neurons.

Freeman et al. (2011) used a relatively narrow annulus (inner edge: 4.5º, outer edge: 9.5º eccentricity from fixation), where no part of the grating is far from the edge. This causes the vignetting effect to create the appearance of a radial bias that is strongest at the edges but present even in the central part of the annular aperture (Figure 1, bottom right). Roth et al.’s findings suggest that the group’s earlier result might reflect vignetting, rather than (or in addition to) a radial bias of the V1 neurons.

Figure 1: Vignetting explains findings of Freeman et al. (2011). Top: Voxel orientation preferences and pRF locations. Each element represents a voxel, its position represents the visual-field location of the voxel’s population receptive field (pRF), the orientation of the line segment represents the voxel’s preferred orientation. The size and color of each element reflects the degree to which the voxel showed a reliable orientation-dependent response (coherence). The pattern suggests that many voxels prefer radial orientations, i.e. those pointing at fixation. Bottom: Roth et al. (pp2018), following Carlson (2014), applied a Gabor model to the stimuli of Freeman et al. (2011). They then simulated voxels pooling orientation-selective responses without any bias in favor of particular orientations. The simulation shows that the apparent radial bias arises as an artefact of the edge-effects described by Carlson (2014), termed “vignetting” by Roth et al. (pp2018). Dashed lines show the edges of the stimulus.


Roth et al. use simulations also to show that their new stimuli, in which the aperture consists of multiple annuli or multiple radial rays, predict coarse-scale patterns across V1. They then demonstrate in single subjects measured with fMRI at 3T and 7T that V1 responds with the globally modulated patterns predicted by the account of Carlson (2014).

The study is beautifully designed and expertly executed. Results compellingly demonstrate that, as proposed by Carlson (2014), vignetting can account for the coarse-scale biases reported in Freeman et al. (2011). The paper also contains a careful discussion that places the phenomenon in a broader context. Vignetting describes a family of effects related to aperture edges and their interaction with the contents of the aperture. The interaction could be as simple as the aperture edge adding edge energy of a different orientation and thus changing orientation-selective response. It could also involve extra-receptive-field effects such as non-isotropic surround suppression.

The study leaves me with two questions:

  • Is the radial orientation-preference map in V1, as described in Freeman et al. (2011), entirely an artefact of vignetting (or is there still also an over-representation of radial orientations in the neuronal population)?
  • Does vignetting also explain fMRI orientation signals in studies that use larger oriented gratings, where much of the grating is further from the edge of the aperture, as in Kamitani & Tong (2005)?

The original study by Kamitani and Tong (2005) used a wider annular aperture reaching further into the central region, where receptive fields are smaller (inner edge: 1.5°, outer edge: 10° eccentricity from fixation). The interior parts of the stimulus may therefore not be affected by vignetting. Importantly, Wardle, Ritchie, Seymour, and Carlson (2017) already investigated this issue and their results suggest that vignetting is not necessary for orientation decoding.

It would be useful to analyze the stimuli used by Kamitani & Tong (2005) with a Gabor model (with reasonable choices for the filter sizes). As a second step, it would be good to reanalyze the data from Kamitani & Tong (2005), or from a similar design. The analysis should focus on small contiguous ROIs in V1 of the left and right hemisphere that represent regions of the visual field far from the edge of the aperture.

Going forward, perhaps we can pursue the issue in the spirit of open science. We would acquire fMRI data with maximally large gratings, so that regions unaffected by vignetting can be analyzed (Figure 2). The experiments should include localizers for the aperture margins (transparent blue) and for ROIs perched on the horizontal meridian far from the aperture edges (transparent red). The minimal experiment would contain two grating orientations (45º and -45º as shown at the bottom), each presented with many different phases. Note that, for the ROIs shown in Figure 2, these two orientations minimize undesired voxel biases due to radial and vertical orientation preferences (both gratings have equal angle to the radial orientation and equal angle to the vertical orientation). Note also that these two orientations have equal angle to the aperture edge, thus also minimizing any residual long-range vignetting effect that acts across the safety margin.

The analysis of the ROIs should follow Alink et al. (2017): In each ROI (left hemisphere, right hemisphere), we use a training set of fMRI runs to define two sets of voxels: 45º-preferring and -45º-preferring voxels. We then use the test set of fMRI runs to check, independently for the two voxel sets, whether the preferences replicate. We could implement a sensitive test along these lines by training and testing a linear decoder on just the 45º-preferring voxels, and then another linear decoder on just the -45º-preferring voxels. If both of these decoders have significant accuracy on the test set, we have established that voxels of opposite selectivity intermingle within the same small ROI, indicating fine-grained pattern information.
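A minimal sketch of this analysis logic on simulated data (all numbers hypothetical; real data would come from the localized ROIs): voxel preferences are defined on the training runs only, and decoding is then tested separately within each preference set.

```python
import numpy as np

rng = np.random.default_rng(2)
n_voxels, n_trials = 60, 160

# Simulated ROI: intermingled voxels with stable opposite orientation preferences
pref = rng.normal(scale=0.5, size=n_voxels)     # + favors 45 deg, - favors -45 deg
labels = rng.integers(0, 2, size=n_trials)      # 0: -45 deg grating, 1: 45 deg grating
data = np.outer(2 * labels - 1, pref) + rng.normal(size=(n_trials, n_voxels))

train, test = slice(0, 80), slice(80, None)

# Step 1: define the two voxel sets from the training runs only
d_train = (data[train][labels[train] == 1].mean(axis=0)
           - data[train][labels[train] == 0].mean(axis=0))
set_45, set_m45 = d_train > 0, d_train < 0

# Step 2: train and test a linear decoder separately within each voxel set
def decoding_accuracy(voxel_set):
    w = d_train[voxel_set]                      # mean-difference decoder weights
    pred = (data[test][:, voxel_set] @ w > 0).astype(int)
    return (pred == labels[test]).mean()

acc_45 = decoding_accuracy(set_45)
acc_m45 = decoding_accuracy(set_m45)
# Both accuracies above chance would indicate that voxels of opposite
# selectivity intermingle within the same small ROI (fine-grained information)
```

If only a coarse-scale bias were present, one of the two voxel sets would carry no replicable signal; above-chance decoding in both sets is the signature of fine-grained pattern information.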

Figure 2: Simple stimuli for benchmarking fMRI acquisition schemes (3T vs 7T, resolutions, sequences) and assessing the grain of fMRI pattern information. Top: Gratings should be large enough to include a safety margin that minimizes vignetting effects. Studies should include localizers for the V1 representations of the regions shown in red, representing regions on the left and right that are perched on the horizontal meridian and far from the edges of the aperture. For these ROIs, gratings of orientations 45º and -45º (bottom) are (1) balanced about the radial orientation (minimizing effects of neuronal overrepresentation of radial orientations), (2) balanced about the vertical orientation (minimizing effects of neuronal overrepresentation of vertical orientations), and (3) balanced about the orientation of the edge (minimizing any residual long-range vignetting effects).

A more comprehensive experiment would contain perhaps 8 or 16 equally spaced orientations and a range of spatial frequencies balanced about the spatial frequency that maximally drives neurons at the eccentricity of the critical ROIs (Henriksson et al. 2008).

More generally, a standardized experiment along these lines would constitute an excellent benchmark for comparing fMRI acquisition schemes in terms of the information they yield about neuronal response patterns. Such a benchmark would lend itself to comparing different spatial resolutions (0.5 mm, 1 mm, 2 mm, 3 mm), different fMRI sequences, and different field strengths (3T, 7T) across different sites and scanner models. The tradeoffs involved (notably between functional contrast to noise and partial volume sampling) are difficult to estimate without directly testing each fMRI acquisition scheme for the information it yields (Formisano & Kriegeskorte 2012). A standard pattern-information benchmark for fMRI could therefore be really useful, especially if pursued as an open-science project (shared stimuli and presentation protocol, shared fMRI data, contributor coauthorships on the first three papers using someone’s openly shared components).

Glad we sorted this out. Who’s up for collaborating?
Time to go to bed.


Strengths

  • Well-motivated and elegant experimental design and analysis
  • 3T and 7T fMRI data from a total of 14 subjects
  • Compelling results demonstrating that vignetting can cause coarse-scale patterns that enable orientation decoding


Weaknesses

  • The paper claims to introduce a novel idea that requires reinterpretation of a large literature. The claim of novelty is unjustified. Vignetting was discovered by Carlson (2014), and in Wardle et al. (2017) Carlson’s group showed that it may be one contributing factor enabling orientation decoding, but not the only one. Carlson and colleagues deserve clearer credit throughout.
  • The experiments show that vignetting compromised the stimuli of Freeman et al. (2011), but they don’t address whether the claim by Freeman et al. of an over-representation of radial orientations in the neuronal population holds regardless.
  • The paper doesn’t attempt to address whether decoding is still possible in the absence of vignetting effects, i.e. far from the aperture boundary.

Particular comments and suggestions

While the experiments and analyses are excellent and the paper well written, the current version is compromised by some exaggerated claims, suggesting greater novelty and consequence than is appropriate. This should be corrected.


“Here, we show that a large body of research that purported to measure orientation tuning may have in fact been inadvertently measuring sensitivity to second-order changes in luminance, a phenomenon we term ‘vignetting’.” (Abstract)

“Our results demonstrate that stimulus vignetting can wholly determine the orientation selectivity of responses in visual cortex measured at a macroscopic scale, and suggest a reinterpretation of a well-established literature on orientation processing in visual cortex.” (Abstract)

“Our results provide a framework for reinterpreting a wide-range
of findings in the visual system.” (Introduction)

Too strong a claim of novelty. The effect, beautifully termed “vignetting” here, was discovered by Carlson (2014), and that study deserves the credit for triggering a reevaluation of the literature, which began four years ago. The present study does place vignetting in a broader context, discussing a variety of mechanisms by which aperture edges might influence responses. But the basic ideas, including that the key factor is the interaction between the edge and the grating orientation and that the edge need not be hard, were all introduced in Carlson (2014). The present study very elegantly demonstrates the phenomenon with fMRI, but the effect has also been studied with fMRI before, by Wardle et al. (2017), so the fMRI component doesn’t justify this claim, either. Finally, while the results compellingly show that vignetting was a strong contributor in Freeman et al. (2011), they don’t show that it is the only factor contributing to orientation decoding. In particular, Wardle et al. (2017) suggest that vignetting is in fact not necessary for orientation decoding.


“We and others, using fMRI, discovered a coarse-scale orientation bias in human V1; each voxel exhibits an orientation preference that depends on the region of space that it represents (Furmanski and Engel, 2000; Sasaki et al., 2006; Mannion et al., 2010; Freeman et al., 2011; Freeman et al., 2013; Larsson et al., 2017). We observed a radial bias in the peripheral representation of V1: voxels that responded to peripheral locations near the vertical meridian tended to respond most strongly to vertical orientations; voxels along the peripheral horizontal meridian responded most strongly to horizontal orientations; likewise for oblique orientations. This phenomenon had gone mostly unnoticed previously. We discovered this striking phenomenon with fMRI because fMRI covers the entire retinotopic map in visual cortex, making it an ideal method for characterizing such coarse-scale representations.” (Introduction)

A bit too much chest thumping. The radial-bias phenomenon was discovered by Sasaki et al. (2006). Moreover, the present study negates the interpretation of Freeman et al. (2011), who took their results to indicate an over-representation of radial orientations among cortical neurons. According to the present study, those results were in fact an artifact of vignetting, and whether neuronal biases played any role is questionable. Freeman et al. used a narrower annulus than other studies (e.g. Kamitani & Tong, 2005), so they may have been more susceptible to the vignetting artifact. The authors suggest that a large literature be reinterpreted, but apparently not their own study, for which they specifically and compellingly show how vignetting probably affected the results.


“A leading conjecture is that the orientation preferences in fMRI measurements arise primarily from random spatial irregularities in the fine-scale columnar architecture (Boynton, 2005; Haynes and Rees, 2005; Kamitani and Tong, 2005). […] On the other hand, we have argued that the coarse-scale orientation bias is the predominant orientation-selective signal measured with fMRI, and that multivariate decoding analysis methods are successful because of it (Freeman et al., 2011; Freeman et al., 2013). This conjecture remains controversial because the notion that fMRI is sensitive to fine-scale neural activity is highly attractive, even though it has been proven difficult to validate empirically (Alink et al., 2013; Pratte et al., 2016; Alink et al., 2017).” (Introduction)

This passage is somewhat biased. First, the present results question the interpretation of Freeman et al. (2011). While the authors’ new interpretation (following Carlson, 2014) also suggests a coarse-scale contribution, it fundamentally changes the account. Moreover, the conjecture that coarse-scale effects play a role is not controversial. What is controversial is the claim that only coarse-scale effects contribute to fMRI orientation decoding. This extreme view is controversial not because it is attractive to think that fMRI can exploit fine-grained pattern information, but because the cited studies (Alink et al. 2013, Pratte et al. 2016, Alink et al. 2017), along with additional studies including Shmuel et al. 2010 and Swisher et al. 2010, present evidence in favor of a contribution from fine-grained patterns. The way the three studies are cited would suggest to an uninformed reader that they provide evidence against a contribution from fine-grained patterns. More evenhanded language is in order here.


“the model we use is highly simplified; for example, it does not take into account changes in spatial frequency tuning at greater eccentricities. Yet, despite the multiple sources of noise and the simplified assumptions of the model, the correspondence between the model’s prediction and the empirical measurements are highly statistically significant. From this, we conclude that stimulus vignetting is a primary source of the course[sic] scale bias.”

This argument is not compelling. A terrible model may explain a minuscule, yet highly statistically significant, portion of the explainable variance. In the absence of inferential comparisons among multiple models and of model checking (or a noise ceiling), it would be better to avoid such claims.
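
To make the point concrete, a toy simulation (my own, not from the paper) shows that a model capturing well under 1% of the variance can nonetheless reach high statistical significance given enough measurements:

```python
import random, math

# Toy simulation (illustrative, not from the paper): a "terrible" model
# that tracks the data only very weakly still yields a highly significant
# model-data correlation when n is large.
random.seed(0)
n = 10000
truth = [random.gauss(0, 1) for _ in range(n)]
# Model prediction correlates with the data at only r ~ 0.05.
model = [0.05 * t + random.gauss(0, 1) for t in truth]

# Pearson correlation between model prediction and measurement
mt, mm = sum(truth) / n, sum(model) / n
cov = sum((t - mt) * (m - mm) for t, m in zip(truth, model)) / n
sd_t = math.sqrt(sum((t - mt) ** 2 for t in truth) / n)
sd_m = math.sqrt(sum((m - mm) ** 2 for m in model) / n)
r = cov / (sd_t * sd_m)

# r^2 is a fraction of a percent, yet the t statistic
# t = r * sqrt((n - 2) / (1 - r^2)) is far beyond conventional thresholds.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
print(f"r = {r:.3f}, r^2 = {r**2:.4f}, t = {t_stat:.1f}")
```

Statistical significance of the model-data correspondence therefore says nothing about whether the model accounts for a substantial share of the explainable variance.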


“One study (Alink et al., 2017) used inner and outer circular annuli, but added additional angular edges, the result of which should be a combination of radial and tangential biases. Indeed, this study reported that voxels had a mixed pattern of selectivity, with a considerable number of voxels reliably preferring tangential gratings, and other voxels reliably favoring radial orientations.” (Discussion)

It’s true that the additional edges between the patches (though subtle) complicate the interpretation of the results of Alink et al. (2017). It would be good to check the strength of the effect by simulation. Happy to share the stimuli if someone wanted to look into this.


Minor points

Figure 4A, legend: Top and bottom panels mislabeled as showing angular and radial modulator results, respectively.

course -> coarse

complimentary -> complementary


The four pillars of open science

An open review of Gorgolewski & Poldrack (PP2016)


The four pillars of open science are open data, open code, open papers (open access), and open reviews (open evaluation). A practical guide to the first three of these is provided by Gorgolewski & Poldrack (PP2016). In this open review, I suggest a major revision in which the authors add treatment of the essential fourth pillar: open review. Image: The Porch of the Caryatids (Porch of the Maidens) of the ancient Greek temple Erechtheion on the north side of the Acropolis of Athens.


Open science is a major buzz word. Is all the talk about it just hype? Or is there a substantial vision that has a chance of becoming a reality? Many of us feel that science can be made more efficient, more reliable, and more creative through a more open flow of information within the scientific community and beyond. The internet provides the technological basis for implementing open science. However, making real progress with this positive vision requires us to reinvent much of our culture and technology. We should not expect this to be easy or quick. It might take a decade or two. However, the arguments for openness are compelling and open science will prevail eventually.

The major barriers to progress are not technological, but psychological, cultural, and political: individual habits, institutional inertia, unhealthy incentives, and vested interests. The biggest challenge is the fact that the present way of doing science does work (albeit suboptimally) and our vision for open science has not merely not yet been implemented, but has yet to be fully conceived. We will need to find ways to gradually evolve our individual workflows and our scientific culture.

Gorgolewski & Poldrack (PP2016) offer a brief practical guide to open science for researchers in brain imaging. I was expecting a commentary reiterating the arguments for open science most of us have heard before. However, the paper instead makes good on its promise to provide a practical guide for brain imaging and it contains many pointers that I will share with my lab and likely refer to in the future.

The paper discusses open data, open code, and open publications – describing tools and standards that can help make science more transparent and efficient. My main criticism is that it leaves out what I think of as a fourth essential pillar of open science: open peer review. Below I first summarise some of the main points and pointers to resources that I took from the paper. Along the way, I add some further points overlooked in the paper that I feel deserve consideration. In the final section, I address the fourth pillar: open review. In the spirit of a practical guide, I suggest what each of us can easily do now to help open up the review process.


1 Open data

  • Open-data papers more cited, more correct: If data for a paper are published, the community can reanalyse the data to confirm results and to address additional questions. Papers with open data are cited more (Piwowar et al. 2007, Piwowar & Vision 2013) and tend to make more correct use of statistics (Wicherts et al. 2011).
  • Participant consent: In the US, deidentified data can be shared freely without consent from the participants. However, rules differ in other countries. Ideally, participants should consent to their data being shared. The authors offer template text for consent forms.
  • Data description: The Brain Imaging Data Structure (BIDS) (Gorgolewski et al. 2015) provides a standard (evolved from the authors’ OpenfMRI project; Poldrack et al. 2013) for file naming and folder organisation, using file formats such as NifTI, TSV and JSON.
  • Field-specific brain-imaging data repositories: Two repositories accept brain imaging data from any researcher: FCP/INDI (for resting-state fMRI only) and OpenfMRI (for any dataset that includes MRI data).
  • Field-general repositories: Field-specific repositories like those mentioned help standardise sharing for particular types of data. If the formats offered are not appropriate for the data to be shared, field-general repositories, including FigShare, Dryad, and DataVerse, can be used.
  • Data papers: A data paper is a paper that focusses on the description of a particular publicly accessible data set. Data papers help create incentives for ambitious data acquisitions and enable researchers to specialise in data acquisition. Journals publishing data papers include Scientific Data, GigaScience, Data in Brief, F1000Research, Neuroinformatics, and Frontiers in Neuroscience.
  • Processed-data sharing: It can be useful to share intermediate or final results of data analysis. With the initial (and often somewhat more standardised) steps of data processing out of the way, processed data are often much smaller in volume and more immediately amenable to further analyses by others. Statistical brain-imaging maps can be shared via the authors’ website.
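
To make the BIDS naming conventions mentioned above concrete, here is a minimal sketch (the helper function is hypothetical; the BIDS specification itself is the authoritative reference) of the key-value filename scheme:

```python
# Hypothetical helper illustrating BIDS-style key-value file naming.
# (Sketch only; consult the BIDS specification for the authoritative rules.)
def bids_filename(sub, task, run, suffix, ext):
    """Assemble a BIDS-style filename, e.g. for a functional run."""
    return f"sub-{sub}_task-{task}_run-{run:02d}_{suffix}{ext}"

# A functional run for subject 01 would live under sub-01/func/:
print(bids_filename("01", "rest", 1, "bold", ".nii.gz"))
# sub-01_task-rest_run-01_bold.nii.gz
```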


2 Open code

  • Code sharing for transparency and reuse: Data-analysis details are complex in brain imaging, often specific to a particular study, and seldom fully defined in the methods section. Sharing code is the only realistic way of fully defining how the data have been analysed and enabling others to check the correctness of the code and effects of adjustments. In addition, the code can be used as a starting point for the development of further analyses.
  • Your code is good enough to share: A barrier to sharing is the perception among authors that their code might not be good enough. It might be incompletely documented, suboptimal, or even contain errors. Until the field finds ways to incentivise greater investment in code development and documentation for sharing, it is important to lower the barriers to sharing. Sharing imperfect code is preferable to not sharing code (Barnes 2010).
  • Sharing does not imply provision of user support: Sharing one’s code does not imply that one will be available to provide support to users. Websites like Neurostars.org can help users ask and answer questions independently of the authors (or with only their occasional involvement).
  • Version Control System (VCS) essential to code sharing: VCS software enables maintenance of complex code bases with multiple programmers and versions, including the ability to merge independent developments, to revert to previous versions when a change causes errors, and to share code among collaborators or publicly. An excellent, freely accessible, and widely used web-based VCS platform is GitHub, introduced in Blischak et al. (2016).
  • Literate programming combines code, results, and narrative text: Scripted automatic analyses have the advantages of automaticity and reproducibility (Cusack et al. 2014), compared to point-and-click analysis in an application with a graphical user interface; however, the latter enables more interactive interrogation of the data. Literate programming (Knuth 1992) attempts to make coding more interactive and provides a full and integrated record of the code, results, and text explanations. This makes the presentation of results fully computationally transparent and keeps the code accessible to oneself later in time, as well as to collaborators and third parties, with whom literate programs can be shared (e.g. via GitHub). Software supporting this approach includes Jupyter (for Python, R, and Julia), R Markdown (for R), and matlabweb (for MATLAB).
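
The flavour of literate programming can be conveyed even in a plain script (a sketch of my own; the numbers are made up): narrative, code, and results are interleaved, which tools like Jupyter render as formatted text, executable cells, and inline output.

```python
# -- Narrative: do reaction times differ between two conditions? --
# (Toy data, made up for illustration.)
condition_a = [412, 398, 445, 420]  # reaction times in ms
condition_b = [455, 467, 440, 462]

mean_a = sum(condition_a) / len(condition_a)
mean_b = sum(condition_b) / len(condition_b)

# -- Result: reported right next to the code that produced it. --
print(f"mean A: {mean_a:.1f} ms, mean B: {mean_b:.1f} ms")
# mean A: 418.8 ms, mean B: 456.0 ms
```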


3 Open papers

  • Open notebook science: Open science is about enhancing the bandwidth and reducing the latency in our communication network. This means sharing more and at earlier stages: not only our data and code, but ultimately also our day-to-day incremental progress. This is called open notebook science and has been explored by Cameron Neylon and Michael Nielsen, among others. Gorgolewski & Poldrack don’t comment at all on this beautiful vision for an entirely different workflow and culture. Perhaps open notebook science is too far in the future? However, some are already practicing it. Surely we should start exploring it in theory and considering which aspects of open notebook science we can integrate into our workflows. It would be great to have some pointers to practices and tools that help us move in this direction.
  • The scientific paper remains a critical component of scientific communication: Data and code sharing are essential, but will not replace communication through permanently citable scientific papers that link (increasingly accessible) data through analyses to novel insights and relate these insights to the literature.
  • Papers should simultaneously achieve clarity and transparency: The conceptual clarity of the argument leading to an insight is often at a tension with the transparency of all the methodological details. Ideally, a paper will achieve both clarity and transparency, providing multiple levels of description: a main narrative that abstracts from the details, more detailed descriptions in the methods section, additional detail in the supplementary information, and full detail in the links to the open data and code, which together enable exact reproduction of the results in the figures. This is an ideal to aspire to. I wonder if any paper in our field has fully achieved it. If there is one, it should surely be cited.
  • Open access: Papers need to be openly accessible, so their insights can have the greatest positive impact on science and society. This is really a no-brainer. The internet has greatly lowered the cost of publication, but the publishing industry has found ways to charge higher prices through a combination of paywalls and unreasonable open-access charges. I would add that every journal contains unique content, so the publishing industry runs hundreds of thousands of little monopolies – safe from competition. Many funding bodies require that studies they funded be published with open access. We need political initiatives that simply require all publicly funded research to be publicly accessible. In addition, we need publicly funded publication platforms that provide cost-effective alternatives to private publishing companies for editorial boards that run journals. Many journals are currently run by scientists whose salaries are funded by academic institutions and the public, but whose editorial work contributes to the profits of private publishers. In historical retrospect, future generations will marvel at the genius of an industry that managed for decades to employ a community without payment, take the fruits of their labour, and sell them back to that very community at exorbitant prices – or perhaps they will just note the idiocy of that community for playing along with this racket.
  • Preprint servers provide open access for free: Preprint servers like bioRxiv and arXiv host papers before and after peer review. Publishing each paper on a preprint server ensures immediate and permanent open access.
  • Preprints have digital object identifiers (DOIs) and are citable: Unlike blog posts and other more fleeting forms of publication, preprints can thus be cited with assurance of permanent accessibility. In my lab, we cite preprints we believe to be of high quality even before peer review.
  • Preprint posting enables community feedback and can help establish precedence: If a paper is accessible before it is finalised, the community can respond to it and help catch errors and improve the final version. In addition, it can help the authors establish the precedence of their work. I would add that this potential advantage will be weighed against the risk of getting scooped by a competitor who benefits from the preprint and is first to publish a peer-reviewed journal version. Incentives are shifting and will encourage earlier and earlier posting. In my lab, we typically post at the time of initial submission. At this point getting scooped is unlikely, and the benefits of getting earlier feedback, catching errors, and bringing the work to the attention of the community outweigh any risks of early posting.
  • Almost all journals support the posting of preprints: Although this is not widely known in the brain imaging and neuroscience communities, almost all major journals (including Nature, Science, Nature Neuroscience and most others) have preprint policies supportive of posting preprints. Gorgolewski & Poldrack note that they “are not aware of any neuroscience journals that do not allow authors to deposit preprints before submission, although some journals such as Neuron and Current Biology consider each submission independently and thus one should contact the editor prior to submission.” I would add that this reflects the fact that preprints are also advantageous to journals: They help catch errors and get the reception process and citation of the paper going earlier, boosting citations in the two-year window that matters for a journal’s impact factor.


4 Open reviews

The fourth pillar of open science is the open evaluation (OE, i.e. open peer review and rating) of scientific papers. This pillar is entirely overlooked in the present version of Gorgolewski & Poldrack’s commentary. However, peer review is an essential component of communication in science. Peer review is the process by which we prioritise the literature, guiding each field’s attention and steering scientific progress. Like other components of science, peer review is currently compromised by a lack of transparency, by inefficiency of information flow, and by unhealthy incentives. The movement for opening the peer review process is growing.

In traditional peer review, we judge anonymously, making inherently subjective decisions about the publication of our competitors’ work, under a cloak of secrecy and without ever having to answer for our judgments. It is easy to see that this does not provide ideal incentives for objectivity and constructive criticism. We’ve inherited secret peer review from the pre-internet age (when perhaps it made sense). Now we need to overcome this dysfunctional system. However, we’ve grown used to it and may be somewhat comfortable with it.

Transparent review means (1) that reviews are public communications and (2) that many of them are signed by their authors. Anonymous reviewing must remain an option, to enable scientists to avoid social consequences of negative judgments in certain scenarios. However, if our judgment is sound and constructively communicated, we should be able to stand by it. Just like in other domains, transparency is the antidote to corruption. Self-serving arguments won’t fly in open reviewing, and even less so when the review is signed. Signing adds weight to a review. The reviewer’s reputation is on the line, creating a strong incentive to be objective, to avoid any impression of self-serving judgment, and to attempt to be on the right side of history in one’s judgment of another scientist’s work. Signing also enables the reviewer to take credit for the hard work of reviewing.

The arguments for OE and a synopsis of 18 visions for how OE might be implemented are given in Kriegeskorte, Walther & Deca (2012). As for other components of open science, the primary obstacles to more open practices are not technological, but psychological, cultural, and political. Important journals like eLife and those of the PLoS family are experimenting with steps toward opening the review process. New journals, including The Winnower, ScienceOpen, and F1000Research, already rely on post-publication peer review.

We don’t have to wait for journals to lead us. We have all the tools to reinvent the culture of peer review. The question is whether we can handle the challenges this poses. Here, in the spirit of Gorgolewski & Poldrack’s practical guide, are some ways that we can make progress toward OE now by doing things a little differently.

  • Sign peer reviews you author: Signing our reviews is a major step out of the dark ages of peer review. It’s easier said than done. How can we be as critical as we sometimes have to be and stand by our judgment? We can focus first on the strengths of a paper, then communicate all our critical arguments in a constructive manner. Some people feel that we must sign either all or none of our reviews. I think that position is unwise. It discourages beginning to sign and thus de facto cements the status quo. In addition, there are cases where the option to remain anonymous is needed, and as long as this option exists we cannot enforce signing anyway. What we can do is take anonymous comments with a grain of salt and give greater credence to signed reviews. It is better to sign sometimes than never. When I started to sign my reviews, I initially reserved the right to anonymity for myself. After all this was a unilateral act of openness; most of my peers do not sign their reviews. However, after a while, I decided to sign all of my reviews, including negative ones.
  • Openly review papers that have preprints: When we read important papers as preprints, let’s consider reviewing them openly. This can simultaneously serve our own and our collective thought process: an open notebook distilling the meaning of a paper, why its claims might or might not be reliable, how it relates to the literature, and what future steps it suggests. I use a blog. Alternatively or additionally, we can use PubMed Commons or PubPeer.
  • Make the reviews you write for journals open: When we are invited to do a review, we can check if the paper has been posted as a preprint. If not, we can contact the authors, asking them to consider posting. At the time of initial submission, the benefits tend to outweigh the risks of posting, so many authors will be open to this. Preprint posting is essential to open review. If a preprint is available, we can openly review it immediately and make the same review available to the journal to contribute to their decision process.
  • Reinvent peer review: What is an open review? For example, what is this thing you’re reading? A blog post? A peer review? Open notes on the essential points I would like to remember from the paper with my own ideas interwoven? All of the above. Ideally, an open review helps the reviewer, the authors, and the community think – by explaining the meaning of a paper in the context of the literature, judging the reliability of its claims, and suggesting future improvements. As we begin to review openly, we are reinventing peer review and the evaluation of scientific papers.
  • Invent peer rating: Eventually we will need quantitative measures evaluating papers. These should not be based on buzz and usage statistics, but reflect the careful judgement of peers who are experts in the field, have considered the paper in detail, and ideally stand by their judgment. Quantitative judgments can be captured in ratings. Multidimensional peer ratings can be used to build a plurality of paper evaluation functions (Kriegeskorte 2012) that prioritise the literature from different perspectives. We need to invent suitable rating systems. For primary research papers, I use single-digit ratings on multiple scales including reliability, importance, and novelty, using capital letters to indicate the scale in the following format: [R7I5].
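
To illustrate how such compact rating tags could be processed (a hypothetical sketch, not an existing tool):

```python
import re

# Hypothetical parser for compact multi-scale rating tags such as [R7I5]:
# each capital letter names a scale (R = reliability, I = importance, ...)
# and the digit that follows is the single-digit rating on that scale.
def parse_rating(tag):
    return {scale: int(score) for scale, score in re.findall(r"([A-Z])(\d)", tag)}

print(parse_rating("[R7I5]"))
# {'R': 7, 'I': 5}
```

Machine-readable ratings of this kind could then be aggregated into the paper evaluation functions mentioned above.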


Errors are normal

As we open our science and share more of it with the community, we run the risk of revealing more of our errors. From an idealistic perspective, that’s a good thing, enabling us to learn more efficiently as individuals and as a community. However, in the current game of high-impact biomedical science there is an implicit pretense that major errors are unlikely. This is the reason why, in the rare case that a major error is revealed despite our lack of transparent practices, the current culture requires that everyone act surprised and the author be humiliated. Open science will teach us to drop these pretenses. We need to learn to own our mistakes (Marder 2015) and to be protective of others when errors are revealed. Opening science is an exciting creative challenge at many levels. It’s about reinventing our culture to optimise our collective cognitive process. What could be more important or glamorous?


Additional suggestions for improvements in revision

  • A major relevant development regarding open science in the brain imaging community is the OHBM’s Committee on Best Practices in Data Analysis and Sharing (COBIDAS), of which author Russ Poldrack and I are members. COBIDAS is attempting to define recommended practices for the neuroimaging community and has begun a broad dialogue with the community of researchers (see weblink above). It would be good to explain how COBIDAS fits in with the other developments.
  • About a third of the cited papers are by the authors. This illustrates their substantial contribution and expertise in this field. I found all these papers worthy of citation in this context. However, I wonder if other groups that have made important contributions to this field should be more broadly cited. I haven’t followed this literature closely enough to give specific suggestions, but perhaps it’s worth considering whether references should be added to important work by others.
  • As for the papers, the authors are directly involved in most of the cited web resources (OpenfMRI, NeuroVault, …). This is absolutely wonderful, and it might just be that there is not much else out there. Perhaps readers of this open review can leave pointers in the comments in case they are aware of other relevant resources. I would share these with the authors, so they can consider whether to include them in revision.
  • Can the practical pointers be distilled into a table or figure that summarises the essentials? This would be a useful thing to print out and post next to our screens.
  • “more than fair” -> “only fair”



I have the following relationships with the authors.

relationship                           number of authors
acquainted                             2
collaborated on committee              1
collaborated on scientific project     0



References

Barnes N (2010) Publish your computer code: it is good enough. Nature 467: 753. doi: 10.1038/467753a

Blischak JD, Davenport ER, Wilson G. (2016) A Quick Introduction to Version Control with Git and GitHub. PLoS Comput Biol. 12: e1004668. doi: 10.1371/journal.pcbi.1004668

Cusack R, Vicente-Grabovetsky A, Mitchell DJ, Wild CJ, Auer T, Linke AC, et al. (2014) Automatic analysis (aa): efficient neuroimaging workflows and parallel processing using Matlab and XML. Front Neuroinform 8: 90. doi: 10.3389/fninf.2014.00090

Gorgolewski KJ, Auer T, Calhoun VD, Cameron Craddock R, Das S, Duff EP, et al. (2015) The Brain Imaging Data Structure: a standard for organizing and describing outputs of neuroimaging experiments. bioRxiv 034561. doi: 10.1101/034561

Gorgolewski KJ, Varoquaux G, Rivera G, Schwarz Y, Ghosh SS, Maumet C, et al. (2015) NeuroVault.org: a web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Front Neuroinform 9: 8. doi: 10.3389/fninf.2015.00008

Knuth DE (1992) Literate programming. CSLI Lecture Notes, Stanford, CA: Center for the Study of Language and Information (CSLI).

Kriegeskorte N, Walther A, Deca D (2012) An emerging consensus for open evaluation: 18 visions for the future of scientific publishing. Front Comput Neurosci.

Kriegeskorte N (2012) Open evaluation: a vision for entirely transparent post-publication peer review and rating for science. Front Comput Neurosci.

Marder E (2015) Living Science: Owning your mistakes. eLife 4: e11628. doi: 10.7554/eLife.11628

Piwowar HA, Day RS, Fridsma DB (2007) Sharing detailed research data is associated with increased citation rate. PLoS One 2: e308. doi: 10.1371/journal.pone.0000308

Piwowar HA, Vision TJ (2013) Data reuse and the open data citation advantage. PeerJ. 1: e175. doi: 10.7717/peerj.175

Poldrack RA, Barch DM, Mitchell JP, Wager TD, Wagner AD, Devlin JT, et al. (2013) Toward open sharing of task-based fMRI data: the OpenfMRI project. Front Neuroinform 7: 1–12. doi: 10.3389/fninf.2013.00012

Wicherts JM, Bakker M, Molenaar D (2011) Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS One 6: e26828. doi: 10.1371/journal.pone.0026828