Journal Club on “Cluster Failure: Why fMRI Inferences for Spatial Extent Have Inflated False-Positive Rates”

By David Mehler

In a recent blog post, I summarized some important findings of the study “Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates” by Anders Eklund and colleagues, and debunked some myths that have been circulating around the paper. Here, I would like to discuss the paper in more statistical terms and condense insights from journal clubs I have held on it. My aim is to give readers who want to understand the paper’s implications in more detail a summary of the background, along with practical recommendations and some take-home messages based on its main results.

fMRI researchers are usually interested either in the activity of certain brain areas (so-called regions of interest, ROIs) or in whole-brain analyses (and sometimes in both). In an ROI analysis, we only test a hypothesis about a particular brain region or between regions (e.g. “Is the motor cortex more active during arm movements?”). In contrast, in a whole-brain analysis we test which brain areas show task-related activity (e.g. “Where in the brain do we find differences in activity during arm movements?”). Different techniques exist (e.g. univariate and multivariate), but most analyses use the “mass univariate approach,” in which a separate test is conducted for every voxel. The insights by Eklund and colleagues are only relevant for univariate whole-brain analyses, in which one can end up with 100,000 tests or more.
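
To make the scale of the problem concrete, here is a minimal sketch of the mass univariate approach, assuming hypothetical per-subject contrast maps stored as a 4D NumPy array (real analyses would of course use a package such as SPM, FSL, or AFNI rather than a hand-rolled script):

    import numpy as np
    from scipy import stats

    n_subjects = 20
    vol_shape = (64, 64, 40)            # roughly 160,000 voxels

    # Hypothetical data: one contrast value (e.g. "movement > rest") per subject and voxel.
    rng = np.random.default_rng(0)
    contrast_maps = rng.normal(size=(n_subjects, *vol_shape))

    # The "mass univariate approach": a one-sample t-test at every voxel, across subjects.
    t_map, p_map = stats.ttest_1samp(contrast_maps, popmean=0.0, axis=0)
    print(p_map.size)                   # 163,840 separate tests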

Univariate whole-brain analyses therefore require a reliable method to correct for multiple testing, otherwise the type I error inflates massively beyond 5%: at an uncorrected threshold we falsely reject the null hypothesis in about 5% of, say, 100,000 tests, so the errors mount up and produce a large number of false positives. We could simply correct for the number of tests (e.g. with Bonferroni); however, the exact number of independent tests is not clear, because the activity of a given voxel is not independent of its neighbors (e.g. due to common vascular supply, similar computations being performed and, importantly, similar noise).
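
As a back-of-the-envelope illustration (the numbers are illustrative, not taken from the paper):

    n_tests = 100_000
    alpha = 0.05

    # Uncorrected: roughly 5,000 voxels would be declared "active" by chance alone.
    expected_false_positives = alpha * n_tests

    # Bonferroni: divide alpha by the number of tests. This is far too strict here,
    # because neighboring voxels are not independent tests.
    bonferroni_threshold = alpha / n_tests          # p < 5e-07 per voxel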

Moreover, in cognitive neuroscience we are usually not interested in the activity of single voxels but in the activity of brain regions, so we seek clusters of task-related activity. Therefore, instead of correcting at the voxel level (which can be too conservative), we can correct for the minimum number of adjacent voxels we would not expect to find by pure chance. These methods are called cluster-wise inference methods, and they are widely used to correct for multiple testing. The routines work out the minimum size a cluster must have so that we still control the type I error rate at 5%. Eklund and colleagues compared, in particular, how reliably two cluster-wise inference methods control the type I error rate: parametric cluster-extent inference and non-parametric permutation tests.

How do these two approaches differ? The most important difference is that parametric approaches assume a particular distribution of the statistic of interest under the null hypothesis. Depending on the type of data we work with (e.g. the level and shape of the noise, which can turn out a bit “strange” for fMRI), these assumptions might not be quite right. Non-parametric permutation-based procedures, in contrast, estimate the null distribution of the statistic empirically by permuting the labels and re-running the analysis many thousands of times.
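
The following sketch illustrates the permutation idea for a one-sample group analysis using sign-flipping; the variable names are hypothetical, and a cluster-wise variant would record the maximum cluster size rather than the maximum voxel statistic in each permutation:

    import numpy as np
    from scipy import stats

    def max_stat_null(contrast_maps, n_perm=5000, seed=0):
        """Build an empirical null distribution of the maximum t-value over voxels."""
        rng = np.random.default_rng(seed)
        n_subjects = contrast_maps.shape[0]
        null_max = np.empty(n_perm)
        for i in range(n_perm):
            # Under the null, each subject's contrast map can have its sign flipped.
            signs = rng.choice([-1.0, 1.0], size=n_subjects)
            flipped = contrast_maps * signs[:, None, None, None]
            t_map, _ = stats.ttest_1samp(flipped, popmean=0.0, axis=0)
            null_max[i] = np.nanmax(t_map)
        return null_max

    # The family-wise corrected threshold is then the 95th percentile of the null:
    # threshold = np.percentile(max_stat_null(contrast_maps), 95)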

Importantly, parametric cluster-extent procedures consist of two steps. First, a cluster-defining threshold (CDT; e.g. p=0.01 or p=0.001) is applied to the statistical map to retain only supra-threshold voxels (software packages usually set a default value but allow users to change it). Second, based on the retained supra-threshold voxels, a cluster-level extent threshold is estimated from the data, which is supposed to give the minimum cluster size that is considered significant.
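
A rough sketch of these two steps, assuming a hypothetical p-value map and a placeholder extent threshold k (in real packages k is derived from random field theory or from permutations, not chosen by hand):

    import numpy as np
    from scipy import ndimage

    def surviving_clusters(p_map, cdt=0.001, k=50):
        # Step 1: apply the cluster-defining threshold (CDT) to keep supra-threshold voxels.
        supra = p_map < cdt
        # Step 2: group contiguous voxels into clusters and keep those of at least k voxels.
        labels, n_clusters = ndimage.label(supra)
        ids = np.arange(1, n_clusters + 1)
        sizes = ndimage.sum(supra, labels, index=ids)
        return [cluster_id for cluster_id, size in zip(ids, sizes) if size >= k]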

So what exactly did Eklund and colleagues look at? They used resting-state data (from the Connectome project) to compare the reliability of parametric and non-parametric multiple testing procedures as implemented in the main software packages, across different testing parameters (importantly, CDT p=0.01 and CDT p=0.001). The rationale for using resting-state data was to “fake” experimental contrasts from data that should contain no true experimental differences but have realistic fMRI noise properties. Finding significant clusters in more than 5% of all analyses (for a given combination of parameters) would thus indicate that the multiple testing correction method in question was problematic and produced more false positives than we deem acceptable.
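
The logic of each null group analysis can be sketched as follows (the names and group sizes are hypothetical): randomly split resting-state subjects into two “conditions” with no true difference, so any significant cluster must be a false positive:

    import numpy as np
    from scipy import stats

    def fake_group_analysis(resting_maps, group_size=20, rng=None):
        rng = rng or np.random.default_rng()
        # Randomly assign resting-state subjects to two "task" groups.
        idx = rng.permutation(resting_maps.shape[0])[: 2 * group_size]
        group_a = resting_maps[idx[:group_size]]
        group_b = resting_maps[idx[group_size:]]
        # Two-sample test at every voxel; no true effect exists by construction.
        t_map, p_map = stats.ttest_ind(group_a, group_b, axis=0)
        return p_map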

In total, they conducted about 3,000,000 group analyses, an enormous volume of computation that they were able to handle using a dedicated compute cluster and in-house software (BROCCOLI) that parallelizes processing on it. For each combination of parameters tested, 1,000 group analyses were carried out. The benchmark for reliability was the number of analyses in which at least one significant cluster was found, divided by 1,000, i.e. the empirical false-positive (family-wise error) rate.
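
The benchmark itself is straightforward to express; here run_null_group_analysis is a hypothetical callable standing in for one fake group analysis (such as the sketch above) that returns True if any cluster survives correction:

    def empirical_fwe(run_null_group_analysis, n_analyses=1000):
        # Fraction of null analyses yielding at least one significant cluster.
        hits = sum(run_null_group_analysis() for _ in range(n_analyses))
        return hits / n_analyses        # should be close to 0.05 for a valid method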

The authors made five main observations:

  • As already suggested in another recent paper by a different group, a too lenient cluster-defining threshold (i.e. CDT p=0.01) leads to unacceptably high false-positive rates (i.e. inflated type I error rates). In fact, this is where the 70% figure cited on some science blogs comes from.
  • Even a more conservative threshold (i.e. CDT p=0.001) is not sufficient if the cluster extent is not estimated from the data but an arbitrary ad-hoc cluster extent is chosen instead (as the famous salmon study showed, with the wrong statistics one can even “find” a cluster in a dead fish).
  • Importantly, the same threshold (i.e. CDT p=0.001) in combination with cluster-extent estimation gives much better results, though still slightly biased.
  • Non-parametric permutation tests seemed most reliable and usually controlled type I errors at the expected 5% rate.
  • Voxel-wise inference was overly conservative (as expected).

These findings were not particularly surprising and largely in line with previous work. However, another interesting finding was that the assumptions about the null distribution for fMRI data are partly wrong, e.g. the spatial autocorrelation of the data does not follow the assumed Gaussian shape. This issue has certainly contributed to the error rates found in the study. Essentially, the activity of neighboring voxels is even more similar than previously thought, simply because of the way MR physics works. The software packages that were tested have since been corrected, so if you are using one of them, make sure you have the latest update. Furthermore, neighboring voxels vary in how similar they are depending on the site in the brain (e.g. because of tissue differences and head shape), and false positives indeed clustered more in certain areas. This finding has also stimulated some exciting developments.

So, given these results, shall we from now on exclusively use permutation testing packages (e.g. SnPM or FSL randomise/PALM)? This depends on a few aspects, and I would argue that, depending on the context, parametric cluster-wise inference methods are often at least equally good and sometimes even a better option. Here, I list five arguments in support of parametric cluster-based inference (when used with an appropriate CDT):

  • One pragmatic aspect is the computational resources and time available, because permutation testing takes far longer.
  • The effect size one is interested in and the available sample size play a role, because for small effects sensitivity is often better with parametric tests (using CDT p=0.001 in combination with an estimated cluster extent), though potentially at the cost of slightly more false positives. Importantly, the probability of finding any false positive (i.e. the type I error rate) is not the same as the percentage of false positives in a given study! For instance, in experiments with large effects and many true positives, a multiple testing correction associated with a 70% chance of finding at least one false positive would still result in a relatively small percentage of false positives overall (see the numeric sketch after this list).
  • The slightly higher false-positive rates reported by Eklund and colleagues for CDT p=0.001 seem tolerable; one might be willing to accept control slightly above the nominal 5% (i.e. a slightly biased result) in exchange for greater sensitivity.
  • This bias is probably smaller than the results reported in the paper for CDT p=0.001 suggest, if the analyses were replicated with the corrected software packages, which no longer assume a Gaussian shape for the null distribution and thus provide more stringent control.
  • Lastly, a compelling recent analysis suggests that parametric cluster inference with CDT p=0.001 does correct at the nominal 5% when one takes the proportion of false positives (i.e. the false discovery rate, FDR) rather than the occurrence of any false positive (the family-wise error rate, FWER, as employed by Eklund and colleagues) as the benchmark.
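
To see why the FWER/FDR distinction matters, here is a purely illustrative calculation (the numbers are made up, not taken from the paper):

    # Suppose a study with a strong effect reports 20 significant clusters,
    # of which 19 reflect a real effect and 1 is spurious.
    n_reported = 20
    n_false = 1

    # From the FWER perspective this study counts as a "failure" because it
    # contains at least one false positive; a 70% FWER refers to this event.
    contains_false_positive = n_false > 0               # True

    # From the FDR perspective, only 5% of the reported findings are false.
    false_discovery_proportion = n_false / n_reported   # 0.05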

Of interest, a recent white paper by one of the co-authors (Tom Nichols) and other leading scientists in the field discusses good practices to ensure better quality control in data analysis and handling. As a more general point, another recent paper has highlighted the importance of effect sizes for fMRI research, which have so far largely been neglected in favor of reporting mainly statistical values.

The study by Eklund and colleagues has clearly shown that certain practices in fMRI research are flawed, while others provide reliable FWER control. The paper has stimulated lively discussion and new developments and has made many fMRI researchers more aware of the assumptions underlying fMRI data analysis (for more information, see the OHBM statement). In conclusion, the findings of Eklund and colleagues are important, but they should not be overstated; rather, they should be seen as part of a larger self-correcting (learning) process currently happening in the field, including better reporting standards for neuroimaging.

—

David Mehler is an MD-PhD candidate in medicine and neuroscience at Cardiff University and University of Münster. He uses neuroimaging techniques (fMRI, EEG) to investigate neurofeedback training in healthy participants and patients with a focus on motor rehabilitation.
