BACKGROUND THEORY
A standard nonparametric test is exact, in that the false positive rate is exactly equal to the specified α level. Using randomise with a GLM that corresponds to one of the following simple statistical models will result in exact inference:
One sample t-test on difference measures
Two sample t-test
One-way ANOVA
Simple correlation
Use of almost any other GLM will result in approximately exact inference. In particular, when the model includes both the effect tested (e.g., difference in FA between two groups) and nuisance variables (e.g., age), exact tests are not generally available. Permutation tests rely on an assumption of exchangeability; with the models above, the null hypothesis implies complete exchangeability of the observations. When there are nuisance effects, however, the null hypothesis no longer assures the exchangeability of the data (e.g., even when the null hypothesis of no FA difference is true, age effects imply that you can't permute the data without altering the structure of the data).
Permutation tests for the General Linear Model
For an arbitrary GLM, randomise uses the method of Freedman & Lane (1983). Based on the contrast (or set of contrasts defining an F test), the design matrix is automatically partitioned into tested effects and nuisance (confound) effects. The data are first fit to the nuisance effects alone and nuisance-only residuals are formed. These residuals are permuted, and then the estimated nuisance signal is added back on, creating an (approximate) realization of data under the null hypothesis. This realization is fit to the full model and the desired test statistic is computed as usual. This process is repeated to build a distribution of test statistics under the null hypothesis specified by the contrast(s). For the simple models above, this method is equivalent to the standard exact tests; otherwise, it accounts for nuisance variation present under the null. Note that randomise v2.0 and earlier used a method due to Kennedy (1995). While both the Freedman-Lane and Kennedy methods are accurate for large n, for small n the Kennedy method can tend to falsely inflate significance. For a review of these issues and even more possible methods, see Anderson & Robinson (2001) and Winkler et al. (2014) [references below].
For simple models, like the 2-sample t-test, regression with a single covariate, and 1-sample t-test on differences, a permutation test has exact control of the desired false positive rate with very simple assumptions. For more complex cases, like an arbitrary contrast on a linear regression model with 2 or more covariates, "exactness" can't be guaranteed, but exhaustive simulations under various punishing cases (e.g., small sample sizes) have shown Freedman-Lane to be highly accurate in practice (Anderson & Robinson, 2001; Winkler et al., 2014).
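The steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not randomise's actual implementation: the function name `freedman_lane`, its arguments, and the choice of an F statistic for the tested effect are all hypothetical, the design is assumed to be already partitioned into tested regressors `X` and nuisance regressors `Z` (with full column rank), and the unpermuted data are counted as one of the permutations.

```python
import numpy as np

def freedman_lane(y, X, Z, n_perm=999, seed=0):
    """Illustrative permutation test of the effect X, adjusting for nuisance Z.

    y: (n,) data; X: (n, p) tested regressors; Z: (n, q) nuisance regressors
    (put an intercept column in Z if the model needs one).
    Returns the observed F statistic and its permutation p-value.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    M = np.column_stack([X, Z])  # full design: tested + nuisance effects

    def f_stat(data):
        # F statistic comparing the full model with the nuisance-only model
        rss_full = np.sum((data - M @ np.linalg.lstsq(M, data, rcond=None)[0]) ** 2)
        rss_red = np.sum((data - Z @ np.linalg.lstsq(Z, data, rcond=None)[0]) ** 2)
        df1, df2 = X.shape[1], n - M.shape[1]
        return ((rss_red - rss_full) / df1) / (rss_full / df2)

    # Fit the nuisance effects alone and form nuisance-only residuals
    nuisance_fit = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - nuisance_fit

    observed = f_stat(y)
    count = 1  # assumption: the unpermuted data count as one permutation
    for _ in range(n_perm):
        # Permute residuals, add the nuisance signal back, refit the full model
        y_star = nuisance_fit + rng.permutation(resid)
        if f_stat(y_star) >= observed:
            count += 1
    return observed, count / (n_perm + 1)
```

For a group comparison adjusted for age, `X` would hold the group regressor and `Z` an intercept plus the age covariate.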
Conditional Monte Carlo Permutation Tests
A proper "exact" test arises from evaluating every possible permutation. Often this is not feasible, e.g., a simple correlation with 12 scans has nearly half a billion possible permutations (12! = 479,001,600). Instead, a random sample of possible permutations can be used, creating a Conditional Monte Carlo (CMC) permutation test. On average, the CMC test is exact and will give similar results to carrying out all possible permutations.
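As a rough illustration of the idea, the sketch below runs a CMC permutation test for a simple correlation using only the Python standard library. The function names `pearson_r` and `cmc_correlation_test` are hypothetical, the test is two-sided, and counting the observed labelling as one of the permutations is an assumption of this sketch rather than a statement about how randomise is implemented.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cmc_correlation_test(x, y, n_perm=5000, seed=0):
    """Two-sided CMC permutation p-value for the correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    count = 1  # assumption: the observed labelling counts as one permutation
    for _ in range(n_perm - 1):
        rng.shuffle(shuffled)  # draw a random permutation of y
        if abs(pearson_r(x, shuffled)) >= observed:
            count += 1
    return count / n_perm
```

With 12 observations, `math.factorial(12)` gives the 479,001,600 possible orderings, so even 5,000 random draws cover only a tiny fraction of them.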
If the number of possible permutations is large, one can show that a true, exhaustive P-value of p will produce P-values between p ± 2√(p(1-p)/n) about 95% of the time, where n is the number of CMC permutations. The table below shows confidence limits for p=0.05 for various n. At least 5,000 permutations are required to reduce the uncertainty appreciably, though 10,000 permutations are required to reduce the margin-of-error to below 10% of the nominal alpha.
n | Confidence limits
100 | 0.0500 ± 0.0436
500 | 0.0500 ± 0.0195
1,000 | 0.0500 ± 0.0138
5,000 | 0.0500 ± 0.0062
10,000 | 0.0500 ± 0.0044
50,000 | 0.0500 ± 0.0019
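The confidence limits follow directly from the formula 2√(p(1-p)/n). The short sketch below recomputes them for p=0.05; the helper name `cmc_margin` is made up for this illustration.

```python
import math

def cmc_margin(p, n):
    """Approximate 95% margin of error, 2*sqrt(p(1-p)/n), for a CMC P-value."""
    return 2.0 * math.sqrt(p * (1.0 - p) / n)

# Recompute the margins for a true P-value of 0.05
for n in (100, 500, 1000, 5000, 10000, 50000):
    print(f"{n:>6,d}  0.0500 ± {cmc_margin(0.05, n):.4f}")
```

At n = 10,000 the margin of 0.0044 is about 9% of α = 0.05, whereas at n = 5,000 the margin of 0.0062 is about 12%.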
In randomise, the number of permutations to use is specified with the -n option. If this number is greater than or equal to the number of possible permutations, an exhaustive test is run. If it is less than the number of possible permutations, a Conditional Monte Carlo permutation test is performed. The default is 5000, though if time permits, 10000 is recommended.
Counting Permutations
Exchangeability under the null hypothesis justifies the permutation of the data. For n scans, there are n! (n factorial, n×(n-1)×(n-2)×...×2×1) possible ways of shuffling the data. For some designs, though, many of these shuffles are redundant. For example, in a two-sample t-test, permuting two scans within a group will not change the value of the test statistic. The numbers of possible permutations for different designs are given below.
Model | Sample Size(s) | Number of Permutations
One sample t-test on difference measures | n | 2^{n}
Two sample t-test | n_{1}, n_{2} | (n_{1}+n_{2})! / ( n_{1}! × n_{2}! )
One-way ANOVA | n_{1},...,n_{k} | (n_{1}+n_{2}+ ... + n_{k})! / ( n_{1}! × n_{2}! × ... × n_{k}! )
Simple correlation | n | n!
Note that the one-sample t-test is an exception. Data are not permuted, but rather their signs are randomly flipped, giving 2^{n} possible sign-flippings. For all designs except a one-sample t-test, randomise uses a generic algorithm which counts the number of unique possible permutations for each contrast. If X is the design matrix and c is the contrast of interest, then Xc is the sub-design matrix of the effect of interest. The number of unique rows in Xc is counted and a one-way ANOVA calculation is used, treating each set of identical rows as one group.
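The counting rules above can be checked with a small standard-library sketch. The function names are hypothetical and the code only mirrors the formulas in the table; it is not taken from randomise. Rows of Xc are passed as plain Python sequences, and identical rows are treated as one group of the one-way ANOVA formula.

```python
import math
from collections import Counter

def n_sign_flips(n):
    """One-sample t-test: each of the n observations can have its sign flipped."""
    return 2 ** n

def n_permutations(group_sizes):
    """One-way ANOVA count: (n1+...+nk)! / (n1! * ... * nk!)."""
    total = math.factorial(sum(group_sizes))
    for size in group_sizes:
        total //= math.factorial(size)  # exact integer division at every step
    return total

def n_unique_permutations(xc_rows):
    """Group identical rows of Xc, then apply the one-way ANOVA formula."""
    counts = Counter(tuple(row) for row in xc_rows)
    return n_permutations(list(counts.values()))
```

For example, a two-sample t-test with 6 scans per group gives `n_permutations([6, 6])`, and a simple correlation with 12 distinct covariate values gives `n_permutations([1] * 12)`, which equals 12!.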
REFERENCES
Winkler AM, Ridgway GR, Webster MA, Smith SM, Nichols TE. Permutation inference for the general linear model. NeuroImage, 2014;92:381-397.
Anderson MJ, Robinson J. Permutation tests for linear models. Aust New Zeal J Stat, 2001;43(1):75-88.