Group ICA and Dual Regression Practical

Independent component analysis at the group level (Group ICA) is used to identify whole brain resting state networks (RSNs) that are common across the group.

Dual regression is a tool that we can use as part of a group-level resting state analysis to identify the subject-specific contributions to the group-level ICA. The output of dual regression is a set of subject-specific spatial maps and time courses for each group-level component (spatial map), which can then be compared across subjects or groups.

Running Group ICA: Setting up and running temporal concatenation group ICA.
Low versus high dimensional group ICA: Looking at how the ICA dimensionality (number of components) affects the results.
Using dual regression to investigate group differences: Estimating group level ICs, and comparing ICs across groups.

Before running Group ICA: recap

To recap, here is what we covered in the last practical to prepare for the Group ICA:

Brain-extract T1 and prepare fieldmaps in preparation for the preprocessing and single-subject ICA

Input: whole brain T1, whole brain mag fieldmap, phase fieldmap
Output: brain-extracted T1, brain-extracted mag fieldmap, phase fieldmap in rad/s

Pre-process single-subject data and run single-subject ICA via the MELODIC or FEAT GUI

Input: 4D resting state file, brain-extracted T1, brain-extracted mag fieldmap, phase fieldmap in rad/s
Output: pre-processed rfMRI data (filtered_func_data.nii.gz), single-subject components (filtered_func_data.ica/melodic_IC.nii.gz), transformations/warps for registration to standard space

Clean single-subject pre-processed data (manually or using FIX/AROMA)

Input: pre-processed rfMRI data (filtered_func_data.nii.gz), single-subject components (filtered_func_data.ica/melodic_IC.nii.gz)
Output: pre-processed, cleaned rfMRI data (filtered_func_data_clean.nii.gz)

Transform the cleaned single-subject data to standard space

Input: pre-processed, cleaned rfMRI data (filtered_func_data_clean.nii.gz), transformations/warps for registration to standard space
Output: pre-processed, cleaned rfMRI data in standard space (filtered_func_data_clean_standard.nii.gz)

Create a text file listing the file paths of the standard-space cleaned single-subject data

Input: pre-processed, cleaned rfMRI data in standard space (filtered_func_data_clean_standard.nii.gz) for each subject
Output: text file (e.g. inputlist_new.txt)
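
As a rough illustration of this last step, the list could be built with a short shell loop. The helper name make_inputlist and the directory layout (one sub-* directory per subject under a common base directory) are assumptions made for this sketch; refer to the ICA practical for the procedure actually used.

```shell
# Hypothetical helper: write one standard-space cleaned file path per line.
# Assumed layout: <base_dir>/sub-*/filtered_func_data_clean_standard.nii.gz
make_inputlist() {
    base_dir=$1
    out_file=$2
    : > "$out_file"    # start from an empty list
    for f in "$base_dir"/sub-*/filtered_func_data_clean_standard.nii.gz; do
        # Skip the unexpanded pattern if no subject directory matches
        if [ -e "$f" ]; then
            echo "$f" >> "$out_file"
        fi
    done
    return 0
}
```

It would be called as make_inputlist followed by the base directory and the name of the list to create, for example inputlist_new.txt. The shell expands the pattern in sorted order, so it is worth checking the order of the resulting list before building a group design from it.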

Running Group ICA

As mentioned in the last practical, if you have applied cleaning at the single subject level (which is recommended!), you currently cannot use the MELODIC GUI to run the group ICA, because the GUI would use the un-cleaned images. Therefore, we need to use the melodic command line to run group ICA.

The $FSLDIR part of the command below expands to the location where FSL is installed on your computer, so you can type ${FSLDIR}/data/standard/MNI152_T1_2mm instead of, for example, /usr/local/fsl/data/standard/MNI152_T1_2mm. This is particularly useful if you are not sure where FSL is installed.

An example group ICA has already been run for you using the following command (don't run this yourself, as it takes too long for this practical):

melodic -i inputlist_new.txt -o groupICA15 \
  --tr=0.72 --nobet -a concat \
  -m $FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz \
  --report --Oall -d 15

(Note that the input files that are listed in inputlist_new.txt are not present in the course data set - refer to the ICA practical for details on how they can be created).

Type melodic into the terminal and use the command usage to work out what each flag in the command above means. How many group components will be generated in this analysis?

If you wanted to extract 200 components, which part of the command would you change?

Output of group ICA

Change directory to ~/fsl_course_data/rest/ICA/groupICA15/ to look at the output of the group MELODIC run with the command above.

The key output from the group MELODIC is the melodic_IC.nii.gz. This is a 4D image where each volume corresponds to an ICA component. This melodic_IC.nii.gz can be used as a group level template (spatial basis) to feed into dual regression (more on this later).

Use fslinfo on the groupICA15/melodic_IC.nii.gz image to check the dimensionality matches what you expect.

Low versus high dimensional group ICA

In this section, you will have a look at the melodic_IC components from example group ICAs run with low and high dimensionality.

Change directory into ~/fsl_course_data/rest/ICA/low_high_ICA_dim

We have run group ICA with 25 dimensions and with 50 dimensions for you, using the following commands:

melodic -i inputlist_new.txt -o GroupICA_25_s0_n820_MB8_HCP \
  --tr=0.72 --nobet -a concat \
  -m $FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz \
  --report --Oall -d 25

melodic -i inputlist_new.txt -o GroupICA_50_s0_n820_MB8_HCP \
  --tr=0.72 --nobet -a concat \
  -m $FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz \
  --report --Oall -d 50

You can compare the group ICA maps calculated with different dimensionalities (25 vs 50) by loading them in FSLeyes:

fsleyes -std \
  melodic_IC_25_s0_n820_MB8_HCP.nii.gz -un -dr 30 100 -n 25 \
  -cm red-yellow -nc blue-lightblue \
  melodic_IC_50_s0_n820_MB8_HCP.nii.gz -un -dr 30 100 -n 50 \
  -cm red-yellow -nc blue-lightblue &

Make sure that the two images are unlinked (that the button next to each image's name is toggled off).

Now go to the ‘View’ menu at the top and add a second ortho view, so we can look at the 25 and 50 images side-by-side. Next, make sure that the window on the left is showing the 25 results (by clicking the toggle button next to 50), and make sure that the window on the right is showing the 50 results (by clicking the toggle button next to 25).

In the left view, select the 25 image, and change the volume control to 5. Then in the right view, select the 50 image, and look at volumes 32, 33 and 35. Make sure to navigate to somewhere inside the component (for example around voxel location 45 45 65). As you can see, the original network in the 25 dimensional ICA shown on the left of your screen is split into three separate components at a dimensionality of 50, namely a left and right lateralised and a medial region. Another example is to compare component 2 in the 25-dimensional decomposition to components 5, 9, 11 and 14 in the 50-dimensional decomposition.

When would you run low dimensional group ICA? When would you run high dimensional group ICA? Why?

Using Dual Regression to investigate group differences

A dual regression analysis is used to map the RSNs (i.e. group-level components, an external template or a set of ROIs) back into each individual subject's data, e.g. in order to examine between-group differences in the RSNs. We will use the group ICA maps generated by melodic in the first section (Running Group ICA) as the spatial basis to input into dual regression.

Dual regression works in three stages, each with its own output:

Stage 1 - using group-ICA spatial maps, subject-specific time courses are estimated from the input standard space cleaned single-subject data (filtered_func_data_clean_standard.nii.gz); this step conducts a multivariate spatial regression
Stage 2 - using the subject-specific time courses output from stage 1, subject-specific spatial maps are estimated from the input standard space cleaned single-subject data (filtered_func_data_clean_standard.nii.gz); this step conducts a multivariate temporal regression
Stage 3 - using the subject-specific spatial maps estimated in stage 2 and the design matrix and contrast files, cross-subject (group) analysis is performed
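
The first two stages can be written as a pair of least-squares regressions. This is a minimal sketch in notation introduced here for illustration, not taken from the dual_regression script: Y_s is subject s's data as a time-by-voxel matrix, G holds the group spatial maps (voxels by components), A_s holds the subject's time courses (time by components) and S_s the subject's spatial maps (components by voxels). Any demeaning or variance normalisation applied by the real tool is omitted.

```latex
% Stage 1: spatial regression of the group maps onto each volume of the data
Y_s^{\top} = G \, A_s^{\top} + E_1,
\qquad
\hat{A}_s^{\top} = \left( G^{\top} G \right)^{-1} G^{\top} Y_s^{\top}

% Stage 2: temporal regression of the stage 1 time courses onto each voxel
Y_s = \hat{A}_s \, S_s + E_2,
\qquad
\hat{S}_s = \left( \hat{A}_s^{\top} \hat{A}_s \right)^{-1} \hat{A}_s^{\top} Y_s
```

Because every component enters each regression at the same time, both stages are multivariate: each time course and each map is estimated while accounting for all the other components.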

Before running dual regression

Before we can run dual regression, we need to have:

A list of single subject cleaned, preprocessed, 4D rfMRI data in standard space
A set of RSNs that we want to estimate at the subject level
A group level design to perform the required group comparisons

NB: the input list for dual regression is the same as that used for input to group MELODIC.

In the ~/fsl_course_data/rest/ICA/ directory, we have already created inputlist_new.txt, which contains the file path to the filtered_func_data_clean_standard.nii.gz file for each of our 6 controls and 6 tumour patients.

We also have our group-level components (~/fsl_course_data/rest/ICA/groupICA15/melodic_IC), obtained in the first section (Running Group ICA).

We just need to set up the group-level design to perform group comparisons. In this example, we are using data from 12 subjects - six patients with a tumour and six healthy controls (we are grateful to Natalie Voets for providing the datasets). In this analysis, we want to compare resting-state connectivity between our six patients with a tumour and the six healthy controls using an unpaired t-test.

Use the Glm GUI to set up a group-level design with 4 contrasts to model the mean for controls, the mean for patients, as well as con > pat and pat > con. The input directories containing the cleaned and registered single-subject data are labelled sub-control0* for control subjects and sub-0* for tumour patients (take this naming into consideration when setting up your design).
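
The rows of the design matrix have to follow the order of the files in the input list, which is why the directory naming matters. As a hedged sketch of that idea, the hypothetical helper below (design_rows is not part of FSL) prints one row per line of the list, assuming that EV1 codes controls and EV2 codes patients; this is a choice made for illustration, and the Glm GUI remains the route used in this practical.

```shell
# Hypothetical helper: print one design-matrix row per input path.
# Assumption: EV1 = controls (sub-control0*), EV2 = patients (sub-0*).
design_rows() {
    # $1: text file with one input path per line
    while IFS= read -r path; do
        case "$path" in
            "")            continue ;;      # ignore blank lines
            *sub-control*) echo "1 0" ;;    # control subject
            *)             echo "0 1" ;;    # tumour patient
        esac
    done < "$1"
}
```

Running design_rows on inputlist_new.txt shows which group each row belongs to, which can be compared against the matrix entered in the Glm GUI. Under the same assumption about the EVs, the four contrasts described above would be 1 0, 0 1, 1 -1 and -1 1.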


Running dual regression

Type dual_regression into the terminal to see details of the command usage. Using the data prepared above (the list of subject files, the groupICA15/melodic_IC group ICA maps and the design files), work out the command we used to run this dual regression. The command we used is:

dual_regression groupICA15/melodic_IC 1 \
  design/unpaired_ttest.mat design/unpaired_ttest.con 5000 \
  groupICA15.dr `cat inputlist_new.txt`

In the command above, what does the 1 indicate? What does 5000 refer to?

As the dual regression analysis takes too long to run in the practical session, we have already run it for you. The output is located in ~/fsl_course_data/rest/ICA/groupICA15.dr

Output of dual regression

Move back into the ~/fsl_course_data/rest/ICA directory.

Type ls groupICA15.dr into the terminal to view the output of the dual regression.

What do the dr_stage1_subject[#SUB].txt files contain?

What is the difference between the files named dr_stage2_ic[#IC].nii.gz and those named dr_stage2_subject[#SUB].nii.gz? Use fslinfo to help if you need to.

The dr_stage3_*.nii.gz files are the output of randomise. What do the numbers 0-14 indicate? What do the numbers 1-4 indicate? Which images are the corrected p-value output (actually 1-p, for convenience of display)?

Viewing the results of dual regression

To view the results from the dual regression analysis, type the following into the terminal:

fsleyes -std groupICA15/melodic_IC \
  -un -cm red-yellow -nc blue-lightblue -dr 4 15 \
  groupICA15.dr/dr_stage3_ic0007_tfce_corrp_tstat3.nii.gz \
  -cm green -dr 0.95 1 &

Make sure you are viewing the dr_stage3_* statistical map over the appropriate volume of melodic_IC: set the volume of the melodic_IC image to the same number as the IC in the name of the randomise statistical image that is loaded (e.g. dr_stage3_ic????).

What is the contrast that was tested with tstat 3 (the statistical map that has been loaded into FSLeyes)?

The difference between the two groups is very small (because we only had 12 subjects and therefore not much statistical power). To find the result, go to voxel location [63, 81, 54]. You may want to change the minimum threshold at the top to 0.9 to show the results at a slightly more lenient p-value.

What is the name of the file that I would need to look at if I were interested in contrast 2 for network 12?

When you were looking at the dual regression results, the minimum threshold of the tfce_corrp_tstat3 image that was loaded was set to either 0.95 or 0.9. What do these values mean?

They are z-statistics. Incorrect! As the name of the file suggests, you are looking at a corrp image (i.e. the values are p-values that have been corrected for multiple comparisons). The values are stored as 1 minus the p-value to make them easier to visualise. Therefore a threshold of 0.95 means that you are looking at results with p<0.05 corrected, and a threshold of 0.9 shows results with p<0.1 corrected.
They are p-values, so the image is showing all voxels with a p-value lower than 0.95. Incorrect! These images contain 1 minus the p-value to make them easier to visualise. Therefore a threshold of 0.95 means that you are looking at results with p<0.05 corrected, and a threshold of 0.9 shows results with p<0.1 corrected.
The values are 1 minus the p-value, so a threshold of 0.95 means the image is showing all voxels with a p-value lower than 0.05. Correct! To make them easier to visualise, the results are saved as 1 minus the p-value. Note that you are looking at the corrp image, so these p-values have been corrected for multiple comparisons.
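
The conversion between a display threshold and a p-value is a single subtraction, which can be checked from the terminal. The helper name corrp_to_p below is made up for this sketch.

```shell
# Hypothetical helper: convert a 1-p value (as stored in corrp images) to p.
corrp_to_p() {
    # LC_ALL=C keeps the decimal point independent of the system locale
    LC_ALL=C awk -v v="$1" 'BEGIN { printf "%.3f\n", 1 - v }'
}
```

For example, corrp_to_p 0.95 prints 0.050 and corrp_to_p 0.9 prints 0.100, matching the two thresholds used above.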

Are there any statistical problems that I need to account for if I am interested in the results for more than one network? What about if I'm only interested in one network?

If I am only interested in the results for one network, can I just run dual regression only on that network? Why/why not?

A quick script that is useful for checking the maximum of every 1-p-value image across a set of dual regression stage 3 outputs is below. You can try to run it on the dual regression output we provided you with in groupICA15.dr. If the maximum value in any given image is not above 0.95, you know that nothing survived thresholding:

cd groupICA15.dr
for i in dr_stage3_ic00??_tfce_corrp_tstat?.nii.gz ; do
    echo $i `fslstats $i -R`;
done

What is the fslstats command doing? What do the question marks do? What do the backticks (the unusual-looking quotes around the fslstats command) do? What is $i, and what does it refer to?
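
A possible variation on the loop above prints only the images whose maximum exceeds a threshold, so that nothing has to be compared by eye. This is a sketch rather than part of the course scripts: the function names are invented here, and it relies on fslstats -R printing the minimum followed by the maximum.

```shell
# Succeed if the maximum 1-p value is above the threshold (default 0.95).
exceeds_threshold() {
    LC_ALL=C awk -v max="$1" -v thr="${2:-0.95}" \
        'BEGIN { exit !(max + 0 > thr + 0) }'
}

# List the stage 3 corrp images in a directory that pass the threshold.
report_significant() {
    for img in "$1"/dr_stage3_ic*_tfce_corrp_tstat*.nii.gz; do
        [ -e "$img" ] || continue
        # fslstats -R prints "<min> <max>"; keep the second field
        max=$(fslstats "$img" -R | awk '{ print $2 }')
        if exceeds_threshold "$max" "${2:-0.95}"; then
            echo "$img $max"
        fi
    done
    return 0
}
```

Calling report_significant groupICA15.dr would list the images with a maximum above 0.95, and a second argument such as 0.9 applies a more lenient threshold.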

(Optional) Running dual regression and randomise separately

The standard dual_regression command automatically runs randomise to perform between-group statistical comparisons for *every* component that is input to dual regression, *when provided with a design*. However, you can also run stage 1 and 2 of dual regression only and separately run randomise after dual regression has finished.

Type dual_regression into the command line to recap the usage and work out how you would change the command used above to run only stages 1 and 2 of dual regression. One way to do this is:

dual_regression groupICA15/melodic_IC 1 -1 0 \
  groupICA15.dr `cat inputlist_new.txt`

Can you think of a couple of situations when you might not want to run randomise automatically within dual_regression?
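
Once stages 1 and 2 have finished, randomise can be run on the stage 2 maps of a single component. The sketch below only prints a candidate command so that it can be inspected before anything is run. The helper name, the sep_stage3 output prefix and the use of a mask image inside the dual regression output directory are assumptions for illustration; check the randomise usage and the contents of your output directory before relying on them.

```shell
# Hypothetical helper: print (without running) a randomise call for one
# component's stage 2 maps, using TFCE (-T) as in the stage 3 outputs.
randomise_cmd() {
    dr_dir=$1      # dual regression output directory
    ic=$2          # zero-based component number, without leading zeros
    n_perm=$3      # number of permutations
    ic_tag=$(printf '%04d' "$ic")
    echo "randomise -i $dr_dir/dr_stage2_ic$ic_tag" \
         "-o $dr_dir/sep_stage3_ic$ic_tag" \
         "-m $dr_dir/mask" \
         "-d design/unpaired_ttest.mat -t design/unpaired_ttest.con" \
         "-n $n_perm -T"
}
```

For example, randomise_cmd groupICA15.dr 7 5000 prints a command for component 7 with 5000 permutations, which can then be copied into the terminal once the paths have been checked.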

Please remember that the next session will require MATLAB or OCTAVE (a free equivalent of MATLAB). If you don't have MATLAB or OCTAVE installed on your laptop (and in the virtual workstation if you are using one), please make sure you have it installed by the next session. Ask one of the tutors if you have any questions.

The End.