ICA Practical

Melodic is the tool in FSL that decomposes single FMRI data sets into time-courses and spatial maps using Independent Component Analysis (ICA).

Contents:

Low versus high dimensional group ICA
Looking at how the ICA dimensionality (number of components) affects the results.
Single session ICA analysis: Classifying and removing noise components
Manual and automated classification of ICs as signal or noise.
Group analysis: using dual regression to investigate group differences
Estimating group level ICs, and comparing ICs across groups.
Some scripting (optional)
Some useful UNIX tricks to make life easier.

Melodic has a simple GUI (type Melodic or Melodic_gui at a command line), which has been structured to be similar to the FEAT GUI. The GUI will allow you to set all pre-processing and ICA options. It will call all the FSL command line tools for preprocessing and then use the melodic command line tool in order to generate the ICA decomposition.

Note, therefore that the melodic command-line program will not do the preprocessing for you that the GUI-based scripting will do. The GUI will allow you to change some of the most common default options for the preprocessing and the basic options for the final ICA decomposition. To get full control over the latter, use the command line version:

melodic --help

In the next sections you will first have a look at example low and high dimensional group ICA decompositions, as an example of how networks are split at higher dimensionalities. Secondly, you will take a look at using single-subject ICA to classify and remove noise components. Then, you are going to take a look at some example group-analysis results from running a dual regression analysis. The last section is an optional section on scripting (which is very useful if you will be running your own analyses).


Low versus high dimensional group ICA

You can compare the group ICA maps calculated with different dimensionalities (25 vs 50) by loading them in FSLeyes:

cd ~/fsl_course_data/rest/ICA/
fsleyes -std \
  melodic_IC_25_s0_n820_MB8_HCP.nii.gz -un -dr 30 100 -n 25 \
  -cm red-yellow -nc blue-lightblue \
  melodic_IC_50_s0_n820_MB8_HCP.nii.gz -un -dr 30 100 -n 50 \
  -cm red-yellow -nc blue-lightblue &

Now go to the ‘View’ menu at the top and add a second ortho view, so we can look at the 25 and 50 images side-by-side. Next, make sure that the window on the left is showing the 25 results (by clicking the toggle button next to 50), and make sure that the window on the right is showing the 50 results (by clicking the toggle button next to 25).

In the left view, select the 25 image, and change the volume control to 5. Then in the right view, select the 50 image, and look at volumes 32, 33 and 35. Make sure to navigate to somewhere inside the component (for example around voxel location 45 45 65). As you can see, the original network in the 25 dimensional ICA shown on the left of your screen is split into three separate components at a dimensionality of 50, namely a left and right lateralised and a medial region. Another example is to compare component 2 in the 25-dimensional decomposition to components 5, 9, 11 and 14 in the 50-dimensional decomposition.


Single session ICA analysis: Classifying and removing noise components

In this part of the practical you are going to label components as either signal or noise. To do this, please take a look at all three pieces of information (i.e. the spatial map, time course and power spectrum of the time course) for each component.

Figure 1 Figure 2

To get an understanding of the influence of data quality and quantity on the performance of ICA, you are going to take a look at two different examples. We will use FSLeyes in Melodic mode to simultaneously show the spatial map, time course and power spectrum for each component:

Manual labelling of multi-band fMRI

fsleyes --scene melodic -ad Rest_MB6.feat/filtered_func_data.ica &

Make sure melodic_IC is selected and click on the Load labels button in the melodic IC classification window and open the labels.txt file from the ~/fsl_course_data/rest/ICA/Rest_MB6.feat directory (it will show a message saying the the label file does not refer to the melodic directory, which is because we renamed it - click on Apply the labels to the current overlay). You will now see labels for all the components except for ten which are labeled as Unknown (in yellow).

Please have a look at these 10 and label each one as either Unclassified noise or as Signal (note that FSLeyes allows you to include more informative labels if you wish). Remember that the components you are looking at were obtained from a 15-minute run of multiband data, and no smoothing was applied during preprocessing. Once you have classified all of the components, please save your results by overwriting the labels.txt file and close FSLeyes.

Manual labelling of EPI fMRI

Now close FSLeyes and open the following dataset:

fsleyes --scene melodic -ad Rest_EPI.ica/filtered_func_data.ica &

Load the labels.txt file (in ~/fsl_course_data/rest/ICA/Rest_EPI.ica) in the same manner as before, and classify the 10 unknown components. Keep in mind that this is an older dataset that has much fewer timepoints, bigger voxel size and that smoothing has been performed during preprocessing. Again, save the results by overwriting labels.txt and close FSLeyes.

You will now "clean up" your data by removing the components that you classified as noise from your data. This is done using fsl_regfilt, which will regress the time courses of the noise components from the data.

The help information from fsl_regfilt explains what the command does: "Data de-noising by regressing out part of a design matrix using simple OLS regression on 4D images". In our case, what does the design matrix contain?
Incorrect! Remember that this is resting state data, so there are no task regressors!
Correct! The timeseries from the components that are labelled as noise will be regressed out of the dataset.
Incorrect! This is a single-subject ICA analysis, so there are no subject labels at this point.

We first need to get a list of IDs of the noise components:

tail -1 Rest_EPI.ica/labels.txt

Now we can pass this list of numbers to fsl_regfilt (in the command below, replace 1,2,3 with the output of the above command, making sure to remove the square brackets and replace them with quotes ''):

fsl_regfilt -i Rest_EPI.ica/filtered_func_data.nii.gz \
            -d Rest_EPI.ica/filtered_func_data.ica/melodic_mix \
            -o Rest_EPI.ica/filtered_func_data_clean.nii.gz \
            -f 1,2,3 
digraph G { rankdir=LR node [shape=box]; 1 [label="fsl_regfilt"]; 2 [label="input image (-i Rest_EPI.ica/filtered_func_data.nii.gz)"]; 3 [label="ICA timeseries (-d Rest_EPI.ica/filtered_func_data.ica/melodic_mix)"]; 4 [label="list of noise component numbers (-f 1,2,3) change 1,2,3 to correct list"]; 5 [label="output filename (-o Rest_EPI.ica/filtered_func_data_clean.nii.gz)"]; 2 -> 1; 3 -> 1; 4 -> 1; 1 -> 5; }

Automatic IC classificiation

To avoid manually labelling all of the components for every single subject, tools have been developed that aim to automatically identify components representing structured noise in fMRI data. We will take a look at two of these tools:

Now you are going to compare your own classification of signal and noise components to classifications done by FIX and by AROMA. Use the following commands to get a list of the component numbers that were classified as noise by each method, and compare them against your own results (that you passed to fsl_regfilt, above).

This command lists the components that were classified as noise by FIX:

tail -n 1 Rest_EPI.ica/fix4melview_training_thr30.txt

And this command lists the components which were classified as motion-related by AROMA:

cat Rest_EPI.ica/AROMA/classified_motion_ICs.txt

Do the results agree for the ten components that you classified (which were numbers 1, 13, 14, 24, 29, 33, 36, 37, 39 and 42)? Remember that the lists contain all the components that were definitely labeled as noise. This means that the rest of the components were either labeled as Signal, or as Unknown.


Group analysis: using dual regression to investigate group differences

A dual regression analysis is used to map group-ICA results back into individual subjects data, e.g. in order to examine between-group difference in ICA networks. We are using data from 12 subjects including six patients with a tumor and six healthy controls. We are grateful to Natalie Voets for providing the datasets used in this practical.

We already ran the group ICA for you using the following command (please don't run this again). More information about this is in the optional scripting session below. This group ICA will provide us with a fixed set of networks that we want to compare between groups.

melodic -i input_files.txt -o groupICA15 \
  --tr=0.72 --nobet -a concat \
  -m $FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz \
  --report --Oall -d 15
digraph G { rankdir=LR node [shape=box]; 1 [label="text file listing input filenames (-i input_files.txt) type cat input_files.txt to take a look"]; 2 [label="output directory name (-o groupICA15)"]; 3 [label="repetition time (TR) (--tr=0.72)"]; 4 [label="approach, here temporal concatenation (-a concat)"]; 5 [label="generate web report (--report)"]; 6 [label="output everything (-Oall)"]; 7 [label="ICA decomposition dimensionality (-d 15)"]; 8 [label="melodic"]; 9 [label="other options"]; 1 -> 8; 8 -> 2; 3 -> 8; 4 -> 8; 8 -> 5; 8 -> 6; 9 -> 8; }

We have set up a group level design to perform group comparisons. If you want, you can view the design by opening the GLM gui (type Glm or Glm_gui) and loading ~fsl_course_data/rest/ICA/design/unpaired_ttest.fsf). Subsequently we ran the dual regression analysis for you using the following commands (please don't run these again, this would take too long to run in the practical session):

dual_regression groupICA15/melodic_IC 1 \
  design/unpaired_ttest.mat design/unpaired_ttest.con 5000 \
  groupICA15.dr `cat input_files.txt`
digraph G { rankdir=LR node [shape=box]; 1 [label="group-level ICA maps (groupICA15/melodic_IC)"]; 2 [label="variance normalisation (1) i.e. on"]; 3 [label="EVs of group-level design matrix (design/unpaired_ttest.mat)"]; 4 [label="contrasts of group-level design matrix (design/unpaired_ttest.con)"]; 5 [label="number of permutations for randomise (5000)"]; 6 [label="output directory name (groupICA15.dr)"]; 7 [label="text file listing input filenames (`cat input_files.txt`) note use of back quotes"]; 8 [label="dual_regression"]; 1 -> 8; 2 -> 8; 3 -> 8; 4 -> 8; 5 -> 8; 7 -> 8; 8 -> 6; }

Dual regression works in three stages, each with its own output:

The corrected p-value output images from stage 3 (actually 1-p, for convenience of display) are in files groupICA15.dr/dr_stage3_ic00??_tfce_corrp_tstat?.nii.gz, where the ?? means any one of the 15 group-level components (number 00 to 14) and the ? relates to the contrast number. To view the results from the dual regression analysis, run:

fsleyes -std groupICA15/melodic_IC \
  -un -cm red-yellow -nc blue-lightblue -dr 4 15 \
  groupICA15.dr/dr_stage3_ic0007_tfce_corrp_tstat3.nii.gz \
  -cm green -dr 0.95 1 &

Make sure you are viewing them over the appropriate volume of melodic_IC (i.e. set the volume of the melodic_IC image to 7, which is the number of the results image we loaded). The dual regression result is very small (because we only had 12 subjects and therefore not much statistical power), to find it please go to voxel location 63, 81, 54. You may want to change the minimum threshold at the top to 0.9 to show the results at a slightly more lenient p-value. Note that the results are for contrast 3 (tstat3.nii.gz), which is the comparison of healthy controls minus patients.

When you were looking at the dual regression results, the minimum threshold of the tfce_corrp_tstat3 image that was loaded was set to either 0.95 or 0.9. What do these values mean?
Incorrect! As the name of the file suggests, you are looking at a corrp image (i.e. the values are p-values that have been corrected for multiple comparisons). The p-values are shown as 1 minus the p-value to help make it easier to visualise. Therefore a threshold of 0.95 means that you are looking at results p<0.05 corrected, and a threshold of 0.9 shows results p<0.1 corrected.
Incorrect! These images contain 1 minus the p-value to help make it easier to visualise. Therefore a threshold of 0.95 means that you are looking at results p<0.05 corrected, and a threshold of 0.9 shows results p<0.1 corrected.
Correct! To make it easier to visualise, the results are saved as 1 minus the p-value. Note that you are looking at the corrp image, so these p-values have been corrected for multiple comparisons.

Some scripting (optional)

It is good to get acquainted with command line versions of the FSL tools. This is particularly the case for melodic because if you perform some sort of cleaning at the single subject level (e.g. using fsl_regfilt or FIX or AROMA), you currently cannot use the melodic GUI anymore to set up and run your group level analysis (because the GUI would use the un-cleaned images). So we are going to have a look at how we would run a group ICA analysis from the command line, which you will need if you run your own resting state analysis. As mentioned above, the group ICA results you have been looking at were actually created from the command line rather than the GUI, using this command:

melodic -i input_files.txt -o groupICA15 \
  --tr=0.72 --nobet --bgthreshold=1 -a concat \
  --bgimage=$FSLDIR/data/standard/MNI152_T1_2mm_brain.nii.gz \
  -m $FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz \
  --report --Oall -d 15
If you wanted to extract 200 components, which part of the command would you change?
Correct! The -d flag controls the dimensionality of the decomposition
Incorrect! This just controls the name of the output directory, it does not have any influence on the dimensionality. For this you need to change the -d flag.
Incorrect! Command line versions always have at least the same number of options that are available in the gui (and often more options). To change the dimensionality, use the -d flag.

If you wanted to run melodic from the command line using the cleaned-up filtered_func_data_clean.nii.gz images that are in each subject's folder, what problem would you run into? How does the script, listed below, solve the problem?

#!/bin/sh
for j in `ls -d */*_resting.feat` ; do
    applywarp -r ${j}/reg/standard.nii.gz \
              -i ${j}/filtered_func_data_clean.nii.gz \
              -o ${j}/filtered_func_data_clean_standard \
              --premat=${j}/reg/example_func2highres.mat \
	      -w ${j}/reg/highres2standard_warp.nii.gz
done
ls -1 */*_resting.feat/filtered_func_data_clean_standard.nii.gz >> inputlist_new.txt
digraph G { rankdir=LR node [shape=box]; 1 [label="applywarp"]; 2 [label="input image (-i ${j}/filtered_fund_data_clean.nii.gz)"]; 3 [label="reference image (-r ${j}/reg/standard.nii.gz)"]; 4 [label="transformation matix (--premat=${j}/reg/example_func2highres.mat)"]; 5 [label="output filename (-o ${j}/filtered_func_data_clean_standard)"]; 6 [label="FNIRT warp image (-w ${j}/reg/highres2standard_warp.nii.gz)"]; 2 -> 1; 3 -> 1; 4 -> 1; 6 -> 1; 1 -> 5; }

This script is also saved in the ~/fsl_course_data/rest/ICA directory, and you can run it by typing sh ./optional_script.sh into the command line, although you may not want to do this right now, because it does take a little while (up to 20 minutes).

Another quick script that is useful for checking the maximum of every 1-p-value image across a set of dual regression stage 3 outputs is below. You can try to run it on the dual regression output we provided you with in groupICA15.dr. If the maximum value in any given image is not above 0.95, you know that nothing survived thresholding:

cd groupICA15.dr
for i in dr_stage3_ic00??_tfce_corrp_tstat?.nii.gz ; do
    echo $i `fslstats $i -R`
done

The End.