FLICA (FMRIB's Linked Independent Component Analysis) is an exploratory analysis tool designed to automatically find structured components that cross multiple modalities of group data (e.g. TBSS, VBM, Freesurfer outputs).


Referencing FLICA

If you use this tool in your research, you may want to include the following methods description; please also cite the papers listed below:

"The dataset was analysed using FLICA, a linked independent component analysis (ICA) implemented in FSL (Groves et al., 2011; Smith et al., 2004). FLICA is an entirely data-driven approach that can co-model multiple imaging modalities. Its main goal is to model the imaging data as a set of interpretable features (independent components), most of them characterising biophysically plausible modes of variability across all subjects’ images. Unlike in a principal component analysis, the mixing matrix vectors of an ICA are not forced to be orthogonal to each other, and thus can explain common variance of variables external to the ICA, such as age (Douaud et al., 2014). Linked ICA was implemented as described in detail in earlier papers (Groves et al., 2012; Douaud et al., 2014) (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLICA). Here, we ran FLICA on ? different modalities (xx, yy, zz, ...) with ?? components [with ideally some justification about the dimensionality chosen]."

References:

* Original FLICA methods paper:
Groves et al., Linked independent component analysis for multimodal data fusion. NeuroImage, Feb 2011, 54(3):2198-2217. PubMed #20932919.

* Main FSL methods paper:
Smith et al., Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage, Apr 2004, 23(S1):S208-S219. PubMed #15501092.

* First basic and clinical application of FLICA:
Douaud et al., A common brain network links development, aging, and vulnerability to disease. PNAS, Dec 2014, 111(49):17648-17653. PubMed #25422429.

* Follow-up FLICA methods paper and proof-of-concept:
Groves et al., Benefits of multi-modal fusion analysis on a large-scale dataset: Life-span patterns of inter-subject variability in cortical morphometry and white matter microstructure. NeuroImage, Oct 2012, 63(1):365-380. PubMed #22750721.


Quick start guide

Setup

Prerequisites: working installations of FSL, Freesurfer (if you want to work with surface data), and the various MATLAB functions that come with them.

Download the latest release and unpack with tar xvzf flica_2013-01-15.tar.gz.

FMRIB account-holders who are members of the cvs group should get the latest version of FLICA by running cvs co flica -- this will make it easier to keep up-to-date without losing your modifications.

Prepare the data

Typically you should run your standard single-modality, hypothesis-driven analyses first, and obtain the final per-subject 4D images from the various modalities (i.e. one 4D image per modality, containing all subjects' volumes concatenated). For VBM this is GM_mod_merg_s?.nii.gz, for TBSS this is all_??_skeletonised.nii.gz, and for Freesurfer this is ?h.thick.fsaverage.mgh and ?h.pial.area.fsaverage.mgh, etc.

These should be in the same subject order, and any missing data should be replaced with a volume of all zeroes.
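
As a quick sanity check, you can confirm that every merged 4D file really does contain one volume per subject. A minimal sketch using read_avw from FSL's bundled MATLAB functions (the filenames are just examples from above):

    % Each merged 4D file should have the same number of volumes,
    % one per subject, in the same order across modalities.
    addpath([getenv('FSLDIR') '/etc/matlab/'])
    files = {'GM_mod_merg_s3.nii.gz', 'all_FA_skeletonised.nii.gz'};  % example names
    for i = 1:numel(files)
        tmp = read_avw(files{i});
        fprintf('%s: %d volumes\n', files{i}, size(tmp, 4));
    end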

If you have limited memory or a large number of subjects, it may be helpful to downsample the data to a more manageable number of voxels. For hints on how to do this for VBM and TBSS, see ~adriang/scratch/data/oslo/lars484/make4mm and make2mm, respectively. The Freesurfer data should be downsampled using Freesurfer's own tools to a lower-resolution standard space, e.g. fsaverage5.

Currently, the only supported input formats are *.mgh and *.nii.gz -- but more formats could easily be added to flica_load.m and flica_save.m.

Run the analysis

To run FLICA it's best to set up a MATLAB script, along the lines of the template file supplied in flica_template_script.m. Here are the important bits:

    addpath([getenv('FSLDIR') '/etc/matlab/'])   % FSL's bundled MATLAB functions
    addpath /usr/local/fmrib/fmt/

    outdir = '/my/output/directory/';
    mkdir(outdir)

This just sets up the environment and creates the output directory. Note that there's no automatic appending of '+' to the directory name -- so if the directory already exists, its files will be overwritten!
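
If that's a concern, a simple guard (not part of FLICA) is to refuse to reuse an existing directory:

    % Optional guard (sketch): stop rather than overwrite an earlier run.
    if exist(outdir, 'dir')
        error('Output directory %s already exists -- pick a new name.', outdir)
    end
    mkdir(outdir)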

    Yfiles = {'/path/to/file1.nii.gz', '/path/to/file2_?h.mgh'};
    [Y, fileinfo] = flica_load(Yfiles);

Note the use of full paths, and of question marks in place of l or r in the left/right-hemisphere MGH filenames. Y now contains a cell array with one voxels-by-subjects matrix per modality, and fileinfo holds metadata such as the resolution, masks, etc.
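
For example, a quick way to see what flica_load returned (a sketch):

    % Print the dimensions of each modality's data matrix.
    for m = 1:numel(Y)
        fprintf('Modality %d: %d voxels x %d subjects\n', m, size(Y{m},1), size(Y{m},2));
    end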

Next we set up the options and run the analysis:

    clear opts
    opts.num_components = 20;
    opts.maxits = 10000;
    % Run flica_parseoptions to see a list of options and default values

This sets up a few options; anything not supplied will be set to its default value. For a list of the available options and their defaults, call flica_parseoptions from the MATLAB prompt without any input arguments, as shown below.
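
For example (lambda_dims is the per-subject noise option discussed under Debugging below):

    flica_parseoptions          % no arguments: prints each option and its default value
    % opts.lambda_dims = '';    % e.g. turn off per-subject noise estimation (see Debugging)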

    Morig = flica(Y, opts);
    M = flica_reorder(Morig);
    % Or: M = flica_reorder(Morig, HmatrixFromAnEarlierRunForComparison)
    flica_save_everything(outdir, M, fileinfo);

The flica call does all the calculations, producing a set of components in an arbitrary order. flica_reorder then reorganizes (and sign-flips) them into a sensible order: either in order of total energy, or to match the subject-courses of a previous similar run (for easier comparison, e.g. if you add another modality, or re-run with a different initialisation). flica_save_everything populates the output directory with spatial maps, subject-courses, and various other information. Now the output is safely saved, and all of the remaining steps could be run in a different MATLAB session.
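
It can also be worth saving the raw MATLAB structures at this point, so that a crash later on doesn't cost you the decomposition (a sketch; the .mat filename is hypothetical):

    % Optional: keep the raw results so later steps can restart from a fresh session.
    save(fullfile(outdir, 'flica_result.mat'), 'M', 'fileinfo')   % hypothetical filename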

The post-hoc correlation analysis is performed as follows:

    clear des
    des.Subject_Index = (1:size(Y{1},2))';
    des.Age = load('age.txt');
    des.Group = load('group.txt');
    % etc etc.
    % Each one should be a subjects-by-1 column vector.

    flica_posthoc_correlations(outdir, des);

This produces a few more text files and correlation_*_*.png correlation plot images. It also produces the significantComponents.txt output file, which lists the most-significantly-correlated components for each EV; this text file is dropped straight into the final HTML report. Note that if you re-run flica_posthoc_correlations, it will overwrite the existing correlation_*.png files and significantComponents.txt -- so generally it's a good idea to only add EVs, not remove them. If you do remove an EV (or change an EV's name), you should manually delete all of the old correlation_*.png files, or they will continue to appear in the report (see the sketch below).
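
If you do need to remove or rename an EV, one way to clear out the stale plots before re-running (a sketch):

    % Remove stale correlation plots, then regenerate from the current design.
    delete(fullfile(outdir, 'correlation_*.png'))
    flica_posthoc_correlations(outdir, des)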

Optionally, if you have high-res versions of the input data, you can "upsample" the results (actually re-fit the model to the new data):

    YfilesHR = {'/path/to/file1_HIGHRES.nii.gz', '/path/to/file2_HIGHRES_?h.mgh'};
    flica_upsample(outdir, YfilesHR, Y);
    % Note that Y is optional, and just used for error checking (to make sure the scaling is the same, etc):
    % flica_upsample(outdir, YfilesHR); % Would also work (but more error-prone).

This will produce new output .nii.gz and .mgh files at the higher spatial resolutions.

Finally, the rendering and report-generation is all done by a shell script:

    srcpath = regexprep(mfilename('fullpath'), mfilename, '');  % /path/to/this/function/
    dos([srcpath 'flica_report.sh ' outdir ' _HR'])

The _HR argument instructs it to use the high-res outputs rather than the low-res outputs to generate the report. Of course, this step could always be performed separately, from the command line -- this might be advisable if you want to use a different machine for the Freesurfer rendering. I suggest using a Mac, as Freesurfer rendering on Linux uses a screenshot method that breaks if you cover up the window while it's running.

For a more advanced example of how to set up your main script, see flica_lars484_WMGM.m. This repeats the analysis of [Groves2012].

File outputs

flica_save_everything.m: The main outputs are the spatial maps, contained in niftiOut_mi*.nii.gz and mriOut_mi*.nii.gz. These are pseudo-Z-statistic images, indicating how far a voxel in the spatial map sticks out above the estimated noise floor. The report is generated with hard-coded thresholding from Z=3 to Z=10. The corresponding subject-courses are given in subjectCoursesOut.txt -- note that there's only one of these files, because the subject-courses are shared across modalities. Other useful information is contained in noiseStdevOut.txt and the other *.txt outputs.
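
For instance, the pseudo-Z maps can be inspected directly from MATLAB. A sketch, where niftiOut_mi1.nii.gz is an assumed instance of the niftiOut_mi*.nii.gz naming:

    % Count the voxels in modality 1's maps exceeding the report's lower threshold.
    Z = read_avw(fullfile(outdir, 'niftiOut_mi1.nii.gz'));  % one volume per component
    sum(abs(Z(:)) > 3)                                      % voxels with |pseudo-Z| > 3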

flica_posthoc_correlations.m: This generates the correlation_*.png files.

render_surfaces.sh generates the fsaverage*_in_all.png inflated surface renderings.

surfaces_to_volumes_all.sh converts the surface *.mgh files into *.nii.gz volume files. Note that this currently assumes a linear transformation (FS_to_stdspace.mat) between Freesurfer's standard template (/vols/Data/oslo/fsaverages/fsaverage/mri/orig.mgz; convert this to NIFTI with mri_convert -i orig.mgz -o orig.nii.gz) and the MNI space that everything else is aligned to ($FSLDIR/data/standard/MNI152_T1_?mm.nii.gz). This could explain some of the misalignment between area changes and VBM changes when the two are overlaid. (Another possible suspect: misalignment introduced because of VBM's custom template?)

render_lightboxes_all.sh creates lightbox_*.png files from slices of the *.nii.gz files.

flica_html_report.sh collects all of the relevant *.png and *.txt files into a set of *.html pages that collectively make up the report.

TODO: Move the PNG and HTML files into a "report" directory.

Reading the report

In your output directory, there should be an index.html file. Look at it!

First there are links to each of the components' separate pages. There are also some all-in-one pages if you prefer -- but note that these can contain 100+ MB of images and might slow down your computer!

Below, there are some summary images. The first shows the distribution of weight across different modalities, and should show that many of the components are shared between modalities.

Next the per-subject noise estimates should be reasonably flat, with NO values very close to zero (this would indicate a major problem). It's okay if some of the data structure shows through (e.g. age), but I'd expect the overall variability to be quite small (less than 2:1 ratio between largest and smallest values). There's one line per modality.
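
A numeric version of this flatness check (a sketch; the layout of noiseStdevOut.txt is assumed to be one column per modality, so verify against the plot):

    % Ratio of largest to smallest per-subject noise stdev, per modality.
    noiseStd = load(fullfile(outdir, 'noiseStdevOut.txt'));
    max(noiseStd) ./ min(noiseStd)   % want roughly < 2, and nothing near zero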

Then the "Components dominated by a single subject" plot shows how much of the energy (subject weight squared) comes from a single subject. The lines show the α=0.05 corrected (solid) and uncorrected (dashed) thresholds, assuming normally-distributed subject loadings. Note that the subject loadings aren't actually normally distributed, so being above threshold isn't necessarily a problem. However, very large values indicate that the component is driven by an outlier subject (or subjects) and should be looked at carefully. This can be caused by genuinely unusual brains or by analysis problems (e.g. severe registration failure).
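
The same statistic can be approximated offline from the saved subject-courses (a sketch; the report's exact computation may differ):

    % Fraction of each component's energy contributed by its single largest subject.
    H = load(fullfile(outdir, 'subjectCoursesOut.txt'));   % assumed subjects-by-components
    energyFrac = max(H.^2, [], 1) ./ sum(H.^2, 1);
    find(energyFrac > 0.3)   % arbitrary cutoff -- components worth inspecting by eye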

Below these images, the report lists the most-significantly-correlated components for each EV in the post-hoc regression analysis. The p-values refer to the significance of the linear regression and to the significance of the quadratic term in a quadratic regression. Bold, double-starred values are significant after correction across components and EVs.

On the individual component pages, each modality's spatial map is shown in a lightbox view. The text above each image names the modality and gives its "weight" in this component, as a percentage. The "prior" weight is fixed, so its fraction can be used as a rough estimate of how large the component is overall; a small (0-1%) prior weight means the component is very large, and a big (>10%) prior weight indicates it wasn't far off being eliminated. Surface renderings are also shown for .mgh inputs (but so far I haven't found a good way to project volumetric results back onto the surface).

Below this we have the scatterplots between the component subject-loadings (y axis) and each of the EVs (x axis). If there's a significant linear relationship, the linear fit is plotted on the graph; the quadratic fit is plotted if the quadratic term of that fit is significant. The same definitions of significance are used as above, with a solid line corresponding to ** and a dashed line to *.

Debugging

If FLICA fails during the MATLAB part of the run, unfortunately it won't save the output, and you'll lose the calculation results. Hopefully the MATLAB output/error messages will help you figure out where you've gone wrong. I think I've moved most of the error checks to early in the code, so it should crash quickly if you give it invalid inputs.

If the *.nii.gz, *.mgh and *.txt outputs have been saved, then it's an error in the report-generating script flica_report.sh. Based on the output files present, you should be able to figure out which stage failed, and hopefully the error messages will make it easy to determine whether this is a configuration error (FreeSurfer not in the path, etc.) or a bug in the scripts.

Assuming it's produced a report, have a look and see if there's anything interesting! Don't expect spectacular results on <100 subjects -- but hopefully you should get the basic effects (age, ICV, etc) and some artefacts.

Generally, if you get lots of very noisy-looking components, something has gone wrong -- usually overfitting. This is diagnosed by looking at the subject-noise plots on the index page: these should be reasonably flat, and any values close to zero indicate severe overfitting. Overfitting can be caused by silly mistakes like including the same subject twice. It can also be caused by having too many components; I generally suggest #components < #subjects/4. Instead of solving the root problem, you can also reduce the overfitting by turning off per-subject noise estimation, using the opts.lambda_dims = ''; option, as sketched below.
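
Putting those two fixes together (a sketch, reusing Y and opts from the run script above):

    % Re-run with a safer dimensionality and per-subject noise estimation off.
    opts.num_components = floor(size(Y{1},2) / 4);   % rule of thumb: #components < #subjects/4
    opts.lambda_dims = '';                           % disable per-subject noise estimation
    Morig = flica(Y, opts);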

 
