parent41 edited this page Feb 24, 2023 · 9 revisions

(Written by Olivier Parent)

Overview

UK Biobank is a very large dataset of midlife and aging participants acquired in the UK. UK Biobank aims to recruit over 1,000,000 participants. The dataset currently contains over 500,000 participants, of whom 40,000 have detailed brain MRI data and 5,000 have longitudinal MRI data.

Here is a breakdown of the number of participants per timepoint:

  • 1st TP (2006-2010): All participants (500,000+). No MRI data, but lots of lifestyle and other variables (e.g., these could be used to look at midlife risk factors)
  • 2nd TP (2012-2013): Only 20,000 participants. No MRI data, few variables available.
  • 3rd TP (2014+): Over 40,000 participants with MRI data
  • 4th TP (2019+): Over 4,000 participants with MRI data

The age range is between 44 and 85 years old for the 3rd TP, with 80% of the subjects being between 55 and 75 years old. The subjects are mostly healthy, with only a small fraction suffering from dementias and other neurological diseases.

Non-MRI data

The dataset contains over 20,000 non-MRI variables. There is a very useful variable browser to explore the types of variables available. This resource also shows the number of subjects acquired for each variable and each timepoint, as well as details about how the data was acquired (e.g., exact questions asked, screenshots of the questionnaires).

Categories of variables include:

  • Sociodemographic
  • Lifestyle
  • Medical history
  • Cognition
  • Genetics
  • Mental health
  • Etc.

Of note, some variables have a lot of missing data, so it is wise to check that there is enough data in your variables of interest before planning a project! Even so, the dataset is large enough that subsets of it remain very large.
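The check for missing data can be sketched in a few lines of Python. This is only a minimal illustration: the column names (`smoking_status`, `sleep_duration`), the in-memory CSV, and the set of strings treated as missing are all assumptions made for the example, not the actual UK Biobank field names or export format.

```python
import csv
import io

# Assumption: missing values appear as empty strings, "NA" or "NaN".
MISSING = {"", "NA", "NaN"}


def count_available(rows, columns):
    """Return {column: number of rows with a non-missing value}."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = (row.get(col) or "").strip()
            if value not in MISSING:
                counts[col] += 1
    return counts


# Tiny in-memory stand-in for a tabular export; column names are hypothetical.
demo_csv = io.StringIO(
    "eid,smoking_status,sleep_duration\n"
    "1,never,7\n"
    "2,,8\n"
    "3,current,NA\n"
    "4,previous,\n"
)
rows = list(csv.DictReader(demo_csv))
available = count_available(rows, ["smoking_status", "sleep_duration"])

for column, n in available.items():
    print(f"{column}: {n}/{len(rows)} participants with data")
```

For a real table, replace the in-memory CSV with an opened file and list your variables of interest.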

Imaging-Derived Phenotypes

UK Biobank has already processed the brain MRI data with the FSL pipeline and derived what they call Imaging-Derived Phenotypes (IDPs). These are summary measures of brain morphometry, microstructure, and connectivity per structure (e.g., cortical thickness in the medial temporal lobe, FA in the amygdala). These measures are already available and ready for analysis.

Raw MRI data

There are 6 types of MRI acquisitions available:

  • T1-weighted (T1w)
  • Fluid Attenuated Inversion Recovery (FLAIR) (Like a T2w but with nulled CSF signal)
  • Diffusion-Weighted Imaging (DWI)
  • Susceptibility-Weighted Imaging (SWI)
  • Arterial Spin Labelling (ASL) (only for a small subset of 1,800 participants)
  • Task and Resting state Functional MRI (fMRI) (Not further discussed here)

T1w

The T1w images are of good quality (MPRAGE, 1x1x1 mm). These images can be used mainly for segmentation purposes and to investigate the morphology of the brain (i.e., the size and shape of the different brain regions).

FLAIR

FLAIR images are T2-weighted images that null the CSF signal. The FLAIR images are also of good quality (1x1x1 mm). These images are mostly used to segment white matter hyperintensities (WMHs), which are radiological abnormalities in the white matter that are linked to vascular dysfunction and are highly prevalent in aging populations. These images can also be used as a normal T2w for segmentation purposes, like the T1w.

It is also possible to calculate the T1w/T2w ratio (here using the FLAIR as the T2w) to look at "microstructure". However, this measure is not very specific, and there are lots of other, better microstructural metrics available in the dataset, which are discussed below.

DWI

Diffusion-weighted images broadly represent the movement of water molecules when a magnetic field is applied in a specific direction. The process is repeated for many different directions, which gives us a lot of useful information about the brain. For example, if the water is trapped in the myelin sheaths, it will mostly diffuse along the direction of the myelin (e.g., along the fiber bundle), which tells us about the direction of the fiber bundle and the integrity of the fibers.

DWI images in UKB are also of high quality, with multiple shells (2) and many directions (50 per shell). Without going into detail, this allows for a more sophisticated characterization of diffusion and microstructure than single-shell acquisitions. However, the resolution is a bit low (2x2x2 mm).
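To make the idea of "shells" concrete, here is a minimal sketch that groups a list of b-values into shells and counts the directions in each one. The b-values in the demo are synthetic and chosen only for illustration, and the `b0_threshold` and `rounding` parameters are assumptions; for real data you would read the values from the b-values file that accompanies the DWI acquisition.

```python
from collections import Counter


def count_shells(bvals, b0_threshold=50, rounding=100):
    """Group b-values into shells by rounding to the nearest `rounding`.

    Volumes at or below `b0_threshold` are treated as non-diffusion-weighted
    (b0) volumes and skipped.
    """
    shells = Counter()
    for b in bvals:
        if b <= b0_threshold:
            continue
        shells[int(round(b / rounding) * rounding)] += 1
    return dict(shells)


# Synthetic b-values: two b0 volumes plus two shells with slight jitter,
# as scanners often report values that are not exactly nominal.
demo_bvals = [0, 5] + [995, 1000, 1005] * 2 + [1995, 2000, 2005] * 2
shells = count_shells(demo_bvals)

for bvalue, n_directions in sorted(shells.items()):
    print(f"shell b={bvalue}: {n_directions} volumes")
```

A multi-shell acquisition shows up as more than one key in the returned dictionary, while a single-shell acquisition gives exactly one.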

This data can be used to look at many properties of the brain:

  • Microstructure (i.e., the properties of the tissue)
  • Tractography (i.e., the reconstruction of white matter fiber tracts)
  • Structural connectivity (i.e., the strength of the structural connections between different brain regions)

SWI

Susceptibility-weighted images are very useful for assessing the magnetic properties of brain tissue. However, the SWI data in UKB is okay but not great (two echo times (TEs), 0.8x0.8x3 mm). Still, biologically meaningful information about the brain can be extracted:

  • Microstructure (particularly related to iron distribution, which is known to be altered in aging and neurodegenerative diseases)
  • Segmentation of the veins
  • Oxygen Extraction Fraction (OEF) (i.e., how much of the oxygen in the blood is extracted by the brain tissue)

ASL

Arterial Spin Labelling images are primarily used to assess cerebral blood flow (i.e., the amount of blood flowing in the brain). The sequence used is specifically a pseudocontinuous ASL sequence. The ASL data is quite good to my knowledge (3.4x3.4x4.5 mm, multiple post-labelling delays). However, as previously mentioned, it is only available for a small subset of participants, and only at the second MRI timepoint. Furthermore, the data hasn't been processed by UKB and is only available in raw form, so no IDPs are available for quick analyses.

Where to find the data

See below for the locations of the data.

All of the raw and UKB-processed MRI and non-MRI data can be found on Beluga, after having completed the necessary steps to gain access to the data.

The CIVET data is available via the CBRAIN platform. Before transferring CIVET data, please consult this Google Sheet to make sure the data is not already on Niagara. Since the complete outputs are very large, it is recommended to transfer only the specific outputs needed. This process is (unnecessarily?) complicated. Here's how to do it:

  1. On CBRAIN, click "files" tab
  2. Create CIVET filter (more -> custom filters -> new filter -> File types -> CIVET output)
  3. Select max 5000 subjects (tip: 1000 subjects per page, select all in page, 5 pages)
  4. Launch SimpleFileExtractor to get specific files (Launch -> SimpleFileExtractor -> Converter 1 or 2 -> save results to SFTP-1 or 2)
  5. Select file pattern(s) to extract (e.g., /surfaces/_mid_surface_right_81920.obj) and click "Start SimpleFileExtractor"
  6. When the job is done, click on the job from the dashboard -> click on Final file -> Compress
  7. To transfer to Niagara, first go to your Niagara directory (e.g., $SCRATCH) -> sftp -o port=7500 [email protected] or [email protected]
  8. Your compressed file should be there. To transfer it: get -r *
  9. Exit to return to Niagara, cd into new directory
  10. Uncompress file with tar xzvf CBRAIN_ArchivedContent.tar.gz
  11. Do that again in chunks of 5000 IDs to get all files
  12. It's that easy HAHAHAHAHAHAHAHAHAH
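Because the extraction has to be repeated in chunks of at most 5000 subjects, it can help to keep track of which IDs belong to which batch. Here is a minimal sketch of that bookkeeping; the `sub-…` identifiers are hypothetical placeholders, and the selection itself still has to be done in the CBRAIN interface as described in the steps above.

```python
def batches(items, size=5000):
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical subject IDs, used only to demonstrate the batching.
subject_ids = [f"sub-{i:07d}" for i in range(12000)]
chunks = list(batches(subject_ids))

for index, chunk in enumerate(chunks, start=1):
    print(f"batch {index}: {len(chunk)} subjects, {chunk[0]} to {chunk[-1]}")
```

Writing each batch to its own text file is one simple way to record which subjects have already been transferred.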

I have transferred a lot of that data to Niagara. You can find the paths to all the data available on Niagara and on the CIC in this Google Sheet.

Resources
