% ------------------------------------------------------------------------------
Data files required to process Allen Human Brain Atlas data:
% ------------------------------------------------------------------------------

A. Arnatkeviciute, B.D. Fulcher, A. Fornito. A practical guide to linking brain-wide gene expression and neuroimaging data (https://doi.org/10.1016/j.neuroimage.2019.01.011).

Matlab code for processing these data files and reproducing our analyses is in the GitHub repository https://github.com/BMHLab/AHBAprocessing.

Data were updated on 7 April 2020: we discovered some inaccuracies in the cust100 and cust250 parcellations. New random parcellations containing 100 and 250 regions per hemisphere were generated and the corresponding data updated.

Data were updated on 28 August 2018: in the previous version, gene ordering in the ROI x gene matrices did not correspond to the gene information provided in the probeInformation structure.

% ------------------------------------------------------------------------------
AHBAprocessed.zip contains
% ------------------------------------------------------------------------------

Processed region × gene expression matrices for the four parcellations. Regions are listed along rows and genes along columns. The first column specifies the region of interest (ROI) index.

The matrices contain data for cortical regions of the left hemisphere only, as these were the data primarily analysed in our manuscript. We focused on the left hemisphere because of the limited sample availability for the right hemisphere in the AHBA. Matrices that include subcortical regions, or samples from both hemispheres, can be generated using the code provided in the GitHub repository: https://github.com/BMHLab/AHBAprocessing.

Some ROIs did not have samples assigned to them, and expression values for those regions are labelled as NaN.
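As an illustration of the matrix layout described above (ROI index in the first column, NaN expression values for ROIs without samples), the following is a minimal Python sketch that separates the ROI indices from the expression values and sets aside the empty ROIs. The function name split_roi_matrix and the toy values are hypothetical; loading the .mat file itself is not shown.

```python
import math


def split_roi_matrix(matrix):
    """Split a region x gene matrix into ROI indices and expression rows.

    Assumes the first entry of every row is the ROI index and that ROIs
    without assigned samples carry NaN in every gene column.
    """
    roi_ids, expression, empty_rois = [], [], []
    for row in matrix:
        roi, values = int(row[0]), row[1:]
        if all(math.isnan(v) for v in values):
            empty_rois.append(roi)      # no samples were assigned to this ROI
        else:
            roi_ids.append(roi)
            expression.append(values)
    return roi_ids, expression, empty_rois


# Toy matrix with hypothetical values: 3 ROIs x 3 genes, ROI 2 has no samples.
nan = float("nan")
toy = [
    [1.0, 0.5, -1.2, 0.3],
    [2.0, nan, nan, nan],
    [3.0, 1.1, 0.0, -0.7],
]
roi_ids, expression, empty_rois = split_roi_matrix(toy)
print(roi_ids, empty_rois)
```

Keeping the list of empty ROIs, rather than silently dropping them, makes it easier to match the remaining rows back to the parcellation labels.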
Below we provide summary information on the assignment of samples to left cortical regions in each parcellation, applying a 2 mm distance threshold: the mean number of samples assigned per region ± standard deviation (SD), and the minimum and maximum number of samples assigned to any region in the hemisphere.

The following probe filtering criteria were applied during the processing of the provided data, resulting in the selection of 10 027 genes:
i) the probe-to-gene annotations were updated using the Re-Annotator package;
ii) probes where expression measures do not exceed the background in more than 50% of samples were removed;
iii) genes that did not have corresponding RNA-seq measures were removed;
iv) probes that demonstrated a low correlation to RNA-seq data (Spearman rho < 0.2) were removed;
v) a representative probe for each gene was selected based on the highest correlation to the RNA-seq data in the corresponding samples.
The list of all options is also specified in the optionsSave structure variable.

ROIxGene_aparcaseg_RNAseq.mat
34 ROIs per hemisphere: mean 37.8 ± 22.5 (SD) samples assigned per ROI, min = 5; max = 92; no regions have been excluded.

ROIxGene_cust100_RNAseq.mat
100 ROIs per hemisphere: mean 7.4 ± 6.5 (SD) samples assigned per ROI, min = 0; max = 37; region 54 has been excluded.

ROIxGene_cust250_RNAseq.mat
250 ROIs per hemisphere: mean 5.1 ± 3.5 (SD) samples assigned per ROI, min = 0; max = 18; regions 122, 127, 180, 183, 223 and 230 have been excluded.

ROIxGene_HCP_RNAseq.mat
180 ROIs per hemisphere: mean 7.1 ± 6.7 (SD) samples assigned per ROI, min = 0; max = 41; regions 23, 89 and 104 have been excluded.
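The RNA-seq-based criteria ii) to v) listed above can be sketched in Python on toy data as follows. This is only an illustration under stated assumptions, not the Matlab implementation in the repository: the probe records, the field names (name, gene, expression, above_background) and the function names are hypothetical, criterion ii) is read as "keep probes that exceed background in at least 50% of samples", and re-annotation (criterion i) is assumed to have been done already.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho as the Pearson correlation of ranks (non-constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def select_probes(probes, rnaseq, min_fraction=0.5, min_rho=0.2):
    """Apply criteria ii) to v) and return {gene: representative probe name}."""
    best = {}
    for p in probes:
        flags = p["above_background"]
        if sum(flags) / len(flags) < min_fraction:   # ii) background filter
            continue
        if p["gene"] not in rnaseq:                  # iii) no RNA-seq measure
            continue
        rho = spearman(p["expression"], rnaseq[p["gene"]])
        if rho < min_rho:                            # iv) low correlation
            continue
        if p["gene"] not in best or rho > best[p["gene"]][1]:
            best[p["gene"]] = (p["name"], rho)       # v) highest correlation
    return {gene: name for gene, (name, _) in best.items()}


# Toy data with hypothetical values: four samples per probe.
ok = [True, True, True, True]
probes = [
    {"name": "A1", "gene": "GENEA", "expression": [1, 2, 3, 4], "above_background": ok},
    {"name": "A2", "gene": "GENEA", "expression": [1, 3, 2, 4], "above_background": ok},
    {"name": "B1", "gene": "GENEB", "expression": [4, 3, 2, 1], "above_background": ok},
    {"name": "C1", "gene": "GENEC", "expression": [1, 2, 3, 4],
     "above_background": [True, False, False, False]},
    {"name": "D1", "gene": "GENED", "expression": [1, 2, 3, 4], "above_background": ok},
]
rnaseq = {
    "GENEA": [10, 20, 30, 40],
    "GENEB": [1, 2, 3, 4],
    "GENEC": [1, 2, 3, 4],
}
selected = select_probes(probes, rnaseq)
print(selected)
```

In this toy example only GENEA survives: B1 is anticorrelated with its RNA-seq profile, C1 fails the background filter, and D1 has no RNA-seq measure.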
% ------------------------------------------------------------------------------

The following probe filtering criteria were applied during the processing of the provided data, resulting in the selection of 15 745 genes:
i) the probe-to-gene annotations were updated using the Re-Annotator package;
ii) probes where expression measures do not exceed the background in more than 50% of samples were removed;
iii) a representative probe for each gene was selected based on the highest intensity.
The list of all options is also specified in the optionsSave structure variable.

ROIxGene_aparcaseg_INT.mat
34 ROIs per hemisphere: mean 37.8 ± 22.5 (SD) samples assigned per ROI, min = 5; max = 92; no regions have been excluded.

ROIxGene_cust100_INT.mat
100 ROIs per hemisphere: mean 7.4 ± 6.5 (SD) samples assigned per ROI, min = 0; max = 37; region 54 has been excluded.

ROIxGene_cust250_INT.mat
250 ROIs per hemisphere: mean 5.1 ± 3.5 (SD) samples assigned per ROI, min = 0; max = 18; regions 122, 127, 180, 183, 223 and 230 have been excluded.

ROIxGene_HCP_INT.mat
180 ROIs per hemisphere: mean 7.1 ± 6.7 (SD) samples assigned per ROI, min = 0; max = 41; regions 23, 89 and 104 have been excluded.

The differential stability (DS) values provided are restricted to cortical regions only, in order to be representative of the data. As previously shown in Hawrylycz et al., 2015, DS values based on cortical samples and on combined cortical and subcortical samples can show substantial differences. If processing options are chosen to include subcortical regions as well, DS will be calculated using both cortical and subcortical regions.
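The intensity-based criteria ii) and iii) listed above can be sketched in the same way. Again this is a hedged illustration rather than the repository's Matlab code: the probe records and the function name select_by_intensity are hypothetical, "highest intensity" is assumed to mean the highest mean expression across samples, and criterion ii) is read as "keep probes that exceed background in at least 50% of samples".

```python
from statistics import mean


def select_by_intensity(probes, min_fraction=0.5):
    """Return {gene: probe name}, keeping the highest-intensity probe per gene."""
    best = {}
    for p in probes:
        flags = p["above_background"]
        if sum(flags) / len(flags) < min_fraction:   # ii) background filter
            continue
        intensity = mean(p["expression"])            # assumed: mean over samples
        current = best.get(p["gene"])
        if current is None or intensity > current[1]:
            best[p["gene"]] = (p["name"], intensity) # iii) highest intensity
    return {gene: name for gene, (name, _) in best.items()}


# Toy data with hypothetical values: three samples per probe.
ok = [True, True, True]
probes = [
    {"name": "A1", "gene": "GENEA", "expression": [5, 6, 7], "above_background": ok},
    {"name": "A2", "gene": "GENEA", "expression": [8, 9, 10], "above_background": ok},
    {"name": "B1", "gene": "GENEB", "expression": [9, 9, 9],
     "above_background": [True, False, False]},
    {"name": "B2", "gene": "GENEB", "expression": [2, 3, 4], "above_background": ok},
]
chosen = select_by_intensity(probes)
print(chosen)
```

Note that B1 has the higher intensity for GENEB but is discarded by the background filter first, so B2 becomes the representative probe.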
% ------------------------------------------------------------------------------
AHBAdata.zip contains four folders:
% ------------------------------------------------------------------------------

% ------------------------------------------------------------------------------
parcellations
% ------------------------------------------------------------------------------

Contains four brain parcellations for each of the six brains:
defaultparc_NativeAnat.nii - 34 regions per hemisphere + 7 subcortical regions; Desikan et al., 2006.
random200_acpc_uncorr_asegparc_NativeAnat.nii - 100 random regions per hemisphere + 10 subcortical regions.
HCPMMP1_acpc_uncorr.nii - 180 regions per hemisphere; Glasser et al., 2016.
random500_acpc_uncorr_asegparc_NativeAnat.nii - 250 random regions per hemisphere + 15 subcortical regions.

Annotation files for the left cortical hemisphere in each parcellation:
lh.aparc.annot
lh.random200.annot
lh.HCP-MMP1.annot
lh.random500.annot

FS average white matter and pial surfaces for the left cortical hemisphere:
lhfsaverage.pial
lhfsaverage.white

Spherical representation of the cortical surface:
lh.sphere

HCPMMP1 volumetric parcellation in MNI space:
MMPinMNI.nii

% ------------------------------------------------------------------------------
probeReannotation
% ------------------------------------------------------------------------------

hba_microarray_probes_fixed.xlsx - probe sequences provided by the AHBA where the data format was fixed (some gene names are assigned a date format in Excel and needed to be changed manually).
probes2annotateALL.fasta - probe sequence information that can be used to perform probe-to-gene re-annotation in the Re-Annotator software.
probes2annotateALL_merged_readAnnotation.txt - Re-Annotator software output for probe-to-gene annotations.
% ------------------------------------------------------------------------------
processedData
% ------------------------------------------------------------------------------

Pre-computed distances between sample pairs on the cortical surface (DistancesONsurfaceXXX.mat) as well as within the grey matter volume (distancesGM_MNIXXX.mat) for the defaultparc_NativeAnat parcellation.

Pre-computed correlation values between different probe selection methods before the intensity-based filtering (probeCorrelationsRAND_RNAseqnoQC.mat and probeCorrelationsRANDnoQC.mat) as well as after the intensity-based filtering (probeCorrelationsRAND_RNAseqQC.mat and probeCorrelationsRANDQC.mat).

% ------------------------------------------------------------------------------
rawData
% ------------------------------------------------------------------------------

Non-processed AHBA expression data downloaded from http://human.brain-map.org/static/download as well as some pre-computed data such as:
reannotatedProbes.mat - probe-to-gene re-annotated data where re-annotation was performed using the Re-Annotator software package http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0139516
Probes.xlsx - probe information file provided by the AHBA containing information about probe names, probe IDs and associated gene information.
mart_export_updatedProbes.txt - probe-to-gene re-annotated data where re-annotation was performed using the biomart software package http://asia.ensembl.org/biomart/martview/39c608cf8abba1c5248dbe73bcd9c639
limmanormalisedExpression.txt - gene expression measures normalised using the limma software package https://bioconductor.org/packages/release/bioc/html/limma.html