Merging tag-based proteomic experiments

software

posted on 2018-02-26, 10:11 authored by Andrew Landels

This dataset/code forms part of Andrew Landels' thesis: "Improving proteomic methods and investigating H2 production in Synechocystis sp. PCC6803" http://etheses.whiterose.ac.uk/id/eprint/19034

The code in this section is split into two separate scripts, both written in Mathematica. The first (MaxQuant_to_SignifiQunat) converts files generated by the program MaxQuant and re-orders their contents into a format that can be input to SignifiQuant, a program in the in-house proteomics pipeline available at the Sheffield University Biological and Chemical Engineering Department. This code reads one or more files within a relevant directory, collects all peptide information, and writes a new file containing all required data. As such, it is both a conversion script and a data-collecting script.
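The collect-and-rewrite step described above can be sketched as follows. This is an illustrative Python sketch, not the author's Mathematica script: it assumes the input files are tab-delimited text with a header row, and the column names used in the usage note ("Sequence", "Proteins") are hypothetical placeholders. The layout that SignifiQuant actually expects is not described here, so the output format is likewise an assumption.

```python
import csv
from pathlib import Path


def collect_peptide_rows(directory, columns, pattern="*.txt"):
    """Gather the chosen columns from every tab-delimited file in a directory.

    `columns` is a list of header names to keep (hypothetical names; the
    real set depends on the input files and on the downstream program).
    """
    rows = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as handle:
            for record in csv.DictReader(handle, delimiter="\t"):
                # Keep only the requested columns; absent ones become empty.
                row = {name: record.get(name, "") for name in columns}
                # Remember which file each peptide row came from.
                row["Source file"] = path.name
                rows.append(row)
    return rows


def write_combined(rows, columns, out_path):
    """Write the collected rows to a single tab-delimited file."""
    fieldnames = ["Source file"] + list(columns)
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

A call might look like `write_combined(collect_peptide_rows("run_dir", ["Sequence", "Proteins"]), ["Sequence", "Proteins"], "combined.tsv")`, where `run_dir` is a hypothetical directory name. Writing the combined file outside the input directory, or with a different extension, avoids reading it back in on a later run.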

The second script investigates methods for merging two biologically replicated datasets, where one dataset is a complete experimental replicate of the other. The theory behind this methodology is described in chapter 4.6 of the aforementioned thesis. Briefly, this code examines the label intensity distributions, log-transforms the data, then uses the median correction method to fix the median value at 0 and scales the data to give an equal gradient between the 40th and 60th percentiles.
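One possible reading of the normalisation outlined above is sketched below in Python. It is not the author's Mathematica code. It assumes that "equal gradient between the 40th and 60th percentile" means rescaling each label channel so that the difference between its 60th and 40th percentile values matches a common target. The base-2 logarithm, the function names, and the `target_spread` parameter are all assumptions made for illustration; the thesis chapter is the authority on the actual method.

```python
import math
from statistics import median


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0..100) of a pre-sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def normalise_channel(intensities, target_spread=1.0):
    """Log-transform, median-centre, and rescale one label channel.

    Non-positive intensities cannot be log-transformed and are returned
    as None so that positions stay aligned across channels.
    """
    logged = [math.log2(v) if v > 0 else None for v in intensities]
    observed = [v for v in logged if v is not None]
    centre = median(observed)
    # Median correction: shift the channel so its median sits at 0.
    centred = sorted(v - centre for v in observed)
    # Assumed meaning of "gradient": the 40th-to-60th percentile spread.
    spread = percentile(centred, 60) - percentile(centred, 40)
    if spread == 0:
        raise ValueError("no spread between the 40th and 60th percentiles")
    factor = target_spread / spread
    # Scaling about 0 leaves the corrected median unchanged.
    return [None if v is None else (v - centre) * factor for v in logged]
```

Because the scaling is applied about zero, the median stays at 0 after rescaling, and two channels that differ only by a constant offset and scale on the log axis end up with identical normalised values.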

The protein data in the repeat experiment are then scaled by the protein data in the initial experiment. This slightly, but not significantly, disrupts the balancing achieved by median correction. The data are then plotted against each other in a scatter plot, demonstrating a systematic improvement in between-experiment repeatability. A principal component analysis is then performed, showing much closer clustering by experimental condition (principal component 1) than by experimental replication (principal component 2), demonstrating the success of the method.

This method effectively combines two proteomic datasets that are completely independent experimental repeats, demonstrating for the first time that this methodology is feasible in tag-based proteomic investigations.

Funding

EU FP7 308518

Ethics

The project has ethical approval and the approval number is included in the description field

Policy

The data complies with the institution and funders' policies on access and sharing

Sharing and access restrictions

The data can be shared openly

Data description

The file formats are open or commonly used

Methodology, headings and units

Headings and units are explained in the files

Keywords

iTRAQ, MaxQuant, Merging experimental data, Bioinformatics, Proteomics

Licence

MIT
