TY - DATA
T1 - Merging tag-based proteomic experiments
PY - 2018/02/26
AU - Andrew Landels
UR - https://orda.shef.ac.uk/articles/software/Merging_tag-based_proteomic_experiments/5327506
DO - 10.15131/shef.data.5327506.v1
L4 - https://ndownloader.figshare.com/files/9139237
L4 - https://ndownloader.figshare.com/files/9139273
L4 - https://ndownloader.figshare.com/files/10590382
KW - iTRAQ
KW - MaxQuant
KW - Merging experimental data
KW - Bioinformatics
KW - Proteomics
N2 - This dataset/code forms part of Andrew Landels' thesis: "Improving proteomic methods and investigating H2 production in Synechocystis sp. PCC6803" http://etheses.whiterose.ac.uk/id/eprint/19034 The code in this section is split into two separate scripts, both written in Mathematica. The first (MaxQuant_to_SignifiQunat) converts files generated by the program MaxQuant and re-orders them into a format that can be input to SignifiQuant - a program in the in-house proteomics pipeline available at the Sheffield University Biological and Chemical Engineering Department. This code reads one or more files within a relevant directory, collects all peptide information, and writes a new file containing all required data. As such, it is both a conversion script and a data-collecting script. The second script investigates methods for merging two biologically replicated datasets - specifically, one dataset represents a complete experimental replicate of the other. The theory behind this methodology is described in the aforementioned thesis, chapter 4.6. Briefly, this code examines the label intensity distributions, log-transforms the data, then uses the median correction method to fix the median value at 0 and scales the data to give an equal gradient between the 40th and 60th percentiles. The protein data in the repeat experiment are then scaled by the protein data in the initial experiment.
This slightly disrupts the balancing by median correction, though not significantly. The data are then plotted against each other in a scatter plot, demonstrating a systematic improvement in between-experiment repeatability. A principal component analysis was then performed, showing much closer clustering by experimental condition (principal component 1) than by experimental replicate deviation (principal component 2), demonstrating success of the method. This method shows effective combination of two proteomic datasets that are completely independent experimental repeats, demonstrating for the first time that this methodology is feasible in tag-based proteomic investigations.
ER -