Determination of Protein Secondary Structure from Infrared Spectra Using Partial Least-Squares Regression
Kieaibi E. Wilcox
Ewan W. Blanch
Andrew J. Doig
10.1021/acs.biochem.6b00403.s002
https://acs.figshare.com/articles/journal_contribution/Determination_of_Protein_Secondary_Structure_from_Infrared_Spectra_Using_Partial_Least-Squares_Regression/3467171
Infrared (IR) spectra contain substantial information about protein
structure. This has previously most often been exploited by using
known band assignments. Here, we convert spectral intensities in bins
within Amide I and II regions to vectors and apply machine learning
methods to determine protein secondary structure. Partial least squares
was performed on spectra of 90 proteins in H<sub>2</sub>O. After preprocessing
and removal of outliers, 84 proteins were used for this work. Standard
normal variate and second-derivative preprocessing methods on the
combined Amide I and II data generally gave the best performance,
with root-mean-square values for prediction of ∼12% for α-helix,
∼7% for β-sheet, 7% for antiparallel β-sheet, and
∼8% for other conformations. Analysis of Fourier transform
infrared (FTIR) spectra of 16 proteins in D<sub>2</sub>O showed that
secondary structure determination was slightly poorer than in H<sub>2</sub>O. Interval partial least squares was used to identify the
critical regions within spectra for secondary structure prediction
and showed that the sides of bands were most valuable, rather than
their peak maxima. In conclusion, we have shown that multivariate
analysis of protein FTIR spectra can give α-helix, β-sheet,
other, and antiparallel β-sheet contents with good accuracy,
comparable to that of circular dichroism, which is widely used for
this purpose.
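The workflow summarized above (binned Amide I and II intensities as vectors, standard normal variate and second-derivative preprocessing, then partial least squares regression scored by root-mean-square error) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectra are synthetic, the band positions and widths are illustrative only, the function names (`snv`, `second_derivative`, `pls1_fit`, `pls1_predict`, `rmse`) are hypothetical, the second derivative uses plain finite differences, and the regression is a textbook single-response NIPALS PLS.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def second_derivative(spectra):
    """Second derivative along the wavenumber axis via repeated finite differences."""
    return np.gradient(np.gradient(spectra, axis=1), axis=1)

def pls1_fit(X, y, n_components):
    """Single-response PLS by NIPALS; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: covariance of X with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    return (X - x_mean) @ coef + y_mean

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data (assumption): 84 "proteins", 120 bins over 1480-1720 cm^-1.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(1480.0, 1720.0, 120)

def band(centre, width):
    return np.exp(-0.5 * ((wavenumbers - centre) / width) ** 2)

helix_fraction = rng.uniform(0.0, 0.8, size=84)
spectra = (
    np.outer(helix_fraction, band(1655.0, 12.0))          # illustrative helix-like band
    + np.outer(1.0 - helix_fraction, band(1632.0, 12.0))  # illustrative non-helix band
    + 0.6 * band(1545.0, 15.0)                            # illustrative Amide II band
    + rng.normal(scale=0.01, size=(84, wavenumbers.size))
)

# Preprocess, split into calibration and test sets, fit, and score.
X = second_derivative(snv(spectra))
X_train, X_test = X[:60], X[60:]
y_train, y_test = helix_fraction[:60], helix_fraction[60:]

coef, x_mean, y_mean = pls1_fit(X_train, y_train, n_components=3)
y_pred = pls1_predict(X_test, coef, x_mean, y_mean)
print(f"RMSE of prediction (synthetic data): {100 * rmse(y_test, y_pred):.1f}%")
```

The number of latent components (3 here) is an arbitrary choice for the sketch; in practice it would be selected by cross-validation, and a smoothing derivative filter is usually preferred over raw finite differences for noisy spectra.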
2016-06-20 00:00:00
90 proteins
II data
84 proteins
16 proteins
II regions
structure prediction
band assignments
multivariate analysis
D<sub>2</sub>O
Infrared Spectra
protein structure
peak maxima
H<sub>2</sub>O
protein FTIR spectra
structure determination