Reversible auto-encoding of amino-acid residues in reduced space: an application to predicting DNA-binding proteins

2017-11-21T00:27:06Z (GMT) by Ahmad, Shandar
A number of recent studies have aimed to predict binding sites and other structural and sequence features of proteins using the local amino acid sequence as input to a machine learning system. This requires representing amino acids in a numerical space, typically with 20 bits per residue. The number of trainable parameters grows significantly with each neighboring residue added to the input, so the technique is restricted to predicting properties for which large amounts of data are available. Thus, there is a need to find alternatives to this type of sparse encoding. Here, a method for auto-encoding the 20-dimensional sparse representation into a lower-dimensional space is developed with amino acids in mind, although the method is general. It is shown that the 20-bit sparse encoding can be reduced to a 6-dimensional real space without loss of information, and to even lower dimensions with varying degrees of information loss. An application to predicting DNA-binding sites was tested to assess the validity of the proposed method, and prediction by the neural network with auto-encoded inputs was observed to be comparable or superior to that of the sparse-encoding system.

PRIB 2008 proceedings found at:
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia)
Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
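
The abstract's point about parameter growth can be illustrated with a short sketch. It builds the 20-bit sparse encoding for a window of residues and counts the input-to-hidden weights of a fully connected first layer for 20 versus 6 dimensions per residue. The window string, the hidden-layer size, and all function names are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not implement the auto-encoder itself.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def one_hot(residue):
    """Return the 20-bit sparse encoding of a single residue."""
    return [1 if aa == residue else 0 for aa in AMINO_ACIDS]


def encode_window(window):
    """Concatenate the sparse encodings of every residue in a sequence window."""
    return [bit for residue in window for bit in one_hot(residue)]


def first_layer_weights(window_size, dims_per_residue, hidden_units):
    """Count input-to-hidden weights (biases ignored) of a fully connected layer."""
    return window_size * dims_per_residue * hidden_units


if __name__ == "__main__":
    window = "MKVLA"  # hypothetical 5-residue window
    hidden = 10       # hypothetical hidden-layer size

    print(len(encode_window(window)))                    # 100 input bits
    print(first_layer_weights(len(window), 20, hidden))  # 1000 weights, sparse encoding
    print(first_layer_weights(len(window), 6, hidden))   # 300 weights, 6-dimensional encoding
```

Under these assumed sizes, each extra neighbor adds 200 first-layer weights with sparse encoding but only 60 with a 6-dimensional representation, which is the kind of saving the abstract is motivated by.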