ct5b00594_si_002.xlsx (58.11 kB)
Guaranteed Discrete Energy Optimization on Large Protein Design Problems
dataset
posted on 2015-12-08, 00:00 authored by David Simoncini, David Allouche, Simon de Givry, Céline Delmas, Sophie Barbe, Thomas SchiexIn Computational Protein Design (CPD),
assuming a rigid backbone
and amino-acid rotamer library, the problem of finding a sequence
with an optimal conformation is NP-hard. In this paper, using Dunbrack’s
rotamer library and Talaris2014 decomposable energy function, we use
an exact deterministic method combining branch and bound, arc consistency,
and tree-decomposition to provenly identify the global minimum energy
sequence-conformation on full-redesign problems, defining search spaces
of size up to 10234. This is achieved on a single core
of a standard computing server, requiring a maximum of 66GB RAM. A
variant of the algorithm is able to exhaustively enumerate all sequence-conformations
within an energy threshold of the optimum. These proven optimal solutions
are then used to evaluate the frequencies and amplitudes, in energy
and sequence, at which an existing CPD-dedicated simulated annealing
implementation may miss the optimum on these full redesign problems.
The probability of finding an optimum drops close to 0 very quickly.
In the worst case, despite 1,000 repeats, the annealing algorithm
remained more than 1 Rosetta unit away from the optimum, leading to
design sequences that could differ from the optimal sequence by more
than 30% of their amino acids.