Publications • Baker Lab

Preprints available on bioRxiv

26 entries « ‹ 2 of 2 › »

21.

Baek, Minkyung; Anishchenko, Ivan; Park, Hahnbeom; Humphreys, Ian R.; Baker, David

Protein oligomer modeling guided by predicted inter-chain contacts in CASP14 Journal Article

In: Proteins, 2021.

22.

Norn, Christoffer; Wicky, Basile I. M.; Juergens, David; Liu, Sirui; Kim, David; Tischer, Doug; Koepnick, Brian; Anishchenko, Ivan; Baker, David; Ovchinnikov, Sergey

Protein sequence design by conformational landscape optimization Journal Article

In: Proceedings of the National Academy of Sciences, vol. 118, no. 11, 2021.


@article{Norn2021,

title = {Protein sequence design by conformational landscape optimization},

author = {Norn, Christoffer and Wicky, Basile I. M. and Juergens, David and Liu, Sirui and Kim, David and Tischer, Doug and Koepnick, Brian and Anishchenko, Ivan and Baker, David and Ovchinnikov, Sergey},

url = {https://www.pnas.org/content/118/11/e2017228118

https://www.bakerlab.org/wp-content/uploads/2021/03/Norn_etal_PNAS2021_LandscapeOptimization.pdf},

doi = {10.1073/pnas.2017228118},

year  = {2021},

date = {2021-03-16},

urldate = {2021-03-16},

journal = {Proceedings of the National Academy of Sciences},

volume = {118},

number = {11},

abstract = {Almost all proteins fold to their lowest free energy state, which is determined by their amino acid sequence. Computational protein design has primarily focused on finding sequences that have very low energy in the target designed structure. However, what is most relevant during folding is not the absolute energy of the folded state but the energy difference between the folded state and the lowest-lying alternative states. We describe a deep learning approach that captures aspects of the folding landscape, in particular the presence of structures in alternative energy minima, and show that it can enhance current protein design methods. The protein design problem is to identify an amino acid sequence that folds to a desired structure. Given Anfinsen{\textquoteright}s thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the desired structure is the lowest energy state. As this calculation involves not only all possible amino acid sequences but also, all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest-energy amino acid sequence for the desired structure, often checking by protein structure prediction in a second step that the desired structure is indeed the lowest-energy conformation for the designed sequence, and typically discarding a large fraction of designed sequences for which this is not the case. Here, we show that by backpropagating gradients through the transform-restrained Rosetta (trRosetta) structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures in a single calculation. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single-point energy estimations in predicting folding and stability of de novo designed proteins. 
We compare sequence design by conformational landscape optimization with the standard energy-based sequence design methodology in Rosetta and show that the former can result in energy landscapes with fewer alternative energy minima. We show further that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low-resolution trRosetta model serves to disfavor alternative states, and the high-resolution Rosetta model serves to create a deep energy minimum at the design target structure.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}


23.

Hiranuma, Naozumi; Park, Hahnbeom; Baek, Minkyung; Anishchenko, Ivan; Dauparas, Justas; Baker, David

Improved protein structure refinement guided by deep learning based accuracy estimation Journal Article

In: Nature Communications, vol. 12, no. 1340, 2021.


24.

Ziatdinov, Maxim; Zhang, Shuai; Dollar, Orion; Pfaendtner, Jim; Mundy, Christopher J.; Li, Xin; Pyles, Harley; Baker, David; De Yoreo, James J.; Kalinin, Sergei V.

Quantifying the Dynamics of Protein Self-Organization Using Deep Learning Analysis of Atomic Force Microscopy Data Journal Article

In: Nano Letters, 2021.


25.

Yang, Jianyi; Anishchenko, Ivan; Park, Hahnbeom; Peng, Zhenling; Ovchinnikov, Sergey; Baker, David

Improved protein structure prediction using predicted interresidue orientations Journal Article

In: Proceedings of the National Academy of Sciences, 2020, ISSN: 0027-8424.


@article{Yang2020,

title = {Improved protein structure prediction using predicted interresidue orientations},

author = {Yang, Jianyi and Anishchenko, Ivan and Park, Hahnbeom and Peng, Zhenling and Ovchinnikov, Sergey and Baker, David},

url = {https://www.pnas.org/content/early/2020/01/01/1914677117

https://www.bakerlab.org/wp-content/uploads/2020/01/Yang2020_ImprovedStructurePredictionInterresidueOrientations.pdf

},

doi = {10.1073/pnas.1914677117},

issn = {0027-8424},

year  = {2020},

date = {2020-01-02},

journal = {Proceedings of the National Academy of Sciences},

abstract = {Protein structure prediction is a longstanding challenge in computational biology. Through extension of deep learning-based prediction to interresidue orientations in addition to distances, and the development of a constrained optimization by Rosetta, we show that more accurate models can be generated. Results on a set of 18 de novo-designed proteins suggest the proposed method should be directly applicable to current challenges in de novo protein design. The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)- and Continuous Automated Model Evaluation (CAMEO)-derived sets, the method outperforms all previously described structure-prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo-designed proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}


26.

Wu, Qi; Peng, Zhenling; Anishchenko, Ivan; Cong, Qian; Baker, David; Yang, Jianyi

Protein contact prediction using metagenome sequence data and residual neural networks Journal Article

In: Bioinformatics, vol. 36, no. 1, 2019.

