Determining large protein structures (>200 amino acids) from limited NMR data using Rosetta

Raman, Lange et al., Science 327, 1014-1018 (2010).

Structures of large proteins can now be determined by incorporating backbone-only NMR data into Rosetta. Shown here is a structural comparison of ALG13 (201 amino acids) determined (A) computationally, using RDCs and backbone NH-NH NOEs, and (B) experimentally, by conventional NMR methods (PDB ID: 2jzc).

Nuclear magnetic resonance (NMR) spectroscopy is a powerful method for determining protein structures in the physiologically relevant solution state. Conventional NMR structure determination requires chemical shift assignments, which are unique signatures of each atom's local chemical environment, for all backbone and side-chain atoms. While backbone chemical shifts can be assigned relatively quickly and in an automated fashion, assigning side-chain chemical shifts is significantly more complex, time-consuming and expensive. As proteins get larger, their NMR spectra become extremely crowded, making complete assignment virtually impossible.

In this study, we demonstrated that sparse, backbone-only NMR data can be combined with the Rosetta protein structure modeling program to computationally determine the structures of proteins of up to about 200 amino acids. The method was tested on more than 25 proteins of varying sizes and topologies, 5 of which were completely blind tests. To guide Rosetta's conformational sampling, we used residual dipolar couplings (RDCs) and backbone NH-NH distance restraints from NMR experiments. These data are relatively easy to obtain, but far too sparse to allow structure determination by conventional methods.
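To make the two kinds of data concrete, here is a minimal Python sketch of how a candidate model could be scored against them. It assumes a simplified setting that is not taken from the study: each RDC is back-calculated as d_max * v^T S v from a unit N-H bond vector v and a known, traceless alignment (Saupe) matrix S, and each NH-NH restraint is a hypothetical flat-bottom upper bound on the amide H-H distance. The function names, the tensor values and the d_max scale are illustrative assumptions; this is not Rosetta's actual scoring code, and in practice the alignment tensor is fitted to the data rather than known in advance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]

def unit(v: Vec3) -> Vec3:
    """Normalise a bond vector to unit length."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)

def back_calc_rdc(nh: Vec3, saupe: Matrix3, d_max: float) -> float:
    """Predicted RDC = d_max * v^T S v, where v is the unit N-H vector and S is a
    traceless, symmetric 3x3 alignment (Saupe) matrix in the molecular frame."""
    v = unit(nh)
    return d_max * sum(saupe[a][b] * v[a] * v[b] for a in range(3) for b in range(3))

def rdc_score(nh_vectors: List[Vec3], observed: List[float],
              saupe: Matrix3, d_max: float) -> float:
    """Sum of squared deviations between observed and back-calculated RDCs."""
    return sum((obs - back_calc_rdc(v, saupe, d_max)) ** 2
               for v, obs in zip(nh_vectors, observed))

def hn_restraint_score(hn_coords: Dict[int, Vec3],
                       restraints: List[Tuple[int, int, float]]) -> float:
    """Flat-bottom penalty: zero while the amide H-H distance is within its upper
    bound (in angstroms), quadratic in the violation otherwise."""
    total = 0.0
    for i, j, upper in restraints:
        d = math.dist(hn_coords[i], hn_coords[j])
        total += max(0.0, d - upper) ** 2
    return total

# Hypothetical axially symmetric alignment tensor and illustrative scale factor (Hz);
# real scale factors depend on the nuclei and bond length, and sign conventions vary.
S = [[-0.0005, 0.0, 0.0], [0.0, -0.0005, 0.0], [0.0, 0.0, 0.001]]
D_MAX = 21700.0
print(back_calc_rdc((0.0, 0.0, 1.0), S, D_MAX))  # approximately 21.7 for a bond along z
print(back_calc_rdc((1.0, 1.0, 1.0), S, D_MAX))  # approximately 0 at equal angles to all axes
```

A vector along the tensor's z axis picks up the largest coupling, while a vector at equal angles to all three principal axes gives zero, which is why RDCs report on bond orientation rather than on distance; the NH-NH restraints supply the complementary distance information.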

Rosetta rapidly samples protein conformational space, favoring conformations that lower the overall energy, on the premise that the native state has the lowest energy. Because the search space is vast, unbiased Rosetta trajectories rarely sample conformations close to the native state. For proteins of up to 120 amino acids, this study shows that backbone NMR data significantly bias sampling towards the native state. For larger proteins, a new method that iteratively enriches the sampled population in low-energy conformations can successfully generate structures in the neighborhood of the native structure.
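The sampling ideas above can be illustrated on a one-dimensional toy problem. The following Python sketch is a simplified stand-in, not Rosetta's fragment-assembly protocol or the study's actual resampling procedure: a Metropolis Monte Carlo search favors lower energies, a hypothetical restraint term plays the role of the NMR data by penalizing one of two equally deep "physics" minima, and `iterative_enrichment` (a hypothetical name) repeatedly keeps the lowest-energy fraction of trajectory results and seeds the next round from them. All energies, step sizes and temperatures are illustrative assumptions.

```python
import math
import random
from typing import Callable, List

Energy = Callable[[float], float]

def metropolis(x0: float, energy: Energy, steps: int, step_size: float,
               kT: float, rng: random.Random) -> float:
    """Metropolis Monte Carlo on one coordinate; returns the lowest-energy state visited."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.gauss(0.0, step_size)  # random trial move
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with probability exp(-dE/kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x

def restrained(physics: Energy, penalty: Energy, weight: float) -> Energy:
    """Total energy = physics term + weight * data-restraint term."""
    return lambda x: physics(x) + weight * penalty(x)

def iterative_enrichment(energy: Energy, n_traj: int, n_rounds: int, keep_frac: float,
                         rng: random.Random, lo: float = -10.0, hi: float = 10.0) -> float:
    """Run trajectories, keep the lowest-energy fraction of their results, and seed
    the next round from those survivors. Returns the best state of the final round."""
    starts: List[float] = [rng.uniform(lo, hi) for _ in range(n_traj)]
    pool: List[float] = []
    for _ in range(n_rounds):
        pool = sorted((metropolis(s, energy, 200, 0.3, 0.5, rng) for s in starts), key=energy)
        elite = pool[: max(1, int(keep_frac * len(pool)))]
        starts = [rng.choice(elite) for _ in range(n_traj)]
    return pool[0]

def double_well(x: float) -> float:
    return (x * x - 4.0) ** 2  # two equally deep minima, at x = -2 and x = +2

def data_penalty(x: float) -> float:
    return max(0.0, -x) ** 2  # hypothetical data that are only consistent with x >= 0

energy = restrained(double_well, data_penalty, weight=10.0)
best = iterative_enrichment(energy, n_traj=20, n_rounds=3, keep_frac=0.25, rng=random.Random(0))
print(f"best x = {best:.2f}")  # should lie close to +2, the restraint-consistent minimum
```

Without the restraint term, the double well has two equally good minima, and independent trajectories started uniformly on both sides would split between them; the data term removes that ambiguity, while the enrichment rounds concentrate sampling effort where the energy is already low.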

These results suggest a change to the traditional NOE-restraint-based approach to NMR structure determination. In the new approach, the bottlenecks of assigning side-chain chemical shifts and NOESY peaks are eliminated. Instead, more backbone information is collected: RDCs in one or more alignment media, plus a small number of unambiguous restraints between backbone amide protons from three- or four-dimensional experiments, which restrict the register of paired beta strands. The approach is compatible with the deuteration necessary for proteins larger than 15 kDa and, for still larger proteins, can be extended to include methyl NOEs measured on selectively protonated samples.
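Why a few backbone amide contacts can be enough to fix a strand register can be shown with a toy geometric model. The Python sketch below assumes that an NH-NH contact is observed only between residues directly opposite each other in a paired strand, so residue i on one strand faces residue c - i on an antiparallel partner, or i - c on a parallel one, for a single register constant c. Each observed contact then votes for the register it implies. The residue numbers, function names and this one-contact-per-pair simplification are hypothetical; real spectra also contain diagonal contacts, and this is not how Rosetta itself uses the restraints.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Contact = Tuple[int, int]  # (residue on strand A, residue on strand B)

def register_votes(contacts: Iterable[Contact], parallel: bool) -> Counter:
    """Each contact votes for the register constant it implies:
    i + j for antiparallel pairing, i - j for parallel pairing."""
    votes: Counter = Counter()
    for i, j in contacts:
        votes[i - j if parallel else i + j] += 1
    return votes

def best_register(contacts: List[Contact]) -> Tuple[str, int, int]:
    """Return (orientation, register constant, number of supporting contacts)."""
    candidates = []
    for parallel in (False, True):
        votes = register_votes(contacts, parallel)
        if votes:
            register, count = votes.most_common(1)[0]
            candidates.append(("parallel" if parallel else "antiparallel", register, count))
    if not candidates:
        raise ValueError("no contacts given")
    return max(candidates, key=lambda c: c[2])

def implied_pairs(orientation: str, register: int, strand_a: Iterable[int]) -> List[Contact]:
    """Residue pairs that face each other under the chosen register."""
    if orientation == "antiparallel":
        return [(i, register - i) for i in strand_a]
    return [(i, i - register) for i in strand_a]

# Hypothetical contacts between strand A (residues 10-14) and strand B (residues 36-40).
observed = [(10, 40), (12, 38), (14, 36)]
orientation, register, support = best_register(observed)
print(orientation, register, support)  # antiparallel 50 3
print(implied_pairs(orientation, register, range(10, 15)))
# [(10, 40), (11, 39), (12, 38), (13, 37), (14, 36)]
```

Three consistent contacts already single out one antiparallel register for every residue in the strand pair; a contact that disagreed with it would appear as a minority vote rather than silently shifting the pairing.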