Understanding protein structure and mechanism of action has long been an exciting area of research, both for understanding biological and biochemical processes and for guiding structure-based drug discovery (if the protein is a drug target). However, the unprecedented increase in the number of new protein sequences discovered in the sequencing phase of various genome projects highlights the need for fast, efficient and high-throughput methods for structural studies of proteins [2-4]. X-ray crystallography and solution-state NMR are the two most widely used techniques for structural studies of proteins. The versatility of NMR has made it the technique of choice both for determining three-dimensional structures of moderately sized proteins (molecular weight < 20 kDa) and for characterizing their biophysical and biochemical functions. In particular, it is preferred over X-ray crystallography because of its ability to (a) provide structural models of proteins under near-physiological conditions, (b) yield residue-level information about protein dynamics and (c) probe their interactions with physiological binding partners; all of these are critically important for understanding how these proteins function and how their activity can be altered. However, the NMR-based approaches currently in use for protein structure determination are highly time-consuming, laborious and low in throughput. The major limitation is the long experiment time (ranging from weeks to months) required to collect all the necessary multidimensional NMR data, together with the further months needed to analyze the spectral data, to solve a single 3D structure of a protein. Many efforts have been made in the recent past to speed up NMR data collection by developing fast multidimensional NMR methodologies [6-8].
These fast data acquisition schemes have been paralleled by the advent of ultra-high-field NMR spectrometers, automated data analysis tools and structural modeling software to facilitate the structure determination process [9,10]. However, despite commendable developments in NMR methodologies and technologies, time-efficient determination of protein structures is still far from routine owing to several procedural bottlenecks associated with current NMR-based approaches.
The most laborious and time-consuming step in NMR structure determination of proteins is the assignment of 1H-1H NOE cross-peaks to particular proton (1H) resonances. The process not only requires additional data collection to establish the sequence-specific assignment of side-chain proton resonances, but also involves extensive human intervention (a) to resolve the ambiguities arising from degeneracy of 1H chemical shifts and (b) to correct the mismatches between the side-chain 1H shifts and NOE cross-peaks arising from inter-spectral variation of chemical shifts. Therefore, the approach based on 1H-1H NOEs is not ideally suited for high-throughput structural studies of proteins; rather, researchers remain interested in restricting themselves to those structural constraints that can be obtained rapidly and unambiguously.
The backbone amide NOEs are of particular interest in this regard owing to several clear advantages: (a) they can be assigned accurately and rapidly with little effort and (b) the process does not require additional data collection, and the analysis can be started directly after the sequence-specific backbone amide assignment is established, which is performed almost invariably in every structural study of a protein by NMR. Generally, these NOEs are sufficiently strong for residues in beta-sheet and alpha-helical elements and, coupled with backbone dihedral angles, suffice to generate the 3D fold. The approach, known as PFBD (protein fold from backbone data only), has already been demonstrated by our group on human ubiquitin and the chicken SH3 domain. Both of these are largely beta-sheet proteins; however, when we applied the same PFBD approach to structure modeling of bovine apo-calbindin, it failed to generate a reliable 3D fold, although the secondary structural elements were all well formed. This failed attempt drove our interest in further improving the PFBD protocol. The key lesson from the failure was that backbone amide NOEs alone are not sufficient to orient the helical elements properly and may lead to an inaccurate fold that cannot be refined even by making use of residual dipolar couplings (RDCs), suggesting that the PFBD approach is not adequate for structural modeling of proteins containing alpha-helices as the internal supportive elements. Therefore, an adequate number of side-chain NOEs is crucial for accurate protein structure modeling. However, to keep the process rapid, these should be obtained without much effort and without greatly extending the data collection and data analysis time. In this regard, the 13C-13C NOEs could be of potential interest owing to several procedural advantages:
1. The dispersion and resolution of carbon chemical shifts are considerably better than those of proton chemical shifts; therefore, assigning the 13C resonances, and hence the 13C-13C NOEs, would require far less effort and time than assigning side-chain proton resonances and their NOE cross-peaks.
2. The total number of aliphatic carbons is almost half the total number of protons present in a protein (Figure 1A); therefore, the overall analysis time and effort required to assign the 13C-13C NOEs would be significantly lower than for 1H-1H NOEs.
Figure 1: (A) Lowest energy 3D conformer of
3. No additional data collection would be required to establish the assignment of 13C-13C NOEs. This is because the carbon chemical shifts, except the carbonyl carbon (13C’) chemical shifts, are highly specific to amino acid type and are almost invariably used to map the sequential connectivities onto the protein primary sequence for sequence-specific assignment. Beyond resonance assignment, these shifts (especially 13Cα and 13Cβ), in combination with backbone 13C’ chemical shifts, are also used to extract secondary structure information in terms of backbone dihedral (φ,ψ) angles, and the resulting information is then used in protein structure modeling [16-18]. Advantageously, these sequence-specifically assigned backbone and side-chain carbon chemical shifts can also be used to differentiate intraresidue carbon NOEs from inter-residue carbon NOEs.
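The intraresidue versus inter-residue distinction mentioned in point 3 can be illustrated with a minimal Python sketch. It assumes a table of sequence-specifically assigned 13C shifts and a cross-peak identified by the residue of its amide group and its 13C shift. The names `CarbonAssignment` and `classify_carbon_noe`, the 0.2 ppm tolerance and all shift values are hypothetical choices made for illustration; they are not taken from the published protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarbonAssignment:
    residue: int      # residue number in the primary sequence
    atom: str         # carbon atom name, e.g. "CA" or "CB"
    shift_ppm: float  # assigned 13C chemical shift


def classify_carbon_noe(amide_residue, carbon_shift_ppm, assignments, tol_ppm=0.2):
    """List every assigned carbon whose shift matches the cross-peak.

    Each match is labelled "intraresidue" when the carbon belongs to the
    same residue as the amide group, and "inter-residue" otherwise.
    tol_ppm is an assumed matching tolerance, not a recommended value.
    """
    candidates = []
    for a in assignments:
        if abs(a.shift_ppm - carbon_shift_ppm) <= tol_ppm:
            label = "intraresidue" if a.residue == amide_residue else "inter-residue"
            candidates.append((a, label))
    return candidates


# Hypothetical assignment table (shift values are illustrative only).
assignments = [
    CarbonAssignment(10, "CA", 56.1),
    CarbonAssignment(10, "CB", 30.4),
    CarbonAssignment(11, "CA", 61.8),
    CarbonAssignment(42, "CB", 38.9),
]

# A cross-peak on the amide of residue 10 at 38.8 ppm in the 13C dimension.
hits = classify_carbon_noe(10, 38.8, assignments)
for assignment, label in hits:
    print(assignment.residue, assignment.atom, label)
```

When more than one carbon falls within the tolerance, the function returns all of them, so the cross-peak is kept as an ambiguous candidate list and is not forced into a single assignment.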
Despite their great potential to facilitate and expedite the NMR structure determination process, there has not been any attempt so far to make exclusive use of 13C-13C NOEs in routine protein structure modeling. The technical reason is that direct 13C-13C NOEs are very weak in small-to-moderately sized proteins (for an explanation, see this reference) and therefore cannot be discerned for such systems. An alternative solution has been envisaged in our lab based on the use of 13C-13C NOEs obtained from the triple-resonance 3D 13C-HSQC-NOESY-15N-HSQC spectrum, commonly known as the 3D CNH-NOESY spectrum. The spectrum was reported previously to complement spectra such as the 15N-edited NOESY HSQC and/or NCH-NOESY, particularly to resolve the ambiguities arising from degenerate 1H chemical shifts and to obtain the maximum number of unambiguous 1H-1H NOE cross-peak assignments, including diastereotopic discrimination. For that purpose, the experiment has of course been used in several NMR-based protein structure determination projects [14,19]. The particular advantage of the CNH-NOESY experiment is that the NOE cross-peaks appear along the 13C dimension, which exhibits better peak dispersion and fewer peaks than the 1H dimension, rendering the assignment of NOE cross-peaks considerably simpler and more straightforward. It is important to mention here, however, that the CNH-NOESY spectrum does not provide direct 13C-13C NOEs; rather, these are 1H-1H NOEs transferred from protons (1H) to the attached carbons (13C) through J-coupling evolution and detected along the 13C dimension. To the best of our knowledge, the NOE information (or empirical distance constraints) derived from this spectrum has not so far been used exclusively for protein structure modeling.
This is presumably because of the complications involved in accurately converting NOE cross-peak intensities into distance information, as these are obtained indirectly through 1H-1H dipolar mixing and are further affected by the efficiencies of the two heteronuclear transfers. Therefore, the spectrum not only provides ambiguous 13C-13C distance constraints but also lacks diastereotopic information for CH2 groups. However, the problem of ambiguous distance constraints is inherent to all NOESY-type experiments, because of either spin diffusion or chemical shift degeneracy (i.e. when two or more NOE cross-peaks overlap, the result is an incorrect cross-peak intensity and therefore an ambiguous distance constraint). No doubt, such problems can be circumvented by making use of additional complementary experiments; e.g. the ambiguities in the 15N/1HN-edited NOESY HSQC spectrum due to amide shift degeneracy can be resolved by using the 13C/1H-edited NOESY HSQC spectrum and vice versa (thus reducing the probability that the peaks overlap). However, this approach not only complicates the analysis but also increases the demand for NMR instrument time, and is therefore not well suited for high-throughput structural proteomics. Another way is to use the ambiguous distance constraints (generally less than 6 Å) directly to generate an ensemble of initial structures (called conformers), each starting from a random initial conformation and optimized by simulated annealing to maximally satisfy the available experimental restraints [21-25]. Afterward, the protein structure obtained with the initial constraint set is refined through iterative optimization cycles, in which the structure obtained in a given cycle is used to refine the structural constraints. In practice, the violated structural constraints are either refined or removed from the input file, another cycle is run, and the process is repeated until a structural ensemble of good quality is obtained.
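The iterative cycle described above can be outlined in a short Python sketch. It is a simplified illustration under stated assumptions, not the procedure of any particular structure calculation program. The structure calculation step (simulated annealing in practice) is passed in as a placeholder callable, violated constraints are only removed and never refined, and the names `violation`, `refine_constraints` and `toy_calculator`, along with the 0.5 Å tolerance, are hypothetical.

```python
import math

# A constraint is (atom_a, atom_b, upper distance bound in angstroms).
# Coordinates are a dict mapping atom name -> (x, y, z).


def violation(coords, constraint):
    """Return by how much the distance exceeds its upper bound (0 if satisfied)."""
    a, b, upper = constraint
    return max(0.0, math.dist(coords[a], coords[b]) - upper)


def refine_constraints(constraints, calculate_structure, tol=0.5, max_cycles=10):
    """Repeat: calculate a structure, drop violated constraints, recalculate.

    calculate_structure stands in for the real structure calculation step.
    The loop stops when no constraint is violated by more than tol or
    when max_cycles is reached.
    """
    current = list(constraints)
    for _ in range(max_cycles):
        coords = calculate_structure(current)
        kept = [c for c in current if violation(coords, c) <= tol]
        if len(kept) == len(current):
            return coords, current
        current = kept
    return calculate_structure(current), current


def toy_calculator(_constraints):
    """Placeholder: returns fixed coordinates and ignores the constraints."""
    return {"A": (0.0, 0.0, 0.0), "B": (3.0, 0.0, 0.0), "C": (10.0, 0.0, 0.0)}


initial = [("A", "B", 5.0), ("A", "C", 4.0)]
coords, final = refine_constraints(initial, toy_calculator)
print(final)
```

In this toy run the A-C constraint is violated by 6 Å and is dropped in the first cycle, while the A-B constraint survives. A fuller version would loosen or reassign a violated constraint before discarding it, as the text indicates.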
The quality of the conformers is assessed by comparing their variance (i.e. the degree of their topological agreement) and evaluating their target functions [22,23]. Generally, the target function is defined such that it is zero if all the input restraints are fulfilled and all non-bonded atom pairs satisfy a check for the absence of steric overlap [22,23]. Therefore, the calculated structure is considered optimal if the target function is close to zero, the deviation among the conformers is minimal and only a minimal number of input constraints are violated. Though it is not possible to verify that the procedure has found a global minimum, it has been shown in practice (i.e. by comparing the results with structures already known from crystallography) that the correct structure can be determined if the assigned protein resonances are correct and enough distance constraints are available to generate the fold and fix the orientation of the secondary structural elements. Based on this practical scenario, we reckoned that the indirectly measured 1H-1H NOEs can also be used for protein structure modeling. A generalized flowchart of the strategy designed to test this conjecture is illustrated schematically in Figure 1B. The strategy has been given the name <Prot3DNMR>, and it additionally encompasses the benefits of backbone amide NOEs. The purpose of using amide NOEs is to generate a reliable initial fold topology in combination with backbone dihedral angles; the resulting fold is then used to guide the assignment of 13C-13C NOEs in the CNH-NOESY spectrum. This is important because erroneous assignments can severely affect the quality and accuracy of the resulting structural ensemble; therefore, early identification of such errors or, ideally, the development of procedures that minimize their occurrence is of critical importance to ensure the convergence and accuracy of the final structure.
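The target function described earlier in this paragraph can be sketched in Python as follows. The quadratic penalty, the 2.0 Å minimum separation and the treatment of every atom pair as non-bonded are assumptions made for illustration only; the programs cited in [22,23] define their own functional forms, and `target_function` is a hypothetical name.

```python
import math
from itertools import combinations


def target_function(coords, upper_bounds, min_separation=2.0):
    """Sum of squared restraint violations plus squared steric overlaps.

    coords maps atom name -> (x, y, z); upper_bounds is a list of
    (atom_a, atom_b, upper distance bound). The value is zero only when
    every bound is satisfied and no pair of atoms is closer than
    min_separation. All pairs are treated as non-bonded here, which is
    a simplification.
    """
    total = 0.0
    # Penalty for distance restraints that exceed their upper bound.
    for a, b, upper in upper_bounds:
        excess = math.dist(coords[a], coords[b]) - upper
        if excess > 0:
            total += excess ** 2
    # Penalty for atom pairs that are closer than the allowed minimum.
    for a, b in combinations(sorted(coords), 2):
        overlap = min_separation - math.dist(coords[a], coords[b])
        if overlap > 0:
            total += overlap ** 2
    return total


# Hypothetical three-atom conformer.
conformer = {"A": (0.0, 0.0, 0.0), "B": (3.0, 0.0, 0.0), "C": (0.0, 4.0, 0.0)}

satisfied = target_function(conformer, [("A", "B", 5.0), ("B", "C", 6.0)])
violated = target_function(conformer, [("A", "B", 2.0)])
print(satisfied, violated)
```

With the first restraint set every bound is met and no atoms overlap, so the function returns zero. Tightening the A-B bound to 2.0 Å gives a positive value, which is how a conformer that violates its restraints would be flagged.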
The proposed <Prot3DNMR> approach has been tested successfully on two representative protein systems: bovine apo-calbindin and human SUMO-1. In both cases, we were able to generate reliable protein folds with adequate accuracy. After structure validation and a fold reliability check, both <Prot3DNMR> structures were deposited in the Protein Data Bank (www.rcsb.org/pdb), and their respective PDB IDs are 2MAZ and 2MW5. However, further validation studies on protein systems with different structural folds and complexity are imperative before this high-throughput and efficient strategy can be brought into routine NMR practice. The immediate utility of the <Prot3DNMR> approach can be envisaged in generating high-resolution protein structures, i.e. first generating a reliable 3D fold using this approach and then using residual dipolar couplings (RDCs) to further refine the <Prot3DNMR> fold.
I would like to acknowledge the Department of Science and Technology, India, for providing the project research grant under the SERC (now SERB) Fast Track Scheme (Registration Number: SR/FT/LS-114/2011). I am also highly obliged to Dr. Ashish Arora for helping with the molecular biology experiments. Last but not least, I acknowledge the help of all the research students (Nancy Jaiswal, Nisha Raikwal, Vaivav Shukla, Chandresh Sharma and Hemachandra Kotamarthi) who were involved in various aspects of this proof-of-principle study and helped to accomplish it in good time. I am also very grateful to Prof. C L Khetrapal and Dr. Anupam Guleria for their kind support and suggestions during the course of this work.
Kumar D. Indirectly Measured 1H-1H NOEs for Rapid Protein Structure Modeling By NMR: A Counterfeit in the Game. SM J Bioinform Proteomics. 2016; 1(1): 1004.