The Chemistry Behind Protein Folding

Divyan Bavan

Proteins are the most diverse group of macromolecules in the cell. With 20 standard amino acids and sequences that often run to hundreds of residues, these molecules occupy a vast combinatorial space. This diversity in sequence leads to diversity in structure, which underlies a central principle of structural biology: function follows form. As a result, proteins can carry out many tasks, including breaking down metabolites, transporting other proteins, and replicating DNA. This essay explores how a protein’s primary, secondary, and tertiary structures lead to the formation of the final, active product.
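
To give a sense of scale for this combinatorial space, the short Python sketch below counts the possible sequences of a given length. It assumes that each position is chosen independently from the 20 standard amino acids; the function name `sequence_space_size` is introduced here purely for illustration.

```python
def sequence_space_size(length: int, alphabet_size: int = 20) -> int:
    """Count the distinct sequences of `length` residues.

    Assumes every position can independently hold any of `alphabet_size`
    amino acids, so the count is alphabet_size ** length.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (2, 10, 100):
        size = sequence_space_size(n)
        # Report the number of decimal digits instead of the full integer.
        print(f"{n} residues: {len(str(size))}-digit number of sequences")
```

Under these assumptions, a chain of only 100 residues already has 20^100 possible sequences, a number with 131 digits.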

Primary Structure: Protein Composition

Proteins are composed of 20 different amino acids. These molecules gain their name from their amino and carboxylic acid groups, which are both ionized at physiological pH values (Voet and Voet, 2011). These groups allow amino acids to be linked through peptide bonds, eventually creating the full polymer.

Several features of amino acids make them suitable building blocks for proteins. The most important is the diversity of their R groups. As seen in Figure 1, valine and cysteine have very different R groups; valine has a non-polar isopropyl side chain while cysteine has a sulfur-containing polar side chain. These differences contribute to various properties of proteins; for example, a disulfide bridge between two cysteine residues increases protein stability (Voet and Voet, 2011).

Since proteins are polymers of amino acids, it is useful to determine their order in the polypeptide chain. Several experimental techniques have been developed for determining the sequence of a protein. A classic example is Edman degradation. This process works by cleaving the N-terminal residue using Edman’s reagent (phenylisothiocyanate) and a strong anhydrous acid. The resulting amino acid derivative is then extracted and converted to a more stable form with aqueous acid. To find its identity, the derivative is run through high-performance liquid chromatography (HPLC). This process is repeated with the shortened polypeptide chain, removing one N-terminal residue at a time, until the identity of each amino acid has been found. While this process works well for small chains, modern approaches use high-throughput methods such as mass spectrometry to achieve higher speed and efficiency (Voet and Voet, 2011).
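
The repeating cycle of Edman degradation can be sketched in a few lines of Python. This is an idealized illustration only: the peptide is represented as a string of one-letter codes, the chemistry and the HPLC identification are reduced to reading the first character, and the names `edman_cycle`, `edman_sequence`, and `max_cycles` are hypothetical.

```python
from typing import List, Optional, Tuple


def edman_cycle(peptide: str) -> Tuple[str, str]:
    """Run one idealized cycle: split off the N-terminal residue.

    Returns (identified_residue, shortened_peptide). In the laboratory the
    identification step is done by HPLC; here the residue is simply read
    from the string.
    """
    if not peptide:
        raise ValueError("no residues left to cleave")
    return peptide[0], peptide[1:]


def edman_sequence(peptide: str, max_cycles: Optional[int] = None) -> List[str]:
    """Repeat the cycle until the chain is used up or max_cycles is reached."""
    identified: List[str] = []
    remaining = peptide
    while remaining and (max_cycles is None or len(identified) < max_cycles):
        residue, remaining = edman_cycle(remaining)
        identified.append(residue)
    return identified
```

The optional `max_cycles` argument is a stand-in for the practical limit on chain length mentioned above; the sketch does not model why that limit arises.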

Secondary Structure

In 1961, Christian Anfinsen showed through a series of experiments that protein folding is solely dependent on the primary sequence. By denaturing and renaturing ribonuclease A in isolation from any cellular constituents, he demonstrated that the protein’s structure and function could be recovered. Since then, our understanding of how protein sequence affects protein structure has rapidly expanded (Prof. JS Lecture Slides).

The simplest level of this is secondary structure. Although the polypeptide backbone can rotate about many of its bonds, only some of the resulting conformations are allowed. This is due to steric clashes, unfavourable torsion angles, and electrostatic repulsion. These constraints can be represented by a Ramachandran plot, which shows the allowed backbone torsion angles for amino acid residues. The plots reveal two main secondary structures that are favourable in most polypeptides. These elements, alpha helices and beta pleated sheets, are formed through interactions between the backbones of different amino acids, primarily hydrogen bonding (Branden and Tooze, 2009).
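
As a rough illustration of how a Ramachandran plot is read, the Python sketch below labels a pair of backbone torsion angles (phi, psi, in degrees) by its proximity to a reference point for each structure. The reference angles are approximate textbook values and the 40 degree tolerance is an arbitrary assumption; the allowed regions on a real plot are irregular in shape, so this is not a substitute for a proper structure-assignment method.

```python
import math

# Approximate (phi, psi) centres in degrees; assumed values for illustration.
REFERENCE_ANGLES = {
    "alpha helix": (-60.0, -45.0),
    "beta sheet": (-120.0, 130.0),
}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify_torsion(phi: float, psi: float, tolerance: float = 40.0) -> str:
    """Return the closest reference label within `tolerance`, else 'other'."""
    best_label, best_distance = "other", tolerance
    for label, (ref_phi, ref_psi) in REFERENCE_ANGLES.items():
        # Distance on the plot, allowing for wrap-around at +/-180 degrees.
        d = math.hypot(angular_difference(phi, ref_phi),
                       angular_difference(psi, ref_psi))
        if d <= best_distance:
            best_label, best_distance = label, d
    return best_label
```

For example, `classify_torsion(-57, -47)` falls close to the helical reference point, while a pair such as `(60, 60)` is far from both references and is labelled "other".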

The alpha helix is formed through hydrogen bonding between the carbonyl oxygen of residue n and the amide hydrogen of residue n+4. The side chain of each residue extends away from the helix; because side chains can clash sterically, some amino acids are more favourable in alpha helices than others (Branden and Tooze, 2009).
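
The n to n+4 pattern can be made concrete with a small Python sketch that lists the bonded residue pairs in an idealized helix. It assumes every residue in the segment is helical and numbers residues from 1; the function name `helix_hbond_pairs` is hypothetical.

```python
from typing import List, Tuple


def helix_hbond_pairs(n_residues: int) -> List[Tuple[int, int]]:
    """List (n, n + 4) residue pairs for an ideal helix of n_residues.

    Residues are numbered from 1. Each pair stands for the backbone C=O of
    residue n bonded to the backbone N-H of residue n + 4.
    """
    return [(n, n + 4) for n in range(1, n_residues - 3)]
```

A segment shorter than five residues yields no pairs, since there is no residue four positions further along the chain to bond with.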

Hydrogen bonding also occurs in beta pleated sheets. However, since the interactions occur between different strands, there is no fixed spacing in the sequence between the bonded residues. Finally, it should be noted that not all residues form secondary structures. Many proteins have loop regions, which connect secondary structure elements but do not have a defined shape. Furthermore, some proteins are intrinsically disordered, meaning that they do not have a defined structure (Branden and Tooze, 2009).

Putting the Structure Together

Once the secondary structural elements have been formed, they can be grouped together into motifs, domains, globular proteins, and protein complexes. These can be referred to as supersecondary, tertiary, and quaternary protein structures.

Motifs are created by combining secondary structures. Although they appear commonly throughout the proteome, motifs cannot fold independently. Instead, they must be incorporated into domains. A domain is the smallest unit of protein organization that can fold independently, and domains can be combined to form larger proteins. Thus, proteins can be thought of as modular. The coordination of domain folding, side chain interactions, and metal ion binding creates the tertiary structure of a protein. In many cases, this is the final step of protein folding. However, other proteins go on to form complexes with separate chains, which is considered the quaternary structure, and may also undergo post-translational modifications or tagging (Branden and Tooze, 2009).

Several methods exist to study these structures. The three most widely used are X-ray crystallography, NMR spectroscopy, and electron microscopy. These methods are used in different contexts; X-ray crystallography is good for precise mapping, NMR spectroscopy captures different conformations, and electron microscopy is good at capturing large complexes (Prof. JS Lecture Slides). Despite their differences, all three can produce atomic coordinates, which are used to construct a protein model. These structures can be used to train machine learning models such as AlphaFold, which can predict protein structure computationally. This has enabled structural predictions for entire proteomes (Prof. JS Lecture Slides).
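
To show what atomic coordinates look like as data, the Python sketch below stores each atom as a named point in three-dimensional space and measures the distance between two atoms. The `Atom` class, its field names, and the use of ångströms are assumptions made for illustration, not the format used by any particular method or database.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One atom of a model: a label plus x, y, z coordinates (assumed Å)."""
    name: str
    residue_number: int
    x: float
    y: float
    z: float


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the coordinate units."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
```

A full model is then simply a long list of such atoms, and geometric properties such as bond lengths or contacts between residues can be computed from the coordinates.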

Conclusion

Our understanding of protein structure has expanded rapidly over the past few years. This is the product of new techniques, theories, and experiments. Through these methods, the study of each level of protein structure (primary, secondary, tertiary, and quaternary) has opened new areas of structural biology. We can now map protein function to specific structural features, and vice versa. This has enabled numerous advances in medicine, particularly through understanding drug-target interactions. By advancing our knowledge of how proteins fold, change, and interact, we can hopefully go much further.

References

Branden, Carl, and John Tooze. Introduction to Protein Structure. New York, NY, Garland Pub., 2009.

Voet, Donald, and Judith G. Voet. Biochemistry. 4th ed., Hoboken, NJ, John Wiley and Sons, 2011.

Professor Jason Schnell’s Lecture Slides