A scientist's take on the Game of Kings

Friday, September 28, 2012

Molecular Biology Primer: The Central Dogma

The science of molecular biology is found in the latest cancer treatments, behind the development of antiviral and antibiotic drugs, and even in today's superhero movies (where the hero's DNA is altered through a lab experiment, spider bite, etc.). But just what the heck is molecular biology? Do you need a degree in biology or chemistry to understand and appreciate what occurs on the smallest scale in every living organism?

This is part of a primer written for readers with very little knowledge of biology. I have compiled this brief explanation of biological information, as well as some resources for further studies, so that such readers will still be able to access and digest any future posts with commentary on scientific topics.

In the rest of the post, I hope to accomplish a few things. First, through a musical analogy of the journey from sheet music to sound, you will learn the so-called 'central dogma' of molecular biology, the flow of information which is at the very heart of life. This will lead into an explanation of how this information is ultimately expressed in the form of protein function, how this expression is controlled, and some of the consequences of protein function.

My depiction of the central dogma and the flow of information from genetic material to protein function. For the aficionado, I attempted to make this as accurate as possible: the mRNA sequence translates to the protein shown, and the protein is folded in order to shield the hydrophobic residues A, V, and L.

In a future post, I will describe in more detail some of the methods researchers use to study molecular biology. As a teaser, I highlight here some typical questions posed by molecular biologists. (Note: these questions are probably more typical of academic scientists. This is another product of that axiom, 'write what you know'.) Finally, I will provide some resources for further study (in particular, check out the videos from the Dolan DNA Learning Center. They are really cool!).

The Flow of Musical Information

The composition and playing of music is a succinct analogy for the flow and expression of information in a biological system. No analogy is perfect, and the reader must keep in mind that molecular biology is a complex science with many nuances that are beyond the scope of this article, which only touches upon the core ideas.

Take any composition by any artist; in this example, we can use sheet music for multiple instruments in a piece by Beethoven.

The original documents that were composed by Beethoven contain the information which will eventually produce music. However, in order for more than one person to play at a time, and to help preserve the integrity of the original, copies are made and distributed to musicians. After all, while being used in a performance, the sheet music may be damaged or marked.

The sheet music contains information in a coded form. In the case of the piano, the code is the position of the notes on the staff. The sheet music alone is incapable of producing music, and to somebody who cannot read it, it is incomprehensible: the untrained eye cannot determine the sound that it will generate.

The sheet music can be copied, or transcribed, onto new sheets of paper, as mentioned earlier. However, a trained musician and an instrument are required to bring the sheet music to life and produce sound. The musician must translate the code of note character, position, and spacing on a staff into the code of a specific sequence and timing of keystrokes on a piano (or any other instrument: for a guitar, it is not keystrokes but rather finger positioning).

This sequence of keystrokes is closer to the sound of the music than is the sheet music, but the keystrokes alone are not sufficient for the production of the sound. The keystrokes determine how the instrument will behave, and this finally produces the music. The sound depends on the information applied by the musician, on the accuracy of his or her execution of this task, and on the state of the instrument (for example, how well it is tuned).

To summarize, the sound that is finally produced was originally specified by the sheet music, which was transcribed so that it could be translated by a musician into keystrokes on a piano, which in turn were a major determinant of the sound the instrument made. The sound is the ultimate function of the sheet music; it is the expression of the information contained within it. The composition itself has little significance until it is turned into music that an audience can enjoy.

The Flow of Biological Information

All of the steps above can serve as an analogy for what is known as the central dogma in molecular biology. This is the flow of information from DNA, to RNA, to Protein. Whereas the explanation of how music is generated will be readily accessible to most people, some readers may not be familiar with the nature of these three types of molecules (or may have forgotten whatever bit about them was learned in high school). 

DNA, RNA and Protein are all classified as macromolecules or polymers. Jargon aside, this simply means that they are long strings of connected building blocks put together in a particular sequence. Other strings of simple characters or entities put into a longer sequence include letters in the words of the English language, the notes in a piece of sheet music, or even the individual keystrokes that comprise a musical performance. DNA and RNA are chemically similar, and are both built from a simple alphabet or code of 4 chemical building blocks known as nucleotides (they share three of the same nucleotides, which are abbreviated A, C, and G; they differ in the last character: a T in DNA is equivalent to a U in RNA. The distinction between these chemicals is outside the scope of this article). Protein molecules are built from a larger alphabet of twenty different chemical building blocks known as amino acids.

DNA is similar to Beethoven's original manuscript of a composition. It is the genetic material, and the information contained within is preserved for future generations (it is copied into new molecules with high fidelity and accuracy, just as care would be taken when making a copy of an original Beethoven work). All living organisms (viruses, which are arguably not living, are the exception, since some of them use RNA instead) utilize DNA to store information for all the tasks that occur during their lifetime. A long DNA molecule packaged in the cell is usually referred to as a chromosome, and is similar to an entire book of sheet music, containing parts for multiple different instruments to be played together.

The information in DNA is transcribed into RNA before it is utilized, much in the same way that the original copy of a Beethoven work (or any other master copy) is not provided directly to the musician. The information in DNA (the sequence of A, C, G, and T) is copied into RNA, in the same sequence (substituting U for T). I'll come back to how this is actually accomplished later (through the actions of something called RNA polymerase), but for now you can imagine a molecular-sized Xerox machine.
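For readers who like to see ideas as code, the copying step can be sketched in a few lines of Python. This is only an illustration of the simplified picture used in this primer (same sequence, U in place of T); the function name `transcribe` and the example sequence are my own inventions, and the real enzyme works by reading the complementary strand of the DNA double helix, a detail ignored here.

```python
DNA_ALPHABET = set("ACGT")


def transcribe(dna: str) -> str:
    """Copy a DNA sequence into RNA letter by letter, writing U wherever DNA has T."""
    dna = dna.upper()
    unexpected = set(dna) - DNA_ALPHABET
    if unexpected:
        # Refuse anything that is not written in the four-letter DNA alphabet.
        raise ValueError(f"not a DNA sequence: unexpected characters {sorted(unexpected)}")
    return dna.replace("T", "U")


print(transcribe("ATGGCCGTTCTGTAA"))  # AUGGCCGUUCUGUAA
```

Notice that nothing about the message changes in this step: like a photocopy of sheet music, the RNA carries exactly the same information in almost exactly the same alphabet.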

The information and code in RNA, like the notes in sheet music, do not serve any function by themselves (there are some exceptions to this that I will ignore here). This information needs to be translated into another code. Notes on the staff became keystrokes due to the actions of a musician, and the code in RNA is translated through a complex process into the sequence of amino acids in a protein.

This process of RNA-directed protein synthesis, aptly dubbed translation, is central to all life. Just as sheet music is almost useless without a musician and an instrument, so too is the genetic material in DNA and RNA without function unless it can be translated into proteins. Proteins are, in some ways, both the keystrokes and the sound at once. They are the music of life.
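The change of code can also be sketched in Python. The RNA message is read three letters at a time, and each three-letter word (a codon) stands for one amino acid or for "stop". The lookup table below is the standard genetic code written in a compact form; the names `CODON_TABLE` and `translate` and the example message are hypothetical choices of mine, and the sketch assumes the message is already lined up so that reading starts at its first letter.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: one amino acid letter per codon, listed in UCAG order
# (UUU, UUC, UUA, UUG, UCU, ...). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read an RNA message three letters at a time and return the protein sequence."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break  # a stop codon ends the protein
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCGUUCUGUAA"))  # MAVL
```

Here fifteen RNA letters become a chain of just four amino acids (M, A, V, L), because the final codon, UAA, means "stop". Unlike the copying step, this is a genuine change of alphabet, from 4 letters to 20.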


Piano keystrokes only produce music because of the way they affect the instrument. The keystrokes of a protein molecule, that is, its amino acid sequence, exert their effect by directing the folding of this string of building blocks into a specific three-dimensional shape (see the diagram of the central dogma at the top of this article for a better idea of what this means). This shape will allow the protein to carry out some function or serve a specific purpose. The variety of functions carried out by these proteins is vast. For example, hemoglobin carries oxygen in your blood, antibodies specifically recognize other proteins and molecules associated with infection, and keratin provides support in skin and hair. Proteins in your nerve cells are responsible for relaying the charges between different neurons. Proteins are also capable of catalyzing chemical reactions; such proteins are usually referred to as enzymes. Enzymes are responsible for almost all of metabolism, and some are even responsible for carrying out the transfer of information outlined above.

A model representative of a particular group of proteins involved in inflammation (among other things). Notice that they form a definite shape, and it is this shape which in part allows them to carry out their function. From C-reactive protein in the spotlight @

The importance of proteins cannot be overstated. It is not taking that great a liberty to generalize and say that anything of consequence that occurs in life (in your body, or in any other organism) is happening because of particular proteins carrying out their function (or failing to do so properly, in the case of disease).

Algorithms in Chess and in Cells

The scenarios thus far, both musical and biological, have been relatively simple. One composition directed a single musician to produce a solitary melody and sound, just as one portion of the genetic code resulted in a single protein with one function. However, the complexity of life for all organisms demands a much greater set of instructions. In fact, what is required for even the simplest organisms is an orchestra vastly bigger than any group that has ever played (to my knowledge). To produce harmonious sound, such an orchestra requires one or more conductors to ensure that particular musicians play only at the appropriate time. 

The molecular orchestra that plays in every living cell also requires selective performance from separate musicians. The control of when certain proteins are created, how long they are around, and to what extent they carry out their function (you can think of this as playing the same melody either softly or loudly) is collectively known as the regulation of gene expression. Various control mechanisms (a mind-boggling number, at times) exist within a cell to properly regulate the expression of particular proteins, ensuring that the 'sound' they produce is played at the right moment.

Even with access to all the sheet music for individual parts, it is very difficult to properly imagine or predict what all the instruments will sound like when played together. This difficulty grows with the size of the orchestra; for the workings of the cell, or even a whole organism, it is not possible to model or predict the resulting sound with great precision if all you are provided with is the sheet music (That is to say, you cannot predict to the finest detail the inner workings of the cell if all you are provided is the genetic material in DNA). 

This inability to determine the inner workings of the cell using just the genetic information, however, is not only a technological limitation that scientists face (a limit and boundary which is challenged every day by scientists worldwide). The almost impossible complexity of life seems paradoxical in the face of the relatively limited amount of information stored in the genetic material. For example, the lowly bacterium E. coli contains several million bases of DNA, encoding several thousand proteins. But a complete physical description of even a single cell of E. coli is far more complex than the information which specifies how it is composed.
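To put a rough number on "relatively limited", here is a back-of-the-envelope calculation in Python. It assumes a genome of about 4.6 million bases (roughly the size of a common laboratory strain of E. coli; this figure is my addition, not a measurement made for this post) and treats each base as a choice among four letters, which is worth two bits. The function name `genome_megabytes` is made up for the illustration.

```python
import math


def genome_megabytes(n_bases: int) -> float:
    """Size of a DNA sequence in megabytes, counting each base as one of four letters."""
    bits_per_base = math.log2(4)  # four possible letters = 2 bits each
    total_bits = n_bases * bits_per_base
    return total_bits / 8 / 1_000_000  # bits -> bytes -> megabytes


# Assumed genome size: about 4.6 million bases.
print(f"{genome_megabytes(4_600_000):.2f} MB")  # 1.15 MB
```

By this crude measure the entire instruction set fits in about a megabyte, less than a typical digital photograph, which is exactly why the richness of the living cell seems so out of proportion to it.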

To address this problem, I encourage the reader to temporarily forget the musical analogy which I have woven into this primer. Two other analogies are helpful when considering the level of determination in this biological information (that is to say, how much of the resulting product is exactly described in the original information).

From Looks delicious!

The first analogy is that of the relationship between a recipe and a cake (this analogy is not my own; I think I read it in Matt Ridley's Genome, but cannot be sure). The recipe, which contains the information, does not have a physical description of the resulting cake, but rather is a set of ingredients and steps that are needed to produce the cake. Biological information acts in a similar manner. The sequence, and to some extent even the shape, of the various proteins in a cell are specified in the genetic information; these are both the ingredients and the steps in the recipe. The cake in a biological system is the cell, and it results from the individual components (the proteins) interacting with one another and with the environment.

Logo of the Fritz chess software from Chessbase. Image from

Another analogy is that of a computer chess program. The evaluation function and algorithms that comprise the program are similar to the genetic instructions and the resulting protein sequences. Knowing the evaluation function does not, by itself, allow you to predict the course and outcome of a chess game played by the program, since the moves it selects are determined both by its own algorithms and evaluations and by the move choices of its opponent. The opponent provides the context in which the programming and algorithms of chess-playing software are expressed. In much the same way, the environment (the surroundings both inside and outside the cell) provides the context in which the genetic information in DNA is expressed.

Putting the molecular orchestra under the microscope

Molecular biologists, and many other researchers in related fields (biochemistry, structural biology, etc.), are usually trying to answer questions relating to the creation, function, and elimination of proteins that are implicated in some important process (or disease).

These questions include (but of course are not limited to):

  1. What protein is responsible for a particular activity or function in the cell? (Which part of the sheet music corresponds to a particular sound heard?)
  2. When is a particular protein made, and what governs the decision of whether or not the cell will make or destroy that protein? (When is that particular instrument played during the piece?)
  3. What functions or importance does a particular protein have in the cell? (How does the protein 'sound' when played, and what parts of its sequence are important for the overall 'melody'?)*
  4. What is the structure of the protein, and how does this allow it to carry out its task?

*There are several ways in which you can either predict or confirm the presence of a particular protein without knowing what function it will carry out. The presence of the protein can be predicted based upon features in the DNA sequence (which can be determined in the laboratory; a vast amount of sequence data has already been determined for a variety of living things and is available online). Predicting the shape of a protein from this information alone is, at time of writing, still a very difficult task. Predicting how the shape and function of a single protein will interact and integrate into a system with thousands of other proteins, many of which are also of unknown shape, is nearly impossible (or at least, at time of writing, unattainable except as a very crude approximation).

Molecular machines serving information flow

The flow of biological information from DNA to RNA to protein requires both transcription (copying) and translation from one type of code to another. Since the cell does not have nanoscale Xerox machines and pianos, it accomplishes this feat through the use of another molecular machine already familiar to the reader by now: proteins. A particular protein (or group of proteins, depending on the organism), known as RNA polymerase, has a sequence and a shape which endow it with the ability to 'read' the information in DNA and build a replica in RNA (DNA is duplicated for use in future generations by another enzyme, aptly named DNA polymerase). The translation of one code into another is a more complex process, and depends on a more complex set of machines. These include both proteins and some specialized RNA molecules which also adopt specific three-dimensional shapes. This process is centered around the ribosome, which is a scaffold on which transfer RNAs match the code in the RNA being translated with specific amino acids.

Of course, disruption of or defects in the molecular machines that are players in the flow of information in the cell will have many serious consequences. One bad note, or one unpleasant song, may not ruin an entire performance, but if the instrument that is responsible for playing most or every piece is defective, there will be much more noticeable defects in the resulting sound.

The astute reader may have noticed a type of chicken-and-egg, or bootstrapping, problem posed by this scheme: how can proteins and RNA be responsible for the synthesis of proteins and RNA? There have been many attempts to answer this question, and they have important ramifications in evolutionary biology, but this is outside the scope of this article.

I welcome both questions from the reader, if there is anything that I might have missed in this introduction, as well as comments. Are there analogies that you prefer when describing the central dogma of molecular biology? Please share them!

Further Reading

Wikibook introduction to Molecular Biology

A more comprehensive and more accurate introduction to molecular biology than the one penned here. Of course, these features also mean it is less colorful and concise than my primer.

Wikipedia entry on central dogma

Brief introduction to the central dogma of molecular biology, the flow of information from DNA to RNA to Protein. Also contains a bit of history on the phrase 'central dogma' used in this context.

A molecular biology glossary

A short glossary of molecular biology terms, with some illustrations

Glossary of Biochemistry and Molecular Biology

A longer glossary of terms

...saving the best for last!

The Dolan DNA Learning Center

Cartoon and 3D Animations
This is a great resource, which contains very detailed and fairly accurate representations of the processes of transcription and translation, as well as a treasure trove of other informative videos. I highly recommend that everybody check these videos out. Note: to the neophyte, I recommend first viewing the simple versions where possible. The advanced versions are much more accessible once the simple versions have been viewed.

3D Animation describing the central dogma

3D animation of transcription, simple version.

3D animation of translation, simple version.

(The results may surprise you...)
