A scientist's take on the Game of Kings

Wednesday, October 3, 2012

Molecular Biology Primer: Obtaining and Interpreting Data

As mentioned in the previous post in this series, molecular biologists are primarily concerned with studying the expression, structure, and function of proteins in the cell (sometimes this necessitates the study of the genetic material that encodes the protein, particularly when the protein's expression is in question).

How do scientists go about answering these questions? What are their tools of the trade, how are they used, and what are their limitations? Here I explore some of the broad limitations common to many techniques used in molecular biology. If you'd like to learn more about some of the more common techniques, I have provided short descriptions and ample links for more information at the end of the article.

I'd also like to hear from you! What experimental limitations have I omitted or underestimated? If you know of some cool new techniques, please help me grow the list I have compiled by adding a comment.

Detective Work

Conducting research in molecular biology is a bit like doing detective work. When investigating a crime, a detective may need to consider many potential suspects, victims and witnesses. He or she may be able to infer the identity or presence of some of these people only from certain clues. In much the same way, the biologist can often identify and measure particular proteins or RNA molecules only through indirect methods or with the use of specialized equipment. Sometimes these molecules can be analyzed only through the clues they leave behind: traces of their existence. For example, enzymes are sometimes detected or studied based upon their activity, by measuring the products of the reactions they catalyze. Finally, the problems of studying a particular suspect or molecule are compounded by the thousands of other types of proteins and RNA that exist in the cell.

It is possible to study particular proteins in isolation in what are known as in vitro experiments. However, these usually do not yield as much relevant information as experiments done in vivo, on living samples. The conditions that a protein molecule faces in these two situations are vastly different: some proteins require certain amounts of salt and other molecules, or other proteins to interact with, before they function properly.

There is a wide variety of techniques used to study nucleic acids and proteins, too many to be covered here in great detail. I have provided a set of links for some popular, important, or useful techniques for the interested reader to pursue. For all of these techniques, from structure determination through crystallization to subcellular localization with immunofluorescence, researchers must (or at least should) understand their limitations, so as to avoid overinterpreting the data.

The same points that are useful for the scientist to keep in mind are also useful for the informed reader who is trying to interpret a scientific publication. Below, I expand upon several caveats common to any experiment, namely the problems of manipulation and of indirect measurements, which can result in what may be only a portion of a larger picture. It is not my intention to downplay the power of science in generating useful knowledge. Rather, I only encourage the reader or budding scientist to keep an open yet critical mind when considering the meaning of data obtained from experiments in molecular biology.

Many of the techniques involve indirect measurements.

Except in rare cases, most of what occurs in the cell cannot be directly visualized by the researcher. Scientists must instead rely on the readout from (usually) sophisticated instruments. Therefore, proper controls (samples that should give a known result if the experiment is performed correctly) and instrument calibration are essential. Otherwise, the results obtained may have more to do with the method of measurement or a glitch in the instrument than with the true interactions in the cell. This is true even when the equipment used is rather simple, or when the technology relied upon is not mechanical but rather a biochemical readout. An example of the latter is a western blot, described in a section to follow, in which the presence of one protein is detected indirectly through a series of other proteins previously manipulated by the researcher.

As an alternative to specialized equipment or methods (which can be prohibitively expensive), data can occasionally be obtained by measuring certain behaviors of a cell: how fast it grows, what shape and color it adopts, and so on. This is similar to what is depicted in the above photograph; there, the quantity of interest (the height of the pole) is inferred through measurement of something else.

Even if the cell behavior being measured is thought to be entirely dependent on the activity of a single protein, this protein does not act in isolation but rather in the context of a complex network of interactions within the crowded environment inside the cell. There may be many intermediate steps between the activity of the protein of interest and the measurable phenotype presented by the cell. Therefore, you should always consider alternative explanations in which the protein is only indirectly responsible for the results observed.

Manipulation of the system may alter the result or render the result irrelevant. 

In order to study a complex molecular process, scientists will sometimes manipulate the experimental system to make their task easier. An extreme example of this is when experiments are conducted in vitro, with isolated components and proteins mixed in a test tube. As mentioned above, this often will diminish the relevance of the finding, unless the in vitro findings are corroborated with similar results from an in vivo experiment. 

In more subtle cases, scientists alter the expression, sequence, or structure of a protein, even in living tissue. Again, care must be taken during experimental design to prevent this from producing misleading results. Readers should be critical of experiments that involve too much manipulation or simplification of the conditions, as it is likely that the results do not accurately reflect what occurs in real situations.

Even when scientists endeavor not to manipulate the system they are studying, the process of taking the measurement may itself influence the results obtained. Trying to observe the life of an ant under a magnifying glass is pointless if the act of observation results in the insect's death, as is the case in the above photograph.

Consider a more pertinent example: a researcher wants to determine which accessory proteins bind the ribosome, the main complex responsible for translation in all cells. While the cells used in a study may be grown in a natural, unaltered state, the conditions they experience during release and purification of the ribosome may cause some accessory proteins to escape the researcher's detection. On top of this, proteins that normally do not participate in meaningful interactions with the ribosome can produce false positives if the conditions of lysis and purification enable them to associate with the ribosome.

In order to correctly interpret any experiment, therefore, a knowledge of biochemistry and molecular biology must be harnessed to weigh the possibility of various alternative explanations. The most likely of these alternatives should then be addressed through additional experiments, in order to either validate or dismiss the original findings.

Snapshots only capture a part of the truth, and correct explanations may not be comprehensive. 

Certain techniques, while informative, may only provide a small window into the activity of a protein inside the cell. Certainly, as already mentioned, in vitro experiments may not reflect the true behavior of a protein in the cell. However, results even from in vivo experiments may only capture a part of the truth behind the function or characteristics of a protein. 

Cells are dynamic, and therefore measurements taken only under certain conditions, or only at certain times during the life of a cell, may not give a complete picture of the importance of a particular protein. Even when cells in a variety of states are measured together and averaged, phenomena that occur in a minority of cases may be lost. Examples include persister cells in E. coli and stem cells in higher organisms. These cells may make up only a small fraction of a population, but they behave differently, with different profiles of gene expression.

A vivid example of why averaging could pose a problem is shown in the picture of four different faces, averaged together (See figure above. This was generated from the Face Research website). The features of each face that are distinct are largely lost in the average.
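The same loss of information shows up with plain numbers. Below is a minimal sketch using made-up expression values (not measurements): a hypothetical population in which 5% of cells express a gene strongly and the rest barely express it at all. The population average describes none of the cells.

```python
import statistics

# Hypothetical expression levels (arbitrary units), invented for illustration:
# 95 "typical" cells with low expression and 5 rare cells with high expression.
typical = [1.0] * 95
rare = [100.0] * 5
population = typical + rare

mean_level = statistics.mean(population)
print(f"population mean: {mean_level:.2f}")

# No individual cell has an expression level anywhere near the mean.
closest = min(abs(x - mean_level) for x in population)
print(f"closest any cell gets to the mean: {closest:.2f}")
```

A rare subpopulation, like the persister cells mentioned above, is invisible in the single averaged number; it only shows up when individual cells, or the full distribution of values, are examined.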

The dynamics of a cell are driven by the immensely complex network of proteins that interact with each other and catalyze a wide variety of reactions. These proteins themselves sometimes exhibit dynamic structures, and since structure in part determines function, this further compounds the problem of understanding what they do. Scientists have a variety of methods for studying protein structure, and can generate atomic-level models of the way in which the primary sequence of amino acids in a protein folds into a three-dimensional shape. These models, beyond being plagued by questions of relevance (since they are sometimes generated in vitro, with the protein in isolation), also do not rule out the possibility that alternative structures are adopted during the course of a protein's lifetime.

Whereas measuring the average of a population of cells or proteins has the potential to lose some information, taking snapshots of single protein structures also suffers from drawbacks. A single snapshot of either a face or a protein structure does not inform a scientist of the possible alternatives. If the only image of a cat you ever observed was the one pictured above, you might be very surprised to learn that these animals can move and take on many different poses and shapes.

When studying proteins or any other aspect of molecular biology it is necessary to get as many different pictures of your subject as possible. Ideally, through the efforts of one or more researchers, multiple averages and snapshots will be obtained for the activity and structure of a single protein.

To any aspiring molecular biologist or enthusiast, when you are designing experiments or deciphering the resulting data just remember that you might be looking at a shadow, burning ants, losing your facial features, or getting a cat wet!

Resources for Further Reading

Genetic Manipulation (DNA)

These links involve techniques used to study or otherwise manipulate genetic material, i.e. DNA. By modifying the DNA of a gene, researchers can effectively modify the RNA and proteins that result from expression of that gene.

Restriction Enzymes and Cloning

Restriction enzymes are proteins that cut DNA molecules at specific sequences. Together with DNA ligase (which catalyzes the opposite reaction, joining DNA fragments), they enable researchers to cut and paste different sequences of DNA together.

  • The Dolan DNA Learning Center at Cold Spring Harbor Laboratory has a great animation on Cloning as well as a 3D animation of the actual action of a restriction enzyme and DNA Ligase
  • Wikipedia entry on Restriction Enzymes and on Ligation
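To make "cutting at specific sequences" concrete, here is a minimal sketch of a digest of one DNA strand. It uses EcoRI, which recognizes GAATTC and cuts between the G and the first A (G^AATTC). The input sequence is hypothetical, and the sketch deliberately ignores the complementary strand and the single-stranded "sticky" overhangs that EcoRI leaves, which are what make pasting with ligase efficient.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut one strand of DNA wherever `site` occurs.

    cut_offset is the position within the site where the enzyme cuts,
    e.g. 1 for EcoRI, which cuts G^AATTC.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Slice the sequence between consecutive cut points.
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical sequence containing two EcoRI sites (GAATTC).
dna = "ATGGAATTCCGTTAGAATTCAA"
print(digest(dna, "GAATTC", 1))  # ['ATGG', 'AATTCCGTTAG', 'AATTCAA']
```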

Electrophoresis (Nucleic Acids)

Electrophoresis is a technique that separates nucleic acid molecules based upon size. It accomplishes this by forcing the molecules (via electric current) to move through an entangled molecular maze made from a gelatinous substance, usually agarose. Smaller molecules navigate the maze more easily and so travel farther in a given time.

  • Wikipedia entry on Gel Electrophoresis in general and with nucleic acids in particular
  • Another excellent animation from the CSHL Dolan DNA Learning Center
  • A very detailed and straightforward video on how gels are made and run. A must for any budding molecular biologist or neophyte!
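Because smaller fragments travel farther, the distance a band migrates can be used to estimate its size. A common approach (not described in this post) is to run a "ladder" of fragments of known size alongside the samples and fit a straight line to log10(size) versus migration distance. That relationship is only approximately linear over a limited range. The ladder values below are invented so that the fit is exact.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical ladder: band sizes (base pairs) and how far each migrated (cm).
ladder_bp = [10000, 1000, 100]
ladder_cm = [2.0, 4.0, 6.0]

slope, intercept = fit_line(ladder_cm, [math.log10(bp) for bp in ladder_bp])

def estimate_size(distance_cm):
    """Estimated size (bp) of a band from its migration distance."""
    return 10 ** (slope * distance_cm + intercept)

print(round(estimate_size(5.0)))  # an unknown band that ran 5 cm: ~316 bp
```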

Polymerase Chain Reaction

PCR, or polymerase chain reaction, is a process which can be used to copy a specific portion of a DNA molecule. It can be adapted to a wide variety of uses, and is a major workhorse of any molecular biology laboratory.
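Each PCR cycle ideally doubles the number of copies of the targeted region, so the amount of product grows exponentially. This sketch assumes a constant per-cycle efficiency, which is a simplification: real reactions slow down as reagents are used up.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of copies after a number of PCR cycles.

    efficiency is the fraction of templates copied per cycle
    (1.0 means perfect doubling every cycle).
    """
    return initial_copies * (1 + efficiency) ** cycles

print(pcr_copies(1, 30))       # perfect doubling: 2**30 = 1073741824.0
print(pcr_copies(1, 30, 0.9))  # a less efficient reaction yields fewer copies
```

Even a single starting molecule can, in principle, yield about a billion copies after 30 cycles, which is why PCR is so sensitive (and so vulnerable to contamination).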

DNA Sequencing

This is the main method by which researchers determine the sequence of a DNA molecule. Chain-termination technology (the Sanger method) was used to sequence the human genome, although newer 'next generation' sequencing technologies are orders of magnitude more powerful.
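In the Sanger method, DNA is copied in the presence of small amounts of labeled "terminator" nucleotides (dideoxynucleotides). This produces a collection of fragments that each stop at a different position and carry a label identifying their last base. Separating those fragments by size then lets the sequence be read off in order. The toy simulation below treats the newly synthesized strand as given and skips the chemistry and strand orientation; it only shows why sorting by length recovers the sequence.

```python
import random

def chain_termination_fragments(new_strand: str) -> list[tuple[int, str]]:
    """Every fragment that can end in a labeled terminator nucleotide.

    Each fragment is represented as (length, label of its final base).
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]

def read_sequence(fragments: list[tuple[int, str]]) -> str:
    # Electrophoresis sorts fragments by length; reading the label of each
    # successive band recovers the sequence one base at a time.
    return "".join(label for _, label in sorted(fragments))

strand = "GATTACA"  # hypothetical newly synthesized strand
fragments = chain_termination_fragments(strand)
random.shuffle(fragments)        # fragments come out of the reaction unordered
print(read_sequence(fragments))  # prints GATTACA
```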

Transformation, Electroporation and Transfection

What these three procedures have in common is their ultimate goal: to introduce foreign, often manipulated, nucleic acids into a living cell. Each technique makes cells permeable to these materials; most cells do not normally take up large molecules of DNA or RNA. Transformation is a term normally reserved for the introduction of foreign DNA into bacterial cells, while transfection is more often used for the analogous procedure in higher organisms (for example, mouse and human cells). Electroporation can be used toward this end in both cases.

  • Wikipedia entry on transformation, covering both bacteria and higher organisms
  • The Dolan animation covering steps of cloning, culminating in transformation of bacteria with the manipulated DNA
  • An animation covering many steps of the cloning process. Click through to reach a stage in which electroporation is used to transform bacteria with the plasmid DNA

Identifying and Measuring RNA

Measuring RNA levels can serve various purposes. It reports on the expression of a gene, and is weakly correlated with the amount of protein that is produced from the code in the RNA. Some RNAs also have functions of their own, adopting three-dimensional shapes similar to proteins.

Northern Blot

This method involves detecting a specific RNA (usually after electrophoresis) with a labeled probe that recognizes its target through sequence complementarity.

  • Wikipedia entry on northern blots
  • A video which shows the progression of RNA from inside a bacterial cell, to electrophoresis, and finally to being subjected to northern blot analysis. Skip ahead to ~1:10 to see the concept behind the blotting
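The idea of a probe recognizing its target "through complementarity" can be sketched in a few lines. This assumes an RNA probe (so U rather than T) and checks only for a perfect match; real hybridization tolerates some mismatches and depends strongly on temperature and salt. Both sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (5'->3' in, 5'->3' out)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))

def probe_binds(probe: str, target: str) -> bool:
    """True if the probe is perfectly complementary to a stretch of the target.

    The two strands pair antiparallel, so the probe's reverse complement
    must appear in the target sequence.
    """
    return reverse_complement(probe) in target.upper()

target_mrna = "AUGGCUACGUUAGC"  # hypothetical target RNA
probe = "CGUAGC"                # hypothetical probe
print(probe_binds(probe, target_mrna))  # True
```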

Reverse Transcription PCR (RT-PCR)

This adaptation of PCR first converts RNA into DNA using the enzyme reverse transcriptase (so called because it catalyzes the opposite of the reaction carried out by RNA polymerase during transcription). This is followed by PCR, which can be done in a quantitative manner to measure the initial level of the template DNA or, in this case, RNA.
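In quantitative PCR the instrument records the cycle at which the signal crosses a threshold, the threshold cycle (Ct). Because each cycle ideally doubles the product, a sample that starts with twice as much template crosses the threshold about one cycle earlier. One widely used way to turn Ct values into relative RNA levels is the 2^-ΔΔCt method, which normalizes to a reference gene and assumes perfect doubling. This is one common approach, not necessarily what any given lab uses, and the Ct values below are invented.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumes 100% efficiency).

    Each dCt normalizes the gene of interest to a reference gene measured
    in the same sample, correcting for differences in input RNA.
    """
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_treated - dct_control
    return 2 ** (-ddct)

# Hypothetical Ct values: relative to the reference gene, the target crosses
# the threshold 2 cycles earlier in treated cells than in control cells.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0, i.e. ~4-fold more RNA
```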

RNA Structure Determination

Some sequences can cause RNA molecules to fold into three dimensional shapes, which can impart function to the RNA. Therefore, there is a need to both predict and empirically determine the structure of RNA. This is done with a variety of methods, including computational folding and chemical probing.
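The simplest example of computational folding is the Nussinov algorithm, which uses dynamic programming to find the structure with the greatest number of nested base pairs. Practical prediction tools instead minimize a thermodynamic free-energy model, so treat this as an illustration of the idea rather than a usable predictor. The allowed pairs (A-U, G-C, and the G-U "wobble") and the minimum hairpin loop of three unpaired bases are simplifying assumptions.

```python
def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov dynamic programming).

    Allows A-U, G-C and G-U pairs, and requires at least `min_loop`
    unpaired bases inside any hairpin.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: base j is unpaired
            for k in range(i, j - min_loop):    # case 2: base j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

# Hypothetical hairpin: GGG pairs with UCC around an AAA loop.
print(nussinov("GGGAAAUCC"))  # 3
```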

Identifying and Measuring Proteins

Proteins are responsible for most of the action that occurs within a cell, and so most studies in molecular biology culminate in identifying or measuring specific proteins.

Mass Spectrometry

This is a complicated technique with many varieties. The main idea behind mass spectrometry is to measure the mass-to-charge ratio of an ionized protein or protein fragment from the way it moves through an electric or magnetic field. This measurement can be made with such resolution and accuracy that it is often possible to determine the identity and sequence of a protein in this manner.

  • Wikipedia entry for Mass Spectrometry (MS)
  • Wikipedia entry for MALDI-TOF, a type or adaptation of MS analysis.
  • Wikipedia entry for LC-MS, another way in which MS is utilized. This approach usually involves fragmentation of the protein to be studied.
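Because a spectrometer reports mass-to-charge ratio (m/z) rather than mass, a peptide carrying two extra protons appears at roughly half the m/z of the same peptide carrying one. The sketch below predicts the m/z of a small, unmodified peptide from standard monoisotopic residue masses; comparing such predictions with measured peaks is the basic idea behind identifying proteins from MS data. "PEPTIDE" is just a conventional example sequence. Real analysis software also accounts for chemical modifications, isotopes, and fragmentation.

```python
# Monoisotopic residue masses in daltons (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the free N- and C-termini
PROTON = 1.00728   # mass of each proton that supplies a positive charge

def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def mz(sequence: str, charge: int) -> float:
    """Mass-to-charge ratio of the peptide carrying `charge` extra protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge

print(round(mz("PEPTIDE", 1), 3))  # singly charged ion
print(round(mz("PEPTIDE", 2), 3))  # doubly charged ion, about half the m/z
```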

Electrophoresis (SDS-PAGE)

Similar to electrophoresis of DNA, except the molecular maze is usually composed of polyacrylamide, an acrylic-based polymer, rather than agarose (a polysaccharide extracted from seaweed). A few other tricks are usually employed to get the desired result, which is separation of proteins based primarily upon their size: the detergent SDS unfolds the proteins and coats them with negative charge.

Western Blotting

This is another workhorse method in a molecular biology laboratory (the other being PCR). Through this technique, researchers can detect the amount of a specific protein in a complex mixture. This technique relies on antibodies, which are themselves proteins. This amazing class of proteins is produced by the immune system of higher organisms (including ourselves), and has variable regions whose shapes can bind to almost anything, including other proteins. Each antibody has a particular sequence and shape in its variable region, and thus a dedicated target.

Immunofluorescence

This is conceptually similar to a western blot, except proteins are detected not on a membrane but rather within the context of the cell.

Protein Purification

In order to perform more detailed studies on a protein, such as 3D structure determination, it is often necessary for researchers to purify one specific protein out of the many that exist in the cell. There is a wide range of techniques that can be used to achieve this goal, many of which are conceptually similar.

  • A nice video from GE Healthcare that walks through the purification of a single target protein using several different methods in sequence, some of which are described below.

Immobilized Metal Affinity Chromatography

This is a specialized version of the affinity chromatography described below, and relies on altered proteins that contain a sequence of six histidines in a row (a "His tag"). This sequence binds to immobilized nickel ions, giving researchers a way to specifically capture these altered proteins.

Ammonium Sulfate Precipitation

Most proteins adopt stable, soluble shapes only under certain conditions. If there is too much or too little salt in the solution, a protein will no longer be soluble and will instead precipitate out of the solution. Because different proteins precipitate at different salt concentrations, this can be exploited to remove unwanted proteins from a sample.

Ion Exchange and Affinity Chromatography

Both of these techniques rely on the affinity that proteins have either for a specific target (affinity chromatography) or for a charged chemical group (ion exchange). This affinity varies between proteins, so changing the conditions that proteins experience while they are exposed to a target molecule attached to a solid support allows them to be separated based upon the strength of their interaction with the target.

Gel Filtration

This technique involves passing the protein sample through a type of molecular maze not unlike the agarose mesh used in electrophoresis. Although gel filtration does not use an electric current, both techniques effectively separate molecules based upon their size and shape.

Protein Structure Determination Techniques

There are a variety of techniques used to create models of a protein's structure. These models can be very detailed, although they must be taken with a grain of salt; they are just models, and should be used to generate a testable hypothesis as to how a protein carries out its function.

Cryo-Electron Microscopy 

In essence, this technique involves imaging many copies of a particular protein, frozen in a thin layer of ice, under a high-powered electron microscope. The molecules will all be in different orientations, but computational alignment and averaging of the individual images can produce a low-resolution structural model.

  • Wikipedia entry on Cryo-EM
  • A video introduction to Cryo-EM, without sound
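Averaging works because random noise tends to cancel out while the shared signal does not: the error of an average of N independent noisy images falls roughly as 1/sqrt(N). The toy example below uses a made-up one-dimensional "particle" and Gaussian noise. It assumes the images have already been aligned to a common orientation, which in real cryo-EM is itself a major computational step.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A hypothetical 1D "image" of a particle (arbitrary intensity values).
true_signal = [0, 1, 3, 6, 3, 1, 0, 2, 5, 2, 0]

def noisy_copy(signal, noise_sd=5.0):
    """One simulated image of the particle: signal plus Gaussian noise."""
    return [x + random.gauss(0, noise_sd) for x in signal]

def average(images):
    """Pixel-by-pixel average of already-aligned images."""
    return [statistics.mean(pixels) for pixels in zip(*images)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return statistics.mean((e - t) ** 2 for e, t in zip(estimate, truth)) ** 0.5

errors = {}
for n in (1, 10, 100, 1000):
    avg = average([noisy_copy(true_signal) for _ in range(n)])
    errors[n] = rms_error(avg, true_signal)
    print(n, round(errors[n], 2))
```

Each single image is dominated by noise, yet the average of many of them converges toward the underlying shape.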

X-ray Crystallography

The protein of interest is purified and placed into conditions (usually empirically determined) that will cause it to crystallize (pack together in an ordered, repeating array). Each protein molecule in the crystal will adopt a similar shape, and when this crystal is bombarded with X-rays, the resulting diffraction pattern can be used to create a model of the protein's structure, usually with atomic-level resolution. This technique is difficult and involves treating the protein harshly in solution, so the relevance of the structural model to the protein's in vivo activity must be established.

Nuclear Magnetic Resonance 

This is a very complicated technique, which is most succinctly described as an extremely high powered MRI used for examination of proteins at an atomic level (not the most accurate description, but not a bad one with which to start).

Atomic Force Microscopy

This technique is in some ways the atomic scale equivalent of a blind person's walking stick. The shape of a molecule can be determined by the displacement effect it has on a molecular sized arm that passes over it.

Computational Predictions

Due to the expense associated with experimentally determining protein structure, methods have been developed (with moderate success) to generate structural models computationally, starting with the sequence of the protein. In other words, these computational methods try to recapitulate and model the folding process that a particular protein will undergo. Some of the methods are extended to predict the dynamic movements that a protein may exhibit during its lifetime.