Cracking our code: The Human Genome Project

Allison Trouten
December 15, 2021
Luminous DNA double helix on color background. Licensed from istockphoto.com

What makes us human?

Is it our ability to communicate through complex language? Our intelligent and problem-solving minds? Perhaps it is our ability to determine right from wrong?

For decades, scientists have aimed to understand the fundamentals of biology to answer this age-old question. In October of 1990, the scientific community announced an ambitious way to approach it: they would sequence, or read, the entire genetic code (DNA) of humans.

Thus was born the Human Genome Project, one of the most profound scientific endeavors ever attempted. A project this colossal required collaboration between scientists at dozens of labs in the U.S. and around the world and the federal government, which provided funding through the National Institutes of Health and the Department of Energy.

How did scientists even begin this massive undertaking?

First, researchers needed to collect the DNA of volunteers via blood draws. DNA, or deoxyribonucleic acid, is the book of instructions that serves as the manual for our cells to function properly. Sequencing the DNA allows scientists to decode this manual, letter by letter, until the entire book - that is, the full genome - is read.

Sequencing the genome is accomplished by breaking multiple copies of the manual into short strings of letters at random places. Because each copy breaks at different points, the resulting fragments overlap one another. Piece by piece, the book is reassembled by matching those overlaps until the entire book is completed.
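For readers who like to see the idea in code, here is a minimal sketch of reassembling a text from overlapping fragments. It is a toy illustration, not the software the project used: the function names, the 25-letter "genome" and the four fragments are all invented for this example, and it assumes error-free fragments with no repeated stretches.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the remaining pieces ("contigs"); a single item means the
    fragments were joined into one continuous sequence.
    """
    # Drop exact duplicates and fragments fully contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments of the made-up sequence ATGGCGTACGTTAGCCTAGGATCCA,
# listed out of order the way a sequencer might report them.
reads = ["AGCCTAGGA", "ATGGCGTACG", "AGGATCCA", "TACGTTAGCC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGCCTAGGATCCA']
```

Real genomes contain long repeated stretches and sequencing errors, which is why actual assembly needed far more sophisticated methods than this greedy approach.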

The Human Genome Project was completed in April of 2003 and reliably sequenced 99% of the genome with 99.99% accuracy. That means that almost all of the 3 billion base pairs of the human genome were sequenced with almost perfect accuracy. Moreover, the venture far exceeded its goals in terms of resolution, amount of the genome covered, number of genetic variants (SNPs) mapped, and cost – undeniably making it one of the most successful studies attempted.
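To put those percentages in perspective, here is a rough back-of-the-envelope calculation. It assumes a round 3 billion base pairs and reads "99.99% accuracy" as one error per 10,000 letters; the variable names are purely illustrative.

```python
GENOME_BP = 3_000_000_000               # approximate size of the human genome

covered_bp = GENOME_BP * 99 // 100      # 99% of the genome was sequenced
unsequenced_bp = GENOME_BP - covered_bp
expected_errors = covered_bp // 10_000  # 99.99% accuracy = 1 error per 10,000 letters

print(f"{covered_bp:,} letters sequenced")          # 2,970,000,000 letters sequenced
print(f"{unsequenced_bp:,} letters still missing")  # 30,000,000 letters still missing
print(f"{expected_errors:,} letters possibly wrong")  # 297,000 letters possibly wrong
```

Even at near-perfect accuracy, a book this long leaves room for hundreds of thousands of typos, which is one reason the sequence has continued to be refined.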

Interestingly, when scientists began reading the genome, they discovered subtle variations in the basic manual. These differences can lead to positive changes, negative changes, or no changes at all. When such an alteration affects a single letter of the DNA, it is called a single nucleotide polymorphism (SNP), and these variants have huge implications in human disease. Changes within a single gene can be enough to cause illness: cystic fibrosis, for example, is caused by mutations in a single gene – the cystic fibrosis transmembrane conductance regulator (CFTR) gene – which lead to a variety of severe symptoms including mucus overproduction, chronic cough and repeated lung infections.
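In its simplest form, spotting a single-letter variation means comparing a person's sequence to a reference, letter by letter. The sketch below assumes the two sequences are already lined up and equally long; the function name and both sequences are made up for illustration, and real variant detection relies on alignment software and statistics across many reads.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference letter, sample letter) for every mismatch.

    Positions are counted from 1. Both sequences must already be aligned
    and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


reference = "ATGGCGTACGTT"  # hypothetical reference sequence
sample = "ATGGCATACGTT"     # hypothetical volunteer; letter 6 differs

print(find_snps(reference, sample))  # [(6, 'G', 'A')]
```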

In addition to the human genome, the genomes of the mouse, rat and other model organisms were sequenced. Another interesting finding from the project was that the human genome contained a relatively small number of genes, roughly 30,000, compared to previous estimates of over 100,000. Moreover, by comparing these genomes, the project determined that the human genome is very similar to those of other mammals and not as unique as we once thought. In fact, the human and mouse genomes share about 85% sequence identity.
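"Sequence identity" is simply the percentage of positions at which two lined-up sequences carry the same letter. Here is a minimal sketch, assuming the sequences are already aligned and equally long; the function name and the two 20-letter sequences are invented, and were chosen so that they happen to differ at 3 of 20 positions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions where the two sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


species_one = "ATGGCGTACGTTAGCCTAGG"  # hypothetical sequence
species_two = "ATGACGTACGCTAGCCTTGG"  # differs at 3 of the 20 positions

print(percent_identity(species_one, species_two))  # 85.0
```

Comparing whole genomes is much harder than this, because the sequences must first be aligned to account for inserted, deleted and rearranged stretches.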

The contribution of this work to the scientific community and beyond is hard to appraise. The sequencing of the human genome, along with those of key model organisms, has supported more in-depth assessment of these model organisms and how well they can recapitulate human disease. The sequencing and annotation of the human genome have allowed researchers to home in on disease-causing SNPs, thus aiding in the development of possible future treatments. The publicly available consensus sequence was critical for the “omics” explosion – the analysis of large amounts of data regarding DNA, RNA or proteins. Clinically, the availability of a comparative reference sequence allowed clinicians to confirm patient diagnoses for diseases with known gene mutations, as well as identify new mutations associated with disease.

Although the project was officially completed in April 2003, there are continued tweaks, adjustments and revisions underway. The National Human Genome Research Institute (NHGRI) has new questions to answer. First, the human genome consensus sequence is constantly being refined and improved as better sequencing technology becomes available. Moreover, the NHGRI aims to sequence more volunteers to gain a better understanding of variations found in certain populations.

The NHGRI is also constantly developing improved computational methods to analyze enormous sequencing datasets in the most efficient way. Importantly, while the availability of this knowledge has the potential to bring about more scientific breakthroughs, it also has the potential to be misused. NHGRI’s Ethical, Legal and Social Implications Research Program was created to look into the impact of this research beyond the laboratory and ensure that it is appropriately and ethically used.

The Human Genome Project is without a doubt one of the most significant and far-reaching contributions to the scientific community, with applications in virtually every field. With continued efforts to refine and expand this work, the Project will surely continue to improve our understanding of ourselves, and may finally allow us to answer the question “What makes us human?”