More than a thousand researchers throughout the world contributed to the first project to sequence the human genome. Their task, which they successfully completed in 2003, was to decode the DNA from each of 23 pairs of human chromosomes. Two groups undertook this sequencing task. One large, multinational, publicly funded group adopted an approach called hierarchical sequencing. This group's work is better known as the Human Genome Project. The other group, at a privately funded company, took an approach called shotgun sequencing.
This animated tutorial describes the methods used by the two groups, but keep in mind that sequencing technologies have changed considerably since then. The Human Genome Project was relatively slow, expensive, and labor intensive. It took 13 years and $2.7 billion to sequence one genome! In contrast, at the time of this writing, researchers can sequence a human genome in just a few days for several thousand dollars.
The following are just some of the interesting facts that we have learned about the human genome:
Of the 3.2 billion bp in the haploid human genome, an estimated 1.2 percent make up protein-coding regions, corresponding to about 21,000 genes. This was a surprise: before sequencing began, humans were estimated to have 80,000–150,000 genes.
The average gene has 27,000 bp. Gene sizes vary greatly, from about 1,000 bp to 2.4 million bp. Variation in gene size was expected given that human proteins (and RNAs) vary in size, from 100 to about 5,000 amino acids per polypeptide chain.
Virtually all human genes have many introns.
About half of the genome is made up of transposons and other highly repetitive sequences.
When the genomes of two unrelated individuals are compared, most of the sequence (about 99.5 percent) is identical. Despite this apparent homogeneity, there are many differences, and as more genomes are sequenced, more variants are found. A 0.5 percent difference across 3.2 billion bp amounts to roughly 16 million bp. Current estimates suggest that each haploid genome contains about 3.3 million single nucleotide polymorphisms (SNPs), so these account for about one-fifth of the variation between two individuals. The remaining four-fifths are due to copy number variation: differences in sequence copy number that have arisen through chromosomal deletions, duplications, or translocations, or through duplications caused by transposons.
Genes are not evenly distributed over the genome. Chromosome 19 is packed densely with genes, whereas chromosome 8 has long stretches without coding regions. The Y chromosome has the fewest genes (about 230), and chromosome 1 has the most (about 3,000).
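The figures in the list above can be cross-checked with some simple arithmetic. The short Python sketch below is only an illustration. The function and constant names are hypothetical, the inputs are the rounded values quoted above, and it assumes that each SNP counts as one differing base pair, which is a simplification. Under those assumptions, protein-coding sequence comes to about 38 million bp, or roughly 1,800 bp per gene. That is a small fraction of the 27,000 bp average gene, which fits with genes having many introns. Likewise, 3.3 million SNPs out of roughly 16 million differing bp is about one-fifth.

```python
# Rounded figures quoted in the list above.
GENOME_BP = 3_200_000_000   # haploid human genome size, in base pairs
CODING_FRACTION = 0.012     # 1.2 percent is protein-coding
GENE_COUNT = 21_000         # approximate number of protein-coding genes
AVERAGE_GENE_BP = 27_000    # average gene length, introns included
IDENTITY = 0.995            # sequence identity between two unrelated people
SNP_COUNT = 3_300_000       # SNPs per haploid genome


def coding_bp(genome_bp=GENOME_BP, fraction=CODING_FRACTION):
    """Total protein-coding base pairs in the haploid genome."""
    return genome_bp * fraction


def coding_bp_per_gene(gene_count=GENE_COUNT):
    """Average protein-coding base pairs per gene."""
    return coding_bp() / gene_count


def differing_bp(genome_bp=GENOME_BP, identity=IDENTITY):
    """Base pairs expected to differ between two unrelated individuals."""
    return genome_bp * (1 - identity)


def snp_share(snp_count=SNP_COUNT):
    """Fraction of differing bp attributable to SNPs.

    Assumption: each SNP is counted as a single differing base pair.
    """
    return snp_count / differing_bp()


if __name__ == "__main__":
    print(f"Protein-coding bp:        {coding_bp():,.0f}")
    print(f"Coding bp per gene:       {coding_bp_per_gene():,.0f}")
    print(f"Coding share of avg gene: {coding_bp_per_gene() / AVERAGE_GENE_BP:.1%}")
    print(f"Differing bp:             {differing_bp():,.0f}")
    print(f"SNP share of differences: {snp_share():.2f}")
```

Because the inputs are rounded estimates, the results are best read as order-of-magnitude checks rather than precise values.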