In this explainer, we will learn how to describe the composition of eukaryotic and prokaryotic genomes.
The Human Genome Project was an international project to sequence all the DNA inside a human cell. It began in 1990 and the first draft was published in 2001 (with about of the genome’s 3.0–3.2 billion nucleotides sequenced). The final draft was completed in 2003. Overall, the cost is estimated to have been about 2.7 billion dollars.
In 2016, a human genome could be sequenced for less than $1 500!
Much of this discrepancy has to do with the development of new sequencing technology. DNA sequencing is the process of determining the order of nucleotides in DNA (guanine, cytosine, thymine, and adenine). Older technology was limited in sequencing short segments of DNA (250–500 nucleotides) and combining them by looking for overlapping sequences. The technology kept on continuously advancing, and in 2008, millions of nucleotides could be sequenced at once! Due to these advancements, as of 2020, the genomes of over 58 000 organisms have been sequenced.
So, what have we learned after sequencing all these genomes? Let’s first describe what the words gene and genomics are actually referring to.
A gene is a section of DNA that codes for a specific gene product. This could be a protein that has a specific function, like insulin that is involved in regulating blood sugar levels, or it could be RNA that has a specific function, like ribosomal RNA that is involved in assembling the ribosome, the site of protein synthesis.
A protein is a complex biological macromolecule, made up of amino acid monomers, that can have a wide variety of forms and functions.
A gene is a section of DNA that contains the information needed to produce a functional unit, for example, a protein.
The genome is the complete DNA of an organism, although there are nuances to the way the word is actually used. In some cases, we can refer to the genome of specific organelles, such as the mitochondria or chloroplast, both of which have their own DNA, or we can refer to the genome of a specific cell, like a tumor cell, which may be different from the genome of the cells in the tissue surrounding it. The genome of a virus may actually be RNA based rather than DNA based! Generally, and in the context of this explainer, a genome refers to all the genetic material of an organism.
The genome is all the genetic material of an organism.
Key Term: Organelle
Organelles are compartments within a cell that perform a specific function and are usually isolated from the rest of the cytoplasm by membranes.
So, what are some of the characteristics of a genome?
This depends on whether you are looking at a prokaryote or a eukaryote. You will recall that prokaryotes are organisms that are lacking membrane-bound organelles and a nucleus within their cells, while eukaryotes do have membrane-bound organelles and a nucleus in their cells. Their genomes are in many ways similar but there are some important differences. Let’s first look at the eukaryotic genome.
Definition: Prokaryotic Cell
A prokaryotic cell is a cell that lacks a nucleus and membrane-bound organelles.
Definition: Eukaryotic Cell
A eukaryotic cell is a cell that contains a membrane-bound nucleus and other membrane-bound organelles.
Eukaryotes have their DNA separated into individual chromosomes. Each chromosome contains a single, linear piece of DNA. In humans, there are 46 chromosomes that add up to about 6.0–6.4 billion nucleotides. This is double the original number of 3.0–3.2 billion because most cells in our bodies contain two sets of chromosomes. You can see these 46 human chromosomes below in Figure 1. Note that we have a pair of 22 chromosomes and then either two X chromosomes (female) or a single X and Y chromosome (male).
A chromosome is a long molecule of DNA and associated proteins which contains the genetic information of an organism in the form of genes.
So how many genes are contained in this massive genome?
In the early 1990s, scientists believed the human genome had as many as 100 000 protein coding genes. This was easily accepted considering how complex humans are. After publication of the human genome project, this number dropped to about 40 000. As the data was further refined and analyzed, by the 2010s, this number further dropped to about 20 000–25 000. The average size of a protein coding gene in humans is about 10 000 nucleotides.
Example 1: Defining Chromosomes and Genes
Which of the following correctly links genes and DNA?
- One molecule of DNA (chromosome) contains many genes.
- One gene contains many different DNA molecules (chromosomes).
A chromosome is a single molecule of DNA that is heavily compacted and contains many genes. A gene is a section of DNA that contains the information needed to produce a specific protein or RNA. For example, the gene for insulin codes for the insulin hormone that is involved in regulating blood sugar levels. There are many genes located on a single chromosome. You can see this in the image below.
Let’s look at the different answers to decide which one best links genes and DNA.
In A, “one molecule of DNA (chromosome) contains many genes” is accurate. A single molecule of DNA can contain many genes located throughout the DNA sequence.
In B, “one gene contains many different DNA molecules (chromosomes)” is not accurate. A gene is a sequence of DNA, and many genes can be on a single molecule of DNA. DNA can be compacted to form a chromosome.
Therefore, the answer that best links genes and DNA is A: “one molecule of DNA (chromosome) contains many genes.”
We know there are around 20 000–25 000 protein coding genes in the human genome with an average size of 10 000 nucleotides; this represents about 200 million nucleotides. If the human genome is 3.0–3.2 billion nucleotides long, what accounts for the other 2.8–3.0 billion nucleotides?
The answer is noncoding DNA! This is DNA that does not code for proteins. In fact, about of the human genome is made up of noncoding DNA. In the earlier days, it was sometimes called “junk DNA” because it did not seem to have any function. In some cases, this may be true, but in many cases “junk DNA” is not junk at all.
Key Term: Noncoding DNA (ncDNA)
Noncoding DNA refers to parts of an organism’s DNA that do not code for proteins.
Noncoding DNA can come in many varieties.
Noncoding DNA can make RNA that has a certain function but does not code for a protein. Some can be structural, for example, ribosomal RNA that can be used to build ribosomes for protein synthesis, while others can have a regulatory role, like microRNAs that can control how genes are expressed. These RNAs have an important role in the cell, and even though they do not code for proteins, the DNA from which they are derived is still considered genes. As of 2018, there are 17 000–22 000 predicted noncoding genes.
Approximately two-thirds of the human genome is repetitive DNA, where sequences of nucleotides are repeated over and over again. This can sometimes be called “satellite DNA.” Besides humans, other organisms also have repetitive DNA; for example, the sequence “AGAAG” is repeated about 100 000 times in a chromosome of the fruit fly! There are several possible reasons for this, such as when a piece of a chromosome is duplicated in error during DNA replication.
Key Term: Repetitive DNA
Repetitive DNA is repeating sequences of DNA that are found throughout the genome.
Example 2: Understanding the Definition of a Gene
What is the primary purpose of genes in a eukaryotic genome?
- To act as regulatory elements and control what sections of DNA are transcribed and translated
- To provide the instructions for making functional units like proteins or RNA molecules
- To comprise the majority of the DNA in the genome
- To provide long sections of noncoding DNA
A gene is a region of DNA that codes for a specific gene product. There are protein-coding genes like insulin that is involved in regulating blood sugar levels. There are also noncoding genes that do not code for proteins but instead produce an RNA molecule with a specific function. For example, ribosomal RNA is involved in assembling the ribosome (plays a role in protein synthesis), and microRNAs can be used to regulate gene expression.
The genome is the complete set of DNA of a particular organism. Eukaryotes, those organisms that have membrane-bound organelles and a nucleus to house their DNA, typically have a large amount of noncoding DNA in their genome. This is DNA that does not code for proteins but can still include genes that contain instructions for building functional RNA molecules. However, the greatest proportion of the genome is due to repetitive DNA. In humans, this accounts for as much as of the genome.
Let’s look at the different answers to choose the one that best describes what the purpose of a gene is in a eukaryotic genome.
The answer A, “to act as regulatory elements and control what sections of DNA are transcribed and translated,” is not correct. Although there are special DNA sequences that can control gene expression, called promoters and enhancers, for example, these act as binding sites for other proteins to affect their regulation. These are not genes, because a specific product is not formed.
The answer B, “to provide the instructions for making functional units like proteins or RNA molecules,” is correct. This is exactly what a gene does.
The answer C, “to comprise the majority of DNA in the genome,” is incorrect. In fact, the majority of DNA in the eukaryotic genome is made up of noncoding DNA. Some of this may be protein coding genes, but the majority of the genome is repetitive DNA.
The answer D, “to provide long sections of noncoding DNA,” is incorrect. Genes code for proteins or functional RNAs, so this is not an accurate statement.
Therefore, the correct answer is B: the primary purpose of genes in a eukaryotic genome is to provide the instructions for making functional units like proteins or RNA molecules.
Genome size varies between eukaryotes. This is summarized in Table 1 below. The smallest eukaryotic genome is less than 2.5 million nucleotides, while the largest is over 150 billion. There is no clear correlation between the size of the genome and the complexity of the organism. If there was, then the human genome would be much larger than that of the Japanese flower Paris japonica, which is 50 times the size of the human genome!
There is also no correlation between the number of protein-coding genes and the complexity of the organism. Triticum aestivum (wheat) has about 107 000 protein-coding genes, which is about 5 times the number of protein coding genes in humans. This is also summarized in Table 1 below.
It is important to note that the genome is very similar between individuals in the same species but is not exactly the same. There are minor differences in the DNA sequence.
Example 3: Understanding the Relationship between Gene Number and Organism Complexity
What is the correlation between the complexity of an organism and the number of protein-coding genes it contains?
A gene is a section of DNA that codes for a specific gene product, such as a protein or an RNA molecule. Protein-coding genes are genes that code for proteins, such as the gene for insulin that codes for the hormone insulin that is used to regulate blood sugar levels.
There are many different genes in a single organism. The complete set of DNA inside an organism is called a genome. The genome of humans, for example, contains approximately 3.2 billion nucleotides that contain about 20 000 protein-coding genes.
Since the development of DNA sequencing technology, many genomes for different organisms have been sequenced. Analysis of these genomes has also revealed the number of protein-coding genes inside the respective genomes.
What we have learned is that the complexity of an organism does not correlate with the number of protein-coding genes in the organism’s genome. For example, humans, arguably the most complex organisms on Earth, have approximately 20 000 protein-coding genes compared to wheat that has about 5 times this number. This large difference might be due to repeated gene duplication events in plants during the course of their evolution.
As shown in Table 1, the size of the genome does not necessarily correlate with the number of protein-coding genes. Some genomes are more densely packed with genes compared to others. See Figure 2 below to compare a 50,000-nucleotide section of DNA between yeast and humans. You can see that there are significantly fewer genes in the human DNA section. The percentage of genes in the human genome is lower than that of the yeast genome because there is a higher amount of noncoding DNA in the human genome.
The genomes of prokaryotes are even more densely packed!
Prokaryotic genomes are usually, but not always, smaller and more densely packed with genes than eukaryotic genomes. Most prokaryotes have their genome on a single, circular DNA molecule. E. coli has a genome of about 4.6 million nucleotides with 4 400 protein-coding genes. These genes are very densely packed and in some cases only a single nucleotide separates them! There are very few repetitive DNA sequences in prokaryotes compared to eukaryotes.
Let’s recap some of the key points we have covered in this explainer.
- A genome is the total DNA of an organism.
- In eukaryotes, the majority of the genome is noncoding and does not make proteins.
- The size of the genome, as well as the number of genes in an organism, does not correlate with its complexity.
- In prokaryotes, the majority of the genome is coding and makes proteins.