In this video, we’ll discuss the structure of the genome. We’ll learn what the terms gene and genome are referring to and what makes up the
genome of a eukaryote and prokaryote. We’ll also learn more about the purpose of coding and noncoding DNA and see how this
relates to eukaryotic and prokaryotic genomes.
As humans, we have it pretty good. We can walk upright, grab things with our opposable thumbs, and we’re pretty smart
too. We wouldn’t be where we are today without these features. It all comes down to our DNA, the genetic material found in most cells in the body
that gives the instructions for making us who we are. The genome is all the genetic material of an organism. It’s the instruction manual for our biology and is not to be taken lightly. It can give us a long and healthy life, or it can leave us affected by disease. So one of humankind’s great pursuits is to understand all of this a bit better, and
that’s the aim of the Human Genome Project.
The Human Genome Project was an international project to sequence all the DNA inside
a human cell. It started in 1990, and the final draft was completed in 2003. Overall, the cost was estimated to be about 2.7 billion dollars. By 2016, a human genome could be sequenced for less than 1,500 dollars, mostly because
of improvements in the sequencing technology. As of 2020, over 58,000 genomes have been sequenced. So all this talk about sequencing, what do we mean by sequencing exactly? If we were to take out the DNA from one of our cells, unpackage it, and stretch it
all out, it would be about two meters in length, conveniently about the size of our
human standing here.
If we zoom in, our DNA is arranged as a double helix as shown here, with nucleotides
or base pairs indicated as these colored boxes. If we zoom in further, we could see the chemical structure of DNA. Here, we could see the two strands of DNA and the nucleotide, the basic subunit of
DNA. Each nucleotide contains a nitrogenous base — guanine, cytosine, adenine, or thymine
— that can pair with its complementary base to form a base pair. So when we talk about sequencing the genome, what we mean is determining the sequence
or order of these nucleotides. So if we were to sequence this strand, we’d get GAG, then CGT, then CAT. The sequence is written down here. And since we know the sequence of one strand, because of complementary base-pairing
rules in DNA, we can determine the sequence on the other strand.
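To make the pairing rule concrete, here's a minimal Python sketch that works out the other strand from the one we just read. The function name `complement` and the `PAIRS` table are just illustrative names, not part of any standard tool, and the sketch assumes the strand contains only the four letters G, C, A, and T.

```python
# Complementary base-pairing rules in DNA: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base that pairs with each position, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


# The strand read out in the video: GAG, then CGT, then CAT.
strand = "GAG" + "CGT" + "CAT"

print(strand)              # GAGCGTCAT
print(complement(strand))  # CTCGCAGTA
```

Each letter in the second line is the partner of the letter directly above it, which is all that knowing one strand gives us the other really means.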
So here’s one base pair, and this sequence is one, two, three, four, five, six,
seven, eight, nine base pairs long. And here’s one nucleotide. And if we were to count them all up, there’s really 18 nucleotides on both
strands. But that’s confusing because there’s nine base pairs. So when we’re talking about the size of a sequence in nucleotides, we’ll just look at
one strand. That way, we can say that this sequence is nine nucleotides long, which is the same
number of base pairs, and there’s no confusion. So with the Human Genome Project, we determined that the sequence of the human genome
was about three to 3.2 billion nucleotides long. This is a massive number. If it takes one second to read one nucleotide, it would take about 100 years to read
out the sequence of the human genome.
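If you'd like to check that 100-year figure yourself, here's a quick back-of-the-envelope sketch in Python. It assumes a year of 365.25 days and a nonstop reading pace of one nucleotide per second. The function name `years_to_read` is made up for this example.

```python
# Seconds in one year, assuming 365.25 days per year.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_read(nucleotides: float, seconds_per_nucleotide: float = 1.0) -> float:
    """Years needed to read a sequence aloud without ever stopping."""
    return nucleotides * seconds_per_nucleotide / SECONDS_PER_YEAR


# The two ends of the size range quoted for the human genome.
for size in (3.0e9, 3.2e9):
    print(f"{size:.1e} nucleotides: about {years_to_read(size):.0f} years")
```

Both ends of the range land close to a century, so "about 100 years" is a fair summary.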
So now, you might appreciate why it took the whole world over 10 years to sequence
the human genome. It’s pretty big. Besides the size of the genome, we also learned more about the number of genes. So using our same human-sized DNA molecule on the left here, let’s zoom in once
again. So here’s a segment of DNA with a gene in the middle. This gene can have the traditional function of coding for a protein with mRNA being
produced. But a gene can also be used to produce a noncoding RNA molecule that isn’t translated
into a protein. There’s different types of these, and they often have specific functions, some of
which are involved in regulating gene expression, for example. We’ll touch more on this later.
So the definition of a gene has changed over the years from something that coded for
a protein to being more general to account for these noncoding RNA genes. A good definition is that a gene is a section of DNA that contains the information
needed to produce a functional unit like a protein or a noncoding RNA molecule with
some regulatory role, for example. Now that we understand more about what a gene and genome are, next we’ll be
discussing the characteristics of eukaryotic and prokaryotic genomes. Let’s start with the eukaryotic genome.
Earlier, we mentioned that there was about two meters of DNA inside most cells in the
human body represented here as this thin black line. So it appears that this is one long continuous molecule of DNA, but this isn’t the
case. There’s actually 46 molecules of DNA in most cells that are packed really tightly in
the structures called chromosomes.
We have 46 chromosomes, and they’re numbered based on their size, with chromosome one
being larger than chromosome two, which is larger than chromosome three, and so
on. There’s also the sex chromosomes, X and Y, which determine our biological sex. You’ll also notice that there’s two chromosomes for each chromosome number, one in
blue and one in pink. We have two copies of each chromosome, one from our biological mother and one from
our biological father. Because we have two copies of each chromosome, we’re called diploid.
A good point to mention now is that genomes are often represented as haploid, so
having only one set of chromosomes. So the 3.0 to 3.2 billion nucleotides in the human genome is really the size of the
haploid genome. Most of our cells are diploid, so they contain 6.0 to 6.4 billion nucleotides. We report the genome as haploid because it represents the complete set of genes that
exist in humans. In addition, it helps with consistency because many eukaryotes, in particular plants,
have variable copies of chromosomes. The strawberry plant is octoploid and has eight copies of its chromosomes.
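The haploid-versus-diploid arithmetic is simple enough to sketch in a few lines of Python. The function name `nucleotides_per_cell` is hypothetical, and the sketch assumes every chromosome set in a cell is the same size as the haploid genome.

```python
def nucleotides_per_cell(haploid_size: int, ploidy: int) -> int:
    """Total nucleotides in a cell carrying `ploidy` copies of the haploid genome."""
    return haploid_size * ploidy


# The haploid human genome range quoted in the video.
HUMAN_HAPLOID_RANGE = (3_000_000_000, 3_200_000_000)
DIPLOID = 2  # two sets of chromosomes, one from each biological parent

for size in HUMAN_HAPLOID_RANGE:
    total = nucleotides_per_cell(size, DIPLOID)
    print(f"haploid {size:,} -> diploid {total:,}")
```

The same function would cover an octoploid plant like the strawberry by passing a ploidy of 8, which is one reason quoting the haploid size keeps comparisons consistent.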
As we mentioned, DNA is compacted to form these chromosomes. What does this look like exactly? Unraveling a bit of this chromosome and we see that DNA is wound up tightly around
special proteins called histones. These help compact the over three billion nucleotides of DNA in humans. Contained within this DNA are genes. In the human genome, there are between 20,000 and 25,000 protein-coding genes with an
average size of about 10,000 nucleotides. Multiplying these numbers together gives us about 200 million to 250 million
nucleotides, which is considerably less than the over three billion nucleotides in
the genome. This is mostly due to noncoding DNA.
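Here's that multiplication written out as a short Python sketch, using only the rounded figures quoted above. The names are illustrative, and the 10,000-nucleotide average is treated as exact purely for the sake of the estimate.

```python
GENOME_SIZE_RANGE = (3_000_000_000, 3_200_000_000)  # haploid human genome
GENE_COUNT_RANGE = (20_000, 25_000)                 # protein-coding genes
AVERAGE_GENE_LENGTH = 10_000                        # nucleotides per gene


def gene_span(gene_count: int, average_length: int = AVERAGE_GENE_LENGTH) -> int:
    """Total nucleotides covered by `gene_count` genes of average length."""
    return gene_count * average_length


low_span = gene_span(GENE_COUNT_RANGE[0])
high_span = gene_span(GENE_COUNT_RANGE[1])

# Smallest share: fewest genes in the largest genome. Largest share: the reverse.
smallest_share = low_span / GENOME_SIZE_RANGE[1]
largest_share = high_span / GENOME_SIZE_RANGE[0]

print(f"gene span: {low_span:,} to {high_span:,} nucleotides")
print(f"share of genome: {smallest_share:.1%} to {largest_share:.1%}")
```

On these numbers, protein-coding genes account for well under a tenth of the genome, which is the gap the video is pointing at.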
So if we change perspectives a bit and look at our DNA stretched out like this, we
can see that there aren’t too many protein-coding genes in our DNA. In fact, about 99 percent of our genome is made up of noncoding DNA. When it was first discovered, many thought it had no function, and because of this,
it was called junk DNA. But there’s nothing junky about it. We now know that noncoding DNA has multiple functions and can make a variety of
noncoding RNA molecules, one of which is structural RNA, an example of which is
ribosomal RNA, or rRNA, which is a critical component of the ribosome, the organelle that
translates an mRNA transcript into a protein.
Noncoding DNA can also make regulatory RNA like microRNA, which plays an important
role in downregulating, or turning down, the expression of certain genes. Numerous microRNAs are thought to be involved in Alzheimer’s disease. Approximately two-thirds of the human genome is made up of repetitive DNA, where
sequences of nucleotides are repeated over and over again. An extreme example of this is in the fruit fly, where the sequence AGAAG is repeated
about 100,000 times.
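To get a feel for what a repeated sequence looks like in data, here's a small Python sketch that counts back-to-back copies of a motif. The function name `longest_tandem_run` and the toy sequence are invented for illustration. Real repeat-finding tools are far more sophisticated and tolerate mismatches, which this sketch does not.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        run = 0
        position = start
        # Keep stepping forward one motif length while the motif repeats.
        while sequence.startswith(motif, position):
            run += 1
            position += step
        best = max(best, run)
    return best


MOTIF = "AGAAG"
toy_sequence = "TTC" + MOTIF * 4 + "GGA"  # made-up example, not real fly DNA

print(longest_tandem_run(toy_sequence, MOTIF))  # 4

# Length of the repeat block described in the video, by simple multiplication.
print(len(MOTIF) * 100_000)  # 500000
```

The simple scan is fine for a toy string but is too slow for a real genome. The point is only that a five-letter motif repeated about 100,000 times adds up to roughly half a million nucleotides.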
Speaking of the fruit fly genome, let’s take a moment to look at some of its
characteristics and some other eukaryotes as well. The fruit fly genome is about 170 million nucleotides, and it contains about 14,000
protein-coding genes in contrast to humans, which have a genome of about three to
3.2 billion nucleotides and 20,000 to 25,000 genes. You’ll notice that even though our genome is about 20 times the size of the fruit fly
genome, we don’t have 20 times the number of genes. And if you look at the rice plant with a genome size of 470 million, which is roughly
seven times smaller than our genome, rice has about twice the number of protein-coding genes.
So there really is no correlation between the size of the genome and the number of
protein-coding genes. If there was, then we might expect to have more genes than the rice genome because
ours is larger. Another interesting point to make is when we look at the tiny protozoan
Trichomonas vaginalis. This organism has a tiny genome comparable to the fruit fly genome yet has an
astounding 60,000 genes. We can see here that the complexity of an organism doesn’t necessarily correlate with
the number of protein-coding genes, as this relatively simple protozoan has nearly
three times the number of genes compared to humans.
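One way to see the mismatch is to line the organisms up and compare genome size with gene count. The Python sketch below uses the figures from the video, with some loudly labeled assumptions: the human values are midpoints of the quoted ranges, the rice gene count is taken as twice the human midpoint, and the Trichomonas vaginalis genome is set equal to the fruit fly's because the video only says the two are comparable.

```python
# (genome size in nucleotides, protein-coding genes)
ORGANISMS = {
    "fruit fly": (170_000_000, 14_000),
    "human": (3_100_000_000, 22_500),                # ASSUMPTION: range midpoints
    "rice": (470_000_000, 45_000),                   # ASSUMPTION: twice the human midpoint
    "Trichomonas vaginalis": (170_000_000, 60_000),  # ASSUMPTION: same size as fruit fly
}


def genes_per_million(genome_size: int, gene_count: int) -> float:
    """Protein-coding genes per million nucleotides of genome."""
    return gene_count / (genome_size / 1_000_000)


largest_genome = max(ORGANISMS, key=lambda name: ORGANISMS[name][0])
most_genes = max(ORGANISMS, key=lambda name: ORGANISMS[name][1])

for name, (size, genes) in ORGANISMS.items():
    print(f"{name}: {genes_per_million(size, genes):.1f} genes per million nucleotides")

print("largest genome:", largest_genome)
print("most genes:", most_genes)
```

The organism with the largest genome is not the one with the most genes, and it also has by far the fewest genes per million nucleotides of the four.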
Now that we’ve looked at the eukaryotic genome in some detail, let’s turn our
attention to the prokaryotic genome. Prokaryotes often have their DNA contained within a single circular chromosome. Escherichia coli, or E. coli, has a genome size of about 4.6 million
nucleotides, which is roughly 700 times smaller than our genome. E. coli also has about 4,400 protein-coding genes, which is only about one-fifth of
the number in our genome. That’s a lot of genes in a small amount of space. The genomes of prokaryotes are often packed with genes, with very few gaps between
adjacent genes. In some cases, the genes are so densely packed that only a single nucleotide
separates the two genes. So in contrast to eukaryotes, which are made up largely of noncoding DNA, the genomes
of prokaryotes are made up mostly of coding DNA.
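To put numbers on how densely E. coli packs its genes, here's a short Python sketch using the figures quoted above. The variable and function names are illustrative, and the human values are the two ends of the ranges given earlier in the video.

```python
E_COLI_GENOME = 4_600_000
E_COLI_GENES = 4_400

HUMAN_GENOME_RANGE = (3_000_000_000, 3_200_000_000)
HUMAN_GENE_RANGE = (20_000, 25_000)


def genes_per_million(genome_size: int, gene_count: int) -> float:
    """Protein-coding genes per million nucleotides of genome."""
    return gene_count / (genome_size / 1_000_000)


# How many times larger the human genome is, at each end of the range.
size_ratios = [human / E_COLI_GENOME for human in HUMAN_GENOME_RANGE]

e_coli_density = genes_per_million(E_COLI_GENOME, E_COLI_GENES)
# Highest human density: most genes in the smallest genome.
human_density_max = genes_per_million(HUMAN_GENOME_RANGE[0], HUMAN_GENE_RANGE[1])

print(f"human genome is {size_ratios[0]:.0f} to {size_ratios[1]:.0f} times larger")
print(f"E. coli: {e_coli_density:.0f} genes per million nucleotides")
print(f"human: at most {human_density_max:.1f} genes per million nucleotides")
```

Even taking the most generous human figure, E. coli fits more than a hundred times as many genes into each million nucleotides.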
Now let’s apply what we’ve learned and look at a practice question.
What is the correlation between the complexity of an organism and the number of
protein-coding genes it contains?
Before we answer this question, let’s look at a few key terms. A gene is a section of DNA that produces a functional unit. So if we had a section of DNA with two genes on it, one of them might produce a
protein — like insulin, for example, that’s involved in regulating blood sugar
levels — or a functional RNA molecule, like a special type of RNA called microRNA
that’s involved in regulating gene expression. So there are really two types of genes. Protein-coding genes code for proteins, while genes that don’t code for proteins but
instead produce a functional RNA molecule are called noncoding RNA genes.
The genome is the complete set of genetic material of an organism. And by studying the genomes of different organisms, for example, the protozoan
Trichomonas vaginalis, the fruit fly, and a human, we’ve
determined the number of protein-coding genes that the organism contains. Trichomonas vaginalis has about 60,000 protein-coding genes, the
fruit fly has about 14,000, and humans have somewhere between 20,000 and 25,000.
Since this question is asking us about the complexity of organisms relative to the
number of protein-coding genes, let’s rank them. We can say that Trichomonas vaginalis is the least complex because it’s only a single
cell, whereas humans are the most complex because they’re multicellular, have more
tissue types and a more advanced nervous system than our fruit fly. So our relative complexity can look something like this. This question is asking us about the correlation, or relationship, between the
complexity of an organism and the number of protein-coding genes it contains. So as complexity increases, what can we say about the number of protein-coding genes?
Well, if we look at the fruit fly and the human, we can see that as complexity
increases, so does the number of protein-coding genes. But if we look at Trichomonas vaginalis and the fruit fly, we see that the opposite
is true: as complexity increases, the number of protein-coding genes decreases. So we see no consistent relationship across organisms, which means there is no
correlation between complexity and the number of protein-coding genes.
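The reasoning in this answer can be sketched as a small Python check: list the organisms from least to most complex, then see whether the gene counts only go up, only go down, or do both. The function name `trend` is made up for this example, and the human count is the midpoint of the quoted range, which is an assumption. Either end of the range gives the same result.

```python
# Ordered from least to most complex, as ranked in the video.
RANKED_BY_COMPLEXITY = [
    ("Trichomonas vaginalis", 60_000),
    ("fruit fly", 14_000),
    ("human", 22_500),  # ASSUMPTION: midpoint of 20,000 to 25,000
]


def trend(values: list[int]) -> str:
    """Describe whether consecutive values only rise, only fall, or do neither."""
    steps = set()
    for previous, current in zip(values, values[1:]):
        if current > previous:
            steps.add("up")
        elif current < previous:
            steps.add("down")
    if steps == {"up"}:
        return "increasing"
    if steps == {"down"}:
        return "decreasing"
    return "no consistent trend"


gene_counts = [count for _, count in RANKED_BY_COMPLEXITY]
print(trend(gene_counts))  # no consistent trend
```

The count drops from the protozoan to the fruit fly and then rises from the fruit fly to the human, so the check reports no consistent trend, matching the answer above.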
Now, let’s go over some of the key points that we learned in this video. The genome is the complete set of genetic material in an organism. Contained in the genome are numerous genes. A gene is a segment of DNA that produces a functional unit, such as a protein or a
functional RNA. Eukaryotic genomes are made up of coding DNA, which consists of protein-coding genes,
and noncoding DNA, which does not code for proteins. Most DNA in humans is noncoding, and much of it is made up of repeating sequences. Eukaryotes often have larger genomes than prokaryotes, but they’re often less
densely packed with genes.