Video Transcript
In this video, we’ll discuss the
structure of the genome. We’ll learn what the terms gene and
genome are referring to and what makes up the genome of a eukaryote and
prokaryote. We’ll also learn more about the
purpose of coding and noncoding DNA and see how this relates to eukaryotic and
prokaryotic genomes.
As humans, we have it pretty
good. We can walk upright, grab things
with our opposable thumbs, and we’re pretty smart too. We wouldn’t be where we are today
without these features. It all comes down to our DNA, the
genetic material found in most cells in the body that gives the instructions for
making us who we are. The genome is all the genetic
material of an organism. It’s the instruction manual for our
biology and is not to be taken lightly. It can help give us a long and healthy
life, or it can leave us vulnerable to disease. So one of humankind’s great
pursuits is to understand all of this a bit better, and that was the aim of the Human
Genome Project.
The Human Genome Project was an
international project to sequence all the DNA inside a human cell. It started in 1990, and the final
draft was completed in 2003. Overall, the cost was estimated to
be about 2.7 billion dollars. By 2016, a human genome could be
sequenced for less than 1,500 dollars, mostly because of improvements in
sequencing technology. As of 2020, over 58,000 genomes
have been sequenced. So all this talk about sequencing,
what do we mean by sequencing exactly? If we were to take out the DNA from
one of our cells, unpack it, and stretch it all out, it would be about two meters
in length, conveniently about the size of our human standing here.
If we zoom in, our DNA is arranged
as a double helix as shown here, with nucleotides or base pairs indicated as these
colored boxes. If we zoom in further, we could see
the chemical structure of DNA. Here, we could see the two strands
of DNA and the nucleotide, the basic subunit of DNA. Each nucleotide contains a
nitrogenous base — guanine, cytosine, adenine, or thymine — that can pair with its
complementary base to form a base pair. So when we talk about sequencing
the genome, what we mean is determining the sequence or order of these
nucleotides. So if we were to sequence this
strand, we’d get GAG, then CGT, then CAT. The sequence is written down
here. And since we know the sequence of
one strand, because of complementary base-pairing rules in DNA, we can determine the
sequence on the other strand.
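The complementary base-pairing rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the example strand GAGCGTCAT is written left to right in the 5' to 3' direction; the function names are hypothetical and not from any library. Because the two strands of DNA run in opposite directions, the partner strand can be written either aligned base by base with the original or reversed so that it also reads 5' to 3'.

```python
# Complementary base-pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner strand, aligned base by base with the input."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'-to-3' direction.

    Assumes the input is written 5' to 3'; the strands are antiparallel,
    so the partner is reversed to read in the same conventional direction.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "GAGCGTCAT"  # GAG, then CGT, then CAT, as in the example
    print(complement(top))          # CTCGCAGTA
    print(reverse_complement(top))  # ATGACGCTC
```

Applying `complement` twice returns the original strand, which is why knowing one strand is enough to recover the other.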
So here’s one base pair, and this
sequence is one, two, three, four, five, six, seven, eight, nine base pairs
long. And here’s one nucleotide. And if we were to count them all
up, there are really 18 nucleotides across both strands. But that’s confusing because
there are only nine base pairs. So when we’re talking about the
size of a sequence in nucleotides, we’ll just look at one strand. That way, we can say that this
sequence is nine nucleotides long, which is the same number of base pairs, and
there’s no confusion. So with the Human Genome Project,
we determined that the sequence of the human genome was about three to 3.2 billion
nucleotides long. This is a massive number. If it takes one second to read one
nucleotide, it would take about 100 years to read out the sequence of the human
genome.
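The counting convention and the reading-time estimate are simple arithmetic. The sketch below is illustrative only: it counts length on one strand, as described for the nine-base-pair example, and assumes one nucleotide read per second and a 365.25-day year. The function names are hypothetical.

```python
# Assumptions for the estimate: one nucleotide read per second and a
# 365.25-day year (31,557,600 seconds).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def sequence_length(strand: str) -> int:
    """Length counted on one strand, which equals the number of base pairs."""
    return len(strand)


def total_nucleotides(strand: str) -> int:
    """Nucleotides on both strands of the double-stranded molecule."""
    return 2 * len(strand)


def years_to_read(nucleotides: float, seconds_each: float = 1.0) -> float:
    """Years needed to read a sequence aloud at a fixed pace."""
    return nucleotides * seconds_each / SECONDS_PER_YEAR


if __name__ == "__main__":
    example = "GAGCGTCAT"
    print(sequence_length(example), "base pairs,",
          total_nucleotides(example), "nucleotides on both strands")
    for size in (3.0e9, 3.2e9):  # haploid human genome, low and high estimates
        print(f"{size:.1e} nucleotides: about {years_to_read(size):.0f} years")
```

With these assumptions the estimate comes out at roughly 95 years for 3.0 billion nucleotides and roughly 101 years for 3.2 billion, consistent with "about 100 years."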
So now, you might appreciate why it
took the whole world over 10 years to sequence the human genome. It’s pretty big. Besides the size of the genome, we
also learned more about the number of genes. So using our same human-sized DNA
molecule on the left here, let’s zoom in once again. So here’s a segment of DNA with a
gene in the middle. This gene can have the traditional
function of coding for a protein with mRNA being produced. But a gene can also be used to
produce a noncoding RNA molecule that isn’t translated into a protein. There are different types of these,
and they often have specific functions; some, for example, are involved in regulating
gene expression. We’ll touch more on this later.
So the definition of a gene has
changed over the years from something that coded for a protein to being more general
to account for these noncoding RNA genes. A good definition is that a gene is
a section of DNA that contains the information needed to produce a functional unit
like a protein or a noncoding RNA molecule with some regulatory role, for
example. Now that we understand more about
what a gene and genome are, next we’ll be discussing the characteristics of
eukaryotic and prokaryotic genomes. Let’s start with the eukaryotic
genome.
Earlier, we mentioned that there
was about two meters of DNA inside most cells in the human body represented here as
this thin black line. So it appears that this is one long
continuous molecule of DNA, but this isn’t the case. There are actually 46 molecules of
DNA in most cells, packed very tightly into structures called
chromosomes.
We have 46 chromosomes, and they’re
numbered roughly by size, from largest to smallest, so chromosome one is larger than chromosome two,
which is larger than chromosome three, and so on. There are also the sex chromosomes, X
and Y, which determine our biological sex. You’ll also notice that there are two
chromosomes for each chromosome number, one in blue and one in pink. We have two copies of each
chromosome, one from our biological mother and one from our biological father. Because we have two sets of chromosomes,
we’re called diploid.
A good point to mention now is that
genome sizes are usually given for the haploid genome, meaning a single set of chromosomes. So the 3.0 to 3.2 billion
nucleotides in the human genome is really the size of the haploid genome. Most of our cells are diploid, so they contain about 6.0 to 6.4
billion nucleotides. We describe the genome as haploid
because one set of chromosomes contains the complete set of genes found in humans. It also keeps comparisons
consistent, because the number of chromosome sets varies among eukaryotes, particularly
plants. The cultivated strawberry, for example, is octoploid
and has eight sets of chromosomes.
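Ploidy simply multiplies the haploid genome size. A minimal sketch, assuming the haploid human range of 3.0 to 3.2 billion nucleotides quoted above; `nucleotides_per_cell` is a hypothetical helper name.

```python
HUMAN_HAPLOID = (3.0e9, 3.2e9)  # nucleotides in one chromosome set, low and high


def nucleotides_per_cell(haploid_size: float, ploidy: int) -> float:
    """Total nucleotides in a cell that carries `ploidy` sets of chromosomes."""
    if ploidy < 1:
        raise ValueError("ploidy must be at least 1")
    return haploid_size * ploidy


if __name__ == "__main__":
    low, high = (nucleotides_per_cell(size, ploidy=2) for size in HUMAN_HAPLOID)
    print(f"diploid human cell: {low / 1e9:.1f} to {high / 1e9:.1f} billion nucleotides")
```

Passing `ploidy=8` would model an octoploid such as the cultivated strawberry, but its haploid genome size isn't given here, so no number is computed for it.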
As we mentioned, DNA is compacted
to form these chromosomes. What does this look like
exactly? Unraveling a bit of this chromosome
and we see that DNA is wound up tightly around special proteins called histones. These help compact the over three
billion nucleotides of DNA in humans. Contained within this DNA are
genes. In the human genome, there’s
between 20,000 to 25,000 protein-coding genes with an average size of about 10,000
nucleotides. Multiplying these numbers together
gives us about 200 million to 250 million nucleotides, which is considerably less
than the more than three billion nucleotides in the genome. This gap is mostly due to noncoding
DNA.
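The gene-span estimate can be reproduced directly. This sketch uses only the figures quoted in the video (20,000 to 25,000 genes, about 10,000 nucleotides each, a 3.0 to 3.2 billion nucleotide genome) and treats every gene as having the average size, which is a simplifying assumption. The constant and function names are hypothetical.

```python
HUMAN_GENOME = (3.0e9, 3.2e9)   # haploid genome size in nucleotides
HUMAN_GENES = (20_000, 25_000)  # protein-coding gene count
AVERAGE_GENE_SIZE = 10_000      # nucleotides per gene, as quoted in the video


def gene_span(gene_count: int, average_size: int = AVERAGE_GENE_SIZE) -> int:
    """Nucleotides covered by genes, assuming every gene has the average size."""
    return gene_count * average_size


if __name__ == "__main__":
    low = gene_span(HUMAN_GENES[0])   # 200,000,000
    high = gene_span(HUMAN_GENES[1])  # 250,000,000
    print(f"genes span {low:,} to {high:,} nucleotides")
    # Widest fraction range: smallest span in the largest genome, and vice versa.
    print(f"that is {low / HUMAN_GENOME[1]:.1%} to {high / HUMAN_GENOME[0]:.1%} of the genome")
```

The result is roughly 6 to 8 percent of the genome. The average gene size refers to the whole gene, which also contains noncoding stretches, so this span overstates the protein-coding sequence itself.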
So if we change perspectives a bit
and look at our DNA stretched out like this, we can see that there aren’t too many
protein-coding genes in our DNA. In fact, about 99 percent of our
genome is made up of noncoding DNA. When it was first discovered, many
thought it had no function, and because of this, it was called junk DNA. But there’s nothing junky about
it. We now know that noncoding DNA has
multiple functions. For example, it can be used to make a variety of noncoding RNA molecules. One of these
is structural RNA, such as ribosomal RNA, or rRNA, which is a critical
component of the ribosome, the structure that makes proteins by translating an mRNA
transcript.
Noncoding DNA can also make
regulatory RNAs like microRNAs, which play an important role in downregulating, or
turning down, the expression of certain genes. Numerous microRNAs are thought to
be involved in Alzheimer’s disease. Approximately two-thirds of the
human genome is made up of repetitive DNA, where sequences of nucleotides are
repeated over and over again. An extreme example of this is in
the fruit fly, where the sequence AGAAG is repeated about 100,000 times.
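Repetitive DNA like the fruit fly example can be detected by counting back-to-back copies of a motif. The function below is a hypothetical sketch, not a standard bioinformatics tool: a single dynamic-programming pass records, for each position, how many consecutive copies of the motif end there. The demo sequence is synthetic, not real fruit fly data.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    k = len(motif)
    # runs[i] = number of consecutive copies of motif ending exactly at index i;
    # runs[i - k] holds the count for the copy immediately before.
    runs = [0] * (len(sequence) + 1)
    best = 0
    for i in range(k, len(sequence) + 1):
        if sequence[i - k:i] == motif:
            runs[i] = runs[i - k] + 1
            best = max(best, runs[i])
    return best


if __name__ == "__main__":
    # Synthetic sequence: a block of AGAAG repeats with a few flanking bases.
    demo = "TTC" + "AGAAG" * 100_000 + "CTT"
    print(longest_tandem_run(demo, "AGAAG"))  # 100000
```

Checking every end position, rather than only the first match, means a run is found whichever position it starts at.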
Speaking of the fruit fly genome,
let’s take a moment to look at some of its characteristics and some other eukaryotes
as well. The fruit fly genome is about 170
million nucleotides, and it contains about 14,000 protein-coding genes in contrast
to humans, which have a genome of about 3.0 to 3.2 billion nucleotides and 20,000 to
25,000 protein-coding genes. You’ll notice that even though our
genome is nearly 20 times the size of the fruit fly genome, we don’t have anywhere near 20 times
the number of genes. And if you look at the rice plant,
with a genome size of about 470 million nucleotides, which is roughly six to seven times smaller than our
genome, rice has about twice as many protein-coding genes as we do.
So there really is no correlation
between the size of the genome and the number of protein-coding genes. If there were, we would expect
to have more genes than rice because our genome is larger. Another interesting point to make
is when we look at the tiny protozoan Trichomonas vaginalis. This organism has a genome
comparable in size to the fruit fly genome, yet it has an astounding 60,000 genes. We can see here that the complexity
of an organism doesn’t necessarily correlate with the number of protein-coding
genes, as this relatively simple protozoan has roughly two and a half to three times as many
genes as humans.
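The comparisons in this section are ratios of the quoted figures. This sketch assumes the values given in the video (human 3.0 to 3.2 billion nucleotides and 20,000 to 25,000 protein-coding genes; fruit fly about 170 million nucleotides and 14,000 genes; rice about 470 million nucleotides; Trichomonas vaginalis about 60,000 genes) and carries each range through as a low and high bound. The helper names are hypothetical.

```python
# Figures as quoted in the video; tuples are (low, high) ranges.
HUMAN_GENOME = (3.0e9, 3.2e9)
HUMAN_GENES = (20_000, 25_000)
FLY_GENOME = 170e6
FLY_GENES = 14_000
RICE_GENOME = 470e6
TRICHOMONAS_GENES = 60_000


def as_range(value):
    """Treat a single number as a (low, high) range with equal bounds."""
    return value if isinstance(value, tuple) else (value, value)


def ratio_range(numerator, denominator):
    """Smallest and largest ratio of two positive quantities given as ranges."""
    num_lo, num_hi = as_range(numerator)
    den_lo, den_hi = as_range(denominator)
    return num_lo / den_hi, num_hi / den_lo


if __name__ == "__main__":
    comparisons = {
        "human / fly genome size": ratio_range(HUMAN_GENOME, FLY_GENOME),
        "human / fly gene count": ratio_range(HUMAN_GENES, FLY_GENES),
        "human / rice genome size": ratio_range(HUMAN_GENOME, RICE_GENOME),
        "T. vaginalis / human gene count": ratio_range(TRICHOMONAS_GENES, HUMAN_GENES),
    }
    for label, (low, high) in comparisons.items():
        print(f"{label}: {low:.1f}x to {high:.1f}x")
```

With these inputs, the human genome is roughly 18 to 19 times the size of the fruit fly genome but has only about 1.4 to 1.8 times as many genes; it is roughly 6.4 to 6.8 times the size of the rice genome; and Trichomonas vaginalis has roughly 2.4 to 3 times as many genes as humans.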
Now that we’ve looked at the
eukaryotic genome in some detail, let’s turn our attention to the prokaryotic
genome. Prokaryotes often have their DNA
contained within a single circular chromosome. Escherichia coli, or E.
coli, has a genome size of about 4.6 million nucleotides, which is roughly 650 to 700
times smaller than our genome. E. coli also has about 4,400
protein-coding genes, which is only about one-fifth as many as we have. That’s a lot of genes in a small
amount of space. The genomes of prokaryotes are often
packed with genes, with very few gaps between adjacent genes. In some cases, the genes are so
densely packed that only a single nucleotide separates two neighboring genes. So in contrast to eukaryotic genomes, which
are made up largely of noncoding DNA, the genomes of prokaryotes are made up mostly
of coding DNA.
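Gene density, expressed here as protein-coding genes per million nucleotides, makes the contrast concrete. A minimal sketch using the E. coli and human figures quoted in the video; `genes_per_million` is a hypothetical helper name.

```python
ECOLI_GENOME = 4.6e6            # nucleotides
ECOLI_GENES = 4_400             # protein-coding genes
HUMAN_GENOME = (3.0e9, 3.2e9)   # haploid, low and high estimates
HUMAN_GENES = (20_000, 25_000)  # low and high estimates


def genes_per_million(genes: float, genome_size: float) -> float:
    """Protein-coding genes per million nucleotides of genome."""
    return genes / (genome_size / 1_000_000)


if __name__ == "__main__":
    ecoli = genes_per_million(ECOLI_GENES, ECOLI_GENOME)
    print(f"E. coli: {ecoli:.0f} genes per million nucleotides")
    # Lowest human density: fewest genes in the largest genome, and vice versa.
    low = genes_per_million(HUMAN_GENES[0], HUMAN_GENOME[1])
    high = genes_per_million(HUMAN_GENES[1], HUMAN_GENOME[0])
    print(f"human: {low:.1f} to {high:.1f} genes per million nucleotides")
    print(f"human genome is {HUMAN_GENOME[0] / ECOLI_GENOME:.0f} to "
          f"{HUMAN_GENOME[1] / ECOLI_GENOME:.0f} times larger than E. coli's")
```

This gives roughly 960 genes per million nucleotides for E. coli, versus only about 6 to 8 for the human genome.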
Now let’s apply what we’ve learned
and look at a practice question.
What is the correlation between the
complexity of an organism and the number of protein-coding genes it contains?
Before we answer this question,
let’s look at a few key terms. A gene is a section of DNA that
produces a functional unit. So if we had a section of DNA with
two genes on it, one of them might produce a protein, such as insulin,
which is involved in regulating blood sugar levels, and the other might produce a functional RNA molecule,
such as a microRNA, a special type of RNA that’s involved in regulating gene
expression. So there are really two types of
genes. Protein-coding genes code for
proteins, while genes that don’t code for proteins but instead produce a functional RNA
molecule are called noncoding genes.
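The two kinds of genes can be modeled as a tiny data structure. This is purely illustrative: the `Gene` class and `Product` labels are hypothetical, and "insulin" and "microRNA" are the examples from the text, not official gene symbols.

```python
from dataclasses import dataclass
from enum import Enum


class Product(Enum):
    """The functional unit a gene's information is used to make."""
    PROTEIN = "protein"              # made via an mRNA that is translated
    NONCODING_RNA = "noncoding RNA"  # the RNA molecule itself is functional


@dataclass(frozen=True)
class Gene:
    name: str         # illustrative label, not an official gene symbol
    product: Product

    @property
    def is_protein_coding(self) -> bool:
        return self.product is Product.PROTEIN


if __name__ == "__main__":
    genes = [Gene("insulin", Product.PROTEIN),
             Gene("microRNA", Product.NONCODING_RNA)]
    for gene in genes:
        kind = "protein-coding" if gene.is_protein_coding else "noncoding"
        print(f"{gene.name}: {kind} gene ({gene.product.value})")
```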
The genome is the complete set of
genetic material of an organism. And by studying the genomes of
different organisms, for example, the protozoan Trichomonas vaginalis, the
fruit fly, and a human, we’ve determined the number of protein-coding genes that the
organism contains. Trichomonas vaginalis has
about 60,000 protein-coding genes, the fruit fly has about 14,000, and humans have
somewhere between 20,000 and 25,000 protein-coding genes.
Since this question is asking us
about the complexity of organisms relative to the number of protein-coding genes,
let’s rank them. We can say that Trichomonas
vaginalis is the least complex because it’s only a single cell, whereas humans are
the most complex: like the fruit fly, they’re multicellular, but they have more tissue types and a more
advanced nervous system. So our relative complexity can look
something like this. This question is asking us about
the correlation, or relationship, between the complexity of an organism and the
number of protein-coding genes it contains. So as complexity increases, what
can we say about the number of protein-coding genes?
Well, if we look at the fruit fly
and the human, we can see that as complexity increases, so does the number of
protein-coding genes. But if we look at Trichomonas
vaginalis and the fruit fly, we see that the opposite is true: as complexity increases, the
number of protein-coding genes decreases. So we see no consistent
relationship between organisms, which means there is no correlation between
complexity and the number of protein-coding genes.
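One way to make "no consistent relationship" concrete is to list the gene counts in order of complexity and check whether they move in a single direction. This is a hypothetical sketch using the counts quoted above and testing both ends of the human range; with only three organisms it illustrates the reasoning rather than replacing a proper statistical correlation.

```python
def is_monotonic(values: list) -> bool:
    """True if the values never decrease, or never increase, along the list."""
    steps = list(zip(values, values[1:]))
    return all(a <= b for a, b in steps) or all(a >= b for a, b in steps)


# Ordered from least to most complex, following the ranking in the question.
ORGANISMS = ["Trichomonas vaginalis", "fruit fly", "human"]

if __name__ == "__main__":
    for human_genes in (20_000, 25_000):  # low and high human estimates
        genes = [60_000, 14_000, human_genes]
        for name, count in zip(ORGANISMS, genes):
            print(f"{name}: {count:,} protein-coding genes")
        print("single direction:", is_monotonic(genes))
```

Either human estimate gives the same answer: the counts fall from Trichomonas vaginalis to the fruit fly and then rise to humans, so they do not track complexity in one direction.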
Now, let’s go over some of the key
points that we learned in this video. The genome is the complete set of
genetic material in an organism. Contained in the genome are
numerous genes. A gene is a segment of DNA that
produces a functional unit, such as a protein or a functional RNA. Eukaryotic genomes are made up of
coding DNA, which consists of protein-coding genes, and noncoding DNA, which
does not code for proteins. Most DNA in humans is noncoding, and much of it
is made up of repetitive sequences. Eukaryotes often have larger
genomes than prokaryotes, but their genomes are much less densely packed with
genes.