Lesson Video: Structure of the Genome Biology

In this video, we will learn how to describe the composition of eukaryotic and prokaryotic genomes.

13:54

Video Transcript

In this video, we’ll discuss the structure of the genome. We’ll learn what the terms gene and genome are referring to and what makes up the genome of a eukaryote and prokaryote. We’ll also learn more about the purpose of coding and noncoding DNA and see how this relates to eukaryotic and prokaryotic genomes.

As humans, we have it pretty good. We can walk upright, grab things with our opposable thumbs, and we’re pretty smart too. We wouldn’t be where we are today without these features. It all comes down to our DNA, the genetic material found in most cells in the body that gives the instructions for making us who we are. The genome is all the genetic material of an organism. It’s the instruction manual for our biology and is not to be taken lightly. It can give us a long and healthy life, or it can leave us affected by disease. So one of humankind’s great pursuits is to understand all of this a bit better, and that was the aim of the Human Genome Project.

The Human Genome Project was an international project to sequence all the DNA inside a human cell. It started in 1990, and the final draft was completed in 2003. Overall, the cost was estimated to be about 2.7 billion dollars. By 2016, a human genome could be sequenced for less than 1,500 dollars, mostly because of improvements in sequencing technology. As of 2020, over 58,000 genomes had been sequenced. So with all this talk about sequencing, what do we mean by sequencing exactly? If we were to take out the DNA from one of our cells, unpackage it, and stretch it all out, it would be about two meters in length, conveniently about the size of our human standing here.

If we zoom in, our DNA is arranged as a double helix as shown here, with nucleotides or base pairs indicated as these colored boxes. If we zoom in further, we could see the chemical structure of DNA. Here, we could see the two strands of DNA and the nucleotide, the basic subunit of DNA. Each nucleotide contains a nitrogenous base — guanine, cytosine, adenine, or thymine — that can pair with its complementary base to form a base pair. So when we talk about sequencing the genome, what we mean is determining the sequence or order of these nucleotides. So if we were to sequence this strand, we’d get GAG, then CGT, then CAT. The sequence is written down here. And since we know the sequence of one strand, because of complementary base-pairing rules in DNA, we can determine the sequence on the other strand.
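The base-pairing step can be sketched in a few lines of Python. This is only an illustration and not part of the video: the names `PAIRS` and `complement_strand` are made up for this example, and the sketch pairs bases position by position without modeling strand direction.

```python
# Complementary base-pairing rules in DNA: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(sequence: str) -> str:
    """Return the base that pairs with each position of `sequence`, in the same order."""
    return "".join(PAIRS[base] for base in sequence)


# The strand read out in the video: GAG, then CGT, then CAT.
strand = "GAG" + "CGT" + "CAT"
other_strand = complement_strand(strand)

print(strand)
print(other_strand)
print(len(strand), "base pairs")
```

Knowing one strand is enough to write down the other, which is why a sequence is normally reported for a single strand.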

So here’s one base pair, and this sequence is one, two, three, four, five, six, seven, eight, nine base pairs long. And here’s one nucleotide. And if we were to count them all up, there are really 18 nucleotides on both strands. But that’s confusing because there are nine base pairs. So when we’re talking about the size of a sequence in nucleotides, we’ll just look at one strand. That way, we can say that this sequence is nine nucleotides long, which is the same as the number of base pairs, and there’s no confusion. So with the Human Genome Project, we determined that the sequence of the human genome was about 3.0 to 3.2 billion nucleotides long. This is a massive number. If it takes one second to read one nucleotide, it would take about 100 years to read out the sequence of the human genome.
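The 100-year figure can be checked with a quick calculation. The sketch below assumes a reading speed of one nucleotide per second, as in the video, and an average year of 365.25 days; the function name `years_to_read` is just an illustrative choice.

```python
# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_read(nucleotides: float, seconds_per_nucleotide: float = 1.0) -> float:
    """Years needed to read a sequence aloud without stopping."""
    return nucleotides * seconds_per_nucleotide / SECONDS_PER_YEAR


# Lower and upper estimates of the human genome size from the video.
for size in (3.0e9, 3.2e9):
    print(f"{size:.1e} nucleotides: about {years_to_read(size):.0f} years")
```

Both estimates land close to a century of nonstop reading.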

So now, you might appreciate why it took the whole world over 10 years to sequence the human genome. It’s pretty big. Besides the size of the genome, we also learned more about the number of genes. So using our same human-sized DNA molecule on the left here, let’s zoom in once again. So here’s a segment of DNA with a gene in the middle. This gene can have the traditional function of coding for a protein with mRNA being produced. But a gene can also be used to produce a noncoding RNA molecule that isn’t translated into a protein. There are different types of these, and they often have specific functions, some of which are involved in regulating gene expression, for example. We’ll touch more on this later.

So the definition of a gene has changed over the years from something that coded for a protein to being more general to account for these noncoding RNA genes. A good definition is that a gene is a section of DNA that contains the information needed to produce a functional unit like a protein or a noncoding RNA molecule with some regulatory role, for example. Now that we understand more about what a gene and genome are, next we’ll be discussing the characteristics of eukaryotic and prokaryotic genomes. Let’s start with the eukaryotic genome.

Earlier, we mentioned that there was about two meters of DNA inside most cells in the human body, represented here as this thin black line. It might appear that this is one long continuous molecule of DNA, but this isn’t the case. There are actually 46 molecules of DNA in most cells, packed really tightly into structures called chromosomes.

We have 46 chromosomes, and they’re numbered based on their size, with chromosome one being larger than chromosome two, which is larger than chromosome three, and so on. There are also the sex chromosomes, X and Y, which determine our biological sex. You’ll also notice that there are two chromosomes for each chromosome number, one in blue and one in pink. We have two copies of each chromosome, one from our biological mother and one from our biological father. Because we have two copies of each chromosome, we’re called diploid.

A good point to mention now is that genomes are often represented as haploid, so having only one set of chromosomes. So the 3.0 to 3.2 billion nucleotides in the human genome is really the size of the haploid genome. Most of our cells are diploid, so they would contain 6.0 to 6.4 billion nucleotides. We reduce the genome to haploid because it represents the complete set of genes that exist in humans. In addition, it helps with consistency because many eukaryotes, in particular plants, have variable numbers of chromosome copies. The strawberry plant is octoploid and has eight copies of each of its chromosomes.

As we mentioned, DNA is compacted to form these chromosomes. What does this look like exactly? Unraveling a bit of this chromosome, we see that DNA is wound up tightly around special proteins called histones. These help compact the over three billion nucleotides of DNA in humans. Contained within this DNA are genes. In the human genome, there are between 20,000 and 25,000 protein-coding genes with an average size of about 10,000 nucleotides. Multiplying these numbers together gives us about 200 million to 250 million nucleotides, which is considerably less than the over three billion nucleotides in the genome. The rest is noncoding DNA.
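The multiplication above is easy to reproduce. This is a rough sketch using only the round figures quoted in the video; the constant and function names are illustrative, and the lower genome size estimate of 3.0 billion nucleotides is used as the denominator.

```python
GENOME_SIZE = 3.0e9          # haploid human genome, lower estimate from the video
AVERAGE_GENE_SIZE = 10_000   # nucleotides per protein-coding gene, from the video


def gene_span(gene_count: int, average_size: int = AVERAGE_GENE_SIZE) -> int:
    """Total nucleotides covered by `gene_count` genes of the given average size."""
    return gene_count * average_size


for count in (20_000, 25_000):
    span = gene_span(count)
    share = span / GENOME_SIZE
    print(f"{count:,} genes: {span:,} nucleotides ({share:.1%} of the genome)")
```

Even with the higher gene count, the genes span well under a tenth of the genome.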

So if we change perspectives a bit and look at our DNA stretched out like this, we can see that there aren’t too many protein-coding genes in our DNA. In fact, about 99 percent of our genome is made up of noncoding DNA. When it was first discovered, many thought it had no function, and because of this, it was called junk DNA. But there’s nothing junky about it. We now know that noncoding DNA has multiple functions and can make a variety of noncoding RNA molecules. One of these is structural RNA, an example of which is ribosomal RNA, or rRNA, a critical component of the ribosome, the organelle that translates an mRNA transcript into a protein.

Noncoding DNA can also make regulatory RNAs like microRNAs, which play an important role in downregulating, or turning down, the expression of certain genes. Numerous microRNAs are thought to be involved in Alzheimer’s disease. Approximately two-thirds of the human genome is made up of repetitive DNA, where sequences of nucleotides are repeated over and over again. An extreme example of this is in the fruit fly, where the sequence AGAAG is repeated about 100,000 times.
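To make the idea of a repeated sequence concrete, here is a small sketch that measures the longest back-to-back run of a repeat unit in a string. The function name `longest_tandem_run` and the toy sequence are invented for illustration; real repeat-finding tools are far more sophisticated.

```python
def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return how many times `unit` repeats back to back in the longest such run."""
    if not unit:
        raise ValueError("unit must not be empty")
    longest = 0
    # Try every starting position and count consecutive copies of the unit.
    for start in range(len(sequence)):
        run = 0
        position = start
        while sequence.startswith(unit, position):
            run += 1
            position += len(unit)
        longest = max(longest, run)
    return longest


# Toy example: four copies of AGAAG flanked by unrelated bases.
toy_sequence = "TTC" + "AGAAG" * 4 + "GCA"
print(longest_tandem_run(toy_sequence, "AGAAG"))
```

In the fruit fly example from the video, the same kind of count would come out at roughly 100,000 rather than 4.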

Speaking of the fruit fly genome, let’s take a moment to look at some of its characteristics and those of some other eukaryotes as well. The fruit fly genome is about 170 million nucleotides, and it contains about 14,000 protein-coding genes, in contrast to humans, which have a genome of about 3.0 to 3.2 billion nucleotides and 20,000 to 25,000 protein-coding genes. You’ll notice that even though our genome is about 20 times the size of the fruit fly genome, we don’t have 20 times the number of genes. And if you look at the rice plant, with a genome size of 470 million nucleotides, which is roughly seven times smaller than our genome, rice has about twice the number of protein-coding genes.

So there really is no correlation between the size of the genome and the number of protein-coding genes. If there were, we would expect humans to have more genes than rice, because our genome is larger. Another interesting point to make is when we look at the tiny protozoan Trichomonas vaginalis. This organism has a small genome, comparable to the fruit fly genome, yet it has an astounding 60,000 genes. We can see here that the complexity of an organism doesn’t necessarily correlate with the number of protein-coding genes, as this relatively simple protozoan has nearly three times the number of genes compared to humans.
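The human and fruit fly comparison can be put side by side in a short calculation. This sketch uses only the figures quoted in the video; the `Genome` class and the `compare` function are names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    name: str
    size: float   # haploid genome size in nucleotides
    genes: int    # number of protein-coding genes


FRUIT_FLY = Genome("fruit fly", 170e6, 14_000)
HUMAN_LOW = Genome("human (low estimates)", 3.0e9, 20_000)
HUMAN_HIGH = Genome("human (high estimates)", 3.2e9, 25_000)


def compare(larger: Genome, smaller: Genome) -> tuple[float, float]:
    """Return (genome size ratio, gene count ratio) of `larger` relative to `smaller`."""
    return larger.size / smaller.size, larger.genes / smaller.genes


for human in (HUMAN_LOW, HUMAN_HIGH):
    size_ratio, gene_ratio = compare(human, FRUIT_FLY)
    print(f"{human.name}: {size_ratio:.1f}x the genome, {gene_ratio:.1f}x the genes")
```

The genome is well over ten times larger, while the gene count is less than double, so the number of genes clearly does not scale with genome size.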

Now that we’ve looked at the eukaryotic genome in some detail, let’s turn our attention to the prokaryotic genome. Prokaryotes often have their DNA contained within a single circular chromosome. Escherichia coli, or E. coli, has a genome size of about 4.6 million nucleotides, which is roughly 650 to 700 times smaller than our genome. E. coli also has about 4,400 protein-coding genes, which is about one-fifth of the number in our genome. That’s a lot of genes in a small amount of space. The genomes of prokaryotes are often packed with genes, with very few gaps between adjacent genes. In some cases, the genes are so densely packed that only a single nucleotide separates two adjacent genes. So in contrast to eukaryotic genomes, which are made up largely of noncoding DNA, the genomes of prokaryotes are made up mostly of coding DNA.
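One way to see how densely packed the E. coli genome is, compared with ours, is to work out the number of protein-coding genes per million nucleotides. The sketch below uses the figures from the video; the function name `genes_per_million` is an illustrative choice.

```python
def genes_per_million(genes: int, genome_size: float) -> float:
    """Protein-coding genes per million nucleotides of genome."""
    return genes / (genome_size / 1e6)


# Figures quoted in the video.
e_coli = genes_per_million(4_400, 4.6e6)
human_low = genes_per_million(20_000, 3.0e9)
human_high = genes_per_million(25_000, 3.2e9)

print(f"E. coli: about {e_coli:.0f} genes per million nucleotides")
print(f"Human:   about {human_low:.1f} to {human_high:.1f} genes per million nucleotides")
```

On these numbers, E. coli carries on the order of a hundred times more genes per stretch of DNA than the human genome does.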

Now let’s apply what we’ve learned and look at a practice question.

What is the correlation between the complexity of an organism and the number of protein-coding genes it contains?

Before we answer this question, let’s look at a few key terms. A gene is a section of DNA that produces a functional unit. So if we had a section of DNA with two genes on it, one of them might produce a protein (like insulin, for example, which is involved in regulating blood sugar levels), and the other might produce a functional RNA molecule, like a special type of RNA called microRNA that’s involved in regulating gene expression. So there are really two types of genes. Protein-coding genes code for proteins, while genes that don’t code for proteins but instead produce a functional RNA molecule are called noncoding genes.

The genome is the complete set of genetic material of an organism. And by studying the genomes of different organisms, for example, the protozoan Trichomonas vaginalis, the fruit fly, and a human, we’ve determined the number of protein-coding genes that each organism contains. Trichomonas vaginalis has about 60,000 protein-coding genes, the fruit fly has about 14,000, and humans have somewhere between 20,000 and 25,000 protein-coding genes.

Since this question is asking us about the complexity of organisms relative to the number of protein-coding genes, let’s rank them. We can say that Trichomonas vaginalis is the least complex because it’s only a single cell, whereas humans are the most complex because they’re multicellular and have more tissue types and a more advanced nervous system than the fruit fly. So the relative complexity can look something like this. This question is asking us about the correlation, or relationship, between the complexity of an organism and the number of protein-coding genes it contains. So as complexity increases, what can we say about the number of protein-coding genes?

Well, if we look at the fruit fly and the human, we can see that as complexity increases, so does the number of protein-coding genes. But if we look at Trichomonas vaginalis and the fruit fly, we see that the opposite is true: as complexity increases, the number of protein-coding genes decreases. So we see no consistent relationship across these organisms, which means there is no correlation between complexity and the number of protein-coding genes.
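The reasoning in this answer can be written out as a simple check: order the organisms by complexity and see whether the gene count moves in one direction at every step. This is a sketch of that logic only; the function name `trend` is made up here, and the human value is an assumed midpoint of the 20,000 to 25,000 range.

```python
# Organisms ordered from least to most complex, as ranked in the video.
RANKED = [
    ("Trichomonas vaginalis", 60_000),
    ("fruit fly", 14_000),
    ("human", 22_500),  # assumption: midpoint of the 20,000 to 25,000 range
]


def trend(counts: list[int]) -> str:
    """Describe how gene counts change along the ranking (expects two or more counts)."""
    steps = [later - earlier for earlier, later in zip(counts, counts[1:])]
    if all(step > 0 for step in steps):
        return "increases with complexity"
    if all(step < 0 for step in steps):
        return "decreases with complexity"
    return "no consistent relationship"


print(trend([genes for _, genes in RANKED]))
```

Because the count drops from Trichomonas vaginalis to the fruit fly and then rises from the fruit fly to the human, the check reports no consistent relationship.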

Now, let’s go over some of the key points that we learned in this video. The genome is the complete set of genetic material in an organism. Contained in the genome are numerous genes. A gene is a segment of DNA that produces a functional unit, such as a protein or a functional RNA. Eukaryotic genomes are made up of coding DNA, which consists of protein-coding genes, and noncoding DNA, which does not code for proteins. Most DNA in humans is noncoding, and much of it is made up of repeating sequences. Eukaryotes often have larger genomes than prokaryotes, but those genomes are often less densely packed with genes.
