Chapter 6 Molecular Basis Of Inheritance
The DNA
Following the principles of inheritance established by Mendel, scientists sought to understand the chemical nature of the 'factors' responsible for heredity. Over time, evidence accumulated, pointing towards Deoxyribonucleic Acid (DNA) as the primary genetic material in most organisms. Ribonucleic Acid (RNA) also serves as genetic material in some viruses, but more commonly acts as a messenger, adapter, structural component, or even a catalyst in other organisms.
Nucleic acids are polymers composed of repeating monomer units called nucleotides. You have previously learned the basic structure of nucleotides and how they link together to form the nucleic acid chains.
Structure Of Polynucleotide Chain
A nucleotide unit consists of three components:
- A Nitrogenous Base: There are two categories of nitrogenous bases:
- Purines: Adenine (A) and Guanine (G). These have a double-ring structure.
- Pyrimidines: Cytosine (C), Uracil (U), and Thymine (T). These have a single-ring structure.
- A Pentose Sugar: A five-carbon sugar. This is deoxyribose in DNA and ribose in RNA. The difference is the presence of an extra -OH group at the 2' position in ribose compared to deoxyribose.
- A Phosphate Group.
A nitrogenous base is linked to the 1' carbon of the pentose sugar via an N-glycosidic linkage, forming a structure called a nucleoside. Examples of nucleosides include adenosine (adenine + ribose), deoxyadenosine (adenine + deoxyribose), guanosine, deoxyguanosine, cytidine, deoxycytidine, uridine (uracil + ribose), and deoxythymidine (thymine + deoxyribose).
When a phosphate group is linked to the 5' carbon of a nucleoside via a phosphoester linkage, a nucleotide is formed (or deoxynucleotide if the sugar is deoxyribose).
Two nucleotides are linked together by a 3'-5' phosphodiester linkage between the 3' hydroxyl group of one sugar and the 5' phosphate group of the next sugar, forming a dinucleotide. This linkage is repeated to form a long polynucleotide chain.
A polynucleotide chain has a directionality or polarity. One end has a free phosphate group attached to the 5' carbon of the sugar (the 5'-end), and the other end has a free hydroxyl group attached to the 3' carbon of the sugar (the 3'-end).
The repeating sugar-phosphate units form the backbone of the polynucleotide chain, with the nitrogenous bases projecting inwards or outwards from this backbone.
DNA was first identified as an acidic substance in the nucleus by Friedrich Miescher in 1869, who named it 'Nuclein'. Because it is such a long polymer, its structure proved technically difficult to study.
The now famous Double Helix model for DNA structure was proposed in 1953 by James Watson and Francis Crick. Their model was based on:
- X-ray diffraction data generated by Maurice Wilkins and Rosalind Franklin, which provided clues about the helical nature and dimensions of DNA.
- Erwin Chargaff's observations (Chargaff's rules) which stated that in double-stranded DNA, the amount of Adenine (A) is equal to the amount of Thymine (T), and the amount of Guanine (G) is equal to the amount of Cytosine (C). That is, A/T = 1 and G/C = 1.
The base pairing explained by Chargaff's rules gives a unique property to the two polynucleotide strands: they are complementary to each other. If the base sequence of one strand is known, the sequence of the other strand can be predicted (A pairs with T, and G pairs with C).
This complementarity immediately suggested a mechanism for how DNA could replicate, with each strand serving as a template for synthesising a new complementary strand, resulting in two identical daughter DNA molecules.
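The pairing rule can be sketched in a few lines of Python. This is only an illustration: the sequence and the function names `complement` and `base_counts` are hypothetical, chosen for this example. It predicts the partner strand from one known strand and checks that Chargaff's ratios hold when the bases of both strands are counted together.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the partner strand, position by position.

    The result is aligned with the input, so it reads in the
    opposite orientation to the input strand.
    """
    return "".join(PAIRS[base] for base in strand)

def base_counts(sequence: str) -> dict:
    """Count how often each base occurs in a sequence."""
    return {base: sequence.count(base) for base in "ATGC"}

strand = "ATGGCATTC"              # hypothetical example sequence
partner = complement(strand)
duplex = base_counts(strand + partner)  # bases of both strands together

print(partner)  # TACCGTAAG
print(duplex)   # {'A': 5, 'T': 5, 'G': 4, 'C': 4}
```

Within a single strand the counts of A and T need not match; the equality appears only for the double-stranded molecule, which is the case Chargaff's rules describe.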
Key features of the DNA Double-helix structure:
- It consists of two polynucleotide chains.
- The backbone is formed by alternating sugar and phosphate groups, while the nitrogenous bases are oriented towards the inside of the helix.
- The two chains have anti-parallel polarity. If one strand runs in the 5' to 3' direction, the other runs in the 3' to 5' direction.
- The bases in the two strands are paired through Hydrogen bonds (H-bonds), forming base pairs (bp). Adenine (A) always pairs with Thymine (T) via two H-bonds ($\textsf{A=T}$). Guanine (G) always pairs with Cytosine (C) via three H-bonds ($\textsf{G}\equiv\textsf{C}$). This specific pairing (purine with pyrimidine) maintains a relatively uniform distance between the two strands.
- The two chains are coiled in a right-handed helical fashion.
- The pitch of the helix (one complete turn) is approximately $3.4$ nanometers ($3.4 \times 10^{-9}$ m).
- There are roughly 10 base pairs per turn of the helix.
- The distance between two adjacent base pairs is approximately $0.34$ nanometers ($0.34 \times 10^{-9}$ m).
- The flat plane of one base pair stacks over the next in the double helix. This stacking, in addition to hydrogen bonds, contributes significantly to the stability of the helical structure.
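The numbers in this list are consistent with each other, which a short Python sketch can confirm. The function names and the example sequence are hypothetical; the constants are the approximate values quoted above.

```python
# Hydrogen bonds contributed by each base of a pair (A=T: 2, G≡C: 3).
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

PITCH_NM = 3.4   # length of one complete helical turn
RISE_NM = 0.34   # distance between adjacent base pairs

def hydrogen_bonds(strand: str) -> int:
    """Total H-bonds in a duplex, given the sequence of one strand."""
    return sum(H_BONDS[base] for base in strand)

def helix_length_nm(n_base_pairs: int) -> float:
    """Approximate length of a duplex containing n base pairs."""
    return n_base_pairs * RISE_NM

bp_per_turn = round(PITCH_NM / RISE_NM)

print(bp_per_turn)                  # 10
print(hydrogen_bonds("ATGGCATTC"))  # 22
print(helix_length_nm(1000))
```

A GC-rich duplex has more hydrogen bonds per base pair than an AT-rich one of the same length, which is why the count depends on the sequence and not only on the length.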
The simplicity of the double helix structure and its implications for genetics were revolutionary. Soon after, Francis Crick proposed the Central Dogma of Molecular Biology, stating that genetic information flows from DNA to RNA to Protein ($\textsf{DNA} \to \textsf{RNA} \to \textsf{Protein}$). In some viruses (retroviruses), this flow can be reversed (RNA $\to$ DNA) through a process called reverse transcription.
Packaging Of DNA Helix
Given the distance between base pairs ($0.34$ nm), a typical human diploid DNA ($6.6 \times 10^9$ bp) has a total length of approximately $6.6 \times 10^9 \textsf{ bp} \times 0.34 \times 10^{-9} \textsf{ m/bp} \approx 2.2$ metres. This is far larger than the size of a typical nucleus (around $10^{-6}$ m). Thus, DNA must be highly packaged to fit within the cell.
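The arithmetic can be reproduced directly. The variable names below are illustrative; the values are the approximate figures quoted in the paragraph above.

```python
BASE_PAIRS = 6.6e9       # base pairs in a human diploid cell
RISE_M = 0.34e-9         # metres per base pair
NUCLEUS_M = 1e-6         # approximate size of a nucleus, in metres

length_m = BASE_PAIRS * RISE_M
ratio = length_m / NUCLEUS_M   # how many nucleus-widths the DNA spans

print(f"{length_m:.2f} m")     # 2.24 m
print(f"{ratio:.1e}")          # 2.2e+06
```

The DNA is therefore roughly two million times longer than the compartment that holds it, which is the scale of compaction the packaging proteins have to achieve.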
In prokaryotes (like E. coli), which lack a defined nucleus, the DNA is not scattered throughout the cell. The negatively charged DNA is coiled and held in large loops by positively charged proteins in a region called the nucleoid.
In eukaryotes, DNA packaging is more complex:
- DNA is associated with a set of positively charged basic proteins called histones. Histones are rich in the basic amino acids lysine and arginine, whose positively charged side chains interact with the negatively charged phosphate backbone of DNA.
- Histones are organised into units of eight molecules called histone octamers.
- The negatively charged DNA is wrapped around the positively charged histone octamer, forming a structure called a nucleosome. A typical nucleosome contains about 200 base pairs of DNA.
- Nucleosomes are the repeating units of chromatin, the thread-like structures observed in the nucleus when stained.
- When viewed under an electron microscope, chromatin appears as a 'beads-on-string' structure, where the 'beads' are nucleosomes and the 'string' is the DNA connecting them.
The beads-on-string chromatin is further packaged into chromatin fibers, which are then coiled and condensed during cell division (metaphase) to form visible chromosomes.
Higher levels of chromatin packaging involve additional proteins called Non-histone Chromosomal (NHC) proteins.
Within the nucleus, chromatin exists in different states of packing:
- Euchromatin: Loosely packed regions of chromatin that stain lightly. Euchromatin is generally considered transcriptionally active, meaning genes in these regions are more readily expressed.
- Heterochromatin: More densely packed regions of chromatin that stain darkly. Heterochromatin is typically transcriptionally inactive.
The Search For Genetic Material
Despite Mendel's laws and the discovery of chromosomes, the specific molecule that constituted the genetic material remained a mystery until the mid-20th century. Chromosomes were known to contain both proteins and DNA, leading to debate about which of these molecules carried the hereditary information.
Transforming Principle
One of the early experiments suggesting that a substance could transfer genetic information was performed by Frederick Griffith in 1928. He worked with Streptococcus pneumoniae bacteria, which cause pneumonia.
- He used two strains: a virulent 'S' strain (smooth colonies with a polysaccharide coat) and a non-virulent 'R' strain (rough colonies without a coat).
- Mice injected with S strain died; mice injected with R strain survived.
- Mice injected with heat-killed S strain survived.
- However, when he injected a mixture of heat-killed S strain and live R strain, the mice died.
- Furthermore, he was able to recover live S strain bacteria from the dead mice.
Griffith concluded that some substance or 'transforming principle' from the heat-killed S strain had transformed the live R strain bacteria into the virulent S strain. This transformation implied the transfer of genetic material, but the biochemical nature of this principle was unknown from his experiment.
Biochemical Characterisation Of Transforming Principle
Building on Griffith's work, Oswald Avery, Colin MacLeod, and Maclyn McCarty (1933-1944) aimed to identify the biochemical nature of the transforming principle. At the time, proteins were widely believed to be the genetic material.
- They purified proteins, DNA, and RNA from heat-killed S strain bacteria.
- They tested which purified component could transform live R bacteria into S bacteria.
- They found that DNA alone from S bacteria was able to transform R bacteria.
- To further confirm, they treated the purified components with enzymes:
- Treatment with proteases (digests proteins) did not inhibit transformation.
- Treatment with RNases (digests RNA) did not inhibit transformation.
- Treatment with DNase (digests DNA) inhibited transformation.
These results strongly indicated that DNA, not protein or RNA, was the transforming principle and hence the genetic material. Although their findings were compelling, some biologists remained unconvinced.
The Genetic Material Is DNA
The definitive proof that DNA is the genetic material came from the experiments of Alfred Hershey and Martha Chase in 1952. They worked with bacteriophages (viruses that infect bacteria).
Bacteriophages attach to bacteria and inject their genetic material into the host cell. The bacterial cell then produces new viral particles using the viral genetic instructions. Hershey and Chase designed an experiment to determine whether it was the protein coat or the DNA of the virus that entered the bacteria.
Steps of the Hershey-Chase experiment:
- They prepared two batches of bacteriophages:
- One batch was grown in a medium containing radioactive phosphorus ($^{32}\textsf{P}$). Since DNA contains phosphorus but proteins do not, the DNA of these viruses became radioactive.
- Another batch was grown in a medium containing radioactive sulfur ($^{35}\textsf{S}$). Since proteins contain sulfur but DNA does not, the proteins of these viruses became radioactive.
- The radioactive phages were allowed to infect E. coli bacteria.
- After infection, the mixture was agitated in a blender to detach the viral protein coats from the bacteria.
- The mixture was then centrifuged. The heavier bacterial cells settled as a pellet, while the lighter viral particles remained in the supernatant.
Results:
- Bacteria infected with $^{32}\textsf{P}$-labeled phages were found to be radioactive, while the supernatant contained very little radioactivity. This indicated that the radioactive DNA entered the bacterial cells.
- Bacteria infected with $^{35}\textsf{S}$-labeled phages were found to be non-radioactive, while the supernatant was radioactive. This indicated that the radioactive protein coats remained outside the bacterial cells.
This experiment unequivocally demonstrated that DNA, not protein, is the genetic material transferred from the virus to the bacteria during infection, thus confirming its role as the hereditary substance.
Properties Of Genetic Material (DNA Versus RNA)
Following the Hershey-Chase experiment, DNA was established as the genetic material in most organisms. However, in some viruses like Tobacco Mosaic Virus (TMV) and Qβ bacteriophage, RNA acts as the genetic material. This raised questions about the functional differences between DNA and RNA.
A molecule considered to be genetic material must meet the following criteria:
- It should be able to replicate (generate its own copies).
- It should be chemically and structurally stable.
- It should provide the possibility for mutation (slow changes needed for evolution).
- It should be able to express itself in the form of traits (Mendelian Characters).
Evaluating DNA and RNA based on these criteria:
- Replication: Both DNA and RNA have the ability to replicate due to the principle of complementarity and base pairing, allowing them to serve as templates for new strands. Other molecules like proteins cannot replicate in this manner.
- Stability:
- DNA is chemically less reactive and structurally more stable than RNA.
- RNA has an extra -OH group at the 2' position in its ribose sugar, which makes it more reactive and easily degradable (labile). RNA is also known to be catalytic in some cases, which further increases its reactivity.
- The presence of Thymine (T) instead of Uracil (U) in DNA also contributes to its greater stability (related to DNA repair mechanisms).
- The double-stranded nature of DNA also adds stability, as a broken strand can be repaired using the complementary strand as a template.
- Mutation: Both DNA and RNA can undergo mutations. However, RNA, being less stable, mutates at a faster rate. This is why RNA viruses (like influenza or HIV) mutate and evolve quickly, making them harder to control.
- Expression: RNA can directly code for protein synthesis (translation) and express traits. DNA, on the other hand, is dependent on transcription into RNA before it can direct protein synthesis. The cellular machinery for protein synthesis (ribosomes) has evolved around RNA.
Conclusion: While both can function as genetic material, DNA is preferred for storing genetic information due to its higher stability. RNA, being more reactive, is better suited for the transmission and expression of genetic information.
RNA World
Based on current evidence, it is widely believed that RNA was the first genetic material. This period is sometimes referred to as the 'RNA world'.
- Evidence suggests that many essential life processes, including metabolism, translation, and RNA splicing, evolved with RNA playing central roles.
- RNA acted not only as genetic material but also as a catalyst (ribozymes). For example, ribosomal RNA (rRNA) in ribosomes has catalytic activity during protein synthesis.
- However, RNA's catalytic nature made it reactive and inherently unstable.
- Over time, DNA evolved from RNA through chemical modifications that conferred greater stability (e.g., replacing ribose with deoxyribose and uracil with thymine).
- The development of the double-stranded structure and associated repair mechanisms further enhanced DNA's stability, making it the preferred molecule for long-term storage of genetic information.
Replication
When Watson and Crick proposed the double helical structure of DNA, they immediately suggested a mechanism for how DNA could replicate itself. Their statement was: "It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material".
The proposed mechanism was that the two strands of the DNA double helix would separate, and each parental strand would serve as a template for the synthesis of a new complementary strand. After replication, each new DNA molecule would consist of one parental strand and one newly synthesised strand. This model of DNA replication is called semiconservative replication.
The Experimental Proof
The semiconservative nature of DNA replication was experimentally proven by Matthew Meselson and Franklin Stahl in 1958, using Escherichia coli bacteria.
Their experiment used two isotopes of nitrogen ($^{15}\textsf{N}$ and $^{14}\textsf{N}$) to label DNA and distinguish parental DNA from newly synthesised DNA based on density.
Steps of the Meselson-Stahl experiment:
- They grew E. coli for several generations in a medium containing $^{15}\textsf{NH}_4\textsf{Cl}$ (heavy isotope of nitrogen) as the sole nitrogen source. This resulted in the incorporation of $^{15}\textsf{N}$ into the bacterial DNA, making the DNA 'heavy'.
- They then transferred the $^{15}\textsf{N}$-labeled bacteria to a medium containing normal $^{14}\textsf{NH}_4\textsf{Cl}$ (light isotope).
- Samples of bacteria were taken at specific time intervals (corresponding to generations) after the transfer.
- DNA was extracted from each sample and centrifuged in a cesium chloride (CsCl) density gradient. In a CsCl gradient, molecules separate based on their density. Heavier molecules sediment further down the tube.
Results:
- After one generation (20 minutes, since E. coli divides every 20 minutes), the extracted DNA showed a single band of intermediate density, which was denser than $^{14}\textsf{N}$-DNA but lighter than $^{15}\textsf{N}$-DNA. This hybrid DNA contained one $^{15}\textsf{N}$-labeled parental strand and one $^{14}\textsf{N}$-labeled newly synthesised strand, supporting the semiconservative model.
- After two generations (40 minutes), the extracted DNA showed two bands: one band of intermediate density (hybrid DNA) and one band of light density ($^{14}\textsf{N}$-DNA). This occurred because the hybrid DNA molecules from the first generation replicated in the $^{14}\textsf{N}$ medium, producing some more hybrid DNA and some entirely light DNA (two $^{14}\textsf{N}$ strands).
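The band pattern follows mechanically from the semiconservative rule, as a small simulation shows. This is a sketch of the model, not of the laboratory procedure; the functions `replicate` and `band` and the tuple representation of a DNA molecule are assumptions made for illustration.

```python
from collections import Counter

def replicate(molecules):
    """One round of semiconservative replication in light (14N) medium.

    Each molecule is a pair of strand labels. The two strands separate
    and each is paired with a newly synthesised 14N strand.
    """
    daughters = []
    for strand_a, strand_b in molecules:
        daughters.append((strand_a, "14N"))
        daughters.append((strand_b, "14N"))
    return daughters

def band(molecule):
    """Name the density band a duplex would form in the gradient."""
    heavy_strands = molecule.count("15N")
    return {2: "heavy", 1: "hybrid", 0: "light"}[heavy_strands]

population = [("15N", "15N")]   # generation 0: fully heavy DNA
for generation in range(1, 3):
    population = replicate(population)
    counts = Counter(band(m) for m in population)
    print(generation, dict(counts))
# 1 {'hybrid': 2}
# 2 {'hybrid': 2, 'light': 2}
```

Generation 1 gives only hybrid molecules and generation 2 gives hybrid and light molecules in equal amounts, matching the bands described above. Extending the loop shows what the model predicts for later generations: the number of hybrid molecules stays fixed while light molecules accumulate.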
This experiment provided strong evidence for semiconservative DNA replication. A similar experiment using radioactive thymidine in Vicia faba (faba beans) by Taylor and colleagues in 1958 confirmed that replication in eukaryotes is also semiconservative.
The Machinery And The Enzymes
DNA replication in living cells requires a complex set of enzymes and proteins.
- The main enzyme responsible for synthesising the new DNA strands is DNA-dependent DNA polymerase. It uses a DNA template strand to add deoxyribonucleotides in a polymer chain.
- DNA polymerases are highly efficient; for example, E. coli (4.6 $\times$ 10$^6$ bp) completes replication in about 18 minutes, implying a polymerisation rate of about 2000 bp per second.
- These enzymes also have proofreading capabilities to ensure accuracy. Mistakes can lead to mutations.
- Replication is energetically expensive. The deoxyribonucleoside triphosphates (dATP, dGTP, dCTP, dTTP) serve as both substrates for the new DNA chain and provide the necessary energy from the cleavage of their two terminal phosphates.
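The quoted rate can be checked with a quick calculation. One assumption is added here that the list above does not state: replication of the E. coli chromosome proceeds in both directions from a single origin, so two forks work at the same time. The variable names are illustrative.

```python
GENOME_BP = 4.6e6       # E. coli genome size, from the text
REPLICATION_MIN = 18    # time to complete replication, from the text
FORKS = 2               # assumption: two forks moving in opposite directions

overall_rate = GENOME_BP / (REPLICATION_MIN * 60)  # bp per second, whole genome
per_fork_rate = overall_rate / FORKS               # bp per second, per fork

print(round(overall_rate))    # 4259
print(round(per_fork_rate))   # 2130
```

Under that assumption the per-fork figure comes out close to the 2000 bp per second mentioned above; without it, the genome-wide average is roughly twice as high.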
For long DNA molecules, the entire double helix cannot be separated at once. Replication occurs within a small opened region called the replication fork.
DNA polymerases can only catalyse polymerisation in one direction: 5' to 3'.
- On the template strand with 3' to 5' polarity, the new strand is synthesised continuously in the 5' to 3' direction. This is called the leading strand.
- On the template strand with 5' to 3' polarity, the new strand must be synthesised discontinuously in short fragments (also in the 5' to 3' direction). These fragments are called Okazaki fragments, and this is the lagging strand synthesis.
- The discontinuously synthesised Okazaki fragments are later joined together by the enzyme DNA ligase.
DNA polymerases cannot initiate replication on their own; they require a primer. Replication also starts at specific DNA sequences called origin of replication. These origins are important for initiating and controlling replication. In recombinant DNA technology, vectors must have an origin of replication to allow the inserted DNA piece to replicate in the host cell.
In eukaryotes, DNA replication occurs during the S phase of the cell cycle and must be tightly coordinated with cell division. Failure in this coordination can lead to chromosomal anomalies like polyploidy.
Transcription
Transcription is the process of copying genetic information from one strand of DNA into a complementary RNA molecule.
Similar to replication, transcription follows the principle of complementarity, but with one key difference: Adenine (A) in DNA pairs with Uracil (U) in the newly synthesised RNA (instead of Thymine, T).
Unlike replication, where the entire genome is copied, transcription involves copying only a specific segment of DNA, and usually only one of the two DNA strands.
Why only one strand is transcribed:
- If both strands were transcribed, they would produce two different RNA molecules (due to different sequences). These RNA molecules would then code for two different protein sequences, which would complicate the genetic information flow.
- If both complementary RNA strands were produced simultaneously, they would likely bind together to form a double-stranded RNA molecule. Double-stranded RNA cannot be efficiently translated into protein, making the transcription process futile.
Transcription Unit
A segment of DNA that is transcribed into RNA is called a transcription unit. It typically consists of three main regions:
- Promoter: A DNA sequence located upstream (towards the 5' end of the coding strand) of the structural gene. It serves as the binding site for RNA polymerase and signals the start of transcription. The promoter's location also helps define which DNA strand is the template.
- Structural gene: The segment of DNA that contains the genetic information to be transcribed into RNA.
- Terminator: A DNA sequence located downstream (towards the 3' end of the coding strand) of the structural gene. It signals the end of transcription and causes RNA polymerase to detach from the DNA.
Defining the DNA strands in a transcription unit:
- Since RNA polymerase synthesises RNA in the 5' to 3' direction, the DNA strand with the 3' to 5' polarity acts as the template. This is called the template strand.
- The other DNA strand has the 5' to 3' polarity. Its sequence is identical to the newly synthesised RNA sequence (except for T in DNA replaced by U in RNA). This strand is called the coding strand, even though it does not directly serve as the template and is displaced during transcription.
All reference points for defining the transcription unit (like upstream/downstream location of promoter/terminator) are given with respect to the polarity of the coding strand.
Example DNA sequence (coding strand shown as the reference):
3'-ATGCATGCATGCATGCATGCATGC-5' Template Strand
5'-TACGTACGTACGTACGTACGTACG-3' Coding Strand
RNA sequence transcribed from the template strand (3'-ATGCATGCATGCATGCATGCATGC-5'):
5'-UACGUACGUACGUACGUACGUACG-3' RNA
Notice the RNA sequence is the same as the Coding Strand (5'-TACGTACGTACGTACGTACGTACG-3') but with U replacing T.
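The same example can be reproduced programmatically. The function name `transcribe` is hypothetical; the sketch simply applies the pairing rule (with U in place of T) to the template strand written 3' to 5', which yields the RNA written 5' to 3'.

```python
# DNA template base -> RNA base added opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

template = "ATGCATGCATGCATGCATGCATGC"   # template strand, written 3'->5'
coding = "TACGTACGTACGTACGTACGTACG"     # coding strand, written 5'->3'

rna = transcribe(template)
print(rna)                              # UACGUACGUACGUACGUACGUACG
print(rna == coding.replace("T", "U"))  # True
```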
Transcription Unit And The Gene
A gene is considered the functional unit of inheritance. While genes are located on DNA, their definition in terms of DNA sequence can be complex.
- A cistron is defined as a segment of DNA that codes for a polypeptide.
- Structural genes in a transcription unit can be monocistronic (coding for a single polypeptide) or polycistronic (coding for multiple polypeptides). Polycistronic genes are common in bacteria (prokaryotes), while monocistronic genes are mostly found in eukaryotes.
In eukaryotes, structural genes are often split. This means the coding sequences are interrupted by non-coding sequences.
- Exons: These are the coding or expressed sequences that appear in the mature, functional RNA.
- Introns: These are intervening or non-coding sequences that do not appear in the mature RNA.
The presence of introns makes the definition of a gene as a simple continuous DNA segment coding for RNA complicated. The split-gene arrangement is considered an ancient feature of genomes.
Sometimes, regulatory sequences (like promoters or enhancers) that do not code for RNA or protein but affect gene expression are loosely referred to as 'regulatory genes'.
Types Of RNA And The Process Of Transcription
In bacteria, there are three main types of RNA molecules involved in protein synthesis:
- mRNA (messenger RNA): Provides the template (sequence of codons) that determines the order of amino acids in a polypeptide chain.
- tRNA (transfer RNA): Acts as an adapter molecule, carrying specific amino acids to the ribosome and reading the genetic code on the mRNA.
- rRNA (ribosomal RNA): Forms the structural component of ribosomes and also acts as a catalyst (ribozyme) during protein synthesis.
In bacteria, a single type of DNA-dependent RNA polymerase transcribes all three types of RNA.
The process of transcription in bacteria involves three steps:
- Initiation: RNA polymerase binds to the promoter sequence on the DNA. The enzyme requires an initiation factor (sigma factor, σ) to recognise and bind to the promoter. This binding unwinds the DNA helix, allowing transcription to begin.
- Elongation: RNA polymerase moves along the DNA template strand, synthesising the complementary RNA molecule in the 5' to 3' direction. Nucleoside triphosphates are used as substrates. The enzyme facilitates the opening of the DNA helix ahead and rewinding behind it.
- Termination: When RNA polymerase reaches the terminator sequence on the DNA, transcription stops. The nascent RNA molecule and the RNA polymerase enzyme are released from the DNA. In bacteria, termination can be signaled by specific sequences or involve a termination factor (rho factor, ρ).
In bacteria, transcription and translation can occur simultaneously (coupled) because there is no nucleus to separate the processes, and the mRNA does not require post-transcriptional processing to become active. Translation can begin on the mRNA even before its transcription is completed.
In eukaryotes, the process of transcription is more complex:
- Multiple RNA Polymerases: Eukaryotes have at least three different RNA polymerases in the nucleus, each responsible for transcribing specific types of RNA:
- RNA Polymerase I: Transcribes ribosomal RNAs (28S, 18S, 5.8S rRNA).
- RNA Polymerase II: Transcribes the precursor of mRNA, called heterogeneous nuclear RNA (hnRNA).
- RNA Polymerase III: Transcribes tRNA, 5S rRNA, and snRNAs (small nuclear RNAs).
- Post-transcriptional Processing: The primary RNA transcripts produced in eukaryotes (like hnRNA) are non-functional and require processing. The hnRNA contains both exons and introns.
- Splicing: Introns are removed from the hnRNA, and the exons are joined together in a specific order to form the mature mRNA. This process is carried out by a complex called the spliceosome.
- Capping: An unusual nucleotide, methyl guanosine triphosphate, is added to the 5' end of the hnRNA. This cap is important for mRNA stability and translation initiation.
- Tailing: A series of adenylate residues (200-300 A's) are added to the 3' end of the hnRNA in a template-independent manner. This poly(A) tail is also important for mRNA stability and transport.
Only the fully processed mRNA is transported out of the nucleus into the cytoplasm for translation. The split-gene arrangement and splicing are thought to be ancient features, perhaps reflecting the RNA world and potentially allowing for evolutionary flexibility through alternative splicing (different combinations of exons from a single gene can produce different proteins).
Genetic Code
While replication and transcription involve copying one nucleic acid template to another based on complementarity, translation involves converting the information encoded in a polymer of nucleotides (mRNA) into a polymer of amino acids (polypeptide/protein). There is no direct chemical complementarity between nucleotides and amino acids.
Experimental evidence showed that changes in DNA (mutations) resulted in changes in protein sequences, leading to the idea of a genetic code that dictates the sequence of amino acids during protein synthesis.
Deciphering the genetic code was a major challenge involving collaboration across different scientific fields.
- Physicist George Gamow proposed that the code must be a combination of bases. With 4 bases (A, U, G, C) and 20 amino acids, a triplet code (three nucleotides per amino acid) was suggested, as $4^3 = 64$, providing more than enough combinations. A doublet code ($4^2 = 16$) would be insufficient.
- Har Gobind Khorana developed chemical methods to synthesise RNA molecules with defined sequences (e.g., homopolymers like UUUUUU..., or copolymers with repeating units).
- Marshall Nirenberg developed a cell-free system for protein synthesis, which allowed researchers to use synthetic RNA molecules and observe which amino acids were incorporated into proteins.
- Severo Ochoa's enzyme (polynucleotide phosphorylase) was also used to synthesise RNA with defined sequences in a template-independent manner.
These contributions, among others, led to the deciphering of the complete genetic code, represented in a checkerboard format.
| First Position | Second Position: U | Second Position: C | Second Position: A | Second Position: G | Third Position |
|---|---|---|---|---|---|
| U | UUU Phe (F) | UCU Ser (S) | UAU Tyr (Y) | UGU Cys (C) | U |
| U | UUC Phe (F) | UCC Ser (S) | UAC Tyr (Y) | UGC Cys (C) | C |
| U | UUA Leu (L) | UCA Ser (S) | UAA STOP | UGA STOP | A |
| U | UUG Leu (L) | UCG Ser (S) | UAG STOP | UGG Trp (W) | G |
| C | CUU Leu (L) | CCU Pro (P) | CAU His (H) | CGU Arg (R) | U |
| C | CUC Leu (L) | CCC Pro (P) | CAC His (H) | CGC Arg (R) | C |
| C | CUA Leu (L) | CCA Pro (P) | CAA Gln (Q) | CGA Arg (R) | A |
| C | CUG Leu (L) | CCG Pro (P) | CAG Gln (Q) | CGG Arg (R) | G |
| A | AUU Ile (I) | ACU Thr (T) | AAU Asn (N) | AGU Ser (S) | U |
| A | AUC Ile (I) | ACC Thr (T) | AAC Asn (N) | AGC Ser (S) | C |
| A | AUA Ile (I) | ACA Thr (T) | AAA Lys (K) | AGA Arg (R) | A |
| A | AUG Met (M) | ACG Thr (T) | AAG Lys (K) | AGG Arg (R) | G |
| G | GUU Val (V) | GCU Ala (A) | GAU Asp (D) | GGU Gly (G) | U |
| G | GUC Val (V) | GCC Ala (A) | GAC Asp (D) | GGC Gly (G) | C |
| G | GUA Val (V) | GCA Ala (A) | GAA Glu (E) | GGA Gly (G) | A |
| G | GUG Val (V) | GCG Ala (A) | GAG Glu (E) | GGG Gly (G) | G |
Salient features of the Genetic Code:
- The code is a triplet codon: Three adjacent nucleotides on mRNA specify one amino acid.
- There are 64 codons in total.
- 61 codons code for amino acids.
- 3 codons (UAA, UAG, UGA) do not code for any amino acid and function as stop codons, signalling the termination of translation.
- The code is degenerate: Some amino acids are coded by more than one codon. For example, Leucine is coded by six different codons (UUA, UUG, CUU, CUC, CUA, CUG).
- The code is read in a contiguous manner: The codons are read sequentially without any pauses or punctuation between them.
- The code is nearly universal: The same codon specifies the same amino acid in almost all organisms, from bacteria to humans. Minor exceptions exist, for example, in mitochondrial DNA and some protozoa.
- AUG has a dual function: It codes for the amino acid Methionine (Met) and also acts as an initiator codon, signalling the start of translation.
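Several of these features can be verified by building the codon table in code and counting. The sketch below is illustrative: the names `CODON_TABLE` and `degeneracy` are chosen for this example, and the amino acid string lists the one-letter codes in the same order as the checkerboard (first, then second, then third base, each cycling through U, C, A, G), with `*` standing for a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes in checkerboard order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4 x 4 x 4 triplets to its amino acid.
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = len(CODON_TABLE) - len(stop_codons)
# How many codons specify each amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE), sense_codons, stop_codons)  # 64 61 ['UAA', 'UAG', 'UGA']
print(len(degeneracy))                              # 20
print(degeneracy["L"], degeneracy["M"], degeneracy["W"])  # 6 1 1
```

Leucine (L) is specified by six codons, while methionine (M) and tryptophan (W) have one codon each, so degeneracy is not evenly spread across amino acids.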
Mutations And Genetic Code
Studying the effects of mutations on protein sequence has provided insights into the relationship between genes (DNA) and the genetic code.
- Point mutations: Changes in a single base pair in the DNA. These can affect a single codon.
- Example: Sickle cell anemia is caused by a point mutation in the beta-globin gene. A single base substitution changes a codon from GAG (coding for Glutamic acid) to GUG (coding for Valine). This single amino acid change alters the protein's structure and function.
- Frameshift mutations: Insertions or deletions of one or two base pairs (or any number not a multiple of three) within the coding sequence. These mutations shift the 'reading frame' of the codons downstream from the site of insertion/deletion, leading to a completely different sequence of amino acids and often a non-functional protein.
- Insertions or deletions of a multiple of three bases: These insert or delete one or more codons, adding or removing amino acids without shifting the reading frame. The resulting protein may still be functional, depending on the location and number of amino acids added/removed.
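The three kinds of mutation above can be compared with a minimal sketch. The mRNA sequence used here is a short hypothetical example (not a real gene), and `translate` is an illustrative helper that simply reads triplets from the first base until a stop codon.

```python
from itertools import product

# Standard codon table in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Read codons from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

normal = "AUGGAGAAACCCUAA"                  # hypothetical: Met-Glu-Lys-Pro-stop
point = normal[:4] + "U" + normal[5:]       # GAG -> GUG (Glu -> Val)
frameshift = normal[:3] + "C" + normal[3:]  # one inserted base shifts the frame
in_frame = normal[:6] + "GGG" + normal[6:]  # three inserted bases add one codon

print(translate(normal))      # MEKP
print(translate(point))       # MVKP
print(translate(frameshift))  # MRETL
print(translate(in_frame))    # MEGKP
```

The point mutation changes a single amino acid, the single-base insertion changes every amino acid downstream of it (and here also removes the in-frame stop codon), and the three-base insertion adds one amino acid while leaving the rest of the sequence unchanged.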
tRNA – The Adapter Molecule
Francis Crick hypothesised the existence of an 'adapter molecule' that could bridge the gap between the nucleotide sequence of mRNA and the amino acid sequence of proteins. This molecule would need to be able to 'read' the codons on mRNA and also bind to specific amino acids.
The tRNA (transfer RNA) molecule serves this adapter function. It was previously known as soluble RNA (sRNA).
Key features of tRNA structure related to its function:
- It has an anticodon loop containing three bases that are complementary to the codon on the mRNA. This allows the tRNA to recognise and bind to the correct codon via base pairing.
- It has an amino acid acceptor end at the 3' end to which a specific amino acid is attached.
- There is a specific tRNA for each amino acid.
- There is a special initiator tRNA that recognises the start codon (AUG) and carries Methionine (or formyl-methionine in bacteria).
- There are no tRNAs for the stop codons.
While the 2D structure of tRNA is often depicted as a clover-leaf shape, its actual 3D structure is a compact, folded molecule resembling an inverted 'L'.
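Codon and anticodon pairing can be sketched in a few lines of Python. This is a simplified illustration that assumes strict Watson-Crick pairing (A with U, G with C) and ignores the flexible 'wobble' pairing that real tRNAs can show at the third codon position. The function name `anticodon` is an illustrative choice.

```python
# Strict base-pairing rules for RNA (an assumption; wobble pairing is ignored).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Return the anticodon (written 5'->3') for an mRNA codon given 5'->3'.

    The two strands pair antiparallel, so the codon is reversed before
    each base is replaced by its partner.
    """
    return "".join(PAIR[base] for base in reversed(codon))

print(anticodon("AUG"))  # CAU, i.e. 3'-UAC-5' when aligned under the codon
print(anticodon("GAG"))  # CUC
```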
Translation
Translation is the process of synthesising a polypeptide chain (protein) using the genetic information encoded in the nucleotide sequence of an mRNA molecule. Amino acids are polymerised in a specific order dictated by the codons on the mRNA.
Amino acids are linked together by peptide bonds. The formation of peptide bonds requires energy. This energy is supplied by ATP during the activation of amino acids.
Before translation can begin, each amino acid must be activated and attached to its specific tRNA molecule. This process is called charging of tRNA or aminoacylation of tRNA. It is catalysed by the enzyme aminoacyl-tRNA synthetase.
The cellular machinery responsible for protein synthesis is the ribosome. Ribosomes are composed of ribosomal RNA (rRNA) and many proteins. In their inactive state, ribosomes exist as two subunits: a large subunit and a small subunit. These subunits come together and bind to the mRNA to initiate translation.
During translation, the ribosome moves along the mRNA, reading the codons. The large ribosomal subunit has different binding sites for tRNA molecules, bringing adjacent amino acids close enough for peptide bond formation.
The formation of the peptide bond itself is catalysed by a ribozyme (an RNA enzyme) which is part of the large ribosomal subunit (e.g., the 23S rRNA in bacteria).
A translational unit in mRNA is the sequence region that is translated into a polypeptide. It is defined by the presence of a start codon (AUG) at the beginning and a stop codon (UAA, UAG, or UGA) at the end of the coding sequence.
mRNA molecules also contain Untranslated Regions (UTRs) at both the 5' end (before the start codon) and the 3' end (after the stop codon). These regions are not translated into amino acids but are important for the efficiency and regulation of the translation process.
Steps of translation:
- Initiation: The small ribosomal subunit binds to the mRNA, typically at the start codon (AUG). The initiator tRNA, carrying Methionine (or formyl-Methionine in prokaryotes), binds to the start codon. The large ribosomal subunit then joins the complex.
- Elongation: The ribosome moves along the mRNA codon by codon. Charged tRNAs carrying the appropriate amino acids arrive at the ribosome and bind to their complementary codons on the mRNA. Peptide bonds form between consecutive amino acids, creating the growing polypeptide chain. The tRNAs that have delivered their amino acids are released.
- Termination: When the ribosome reaches a stop codon on the mRNA, there is no corresponding tRNA. A protein called a release factor binds to the stop codon. This causes the ribosome subunits to dissociate from the mRNA and releases the completed polypeptide chain.
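The three steps above, together with the idea of a translational unit flanked by UTRs, can be sketched as a small Python function. This is a deliberately simplified model: it assumes translation starts at the first AUG in the mRNA and ends at the first in-frame stop codon, and it ignores ribosome binding signals, tRNA charging and release factors. The example mRNA is hypothetical and the function name `translate_mrna` is illustrative.

```python
from itertools import product

# Standard codon table in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}

def translate_mrna(mrna):
    """Return the 5' UTR, polypeptide and 3' UTR of an mRNA, or None."""
    start = mrna.find("AUG")          # initiation: locate the start codon
    if start == -1:
        return None                   # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # elongation: codon by codon
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":                 # termination: stop codon reached
            return {
                "five_prime_utr": mrna[:start],
                "protein": "".join(protein),
                "three_prime_utr": mrna[i + 3:],
            }
        protein.append(aa)
    return None                       # no in-frame stop codon was found

result = translate_mrna("GGACAUGGCUUGGUAACCGU")  # hypothetical mRNA
print(result)
```

For this input the translational unit is `AUG GCU UGG UAA`, giving the polypeptide `MAW`, with `GGAC` as the 5' UTR and `CCGU` as the 3' UTR.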
The fact that critical steps in translation, such as peptide bond formation, are catalysed by rRNA (ribozyme) supports the idea that RNA played a central role in early life and that the process of translation evolved around RNA.
Regulation Of Gene Expression
Regulation of gene expression refers to the control of which genes are transcribed and translated, and at what rate, allowing cells to produce specific proteins only when and where they are needed. In eukaryotes, this regulation can be exerted at multiple levels:
- Transcriptional level: Controlling the rate of primary transcript formation.
- Processing level: Regulating the splicing of hnRNA to form mature mRNA.
- Transport level: Controlling the movement of mRNA from the nucleus to the cytoplasm.
- Translational level: Affecting the rate or efficiency of protein synthesis from mRNA.
Gene expression is regulated in response to metabolic, physiological, or environmental conditions. For example, bacteria that can use lactose as an energy source will only produce the enzyme beta-galactosidase (needed to break down lactose) when lactose is present in their environment.
Regulation of gene expression is also fundamental to the processes of development and differentiation in multicellular organisms, where different sets of genes are activated or silenced in specific cell types and at specific times.
In prokaryotes, the primary level of gene expression control is often at the transcriptional initiation stage. The binding and activity of RNA polymerase at the promoter are regulated by interactions with accessory proteins (activators or repressors). These proteins can influence RNA polymerase's ability to recognise and initiate transcription from a promoter.
In many prokaryotic genes, the promoter's accessibility is regulated by interactions between proteins and specific DNA sequences called operators. The operator is typically located adjacent to the promoter. Repressor proteins can bind to the operator, blocking RNA polymerase from transcribing the downstream structural genes.
Genes involved in the same metabolic pathway are often organised together in functional units called operons in bacteria. Each operon usually has a specific operator sequence that interacts with a specific regulatory protein (repressor or activator) for that operon.
The Lac Operon
The lac operon in E. coli is a classic example of a transcriptionally regulated system, first elucidated by François Jacob and Jacques Monod. It controls the genes necessary for the metabolism of lactose.
The lac operon is a polycistronic transcription unit, meaning it contains multiple structural genes regulated by a common promoter and regulatory region.
Components of the lac operon:
- i gene (regulatory gene): Codes for the repressor protein of the lac operon. The 'i' stands for inhibitor. This gene is constitutively expressed (always producing repressor protein).
- Promoter (p): Binding site for RNA polymerase.
- Operator (o): Binding site for the repressor protein. Located adjacent to the promoter.
- Structural genes: Three genes involved in lactose metabolism:
- z gene: Codes for beta-galactosidase (β-gal), which hydrolyses lactose into glucose and galactose.
- y gene: Codes for permease, which increases the permeability of the bacterial cell membrane to lactose.
- a gene: Codes for transacetylase, whose function is less clear in lactose metabolism.
Regulation of the lac operon by lactose:
- In the absence of lactose: The repressor protein produced by the i gene is active and binds to the operator region. This physically blocks RNA polymerase from binding to the promoter and transcribing the structural genes (z, y, a). The operon is switched off.
- In the presence of lactose: Lactose (or a related molecule called allolactose, formed from lactose) acts as an inducer. The inducer binds to the repressor protein, causing a conformational change that inactivates the repressor. The inactivated repressor cannot bind to the operator. RNA polymerase is then free to bind to the promoter and transcribe the structural genes. The operon is switched on.
Thus, lactose regulates its own metabolism by acting as an inducer that switches on the lac operon. A low level of permease is always present to allow initial entry of lactose.
The regulation of the lac operon by the repressor is an example of negative regulation (a repressor protein inhibits transcription).
The lac operon is also subject to positive regulation involving other proteins and glucose levels, ensuring lactose is only used when glucose (the preferred energy source) is unavailable.
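The switching behaviour described above can be summarised as a toy Boolean model in Python. This is only a sketch: the three output labels ("off", "low", "high") and the function name `lac_operon_state` are assumptions made for illustration, and the glucose effect is reduced to a single yes/no input rather than modelling the proteins involved in positive regulation.

```python
def lac_operon_state(lactose_present, glucose_present):
    """Toy model of lac operon expression (illustrative labels only)."""
    # Negative regulation: without the inducer, the repressor sits on the operator.
    repressor_on_operator = not lactose_present
    if repressor_on_operator:
        return "off"
    # Positive regulation (simplified assumption): strong expression
    # happens only when the preferred energy source, glucose, is absent.
    return "low" if glucose_present else "high"

for lactose in (False, True):
    for glucose in (False, True):
        print(lactose, glucose, lac_operon_state(lactose, glucose))
```

The model captures the two layers of control: lactose removes the block imposed by the repressor, and the absence of glucose is needed for the operon to be fully used.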
Human Genome Project
The understanding that an organism's genetic information is encoded in the sequence of bases in its DNA led to the ambitious undertaking of determining the complete DNA sequence of the human genome. Variations in DNA sequences between individuals are responsible for their phenotypic differences.
Advances in genetic engineering techniques (like DNA isolation and cloning) and rapid DNA sequencing methods made this project feasible. The Human Genome Project (HGP) was launched in 1990 with the goal of sequencing the entire human genome.
HGP was referred to as a mega project due to its immense scale and scope.
- Estimated size of the human genome: Approximately $3 \times 10^9$ base pairs.
- Initial estimated cost: US $\$ 3$ per base pair, totalling approximately US $\$ 9$ billion.
- The sheer volume of data generated ($3 \times 10^9$ bases) required massive computational resources for storage, retrieval, and analysis. This spurred the development of Bioinformatics, a new field combining biology and computer science.
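The scale of these figures is easy to verify with a few lines of Python. The cost calculation uses only the numbers quoted above. The storage estimate is an added assumption for illustration: it supposes each base (A, T, G or C) is stored in 2 bits, which is a lower bound that ignores quality scores, annotations and repeated reads.

```python
genome_size_bp = 3 * 10**9   # approximate size of the human genome
cost_per_bp_usd = 3          # initial estimated cost per base pair

total_cost_usd = genome_size_bp * cost_per_bp_usd
print(f"Estimated cost: US ${total_cost_usd:,}")      # 9,000,000,000

# Assumption: 4 possible bases fit in 2 bits each (8 bits = 1 byte).
bits_per_base = 2
raw_bytes = genome_size_bp * bits_per_base // 8
print(f"Minimum raw storage: {raw_bytes:,} bytes")    # 750,000,000
```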
Goals Of HGP
The main goals of the Human Genome Project were:
- To identify and map all of the genes in the human DNA (estimated to be around 20,000-25,000).
- To determine the complete sequence of the approximately 3 billion base pairs that make up the human genome.
- To store this vast amount of information in publicly accessible databases.
- To develop and improve tools and software for analysing genomic data.
- To transfer related technologies to industry and other sectors.
- To address the Ethical, Legal, and Social Issues (ELSI) that might arise from genomic research.
HGP was primarily coordinated by the U.S. Department of Energy and the National Institutes of Health, with significant international contributions (UK, Japan, France, Germany, China, etc.). The project was completed in 2003.
Knowledge from the HGP is expected to revolutionise healthcare by providing new insights into diseases, potentially leading to improved diagnosis, treatment, and prevention. Sequencing the genomes of non-human model organisms also aids in understanding basic biological processes and applying that knowledge to areas like agriculture, energy, and environmental science.
Methodologies
Two main approaches were used in the HGP:
- Expressed Sequence Tags (ESTs): This approach focused on identifying and sequencing only the genes that are actively expressed as RNA molecules. It provided a shortcut to identify potentially protein-coding regions.
- Sequence Annotation: This 'blind' approach sequenced the entire genome, including both coding and non-coding regions, and then assigned functions to different regions of the sequence through annotation. It provided the complete genomic sequence.
Steps involved in sequencing the genome:
- Total DNA was isolated from a cell.
- Because DNA is a very long molecule, it was fragmented into smaller, manageable pieces using restriction enzymes.
- These fragments were cloned into suitable hosts (bacteria or yeast) using specialised vectors like Bacterial Artificial Chromosomes (BACs) or Yeast Artificial Chromosomes (YACs). Cloning amplified the DNA fragments, making sequencing easier.
- The DNA fragments were sequenced using automated DNA sequencers based on the method developed by Frederick Sanger (who also determined protein amino acid sequences).
- The sequences of the overlapping fragments were then assembled into contiguous sequences. This required sophisticated computer programs and algorithms.
- Finally, these assembled sequences were annotated (assigned functions) and mapped to specific chromosomes. Chromosome 1, being the largest, was the last to be completely sequenced (in 2006).
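The assembly step, in which overlapping fragments are joined into a contiguous sequence, can be illustrated with a toy greedy algorithm in Python. This is a minimal sketch, not the software used in the HGP: it repeatedly merges the pair of fragments with the longest suffix-prefix overlap, and it assumes error-free reads from one strand with no repeats. The fragment sequences and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Merge fragments pairwise, always taking the largest overlap first."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags

reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]  # hypothetical overlapping fragments
contigs = greedy_assemble(reads)
print(contigs)  # ['ATGCGTACGTTAGC']
```

Real genomes contain long repeated sequences, which make overlaps ambiguous. This is one reason assembly needed sophisticated programs and mapping information rather than a simple greedy merge.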
Mapping the genome also involved using information about polymorphisms, such as variations in restriction enzyme recognition sites and repetitive DNA sequences (like microsatellites).
Salient Features Of Human Genome
Key findings from the Human Genome Project:
- The human genome contains approximately 3164.7 million base pairs.
- The average gene size is about 3000 bases, but gene sizes vary significantly (e.g., dystrophin gene is 2.4 million bases).
- The estimated number of genes is significantly lower than initially thought, around 30,000 (previous estimates were 80,000-140,000).
- Remarkably, 99.9% of the nucleotide bases are exactly the same in all humans. The differences account for human variation.
- The functions of over 50% of the identified genes are still unknown.
- Less than 2% of the human genome actually codes for proteins.
- A very large portion of the genome consists of repeated sequences (repetitive DNA). These are stretches of DNA repeated many times and generally do not code for proteins. They are important for chromosome structure, dynamics, and evolution.
- Chromosome 1 has the most genes (2968), while the Y chromosome has the fewest (231).
- Scientists have identified about 1.4 million locations with single base differences among individuals, known as Single Nucleotide Polymorphisms (SNPs or 'snips'). These SNPs are valuable markers for studying disease susceptibility and human population history.
Applications And Future Challenges
The complete sequencing of the human genome has opened up a new era of biological research. Instead of studying genes one by one, researchers can now study genes and proteins on a much broader, systemic scale (e.g., studying all gene transcripts in a tissue, or how interconnected networks of genes and proteins function). This is leading to a more holistic understanding of biological systems.
Future challenges lie in extracting meaningful biological knowledge from the vast amount of sequence data and applying it to understand complex biological systems and solve problems in health and other fields. This will require continued efforts from scientists across disciplines globally.
DNA Fingerprinting
Given that $99.9\%$ of the human genome sequence is identical across individuals ($3 \times 10^9$ bp total, so differences in roughly $0.1\%$ or $3 \times 10^6$ bp), sequencing the entire genome to identify differences between two individuals is impractical and expensive.
DNA fingerprinting is a technique that offers a quick and efficient way to compare specific DNA sequences between individuals. It focuses on identifying differences in particular regions of the DNA that are highly variable among people.
The basis of DNA fingerprinting is the presence of repetitive DNA sequences in the genome. These are short DNA sequences that are repeated many times in tandem within the genome. Repetitive DNA can be separated from the main bulk of genomic DNA by density gradient centrifugation, forming smaller peaks known as satellite DNA.
Satellite DNA is further classified based on factors like base composition (A-T rich or G-C rich), length of the repeating segment, and the number of repeats (e.g., microsatellites, mini-satellites).
These repetitive sequences typically do not code for proteins but constitute a significant portion of the human genome. They exhibit a high degree of polymorphism (variation at the genetic level), which is the foundation of DNA fingerprinting.
DNA Polymorphism refers to variations in DNA sequences that occur at a relatively high frequency (greater than 0.01) in a population. These variations usually arise from mutations, particularly in non-coding regions, where they are less likely to affect an individual's survival or reproductive ability. Such mutations accumulate over generations, leading to variability.
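The frequency criterion can be made concrete with a short Python sketch. The allele counts below are invented for illustration, and the helper names `variant_frequencies` and `common_variants` are hypothetical. A variant is reported only if its frequency in the sample exceeds 0.01.

```python
def variant_frequencies(allele_counts):
    """Convert raw allele counts at one locus into frequencies."""
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}

def common_variants(allele_counts, threshold=0.01):
    """Return the variants whose frequency is greater than the threshold."""
    freqs = variant_frequencies(allele_counts)
    return sorted(a for a, f in freqs.items() if f > threshold)

# Hypothetical counts from 1000 people (2000 chromosomes) at a single position.
counts = {"A": 1960, "G": 36, "T": 4}
print(variant_frequencies(counts))  # A: 0.98, G: 0.018, T: 0.002
print(common_variants(counts))      # ['A', 'G']
```

In this example the G variant (frequency 0.018) qualifies as a polymorphism, while the T variant (frequency 0.002) is too rare to count as one.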
Since the DNA in all cells of an individual (blood, hair, skin, etc.) has the same genetic makeup and thus the same pattern of polymorphisms, DNA fingerprinting is a powerful tool for identity determination in forensic science and paternity testing.
The technique of DNA fingerprinting was initially developed by Alec Jeffreys. He used a type of satellite DNA called Variable Number of Tandem Repeats (VNTRs) as a probe because VNTRs show a very high degree of polymorphism (different individuals have different numbers of repeats at specific locations).
The original DNA fingerprinting technique (based on Southern blotting) involved several steps:
- Isolation of DNA: Extracting DNA from a sample (e.g., blood, hair, semen).
- Digestion of DNA: Cutting the DNA into fragments using restriction enzymes.
- Separation of DNA fragments: Separating the fragments by size using gel electrophoresis. Smaller fragments move faster through the gel.
- Transferring (Blotting): Transferring the separated DNA fragments from the gel onto a synthetic membrane (like nitrocellulose or nylon).
- Hybridisation: Incubating the membrane with a labelled VNTR probe (a synthetic DNA sequence complementary to the VNTR regions, often made radioactive initially). The probe binds only to the fragments containing VNTR sequences.
- Detection: Detecting the bound probe using autoradiography (exposing the membrane to X-ray film). This produces a pattern of bands corresponding to the different-sized DNA fragments that hybridised with the probe.
Since the number of VNTR repeats at a particular location varies between individuals (except identical twins), the lengths of the restriction fragments containing VNTRs also vary. This results in a unique pattern of bands for each individual when detected by the VNTR probe. This banding pattern is the 'DNA fingerprint'.
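The link between repeat number, fragment length and band pattern can be sketched in Python. Everything in this example is hypothetical: the locus names, the repeat counts, the 16-base repeat unit and the 200-base flanking length are invented values chosen only to show the logic. Each person carries two alleles per locus, one inherited from each parent, which is what makes the pattern useful for paternity testing.

```python
def band_sizes(profile, repeat_unit_len=16, flank_len=200):
    """Fragment lengths (bp) per locus: flanking DNA plus the tandem repeats."""
    return {
        locus: sorted(flank_len + n * repeat_unit_len for n in alleles)
        for locus, alleles in profile.items()
    }

def consistent_with_parents(child, mother, father):
    """True if, at every locus, one child allele matches each parent."""
    for locus, (c1, c2) in child.items():
        m, f = mother[locus], father[locus]
        if not ((c1 in m and c2 in f) or (c2 in m and c1 in f)):
            return False
    return True

# Hypothetical VNTR repeat counts (two alleles per locus).
mother = {"locus1": (12, 15), "locus2": (7, 9)}
man_a = {"locus1": (10, 18), "locus2": (8, 11)}
man_b = {"locus1": (11, 14), "locus2": (8, 9)}
child = {"locus1": (15, 18), "locus2": (7, 11)}

print(band_sizes(child))  # {'locus1': [440, 488], 'locus2': [312, 376]}
print(consistent_with_parents(child, mother, man_a))  # True
print(consistent_with_parents(child, mother, man_b))  # False
```

Actual casework uses many loci and statistical analysis of how common each allele is. This sketch only shows why differing repeat numbers produce differing band patterns.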
The sensitivity of the technique has been enhanced with the advent of the Polymerase Chain Reaction (PCR), which allows amplification of minute amounts of DNA. Now, DNA from a single cell is sufficient for fingerprinting analysis.
Besides forensic applications (identifying suspects at crime scenes, settling paternity disputes), DNA fingerprinting is used in studies of genetic diversity, evolution, and tracking populations.
Exercises
Question 1. Group the following as nitrogenous bases and nucleosides: Adenine, Cytidine, Thymine, Guanosine, Uracil and Cytosine.
Answer:
Question 2. If a double stranded DNA has $20\%$ of cytosine, calculate the per cent of adenine in the DNA.
Answer:
Question 3. If the sequence of one strand of DNA is written as follows:
$5'-\text{ATGCATGCATGCATGCATGCATGCATGC}-3'$
Write down the sequence of complementary strand in $5' \rightarrow 3'$ direction.
Answer:
Question 4. If the sequence of the coding strand in a transcription unit is written as follows:
$5'-\text{ATGCATGCATGCATGCATGCATGCATGC}-3'$
Write down the sequence of mRNA.
Answer:
Question 5. Which property of DNA double helix led Watson and Crick to hypothesise semi-conservative mode of DNA replication? Explain.
Answer:
Question 6. Depending upon the chemical nature of the template (DNA or RNA) and the nature of nucleic acids synthesised from it (DNA or RNA), list the types of nucleic acid polymerases.
Answer:
Question 7. How did Hershey and Chase differentiate between DNA and protein in their experiment while proving that DNA is the genetic material?
Answer:
Question 8. Differentiate between the following:
(a) Repetitive DNA and Satellite DNA
(b) mRNA and tRNA
(c) Template strand and Coding strand
Answer:
Question 9. List two essential roles of ribosome during translation.
Answer:
Question 10. In the medium where E. coli was growing, lactose was added, which induced the lac operon. Then, why does the lac operon shut down some time after addition of lactose in the medium?
Answer:
Question 11. Explain (in one or two lines) the function of the following:
(a) Promoter
(b) tRNA
(c) Exons
Answer:
Question 12. Why is the Human Genome project called a mega project?
Answer:
Question 13. What is DNA fingerprinting? Mention its application.
Answer:
Question 14. Briefly describe the following:
(a) Transcription
(b) Polymorphism
(c) Translation
(d) Bioinformatics
Answer: