Showing posts with label DNA strand asymmetry. Show all posts

Why Mitochondrial DNA is Different

Most genomes that are high in A+T content (or low in G+C content) show a surprising DNA strand asymmetry: The message strand of genes tends to be rich in purines. This rule applies across all domains I've looked at except mitochondria, where message strands tend to be pyrimidine-rich rather than purine-rich. The following two graphs make this clearer.


This is a graph of message-strand (or RNA-synonymous-strand) purine content plotted vertically, against A+T plotted horizontally, for 1,373 bacterial species. Each dot represents a genome. High-GC/low-AT organisms like Streptomyces and Bordetella are on the left and low-GC/high-AT organisms like Clostridium botulinum are toward the right. The few dots on the far right are intracellular endosymbionts that have lost a good bit of DNA over the millennia. They tend to be extremely high in A+T.
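For readers who want to try this at home, here is a minimal Python sketch of how one point on such a graph could be computed. It is an illustration, not the script used to make these graphs: it assumes you already have a genome's coding sequences as plain strings written in the message-strand (RNA-synonymous) orientation, and the function name and toy sequences are made up for the example.

```python
def base_fractions(coding_sequences):
    """Return (A+T fraction, A+G fraction) pooled over message-strand sequences."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for seq in coding_sequences:
        for base in seq.upper():
            if base in counts:  # ignore ambiguity codes such as N
                counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    at_fraction = (counts["A"] + counts["T"]) / total  # x-axis value
    ag_fraction = (counts["A"] + counts["G"]) / total  # y-axis value
    return at_fraction, ag_fraction


# Two toy "genes" (hypothetical, not real sequences)
toy_genes = ["ATGAAAGGAGAATAA", "ATGGCTTTCCTGTAA"]
at, ag = base_fractions(toy_genes)
print(f"A+T = {at:.3f}, message-strand A+G = {ag:.3f}")
```

A value of A+G above 0.5 means the message strand is purine-rich, and a value below 0.5 means it is pyrimidine-rich.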

Compare the above graph with the graph below, which is the same thing (message-strand A+G vs. A+T) for mitochondrial DNA (N=2543 genomes). There is still an upward slope to the data (and in fact it is steeper than it looks, because the range of y-values is different in the graph below than in the graph above). The slope of the regression line is very nearly the same (0.148 vs. 0.149) for both graphs. But you can see that in the graph below, nearly all the points are below y = 0.50. That means message-strands are high in pyrimidines rather than purines.



I speculated in a previous post that the reason mitochondrial DNA is pyrimidine-heavy on the message strand is that mtDNA encodes a very small number of proteins (13, in all), and they tend to be membrane-associated proteins, which use mostly non-polar amino acids. It turns out that codons for the non-polar amino acids are pyrimidine-rich.

To see if that's really what's going on, I obtained the DNA sequences for cytochrome-c oxidase and NADH dehydrogenase (the two most fundamental enzyme systems of mitochondria) from several hundred bacterial species. Actually, I was able to obtain DNA sequences for a total of 942 bacterial NADH dehydrogenase (subunit L) genes. I also succeeded in obtaining DNA sequences for 647 bacterial cytochrome-c oxidase subunit 1 genes. In mitochondria, these genes are known as ND5 and Cox1. In bacteria they're better known as nuoL and cyoB.

The graph below shows A+G for the two enzymes versus whole-chromosome A+T, for the relevant organisms.

Message strand purine content was derived from the DNA sequences of cyoB (pink) genes from 942 bacteria, and from nuoL (blue) genes from 647 bacterial species. The A+G values were plotted against host-organism whole-genome A+T content. All cyoB and nuoL sequences tended to be pyrimidine rich. But pyrimidine content was less for organisms with high A+T content. (Note the slightly positive slope of the regression line.)

The pink points are for cytochrome-c oxidase subunit 1 (cyoB) while the blue points are for NADH dehydrogenase subunit 5 (nuoL). Two things are worth noting. One is that the regression line is upward-sloping, meaning that as an organism's DNA gets richer in A+T content, the purine content on the message strand rises. This effect seems to be universal. The second thing to note is that almost all of the points in the graph lie below y = 0.5, as is the case for mitochondria. These two signature "mitochondrial" enzyme systems, critical to oxidative phosphorylation (in bacteria as well as higher organisms), do tend to use pyrimidine-rich codons, which makes the relevant genes pyrimidine-rich on the RNA-synonymous (message) strand of DNA. The hypothesis is upheld.

For you bio students, a bit of homework: You might want to think about why it is that membrane-associated proteins are rich in non-polar amino acids. (In human mitochondria, leucine and isoleucine are the most-used amino acids. Together they account for an amazing 30% of all amino acids used in mtDNA-encoded gene products.) Hint: Most membranes have a lipid bilayer, and lipids don't like water.

More about Mitochondrial DNA

To recap my desktop-science experiments of the last month or so, I've found strandwise DNA asymmetry across domains, which is to say in bacteria, Archaea, eukaryotes, viruses, and mitochondrial DNA. In every case except mitochondria, the message (or RNA-synonymous) strand of DNA in coding regions tends to be purine-rich. The opposite strand tends to be pyrimidine-rich. Moreover, in all domains, including mitochondria, message-strand purine content increases in proportion to genome A+T content. (A+T content is a phylogenetic signature. Some genomes are inherently high in A+T content—or low in G+C content—while others are not. Related organisms tend to have similar A+T or G+C contents.)

Mitochondrial genes tend to be pyrimidine-rich on the message strand, seemingly in violation of the finding that in all other domains, message strands are purine-rich. The mitochondrial anomaly is actually very easy to understand (although it took me weeks to realize the explanation). In a nutshell: Mitochondrial DNA is pyrimidine-rich on message strands because mtDNA encodes only a few proteins (13, usually), all of them membrane-associated. Membrane-associated proteins are unusual because they tend to incorporate mostly non-polar amino acids such as leucine, isoleucine, valine, proline, alanine, or phenylalanine, all of which are specified by pyrimidine-rich codons.
The mitochondrion.
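
The claim about codons can be checked against the genetic code directly. The sketch below is a minimal Python illustration with two simplifying assumptions: it uses the standard genetic code (NCBI translation table 1) rather than a mitochondrial code, and it averages over all synonymous codons without weighting by real codon usage. The function name is made up for the example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_purine_fraction(amino_acid):
    """Fraction of A/G bases across all codons for a one-letter amino acid."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    purines = sum(base in "AG" for codon in codons for base in codon)
    return purines / (3 * len(codons))


# Non-polar residues named in the text, then two charged residues for contrast
for aa in "LIVPAF" + "KE":
    print(aa, round(codon_purine_fraction(aa), 2))
```

Under these assumptions, none of the six non-polar amino acids exceeds a purine fraction of one half (phenylalanine's codons contain no purines at all), whereas the codons for lysine and glutamate consist entirely of purines.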

It seems to me mitochondrial DNA shouldn't be thought of as a genome, because well over 90% of mitochondrial-associated gene products are encoded by genes in the host nucleus. (In humans, there may be as many as 1500 nuclear-encoded mitochondrial genes.) This point is worth repeating, so let me quote Patrick Chinnery, TRENDS in Genetics (2003) 19:2, 60:

The vast majority of mitochondrial proteins (estimated at >1000) are synthesized in the cytosol from nuclear gene transcripts.

The circular mitochondrial "chromosome" (if it can be called that) is the vestigial remnant of a much larger genome that long ago migrated to the host nucleus, no doubt to avoid oxidative attack. The mitochondrion simply is not a safe place to store DNA. (Would you set up a sperm bank in a rocket-fuel factory?) It's teeming with molecular oxygen, superoxides, peroxides, free protons, and other hazardous materials.

The human mitochondrial chromosome.

Human mitochondrial DNA (which is typical of a lot of mtDNA) encodes just a handful of multi-subunit transmembrane proteins, namely: cytochrome-c oxidase, NADH dehydrogenase, cytochrome-b, and an ATPase. That's it. There are no other protein genes in human mtDNA. All other "mitochondrial proteins" are encoded somewhere else. (That includes 37 out of 44 subunits of the NADH dehydrogenase complex; the DNA polymerase that replicates mitochondrial DNA; the mitochondrial RNA polymerase; about 50 ribosomal proteins; so-called "mitochondrial" catalase; and hundreds of other "mitochondrial" proteins. All are encoded in the nucleus.)

Bottom line: Mitochondrial DNA encodes a very small ensemble of highly specialized membrane-associated proteins. We shouldn't expect this small ensemble to be representative of other genes found in other genomes. (And it's not.) That, in a nutshell, is why mtDNA is not particularly purine-rich in message strands.

But we should test this hypothesis, if possible. (And it is, in fact, possible.) Most bacteria are aerobic, which means most bacterial species have genes for cytochrome-c oxidase, NADH dehydrogenase, etc. The DNA for those genes should be similar to mtDNA with respect to strand-asymmetric purine content. If we analyze bacterial DNA, we should find that genes for cytochrome-c oxidase, NADH dehydrogenase, etc. are pyrimidine-rich on the message strand, just as in mtDNA.

In tomorrow's post: the data.

Shedding Light on DNA Strand Asymmetry

In 1950, Erwin Chargaff was the first to report that the amount of adenine (A) in DNA equals the amount of thymine (T), and the amount of guanine (G) equals the amount of cytosine (C). This result was instrumental in helping Watson and Crick (and Rosalind Franklin) determine the structure of DNA.

It's pretty easy to understand that every A on one strand of DNA pairs with a T on the other strand (and every G pairs with an opposite-strand C); this explains DNA complementarity and the associated replication model. But somewhere along the line, Chargaff was credited with the much less obvious rule that A = T and G = C even for individual strands of DNA that aren't paired with anything. This is the so-called second parity rule attributed to Chargaff, although I can't find any record of Chargaff himself having postulated such a rule. The Chargaff papers that are so often cited as supporting this rule (in particular the 3-paper series culminating in this report in PNAS) do not, in fact, offer such a rule. If you read the papers carefully, what Chargaff and colleagues actually found was that one strand of DNA is heavier than the other (they label the strands 'H' and 'L', for Heavy and Light). Not only that, but Chargaff et al. reported a consistent difference in purine content between strands (see Table 1 of this paper).

When I interviewed Linus Pauling in 1977, he cautioned me to always read the Results section of a paper carefully, because people will often conclude something entirely different than what the Results actually showed, or cite a paper as showing "ABC" when the data actually showed "XYZ."

How right he was.

At any rate, it turns out that the "message" strand of a gene hardly ever contains equal amounts of purines and pyrimidines. Codon analysis reveals that as genes become richer in A+T content (or as G+C content goes down), the excess of purines on the message strand becomes larger and larger. This is depicted in the following graph, which shows message-strand purine content (A+G) plotted against A+T content, for 1,373 distinct bacterial species. (No species is represented twice.)

Codon analysis reveals that as A+T content increases, message-strand purine content (A+G) increases. Each point on this graph represents a unique bacterial species (N=1373).

It's quite obvious that when A+T content is above approximately 33%, as it is for most bacterial species, the message strand tends to be comparatively purine-rich. Below A+T = 33%, the message strand becomes more pyrimidine-rich than purine-rich. (Note: In bacteria, where most of the DNA is in coding regions, codon-derived A+T content is very close to whole-genome A+T content. I checked the 1,373 species graphed here and found whole-chromosome A+T to differ from codon-derived A+T by an average of less than 7 parts in 10,000.)

The correlation between A+T and purine content is strong (r=0.85). Still, you can see that quite a few points have drifted far from the regression line, especially in the region of x = 0.5 to x = 0.7, where lots of points lie above y = 0.55. What's going on with those organisms? I decided to do some investigating.
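
For anyone wanting to reproduce this kind of statistic, the slope of the regression line and the correlation coefficient r can both be obtained from an ordinary least-squares fit. The following is a minimal sketch in plain Python, assuming each genome has already been reduced to an (A+T, A+G) pair. The three data points are made up to show the mechanics and are not taken from the graph.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope, intercept, and Pearson r for paired values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical (A+T, A+G) values, chosen to lie exactly on a line
at_values = [0.30, 0.50, 0.70]
ag_values = [0.49, 0.52, 0.55]
slope, intercept, r = linear_fit(at_values, ag_values)
print(slope, intercept, r)
```

A genome's vertical distance from the line is then its A+G value minus `slope * at + intercept`, which is one way to put a number on how far a point has drifted.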

First, some basics. Over time, transition mutations (AT↔GC) can change an organism's A+T content and thus move it along the x-axis of the graph, but transitions cannot move an organism higher or lower on the graph, because (by definition) transitions don't affect the strandwise purine balance.

Transversions, on the other hand, can affect strandwise purine balance (in theory, at least), but only if they occur more often on one strand of DNA than the other. (I should say: occur more often, or are fixed more often, on one strand versus the other.) For example, let's say G-to-T transversions are the most common kind of transversion (which is probably true, given that guanine is the most easily oxidized of the four bases and given the fact that failure to repair 8-oxoguanine lesions does lead to eventual replacement with thymine). And let's say G-to-T transversions are most likely to occur on the non-transcribed strand of DNA, at transcription time. (The non-transcribed strand is uncoiled and unprotected while transcription is taking place on the other strand.) Over time, the non-transcribed strand would lose guanines; they'd be replaced by thymines. The message strand, or RNA-synonymous strand (which is also the non-transcribed strand) would become pyrimidine-rich and the other strand would become purine-rich.
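
The bookkeeping behind the last two paragraphs can be made concrete with a toy example. The sketch below is only an illustration on a made-up nine-base sequence. It applies one transition (G to A) and one transversion (G to T) at the same position on the message strand, then compares purine fractions on both strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def purine_fraction(seq):
    """Fraction of bases in seq that are A or G."""
    return sum(base in "AG" for base in seq) / len(seq)


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mutate(seq, position, new_base):
    """Return seq with the base at a 0-based position replaced."""
    return seq[:position] + new_base + seq[position + 1:]


message = "ATGGCGTAA"                    # toy message-strand sequence
transition = mutate(message, 2, "A")     # G -> A: purine replaced by purine
transversion = mutate(message, 2, "T")   # G -> T: purine replaced by pyrimidine

for label, seq in [("original", message),
                   ("transition", transition),
                   ("transversion", transversion)]:
    print(label,
          round(purine_fraction(seq), 3),
          round(purine_fraction(reverse_complement(seq)), 3))
```

The transition leaves both strands' purine fractions where they were, while the G-to-T transversion lowers the message strand's purine fraction and raises the opposite strand's by the same amount, since the two fractions always sum to 1.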

Unfortunately, while that's exactly what happens for organisms with A+T content below 33%, precisely the opposite happens (purines accumulate on the message strand) in organisms with A+T above 33%. And in fact, in some high-AT organisms, the purine content of message strands is rather extreme. How can we explain that?

One possibility is that some organisms have evolved extremely effective transversion repair systems for the message (non-transcribed) strand of genes—systems that are so effective, no G-to-T transversions go unrepaired on the message strand. The transcribed strand, on the other hand, doesn't get the benefit of this repair system, possibly because the repair enzymes can't access the strand: it's engulfed in transcription factors, topoisomerases, RNA polymerase, nearby ribosomal machinery, etc.

If the non-transcribed strand never mutates (because all mutations are swiftly repaired), then the transcribed strand will (in the absence of equally effective repairs) eventually accumulate G-to-T mutations, and the message strand will accumulate adenines (purines). Perhaps.

In the graph further above, you'll notice at x = 0.6 a tiny spur of points hangs down at around y = 0.5. These points belong to some Bartonella species, plus a Parachlamydia and another chlamydial organism. These are endosymbionts that have lost a good portion of their genomes over time. It seems likely they've lost some transversion-repair machinery. During transcription, their message strands are going unrepaired. G-to-T transversions happen on the message strand, rendering it light in purines. Such a scenario seems plausible, at least.

By this reasoning, maybe points far above the regression line represent organisms that have gained repair functionality, such that their message strands never undergo G-to-T transversions (although their transcribed strands do). Is this possible?

Examination of the highest points on the graph shows a predominance of Clostridia. (Not just members of the genus Clostridium, but the class Clostridia, which is a large, ancient, and diverse class of anaerobes.) One thing we know about the Clostridia is that unlike all other bacteria (unlike members of the Gammaproteobacteria, the Alpha- and Betaproteobacteria, the Actinomycetes, the Bacteroidetes, etc.), the Clostridia have Ogg1, otherwise known as 8-oxoguanine glycosylase (which specifically prevents G-to-T transversions). They share this capability with all members of the Archaea, and all higher life forms as well.

Note that while non-Ogg1 enzymes exist for correcting 8-oxoguanine lesions (e.g., MutM, MutY, mfd), there is evidence that Ogg1 is specifically involved in repair of 8oxoG lesions in non-transcribed strands of DNA, at transcription time. (The other 8oxoG repair systems may not be strand-specific.)

If Archaea benefit from Ogg1 the way Clostridia do, they too should fall well above the regression line on a graph of A+G versus A+T. And this is exactly what we find. In the graph below, the pink squares are members of Archaea that came up positive in a protein-Blast query against Drosophila Ogg1. (I'll explain why I used Drosophila in a minute.) The red-orange circles are bacterial species (mostly from class Clostridia) that turned up Ogg1-positive in a similar Blast search.

Ogg1-positive organisms are plotted here. The pink squares are Archaea species. Red-orange circles are bacterial species that came up Ogg1-positive in a protein Blast search using a Drosophila Ogg1 amino-acid sequence. In the background (greyed out) is the graph of all 1,373 bacterial species, for comparison. Note how the Ogg1-positive organisms have a higher purine (A+G) content than the vast majority of bacteria.

The points in this plot are significantly higher on the y-axis than points in the all-bacteria plot (and the regression line is steeper), consistent with a different DNA repair profile.

In identifying Ogg1-positive organisms, I wanted to avoid false positives (organisms with enzymes that share characteristics of Ogg1 but that aren't truly Ogg1), so for the Blast query I used Drosophila's Ogg1 as a reference enzyme, since it is well studied (unlike Archaeal or Clostridial Ogg1). I also set the E-value cutoff at 1e-10, to reduce spurious matches with DNA repair enzymes or nucleases that might have domain similarity with Ogg1 but aren't Ogg1. In addition, I did spot checks to be sure the putative Ogg1 matches that came up were not actually matches of Fpg (MutM), RecA, RadA, MutY, DNA-3-methyladenine glycosylase, or other DNA-binding enzymes.
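
The E-value cutoff is easy to apply after the fact if the hits are saved as text. The sketch below assumes the results were exported in BLAST+ tabular format (`-outfmt 6`), whose default columns put the subject ID in column 2 and the E-value in column 11. That export format is an assumption here, since nothing above says how the search was run, and the sample rows and IDs are invented for the example.

```python
def filter_hits(lines, max_evalue=1e-10):
    """Return (subject_id, evalue) for tabular BLAST rows at or below the cutoff."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns: {line!r}")
        evalue = float(fields[10])  # column 11 in the default layout
        if evalue <= max_evalue:
            hits.append((fields[1], evalue))
    return hits


# Hypothetical rows: qseqid sseqid pident length mismatch gapopen
#                    qstart qend sstart send evalue bitscore
sample = [
    "query_ogg1\tsubject_A\t38.5\t280\t160\t5\t20\t295\t10\t285\t3e-42\t150",
    "query_ogg1\tsubject_B\t27.1\t120\t80\t4\t100\t215\t40\t155\t2e-04\t42.0",
    "query_ogg1\tsubject_C\t31.0\t250\t165\t6\t30\t275\t15\t260\t1e-10\t61.2",
]
kept = filter_hits(sample)
print(kept)
```

A filter like this only handles the E-value cutoff. The spot checks against Fpg, MutY, and the other look-alike enzymes still have to be done by inspecting the surviving hits.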

Bottom line, organisms that have an Archaeal 8-oxoguanine glycosylase enzyme (mostly obligate anaerobes) occupy a unique part of the A+G vs. A+T graph. Which makes sense. It's only logical that anaerobes would have different DNA repair strategies (and a different "repairosome") than oxygen-tolerant bacteria, because oxidative stress is, in general, handled much differently in anaerobes. The fact that they bring different repair tactics to bear on DNA shouldn't come as a surprise.

