Showing posts with label mutations. Show all posts

A Simple Method for Estimating the Rate of Transition vs. Transversion Mutations

Point mutations in DNA fall into two types: transition mutations, and transversion mutations. (See graphic below.)


In a transition mutation, a purine is swapped for a different purine (for example, adenine is swapped with guanine, or vice versa), or a pyrimidine is swapped with another pyrimidine (C for T or T for C); and usually, if a purine is swapped on one strand, the corresponding pyrimidine gets swapped on the other. Thus, a GC pair gets changed out for an AT pair, or vice versa.

A transversion, on the other hand, occurs when a purine is swapped for a pyrimidine, or a pyrimidine for a purine. In a pairwise sense, this means a GC pair becomes a TA pair (for example), or an AT pair gets changed out for CG, or possibly AT for TA, or GC for CG.
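The two definitions above can be written as a small Python sketch. The function name `classify_substitution` is hypothetical, and the sketch assumes a single-base change on one strand written with the letters A, C, G, and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as a 'transition' or a 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical, so nothing was substituted")
    # Transition: the base stays in its chemical class (purine or pyrimidine).
    # Transversion: the base switches class.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")]:
    print(ref, "->", alt, classify_substitution(ref, alt))
```

Each base has one possible transition and two possible transversions, so if every substitution were equally likely, transversions would outnumber transitions two to one.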

Of the two types of mutation, transitions are more common. We also know that, in particular, GC-to-AT transitions are much more common than AT-to-GC transitions, for reasons that are well understood but that I won't discuss here. If you're curious to know what the experimental evidence is for the greater rate of GC-to-AT transitions, see Hall's 1991 Genetica paper (paywall protected, unfortunately) or the non-paywall-protected 2000 J. Bact. paper by Zhao. The latter paper is interesting because it shows that GC-to-AT transitions are more common in stationary-phase cells than in exponentially growing cells, and also that transitions in stationary-phase E. coli are repaired by the MutS and MutL gene products. (Overexpression of those two genes results in fewer transitions. Mutation of those two genes results in more transitions.)

An open question in molecular genetics is: What are the relative rates of transitions versus transversions, in natural populations? We know transitions are more common, but by what factor? Questions like this are tricky to answer, for a variety of reasons, and the answers obtained tend to vary quite a bit depending on the organism and methodology used. Van Bers et al. found a transition/transversion ratio (usually symbolized as κ) of 1.7 in Parus major (a bird species). Zhang and Gerstein looked at human DNA pseudogenes and found transitions outnumber transversions "by roughly a factor of two." Setti et al. looked at a variety of bacteria and found that the transition/transversion rate ratio for mutations affecting purines was 2.1 whereas the rate ratio for pyrimidines was 6.6. Tamura and Nei looked at nucleotide substitutions in the control region of mitochondrial DNA in chimps and humans (a region known to evolve rapidly) and found κ to be approximately 15. Yang and Yoder looked at mitochondrial cytochrome b in 28 primate species and found an average κ of 6.4. (In general, κ values tend to be considerably higher for mitochondrial DNA than other types of DNA.)

It's important to note that in all likelihood, no single value of κ will be universally applicable to all genes in all lineages, because evolutionary pressures vary from gene to gene and the rates of transition and transversion are different for different nucleotides (and so codon usage biases come into play). For an introduction to the various considerations involved in trying to estimate κ, I recommend Yang and Nielsen's 2000 paper as well as their 1998 and 1999 papers.

The reason I bring all this up is that I want to offer yet another possible way of estimating the transition/transversion rate ratio κ, using DNA composition statistics. Earlier, I presented data showing that the purine (A+G) content of coding regions of DNA correlates directly with genome A+T content. Analyzing the genomes of representatives of 260 bacterial genera, I came up with the following graph of purine mole-percent versus A+T mole-percent:


The correlation between genome A+T content and mRNA purine content is strong and positive (r = 0.852). Szybalski's Rule says that message regions tend to be purine-rich, but that's not exactly accurate. When genome A+T content is below approximately 35%, coding regions are richer in pyrimidines than purines. Above 35%, purines predominate. The concentration of purines in the mRNA-synonymous strand of DNA rises steadily with genome A+T content. It rises with a slope of 0.13013.
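For readers who want to compute the two quantities being plotted, here is a minimal Python sketch. It assumes the input is a single DNA strand (for purine content, the mRNA-synonymous strand of a coding region), and the function name `composition_stats` is hypothetical. It is not the code used to produce the graph.

```python
def composition_stats(seq: str) -> tuple[float, float]:
    """Return (A+T fraction, A+G fraction) for one strand of DNA.

    Characters other than A, C, G, T (spaces, newlines, N, and so on)
    are ignored rather than counted.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no A/C/G/T bases found")
    n = len(bases)
    at_fraction = sum(b in "AT" for b in bases) / n
    purine_fraction = sum(b in "AG" for b in bases) / n
    return at_fraction, purine_fraction


at, purine = composition_stats("AACG")
print(at, purine)  # 0.5 0.75
```

A+T content is the same on both strands of a duplex, but A+G content is strand-specific, which is why the choice of strand matters for the vertical axis.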

If you try to envision evolution taking an organism from one location on this graph to another, you can imagine that GC-to-AT transitions will move an organism to the right, whereas AT-to-GC transitions will move it to the left. To a first approximation (only!) we can say that horizontal movement on this graph essentially represents the net effect of transitions.

Vertical movement on this graph clearly involves transversions, because a net change in relative A+G content implies nothing less. To a very good first approximation, vertical movement in the graph corresponds to transversions.
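The horizontal/vertical argument can be checked by enumerating all twelve possible substitutions on a strand and recording how each one shifts the A+T count and the A+G count. This is an illustrative Python sketch, not part of the original analysis, and `composition_shift` is a hypothetical helper name.

```python
def composition_shift(ref: str, alt: str) -> tuple[int, int]:
    """Return (change in A+T count, change in A+G count) for ref -> alt on one strand."""
    if ref not in tuple("ACGT") or alt not in tuple("ACGT"):
        raise ValueError("bases must be one of A, C, G, T")
    d_at = int(alt in "AT") - int(ref in "AT")
    d_purine = int(alt in "AG") - int(ref in "AG")
    return d_at, d_purine


for ref in "ACGT":
    for alt in "ACGT":
        if ref != alt:
            d_at, d_purine = composition_shift(ref, alt)
            # Same purine/pyrimidine class before and after means a transition.
            kind = "transition" if (ref in "AG") == (alt in "AG") else "transversion"
            print(f"{ref}->{alt}  {kind:12s}  dAT={d_at:+d}  dAG={d_purine:+d}")
```

Every transition changes A+T by one and leaves A+G untouched, and every transversion changes A+G by one. Half of the transversions (A<->C and G<->T) also change A+T, so horizontal movement is not purely a transition effect.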

Therefore, a good approximation of the relative rate of transitions versus transversions is given by the inverse of the slope. The value comes to 1.0/0.13013, or κ = 7.6846.

In an earlier post, I presented a graph like the one above applicable to mitochondrial DNA (N=203 mitochondrial genomes), which had a slope of 0.06702. Taking the inverse of that slope, we get a value of κ = 14.92, which is in excellent agreement with Tamura and Nei's estimate of 15 for mitochondrial κ.

When I made a purine plot using plant and animal virus genomes (N=536), the rise rate (slope) was 0.23707, suggesting a κ value of 4.218. This agrees well with the transition/transversion rate ratio for hepatitis C virus (as measured by Machida et al.) of 1.5 to 7.0, depending on the gene.

In short, we get very reasonable estimates of κ from calculations involving the slope of the A+G vs. A+T graph, across multiple domains.
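The arithmetic behind the three estimates is just the reciprocal of each slope. A short Python sketch, using only the slopes quoted above (the function name `kappa_from_slope` and the dictionary labels are illustrative):

```python
def kappa_from_slope(slope: float) -> float:
    """Estimate kappa as the inverse of the A+G vs. A+T regression slope."""
    if slope <= 0:
        raise ValueError("slope must be positive for this estimate to make sense")
    return 1.0 / slope


# Slopes as reported in the text for each data set.
slopes = {
    "bacteria (260 genera)": 0.13013,
    "mitochondria (N=203)": 0.06702,
    "plant and animal viruses (N=536)": 0.23707,
}

for label, slope in slopes.items():
    print(f"{label}: slope = {slope}, kappa = {kappa_from_slope(slope):.3f}")
```

Because κ is a reciprocal, small errors in a shallow slope produce large swings in the estimate, so the mitochondrial value is the most sensitive of the three to error in the fitted slope.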

The main methodological proviso that applies here has to do with the fact that technically, some horizontal movement on the graph can be accomplished with transversions (AT-to-CG, for example). We made a simplifying assumption that all horizontal movement was due to transitions. That assumption is not strictly true (although it is approximately true, since transitions do outnumber transversions; and some transversions, such as AT<-->TA and GC<-->CG, have no effect on genome A+T content). Bottom line, my method of estimating κ probably overestimates κ somewhat, by including a small proportion of AT<-->CG transversions in the numerator. Even so, the estimates agree well with other estimates, tending to validate the general approach.

I invite comments from knowledgeable specialists.


The Trouble with Darwin

As a biologist, I find Darwin's theory hugely disappointing. It's better than the alternative (which is to believe in magic, basically), but not by much, sadly.

(Image caption: Charles Darwin died before Mendel's work on heredity was rediscovered.)

As scientific theories go, the theory of evolution is easily the weakest of all major scientific theories. It's a commendable piece of work in its ability to stir discussion, but terrible in most other ways.

To be useful, a scientific theory has to do a minimum of two things: explain what can be observed, and provide testable predictions. Darwin's theory is weak on the first count and useless on the second.

Evolutionary theory explains practically nothing, because every explanation of the theory is rooted in "survival of the fittest," which is a circular notion, utterly content-free. "Fittest" means most able to survive. Survival of the fittest means survival of those who survive.

Ironically, Darwin's landmark work was called On the Origin of Species. Yet it doesn't actually explain speciation, except in the most vacuous and speculative of terms. Of course, we can't set too high an expectation for Darwin, since he didn't live to see the rediscovery of Mendel's work (Mendel published in 1866, but his paper was largely ignored until 1900, and the word "genetics" wouldn't exist until more than 20 years after Darwin's death), but still. Speciation is portrayed by Darwin as the outcome of the accumulation of small, gradual changes. That's all the explanation he offers.

But the explanation is wrong. Or at least it doesn't accord well with the facts. It doesn't explain the Cambrian Explosion, for example, or the sudden appearance of intelligence in hominids, or the rapid recovery (and net expansion!) of the biosphere in the wake of at least five super-massive extinction events in the most recent 15% of Earth's existence.

One of the most frustrating aspects of evolutionary theory (this is no fault of the theory's, though) is that it is so hard to test in the laboratory. The fact is, no one has ever seen speciation happen in the laboratory, under repeatable conditions, and until that happens we're at a distinct disadvantage for understanding speciation. (Incidentally, I don't count plant hybridization or breeding anomalies in fruit flies whose sexuality is under the control of microbial endosymbionts as examples of speciation.)

When I was in school, we were taught that mutations in DNA were the driving force behind evolution, an idea that is now thoroughly discredited. The overwhelming majority of non-neutral mutations are deleterious (they reduce, not increase, survival). Most mutations lead to loss of function (this is easily demonstrated in the lab), not gain of function. Evolutionary theory is great at explaining things like the loss of eyesight by cave-dwelling creatures (e.g., blind cavefish). It's terrible at explaining gain of function.

Even if mutations were capable of driving evolution, they simply don't happen fast enough to account for observed rates of speciation. In bacteria, the measured rate of 16S rRNA divergence due to point mutations is only 1% per 50 million years. And yet, there were no flowering plants on earth as recently as 150 million years ago! Does it take a biologist to see the disconnect?

I bring all this up because I've spent some time recently doing genomics research aimed at exploring mechanisms for new-protein creation/differentiation (mechanisms not relying wholly nor even mainly on point mutations), and I wanted to set the stage for discussing that research here. Over the next week or so, I'll be presenting some new ideas and findings. Hopefully, we can put some much-needed flesh on Darwin by exploring testable notions of how new protein motifs can arise quickly (without reliance on magic).


Parsing the DNA Crazy Quilt

A measure of how little we know about the real-world workings of evolution is that science still can't explain why some organisms have huge imbalances in the chemical composition of their DNA. If you look at the genome of Clostridium botulinum (the botulism germ), 72% of the bases in its DNA are either 'A' or 'T': adenine or thymine. (The four possibilities are, of course, adenine, thymine, guanine, and cytosine.) Conversely, you can find many examples of organisms in which the DNA is mostly 'G' or 'C.' The question is why A, T, G, and C don't occur in roughly equal proportions (which is what you'd expect after millions of years of genetic averaging; you'd expect some sort of regression to the mean).

Just to give you an idea of what GC/AT imbalance really looks like, here's the gene for the enzyme adenine deaminase from Clostridium botulinum, with all the A and T values in red:

ATGTATAAAAATATACAAAGAGAAATCTATAAAAATACAAAAGGAGACGGGGATATGTTTAATAAATTTGATACAAAGCCTCTTTGGGAGGTAAGTAAAACTTTATCAAGTGTAGCACAGGGGCTTGAACCGGCTGATATGGTTATTATAAATTCAAGGCTTATAAATGTCTGTACAAGAGAAGTCATAGAAAACACAGATGTAGCAATTAGCTGTGGAAGAATTGCTTTAGTAGGTGATGCAAAACATTGCATAGGGGAAAACACAGAGGTAATTGATGCAAAAGGACAATATATTGCACCAGGTTTTTTAGATGGTCATATTCATGTTGAATCATCAATGTTAAGTGTAAGCGAATATGCTCGTTCAGTAGTTCCACATGGTACTGTCGGAATATATATGGATCCACATGAAATTTGTAATGTACTCGGATTAAATGGTGTACGTTATATGATTGAAGATGGCAAGGGTACTCCACTTAAAAATATGGTGACC ACACCATCCTGTGTACCAGCAGTTCCAGGTTTTGAAGATACAGGAGCGGCTGTAGGACCAGAAGATGTTAGAGAAACAATGAAGTGGGATGAAATAGTTGGATTAGGAGAAATGATGAACTTCCCAGGTATACTTTATTCTACAGATCATGCTCATGGAGTAGTAGGAGAAACTTTAAAAGCTAGTAAAACAGTAACAGGACATTATTCTTTACCTGAAACAGGAAAAGGATTAAATGGATATATTGCATCAGGTGTAAGATGTTGTCATGAATCCACAAGAGCGGAAGATGCTCTTGCTAAAATGCGCCTTGGAATGTATGCAATGTTTAGAGAAGGATCTGCATGGCATGACTTAAAGGAAGTAAGTAAAGCCATTACAGAAAATAAGGTAGATAGTAGATTTGCTGTTTTAATATCTGATGATACTCACCCACACACATTGCTTAAGGATGGACATTTAGATCATATTATAAAACGTGCTATAGAAGAAGGG ATAGAGCCATTAACTGCAATTCAAATGGTAACAATAAATTGTGCACAATGTTTCCAAATGGATCATGAATTAGGTTCTATAACTCCAGGAAAATGTGCAGATATTGTATTTATAGAAGATTTAAAAGATGTAAAAATAACAAAGGTTATTATAGATGGAAATTTAGTTGCAAAGGGTGGACTATTAACTACTTCAATAGCTAAATATGATTATCCTGAAGATGCTATGAATTCAATGCATATTAAGAATAAAATAACACCAGATTCCTTTAATATTATGGCTCCTAATAAAGAAAAAATAACTGCAAGGGTTATTGAAATTATACCTGAAAGAGTTGGTACATATGAGAGACATGTTGAACTTAATGTTAAAGATGATAAAGTTCAATGTGATCCAAGTAAAGATGTTTTAAAAGCAGTTGTATTTGAAAGACACCATGAAACAGGAACAGCAGGATATGGTTTTGTTAAAGGTTTTGGTATTAAGAGAGGAGCTATGGCTGCAACAGTTGCCCATGATGCTCACAACTTATTAGTTATAGGAACAAATGATGAAGATATGGCATTAGCTGCTAATACATTAATAGAATGTGGTGGAGGAATGGTAGCCGTACAAGATGGTAAAGTATTAGGCTTAGTTCCATTACCAATAGCAGGACTTATGAGTAATAAGCCTTTAGAAGAAATGGCAGAAATGGTAGAAAAACTAGATAGTGCATGGAAAGAAATAGGATGTGATATAGTTTCACCATTTATGACAATGGCACTTATTCCACTTGCCTGCCTACCAGAATTAAGACTAACTAATAGAGGGTTAGTTGATTGTAATAAGTTTGAATTTGTATCATTATTTGTAGAAGAATAA

View gene at FastaView.


The organism Actinomyces oris (which occurs in the film that builds up on teeth) has an adenine deaminase gene that looks like this:

ATGGCCGATCAACCGTCCGCAGACCTGCTTATCAAGGACGCGCGCATCGTCCCTTTCCGGTCCCGTACCGAACTGGGTGCGCTGCGCCGAGGTGACCCTCACCCCGGCGCCTTGGCCGCGCCGCCGCCCCCGGGTGAGCCCGTGGATGTGCGTATCAAGGCGGGCCGGGTCGTCGAGGTGGGACAGGGGCTGAGTGCTCCCGGGACACGGGTCCTTGAGGCCGAGGGCTCCTTCCTCATTCCCGGCCTGTGGGACGCTCACGCCCACCTGGACATGGAGGCGGCGCGCTCGGCACGC ATCGACACGCTGGCCACCCGCAGCGCGGAGGAGGCCCTGGAGCTGGTGGCACGGGCGCTGCGGGATCATCCGGCCGGTTCGCCTCCGGCCACGATCCAG GGCTTCGGGCACCGCCTGTCCAACTGGCCCCGGGTGCCCACGGTGGCCGAGCTCGACGCCGTCACCGGGGAGGTTCCCACGCTGCTCATCTCCGGGGAC GTGCACTCCGGGTGGCTGAACTCGGCGGCGCTGCGTGTCTTCGGCCTGCCGGGGGCCAGCGCCCAGGACCCGGGAGCACCGATGAAGGAGGACCCGTGG TTCGCCCTACTCGACCGCCTCGATGAGGTCCCGGGGACACGCGAGCTGCGGGAGTCCGGCTACCGACAGGTCCTGGCCGACATGCTGTCCCGGGGCGTC ACCGGCGTGGTGGACATGAGCTGGTCGGAGGATCCCGATGACTGGCCGCGGCGCCTGCGGGCCATGGCGGACGAGGGCGTACTCCCCCAGGTGCTGCCC CGCATCCGCATCGGGGTCTACCGCGACAAGCTGGAACGGTGGATCGCCCGGGGCCTGCGCACCGGGACCGCGCTGGCAGGCTCACCCCGCCTGCCCGAC GGTTCCCCGGTGCTGGTGCAGGGGCCGCTCAAGGTGATCGCAGACGGCTCGATGGGCTCGGGCAGCGCACACATGTGCGAGCCCTATCCCGCCGAGCTG GGCCTGGAGCACGCCTGCGGCGTGGTCAACATCGACCGGGCCGAGCTCACCGACCTCATGGCCCACGCCTCCCGGCAGGGTTATGAGATGGCCATCCAC GCCATCGGGGACGCGGCGGTCGACGACGTCGCCGCGGCCTTCGCGCACTCGGGTGCCGCCGGGCG

For whatever reason (and that's the point: we have no idea why), Actinomyces has chosen an AT-poor dialect for its DNA, even though it has to make many of the same types of genes as Clostridium.

Some people don't see this as a major puzzle: One organism evolved its DNA to a super-AT-rich state, another one didn't. So what? It's all random drift.

I disagree. It's not drift. We know of two strong forces that should keep organisms like Actinomyces from developing high G+C content. First is "AT pressure." It's known that mutations naturally tend to go in the GC-->AT direction. (One study found that in Salmonella typhimurium, GC-->AT mutations outnumbered AT-->GC mutations 50 to 1.) In the absence of corrective measures, natural mutations would very quickly lead all organisms in the direction of DNA with a very low G+C content.

A second important force is that of lateral gene transfer, which we know is common in microorganisms; common enough, certainly, to "even out" GC/AT ratios over evolutionary timescales. Random uptake of foreign genes by cells should tend to make A, G, C, and T levels equal, over time. For organisms like Clostridium and Actinomyces (and many others), this clearly hasn't happened.

In an earlier post I mentioned one possible reason organisms drift away from the 50-50 GC/AT centerline. DNA replication is more efficient when the template is biased toward one extreme (GC) or the other (AT), assuming endogenous nucleotide levels can be regulated in a similarly biased fashion (which they presumably are, in these organisms).

One might speculate that GC/AT extremism also simplifies DNA maintenance and repair. Imagine that your DNA is 70% G+C. A super-simple DNA repair tactic for deaminated purines would be to just replace every defective purine with a guanine. Seven out of ten times, blind replacement of defective purines with guanine would be the correct repair, if you're Actinomyces. And one out of three times, mistakes wouldn't matter anyway, because high-GC codons tend to be fourfold degenerate. (In a fourfold degenerate codon, you can replace the third base with any of A, G, C, or T without changing the codon's meaning.) Blind guanine substitution would have a roughly 80% success rate (0.7 + 0.3 × 1/3 = 0.8) in a high-GC organism that needed to replace defective purines.
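The back-of-envelope success rate combines the two probabilities in that paragraph. Here it is as a small Python sketch, under the paragraph's own simplifying assumptions: guanine is the right answer 70% of the time, and one wrong repair in three is harmless. The function and parameter names are hypothetical.

```python
def blind_repair_success(p_guanine_correct: float, p_mistake_harmless: float) -> float:
    """Probability that a blind purine-to-guanine repair is correct or harmless."""
    for p in (p_guanine_correct, p_mistake_harmless):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    # Either the repair is right, or it is wrong but falls where it doesn't matter.
    return p_guanine_correct + (1.0 - p_guanine_correct) * p_mistake_harmless


print(f"{blind_repair_success(0.70, 1 / 3):.2f}")  # 0.80
```

With these round numbers the result is 0.7 + 0.3 × 1/3 = 0.8. A higher G+C content or a larger share of harmless mistakes would push it above 80%.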

It turns out there are other reasons to live "away from centerline," if you're a bacterium. I'll talk about those in another post.