Showing posts with label DNA repair. Show all posts

Strand Asymmetry in Mitochondrial DNA

Funny how the availability of so much free DNA data can go to your head. When I learned that DNA sequence data for more than 2,000 mitochondrial genomes could be accessed, free, at genomevolution.org, I couldn't resist: I wrote some scripts that checked the DNA composition of 2,543 mtDNA (mitochondrial DNA) sequences. What I found blew me away.

If you're a biologist, you're accustomed to thinking of genome G+C (guanine plus cytosine) content as a kind of phylogenetic signature. (Related organisms usually have G+C values that are fairly close to one another.) For purposes of the following discussion, I'm going to reference A+T content, which is, of course, just one minus G+C. (A G+C content of 0.25, or 25%, means the A+T content is 0.75, or 75%.)

What I learned is that mitochondrial DNA shows strand asymmetry in coding regions (regions that actually get transcribed to RNA, as opposed to non-coding "control" regions and junk DNA). In particular, it shows an excess of pyrimidines (T and C) on the "message strand." This is the exact opposite of the situation in Archaea and bacteria, where message strands tend to accumulate purines (G and A).
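For anyone who wants to follow along, the quantities used throughout this post (A+T, G+C, and message-strand purine content, A+G) are simple base counts. Here is a minimal Python sketch; it is not the script I actually ran, and the function name and toy sequence are invented for illustration. It assumes the input is the message strand and it ignores ambiguous symbols such as N.

```python
def composition(seq):
    """Return (A+T, G+C, A+G) as fractions of the unambiguous bases in seq."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())  # ambiguous symbols (N, R, Y, ...) are not counted
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    at = (counts["A"] + counts["T"]) / total
    gc = (counts["G"] + counts["C"]) / total
    purine = (counts["A"] + counts["G"]) / total
    return at, gc, purine


# Toy sequence, not real mitochondrial data
at, gc, purine = composition("ATGAAAGCTTTAGGA")
print(f"A+T = {at:.3f}, G+C = {gc:.3f}, A+G = {purine:.3f}")
```

A+T and G+C always sum to one here, which is why either can stand in for the other on the x-axis.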

The interesting thing is, just like bacteria (and Archaea), mitochondrial genomes tend to show a steady, predictable rate of increase of purines on the message strand with increasing A+T, even though purines are outnumbered by pyrimidines on the message strand. A picture might make this clearer:

Purine (A+G) content versus A+T for the message strand of mitochondrial DNA coding regions (N=2543).

Every point in this graph represents a mitochondrial genome (2,543 in all). As you can see, the regression line (which minimizes the sum of squared errors) is upward-sloping, with a slope of 0.149, meaning that for every 10-percentage-point increase in genome A+T content, there's a corresponding 1.49-percentage-point increase in message-strand purine (A+G) content. What's striking about this is that in a similar graph for 1,373 bacterial genomes (see this post), the regression-line slope turned out to be 0.148. Chargaff's second parity law predicts a straight horizontal line at y=0.5. Obviously that law is kaput.
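To make the slope concrete, here is a minimal sketch of an ordinary least-squares fit in plain Python. The points are synthetic: they are generated to lie exactly on a line with slope 0.149 and an arbitrary intercept of 0.35 that I made up. They are not the genome data behind the graph.

```python
def least_squares(xs, ys):
    """Fit y = slope * x + intercept by minimizing the sum of squared errors."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points on a line with slope 0.149 (illustration only)
xs = [0.55, 0.60, 0.65, 0.70, 0.75]   # A+T content
ys = [0.35 + 0.149 * x for x in xs]   # message-strand A+G content
slope, intercept = least_squares(xs, ys)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

With a slope of 0.149, moving from x = 0.60 to x = 0.70 raises the fitted purine fraction by 0.0149, which is the 1.49-point rise per 10 points of A+T.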

I've written before about my repeated finding (in bacteria, Archaea, eukaryotes, viruses, bacteriophage; basically every place I look) that message-strand purine content accumulates in proportion to genome A+T content. Strand asymmetry with respect to purines and pyrimidines seems to be universal. But why?

Strand-asymmetric buildup of purines or pyrimidines is very hard to explain without invoking either a theory of strand-asymmetric DNA repair or a theory of strand-asymmetric mutagenesis, or both. Is it reasonable to suppose that one strand of DNA is more vulnerable to mutagenesis than the other? Yes, if you accept that in a growing cell, the strands spend a good portion of their time apart (during transcription and replication). Neither replication nor transcription is symmetric in implementation. I'll spare you the details for the replication side of the argument, but suffice it to say, replication-related asymmetries are not likely (in my opinion) to be behind the purine/pyrimidine strand asymmetries I've been documenting. What we're seeing, I think, is the result of asymmetric repair at transcription time.

During transcription, a gene's DNA strands are separated. One strand is used as a template by RNA polymerase to create messenger RNA and ribosomal RNA. The other strand is free and floppy and vulnerable to attack by mutagens. But it's also readily accessible to repair enzymes.


The above diagram oversimplifies things considerably, but I include it for the benefit of non-biogeeks who might want to follow this argument through. Note that DNA strands have directionality: the sugar bonds face one way in one strand and the other way in the other strand. This is denoted by the so-called 5'-to-3' orientation of strands. (RNAP = RNA polymerase.)

DNA repair is a complex subject. Be assured, every cell, of every kind, has dozens of different kinds of enzymes devoted to DNA repair. Without these enzymes, life as we know it would end, because DNA is constantly undergoing attack and requiring repair.

The Ogg family of DNA base-excision enzymes exhibits a signature helix-hairpin-helix (HhH) topology. See Faucher et al., Int J Mol Sci 2012; 13(6): 6711–6729.

Some types of repair take place in double-stranded DNA (that is, DNA that is not undergoing replication or transcription). Other types of repair apply to single-stranded DNA. In bacteria as well as higher life forms, there's a transcription-coupled repair system (TCRS) that comes into play when RNA polymerase is stalled by thymine dimers or other DNA damage. This remarkably elaborate system changes out short sections of damaged DNA (at considerable energy cost). Because it involves replacing whole nucleotides (sugar and all), it's categorized as a Nucleotide Excision Repair system (NER). The alternative to NER is Base Excision Repair (BER), which is where a defective base (usually an oxidized guanine) gets snipped out without removing any sugars from the DNA backbone. The enzymes that perform this base-clipping are generically known as glycosylases.

For many years, it was thought that mitochondria did not have DNA repair systems. We now know that's not true. Mitochondrial DNA is subject to constant oxidative attack and it turns out the damage is quickly repaired, in double-stranded DNA. Evidence for repair of single-stranded mtDNA is scant. Those who have looked for a transcription-coupled repair system (or indeed any NER system) in mitochondria have not found one. Mitochondrial BER (via Ogg1) does exist, but it seems to operate when the DNA is double-stranded, not during transcription. This makes sense, because for BER to finish, the strand must be nicked by AP endonuclease after the bad base is popped out, then the repair proceeds by matching the opposing base (opposite the abasic site) using the other strand as template. In Clostridia and Archaea (which have an Ogg enzyme that other bacteria do not have; see this post and this paper), Ogg1 can pop out a bad base while the DNA is single-stranded; Ogg1 then binds to the abasic site and is only released by AP endonuclease when it arrives later on.

Bottom line, we know that mitochondrial DNA spends much of its time in the unwound state (because mtDNA products are very highly transcribed) and that the non-transcribed DNA strand is extremely vulnerable to oxidative attack. (The template strand is less vulnerable, because it is cloaked in enzymes: RNA polymerase, transcription factors, ribosomes, etc.) We also know that 8-oxoguanine is the most prevalent form of oxidative damage in mtDNA and that, uncorrected, such damage leads to G-to-T transversion. The finding of consistently high pyrimidine content in the message strand of mitochondrial DNA (see graph further above) is consistent with a slower rate of repair of the non-transcribed strand, and the differential occurrence of G-to-T transversions on that strand. Or at least, that's a possible explanation of the pyrimidine richness of the message strand of mtDNA.

But there are additional factors to consider, such as selection pressure. Mitochondrial DNA tends to encode membrane-associated proteins, and membrane proteins use nonpolar amino acids, which are (in turn) predominantly encoded by pyrimidine-rich codons. More about this in an upcoming post.

Highly Expressed Genes: Better-Repaired?

At any given time in any cell, some genes are highly expressed while others are moderately expressed, still others are barely expressed, and quite a few are not expressed at all. The fact that genes vary tremendously in their levels of expression is nothing new, of course, but we still have a lot to learn about how and why some genes have the "Transcribe Me!" knob cranked wide open and others remain dormant until called upon. (For a great paper on this subject, I recommend Samuel Karlin and Jan Mrázek, "Predicted Highly Expressed Genes of Diverse Prokaryotic Genomes," J. Bact. 2000 182:18, 5238-5250, free copy here.)

Reading up on this subject got me to thinking: If DNA undergoes damage and repair at transcription time (when genes are being expressed), shouldn't highly expressed genes differ in mutation rate from rarely expressed genes? (But, in which direction?) Also: Does one strand of highly expressed DNA (the strand that gets transcribed) mutate or repair at a different rate than the other strand?

We know that in most organisms, there is quite an elaborate repair apparatus dedicated to fixing DNA glitches at transcription time. (This is the so-called Transcription Coupled Repair System.) We also know that the TCRS has a preference for the template strand of DNA, just as RNA polymerase does. In fact, it's when RNA polymerase stalls at the site of a thymine dimer (or other major DNA defect) that TCRS kicks into action. Stalled RNAP is the trigger mechanism for TCRS.

But TCRS isn't the only repair option for DNA at transcription time. I've written before about the Archaeal Ogg1 enzyme (which detects and snips out oxidized guanine residues from DNA). The Ogg1 system is a much simpler Base Excision Repair system, fundamentally low-tech compared to the heavy-duty TCRS mechanism. The latter involves nucleotide-excision repair (NER), which means cutting sugars (deoxyribose) out of the DNA backbone and replacement of a whole section of DNA (at great energy cost). BER just snips bases and leaves the underlying sugar(s) in place.

Being a fan of desktop science, I wanted to see if I couldn't devise an experiment of my own to shed light on the question: Does differential repair of DNA strands at transcription time lead to strand asymmetry in highly expressed genes?

Methanococcus maripaludis
Happily, there's a database of highly expressed genes at http://genomes.urv.cat/HEG-DB, which is the perfect starting point for this sort of investigation. For my experiment, I chose the microbe Methanococcus maripaludis strain C5. This tiny organism (isolated from a salt marsh in South Carolina) is a strict anaerobe that lives off hydrogen gas and carbon dioxide. It has a relatively small genome (just under 1.7 million base pairs, enough to code for around 1400 genes). The complete genome is available from here (but don't click unless you want to start a 2-meg download). More to the point, a list of 123 of the creature's most highly expressed genes (HEGs) is available from this page (safe to click; no downloads). The HEGs are putative HEGs inferred from Codon Adaptation Index analysis relative to a reference set of (known-good) high-expression genes. For more details on the HEG ranking process see this excellent paper.

The DNA sequence data for M. maripaludis was easy to match up against the list of HEGs obtained from http://genomes.urv.cat/HEG-DB. In fact, I was able to do all the data-crunching I needed to do with a few lines of JavaScript, in the Chrome console. In no time, I had the adenine (A), guanine (G), and thymine (T) content for all of M. maripaludis's genes, which allowed me to make the following graph:
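I did the original crunching in JavaScript and haven't reproduced that code here, but the idea fits in a few lines of any language. Below is a minimal Python sketch, assuming the coding sequences arrive as FASTA text and the HEG list has been reduced to a set of gene identifiers. The gene names, sequences, and the `heg_ids` set are all hypothetical stand-ins.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gene_point(seq):
    """Return (A+T, A+G) fractions for one message-strand gene sequence."""
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    a, g, t = seq.count("A"), seq.count("G"), seq.count("T")
    return (a + t) / n, (a + g) / n


# Hypothetical two-gene input; real input would be the genome's coding sequences
toy_fasta = """>gene1 hypothetical protein
ATGAAAGGA
TTAGCATAA
>gene2 hypothetical protein
ATGCCCGGGTAA
"""
heg_ids = {"gene1"}  # stand-in for the identifiers taken from the HEG list

points = {h.split()[0]: gene_point(s) for h, s in parse_fasta(toy_fasta)}
heg_points = {g: p for g, p in points.items() if g in heg_ids}
other_points = {g: p for g, p in points.items() if g not in heg_ids}
```

Each value in `points` is one dot on the scatter plot (x = A+T, y = A+G); the split into `heg_points` and `other_points` is what lets the highly expressed genes be drawn in a different color.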

Purine content (y-axis) plotted against adenine-plus-thymine content for all genes of Methanococcus maripaludis. Each dot represents a gene. The red dots represent the most highly expressed genes.

What we're looking at here is message-strand purine content (A+G) on the y-axis versus A+T content (which is a common phylogenetic metric, akin to G+C content) on the x-axis. As you know if you've been following this blog, I have used purine-vs.-AT plots quite successfully to uncover coding-region strand asymmetries. (See this post and/or this one for details.) The important thing to notice above is that while points tend to fall in a shotgun-blast centered roughly at x=0.66 and y=0.55, the Highly Expressed Genes (HEGs, in red) cover the upper left quadrant of the shotgun blast.

What does it mean? Consider the following. Of the four bases in DNA, guanine (G) is the most vulnerable to oxidative damage. When such damage is left uncorrected, it eventually results in a G-to-T transversion mutation. A large number of such mutations will cause overall A+T to increase (shifting points on the above graph to the right). If G-to-T transversions accumulate preferentially on one strand, the strand in question will see a reduction in purine content (as G, a purine, is replaced by T, a pyrimidine) while the other strand will see a corresponding increase in purine content (via the addition of adenines to pair with the new T's). Bottom line, if G-to-T transversions happen on the message strand, points in the above graph will move to the right and down. If they happen on the template (or transcribed) strand, points will move left and up. What we see in this graph is that HEGs have gone left and up.

The fact that highly expressed genes appear in the upper left quadrant of the distribution means that yes, differential repair is indeed (apparently) happening at transcription time; highly expressed genes are more intensively repaired; and the beneficiary of said repair(s), at least in M. maripaludis, is the message strand (also called the RNA-synonymous or non-transcribed strand) of DNA, which is where our sequence data come from, ultimately. A relative excess of unrepaired 8-oxoguanine on the template strand (or transcribed strand) means guanines are being replaced by thymines on that strand, and new adenines are showing up opposite the thymines, on the message strand, boosting A+G.

I don't know too many other explanations that are consistent with the above graph.

I hasten to add that one graph is just one graph. A single graph isn't enough to prove any kind of universal phenomenon. What we see here applies to Methanococcus maripaludis, an Archaeal anaerobe that may or may not share similarities (vis-a-vis DNA repair) with other organisms.





Do-It-Yourself Phylogenetic Trees

I've been doing a lot of desktop science lately, and I'm happy to report that superb, easy-to-use online tools exist for creating your own phylogenetic trees based on gene similarities, something that's non-trivial to implement yourself.

The other day, I speculated that the fruit-fly Ogg1 gene, which encodes an enzyme designed to repair oxidatively damaged guanine residues in DNA, might derive from Archaea. The Archaea (in case you're not a microbiologist) comprise one of three super-kingdoms in the tree of life. Basically, all life on earth can be classified as either Archaeal, Eukaryotic, or Eubacterial. The Eubacteria are "true bacteria": they're what you and I think of when we think "bacteria." (So, think Staphylococcus and tetanus bacteria and E. coli and all the rest.) The Eukaryota are higher life forms, starting with yeast and fungi and algae and plankton, progressing up through grass and corn and pine trees, worms and rabbits and donkeys, all the way to the highest life form of all, Stephen Colbert. (A little joke there.) Eukaryotes have big, complex cells with a distinct nucleus, complex organelles (like mitochondria and chloroplasts), and a huge amount of DNA packaged into pairs of chromosomes.

Archaea look a lot like bacteria (they're tiny and lack a distinct nucleus, organelles, etc.), and were in fact considered bacteria until recently. But in 1977, Carl Woese and George E. Fox provided persuasive evidence that members of this group of organisms were so different in genetic profile (not to mention lifestyle) that they deserved a top-level taxonomic group of their own (formalized as a domain in 1990). Thus, we now recognize certain bacteria-like creatures as Archaea.

The technical considerations behind the distinction between bacteria and archaeons are rather deep and have to do with codon usage patterns, ribosomal RNA structure, cell-wall details, lipid metabolism, and other esoterica, but one distinguishing feature of archaeons that's easy to understand is their willingness to live under harsh conditions. Archaeal species tend to be what we call extremophiles: They usually (not always) take up residence in places that are incredibly salty, or incredibly hot, or incredibly alkaline or acidic.

While it's generally agreed that eukaryotes arose after Archaea and bacteria appeared, it's by no means clear whether Archaea and bacteria branched off independently from a common ancestor, or perhaps one arose from the other. (A popular theory right now is that Archaea arose from gram-positive bacteria and sought refuge in inhospitable habitats to escape the chemical-warfare tactics of the gram-positives.) A complication that makes studying this sort of thing harder is the fact that horizontal gene transfer has been known to happen (with surprising frequency, actually) across domains.

Is it possible to study phylogenetic relationships, yourself, on the desktop? Of course. One way to do it: Obtain the DNA sequences of a given gene as produced by a variety of organisms, then feed those gene sequences to a tool like the tree-making tool at http://www.phylogeny.fr. Voila! Instant phylogeny.

The Ogg1 gene is an interesting case, because although the DNA-repair enzyme encoded by this gene occurs in a wide variety of higher life forms, plus Archaea, it is not widespread among bacteria. Aside from a couple of Spirochaetes and one Bacteroides species, the only bacteria that have this particular gene are the members of class Clostridia (which are all strict anaerobes). Question: Did the Clostridia get this gene from anaerobic Archaea?

Using the excellent online CoGeBlast tool, I was able to build a list of organisms that have Ogg1 and obtain the relevant gene sequences, all with literally just a few mouse clicks. Once you run a search using CoGeBlast, you can check the checkboxes next to organisms in the results list, then select "Phylogenetics" from the dropdown menu at the bottom of the results list. (See screenshot.)


When you click the Go button, a new FastaView window will open up, containing the gene sequences of all the items whose checkboxes you checked in CoGeBlast. At the bottom of this FastaView window, there's a small box that looks like this:


Click the Phylogeny.fr button (red arrow). Immediately, your sequences are sent to the French server, where they'll be converted to a phylogenetic tree in a matter of one to two minutes (usually). The result is a tree that looks something like this:


I've color-coded this tree to make the results easier to interpret. Creating a tree of this kind is not without potential pitfalls, because for one thing, if your DNA sequences are of vastly unequal lengths, the groupings made by Phylogeny.fr are likely to reflect gene lengths more than true phylogeny. For this tree, I did various data checks to make sure we're comparing apples and apples. Even so, a sanity check is in order. Do the groupings make sense? They do, actually. At the very top of the diagram (color-coded in green) we find all the eukaryotes grouped together: fruit-fly (Drosophila), yeast (Saccharomyces), fungus (Aspergillus). At the bottom of the diagram, Clostridium species (purplish red) fall into a subtree of their own, next to a tiny subtree of Methanobrevibacter. This actually makes a good deal of sense, because the two Methanobrevibacter species shown are inhabitants of feces, as are the nearby Clostridium bartlettii and C. diff. The fact that all the salt-loving Archaea members group together (organisms with names starting with 'H') is also indicative of a sound grouping. Overall, the tree looks sound.

If you're wondering what all the numbers are, the scale bar at the bottom (0.4) shows the approximate amount of sequence divergence (in substitutions per site) associated with that particular length of tree depth. The red numbers on the tree branches are support values, indicative of how well the data support the grouping of the immediately underlying nodes. Probably the most important thing to know is that the evolutionary distance between any two leaves in the tree is proportional to the sum of the branch lengths connecting them. (The branch lengths are not explicitly specified; you have to eyeball it.) At the top of the diagram, you can see that the branch lengths of the two Drosophila instances are very short. This means they're closely related. By contrast, the branch lengths for Saccharomyces and the ancestor to Drosophila are long, meaning that these organisms are distantly related.
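The "sum of branch lengths" idea can be sketched in a few lines of Python. The tree below is a made-up fragment with invented node names and branch lengths (remember, the real lengths have to be eyeballed from the diagram); each entry maps a node to its parent and the length of the branch joining them.

```python
def path_to_root(tree, leaf):
    """Return {node: cumulative branch length from leaf} for leaf and all its ancestors."""
    distances = {leaf: 0.0}
    node, total = leaf, 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        node = parent
        distances[node] = total
    return distances


def leaf_distance(tree, a, b):
    """Sum of branch lengths on the path connecting leaves a and b."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    shared = [n for n in up_a if n in up_b]
    if not shared:
        raise ValueError("leaves are not in the same tree")
    # The nearest shared ancestor gives the shortest combined path
    return min(up_a[n] + up_b[n] for n in shared)


# Hypothetical branch lengths: node -> (parent, branch length)
toy_tree = {
    "fly_1": ("fly_ancestor", 0.02),
    "fly_2": ("fly_ancestor", 0.03),
    "fly_ancestor": ("root", 0.60),
    "yeast": ("root", 0.70),
}
close = leaf_distance(toy_tree, "fly_1", "fly_2")
far = leaf_distance(toy_tree, "fly_1", "yeast")
```

In this toy tree the two fly leaves are separated by short branches only, while the fly-to-yeast path has to climb all the way to the root and back down.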

Just to give you an idea of the relatedness, I checked the C. botulinum Ogg1 protein amino-acid sequence against C. tetani, and found 63% identity of amino acids. When I compared C. botulinum's enzyme against C. difficile's, there was 52% identity. With Drosophila there is only 32% identity, and even that applies only to a 46% coverage area (versus 90%+ for C. tetani and C. diff). Bottom line, the Blast-wise relatedness does appear to correspond, in sound fashion, to tree-wise relatedness.
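Percent identity is easy to compute once two sequences have been aligned. The sketch below assumes the alignment has already been done (Blast does this for you) and that gaps are written as "-". The two short peptide strings are invented, not real Ogg1 fragments.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both sequences carries no information
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


# Invented aligned peptides: 5 identical columns out of 7
identity = percent_identity("MKT-LLV", "MRTALLV")
print(f"{identity:.1f}% identity")
```

Note that identity is measured only over the aligned region, which is why the coverage figure matters: 32% identity over 46% coverage is a much weaker match than 63% identity over 90%+ coverage.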

Two things stand out. One is that not all of the Clostridium species group together. (There's a small cluster of Clostridia near the salt-lovers, then a main branch near the methane-producing Archaea. The out-group of Clostridia near the salt-lovers happen to all have chromosomal G+C content of 50% or more, which makes them quite different from the rest of the Clostridia, whose G+C is under 30%.) The other thing that stands out is that it does appear as if Clostridial Ogg1 could be Archaeal in origin, based on the relationship of Methanoplanus and Methanobrevibacter to the main group of Clostridia. (Also, the C. leptum group's Ogg1 may share an ancestor with the halophilic Archaea.) One thing we can say for sure is that Ogg1 is ancient.

It's tempting to speculate that the eukaryotes obtained Ogg1 from early mitochondria, and that early mitochondria were actually Archaeal endosymbionts. The first part is quite plausible, because we know that early mitochondria quickly exported most of their DNA to the host nucleus. (Today's mitochondrial DNA is vestigial. Well over 90% of mitochondrial genes are actually in the host nucleus. Things like mitochondrial DNA polymerase have to be translated from nucleus-generated RNA.) Whether or not early mitochondria were Archaeal endosymbionts, no one knows.

Anyway, I hope this shows how easy it is to generate phylogenetic trees from the comfort of a living room sofa, using nothing more than a laptop with wireless internet connection. Try making your own phylo-trees using CoGeBlast and Phylogeny.fr—and let me know what you find out.

Shedding Light on DNA Strand Asymmetry

In 1950, Erwin Chargaff was the first to report that the amount of adenine (A) in DNA equals the amount of thymine (T), and the amount of guanine (G) equals the amount of cytosine (C). This result was instrumental in helping Watson and Crick (and Rosalind Franklin) determine the structure of DNA.

It's pretty easy to understand that every A on one strand of DNA pairs with a T on the other strand (and every G pairs with an opposite-strand C); this explains DNA complementarity and the associated replication model. But somewhere along the line, Chargaff was credited with the much less obvious rule that A = T and G = C even for individual strands of DNA that aren't paired with anything. This is the so-called second parity rule attributed to Chargaff, although I can't find any record of Chargaff himself having postulated such a rule. The Chargaff papers that are so often cited as supporting this rule (in particular the 3-paper series culminating in this report in PNAS) do not, in fact, offer such a rule, and if you read the papers carefully, what Chargaff and colleagues actually found was that one strand of DNA is heavier than the other (they label the strands 'H' and 'L', for Heavy and Light); not only that, but Chargaff et al. reported a consistent difference in purine content between strands (see Table 1 of this paper).

When I interviewed Linus Pauling in 1977, he cautioned me to always read the Results section of a paper carefully, because people will often conclude something entirely different than what the Results actually showed, or cite a paper as showing "ABC" when the data actually showed "XYZ."

How right he was.

At any rate, it turns out that the "message" strand of a gene hardly ever contains equal amounts of purines and pyrimidines. Codon analysis reveals that as genes become richer in A+T content (or as G+C content goes down), the excess of purines on the message strand becomes larger and larger. This is depicted in the following graph, which shows message-strand purine content (A+G) plotted against A+T content, for 1,373 distinct bacterial species. (No species is represented twice.)

Codon analysis reveals that as A+T content increases, message-strand purine content (A+G) increases. Each point on this graph represents a unique bacterial species (N=1373).

It's quite obvious that when A+T content is above approximately 33%, as it is for most bacterial species, the message strand tends to be comparatively purine-rich. Below A+T = 33%, the message strand becomes more pyrimidine-rich than purine-rich. (Note: In bacteria, where most of the DNA is in coding regions, codon-derived A+T content is very close to whole-genome A+T content. I checked the 1,373 species graphed here and found whole-chromosome A+T to differ from codon-derived A+T by an average of less than 7 parts in 10,000.)

The correlation between A+T and purine content is strong (r=0.85). Still, you can see that quite a few points have drifted far from the regression line, especially in the region of x = 0.5 to x = 0.7, where lots of points lie above y = 0.55. What's going on with those organisms? I decided to do some investigating.

First, some basics. Over time, transition mutations (AT↔GC) can change an organism's A+T content and thus move it along the x-axis of the graph, but transitions cannot move an organism higher or lower on the graph, because (by definition) transitions don't affect the strandwise purine balance.

Transversions, on the other hand, can affect strandwise purine balance (in theory, at least), but only if they occur more often on one strand of DNA than the other. (I should say: occur more often, or are fixed more often, on one strand versus the other.) For example, let's say G-to-T transversions are the most common kind of transversion (which is probably true, given that guanine is the most easily oxidized of the four bases and given the fact that failure to repair 8-oxoguanine lesions does lead to eventual replacement with thymine). And let's say G-to-T transversions are most likely to occur on the non-transcribed strand of DNA, at transcription time. (The non-transcribed strand is uncoiled and unprotected while transcription is taking place on the other strand.) Over time, the non-transcribed strand would lose guanines; they'd be replaced by thymines. The message strand, or RNA-synonymous strand (which is also the non-transcribed strand) would become pyrimidine-rich and the other strand would become purine-rich.
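The bookkeeping in the last two paragraphs can be checked with a toy sequence. The sketch below is only an illustration under simplifying assumptions: the sequence is invented, substitutions are applied to the first few matching bases rather than at random, and a G-to-T change on the transcribed strand is represented by its mirror image (C-to-A) on the message strand.

```python
def at_and_purine(seq):
    """Return (A+T, A+G) fractions for a message-strand sequence."""
    n = len(seq)
    at = (seq.count("A") + seq.count("T")) / n
    purine = (seq.count("A") + seq.count("G")) / n
    return at, purine


def substitute(seq, old, new, count):
    """Replace the first `count` occurrences of base `old` with base `new`."""
    return seq.replace(old, new, count)


message = "ATGGCGGAACGTCTGGGCAAAGGCTAA"  # invented message-strand sequence

baseline = at_and_purine(message)
# Transition (G->A): purine swapped for purine
after_transition = at_and_purine(substitute(message, "G", "A", 3))
# Transversion on the message strand (G->T): purine swapped for pyrimidine
after_message_gt = at_and_purine(substitute(message, "G", "T", 3))
# G->T on the transcribed strand appears as C->A on the message strand
after_template_gt = at_and_purine(substitute(message, "C", "A", 3))

for label, point in [("baseline", baseline),
                     ("transition G->A", after_transition),
                     ("message-strand G->T", after_message_gt),
                     ("transcribed-strand G->T", after_template_gt)]:
    print(f"{label:24s} A+T = {point[0]:.3f}  A+G = {point[1]:.3f}")
```

The transition changes A+T but leaves A+G untouched, whereas the two transversion cases push A+G in opposite directions depending on which strand took the hit.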

Unfortunately, while that's exactly what happens for organisms with A+T content below 33%, precisely the opposite happens (purines accumulate on the message strand) in organisms with A+T above 33%. And in fact, in some high-AT organisms, the purine content of message strands is rather extreme. How can we explain that?

One possibility is that some organisms have evolved extremely effective transversion repair systems for the message (non-transcribed) strand of genes—systems that are so effective, no G-to-T transversions go unrepaired on the message strand. The transcribed strand, on the other hand, doesn't get the benefit of this repair system, possibly because the repair enzymes can't access the strand: it's engulfed in transcription factors, topoisomerases, RNA polymerase, nearby ribosomal machinery, etc.

If the non-transcribed strand never mutates (because all mutations are swiftly repaired), then the transcribed strand will (in the absence of equally effective repairs) eventually accumulate G-to-T mutations, and the message strand will accumulate adenines (purines). Perhaps.

In the graph further above, you'll notice at x = 0.6 a tiny spur of points hangs down at around y = 0.5. These points belong to some Bartonella species, plus a Parachlamydia and another chlamydial organism. These are endosymbionts that have lost a good portion of their genomes over time. It seems likely they've lost some transversion-repair machinery. During transcription, their message strands are going unrepaired. G-to-T transversions happen on the message strand, rendering it light in purines. Such a scenario seems plausible, at least.

By this reasoning, maybe points far above the regression line represent organisms that have gained repair functionality, such that their message strands never undergo G-to-T transversions (although their transcribed strands do). Is this possible?

Examination of the highest points on the graph shows a predominance of Clostridia. (Not just members of the genus Clostridium, but the class Clostridia, which is a large, ancient, and diverse class of anaerobes.) One thing we know about the Clostridia is that unlike nearly all other bacteria (unlike members of the Gammaproteobacteria, the Alpha- and Betaproteobacteria, the Actinomycetes, the Bacteroidetes, etc.), the Clostridia have Ogg1, otherwise known as 8-oxoguanine glycosylase (which specifically prevents G-to-T transversions). They share this capability with all members of the Archaea, and all higher life forms as well.

Note that while non-Ogg1 enzymes exist for correcting 8-oxoguanine lesions (e.g., MutM, MutY, mfd), there is evidence that Ogg1 is specifically involved in repair of 8oxoG lesions in non-transcribed strands of DNA, at transcription time. (The other 8oxoG repair systems may not be strand-specific.)

If Archaea benefit from Ogg1 the way Clostridia do, they too should fall well above the regression line on a graph of A+G versus A+T. And this is exactly what we find. In the graph below, the pink squares are members of Archaea that came up positive in a protein-Blast query against Drosophila Ogg1. (I'll explain why I used Drosophila in a minute.) The red-orange circles are bacterial species (mostly from class Clostridia) that turned up Ogg1-positive in a similar Blast search.

Ogg1-positive organisms are plotted here. The pink squares are Archaea species. Red-orange circles are bacterial species that came up Ogg1-positive in a protein Blast search using a Drosophila Ogg1 amino-acid sequence. In the background (greyed out) is the graph of all 1,373 bacterial species, for comparison. Note how the Ogg1-positive organisms have a higher purine (A+G) content than the vast majority of bacteria.

The points in this plot are significantly higher on the y-axis than points in the all-bacteria plot (and the regression line is steeper), consistent with a different DNA repair profile.

In identifying Ogg1-positive organisms, I wanted to avoid false positives (organisms with enzymes that share characteristics of Ogg1 but that aren't truly Ogg1), so for the Blast query I used Drosophila's Ogg1 as a reference enzyme, since it is well studied (unlike Archaeal or Clostridial Ogg1). I also set the E-value cutoff at 1e-10, to reduce spurious matches with DNA repair enzymes or nucleases that might have domain similarity with Ogg1 but aren't Ogg1. In addition, I did spot checks to be sure the putative Ogg1 matches that came up were not actually matches of Fpg (MutM), RecA, RadA, MutY, DNA-3-methyladenine glycosylase, or other DNA-binding enzymes.
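I applied the cutoff through the web tool, but if you have Blast results as text, the same filter is easy to script. The sketch below assumes the standard 12-column tab-separated Blast output (the layout NCBI BLAST+ calls "outfmt 6"), where the subject ID is the second column and the E-value the eleventh. That file layout is my assumption, not something the web tool necessarily gives you, and the hit names and numbers are invented.

```python
E_VALUE_CUTOFF = 1e-10  # same threshold as used in the search


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, e_value) for hits at or below the E-value cutoff."""
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]          # column 2: subject sequence ID
        e_value = float(fields[10])     # column 11: E-value
        if e_value <= cutoff:
            kept.append((subject_id, e_value))
    return kept


# Invented hits in 12-column tabular form
toy_results = "\n".join([
    "query_ogg1\thypothetical_hit_A\t45.2\t300\t150\t5\t10\t310\t8\t305\t3e-60\t210",
    "query_ogg1\thypothetical_hit_B\t31.0\t120\t80\t3\t40\t160\t35\t150\t2e-04\t45.1",
    "query_ogg1\thypothetical_hit_C\t38.7\t280\t160\t6\t12\t290\t10\t288\t5e-11\t98.6",
])
strong_hits = filter_hits(toy_results)
```

Only hits A and C survive here; hit B, with an E-value of 2e-04, is the kind of weak domain-level match the cutoff is meant to exclude.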

Bottom line, organisms that have an Archaeal 8-oxoguanine glycosylase enzyme (mostly obligate anaerobes) occupy a unique part of the A+G vs. A+T graph. Which makes sense. It's only logical that anaerobes would have different DNA repair strategies (and a different "repairosome") than oxygen-tolerant bacteria, because oxidative stress is, in general, handled much differently in anaerobes. The fact that they bring different repair tactics to bear on DNA shouldn't come as a surprise.



DNA Repair 101

You don't have to be a biologist to know that anything that can damage DNA is potentially harmful, because it can cause mutations (which are, in fact, mostly harmful; very few mutations are beneficial). Fortunately, cells contain dozens of different kinds of repair enzymes, and most DNA damage is repaired quickly. When damage isn't repaired quickly (or properly), you have a mutation.

It's not much of a stretch to say that DNA repair enzymes play a front-and-center role in evolution (or at least the portion of evolution that's driven by mutations). Which is why molecular geneticists tend to pay a lot of attention to DNA repair processes. Anything that can affect the composition of DNA can change the course of evolution.

DNA is remarkably stable, chemically. Nonetheless, it is vulnerable to oxidative attack by hydroxyl radicals, superoxide, nitric oxide, and other reactive oxygen species (ROS) generated in the course of cell metabolism, never mind exogenous poisons.

DNA has four bases: guanine (G), cytosine (C), adenine (A), and thymine (T). Of these, guanine is the most susceptible to oxidative attack. When it's exposed to an oxidant, it can form 7,8-dihydro-8-oxoguanine, OG for short. What can happen then is that the OG base rotates around its glycosidic bond (the bond joining the base to the sugar) until it is facing the other way (see diagram), and when that happens, OG can pair up with adenine instead of guanine's usual partner, cytosine.

When guanine is oxidized to form 7,8-dihydro-8-oxoguanine, it mispairs with adenine instead of its usual partner, cytosine.

Rest assured, there are repair enzymes that can and will detect such funny business in short order. But if OG isn't detected and replaced with a normal guanine before replication occurs, OG may get paired up with an adenine during replication. In the next round of replication, that adenine serves as a template for thymine, adenine's usual partner. That's bad, because what it means is that a G:C pair ended up getting changed to a T:A pair. (The place of the G got taken first by OG and then T. The place of G's opposite-strand partner, C, eventually got taken by A.) In so many words: that's a mutation.
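The sequence of events can be traced in a few lines of code. This is a toy sketch, not a biochemical model: it assumes the worst case, in which OG is never repaired and always templates adenine. The names `PAIRS_WITH` and `trace_unrepaired_lesion` are made up for illustration.

```python
# Which base a template base directs into the newly made strand.
# "OG" (8-oxoguanine) templating "A" is the mispairing described above;
# assuming it happens every time is a deliberate worst-case simplification.
PAIRS_WITH = {"A": "T", "T": "A", "G": "C", "C": "G", "OG": "A"}


def trace_unrepaired_lesion():
    """Follow one G:C base pair through oxidation and two rounds of replication."""
    history = [("G", "C")]            # the original, undamaged pair
    history.append(("OG", "C"))       # guanine is oxidized; its partner is still C
    # Replication round 1: the OG strand is copied, and OG templates A.
    new_partner = PAIRS_WITH["OG"]
    history.append(("OG", new_partner))
    # Replication round 2: the A-containing strand is copied, and A templates T.
    replacement = PAIRS_WITH[new_partner]
    history.append((replacement, new_partner))
    return history


for left, right in trace_unrepaired_lesion():
    print(f"{left}:{right}")
```

The last pair printed is T:A, sitting where G:C used to be.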

It turns out there's a special enzyme designed to prevent the G→T funny business we've just been talking about. It's called oxoguanine glycosylase, or Ogg1 for short. You'll sometimes see it called 8-oxoguanine-DNA-glycosylase, and from a capabilities standpoint it's often (wrongly) compared to the Fpg enzyme (formamidopyrimidine-DNA glycosylase), which is not the same as Ogg1 at all.

Just about all higher life forms have an Ogg1 enzyme (which clips OG out of DNA and ensures it gets replaced with a brand-new guanine before any funny business can happen). Surprisingly few bacteria have this enzyme; most rely instead on the more general-purpose Fpg (MutM). If you run a Blast search of a reference Ogg1 gene (the Drosophila version works well) against all bacterial genomes, you'll get only a few hundred matches (out of around 10,000 sequenced bacterial genomes), the vast majority belonging to members of the class Clostridia (a truly fearsome group of anaerobic spore-formers containing the botulism germ, the tetanus bacterium, the notorious C. difficile, also known as C. diff, and some other creatures you probably don't want to meet). If you run the same Blast search against Archaea (the other major "germ-like" microbial domain, along with true bacteria), you'll get hits against almost every member species. Personally, I think it's likely the Ogg1 enzyme originated with a common ancestor of today's Archaea and Eukaryota, and arrived in Clostridia by lateral gene transfer (not terribly recently, though).

One thing is certain: E. coli does not have Ogg1, nor does Staphylococcus, nor Streptococcus, nor any germ you've ever heard of (other than the aforementioned Clostridia members, plus Archaea). And yet, every yeast and fungus has it, every plant, every fruit fly, every fish, every human—every higher life form. Ironically, only five members of Archaea turned up positive for the Fpg enzyme when I did a check, whereas almost all Eubacteria ("true bacteria") have it, including Clostridia. Bottom line, Clostridia have the best of both worlds: Fpg, plus Ogg1. Belt and suspenders, both.

This is just a tiny intro to the subject of DNA repair, which is a vast subject indeed. For more, see this article, or just start rummaging around in Google Scholar.

Parsing the DNA Crazy Quilt

A measure of how little we know about the real-world workings of evolution is that science still can't explain why some organisms have huge imbalances in the chemical composition of their DNA. If you look at the genome of Clostridium botulinum (the botulism germ), 72% of the bases in its DNA are either 'A' or 'T': adenine or thymine. (The four possibilities are, of course, adenine, thymine, guanine, and cytosine.) Conversely, you can find many examples of organisms in which the DNA is mostly 'G' or 'C.' The question is why A, T, G, and C don't occur in roughly equal proportions (which is what you'd expect after millions of years of genetic averaging; you'd expect some sort of regression to the mean).

Just to give you an idea of what GC/AT imbalance really looks like, here's the gene for the enzyme adenine deaminase from Clostridium botulinum (in the original post, the A and T bases are highlighted in red):

ATGTATAAAAATATACAAAGAGAAATCTATAAAAATACAAAAGGAGACGGGGATATGTTTAATAAATTTGATACAAAGCCTCTTTGGGAGGTAAGTAAAACTTTATCAAGTGTAGCACAGGGGCTTGAACCGGCTGATATGGTTATTATAAATTCAAGGCTTATAAATGTCTGTACAAGAGAAGTCATAGAAAACACAGATGTAGCAATTAGCTGTGGAAGAATTGCTTTAGTAGGTGATGCAAAACATTGCATAGGGGAAAACACAGAGGTAATTGATGCAAAAGGACAATATATTGCACCAGGTTTTTTAGATGGTCATATTCATGTTGAATCATCAATGTTAAGTGTAAGCGAATATGCTCGTTCAGTAGTTCCACATGGTACTGTCGGAATATATATGGATCCACATGAAATTTGTAATGTACTCGGATTAAATGGTGTACGTTATATGATTGAAGATGGCAAGGGTACTCCACTTAAAAATATGGTGACC ACACCATCCTGTGTACCAGCAGTTCCAGGTTTTGAAGATACAGGAGCGGCTGTAGGACCAGAAGATGTTAGAGAAACAATGAAGTGGGATGAAATAGTTGGATTAGGAGAAATGATGAACTTCCCAGGTATACTTTATTCTACAGATCATGCTCATGGAGTAGTAGGAGAAACTTTAAAAGCTAGTAAAACAGTAACAGGACATTATTCTTTACCTGAAACAGGAAAAGGATTAAATGGATATATTGCATCAGGTGTAAGATGTTGTCATGAATCCACAAGAGCGGAAGATGCTCTTGCTAAAATGCGCCTTGGAATGTATGCAATGTTTAGAGAAGGATCTGCATGGCATGACTTAAAGGAAGTAAGTAAAGCCATTACAGAAAATAAGGTAGATAGTAGATTTGCTGTTTTAATATCTGATGATACTCACCCACACACATTGCTTAAGGATGGACATTTAGATCATATTATAAAACGTGCTATAGAAGAAGGG ATAGAGCCATTAACTGCAATTCAAATGGTAACAATAAATTGTGCACAATGTTTCCAAATGGATCATGAATTAGGTTCTATAACTCCAGGAAAATGTGCAGATATTGTATTTATAGAAGATTTAAAAGATGTAAAAATAACAAAGGTTATTATAGATGGAAATTTAGTTGCAAAGGGTGGACTATTAACTACTTCAATAGCTAAATATGATTATCCTGAAGATGCTATGAATTCAATGCATATTAAGAATAAAATAACACCAGATTCCTTTAATATTATGGCTCCTAATAAAGAAAAAATAACTGCAAGGGTTATTGAAATTATACCTGAAAGAGTTGGTACATATGAGAGACATGTTGAACTTAATGTTAAAGATGATAAAGTTCAATGTGATCCAAGTAAAGATGTTTTAAAAGCAGTTGTATTTGAAAGACACCATGAAACAGGAACAGCAGGATATGGTTTTGTTAAAGGTTTTGGTATTAAGAGAGGAGCTATGGCTGCAACAGTTGCCCATGATGCTCACAACTTATTAGTTATAGGAACAAATGATGAAGATATGGCATTAGCTGCTAATACATTAATAGAATGTGGTGGAGGAATGGTAGCCGTACAAGATGGTAAAGTATTAGGCTTAGTTCCATTACCAATAGCAGGACTTATGAGTAATAAGCCTTTAGAAGAAATGGCAGAAATGGTAGAAAAACTAGATAGTGCATGGAAAGAAATAGGATGTGATATAGTTTCACCATTTATGACAATGGCACTTATTCCACTTGCCTGCCTACCAGAATTAAGACTAACTAATAGAGGGTTAGTTGATTGTAATAAGTTTGAATTTGTATCATTATTTGTAGAAGAATAA

View gene at FastaView.


The organism Actinomyces oris (which occurs in the film that builds up on teeth) has an adenine deaminase gene that looks like this:

ATGGCCGATCAACCGTCCGCAGACCTGCTTATCAAGGACGCGCGCATCGTCCCTTTCCGGTCCCGTACCGAACTGGGTGCGCTGCGCCGAGGTGACCCTCACCCCGGCGCCTTGGCCGCGCCGCCGCCCCCGGGTGAGCCCGTGGATGTGCGTATCAAGGCGGGCCGGGTCGTCGAGGTGGGACAGGGGCTGAGTGCTCCCGGGACACGGGTCCTTGAGGCCGAGGGCTCCTTCCTCATTCCCGGCCTGTGGGACGCTCACGCCCACCTGGACATGGAGGCGGCGCGCTCGGCACGC ATCGACACGCTGGCCACCCGCAGCGCGGAGGAGGCCCTGGAGCTGGTGGCACGGGCGCTGCGGGATCATCCGGCCGGTTCGCCTCCGGCCACGATCCAG GGCTTCGGGCACCGCCTGTCCAACTGGCCCCGGGTGCCCACGGTGGCCGAGCTCGACGCCGTCACCGGGGAGGTTCCCACGCTGCTCATCTCCGGGGAC GTGCACTCCGGGTGGCTGAACTCGGCGGCGCTGCGTGTCTTCGGCCTGCCGGGGGCCAGCGCCCAGGACCCGGGAGCACCGATGAAGGAGGACCCGTGG TTCGCCCTACTCGACCGCCTCGATGAGGTCCCGGGGACACGCGAGCTGCGGGAGTCCGGCTACCGACAGGTCCTGGCCGACATGCTGTCCCGGGGCGTC ACCGGCGTGGTGGACATGAGCTGGTCGGAGGATCCCGATGACTGGCCGCGGCGCCTGCGGGCCATGGCGGACGAGGGCGTACTCCCCCAGGTGCTGCCC CGCATCCGCATCGGGGTCTACCGCGACAAGCTGGAACGGTGGATCGCCCGGGGCCTGCGCACCGGGACCGCGCTGGCAGGCTCACCCCGCCTGCCCGAC GGTTCCCCGGTGCTGGTGCAGGGGCCGCTCAAGGTGATCGCAGACGGCTCGATGGGCTCGGGCAGCGCACACATGTGCGAGCCCTATCCCGCCGAGCTG GGCCTGGAGCACGCCTGCGGCGTGGTCAACATCGACCGGGCCGAGCTCACCGACCTCATGGCCCACGCCTCCCGGCAGGGTTATGAGATGGCCATCCAC GCCATCGGGGACGCGGCGGTCGACGACGTCGCCGCGGCCTTCGCGCACTCGGGTGCCGCCGGGCG

For whatever reason (and that's the point: we have no idea why), Actinomyces has chosen an AT-poor dialect for its DNA, even though it has to make many of the same types of genes as Clostridium.
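If you want to put numbers on the contrast between the two genes, base composition is easy to compute. Here is a minimal sketch; the function name `base_fractions` is my own, and it simply ignores whitespace and any character other than A, C, G, or T.

```python
def base_fractions(seq):
    """Return A+T, G+C, and purine (A+G) fractions of a DNA sequence."""
    seq = "".join(seq.split()).upper()  # drop spaces and line breaks
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())        # only A, C, G, T are counted
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {
        "AT": (counts["A"] + counts["T"]) / total,
        "GC": (counts["G"] + counts["C"]) / total,
        "purine": (counts["A"] + counts["G"]) / total,
    }


# First 20 bases of the Clostridium botulinum gene shown above.
print(base_fractions("ATGTATAAAAATATACAAAG"))
```

Those first 20 bases come out at 85% A+T. Pasting in each full gene gives the figure for the whole sequence.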

Some people don't see this as a major puzzle: One organism evolved its DNA to a super-AT-rich state, another one didn't. So what? It's all random drift.

I disagree. It's not drift. We know of two strong forces that should keep organisms like Actinomyces from developing high G+C content. First is "AT pressure." It's known that mutations naturally tend to go in the GC-->AT direction. (One study found that in Salmonella typhimurium, GC-->AT mutations outnumbered AT-->GC mutations 50 to 1.) In the absence of corrective measures, natural mutations would very quickly lead all organisms in the direction of DNA with a very low G+C content.
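To see how strong that pressure would be if nothing opposed it, consider a bare-bones two-state model. It rests on assumptions that are mine, not the study's: the 50-to-1 figure is treated as a ratio of per-site mutation rates (the reported ratio is of observed mutation counts), and selection and repair are ignored. With a GC-to-AT rate u and an AT-to-GC rate v, the G+C fraction settles at v / (u + v).

```python
def equilibrium_gc(u, v):
    """Equilibrium G+C fraction when GC->AT occurs at rate u and AT->GC at rate v."""
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return v / (u + v)


def simulate_gc(gc_start, u, v, generations):
    """Iterate the same model one generation at a time (u and v are per-site, per-generation)."""
    gc = gc_start
    for _ in range(generations):
        # GC sites lost to GC->AT mutation, AT sites gained back by AT->GC mutation.
        gc = gc - u * gc + v * (1.0 - gc)
    return gc


# Hypothetical rates in a 50:1 ratio; only the ratio sets the end point.
print(round(equilibrium_gc(50, 1), 4))
print(round(simulate_gc(0.5, 0.005, 0.0001, 5000), 4))
```

Both calls land near 2% G+C (1/51), which is why mutation bias alone would push DNA toward a very low G+C content in this simplified picture.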

A second important force is that of lateral gene transfer, which we know is common in microorganisms; common enough, certainly, to "even out" GC/AT ratios over evolutionary timescales. Random uptake of foreign genes by cells should tend to make A, G, C, and T levels equal, over time. For organisms like Clostridium and Actinomyces (and many others), this clearly hasn't happened.

In an earlier post I mentioned one possible reason organisms drift away from the 50-50 GC/AT centerline. DNA replication is more efficient when the template is biased toward one extreme (GC) or the other (AT), assuming endogenous nucleotide levels can be regulated in a similarly biased fashion (which they presumably are, in these organisms).

One might speculate that GC/AT extremism also simplifies DNA maintenance and repair. Imagine that your DNA is 70% G+C. A super-simple DNA repair tactic for deaminated purines would be to just replace every defective purine with a guanine. Seven out of ten times, blind replacement of defective purines with guanine would be the correct repair, if you're Actinomyces. And in about one out of three of the remaining cases, the mistake wouldn't matter anyway, because high-GC codons tend to be fourfold degenerate. (In a fourfold degenerate codon, you can replace the third base with anything, whether A, G, C, or T, without changing the codon's meaning.) Blind guanine substitution would thus have a success rate of about 80% (0.7 + 0.3 × 1/3) in a high-GC organism that needed to replace defective purines.
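The back-of-envelope figure can be checked directly. This sketch assumes, as the paragraph does, that the chance a damaged purine was originally guanine equals the genome's G+C fraction, and that one-third of wrong replacements are silent. The function and parameter names are made up for illustration.

```python
def blind_guanine_success(gc_fraction, silent_fraction):
    """Fraction of blind purine->guanine replacements that are correct or harmless."""
    if not (0.0 <= gc_fraction <= 1.0 and 0.0 <= silent_fraction <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    correct = gc_fraction                               # the purine really was a G
    harmless = (1.0 - gc_fraction) * silent_fraction    # wrong base, but no change in meaning
    return correct + harmless


# 70% G+C genome, with one in three mistakes assumed to be silent.
print(round(blind_guanine_success(0.70, 1 / 3), 3))
```

Under the same assumptions, a 50% G+C genome would manage only about 67%, so the trick pays off more the further an organism sits from the centerline.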

It turns out there are other reasons to live "away from centerline," if you're a bacterium. I'll talk about those in another post.