
Showing posts with label Archaea.

Shedding Light on DNA Strand Asymmetry

In 1950, Erwin Chargaff was the first to report that the amount of adenine (A) in DNA equals the amount of thymine (T), and the amount of guanine (G) equals the amount of cytosine (C). This result was instrumental in helping Watson and Crick (and Rosalind Franklin) determine the structure of DNA.

It's pretty easy to understand that every A on one strand of DNA pairs with a T on the other strand (and every G pairs with an opposite-strand C); this explains DNA complementarity and the associated replication model. But somewhere along the line, Chargaff was credited with the much less obvious rule that A = T and G = C even for individual strands of DNA that aren't paired with anything. This is the so-called second parity rule attributed to Chargaff, although I can't find any record of Chargaff himself having postulated such a rule. The Chargaff papers that are so often cited as supporting this rule (in particular the 3-paper series culminating in this report in PNAS) do not, in fact, offer such a rule, and if you read the papers carefully, what Chargaff and colleagues actually found was that one strand of DNA is heavier than the other (they label the strands 'H' and 'L', for Heavy and Light); not only that, but Chargaff et al. reported a consistent difference in purine content between strands (see Table 1 of this paper).
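The difference between the two "rules" is easy to state in code. Pairing across strands forces A = T and G = C for the duplex as a whole; the so-called second parity rule is a claim about counts within one strand. Below is a minimal Python sketch (the function name `strand_skews` is just illustrative) that checks a single strand by computing its AT and GC skews. Under the second parity rule, both would be close to zero.

```python
from collections import Counter

def strand_skews(strand):
    """Return (AT skew, GC skew) for a single DNA strand.

    AT skew = (A - T) / (A + T) and GC skew = (G - C) / (G + C).
    Both are 0 when A = T and G = C within the strand, which is what
    the so-called second parity rule would predict.
    """
    counts = Counter(strand.upper())
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew

print(strand_skews("ATGC"))      # (0.0, 0.0): obeys the single-strand rule
print(strand_skews("AAATGGGC"))  # (0.5, 0.5): purine-rich, does not
```

Note that a purine-rich strand shows up as positive AT and GC skews at the same time, since the excess A and G are both purines.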

When I interviewed Linus Pauling in 1977, he cautioned me to always read the Results section of a paper carefully, because people will often conclude something entirely different from what the Results actually showed, or cite a paper as showing "ABC" when the data actually showed "XYZ."

How right he was.

At any rate, it turns out that the "message" strand of a gene hardly ever contains equal amounts of purines and pyrimidines. Codon analysis reveals that as genes become richer in A+T content (or as G+C content goes down), the excess of purines on the message strand becomes larger and larger. This is depicted in the following graph, which shows message-strand purine content (A+G) plotted against A+T content, for 1,373 distinct bacterial species. (No species is represented twice.)

Codon analysis reveals that as A+T content increases, message-strand purine content (A+G) increases. Each point on this graph represents a unique bacterial species (N=1373).

It's quite obvious that when A+T content is above approximately 33%, as it is for most bacterial species, the message strand tends to be comparatively purine-rich. Below A+T = 33%, the message strand becomes more pyrimidine-rich than purine-rich. (Note: In bacteria, where most of the DNA is in coding regions, codon-derived A+T content is very close to whole-genome A+T content. I checked the 1,373 species graphed here and found whole-chromosome A+T to differ from codon-derived A+T by an average of less than 7 parts in 10,000.)
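For readers who want to try this, here is a minimal sketch of the per-species calculation, assuming each genome's coding sequences are available as plain message-strand (5' to 3') strings. The post doesn't describe how the sequences were extracted, so the input format and the function name are assumptions. Each species then becomes one (A+T, A+G) point on the graph.

```python
from collections import Counter

def message_strand_composition(coding_sequences):
    """Pool the coding sequences (message strand) of one genome and return
    (A+T fraction, A+G fraction) over the A/C/G/T bases they contain."""
    counts = Counter()
    for cds in coding_sequences:
        # Skip N and other ambiguity codes; count only A, C, G, T.
        counts.update(b for b in cds.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    at = (counts["A"] + counts["T"]) / total
    ag = (counts["A"] + counts["G"]) / total
    return at, ag

at, ag = message_strand_composition(["ATGAAA", "GGCC"])
print(at, ag)  # 0.5 0.7
```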

The correlation between A+T and purine content is strong (r=0.85). Still, you can see that quite a few points have drifted far from the regression line, especially in the region of x = 0.5 to x = 0.7, where lots of points lie above y = 0.55. What's going on with those organisms? I decided to do some investigating.

First, some basics. Over time, transition mutations (AT↔GC) can change an organism's A+T content and thus move it along the x-axis of the graph, but transitions cannot move an organism higher or lower on the graph, because (by definition) transitions don't affect the strandwise purine balance.
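A tiny illustration of why transitions leave a strand's purine content alone while transversions can change it: a transition swaps purine for purine (A↔G) or pyrimidine for pyrimidine (C↔T), and its partner change on the other strand is also a transition. The helper names below are just for this sketch.

```python
PURINES = frozenset("AG")

def is_transition(ref, alt):
    """True for A<->G or C<->T; any other base change is a transversion."""
    return ref != alt and ((ref in PURINES) == (alt in PURINES))

def purine_fraction(strand):
    return sum(base in PURINES for base in strand) / len(strand)

def substitute(strand, pos, alt):
    return strand[:pos] + alt + strand[pos + 1:]

s = "GATC"
print(purine_fraction(s))                      # 0.5
print(purine_fraction(substitute(s, 1, "G")))  # A->G transition: still 0.5
print(purine_fraction(substitute(s, 0, "T")))  # G->T transversion: 0.25
```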

Transversions, on the other hand, can affect strandwise purine balance (in theory, at least), but only if they occur more often on one strand of DNA than the other. (I should say: occur more often, or are fixed more often, on one strand versus the other.) For example, let's say G-to-T transversions are the most common kind of transversion (which is probably true, given that guanine is the most easily oxidized of the four bases and that failure to repair 8-oxoguanine lesions does lead to eventual replacement with thymine). And let's say G-to-T transversions are most likely to occur on the non-transcribed strand of DNA, at transcription time. (The non-transcribed strand is uncoiled and unprotected while transcription is taking place on the other strand.) Over time, the non-transcribed strand would lose guanines; they'd be replaced by thymines. The message strand, or RNA-synonymous strand (which is also the non-transcribed strand), would become pyrimidine-rich and the other strand would become purine-rich.

Unfortunately, while that's exactly what happens for organisms with A+T content below 33%, precisely the opposite happens (purines accumulate on the message strand) in organisms with A+T above 33%. And in fact, in some high-AT organisms, the purine content of message strands is rather extreme. How can we explain that?

One possibility is that some organisms have evolved extremely effective transversion repair systems for the message (non-transcribed) strand of genes—systems that are so effective, no G-to-T transversions go unrepaired on the message strand. The transcribed strand, on the other hand, doesn't get the benefit of this repair system, possibly because the repair enzymes can't access the strand: it's engulfed in transcription factors, topoisomerases, RNA polymerase, nearby ribosomal machinery, etc.

If the non-transcribed strand never mutates (because all mutations are swiftly repaired), then the transcribed strand will (in the absence of equally effective repairs) eventually accumulate G-to-T mutations, and the message strand will accumulate adenines (purines). Perhaps.
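The two scenarios (G-to-T damage fixed on the message strand versus on the transcribed strand) can be compared with a toy simulation. This is a minimal sketch under simplifying assumptions: each G on the chosen strand independently becomes T with a fixed probability, the change is then copied to the partner strand by replication, and strand orientation is ignored since only base counts matter.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    # Position-by-position partner strand (not reversed; orientation
    # doesn't matter when we only count bases).
    return "".join(COMPLEMENT[b] for b in strand)

def g_to_t(strand, rate, rng):
    # Each G independently becomes T with probability `rate`.
    return "".join("T" if b == "G" and rng.random() < rate else b for b in strand)

def purine_fraction(strand):
    return sum(b in "AG" for b in strand) / len(strand)

def mutate_one_strand(message, which, rate, seed=0):
    """Apply G->T transversions to only the message strand or only the
    transcribed strand; return the resulting message strand."""
    rng = random.Random(seed)
    if which == "message":
        return g_to_t(message, rate, rng)
    if which == "transcribed":
        # A G->T on the transcribed strand shows up as C->A on the message strand.
        return complement(g_to_t(complement(message), rate, rng))
    raise ValueError("which must be 'message' or 'transcribed'")

gene = "GGCC" * 250  # toy gene, purine fraction 0.5
print(purine_fraction(mutate_one_strand(gene, "message", 0.2)))      # drops below 0.5
print(purine_fraction(mutate_one_strand(gene, "transcribed", 0.2)))  # rises above 0.5
```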

In the graph further above, you'll notice, at x = 0.6, a tiny spur of points hanging down to around y = 0.5. These points belong to some Bartonella species, plus a Parachlamydia and another chlamydial organism. These are endosymbionts that have lost a good portion of their genomes over time. It seems likely they've lost some transversion-repair machinery. During transcription, their message strands are going unrepaired. G-to-T transversions happen on the message strand, rendering it light in purines. Such a scenario seems plausible, at least.

By this reasoning, maybe points far above the regression line represent organisms that have gained repair functionality, such that their message strands never undergo G-to-T transversions (although their transcribed strands do). Is this possible?

Examination of the highest points on the graph shows a predominance of Clostridia. (Not just members of the genus Clostridium, but the class Clostridia, which is a large, ancient, and diverse class of anaerobes.) One thing we know about the Clostridia is that unlike all other bacteria (unlike members of the Gammaproteobacteria, the Alpha- and Betaproteobacteria, the Actinomycetes, the Bacteroidetes, etc.), the Clostridia have Ogg1, otherwise known as 8-oxoguanine glycosylase (which specifically prevents G-to-T transversions). They share this capability with all members of the Archaea, and all higher life forms as well.

Note that while non-Ogg1 enzymes exist for correcting 8-oxoguanine lesions (e.g., MutM, MutY, mfd), there is evidence that Ogg1 is specifically involved in repair of 8-oxoG lesions in non-transcribed strands of DNA, at transcription time. (The other 8-oxoG repair systems may not be strand-specific.)

If Archaea benefit from Ogg1 the way Clostridia do, they too should fall well above the regression line on a graph of A+G versus A+T. And this is exactly what we find. In the graph below, the pink squares are members of Archaea that came up positive in a protein-Blast query against Drosophila Ogg1. (I'll explain why I used Drosophila in a minute.) The red-orange circles are bacterial species (mostly from class Clostridia) that turned up Ogg1-positive in a similar Blast search.

Ogg1-positive organisms are plotted here. The pink squares are Archaea species. Red-orange circles are bacterial species that came up Ogg1-positive in a protein Blast search using a Drosophila Ogg1 amino-acid sequence. In the background (greyed out) is the graph of all 1,373 bacterial species, for comparison. Note how the Ogg1-positive organisms have a higher purine (A+G) content than the vast majority of bacteria.

The points in this plot are significantly higher on the y-axis than points in the all-bacteria plot (and the regression line is steeper), consistent with a different DNA repair profile.
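The post doesn't say which statistical test backs up "significantly higher." One simple, assumption-light way to check a claim like this is a one-sided permutation test on the A+G values of the two groups (Ogg1-positive versus all others). The sketch below is illustrative only and does not reproduce the author's analysis; a difference in regression slopes could be tested the same way by shuffling group labels and refitting.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_perm=10000, seed=1):
    """One-sided permutation test: is mean(group_a) larger than
    mean(group_b) by more than random relabeling would produce?
    Returns (observed difference, estimated p-value)."""
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            hits += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)

# Toy A+G values, not data from the post.
ogg1_pos = [0.60, 0.61, 0.62, 0.63, 0.64, 0.65]
others = [0.50, 0.51, 0.52, 0.53, 0.54, 0.55]
print(permutation_test(ogg1_pos, others, n_perm=2000))
```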

In identifying Ogg1-positive organisms, I wanted to avoid false positives (organisms with enzymes that share characteristics of Ogg1 but that aren't truly Ogg1), so for the Blast query I used Drosophila's Ogg1 as a reference enzyme, since it is well studied (unlike Archaeal or Clostridial Ogg1). I also set the E-value cutoff at 1e-10, to reduce spurious matches with DNA repair enzymes or nucleases that might have domain similarity with Ogg1 but aren't Ogg1. In addition, I did spot checks to be sure the putative Ogg1 matches that came up were not actually matches of Fpg (MutM), RecA, RadA, MutY, DNA-3-methyladenine glycosylase, or other DNA-binding enzymes.
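Applying an E-value cutoff is easy to script once Blast results are in tabular form. The sketch below assumes results were exported in NCBI BLAST+ tabular format (`-outfmt 6`, default 12 columns); the post doesn't say what format CoGeBlast produced, so treat the parsing as an assumption. The manual spot checks against Fpg, RecA and the rest are not automated here.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def parse_outfmt6(text):
    """Parse BLAST+ tabular output into a list of dicts, skipping blanks and comments."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        hits.append(row)
    return hits

def passing_hits(hits, max_evalue=1e-10):
    """Keep hits at or below the E-value cutoff, best (lowest E) first."""
    return sorted((h for h in hits if h["evalue"] <= max_evalue),
                  key=lambda h: h["evalue"])

sample = ("query1\tsubjA\t35.2\t300\t180\t5\t10\t300\t20\t310\t2e-30\t120\n"
          "query1\tsubjB\t28.0\t150\t100\t3\t50\t190\t40\t180\t3e-6\t45.1\n")
print([h["sseqid"] for h in passing_hits(parse_outfmt6(sample))])  # ['subjA']
```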

Bottom line, organisms that have an Archaeal 8-oxoguanine glycosylase enzyme (mostly obligate anaerobes) occupy a unique part of the A+G vs. A+T graph. Which makes sense. It's only logical that anaerobes would have different DNA repair strategies (and a different "repairosome") than oxygen-tolerant bacteria, because oxidative stress is, in general, handled much differently in anaerobes. The fact that they bring different repair tactics to bear on DNA shouldn't come as a surprise.



DNA Repair 101

You don't have to be a biologist to know that anything that can damage DNA is potentially harmful, because it can cause mutations (which are, in fact, mostly harmful; very few mutations are beneficial). Fortunately, cells contain dozens of different kinds of repair enzymes, and most DNA damage is repaired quickly. When damage isn't repaired quickly (or properly), you have a mutation.

It's not much of a stretch to say that DNA repair enzymes play a front-and-center role in evolution (or at least the portion of evolution that's driven by mutations). Which is why molecular geneticists tend to pay a lot of attention to DNA repair processes. Anything that can affect the composition of DNA can change the course of evolution.

DNA is remarkably stable, chemically. Nonetheless, it is vulnerable to oxidative attack by hydroxyl radicals, superoxide, nitric oxide, and other reactive oxygen and nitrogen species generated in the course of cell metabolism (never mind exogenous poisons).

Of the four bases in DNA, namely guanine (G), cytosine (C), adenine (A), and thymine (T), guanine is the most susceptible to oxidative attack. When it's exposed to an oxidant, it can form 7,8-dihydro-8-oxoguanine, OG for short. What can happen then is that the OG residue rotates about its glycosidic bond (the bond joining the base to its deoxyribose sugar) until the amino group is facing the other way (see diagram), and when that happens, OG can pair up with adenine instead of guanine's usual partner, cytosine.

When guanine is oxidized to form 7,8-dihydro-8-oxoguanine, it mispairs with adenine instead of its usual partner, cytosine.
Rest assured, there are repair enzymes that can and will detect such funny business in short order. But if OG isn't detected and replaced with a normal guanine before replication occurs, OG may get paired up with an adenine during replication, and in the next round of replication that adenine will in turn be paired with a thymine, adenine's usual partner. That's bad, because it means a G:C pair ended up getting changed to a T:A pair. (The place of the G got taken first by OG and then by T. The place of G's opposite-strand partner, C, eventually got taken by A.) In so many words: that's a mutation.
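The two rounds of replication described above can be traced with a few lines of Python. This is a deliberately simplified sketch: it follows a single base pair, and it assumes an unrepaired OG always directs incorporation of adenine (in reality OG can still pair with cytosine some of the time).

```python
# Partner each base templates during replication. Simplifying assumption:
# an unrepaired OG (8-oxoguanine) always directs incorporation of A.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "OG": "A"}

def replicate(pair):
    """Semi-conservative replication of one base pair (top, bottom):
    each parental base stays put and gets a newly made partner."""
    top, bottom = pair
    return [(top, PARTNER[top]), (PARTNER[bottom], bottom)]

start = ("OG", "C")        # the G on the top strand has been oxidized
gen1 = replicate(start)    # [('OG', 'A'), ('G', 'C')]
gen2 = replicate(gen1[0])  # [('OG', 'A'), ('T', 'A')]
print(gen1)
print(gen2)                # ('T', 'A') is the G:C -> T:A mutation
```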

It turns out there's a special enzyme designed to prevent the G-to-T funny business we've just been talking about. It's called 8-oxoguanine glycosylase, or Ogg1 for short. You'll sometimes see it called 8-oxoguanine-DNA-glycosylase, and from a capabilities standpoint it's often (wrongly) equated with the Fpg enzyme (formamidopyrimidine-DNA glycosylase), which is not the same as Ogg1 at all.

Just about all higher life forms have an Ogg1 enzyme (which clips OG out of DNA and ensures it gets replaced with a brand-new guanine before any funny business can happen). Surprisingly few bacteria have this enzyme, instead preferring to let the more general-purpose Fpg (MutM) take its place. If you run a Blast search of a reference Ogg1 gene (the Drosophila version works well) against all bacterial genomes, you'll get only a few hundred matches (out of around 10,000 sequenced bacterial genomes), the vast majority belonging to members of the class Clostridia (a truly fearsome group of anaerobic spore-formers containing the botulism germ, the tetanus bacterium, the notorious C. difficile—also known as C. diff—and some other creatures you probably don't want to meet). If you run the same Blast search against Archaea (this is the other major "germ-like" microbial domain, along with true bacteria), you'll get hits against almost every member species of the Archaea. Personally, I think it's likely the Ogg1 enzyme originated with a common ancestor of today's Archaea and Eukaryota, and arrived in Clostridia by lateral gene transfer (not terribly recently, though).

One thing is certain: E. coli does not have Ogg1, nor does Staphylococcus, nor Streptococcus, nor any germ you've ever heard of (other than the aforementioned Clostridia members, plus Archaea). And yet, every yeast and fungus has it, every plant, every fruit fly, every fish, every human—every higher life form. Ironically, only five members of Archaea turned up positive for the Fpg enzyme when I did a check, whereas almost all Eubacteria ("true bacteria") have it, including Clostridia. Bottom line, Clostridia have the best of both worlds: Fpg, plus Ogg1. Belt and suspenders, both.

This is just a tiny intro to DNA repair, which is a vast subject indeed. For more, see this article, or just start rummaging around in Google Scholar.

More Science on the Desktop

If you took Bacteriology 101, you were probably subjected to (maybe even tested on) the standard mythology about anaerobes lacking the enzyme catalase. The standard mythology goes like this: Almost all life forms (from bacteria to dandelions to humans) have a special enzyme called catalase that detoxifies hydrogen peroxide by breaking it down to water and molecular oxygen. The only exception: strict anaerobes (bacteria that cannot live in the presence of oxygen). They seem to lack catalase.

I've written on this subject before, so I won't bore you with a proper debunking of all aspects of the catalase myth here. (For that, see this post.) Right now, I just want to emphasize one point, which is that, contrary to myth, quite a few strict anaerobes do have catalase. I've listed 87 examples by name below. (Scroll down.)

I have to admit, even I was shocked to find there are 87 species of catalase-positive strict anaerobes among the eubacteria. It's about quadruple the number I would have expected.

If you're curious how I came up with a list of 87 catalase-positive anaerobes, here's how. First, I assembled a sizable (N=1373) list of bacteria, unduplicated at the species level. (So in other words, E. coli is listed only once, Staphylococcus aureus is listed only once, etc. No species is listed twice.) I then used the free/online CoGeBlast tool to run two Blast searches: one designed to identify aerobes, and another to identify catalase-positive organisms. In the end, I had all 1,373 organisms tagged as to whether each was aerobic, anaerobic, catalase-positive, or catalase-negative.
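The post doesn't spell out how the list was reduced to one isolate per species. A minimal sketch of one possible rule, keyed on genus plus species epithet, is below; the handling of "sp.", "bacterium" and "Candidatus" names is an assumption made for illustration, and the function names are hypothetical.

```python
def species_key(name):
    """Genus + species epithet, e.g. 'Clostridium botulinum A2 strain Kyoto'
    -> 'Clostridium botulinum'. Unnamed taxa ('sp.', 'bacterium') keep their
    full name, since two of them can't safely be assumed to be one species."""
    words = name.split()
    if words and words[0] == "Candidatus":  # provisional-name prefix, not a genus
        words = words[1:]
    if len(words) < 2 or words[1] in ("sp.", "bacterium"):
        return name
    return f"{words[0]} {words[1]}"

def dedupe_by_species(names):
    """Keep the first listed isolate of each species, preserving order."""
    seen, kept = set(), []
    for name in names:
        key = species_key(name)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept

print(dedupe_by_species([
    "Clostridium botulinum A2 strain Kyoto",
    "Clostridium botulinum strain Eklund 17B",
    "Lactobacillus sakei subsp. sakei strain 23K",
]))  # the Eklund 17B isolate is dropped as a second C. botulinum
```

A name-based key like this can't catch taxonomic synonyms (a species moved to a new genus, for example), so a cross-check on strain designations such as ATCC or DSM numbers may also be worthwhile.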

It's not as easy as you'd think to identify strict anaerobes. There is no single enzymatic marker that can be used to identify anaerobes reliably (across 1,373 species), as far as I know. I took the opposite approach, tagging as aerobic any organism that produces cytochrome c oxidase and/or NADH dehydrogenase. (These are enzymes involved in classic oxidative phosphorylation of the kind no strict anaerobe participates in.) In particular, I used the following set of amino acid sequences as markers of aerobic respiration (non-biogeeks, scroll down):

>sp|Q6MIR4|NUOB_BDEBA NADH-quinone oxidoreductase subunit B OS=Bdellovibrio bacteriovorus (strain ATCC 15356 / DSM 50701 / NCIB 9529 / HD100) GN=nuoB PE=3 SV=1
MHNEQVQGLVSHDGMTGTQAVDDMSRGFAFTSKLDAIVAWGRKNSLWPMPYGTACCGIEF MSVMGPKYDLARFGAEVARFSPRQADLLVVAGTITEKMAPVIVRIYQQMLEPKYVLSMGA CASSGGFYRAYHVLQGVDKVIPVDVYIPGCPPTPEAVMDGIMALQRMIATNQPRPWKDNW KSPYEQA
>sp|P0ABJ3|CYOC_ECOLI Cytochrome o ubiquinol oxidase subunit 3 OS=Escherichia coli (strain K12) GN=cyoC PE=1 SV=1
MATDTLTHATAHAHEHGHHDAGGTKIFGFWIYLMSDCILFSILFATYAVLVNGTAGGPTG KDIFELPFVLVETFLLLFSSITYGMAAIAMYKNNKSQVISWLALTWLFGAGFIGMEIYEF HHLIVNGMGPDRSGFLSAFFALVGTHGLHVTSGLIWMAVLMVQIARRGLTSTNRTRIMCL SLFWHFLDVVWICVFTVVYLMGAM 

>sp|Q9I425|CYOC_PSEAE Cytochrome o ubiquinol oxidase subunit 3 OS=Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228) GN=cyoC PE=3 SV=1
MSTAVLNKHLADAHEVGHDHDHAHDSGGNTVFGFWLYLMTDCVLFASVFATYAVLVHHTA GGPSGKDIFELPYVLVETAILLVSSCTYGLAMLSAHKGAKGQAIAWLGVTFLLGAAFIGM EINEFHHLIAEGFGPSRSAFLSSFFTLVGMHGLHVSAGLLWMLVLMAQIWTRGLTAQNNT RMMCLSLFWHFLDIVWICVFTVVYLMGAL
>tr|Q7VDD9|Q7VDD9_PROMA Cytochrome c oxidase subunit III OS=Prochlorococcus marinus (strain SARG / CCMP1375 / SS120) GN=cyoC PE=3 SV=1
MTTISSVDKKAEELTSQTEEHPDLRLFGLVSFLVADGMTFAGFFAAYLTFKAVNPLLPDA IYELELPLPTLNTILLLVSSATFHRAGKALEAKESEKCQRWLLITAGLGIAFLVSQMFEY FTLPFGLTDNLYASTFYALTGFHGLHVTLGAIMILIVWWQARSPGGRITTENKFPLEAAE LYWHFVDGIWVILFIILYLL
>sp|Q8KS19|CCOP2_PSEST Cbb3-type cytochrome c oxidase subunit CcoP2 OS=Pseudomonas stutzeri GN=ccoP2 PE=1 SV=1
MTSFWSWYVTLLSLGTIAALVWLLLATRKGQRPDSTEETVGHSYDGIEEYDNPLPRWWFM LFVGTVIFALGYLVLYPGLGNWKGILPGYEGGWTQVKEWQREMDKANEQYGPLYAKYAAM PVEEVAKDPQALKMGGRLFASNCSVCHGSDAKGAYGFPNLTDDDWLWGGEPETIKTTILH GRQAVMPGWKDVIGEEGIRNVAGYVRSLSGRDTPEGISVDIEQGQKIFAANCVVCHGPEA KGVTAMGAPNLTDNVWLYGSSFAQIQQTLRYGRNGRMPAQEAILGNDKVHLLAAYVYSLS QQPEQ
>sp|P57542|CYOC_BUCAI Cytochrome o ubiquinol oxidase subunit 3 OS=Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS) GN=cyoC PE=3 SV=1
MIENKFNNTILNSNSSTHDKISETKKLFGLWIYLMSDCIMFAVLFAVYAIVSSNISINLI SNKIFNLSSILLETFLLLLSSLSCGFVVIAMNQKRIKMIYSFLTITFIFGLIFLLMEVHE FYELIIENFGPDKNAFFSIFFTLVATHGVHIFFGLILILSILYQIKKLGLTNSIRTRILC FSVFWHFLDIIWICVFTFVYLNGAI
>sp|O24958|CCOP_HELPY Cbb3-type cytochrome c oxidase subunit CcoP OS=Helicobacter pylori (strain ATCC 700392 / 26695) GN=ccoP PE=3 SV=1
MDFLNDHINVFGLIAALVILVLTIYESSSLIKEMRDSKSQGELVENGHLIDGIGEFANNV PVGWIASFMCTIVWAFWYFFFGYPLNSFSQIGQYNEEVKAHNQKFEAKWKHLGQKELVDM GQGIFLVHCSQCHGITAEGLHGSAQNLVRWGKEEGIMDTIKHGSKGMDYLAGEMPAMELD EKDAKAIASYVMAELSSVKKTKNPQLIDKGKELFESMGCTGCHGNDGKGLQENQVFAADL TAYGTENFLRNILTHGKKGNIGHMPSFKYKNFSDLQVKALLNLSNR
>sp|P0ABI8|CYOB_ECOLI Ubiquinol oxidase subunit 1 OS=Escherichia coli (strain K12) GN=cyoB PE=1 SV=1
MFGKLSLDAVPFHEPIVMVTIAGIILGGLALVGLITYFGKWTYLWKEWLTSVDHKRLGIM YIIVAIVMLLRGFADAIMMRSQQALASAGEAGFLPPHHYDQIFTAHGVIMIFFVAMPFVI GLMNLVVPLQIGARDVAFPFLNNLSFWFTVVGVILVNVSLGVGEFAQTGWLAYPPLSGIE YSPGVGVDYWIWSLQLSGIGTTLTGINFFVTILKMRAPGMTMFKMPVFTWASLCANVLII ASFPILTVTVALLTLDRYLGTHFFTNDMGGNMMMYINLIWAWGHPEVYILILPVFGVFSE IAATFSRKRLFGYTSLVWATVCITVLSFIVWLHHFFTMGAGANVNAFFGITTMIIAIPTG VKIFNWLFTMYQGRIVFHSAMLWTIGFIVTFSVGGMTGVLLAVPGADFVLHNSLFLIAHF HNVIIGGVVFGCFAGMTYWWPKAFGFKLNETWGKRAFWFWIIGFFVAFMPLYALGFMGMT RRLSQQIDPQFHTMLMIAASGAVLIALGILCLVIQMYVSIRDRDQNRDLTGDPWGGRTLE WATSSPPPFYNFAVVPHVHERDAFWEMKEKGEAYKKPDHYEEIHMPKNSGAGIVIAAFST IFGFAMIWHIWWLAIVGFAGMIITWIVKSFDEDVDYYVPVAEIEKLENQHFDEITKAGLK NGN
>sp|P0ABK2|CYDB_ECOLI Cytochrome d ubiquinol oxidase subunit 2 OS=Escherichia coli (strain K12) GN=cydB PE=1 SV=1
MIDYEVLRFIWWLLVGVLLIGFAVTDGFDMGVGMLTRFLGRNDTERRIMINSIAPHWDGN QVWLITAGGALFAAWPMVYAAAFSGFYVAMILVLASLFFRPVGFDYRSKIEETRWRNMWD WGIFIGSFVPPLVIGVAFGNLLQGVPFNVDEYLRLYYTGNFFQLLNPFGLLAGVVSVGMI ITQGATYLQMRTVGELHLRTRATAQVAALVTLVCFALAGVWVMYGIDGYVVKSTMDHYAA SNPLNKEVVREAGAWLVNFNNTPILWAIPALGVVLPLLTILTARMDKAAWAFVFSSLTLA CIILTAGIAMFPFVMPSSTMMNASLTMWDATSSQLTLNVMTWVAVVLVPIILLYTAWCYW KMFGRITKEDIERNTHSLY
>sp|Q89AU5|NUOB_BUCBP NADH-quinone oxidoreductase subunit B OS=Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp) GN=nuoB PE=3 SV=1
MKYTLTRVNISDDDQNYPREKKIQVSDPTKKYIQKNVFMGTLSKVLHNLVNWGRKNSLWP YNFGLSCCYVEMVTSFTSVHDISRFGSEVLRASPRQADFMVIAGTPFIKMVPIIQRLYDQ MLEPKWVISMGSCANSGGMYDIYSVVQGVDKFLPVDVYIPGCPPRPEAYIHGLMLLQKSI SKERRPLSWIIGEQGIYKANFNSEKKNLRKMRNLVKYSQDKN
>sp|Q82DY0|NUOB1_STRAW NADH-quinone oxidoreductase subunit B 1 OS=Streptomyces avermitilis (strain ATCC 31267 / DSM 46492 / JCM 5070 / NCIMB 12804 / NRRL 8165 / MA-4680) GN=nuoB1 PE=3 SV=1
MGLEEKLPSGFLLTTVEQAAGWVRKASVFPATFGLACCAIEMMTTGAGRYDLARFGMEVF RGSPRQADLMIVAGRVSQKMAPVLRQVYDQMPNPKWVISMGVCASSGGMFNNYAIVQGVD HIVPVDIYLPGCPPRPEMLIDAILKLHQKIQSSKLGVNAEEAAREAEEAALKALPTIEMK GLLR


Astonishingly, certain bacteria that "everyone knows" are anaerobic turned up as aerobic when checked with the above Blast query (for example, Bacteroides fragilis, Desulfovibrio gigas, Moorella thermoacetica, and others). It seems quite a number of so-called anaerobes have non-copper (heme-only) cytochrome oxidases. (See this paper for further discussion.)

In any event, my Blast search turned up 1,089 positives (putative aerobes, some facultatively anaerobic) out of 1,373 bacterial species. I tagged the non-positives as anaerobes.
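The tagging logic described here is simple enough to express directly: an organism counts as aerobic if it is Blast-positive for any of the respiratory markers, everything else is tagged anaerobic, and the catalase search is applied independently. A minimal sketch, assuming the Blast positives have already been collected into sets of organism names (the function names and toy organism labels are hypothetical):

```python
def tag_organisms(organisms, respiratory_hits, catalase_hits):
    """aerobe = Blast-positive for any respiratory marker; anaerobe = the rest.
    Catalase status comes from a separate Blast search."""
    tags = {}
    for org in organisms:
        oxygen = "aerobe" if org in respiratory_hits else "anaerobe"
        catalase = "catalase+" if org in catalase_hits else "catalase-"
        tags[org] = (oxygen, catalase)
    return tags

def catalase_positive_anaerobes(tags):
    return sorted(org for org, (oxygen, catalase) in tags.items()
                  if oxygen == "anaerobe" and catalase == "catalase+")

# Toy labels, not organisms from the post.
tags = tag_organisms(["sp_A", "sp_B", "sp_C", "sp_D"],
                     respiratory_hits={"sp_A", "sp_D"},
                     catalase_hits={"sp_A", "sp_B", "sp_D"})
print(catalase_positive_anaerobes(tags))  # ['sp_B']
```

Because "anaerobe" here just means "no respiratory-marker hit," any error in the first search (like the B. fragilis case) carries straight through to the catalase-positive-anaerobe list.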

Of the 284 putative anaerobes, 87 scored positive for catalase in a tblastn search (protein query against translated genomes). I used the following catalase sequences in my query:


>sp|B0C4G1|KATG_ACAM1 Catalase-peroxidase OS=Acaryochloris marina (strain MBIC 11017) GN=katG PE=3 SV=1
MSSASKCPFSGGALKFTAGSGTANRDWWPNQLNLQILRQHSPKSNPMDKAFNYAEAFKSL DLADVKQDIFDLMKSSQDWWPADYGHYGPLFIRMAWHSAGTYRIGDGRGGAGTGNQRFAP INSWPDNANLDKARMLLWPIKQKYGAKISWADLMILAGNCALESMGFKTFGFAGGREDIW EPEEDIYWGAETEWLGDQRYTGDRDLEATLGAVQMGLIYVNPEGPNGHPDPVASGRDIRE TFGRMAMNDEETVALTAGGHTFGKCHGAGDDAHVGPEPEGARIEDQCLGWKSSFGTGKGV HAITSGIEGAWTTNPTQWDNNYFENLFGYEWELTKSPAGANQWVPQGGAGANTVPDAHDP SRRHAPIMTTADMAMRMDPIYSPISRRFLDNPDQFADAFARAWFKLTHRDMGPRSRYLGP EVPEEELIWQDPVPAVNHELINEQDIATLKSQILATNLTVSQLVSTAWASAVTYRNSDKR GGANGARIRLAPQRDWEVNQPAQLATVLQTLEAVQTTFNHSQIGGKRVSLADLIVLGGCA GVEQAAKNAGWYDVKVPFKPGRTDATQAQTDVTSFAVLEPRADGFRNYLKGHYPVSAEEL LVDKAQLLTLTAPEMTVLVGGLRVLNANVGQAQHGVFTHRPESLTNDFFLNLLDMSVTWA ATSEAEEVFEGRDRKTGALKWTGTRVDLIFGSNSQLRALAEVYGCEDSQQRFVQDFVAAW DKVMNLDRFDLA
>tr|D9RGS2|D9RGS2_STAAJ Catalase OS=Staphylococcus aureus (strain JKD6159) GN=katE PE=3 SV=1
MSQQDKKLTGVFGHPVSDRENSMTAGPRGPLLMQDIYFLEQMSQFDREVIPERRMHAKGS GAFGTFTVTKDITKYTNAKIFSEIGKQTEMFARFSTVAGERGAADAERDIRGFALKFYTE EGNWDLVGNNTPVFFFRDPKLFVSLNRAVKRDPRTNMRDAQNNWDFWTGLPEALHQVTIL MSDRGIPKDLRHMHGFGSHTYSMYNDSGERVWVKFHFRTQQGIENLTDEEAAEIIASDRD SSQRDLFEAIEKGDYPKWTMYIQVMTEEQAKSHKDNPFDLTKVWYHDEYPLIEVGEFELN RNPDNYFMDVEQAAFAPTNIIPGLDFSPDKMLQGRLFSYGDAQRYRLGVNHWQIPVNQPK GVGIENICPFSRDGQMRVVDNNQGGGTHYYPNNHGKFDSQPEYKKPPFPTDGYGYEYNQR QDDDNYFEQPGKLFRLQSEDAKERIFTNTANAMEGVTDDVKRRHIRHCYKADPEYGKGVA KALGIDINSIDLETENDETYENFEK
>sp|P60355|MCAT_LACPN Manganese catalase OS=Lactobacillus plantarum PE=1 SV=1
MFKHTRKLQYNAKPDRSDPIMARRLQESLGGQWGETTGMMSYLSQGWASTGAEKYKDLLL DTGTEEMAHVEMISTMIGYLLEDAPFGPEDLKRDPSLATTMAGMDPEHSLVHGLNASLNN PNGAAWNAGYVTSSGNLVADMRFNVVRESEARLQVSRLYSMTEDEGVRDMLKFLLARETQ HQLQFMKAQEELEEKYGIIVPGDMKEIEHSEFSHVLMNFSDGDGSKAFEGQVAKDGEKFT YQENPEAMGGIPHIKPGDPRLHNHQG
>sp|P42321|CATA_PROMI Catalase OS=Proteus mirabilis GN=katA PE=1 SV=1
MEKKKLTTAAGAPVVDNNNVITAGPRGPMLLQDVWFLEKLAHFDREVIPERRMHAKGSGA FGTFTVTHDITKYTRAKIFSEVGKKTEMFARFSTVAGERGAADAERDIRGFALKFYTEEG NWDMVGNNTPVFYLRDPLKFPDLNHIVKRDPRTNMRNMAYKWDFFSHLPESLHQLTIDMS DRGLPLSYRFVHGFGSHTYSFINKDNERFWVKFHFRCQQGIKNLMDDEAEALVGKDRESS QRDLFEAIERGDYPRWKLQIQIMPEKEASTVPYNPFDLTKVWPHADYPLMDVGYFELNRN PDNYFSDVEQAAFSPANIVPGISFSPDKMLQGRLFSYGDAHRYRLGVNHHQIPVNAPKCP FHNYHRDGAMRVDGNSGNGITYEPNSGGVFQEQPDFKEPPLSIEGAADHWNHREDEDYFS QPRALYELLSDDEHQRMFARIAGELSQASKETQQRQIDLFTKVHPEYGAGVEKAIKVLEG KDAK
>sp|Q9Z598|CATA_STRCO Catalase OS=Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145) GN=katA PE=3 SV=1
MSQRVLTTESGAPVADNQNSASAGIGGPLLIQDQHLIEKLARFNRERIPERVVHARGSGA YGHFEVTDDVSGFTHADFLNTVGKRTEVFLRFSTVADSLGGADAVRDPRGFALKFYTEEG NYDLVGNNTPVFFIKDPIKFPDFIHSQKRDPFTGRQEPDNVFDFWAHSPEATHQITWLMG DRGIPASYRHMDGFGSHTYQWTNARGESFFVKYHFKTDQGIRCLTADEAAKLAGEDPTSH QTDLVQAIERGVYPSWTLHVQLMPVAEAANYRFNPFDVTKVWPHADYPLKRVGRLVLDRN PDNVFAEVEQAAFSPNNFVPGIGPSPDKMLQGRLFAYADAHRYRLGVNHTQLAVNAPKAV PGGAANYGRDGLMAANPQGRYAKNYEPNSYDGPAETGTPLAAPLAVSGHTGTHEAPLHTK DDHFVQAGALYRLMSEDEKQRLVANLAGGLSQVSRNDVVEKNLAHFHAADPEYGKRVEEA VRALRED
>Haloarcula marismortui strain ATCC 43049(v1, unmasked), Name: YP_136584.1, katG1, rrnAC2018, Type: CDS, Feature Location: (Chr: I, complement(1808213..1810405)) Genomic Location: 1808213-1810405
MLKTVLMPSPSKCSLMAKRDQDWSPNQLRLDILDQNARDADPRGTGFDYAEEFQELDLDAVKADLEELMTSSQDWWPADYGHYGPLFIRMAWHSAGTYRTTDGRGGASGGRQRFAPLNSWPDNANLDKARRLLWPIKKKYGRKLSWADLIVLAGNHAIESMGLKTFGWAGGREDAFEPDEAVDWGPEDEMEAHQSERRTDDGELKEPLGAAVMGLIYVDPEGPNGNPDPLASAENIRESFGRMAMNDEETAALIAGGHTFGKVHGADDPEENLGDVPEDAPIEQMGLGWENDYGSGKAGDTITSGIEGPWTQAPIEWDNGYIDNLLDYEWEPEKGPGGAWQWTPTDEALANTVPDAHDPSEKQTPMMLTTDIALKRDPDYREVMERFQENPMEFGINFARAWYKLIHRDMGPPERFLGPDAPDEEMIWQDPVPDVDHDLIGDEEVAELKTDILETDLTVSQLVKTAWASASTYRDSDKRGGANGARIRLEPQKNWEVNEPAQLETVLATLEEIQAEFNSARTDDTRVSLADLIVLGGNAAVEQAAADAGYDVTVPFEPGRTDATPEQTDVDSFEALKPRADGFRNYARDDVDVPAEELLVDRADLLDLTPEEMTVLVGGLRSLGATYQDSDLGVFTDEPGTLTNDFFEVVLGMDTEWEPVSESKDVFEGYDRETGEQTWAASRVDLIFGSHSRLRAIAEVYGADGAEAELVDDFVDAWHKVMRLDRFDLE
>sp|B2TJE9|KATG_CLOBB Catalase-peroxidase OS=Clostridium botulinum (strain Eklund 17B / Type B) GN=katG PE=3 SV=1
MTENKCPVTGKMGKATAGSGTTNKDWWPNQLNLNILHQNSQLSNPMSKDFNYAEEFKKLD FQALKVDLYMLMTDSQIWWPADYGNYGPLFIRMAWHSAGTYRVGDGRGGGSLGLQRFAPL NSWPDNINLDKARRLLWPIKKKYGNKISWADLLILTGNCALESMGLKTLGFGGGRVDVWE PQEDIYWGSEKEWLGDEREKGDKELENPLAAVQMGLIYVNPEGPNGNPDPLGSAHDVRET FARMAMNDEETVALIAGGHTFGKCHGAASPSYVGPAPEAAPIEEQGLGWKNTYGSGNGDD TIGSGLEGAWKANPTKWTMGYLKTLFKYDWELVKSPAGAYQWLAKNVDEEDMVIDAEDST KKHRPMMTTADLGLRYDPIYEPIARNYLKNPEKFAHDFASAWFKLTHRDMGPISRYLGPE VPKESFIWQDPIPLVKHKLITKKDITHIKKKILDSGLSISDLVATAWASASTFRGSDKRG GANGGRIRLEPQKNWEVNEPKKLNNVLNTLKQIKENFNSSHSKDKKVSLADIIILGGCVG IEQAAKRAGYNINVPFIPGRTDAIQEQTDVKSFAVLEPKEDGFRNYLKTKYVVKPEDMLI DRAQLLTLTAPEMTVLIGGMRVLNCNYNKSKDGVFTNRPECLTNDFFVNLLDMNTVWKPK SEDKDRFEGFDRETGELKWTATRVDLIFGSNSQLRAIAEVYACDDNKEKFIQDFIFAWNK IMNADRFEIK
>sp|Q59635|CATB_PSEAE Catalase OS=Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228) GN=katB PE=3 SV=1
MNPSLNAFRPGRLLVAASLTASLLSLSVQAATLTRDNGAPVGDNQNSQTAGPNGSVLLQD VQLLQKLQRFDRERIPERVVHARGTGAHGEFVASADISDLSMAKVFRKGEKTPVFVRFSA VVHGNHSPETLRDPRGFATKFYTADGNWDLVGNNFPTFFIRDAIKFPDMVHAFKPDPRSN LDDDSRRFDFFSHVPEATRTLTLLYSNEGTPASYREMDGNSVHAYKLVNARGEVHYVKFH WKSLQGQKNLDPKQVAEVQGRDYSHMTNDLVSAIRKGDFPKWDLYIQVLKPEDLAKFDFD PLDATKIWPGIPERKIGQMVLNRNVDNFFQETEQVAMAPSNLVPGIEPSEDRLLQGRLFA YADTQMYRVGANGLGLPVNRPRSEVNTVNQDGALNAGHSTSGVNYQPSRLDPREEQASAR YVRTPLSGTTQQAKIQREQNFKQTGELFRSYGKKDQADLIASLGGALAITDDESKYIMLS YFYKADSDYGTGLAKVAGADLQRVRQLAAKLQD


The first of these is a cyanobacterial katG (large-subunit) catalase-peroxidase, perhaps representative of primitive bacterial catalase. The second sequence is classic Staphylococcus catalase (katE). The third is a manganese-containing catalase from Lactobacillus. (This one brought the most hits, by the way.) Next come katA catalases from Proteus and Streptomyces, two organisms that are far apart in genomic G+C content (and rather distant phylogenetically), followed by an Archaeal catalase. (None of the 1,373 species in my organism list was Archaeal in origin, but you never know whether a given bacterium may have obtained its catalase through horizontal gene transfer.) Then comes a known-valid anaerobic catalase from Clostridium botulinum, and finally a Pseudomonas katB catalase. The idea was to cover as much ground as possible, phylogenetically and enzymatically: large- and small-subunit catalases, of the heme as well as the manganese variety, from aerobic and anaerobic bacteria of high and low genomic G+C content, plus an archaeal catalase for good measure.

Here, then, finally, is the list of 87 catalase-positive strict anaerobes:

Acetohalobium arabaticum strain DSM 5501
Alkaliphilus metalliredigens strain QYMF
Alkaliphilus oremlandii strain OhILAs
Anaerococcus prevotii strain ACS-065-V-Col13
Anaerococcus vaginalis strain ATCC 51170
Anaerofustis stercorihominis strain DSM 17244
Anaerostipes caccae strain DSM 14662
Anaerostipes sp. strain 3_2_56FAA
Anaerotruncus colihominis strain DSM 17241
Bacteroides capillosus strain ATCC 29799
Bacteroides pectinophilus strain ATCC 43243
Brachyspira hyodysenteriae strain ATCC 49526; WA1
Brachyspira intermedia strain PWS/A
Brachyspira pilosicoli strain 95/1000
Candidatus Arthromitus sp. SFB-mouse-Japan
Carnobacterium sp. strain 17-4
Clostridium acetobutylicum strain ATCC 824
Clostridium asparagiforme strain DSM 15981
Clostridium bartlettii strain DSM 16795
Clostridium bolteae strain ATCC BAA-613
Clostridium botulinum A2 strain Kyoto
Clostridium butyricum strain 5521
Clostridium cellulovorans strain 743B
Clostridium cf. saccharolyticum strain K10
Clostridium citroniae strain WAL-17108
Clostridium clostridioforme strain 2_1_49FAA
Clostridium difficile QCD-37x79
Clostridium hathewayi strain WAL-18680
Clostridium hylemonae strain DSM 15053
Clostridium kluyveri strain DSM 555
Clostridium lentocellum strain DSM 5427
Clostridium leptum strain DSM 753
Clostridium ljungdahlii strain ATCC 49587
Clostridium novyi strain NT
Clostridium ramosum strain DSM 1402
Clostridium saccharolyticum strain WM1
Clostridium scindens strain ATCC 35704
Clostridium spiroforme strain DSM 1552
Clostridium sporogenes strain ATCC 15579
Clostridium tetani strain Massachusetts substrain E88
Coprobacillus sp. strain 3_3_56FAA
Coprococcus comes strain ATCC 27758
Coprococcus sp. strain ART55/1
Dethiobacter alkaliphilus strain AHT 1
Dorea formicigenerans strain 4_6_53AFAA
Dorea longicatena strain DSM 13814
Erysipelotrichaceae bacterium strain 21_3
Eubacterium dolichum strain DSM 3991
Eubacterium eligens strain ATCC 27750
Eubacterium siraeum strain 70/3
Eubacterium ventriosum strain ATCC 27560
Flavonifractor plautii strain ATCC 29863
Halothermothrix orenii strain DSM 9562; H 168
Holdemania filiformis strain DSM 12042
Lachnospiraceae bacterium strain 1_1_57FAA
Lactobacillus curvatus strain CRL 705
Lactobacillus sakei subsp. sakei strain 23K
Mahella australiensis strain 50-1 BON
Natranaerobius thermophilus strain JW/NM-WN-LF
Oscillibacter valericigenes strain Sjm18-20
Parabacteroides distasonis strain ATCC 8503
Parabacteroides johnsonii strain DSM 18315
Parabacteroides sp. strain D13
Pediococcus acidilactici strain DSM 20284
Pediococcus pentosaceus strain ATCC 25745
Pelotomaculum thermopropionicum strain SI
Pseudoflavonifractor capillosus strain ATCC 29799
Pseudoramibacter alactolyticus strain ATCC 23263
Roseburia hominis strain A2-183
Roseburia intestinalis strain M50/1
Ruminococcaceae bacterium strain D16
Ruminococcus bromii strain L2-63
Ruminococcus obeum strain A2-162
Ruminococcus sp. strain 18P13
Ruminococcus torques strain L2-14
Sphaerochaeta pleomorpha strain Grapes
Spirochaeta coccoides strain DSM 17374
Spirochaeta sp. strain Buddy
Subdoligranulum sp. strain 4_3_54A2FAA
Tepidanaerobacter sp. strain Re1
Thermoanaerobacter brockii subsp. finnii strain Ako-1
Thermoanaerobacter ethanolicus strain CCSD1
Thermoanaerobacter pseudethanolicus strain 39E; ATCC 33223
Thermoanaerobacter sp. strain X514
Thermosediminibacter oceani strain DSM 16646
Treponema brennaborense strain DSM 12168
Turicibacter sanguinis strain PC909


Note that these are all bacteria; no archaeons are included. (And yes, there are catalase-positive anaerobes among the Archaea.) The reason you don't see Bacteroides fragilis (which is catalase-positive) on the list is that, as explained before, B. fragilis ended up being classified as an aerobe by my cytochrome-oxidase-based initial search, even though "everybody knows" B. fragilis is anaerobic.

Incidentally, BLAST searches were done with an E-value cutoff of 1e-5, to reduce the chance of false positives. (The E-value is the number of matches at least that good you would expect to turn up purely by chance in a search of a database that size. For small values it is essentially a probability, so a threshold of 1e-5 means the only matches accepted are those with less than about a 1-in-100,000 chance of occurring by chance.)
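For anyone who wants to apply the same cutoff to their own searches, here is a minimal Python sketch. It assumes the results were saved in BLAST+ tabular format (`-outfmt 6`), whose eleventh column is the E-value; the helper names and the two made-up hit lines are illustrative assumptions, not the workflow actually used for the list above. The second function shows why a small E-value can be read as a probability: the chance of at least one hit that good arising by chance is P = 1 - exp(-E), which is almost exactly E when E is small.

```python
import math

EVALUE_CUTOFF = 1e-5  # the threshold described in the text


def passes_cutoff(line: str, cutoff: float = EVALUE_CUTOFF) -> bool:
    """Return True if a BLAST+ -outfmt 6 line has an E-value below the cutoff."""
    fields = line.rstrip("\n").split("\t")
    evalue = float(fields[10])  # column 11 of the default tabular layout
    return evalue < cutoff


def chance_probability(evalue: float) -> float:
    """Probability of at least one chance hit this good: P = 1 - exp(-E)."""
    return -math.expm1(-evalue)  # expm1 keeps precision for tiny E


# Made-up example lines (12 tab-separated columns each), not real search output.
hits = [
    "query1\tsubjA\t45.2\t480\t250\t8\t1\t478\t3\t480\t2e-40\t150.0",
    "query1\tsubjB\t28.0\t120\t80\t4\t10\t128\t40\t160\t0.003\t35.4",
]
kept = [h for h in hits if passes_cutoff(h)]
print(len(kept))  # 1
print(chance_probability(1e-5))
```

The second hit (E = 0.003) is rejected even though it might look respectable, which is exactly the kind of marginal match a strict cutoff is meant to screen out.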

If you learn of any other catalase-positive anaerobes that should be on this list, do be sure to let me know!

A Catalase Conundrum

When I was in grad school (U.C. Davis) in the late 1970s, the bacterial world was simply the prokaryotic world, and vice versa. The distinction between eubacteria and Archaea hadn't yet taken hold. But now we know, or think we know, that prokaryotes come in two fundamental flavors: the true bacteria (eubacteria) and the Archaea (primitive extremophiles). If you wanted to count organelles (mitochondria, chloroplasts, others) as a third fundamental grouping, I suppose you could, with some justification.

At this writing, about 400 distinct Archaeal isolates, belonging to around 75 genera, have been DNA-sequenced. You can see a list of them by going to http://genomevolution.org/CoGe/OrganismView.pl?org_desc=Archaea and looking in the Organisms box. You'll see over 200 organisms listed, but bear in mind they belong to only about 75 genera. (Most genera are represented by more than one species and/or more than one isolate per species, in other words.)
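Collapsing a long organism list down to genera is easy to do by hand for a few entries, but for hundreds it helps to script it. A minimal sketch, assuming you have copied the organism names into a plain list; taking the first word of each name as the genus is a simplifying assumption that would miscount names prefixed with "Candidatus" or similar qualifiers.

```python
def count_genera(organism_names):
    """Count distinct genera, treating the first word of each name as the genus."""
    genera = {name.split()[0] for name in organism_names if name.strip()}
    return len(genera)


names = [
    "Methanosarcina acetivorans C2A",
    "Methanosarcina barkeri str. Fusaro",
    "Sulfolobus solfataricus P2",
    "Sulfolobus islandicus M.14.25",
    "Halobacterium salinarum R1",
]
print(count_genera(names))  # 3
```

Five isolates, three genera: the same kind of collapse that turns a couple hundred listed organisms into roughly 75 genera.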

[Photo caption: Salt-loving Archaea species have been found growing in borax-saturated desert ponds. The species growing in this small lake produce a carotenoid pigment that gives the water a pink appearance.]
The Archaea were once thought to be exclusively anaerobes, but it turns out there are a couple dozen aerobic (or facultatively anaerobic) genera in the group. In my own spare-time research, I've found that about 20% of the 75 sequenced Archaeal genera have a catalase gene, and most of those are obligate anaerobes. (Catalase is the enzyme that breaks hydrogen peroxide down to water and oxygen.) Oddly, very few of the aerobic Archaea (apart from the Halobacteriaceae group) show any evidence of having catalase. This is exactly the reverse of what's expected. In the rest of the living kingdom (from bacteria to higher plants and animals), aerobes universally have catalase, and strict anaerobes don't (or at least, they aren't supposed to; but see this post for some surprising exceptions).

This is a hugely unexpected finding: Many anaerobic Archaeons have catalase, but not all aerobic ones do. Some enterprising grad student should tackle this and make a thesis project out of it.
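The comparison above is really a two-by-two table: oxygen requirement versus catalase gene present or absent. A minimal Python sketch of that tally, assuming each genus has already been assigned an oxygen requirement and a yes/no catalase call; the genus names below are hypothetical placeholders, not results from the survey described here.

```python
from collections import Counter


def tally(records):
    """Cross-tabulate oxygen requirement against presence of a catalase gene.

    records: iterable of (genus, is_aerobe, has_catalase) tuples.
    """
    table = Counter()
    for _genus, is_aerobe, has_catalase in records:
        row = "aerobe" if is_aerobe else "anaerobe"
        col = "catalase+" if has_catalase else "catalase-"
        table[(row, col)] += 1
    return table


def fits_textbook_rule(is_aerobe: bool, has_catalase: bool) -> bool:
    """The usual expectation: aerobes have catalase, strict anaerobes do not."""
    return is_aerobe == has_catalase


# Hypothetical placeholder records for illustration only; not real survey data.
records = [
    ("GenusA", False, True),
    ("GenusB", False, False),
    ("GenusC", True, False),
    ("GenusD", True, True),
]
print(tally(records))
broken = sum(not fits_textbook_rule(a, c) for _, a, c in records)
print(broken, "of", len(records), "break the rule")
```

In the Archaea, the off-diagonal cells (catalase-positive anaerobes, catalase-negative aerobes) are the ones that come out surprisingly full.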

In case you're that student, here are some additional clues.

Let's back up for a second and look at the Big Picture. No matter where on the Tree of Life you go, catalases come in only a few major types. (See the excellent 2003 review paper by Chelikani, Fita, and Loewen for details.) For example, there are heme-containing and non-heme catalases. Most of the time, what we think of as "catalase" is heme-containing catalase (and yes, that means it contains iron). In the heme-containing group, you have monofunctional catalases as well as bifunctional catalase-peroxidases, or hydroperoxidases (katG). The monofunctionals come in big- and small-subunit varieties. (The biggies have subunits of 75 kDa or more, encoded by just over 2,100 base pairs of DNA. The smalls have subunits under 60 kDa, encoded by genes that typically top out at about 1,500 base pairs.)
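If you want to move between subunit mass and gene length yourself, a common rule of thumb is about 110 Da per amino-acid residue and three bases per codon. A minimal sketch under that assumption (the 110 Da figure is an average; real proteins vary with amino-acid composition, so treat the results as ballpark numbers):

```python
AVG_RESIDUE_DA = 110  # rule-of-thumb average mass of one amino-acid residue


def gene_length_bp(subunit_kda: float) -> int:
    """Rough coding-sequence length for a protein subunit of the given mass."""
    residues = subunit_kda * 1000 / AVG_RESIDUE_DA
    return round(residues * 3) + 3  # three bases per codon, plus one stop codon


def subunit_kda(gene_bp: int) -> float:
    """Inverse estimate: subunit mass (kDa) encoded by a gene of gene_bp bases."""
    residues = (gene_bp - 3) / 3  # drop the stop codon, then count codons
    return residues * AVG_RESIDUE_DA / 1000


print(gene_length_bp(75))           # 2048
print(round(subunit_kda(1500), 1))  # 54.9
```

So a 75 kDa subunit works out to a gene of roughly 2,050 bp, and a 1,500 bp gene to a subunit of roughly 55 kDa, which is why gene length is a handy first-pass proxy for big versus small subunits when scanning genome annotations.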

Here's what you really need to know: Within the monofunctionals, there are three clades (major subgroupings) of catalase. Clades 1 and 3 are small-subunit enzymes. Clade 1 is primarily of plant origin and is relatively rare in bacteria (the best-known bacterial examples probably being katX of Bacillus subtilis and catF of Pseudomonas syringae). Clade 3 takes in a huge number of catalases from bacteria, fungi, and various other eukaryotes. (For Clade 3, think Staphylococcus catalase.) Clade 2 comprises the large-subunit enzymes (think E. coli katE catalase).

The bifunctional catalase-peroxidases also tend to be large (over 2,100 base pairs of DNA).

The non-heme catalases contain manganese instead of iron and are not your typical catalases. Let's leave it at that.
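The classification scheme in the last few paragraphs is essentially a small decision tree. Here it is as a Python sketch, purely as a memory aid. The field names are hypothetical, and the `plant_type` flag stands in for what would really be a phylogenetic call: clade membership is determined from sequence relationships, not from a single yes/no trait.

```python
from dataclasses import dataclass


@dataclass
class Catalase:
    """Features used by the classification scheme described above (illustrative)."""
    heme: bool                # True: heme (iron) enzyme; False: non-heme (manganese)
    bifunctional: bool        # True: catalase-peroxidase (katG type)
    large_subunit: bool       # True: subunit of roughly 75 kDa or more
    plant_type: bool = False  # hypothetical stand-in for a phylogenetic Clade 1 call


def classify(c: Catalase) -> str:
    """Walk the heme / function / subunit-size / clade decision tree."""
    if not c.heme:
        return "non-heme (manganese) catalase"
    if c.bifunctional:
        return "bifunctional catalase-peroxidase (katG)"
    if c.large_subunit:
        return "monofunctional, Clade 2 (large subunit)"
    return "monofunctional, Clade 1" if c.plant_type else "monofunctional, Clade 3"


# E. coli katE falls in the large-subunit branch; Staphylococcus catalase in Clade 3.
print(classify(Catalase(heme=True, bifunctional=False, large_subunit=True)))
print(classify(Catalase(heme=True, bifunctional=False, large_subunit=False)))
```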

What do the Archaeons produce? From what little probing I've done, it seems the anaerobic Archaeons that have catalase use a modified Clade 3 type of enzyme that has little in common with other Clade 3 catalases. A few of the methane producers show good sequence agreement with Bacteroides fragilis catalase, but most anaerobic Archaeal catalases do not show good sequence concordance with any known eubacterial catalases. So it's entirely possible that a fourth clade of purely Archaeal small-subunit catalases (unlike anything else in the plant or animal worlds) awaits characterization.
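"Good sequence agreement" usually comes down to percent identity over an alignment. A minimal sketch, assuming the two protein sequences have already been aligned (gaps shown as "-") by some alignment tool. Note that there are several conventions for the denominator; this one counts only columns where neither sequence has a gap. The short strings below are toy fragments, not real catalase sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, skipping columns with a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


print(f"{percent_identity('MKDT-LAG', 'MKET-LSG'):.2f}")  # 71.43
```

Comparing an Archaeal catalase against a panel of known Clade 1, 2, and 3 enzymes this way (after alignment) is one rough first check on whether it sits inside a known clade or off on its own.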

The aerobic Archaeons that have catalase are all halophiles (members of the Halobacteriaceae), and all have large-subunit bifunctional catalase-peroxidases similar to those of the Cyanobacteria.

Mysteries waiting to be solved:
  • Why is it that the aerobes Sulfolobus, Pyrobaculum, and Aeropyrum do not appear to have catalase? Is it that they truly lack catalase, or do they have some as-yet-undiscovered type of catalase?
  • Why is it that certain methane-generating anaerobes (e.g., Methanosarcina) have Clade 3 catalases but the rest of the methane-producing Archaea have catalases that don't match anything else in the living world? Did the former group get their catalase(s) by way of horizontal gene transfer from anaerobic eubacteria?
  • Did the multifunctional catalases of the Halobacteriaceae originally come from cyanobacteria (perhaps by way of plasmids)?
  • What overlap, if any, exists between Archaeal catalases and the catalases of algal chloroplasts?
If you find the answers to any of these, let me know!


