Showing posts with label mitochondria. Show all posts

Why Mitochondrial DNA is Different

Most genomes that are high in A+T content (or low in G+C content) show a surprising DNA strand asymmetry: The message strand of genes tends to be rich in purines. This rule applies across all domains I've looked at except mitochondria, where message strands tend to be pyrimidine-rich rather than purine-rich. The following two graphs make this clearer.


This is a graph of message-strand (or RNA-synonymous-strand) purine content plotted vertically, against A+T plotted horizontally, for 1,373 bacterial species. Each dot represents a genome. High-GC/low-AT organisms like Streptomyces and Bordetella are on the left and low-GC/high-AT organisms like Clostridium botulinum are toward the right. The few dots on the far right are intracellular endosymbionts that have lost a good bit of DNA over the millennia. They tend to be extremely high in A+T.

Compare the above graph with the graph below, which is the same thing (message-strand A+G vs. A+T) for mitochondrial DNA (N=2543 genomes). There is still an upward slope to the data (and in fact it is steeper than it looks, because the range of y-values is different in the graph below than in the graph above). The slope of the regression line is very nearly the same (0.148 vs. 0.149) for both graphs. But you can see that in the graph below, nearly all the points are below y = 0.50. That means message-strands are high in pyrimidines rather than purines.



I speculated in a previous post that the reason mitochondrial DNA is pyrimidine-heavy on the message strand is that mtDNA encodes a very small number of proteins (13, in all), and they tend to be membrane-associated proteins, which use mostly non-polar amino acids. It turns out that codons for the non-polar amino acids are pyrimidine-rich.
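The codon claim can be checked directly. The sketch below is an illustration, not the analysis behind the graphs in this post. It builds the standard genetic code (an assumption, since mitochondria use slightly different codes), treats F, L, I, M, and V as the "non-polar" set (also an assumption), and weights every codon equally rather than by real codon usage.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
# Assumption: mitochondrial codes differ in a few codons; this is only a sketch.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Hypothetical choice of "non-polar" residues: the strongly hydrophobic ones.
HYDROPHOBIC = set("FLIMV")


def purine_fraction(codons):
    """Fraction of bases that are A or G across a list of codons."""
    bases = "".join(codons)
    return sum(base in "AG" for base in bases) / len(bases)


hydrophobic_codons = [c for c, aa in CODON_TABLE.items() if aa in HYDROPHOBIC]
other_codons = [
    c for c, aa in CODON_TABLE.items() if aa not in HYDROPHOBIC and aa != "*"
]

print(f"hydrophobic codons: {purine_fraction(hydrophobic_codons):.3f} purine")
print(f"other sense codons: {purine_fraction(other_codons):.3f} purine")
```

Under these assumptions the hydrophobic codons come out at one-third purine, while the remaining sense codons average a bit above one-half.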

To see if that's really what's going on, I obtained the DNA sequences for cytochrome-c oxidase and NADH dehydrogenase (the two most fundamental enzyme systems of mitochondria) from several hundred bacterial species. Actually, I was able to obtain DNA sequences for a total of 942 bacterial NADH dehydrogenase (subunit L) proteins. I also succeeded in obtaining DNA sequences for 647 bacterial cytochrome-c oxidase subunit 1 proteins. In mitochondria, these genes are known as ND5 and Cox1. In bacteria they're better known as nuoL and cyoB.

The graph below shows A+G for the two enzymes versus whole-chromosome A+T, for the relevant organisms.

Message strand purine content was derived from the DNA sequences of cyoB (pink) genes from 942 bacteria, and from nuoL (blue) genes from 647 bacterial species. The A+G values were plotted against host-organism whole-genome A+T content. All cyoB and nuoL sequences tended to be pyrimidine rich. But pyrimidine content was less for organisms with high A+T content. (Note the slightly positive slope of the regression line.)

The pink points are for cytochrome-c oxidase subunit 1 (cyoB) while the blue points are for NADH dehydrogenase subunit 5 (nuoL). Two things are worth noting. One is that the regression line is upward-sloping, meaning that as an organism's DNA gets richer in A+T content, the purine content on the message strand rises. This effect seems to be universal. The second thing to note is that almost all of the points in the graph lie below y = 0.5, as is the case for mitochondria. These two signature "mitochondrial" enzyme systems, critical to oxidative phosphorylation (in bacteria as well as higher organisms), do tend to use pyrimidine-rich codons, rendering the relevant genes pyrimidine-rich on the RNA-synonymous (message) strand of DNA. The hypothesis is upheld.

For you bio students, a bit of homework: You might want to think about why it is that membrane-associated proteins are rich in non-polar amino acids. (In human mitochondria, leucine and isoleucine are the most-used amino acids. Together they account for an amazing 30% of all amino acids used in mtDNA-encoded gene products.) Hint: Most membranes have a lipid bilayer, and lipids don't like water.

Strand Asymmetry in Mitochondrial DNA

Funny how the availability of so much free DNA data can go to your head. When I learned that DNA sequence data for more than 2,000 mitochondrial genomes could be accessed, free, at genomevolution.org, I couldn't resist: I wrote some scripts that checked the DNA composition of 2,543 mtDNA (mitochondrial DNA) sequences. What I found blew me away.

If you're a biologist, you're accustomed to thinking of genome G+C (guanine plus cytosine) content as a kind of phylogenetic signature. (Related organisms usually have G+C values that are fairly close to one another.) For purposes of the following discussion, I'm going to reference A+T content, which is, of course, just one-minus-GC. (A GC content of 0.25, or 25%, means the AT content is 0.75, or 75%).

What I learned is that mitochondrial DNA shows strand asymmetry in coding regions (regions that actually get transcribed to RNA, as opposed to non-coding "control" regions and junk DNA). In particular, it shows an excess of pyrimidines (T and C) on the "message strand." This is the exact opposite of the situation in Archaea and bacteria, where message strands tend to accumulate purines (G and A).
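For anyone who wants to repeat this kind of check, here is a minimal sketch of the two numbers involved: A+T content and message-strand purine (A+G) content. It is not the script used for the graphs. The function name and the toy sequence are made up for illustration, and the input is assumed to be coding-region sequence already oriented as the message strand.

```python
from collections import Counter


def composition(message_strand: str) -> tuple[float, float]:
    """Return (A+T fraction, A+G fraction) for a message-strand sequence."""
    counts = Counter(message_strand.upper())
    # Only count unambiguous bases; N and other ambiguity codes are ignored.
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    at_fraction = (counts["A"] + counts["T"]) / total
    ag_fraction = (counts["A"] + counts["G"]) / total
    return at_fraction, ag_fraction


# Toy sequence, invented for the example (not real mtDNA).
at, ag = composition("ATGCCTTTCCTAACCCTTCTA")
print(f"A+T = {at:.3f}, A+G = {ag:.3f}")
```

A purine fraction below 0.5 means the strand is pyrimidine-rich; plotting the A+G value against the A+T value for each genome gives one dot per genome.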

The interesting thing is, just like bacteria (and Archaea), mitochondrial genomes tend to show a steady, predictable rate of increase of purines on the message strand with increasing A+T, even though purines are outnumbered by pyrimidines on the message strand. A picture might make this clearer:

Purine (A+G) content versus A+T for the message strand of mitochondrial DNA coding regions (N=2543).

Every point in this graph represents a mitochondrial genome (2,543 in all). As you can see, the regression line (which minimizes the sum of squared error) is upward-sloping, with a slope of 0.149, meaning that for every 10% increase in genome A+T content, there's a corresponding 1.49% increase in message-strand purine (A+G) content. What's striking about this is that in a similar graph for 1,373 bacterial genomes (see this post), the regression-line slope turned out to be 0.148. Chargaff's second parity law predicts a straight horizontal line at y=0.5. Obviously that law is kaput.
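The regression itself is ordinary least squares on (A+T, A+G) pairs. A bare-bones version, assuming nothing beyond the textbook formulas, might look like the following. The data points are invented to lie exactly on a line with slope 0.149; the intercept of 0.40 is a placeholder, not a value from the graph.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up genomes sitting exactly on a line with slope 0.149.
at_values = [0.50, 0.60, 0.70, 0.80]
ag_values = [0.40 + 0.149 * at for at in at_values]

intercept, slope = fit_line(at_values, ag_values)
print(f"slope = {slope:.3f}")
print(f"purine gain per 10% rise in A+T = {10 * slope:.2f}%")
```

Real data will scatter around the line, so the fitted slope is an average trend rather than a rule every genome obeys.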

I've written before about my repeated finding (in bacteria, Archaea, eukaryotes, viruses, bacteriophage; basically every place I look) that message-strand purine content accumulates in proportion to genome A+T content. Strand asymmetry with respect to purines and pyrimidines seems to be universal. But why?

Strand-asymmetric buildup of purines or pyrimidines is very hard to explain without invoking either a theory of strand-asymmetric DNA repair or a theory of strand-asymmetric mutagenesis, or both. Is it reasonable to suppose that one strand of DNA is more vulnerable to mutagenesis than another? Yes, if you accept that in a growing cell, the strands spend a good portion of their time apart (during transcription and replication). Neither replication nor transcription is symmetric in implementation. I'll spare you the details for the replication side of the argument, but suffice it to say, replication-related asymmetries are not likely (in my opinion) to be behind the purine/pyrimidine strand asymmetries I've been documenting. What we're seeing, I think, is the result of asymmetric repair at transcription time.

During transcription, a gene's DNA strands are separated. One strand is used as a template by RNA polymerase to create messenger RNA and ribosomal RNA. The other strand is free and floppy and vulnerable to attack by mutagens. But it's also readily accessible to repair enzymes.


The above diagram oversimplifies things considerably, but I include it for the benefit of non-biogeeks who might want to follow this argument through. Note that DNA strands have directionality: the sugar bonds face one way in one strand and the other way in the other strand. This is denoted by the so-called 5'-to-3' orientation of strands. (RNAP = RNA polymerase.)

DNA repair is a complex subject. Be assured, every cell, of every kind, has dozens of different kinds of enzymes devoted to DNA repair. Without these enzymes, life as we know it would end, because DNA is constantly undergoing attack and requiring repair.

The Ogg family of DNA base-excision enzymes exhibits a signature helix-hairpin-helix topology (HhH). See Faucher et al., Int J Mol Sci 2012; 13(6): 6711–6729.

Some types of repair take place in double-stranded DNA (that is, DNA that is not undergoing replication or transcription). Other types of repair apply to single-stranded DNA. In bacteria as well as higher life forms, there's a transcription-coupled repair system (TCRS) that comes into play when RNA polymerase is stalled by thymine dimers or other DNA damage. This remarkably elaborate system changes out short sections of damaged DNA (at considerable energy cost). Because it involves replacing whole nucleotides (sugar and all), it's categorized as a Nucleotide Excision Repair system (NER). The alternative to NER is Base Excision Repair (BER), which is where a defective base (usually an oxidized guanine) gets snipped out without removing any sugars from the DNA backbone. The enzymes that perform this base-clipping are generically known as glycosylases.

For many years, it was thought that mitochondria did not have DNA repair systems. We now know that's not true. Mitochondrial DNA is subject to constant oxidative attack and it turns out the damage is quickly repaired, in double-stranded DNA. Evidence for repair of single-stranded mtDNA is scant. Those who have looked for a transcription-coupled repair system (or indeed any NER system) in mitochondria have not found one. Mitochondrial BER (via Ogg1) does exist, but it seems to operate when the DNA is double-stranded, not during transcription. This makes sense, because for BER to finish, the strand must be nicked by AP endonuclease after the bad base is popped out; the repair then proceeds by filling in the abasic site using the other strand as template. In Clostridia and Archaea (which have an Ogg enzyme that other bacteria do not have; see this post and this paper), Ogg1 can pop out a bad base while the DNA is single-stranded; Ogg1 then binds to the abasic site and is only released by AP endonuclease when it arrives later on.

Bottom line, we know that mitochondrial DNA spends much of its time in the unwound state (because mtDNA products are very highly transcribed) and that the non-transcribed DNA strand is extremely vulnerable to oxidative attack. (The template strand is less vulnerable, because it is cloaked in enzymes: RNA polymerase, transcription factors, ribosomes, etc.) We also know that 8-oxoguanine is the most prevalent form of oxidative damage in mtDNA and that, uncorrected, such damage leads to G-to-T transversion. The finding of consistently high pyrimidine content in the message strand of mitochondrial DNA (see graph further above) is consistent with a slower rate of repair of the non-transcribed strand, and the differential occurrence of G-to-T transversions on that strand. Or at least, that's a possible explanation of the pyrimidine richness of the message strand of mtDNA.

But there are additional factors to consider, such as selection pressure. Mitochondrial DNA tends to encode membrane-associated proteins, and membrane proteins use nonpolar amino acids, which are (in turn) predominantly encoded by pyrimidine-rich codons. More about this in an upcoming post.

A New Biological Constant?

Earlier, I gave evidence for a surprising relationship between the amount of G+C (guanine plus cytosine) in DNA and the amount of "purine loading" on the message strand in coding regions. The fact that message strands are often purine-rich is not new, of course; it's called Szybalski's Rule. What's new and unexpected is that the amount of G+C in the genome lets you predict the amount of purine loading. Also, Szybalski's rule is not always right.

Genome A+T content versus message-strand purine content (A+G) for 260 bacterial genera. Chargaff's second parity rule predicts a horizontal line at Y = 0.50. (Szybalski's rule says that all points should lie at or above 0.50.) Surprisingly, as A+T approaches 1.0, A/T approaches the Golden Ratio.

When you look at coding regions from many different bacterial species, you find that if a species has DNA with a G+C content below about 68%, it tends to have more purines than pyrimidines on the message strand (thus purine-rich mRNA). On the other hand, if an organism has extremely GC-rich DNA (G+C > 68%), a gene's message strand tends to have more pyrimidines than purines. What it means is that Szybalski's Rule is correct only for organisms with genome G+C content less than 68%. And Chargaff's second parity rule (which says that A=T and G=C even within a single strand of DNA) is flat-out wrong all the time, except at the 68% G+C point, where Chargaff is right now and then by chance.

Since the last time I wrote on this subject, I've had the chance to look at more than 1,000 additional genomes. What I've found is that the relationship between purine loading and G+C content applies not only to bacteria (and archaea) and eukaryotes, but to mitochondrial DNA, chloroplast DNA, and virus genomes (plant, animal, phage), as well.

The accompanying graphs tell the story, but I should explain a change in the way these graphs are prepared versus the graphs in my earlier posts. Earlier, I plotted G+C along the X-axis and purine/pyrimidine ratio on the Y-axis. I now plot A+T on the X-axis instead of G+C, in order to convert an inverse relationship to a direct relationship. Also, I now plot A+G (purines, as a mole fraction) on the Y-axis. Thus, X- and Y-axes are now both expressed in mole fractions, hence both are normalized to the unit interval (i.e., all values range from 0 to 1).

The graph above shows the relationship between genome A+T content and purine content of message strands in genomes for 260 bacterial genera. The straight line is regression-fitted to minimize the sum of squared absolute error. (Software by http://zunzun.com.) The line conforms to:

y = a + bx
 
where:
a =  0.45544384965539358
b = 0.14454244707261443


The line predicts that if a genome were to consist entirely of G+C (guanine and cytosine), it would be 45.54% guanine, whereas if (in some mythical creature) the genome were to consist entirely of A+T (adenine and thymine), adenine would comprise 59.99% of the DNA. Interestingly, the 95% confidence interval permits a value of 0.61803 at X = 1.0, which would mean that as guanine and cytosine diminish to zero, A/T approaches the Golden Ratio.
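The arithmetic behind those endpoint claims is easy to reproduce. The sketch below uses only the two fitted coefficients quoted above; the variable names are mine. It evaluates the line at the two extremes, finds where the line crosses 0.5, and computes the adenine fraction that would be needed at X = 1.0 for A/T to equal the Golden Ratio exactly.

```python
import math

# Coefficients of the fitted line y = a + b*x reported for 260 bacterial genera.
A_COEF = 0.45544384965539358
B_COEF = 0.14454244707261443


def predicted_purine(at_fraction: float) -> float:
    """Message-strand A+G mole fraction predicted by the fitted line."""
    return A_COEF + B_COEF * at_fraction


# At A+T = 0 the purines are all G; at A+T = 1 they are all A.
guanine_at_zero = predicted_purine(0.0)
adenine_at_one = predicted_purine(1.0)
a_over_t = adenine_at_one / (1.0 - adenine_at_one)

# A/T equals the Golden Ratio phi when the adenine fraction is phi / (1 + phi).
phi = (1 + math.sqrt(5)) / 2
adenine_needed = phi / (1 + phi)

# The line crosses y = 0.5 (equal purines and pyrimidines) at this A+T value.
at_crossover = (0.5 - A_COEF) / B_COEF
gc_crossover = 1.0 - at_crossover

print(f"G at A+T=0: {guanine_at_zero:.4f}")
print(f"A at A+T=1: {adenine_at_one:.4f}, A/T = {a_over_t:.3f}")
print(f"A needed for A/T = phi: {adenine_needed:.5f}")
print(f"G+C at the 0.5 crossover: {gc_crossover:.3f}")
```

The point estimate at X = 1.0 gives an A/T ratio near 1.5, so the Golden Ratio value sits at the upper edge of what the confidence interval allows rather than at the fitted line itself. The crossover lands at a G+C of roughly 0.69, in the neighborhood of the 68% figure cited earlier.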

Do the most primitive bacteria (Archaea) also obey this relationship? Yes, they do. In preparing the graph below, I analyzed codon usage in 122 Archaeal genera to obtain A, G, T, and C relative proportions in coding regions of genes. As you can see, the same basic relationship exists between purine content and A+T in Archaea as in Eubacteria. Regression analysis yielded a line with a slope of 0.16911 and a vertical offset of 0.45865. So again, it's possible (or maybe it's just a very strange coincidence) that A/T approaches the Golden Ratio as A+T approaches unity.

Analysis of coding regions in 122 Archaea reveals that the same relationship exists between A+T content and purine mole-fraction (A+G) as exists in eubacteria.

For the graph below, I analyzed 114 eukaryotic genomes (everything from fungi and protists to insects, fish, worms, flowering and non-flowering plants, mosses, algae, and sundry warm- and cold-blooded animals). The slope of the generated regression line is 0.11567 and the vertical offset is 0.46116.

Eukaryotic organisms (N=114).

Mitochondria and chloroplasts (see the two graphs below) show a good bit more scatter in the data, but regression analysis still comes back with positive slopes (0.06702 and 0.13188, respectively) for the line of least squared absolute error.

Mitochondrial DNA (N=203).
Chloroplast DNA (N=227).

To see if this same fundamental relationship might hold even for viral genetic material, I looked at codon usage in 229 varieties of bacteriophage and 536 plant and animal viruses ranging in size from 3 kb to over 200 kb. Interestingly enough, the relationship between A+T and message-strand purine loading does indeed apply to viruses, despite the absence of dedicated protein-making machinery in a virion.

Plant and animal viruses (N=536).
Bacteriophage (N=229).

For the 536 plant and animal viruses (above left), the regression line has a slope of 0.23707 and reaches 0.62337 at X = 1.0. For bacteriophage (above right), the line's slope is 0.13733 and the vertical offset is 0.46395, which puts the line at 0.60128 when X = 1.0. (When inspecting the graphs, take note that the vertical-axis scaling is not the same for each graph. Hence the slopes are deceptive.) So again, it's possible A/T approaches the golden ratio as A+T approaches 100%.

The fact that viral nucleic acids follow the same purine trajectories as their hosts perhaps shouldn't come as a surprise, because viral genetic material is (in general) highly adapted to host machinery. Purine loading appropriate to the A+T milieu is just another adaptation.

It's striking that so many genomes, from so many diverse organisms (eubacteria, archaea, eukaryotes, viruses, bacteriophages, plus organelles), follow the same basic law of approximately

A+G = 0.46 + 0.14 * (A+T)

The above law is as universal a law of biology as I've ever seen. The only question is what to call the slope term. It's clearly a biological constant of considerable significance. Its physical interpretation is clear: It's the rate at which purines are accumulated in mRNA as genome A+T content increases. It says that a 1% increase in A+T content (or a 1% decrease in genome G+C content) is worth a 0.14% increase in purine content in message strands. Maybe it should be called the purine rise rate? The purine amelioration rate?

Biologists, please feel free to get in touch to discuss. I'm interested in hearing your ideas. Reach out to me on LinkedIn, or simply leave a comment below.






Deep-Sea Vents: The Mosquito Connection

Quick: What species of life on earth is the most abundant? (Which species has more living members than any other species?) Hint: If an alien probe lands in a random location on earth, chances are better than 70% that the probe will encounter this organism.

If you're thinking in terms of the ocean, you're on the right track. What may surprise you is the connection between the world's-most-populous-organism (to be revealed shortly) and the mosquitoes that've been dive-bombing your neck all week. Equally amazing is the link between the mosquitoes in your back yard and hydrothermal vents in the ocean floor.

The hundreds of bright little particles at the narrow end of this wasp egg are Wolbachia cells.

I wasn't thinking about marine biology or deep-sea hydrothermal vents when I went online at http://genomevolution.org the other day to do a little nosing around into the genome of Wolbachia pipientis, the ultra-tiny bacterial parasite carried by nearly every mosquito on earth. (Caution: Don't attempt the following DNA-analysis tricks on your own unless you want to become thoroughly addicted to desktop omics. I'm a microbiologist by training. I can do these stunts safely.) "Parasite" is actually the wrong word. Our tiny friend Wolbachia doesn't just parasitize the mosquito; it's an integral part of the mosquito. Wolbachia can't live outside its insect host, and guess what? The host frequently can't live without Wolbachia. The two provide essential services for each other, an arrangement known as mutualism.

I would argue that Wolbachia is more than a mutualistic symbiont: It's a proto-organelle, something very close to what Lynn Margulis had in mind as the ancestor of today's mitochondrion.

Wolbachia can't live on its own in the outside world (as far as anybody knows): it needs to live inside a host (generally an arthropod, although filarial worms also carry Wolbachia). Inside its host it occupies a very special niche: It lives in the nursery cells of the insect's ovary—the cells that will go on to become egg cells.

This is no ordinary symbiosis. I mentioned in an earlier post that Wolbachia carries with it genes for reverse-transcriptases, resolvases, recombinases, transposases, translocases, DNA polymerases, RNA polymerases, and phage integrases—a complete suite of retroviral machinery, designed for export of foreign DNA into host DNA. And indeed, researchers have found that Wolbachia DNA is quite often embedded in the host's own nuclear DNA. (One group, looking at four insect hosts and four nematode hosts, found anywhere from 500 base-pairs to over a million base pairs of Wolbachia DNA residing in the nucleus. Another group found 45 Wolbachia genes incorporated in a fruit-fly host's nuclear DNA.) The situation with Wolbachia thus parallels the situation with mitochondria, where we know that 97% of the gene products that go to make up a mitochondrion are actually encoded in nuclear DNA, not mitochondrial DNA.

When you encounter an organism as baffling as Wolbachia, oftentimes you want to know what its relatives are—what it's most closely related to. When a new or poorly understood organism has a close relative that's already well-studied, sometimes you learn a lot in a hurry. That's particularly true of pathogens (not that Wolbachia is a pathogen per se). Pathogens have virulence strategies of various kinds. Maybe Wolbachia has symbiosis strategies that it learned from a relative?

The problem with a lot of the super-tiny microbes (which Wolbachia definitely is, with only a quarter as much DNA as E. coli) is that their relatedness is not always well understood. Organisms are assigned a taxonomic slot, then the assignment changes a few years later, after they're better-studied. (So for example, Cowdria ruminantium was eventually renamed Ehrlichia ruminantium, and a bunch of former Ehrlichias are now Neorickettsias, except the ones that attack red blood cells, which are now Anaplasmas.) Taxonomy at this end of the evolutionary tree is definitely a work in progress.

Deep-sea thermal vents like this one are home to organisms like Thiomicrospira that can grow on sulfide, CO2, and basic salts.

Fortunately, it's easy nowadays (what with so many organisms' DNA sequences available online) to go on the web and compare genomes directly, using a tool like SynMap, which is what I started doing with Wolbachia. I started going down the list of mini-microorganisms and began running DNA similarity tests of Wolbachia against Ehrlichia, Neorickettsia, Anaplasma, Chlamydia, and "the usual suspects" at the ultra-small-chromosome end of the tree of life.

What I found surprised me. A bizarre little bacterium called Thiomicrospira kept showing up in my BLAST searches as having many genes in common with Wolbachia (based on sequence matches in large numbers of genes). None of the taxonomy charts showed the two to be related. But DNA doesn't lie. I kept coming up with matches across hundreds of genes. (Bear in mind, Wolbachia has only about 1300 genes to begin with, which is very small, even for a bacterium.)

What's bizarre about Thiomicrospira is that it's one of those fairly newly discovered microbes that lives on sulfur, heat, and CO2 at the bottom of the ocean, in total darkness, in the vicinity of thermal vents. Thiomicrospira is the kind of life form NASA takes a great interest in, because it could be a prototype for exactly the type of survive-in-the-dark CO2-using organism that might live under the ice crust of Europa (Jupiter's moon). In theory, there could be geothermal vents on the floor of the large ocean of liquid water that NASA is pretty sure exists under Europa's ice. If there's life down there, it could very well look like Thiomicrospira.

But why should Thiomicrospira have so many genes in common with a mosquito symbiont? Thiomicrospira lives at the bottom of the ocean; Wolbachia lives inside arthropod eggs. One obtains its carbon in the form of CO2; the other produces CO2 as a waste product. One is adapted to live in warm salt water; the other lives in cold-blooded insects. In theory, these two germs couldn't be further apart. And yet, oddly enough, they not only have hundreds of genes in common, the genes are well-matched from a DNA sequence-similarity standpoint. Thiomicrospira's DNA even incorporates a prophage module, and some of its phage genes show a high percentage base-pair similarity with the phage genes of Wolbachia. (See screen shot below.)

Remarkably, Thiomicrospira and Wolbachia share certain phage genes, as shown here. The genes have a DNA sequence identity of about 60%.

After doing a little more detective work, I found an organism that might very well form a "missing link" between the mosquito symbiont and the thermal-vent dweller. This organism kept showing up in my analyses as having a high degree of DNA similarity with both Thiomicrospira and Wolbachia. The organism in question is Pelagibacter ubique (now known as Candidatus pelagibacter, although some might question this taxonomic assignment since all other Candidatus members are obligate intracellular symbionts), and it's an astonishing organism in two ways: First, it's the smallest non-parasitic (free-living) bacterium known to science, with only 1.3 million base-pairs in its DNA (making it slightly smaller than Wolbachia and its tiny cousins). Secondly, it's the most numerous living thing on earth. It's present in large amounts in every one of earth's oceans.

Pelagibacter was placed in the Candidatus clade in 2007 due to its small genome and cell size and certain ribosomal markers. It has a very mitochondria-like genetic profile, and in fact some people think Pelagibacter is the ancestor of today's mitochondrion, a theory that's all the more satisfying when you consider that Pelagibacter is both ancient and tied to the sea.

My analysis using SynMap found that Pelagibacter and its thermal-vent-dwelling cousin Thiomicrospira share about 660 genes (out of 1480 or so for Pelagibacter), whereas Wolbachia and Pelagibacter share around 581, and Thiomicrospira and Wolbachia share around 1000. These are so-called non-syntenous point matches between genes; instances where the same gene occurs in both organisms, with a high percentage of base-pair matching. Synteny is a concept that takes gene-matching one step further and says that clusters of similar genes are what count. Synteny at the level of higher plants and animals is one thing, but at the level of a mini-microbe it tends to lack meaning, because the genes of bugs like Wolbachia are notoriously mobile: They find new positions on the chromosome over time (probably because of the large number of transposases, nucleases, and integrases in the genome). Even so, I decided to carry out a bit of syntenic analysis to see what I could find out.

For purposes of my analysis I defined a "syntenon" as three or more co-proximal genes that match three or more genes on the other organism's genome. But to be part of a syntenon, all three genes in a triplet have to occur within a 30-gene span (and match 3 genes in a 30-gene span on the other organism's DNA) plus the genes have to be in the same order in both organisms.
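To make that definition concrete, here is one way it could be coded. This is a sketch of the rule as stated, not the SynMap workflow. It assumes each matched gene pair has already been reduced to a pair of gene-order positions (gene number along genome A, gene number along genome B), treats both chromosomes as linear, and reads "30-gene span" as positions that differ by less than 30. The function name and the toy matches are hypothetical.

```python
from itertools import combinations

SPAN = 30  # maximum gene-order window on each genome


def syntenic_genes(matches, span=SPAN):
    """Return the matched pairs that belong to one or more qualifying triplets.

    matches: iterable of (position_in_a, position_in_b) gene-order indices.
    A triplet qualifies when its three genes fall within `span` genes on both
    genomes and appear in the same order on both.
    """
    pairs = sorted(set(matches))
    in_syntenon = set()
    for start, first in enumerate(pairs):
        # Later matches whose genome-A position is still inside the window.
        window = [p for p in pairs[start + 1:] if p[0] - first[0] < span]
        for second, third in combinations(window, 2):
            a1, b1 = first
            a2, b2 = second
            a3, b3 = third
            same_order = a1 < a2 < a3 and b1 < b2 < b3
            if same_order and b3 - b1 < span:
                in_syntenon.update((first, second, third))
    return in_syntenon


# Toy data: three collinear matches plus two scattered ones.
toy_matches = [(1, 10), (2, 11), (3, 12), (50, 5), (51, 200)]
print(sorted(syntenic_genes(toy_matches)))
```

The brute-force triplet search is fine for genomes with a thousand or so genes; a circular chromosome or inverted gene blocks would need extra handling that this sketch leaves out.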

A planet-spanning waterworld is thought to exist under Europa's icy outer crust. If thermal vents exist at the bottom, any life that exists may look a lot like Thiomicrospira.

Using SynMap, I found that whereas Wolbachia and Pelagibacter share around 157 syntenic genes, and Thiomicrospira and Wolbachia share around 132, Thiomicrospira and Pelagibacter share 250 (which makes sense in that both are ocean-dwellers). For comparison-and-control purposes, I did a triplet match of Thiomicrospira against another chemoautotroph (an organism that gets energy from inorganic chemicals, and carbon from CO2), namely Methanothermobacter marburgensis. There were only 53 syntenic triplets in common between the two chemoautotrophs. (Between Wolbachia and Methanothermobacter, on the other hand, there were only 3 triplet-matches.) Doing a match between two Wolbachia species (a mosquito-dwelling variety and a fruit-fly-dwelling cousin) produced 522 gene matches in syntenic triplets.

It seems reasonable to me, based not just on the previous sorts of analysis but also direct inspection of the genomes (in terms of their respective protein products), that Thiomicrospira evolved from Pelagibacter. Pelagibacter is the most abundant life form in the ocean, and perhaps the oldest. Pelagibacter is also very mitochondria-like, and so is Thiomicrospira, which has rhodanese-like proteins, the full cytochrome system, redox enzymes, citric-acid-cycle enzymes, plus certain characteristic membrane and sensor proteins, flippases, etc. (For what it's worth, Thiomicrospira has the highest signal-transduction profile I've ever seen at http://mistdb.com, again making it very mitochondrial-feeling.)

I'm tempted to say, similarly, that Thiomicrospira and Wolbachia are related. They have phage proteins in common. They both have genes for patatin proteins. They share multiple drug resistance genes. (That's not so strange. Antibiotics occur naturally in the environment.) They share genes for Flp-type pilins. Plus many more coincidences, big and small.

At first blush, a deep-sea thermal vent seems pretty far removed, environmentally, from the egg cell of a mosquito. How to reconcile the difference? Actually, I see similarities. Thiomicrospira thrives at temperatures of 28 to 32 degrees Celsius (which is also true of mosquitoes, although they prefer the 28-degree end of the scale). And blood (the preferred food source for mosquitoes) is comparable in pH and salinity to seawater. Also, mosquitoes have an aquatic lifecycle: they require brackish water in which to lay eggs. Mosquitoes and salt marshes go back millions of years.

It's even possible that Wolbachia might live in deep-sea-vent-dwelling host organisms. In fact, I predict they will be found there. Why? Because in addition to inhabiting flying insects, spiders, mites, and ticks (and filarial worms), Wolbachia have also been found in a very high percentage of crustaceans. We know that crustaceans are often found living near deep-sea thermal vents; and many crustaceans show the characteristic feminization of genetic males that's so often the tipoff to a massive Wolbachia presence in insect populations.

Insects and crustaceans represent two of the oldest, most successful, and most widely distributed life forms of the animal kingdom. Would it really be so surprising if the bacteria that colonize these life forms are closely related to the most common marine bacteria on the planet? I don't think so. Stranger things have happened.




Science on the Desktop

For decades, I've been hoping I'd live long enough to see a day when serious science could be done on the desktop by dedicated amateurs. Amateur astronomers know what I'm talking about. You can't do much particle physics on the desktop, and there are no affordable desktop electron microscopes (yet), but if comparative genomics is your thing? Get ready to rock and roll, my friend.

Over the weekend I discovered http://genomevolution.org and promptly went nuts. Let me take you on a tour of what's possible.

First I should explain that my background is in microbiology, and I've always had a soft spot in my heart (not literally) for organisms with ultra-tiny genomes: things like Chlamydia trachomatis, the sexually transmitted parasite. It's technically a bacterium, but you can't grow it in a dish. It requires a host cell in which to live.

It turns out there are many of these itty-bitty obligate endosymbionts (at least a dozen major families are known), and because of their small size and obligate intracellular lifestyle, they have a lot in common with mitochondria. Which is to say, like mitochondria, they're about a micron in size, they divide on their own, they have circular DNA, and they provide services to the host in exchange for living quarters.

When you look at one of these little creatures under the microscope (whether it's Chlamydia or Ehrlichia or Anaplasma or what have you), you see pretty much the same thing. (See photo.) Namely, a tiny bacterium living in cytoplasm, mimicking a mitochondrion.

When Lynn Margulis wrote her classic 1967 paper suggesting that mitochondria were once tiny bacterial endosymbionts, it seemed laughable at the time, and her ideas were widely criticized (in fact her paper was "rejected by about fifteen journals," she once recalled). Now it's taught in school, of course. But we have a long way to go before we understand how mitochondria work. And we really, really need to know how they work, because for one thing, mitochondria seem to be deeply involved in orchestrating apoptosis (programmed cell death) and various kinds of signal transduction, and until we understand how all that works, we're going to be hindered in understanding cancer.

When I discovered the tools at http://genomevolution.org, one of the first things I did, on a what-the-hell basis, was compare the genomes of two small endosymbionts, Wolbachia pipientis and Neorickettsia sennetsu. The former lives in insects; the latter, in flatworms that infect fish, bats, birds, horses, and probably lots else. Note that for a horse to get Potomac horse fever, first the Neorickettsia has to infect a tiny flatworm; then the flatworm has to be ingested by a dragonfly, caddisfly, or mayfly; then the horse has to eat (or maybe be bitten by, although only infection-by-ingestion has been demonstrated) the worm-infected fly. The parasite-of-a-parasite chain of events is not only fascinating in its own right, it suggests (to me) that parasites enable each other through shared strategies at the biochemical level, and I might as well spoil some suspense here by revealing that there's even yet another layer of parasitism (and biochemical enablement) going on in this picture, involving viruses. But we're getting ahead of ourselves.

I mentioned Wolbachia a second ago. Wolbachia is a fascinating little critter, because it's found in the reproductive tract of anywhere from 20% to 70% of all insects (plus an undetermined number of spiders, mites, crustaceans, and nematodes), but it doesn't cause disease, and in fact it appears many insects are unable to survive without it. Wolbachia is unusual in that the extracellular phase of its lifecycle (the part where it spreads from one host to another) isn't known; no one has observed it. What's more (and this part is incredible), Wolbachia has adapted to a stem-cell niche: It lives in the cells that give rise to insect egg cells. Thus, all newborn female progeny of an infected mother are infected, and all eggs pass on the Wolbachia. In this sense, Wolbachia inheritance follows the pattern of mitochondrial genetics (whereby the mother passes on the organelle and its genome).

I quickly found, via Sunday afternoon desktop genomics, that Wolbachia and Neorickettsia (and other endosymbionts: Anaplasma, Ehrlichia, etc.) have many genes in common—hundreds, in fact. And when I say "genes in common," I mean that the genes often show better-than-50% similarity in DNA base-pair matching.
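To make "better-than-50% similarity" concrete, here's a minimal Python sketch of a percent-identity calculation. It assumes the two sequences have already been aligned to equal length (the alignment is the hard part, and it's what the online tools do for you). The function name and the toy sequences are made up for illustration; this is not how genomevolution.org computes its scores.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions where both sequences carry the same base.

    Assumption: seq_a and seq_b are pre-aligned and the same length.
    Gap characters ('-') never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    pairs = zip(seq_a.upper(), seq_b.upper())
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Toy example (hypothetical sequences): 6 of 8 positions match.
print(percent_identity("ACGTACGT", "ACGTTCGA"))
```

Two random DNA sequences would be expected to match at roughly a quarter of positions by chance alone (less if their base compositions differ), which is why a match above 50% across a whole gene is worth noticing.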

It's important to put some context on this. These little organisms have DNA that encodes only about 1,000 genes. (By comparison, E. coli has around 4,400 genes.) Endosymbionts lack genes for many common metabolic pathways. They cannot biosynthesize amino acids, for example; instead they rely on the host to provide such nutrients ready-made. If 400 to 500 of an endosymbiont's 1,000 genes are shared across major endosymbiont families, that's 40% to 50% of its genes, which is a huge percentage. It suggests there's a set of core genes, numbering in the low hundreds, that encapsulate the basic "strategy" of endosymbiosis.

A little more context: Mitochondria have their own DNA and look a lot like endosymbionts. But here's the thing: Mitochondrial DNA is tiny (only about 15,000 base pairs, versus a million for an endosymbiont). It turns out, 97% of the "stuff" that makes up a mitochondrion is encoded in the nucleus of the host. If you include these nuclear genes, mitochondria actually rely on about 1,000 genes total, of which only 3% are in the organelle's DNA. Lynn Margulis would say that what happened is, the endosymbiont ancestor of today's mitochondrion originally had DNA of about a million base-pairs (1,000 genes), but some time after taking up residency in the host cell, the invader's DNA mostly migrated to the host nucleus.
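The round numbers in that paragraph are easy to sanity-check. The sketch below is only back-of-the-envelope arithmetic on the figures quoted above (15,000 base pairs, a million base pairs, 1,000 genes, 3%); the variable names are mine, and nothing here is measured data.

```python
# Round figures quoted in the text above (approximations, not measurements).
endosymbiont_bp = 1_000_000   # typical endosymbiont genome size
total_genes = 1_000           # genes a mitochondrion relies on, nuclear ones included
mito_bp = 15_000              # mitochondrial genome size
mito_percent_of_genes = 3     # share of those genes still in the organelle

mito_genes = total_genes * mito_percent_of_genes // 100   # genes left in mtDNA
size_ratio_percent = 100 * mito_bp / endosymbiont_bp      # mtDNA size vs. endosymbiont
bp_per_gene_endosymbiont = endosymbiont_bp / total_genes  # average spacing
bp_per_gene_mito = mito_bp / mito_genes                   # average spacing

print(mito_genes, size_ratio_percent, bp_per_gene_endosymbiont, bp_per_gene_mito)
```

On these figures the organelle keeps about 30 genes in a genome 1.5% the size of an endosymbiont's, which works out to roughly one gene per 500 base pairs versus one per 1,000. In other words, what stayed behind is packed tightly.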

Why did symbiont-to-host DNA migration stop at 97%? Why not 100%? If we look at that 3%, we find genes coding for tRNAs and for the RNA components of bacteria-like ribosomes (specialized protein-making machinery), plus genes for enormous, complex transmembrane enzyme systems: cytochrome c oxidase and NADH dehydrogenase. (The former is the endpoint of oxidative respiration; the latter the entry-point.) Obviously it must be advantageous for these genes to be proximal to the organelle.

But why even have an organelle (a physical compartment)? One might ask why it's necessary to have a mitochondrial parasite swimming around in the cytoplasm at all, when most of the genes are part of the host's DNA. The answer is, the stuff that goes on inside the confines of the mitochondrion needs to be contained, because it's violently toxic stuff involving superoxide radicals, redox reactions, "proton pumps," and Fenton chemistry (transition-metal peroxide reactions). A containment structure is definitely called for, to segregate this toxic chemistry from the rest of the cell.

We might ask how it is that the DNA of the proteobacterial ancestor of today's mitochondria wound up in the host nucleus in the first place. Let's consider the possibilities. Proteobacterial (symbiont) DNA may have transferred to the host all at once, or it might have migrated piecemeal, over time. Or both. Is it realistic that huge amounts of endosymbiont DNA could have migrated to the host nucleus all at once? Yes. It's been suggested that vacuolar phagocytosis drove invader DNA to the nucleus in a big gulp. Evidence? Wolbachia inhabits the vacuolar space.

But export of genes and gene products to the host might have occurred piecemeal as well. A little desktop exploration provides some clues. If you use GenomeView or any number of other online tools to explore the DNA of Wolbachia, several things pop out at you. First is that many Wolbachia genes are mitochondria-like: They code for things like cytochrome c oxidase, cytochrome b, NADH dehydrogenase, succinyl-CoA synthetase, Fenton-chemistry enzymes, and a slew of oxidases and reductases (including a nitroreductase). Wolbachia is clearly engaged in providing what might be called redox-detox services for the host, the same value proposition that mitochondria offer. This makes sense, because if Wolbachia cells were a net drag on the respiratory potential of host-cell mitochondria (if they couldn't at least hold their own with respect to mitochondria), the host would die.

The second thing that jumps out at you when you look at the Wolbachia genome is the abundance of genes devoted to export processes: membrane proteins, permeases, type I, II, and IV secretion systems, ABC transporters, etc., plus at least 60 ankyrin-repeat-domain genes—all powerful evidence of specializations aimed at export of genes and gene products to the host. But the most stunning "smoking gun" of all is the presence, in Wolbachia DNA, of five reverse-transcriptase genes, plus genes for resolvases, recombinases, transposases, DNA polymerases, RNA polymerases, and phage integrases. In essence, there's a complete suite of retroviral machinery, designed for export of foreign DNA into host DNA.

An example of one of 113 phage-derived genes in Wolbachia (lower gene array). In this case, the gene matches a phage gene found in Candidatus Hamiltonella (upper gene array). The two versions exhibit 59% DNA sequence similarity, despite widely differing GC ratios. See text for discussion.

But wait. There's more. The third thing that jumps straight in your face when you start looking at the Wolbachia genome is the presence of (are you ready?) no fewer than 113 genes for phage-related proteins, including major and minor capsid and HK97-style prohead proteins, plus tail proteins, baseplate, tail tube, tail tape-measure, and sheath proteins; late control gene D; phage DNA methylases; and so on. (For non-biologists: phage is the term for viruses that attack bacteria.)

In the above screenshot, I'm comparing Wolbachia DNA (lower strip) to DNA from another insect-infecting endosymbiont, Candidatus Hamiltonella, which is known to contain an intact virus (phage) in its DNA. Many phage proteins in Wolbachia have corresponding matches in the Hamiltonella genome. In this case, we're looking at a gene (the gold-colored stretch pointed at by red arrows) that is 1440 nucleotides long, with a 59% sequence match across genomes. The match percentage is remarkably high given that the Hamiltonella version of this gene has a 51.7% GC content while the Wolbachia version has a 40.6% GC. Also, note that Wolbachia itself has an overall GC of 34.2%. The fact that Wolbachia's putative phage genes are significantly higher in GC content than Wolbachia's non-phage genes is good supporting evidence that the genes really are from phage.
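If you want to check GC figures like these yourself, the calculation is simple enough to do in a few lines. Here's a minimal Python sketch, assuming you've saved the gene (or genome) as a plain string of bases. The function names and the toy sequence are hypothetical; the 40.6% and 34.2% values are the ones quoted above.

```python
def gc_content(seq: str) -> float:
    """Percent of unambiguous bases (A, C, G, T) that are G or C.

    Ambiguity codes such as N are ignored rather than counted.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def gc_excess(gene_gc: float, genome_gc: float) -> float:
    """How many percentage points a gene's GC sits above its genome's GC."""
    return gene_gc - genome_gc


# Toy sequence (hypothetical): 4 of 8 bases are G or C.
print(gc_content("GGCCAATT"))

# Figures from the text: the Wolbachia copy of the gene vs. the whole genome.
print(round(gc_excess(40.6, 34.2), 1))
```

A gene whose GC content sits several points above the genome-wide figure is a common hint that the DNA arrived from somewhere else. One gene proves little by itself, so the comparison is most convincing when it holds across the whole set of putative phage genes.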

It's 100% clear that viral DNA has made its way into the DNA of Wolbachia (either recently or long ago), and it's reasonable to hypothesize that Wolbachia has repurposed the retrovirus-like phage genes for packaging and exporting Wolbachia DNA to the host nucleus.

Okay, so maybe you have to be a biologist for any of this stuff to make your hairs stand on end. To me, it's a dream come true to be able to do this kind of detective work on a Sunday afternoon while sitting on the living-room couch, using nothing more than a decrepit five-year-old Dell laptop with a wireless connection. The notion that you can do comparative genomics and proteomics while watching an Ancient Aliens rerun on TV is (for me) totally cerebrum-blowing. It makes me wonder what's just around the corner.
