
A Tale of Two Microbes

One area where Big Data has started to pay big dividends is in genome research, and you can begin to taste the payoff yourself, right now, if you want to come along as I show you how to mine genetic data from public databases in the service of a little desktop microbial genetics. You'll be amazed at what you can do.
No one knows why, but when Ralstonia eutropha eats too much, it produces plastic granules instead of, say, starch or fat. Go figure.

For today's experiment, we're going to compare the genomes of two bacteria, one of which you know very well, the other of which you don't, unless you've got way too much time on your hands. The germ you already know is Bordetella, the genus that includes the whooping cough bug, B. pertussis (we'll be using its close cousin, B. bronchiseptica). The bug you haven't heard of is Ralstonia eutropha, a soil organism that has the amazing ability to subsist only on hydrogen gas, nitrate, and carbon dioxide. In return, it produces wicked-crazy quantities of plastic (yes, plastic: it stores carbon as polyhydroxybutyrate), and because it's potentially useful to industry, Ralstonia's DNA, like Bordetella's, has been fully sequenced.

If you go right now to http://genomevolution.org/r/8o1x, you'll see that I've set up a little experiment for you. You shouldn't have to press the pink "Generate SynMap" button on that page. It should run automatically (but if you don't see an image like the one below, hit the button).

Every dot in this dot-plot represents a match between a gene in Bordetella bronchiseptica and a gene in Ralstonia eutropha. See text for discussion.
What has happened is that the SynMap server has been instructed to find the complete DNA sequence of Ralstonia eutropha strain H16 as well as the complete DNA sequence of Bordetella bronchiseptica strain RB50, and run a comparison of one against the other. It so happens that Bordetella has a single chromosome with 5,339,179 base pairs, whereas our hydrogen-loving, plastic-storing friend Ralstonia has three replicons totalling 7,416,678 base pairs: two chromosomes plus a large plasmid (a megaplasmid called pHG1).

Every point on the above graph represents a match between a gene in Bordetella and a gene in Ralstonia. The X-axis represents locations on the Bordetella genome (starting from one end and going to the other). The Y-axis plots locations on the Ralstonia genome. All we're doing is mapping one genome to another and tallying the significant matches.
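To make the idea concrete, here is a minimal Python sketch of what a dot-plot is built from. It assumes you already have a list of matched gene pairs (from BLAST hits, say) and each gene's start coordinate on its genome. The function name, gene names, and coordinates are made up for illustration; SynMap's own pipeline does considerably more (filtering, chaining, drawing).

```python
from typing import Dict, List, Tuple

def dotplot_points(matches: List[Tuple[str, str]],
                   pos_a: Dict[str, int],
                   pos_b: Dict[str, int]) -> List[Tuple[int, int]]:
    """Convert matched gene pairs into (x, y) dot-plot coordinates.

    x is the gene's position on genome A (X-axis, e.g. Bordetella);
    y is the matching gene's position on genome B (Y-axis, e.g. Ralstonia).
    Pairs with an unknown position are skipped.
    """
    points = [(pos_a[a], pos_b[b]) for a, b in matches
              if a in pos_a and b in pos_b]
    return sorted(points)

# Hypothetical example: three genes on each genome, two usable matches.
pos_a = {"a1": 1_200, "a2": 48_000, "a3": 96_500}
pos_b = {"b1": 3_300, "b2": 51_000, "b3": 90_000}
matches = [("a1", "b1"), ("a3", "b2"), ("a2", "bX")]  # "bX" has no known position
print(dotplot_points(matches, pos_a, pos_b))  # [(1200, 3300), (96500, 51000)]
```

Each tuple is one dot; plotting them all (with any charting library) gives the kind of picture SynMap draws.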

This is a massive number of matches (well over 10,000), just to let you know. Usually, when you compare organisms, you don't see this many dots. I chose Bordetella and Ralstonia because I knew there'd be a lot of hits, based on my own prior experiments. And by the way, I don't think most microbiologists are aware (yet) that Bordetella and Ralstonia are extremely closely related. This is new information I'm sharing with you.

It's one thing to get a bunch of points on a dot-plot, but how do we really know these two organisms are related? This is where synteny comes in. Synteny is the degree to which two chromosomes share blocks of genes arranged in the same order. The key intuition is that merely sharing genes isn't enough; what counts is whether the matching genes are in the same arrangement. If genome A has genes X, Y, and Z, in that order, and genome B also has genes X, Y, and Z (in the same order), we say that A and B share a syntenous triplet: the genomes have a degree of synteny.
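The triplet test can be written down directly. This is a minimal sketch that assumes homologous genes have already been identified and given shared names (in practice that step is done by sequence similarity). It only checks forward order and ignores how far apart the genes are, both simplifications; the function names are hypothetical.

```python
from typing import List, Sequence

def in_order(genes: Sequence[str], genome: List[str]) -> bool:
    """True if every gene occurs in `genome` and they appear in the given order."""
    if any(g not in genome for g in genes):
        return False
    positions = [genome.index(g) for g in genes]
    return all(p < q for p, q in zip(positions, positions[1:]))

def is_syntenous_triplet(triplet: Sequence[str],
                         genome_a: List[str],
                         genome_b: List[str]) -> bool:
    """A triplet is syntenous if it appears in the same order in both genomes."""
    return in_order(triplet, genome_a) and in_order(triplet, genome_b)

genome_a = ["X", "Q", "Y", "Z"]   # hypothetical gene orders
genome_b = ["X", "Y", "R", "Z"]
shuffled = ["Y", "X", "Z"]        # same genes, different order
print(is_syntenous_triplet(["X", "Y", "Z"], genome_a, genome_b))  # True
print(is_syntenous_triplet(["X", "Y", "Z"], genome_a, shuffled))  # False
```

Note that the unrelated genes Q and R sitting between the matches don't break the triplet; only the relative order of X, Y, and Z matters.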

The SynMap tool is very powerful because it lets you find syntenous regions in DNA, and it's tunable. If you go to the Analysis Options tab on the SynMap page, you'll see that you can set two parameters called Maximum Distance Between Two Matches and Minimum Number of Aligned Pairs. The URL I sent you to (for our experiment) already has values of 50 and 2, respectively, dialed in. That means the graph plots every run of at least 2 matched gene pairs in which the matches lie no more than 50 genes apart. That's a pretty liberal setting. If two organisms are related, you can expect to see a lot of matches.
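Here is a rough sketch of what those two knobs do. Each matched pair is written as (gene index on genome A, gene index on genome B). The hypothetical function `find_blocks` greedily extends a chain while each new pair lies within `max_dist` genes of the previous one on both genomes, and reports chains with at least `min_pairs` pairs. This is my own simplification, not SynMap's code: SynMap's chaining (based, as far as I know, on the DAGChainer program) scores chains with dynamic programming and tolerates interleaved noise, whereas here a single stray pair breaks a chain, and only forward-oriented blocks are followed.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (gene index on genome A, gene index on genome B)

def find_blocks(pairs: List[Pair], max_dist: int, min_pairs: int) -> List[List[Pair]]:
    """Greedy collinear chaining of matched gene pairs (forward strand only).

    A pair joins the current chain if it comes after the chain's last pair
    and lies within `max_dist` genes of it on BOTH genomes. Chains with at
    least `min_pairs` pairs are reported as syntenic blocks.
    """
    blocks: List[List[Pair]] = []
    chain: List[Pair] = []
    for a, b in sorted(pairs):
        if chain:
            last_a, last_b = chain[-1]
            if 0 < a - last_a <= max_dist and 0 < b - last_b <= max_dist:
                chain.append((a, b))
                continue
            if len(chain) >= min_pairs:   # close the finished chain
                blocks.append(chain)
        chain = [(a, b)]                  # start a new chain
    if len(chain) >= min_pairs:
        blocks.append(chain)
    return blocks

pairs = [(1, 1), (2, 2), (3, 3), (10, 100), (11, 101), (50, 5)]  # hypothetical
print(find_blocks(pairs, max_dist=5, min_pairs=2))
# [[(1, 1), (2, 2), (3, 3)], [(10, 100), (11, 101)]]
print(find_blocks(pairs, max_dist=5, min_pairs=3))
# [[(1, 1), (2, 2), (3, 3)]]
```

Raising `min_pairs` keeps only the longer run, which is the same trade-off as moving from the permissive 50/2 setting toward a stringent one.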

But what I propose you try (if you want) is setting "Maximum Distance Between Two Matches" to 500 and "Minimum Number of Aligned Pairs" to 250. (Then click the Generate SynMap button to refresh the graph.) This is a much more stringent requirement: It tells SynMap to try to find 250 matched genes within any given 500-gene region, do it for all regions of both genomes, and plot the results, if any. A 250-gene chunk is a pretty large syntenous region for bacteria that have only a few thousand genes apiece.

The result of our hunt for super-large 250-gene syntenous regions is shown in the first graph below. The red dots represent the regions. They run from the top of the Y-axis to the lower right corner. Remember that the axes map directly to positions on the genome. What the diagonal line says is that there's a near-linear mapping of syntenous regions from one genome to the other.

The second graph below shows what happens when we re-tune our DNA-matching parameters to find blocks of 200 ordered genes within each 500-gene domain. We're looking for shorter runs of genes (200 instead of 250), which should be more plentiful. And they are. This time our graph looks like an 'X'. Why? Bacterial chromosomes do a lot of rearranging, and one of the most common events is an inversion that is symmetric around the origin of replication (and/or the terminus of replication). With enough of these inversions, of various sizes, pieces of DNA that used to be near the start of the chromosome end up near the end, and vice versa (and likewise for every location in between). If you want to know more about how and why this produces an X-pattern on a dot-plot, be sure to read the classic paper by Eisen et al., "Evidence for symmetric chromosomal inversions around the replication origin in bacteria," Genome Biology 2000, 1(6):research0011.1–0011.9.
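A toy simulation shows why symmetric inversions produce an X. It is a minimal sketch under strong assumptions: the chromosome is drawn as a line with the origin of replication in the middle, and every inversion is exactly symmetric around that origin (real inversions are only approximately so). Each such inversion sends a gene at position p to its mirror image across the origin, so every gene ends up either where it started (the diagonal) or at its mirror position (the anti-diagonal), no matter how many inversions pile up. The function names are hypothetical.

```python
import random
from typing import List, Tuple

def symmetric_inversion(genome: List[int], origin: int, half_width: int) -> List[int]:
    """Reverse the `half_width` genes on each side of `origin`."""
    lo, hi = origin - half_width, origin + half_width
    return genome[:lo] + genome[lo:hi][::-1] + genome[hi:]

def simulate(n_genes: int, n_events: int, seed: int = 0) -> Tuple[List[Tuple[int, int]], int]:
    """Apply random symmetric inversions; return (old position, new position) points."""
    origin = n_genes // 2
    genome = list(range(n_genes))   # each gene is labelled by its original position
    rng = random.Random(seed)
    for _ in range(n_events):
        genome = symmetric_inversion(genome, origin, rng.randint(1, origin))
    points = [(gene, new_pos) for new_pos, gene in enumerate(genome)]
    return points, origin

points, origin = simulate(n_genes=100, n_events=25)
mirror = 2 * origin - 1             # anti-diagonal: new_pos = mirror - old_pos
on_diag = sum(1 for x, y in points if y == x)
on_anti = sum(1 for x, y in points if y == mirror - x)
print(on_diag + on_anti == len(points))  # True: every dot lies on one arm of the X
```

Inversions that are not perfectly symmetric would smear dots off the two arms, which is roughly what the messier plots at lower stringency show.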

Genomes compared with synteny-block size 250, max domain size 500 genes.
Synteny block size 200, max domain size 500.
Block size 175, max domain size 500.
Block size 120, max domain size 180 genes.
Block size 90, max domain size 130.
Block size 2, max domain size 50.
 
The third and fourth graphs in this series show what happens when we tune our match for smaller block sizes. In the third graph, we've set "Maximum Distance Between Two Matches" to 500 and "Minimum Number of Aligned Pairs" to 175, which produces what looks like two really poorly drawn X's superimposed on each other. As we get more permissive with our synteny matches, we start to see the results of more inversion events. It makes sense that shorter synteny blocks will have been carried along by more successful (nonlethal) inversions, because an inversion that cuts across a large synteny block is probably fatal in many cases. (Some large groups of genes need to be kept together for proper gene regulation. If an inversion event cuts through a critical regulon at the wrong spot, the cell might not go on to reproduce.)

As we keep tuning the "Minimum Number of Aligned Pairs" downward, the graphs become more cluttered as we see the results of many thousands of inversion events in the history of the chromosomes.

The fourth graph uses values of 180 and 120 for Max Distance and Minimum Number of Aligned Pairs, then in graph five we have values of 130 and 90. And finally, in the last graph, we have 50 and 2. The final graph is mostly noise. But buried in the noise are many faint signals that can be seen by twiddling the knobs on the synteny settings.

I hope this bit of desktop genomics has convinced you that the field has reached an exciting stage indeed. (I've only scratched the surface here of what the tools at http://genomevolution.org can do.) I also hope I've convinced any microbial geneticists who might be reading this that Bordetella and Ralstonia are very closely related. (Which should come as news. I don't think it's been reported.) You wouldn't think a hydrogen-loving soil organism would have much in common with a throat-dwelling pathogen, but as I like to say: DNA doesn't lie!

Deep-Sea Vents: The Mosquito Connection

Quick: What species of life on earth is the most abundant? (Which species has more living members than any other species?) Hint: If an alien probe lands in a random location on earth, chances are better than 70% that the probe will encounter this organism.

If you're thinking in terms of the ocean, you're on the right track. What may surprise you is the connection between the world's-most-populous-organism (to be revealed shortly) and the mosquitoes that've been dive-bombing your neck all week. Equally amazing is the link between the mosquitoes in your back yard and hydrothermal vents in the ocean floor.

The hundreds of bright little particles at the narrow end of this wasp egg are Wolbachia cells.
I wasn't thinking about marine biology or deep-sea hydrothermal vents when I went online at http://genomevolution.org the other day to do a little nosing around in the genome of Wolbachia pipientis, the ultra-tiny bacterial parasite carried by many mosquito species. (Caution: Don't attempt the following DNA-analysis tricks on your own unless you want to become thoroughly addicted to desktop omics. I'm a microbiologist by training. I can do these stunts safely.) "Parasite" is actually the wrong word. Our tiny friend Wolbachia doesn't just parasitize the mosquito; it's an integral part of the mosquito. Wolbachia can't live outside its insect host, and guess what? The host frequently can't live without Wolbachia. The two provide essential services for each other, an arrangement known as mutualism.

I would argue that Wolbachia is more than a mutualistic symbiont: It's a proto-organelle, something very close to what Lynn Margulis had in mind as the ancestor of today's mitochondrion.

Wolbachia can't live on its own in the outside world (as far as anybody knows): it needs to live inside a host (generally an arthropod, although filarial worms also carry Wolbachia). Inside its host it occupies a very special niche: It lives in the nursery cells of the insect's ovary—the cells that will go on to become egg cells.

This is no ordinary symbiosis. I mentioned in an earlier post that Wolbachia carries with it genes for reverse-transcriptases, resolvases, recombinases, transposases, translocases, DNA polymerases, RNA polymerases, and phage integrases—a complete suite of retroviral machinery, designed for export of foreign DNA into host DNA. And indeed, researchers have found that Wolbachia DNA is quite often embedded in the host's own nuclear DNA. (One group, looking at four insect hosts and four nematode hosts, found anywhere from 500 base-pairs to over a million base pairs of Wolbachia DNA residing in the nucleus. Another group found 45 Wolbachia genes incorporated in a fruit-fly host's nuclear DNA.) The situation with Wolbachia thus parallels the situation with mitochondria, where we know that 97% of the gene products that go to make up a mitochondrion are actually encoded in nuclear DNA, not mitochondrial DNA.

When you encounter an organism as baffling as Wolbachia, oftentimes you want to know what its relatives are—what it's most closely related to. When a new or poorly understood organism has a close relative that's already well-studied, sometimes you learn a lot in a hurry. That's particularly true of pathogens (not that Wolbachia is a pathogen per se). Pathogens have virulence strategies of various kinds. Maybe Wolbachia has symbiosis strategies that it learned from a relative?

The problem with a lot of the super-tiny microbes (which Wolbachia definitely is, with only a quarter as much DNA as E. coli) is that their relatedness is not always well understood. Organisms are assigned a taxonomic slot, then the assignment changes a few years later, after they're better studied. (So, for example, Cowdria ruminantium was eventually renamed Ehrlichia ruminantium, some former Ehrlichias are now Neorickettsias, and others are now Anaplasmas.) Taxonomy at this end of the evolutionary tree is definitely a work in progress.
Deep-sea thermal vents like this one are home to organisms like Thiomicrospira that can grow on sulfide, CO2, and basic salts.

Fortunately, it's easy nowadays (with so many organisms' DNA sequences available online) to go on the web and compare genomes directly, using a tool like SynMap, which is what I did with Wolbachia. I went down the list of mini-microorganisms, running DNA similarity tests of Wolbachia against Ehrlichia, Neorickettsia, Anaplasma, Chlamydia, and "the usual suspects" at the ultra-small-chromosome end of the tree of life.

What I found surprised me. A bizarre little bacterium called Thiomicrospira kept showing up in my BLAST searches as having many genes in common with Wolbachia (based on sequence matches in large numbers of genes). None of the taxonomy charts showed the two to be related. But DNA doesn't lie. I kept coming up with matches across hundreds of genes. (Bear in mind, Wolbachia has only about 1300 genes to begin with, which is very small, even for a bacterium.)

What's bizarre about Thiomicrospira is that it's one of those fairly newly discovered microbes that live on sulfur, heat, and CO2 at the bottom of the ocean, in total darkness, in the vicinity of thermal vents. Thiomicrospira is the kind of life form NASA takes a great interest in, because it could be a prototype for exactly the type of survive-in-the-dark, CO2-using organism that might live under the ice crust of Europa (Jupiter's moon). In theory, there could be geothermal vents on the floor of the large ocean of liquid water that NASA is pretty sure exists under Europa's ice. If there's life down there, it could very well look like Thiomicrospira.

But why should Thiomicrospira have so many genes in common with a mosquito symbiont? Thiomicrospira lives at the bottom of the ocean; Wolbachia lives inside arthropod eggs. One obtains its carbon in the form of CO2; the other produces CO2 as a waste product. One is adapted to live in warm salt water; the other lives in cold-blooded insects. In theory, these two germs couldn't be further apart. And yet, oddly enough, not only do they have hundreds of genes in common, but the genes are also well matched from a DNA sequence-similarity standpoint. Thiomicrospira's DNA even incorporates a prophage module, and some of its phage genes show a high percentage of base-pair similarity with the phage genes of Wolbachia. (See screen shot below.)
Remarkably, Thiomicrospira and Wolbachia share certain phage genes, as shown here. The genes have a DNA sequence identity of about 60%.
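For readers wondering what "about 60% identity" means in practice, here is a minimal sketch of one common way to compute percent identity from a pairwise alignment that has already been made (gaps shown as '-'). Conventions differ between tools (some divide by the shorter sequence's length, or skip gapped columns), so treat this as one definition among several, not necessarily the one behind the screen shot; `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns divided by alignment columns, as a percentage.

    Both strings must come from the same alignment (equal length, '-' for gaps).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)

print(round(percent_identity("ACGT-A", "ACCTTA"), 2))  # 66.67
```

In the toy alignment, four of the six columns match (one mismatch, one gap), giving about 67%.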
After doing a little more detective work, I found an organism that might very well form a "missing link" between the mosquito symbiont and the thermal-vent dweller. This organism kept showing up in my analyses as having a high degree of DNA similarity with both Thiomicrospira and Wolbachia. The organism in question is Pelagibacter ubique (formally "Candidatus Pelagibacter ubique"; "Candidatus" is a provisional status given to prokaryotes that have not been fully described, not a taxonomic group), and it's an astonishing organism in two ways. First, it's the smallest non-parasitic (free-living) bacterium known to science, with only 1.3 million base-pairs in its DNA (about the same as Wolbachia and its tiny cousins). Second, it's the most numerous living thing on earth. It's present in large amounts in every one of earth's oceans.

Pelagibacter's exact position on the tree of life has been debated, partly because of its small genome and cell size and certain ribosomal markers. It has a very mitochondria-like genetic profile, and in fact some researchers think Pelagibacter is closely related to the ancestor of today's mitochondrion, a theory that's all the more satisfying when you consider that Pelagibacter is both ancient and tied to the sea.

My analysis using SynMap found that Pelagibacter and its thermal-vent-dwelling cousin Thiomicrospira share about 660 genes (out of 1480 or so for Pelagibacter), whereas Wolbachia and Pelagibacter share around 581, and Thiomicrospira and Wolbachia share around 1000. These are so-called non-syntenous point matches between genes: instances where the same gene occurs in both organisms with a high percentage of base-pair matching. Synteny takes gene matching one step further and says that clusters of similar genes are what count. Synteny at the level of higher plants and animals is one thing, but at the level of a mini-microbe it tends to lack meaning, because the genes of bugs like Wolbachia are notoriously mobile: they find new positions on the chromosome over time (probably because of the large number of transposases, nucleases, and integrases in the genome). Even so, I decided to carry out a bit of syntenic analysis to see what I could find out.

For purposes of my analysis, I defined a "syntenon" as three or more co-proximal genes that match three or more genes on the other organism's genome. To be part of a syntenon, all three genes in a triplet have to occur within a 30-gene span (and match three genes within a 30-gene span on the other organism's DNA), and the genes have to be in the same order in both organisms.
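The syntenon definition above is precise enough to code. Below is a minimal sketch of my reading of it, not the procedure SynMap actually runs. Assumptions: matched genes are given as (position on genome A, position on genome B) index pairs; "within a 30-gene span" means the first and last gene of the triplet are fewer than 30 genes apart; only same-orientation order counts; and a syntenon of four or more genes is counted as overlapping triplets, with each gene tallied once. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[int, int]  # (gene index on genome A, gene index on genome B)

def syntenic_triplets(pairs: List[Pair], span: int = 30) -> List[Tuple[Pair, Pair, Pair]]:
    """Triplets of matched genes that fit inside `span` consecutive genes on
    both genomes and appear in the same order on both."""
    pairs = sorted(set(pairs))
    found = []
    for i, first in enumerate(pairs):
        # later pairs that start within `span` genes of `first` on genome A
        window = [p for p in pairs[i + 1:] if p[0] - first[0] < span]
        for second, third in combinations(window, 2):
            same_order = (first[0] < second[0] < third[0]
                          and first[1] < second[1] < third[1])
            fits_on_b = third[1] - first[1] < span
            if same_order and fits_on_b:
                found.append((first, second, third))
    return found

def syntenic_gene_count(pairs: List[Pair], span: int = 30) -> int:
    """Number of distinct matched pairs that belong to at least one triplet."""
    return len({p for triplet in syntenic_triplets(pairs, span) for p in triplet})

pairs = [(0, 10), (5, 12), (9, 20), (100, 3), (200, 50)]  # hypothetical matches
print(syntenic_triplets(pairs))    # [((0, 10), (5, 12), (9, 20))]
print(syntenic_gene_count(pairs))  # 3
```

The scattered matches at (100, 3) and (200, 50) are real gene matches but not syntenic, which is the distinction between point matches and syntenic genes drawn in the counts below.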

A global ocean is thought to exist under Europa's icy outer crust. If there are thermal vents on its floor, any life there may look a lot like Thiomicrospira.
Using SynMap, I found that whereas Wolbachia and Pelagibacter share around 157 syntenic genes, and Thiomicrospira and Wolbachia share around 132, Thiomicrospira and Pelagibacter share 250 (which makes sense in that both are ocean-dwellers). For comparison-and-control purposes, I did a triplet match of Thiomicrospira against another chemoautotroph (an organism that gets energy from inorganic chemicals, and carbon from CO2), namely Methanothermobacter marburgensis. There were only 53 syntenic triplets in common between the two chemoautotrophs. (Between Wolbachia and Methanothermobacter, on the other hand, there were only 3 triplet-matches.) Doing a match between two Wolbachia species (a mosquito-dwelling variety and a fruit-fly-dwelling cousin) produced 522 gene matches in syntenic triplets.

It seems reasonable to me, based not just on the previous sorts of analysis but also on direct inspection of the genomes (in terms of their respective protein products), that Thiomicrospira evolved from Pelagibacter. Pelagibacter is the most abundant life form in the ocean, and perhaps the oldest. Pelagibacter is also very mitochondria-like, and so is Thiomicrospira, which has rhodanese-like proteins, the full cytochrome system, redox enzymes, citric-acid-cycle enzymes, plus certain characteristic membrane and sensor proteins, flippases, etc. (For what it's worth, Thiomicrospira has the highest signal-transduction profile I've ever seen at http://mistdb.com, again making it very mitochondrial-feeling.)

I'm tempted to say, similarly, that Thiomicrospira and Wolbachia are related. They have phage proteins in common. They both have genes for patatin proteins. They share multiple drug resistance genes. (That's not so strange. Antibiotics occur naturally in the environment.) They share genes for Flp-type pilins. Plus many more coincidences, big and small.

At first blush, a deep-sea thermal vent seems pretty far removed, environmentally, from the egg cell of a mosquito. How to reconcile the difference? Actually, I see similarities. Thiomicrospira thrives at temperatures of 28 to 32 degrees Celsius (which is also true of mosquitoes, although they prefer the 28-degree end of the scale). And blood (the preferred food source for mosquitoes) is comparable in pH and salinity to seawater. Also, mosquitoes have an aquatic lifecycle: they require brackish water in which to lay eggs. Mosquitoes and salt marshes go back millions of years.

It's even possible that Wolbachia lives in host organisms that dwell near deep-sea vents. In fact, I predict it will be found there. Why? Because in addition to inhabiting flying insects, spiders, mites, and ticks (and filarial worms), Wolbachia has also been found in a very high percentage of crustaceans. We know that crustaceans are often found living near deep-sea thermal vents, and many crustaceans show the characteristic feminization of genetic males that's so often the tipoff to a massive Wolbachia presence in insect populations.

Insects and crustaceans represent two of the oldest, most successful, and most widely distributed life forms of the animal kingdom. Would it really be so surprising if the bacteria that colonize these life forms are closely related to the most common marine bacteria on the planet? I don't think so. Stranger things have happened.


