Showing posts with label evolution. Show all posts

What Came Before 'RNA World'?

I go to bed sometimes wondering what early earth was like. I try to imagine how it's possible that life could have arisen when this planet was perhaps only 1% of its current age, barely cool enough for the oceans not to boil off.

It's generally understood that life originated around 3.8 billion years ago in tide pools, swamps, lakes, or possibly the deep ocean, while organic molecules rained down from lightning-filled skies heavy with pyroclastic gases. This is the so-called Primordial Soup Theory of Haldane and Oparin, given experimental weight by Miller and Urey. It leaves open rather a lot of important details, but clearly implies that biopoiesis arose in an aqueous phase through interaction of co-solutes.

Did life begin in, under, or near hydrothermal vents? Some researchers believe serpentinite rock structures associated with white chimneys could have provided pH gradients suitable for biopoiesis.

From a chemical standpoint, the defining feature of life is catalysis; in particular, the catalytic formation of catalysts that catalyze their own formation. In the standard Crick dogma of DNA -> RNA -> protein, we leave undrawn the many monomer/protein interactions that lead back to DNA. Nevertheless, it's clear that 85% to 90% of proteins and 10% to 15% of RNA molecules play mainly catalytic roles in cell chemistry.

For precisely this reason, aqueous-phase Soup Theory should probably be reconsidered. Any chemist will tell you that surface catalysis and phase boundary catalysis are orders of magnitude more effective than pure liquid-phase catalysis. This is why catalytic converters on cars are not giant bongs with fluid in them but instead contain a ceramic honeycomb core overlaid with a solid-phase platinum-palladium washcoat. It is also why the largest industrial catalytic operations (including fluid catalytic cracking of petroleum oil, which is fluid only in terms of the flow of ingredients; the catalyst itself is a solid powder) employ surface catalysis. Indeed, catalysts are often used in powdered, sintered, or coated-bead form specifically to maximize surface area. In living cells, enzymes are only partially solvated (interior portions are typically hydrophobic), and most enzymes can in fact be imagined as solid fixtures onto which reactants are adsorbed. (Surely no one thinks of ribosomes as being "in solution" in the way that, say, a sodium ion is in solution.) Surface catalysis characterizes living systems as well as industrial processes.

We also know that crowding effects are important in controlling enzyme shape and activity, and in the absence of crowding, some enzymes tend to partially unfold. Indeed, it seems likely molecular confinement has (to some extent) driven the evolution of protein primary and tertiary structure. Some would argue that biological macromolecules resembling those of today could not reasonably have arisen in a confine-free aqueous phase and that (therefore) the proto-biotic "soup" envisioned by Oparin and Haldane is unlikely to have produced cellular life. Some say it's much more likely that biopoiesis began in an environment of solvated clay particles, serpentine rock near hydrothermal vents, or (perhaps) a feldspar lattice of some kind. A colloid (such as clay) offers many advantages. For a clay to be a clay, particles must be no larger, on average, than 2 microns. This is a perfect substrate size for growth of loosely bound biological macromolecules. Such particles offer a huge amount of surface area per unit volume, much more than could be realized through, say, the attachment of catalytic foci to sheets of silica-laden rock.

Such is the state of our ignorance on biopoiesis that there's still no clear agreement on whether proteins appeared first, or nucleic acids (or perhaps biologically active lipids). The jury is still out. The so-called RNA World theory has gained a tremendous following in the last 30 years, based in part on work by Cech and Altman showing that RNA can itself act as a catalyst. But a fundamental unanswered problem in RNA World theory is how pyrimidines, purines, or other monomers managed to link up with sugars and then form the first RNA molecules in the absence of a suitable catalyst. (RNA can catalyze the formation of RNA, but how did the first RNA-like oligomer arise, without a catalyst?) Pyrimidines and purines are not known to spontaneously bind to ribose, much less form phosphorylated nucleotides, on their own. By contrast, amino acids can easily condense to form dipeptides, and dipeptides can catalyze the formation of other peptides. (For example, the dipeptide histidyl-histidine has been shown to catalyze the formation of polyglycine in wet-dry cycled clay.) Thus, it's at least plausible that proteins came first.

Ironically, abiotic formation of purines and pyrimidines is not, in itself, an insurmountable problem, provided we accept that hydrogen cyanide and formaldehyde were present in the primordial "soup." (Both HCN and formaldehyde have been produced with good yields in spark-discharge experiments involving diatomic nitrogen, CO2, water, and hydrogen. Even in the absence of molecular hydrogen, the yield of HCN and H2CO can approach 2%.) HCN undergoes a base-catalyzed tetramerization reaction to produce diaminomaleonitrile (DAMN), which, with the aid of u.v. light, can go on to yield a variety of purines. Acid hydrolysis of the HCN oligomers thus produced can lead (somewhat circuitously) to pyrimidines.

Abiotic formation of sugars is also possible if formaldehyde is present. Condensation of formaldehyde in the presence of calcium carbonate or alumina yields glycolaldehyde, which can begin a cascade of aldol condensations and enolizations that produce a formidable array of trioses, tetroses, pentoses, and higher sugars via Butlerow chemistry (also called the formose reaction).

The greatest problem with RNA World theory thus isn't the ab initio creation of bases or sugars, but rather their attachment to one another. In current biologic systems, pyrimidines are attached to sugars by displacement of pyrophosphate at the sugar's C1 position (something that has not succeeded in the lab under prebiotic conditions). In living systems, purine nucleosides are created by piecing together the purine base on a preexisting ribose-5-phosphate. It's hard to see how that could occur abiotically.

It's worth noting, too, that while spontaneous creation of sugars and bases can occur through condensations and other reactions, the result would not simply be just the riboses and purines and pyrimidines seen today; rather, there would arise a zoo of different products, including all the stereoisomers of such products. (There are, among the pentoses alone, twelve different possible stereoisomers.) Somehow, early systems would have to have converged on just the sugars, just the bases, and just the isomers of them needed to promulgate living systems.

Not that an abundance of isomers is a bad thing. Maybe pre-cellular "miasmal" life actually comprised a remarkable zoo of thousands (or hundreds of thousands) of potential biomolecular precursors, of which only the most catalytogenic survived. If muds and clays offered the particle substrates on which these molecules were formed, one can imagine that sticky molecules (those with the power to adhere tenaciously to clay particles, sealing them off from other, competing molecules) would have eventually won control over the means of catalysis. This would have meant micron-sized clay particles covered over with what would today be called nonsense proteins: ad-hoc polypeptides made of whatever amino acids (and other reactive species) might most easily polymerize.

What might these nonsense proteins have been capable of? In a Shakespeare-monkey typing pool world, any kind of protein is possible, subject only to steric hindrance, crowding effects, and the laws of chemistry. It seems likely that a one-micron clay particle coated with Shakespeare-monkey proteins would expose, if only by accident, hundreds of thousands of active sites of various kinds, creating catalytic opportunities of exactly the sort needed to take chemical evolution to the next stage.

Some enterprising 21st-century Urey or Miller needs to affix tens or hundreds of thousands of nonsense proteins to hundreds of thousands (or better, millions) of clay particles, soak it all in monomers of various kinds (amino acids, sugars, bases, lipids), and see what comes out. Experiments need to be done with activated colloids of various kinds, using temperature cycling as an energy source, using (and not using) oxidizing and reducing agents, with and without wet/dry cycling, with and without freezing and thawing, electrical energy, etc. We need to focus our efforts on what came before RNA World, what life was like before there were templates, before there was a genetic code, before Crick dogma. What were proteins like before the invention of the start codon or the stop codon? (Was protein size determined by Brownian dynamics? Reactant exhaustion? Molecular crowding? Intervention by chaperones or proteases?) What kinds of "protein worlds" might have existed under acidic conditions? Basic conditions? High redox-potential conditions? High or low temperature conditions? Phosphate-rich (or -poor) conditions? Repeat all of the above with and without u.v. light. With and without pyroclastic gases. With and without lightning. With and without cosmic rays. With and without adenylated coenzymes.

Experiments are waiting to be done—by the thousands—in vitro, in silico, in lutum.

Do-It-Yourself Phylogenetic Trees

I've been doing a lot of desktop science lately, and I'm happy to report that superb, easy-to-use online tools exist for creating your own phylogenetic trees based on gene similarities, something that's non-trivial to implement yourself.

The other day, I speculated that the fruit-fly Ogg1 gene, which encodes an enzyme designed to repair oxidatively damaged guanine residues in DNA, might derive from Archaea. The Archaea (in case you're not a microbiologist) comprise one of three super-kingdoms in the tree of life. Basically, all life on earth can be classified as either Archaeal, Eukaryotic, or Eubacterial. The Eubacteria are "true bacteria": they're what you and I think of when we think "bacteria." (So, think Staphylococcus and tetanus bacteria and E. coli and all the rest.) The Eukaryota are higher life forms, starting with yeast and fungi and algae and plankton, progressing up through grass and corn and pine trees, worms and rabbits and donkeys, all the way to the highest life form of all, Stephen Colbert. (A little joke there.) Eukaryotes have big, complex cells with a distinct nucleus, complex organelles (like mitochondria and chloroplasts), and a huge amount of DNA packaged into pairs of chromosomes.

Archaea look a lot like bacteria (they're tiny and lack a distinct nucleus, organelles, etc.), and were in fact long considered bacteria. But beginning in 1977, Carl Woese and George E. Fox provided persuasive evidence that members of this group of organisms were so different in genetic profile (not to mention lifestyle) that they deserved their own taxonomic domain. Thus, we now recognize certain bacteria-like creatures as Archaea.

The technical considerations behind the distinction between bacteria and archeons are rather deep and have to do with codon usage patterns, ribosomal RNA structure, cell-wall details, lipid metabolism, and other esoterica, but one distinguishing feature of archeons that's easy to understand is their willingness to live under harsh conditions. Archaeal species tend to be what we call extremophiles: They usually (not always) take up residence in places that are incredibly salty, or incredibly hot, or incredibly alkaline or acidic.

While it's generally agreed that eukaryotes arose after Archaea and bacteria appeared, it's by no means clear whether Archaea and bacteria branched off independently from a common ancestor, or perhaps one arose from the other. (A popular theory right now is that Archaea arose from gram-positive bacteria and sought refuge in inhospitable habitats to escape the chemical-warfare tactics of the gram-positives.) A complication that makes studying this sort of thing harder is the fact that horizontal gene transfer has been known to happen (with surprising frequency, actually) across domains.

Is it possible to study phylogenetic relationships, yourself, on the desktop? Of course. One way to do it: Obtain the DNA sequences of a given gene as produced by a variety of organisms, then feed those gene sequences to a tool like the tree-making tool at http://www.phylogeny.fr. Voila! Instant phylogeny.

The Ogg1 gene is an interesting case, because although the DNA-repair enzyme encoded by this gene occurs in a wide variety of higher life forms, plus Archaea, it is not widespread among bacteria. Aside from a couple of Spirochaetes and one Bacteroides species, the only bacteria that have this particular gene are the members of class Clostridia (which are all strict anaerobes). Question: Did the Clostridia get this gene from anaerobic Archaea?

Using the excellent online CoGeBlast tool, I was able to build a list of organisms that have Ogg1 and obtain the relevant gene sequences, all with literally just a few mouse clicks. Once you run a search using CoGeBlast, you can check the checkboxes next to organisms in the results list, then select "Phylogenetics" from the dropdown menu at the bottom of the results list. (See screenshot.)


When you click the Go button, a new FastaView window will open up, containing the gene sequences of all the items whose checkboxes you checked in CoGeBlast. At the bottom of this FastaView window, there's a small box that looks like this:


Click the Phylogeny.fr button (red arrow). Immediately, your sequences are sent to the French server where they'll be converted to a phylogenetic tree in a matter of one to two minutes (usually). The result is a tree that looks something like this:


I've color-coded this tree to make the results easier to interpret. Creating a tree of this kind is not without potential pitfalls, because for one thing, if your DNA sequences are of vastly unequal lengths, the groupings made by Phylogeny.fr are likely to reflect gene lengths more than true phylogeny. For this tree, I did various data checks to make sure we're comparing apples and apples. Even so, a sanity check is in order. Do the groupings make sense? They do, actually. At the very top of the diagram (color-coded in green) we find all the eukaryotes grouped together: fruit-fly (Drosophila), yeast (Saccharomyces), fungus (Aspergillus). At the bottom of the diagram, Clostridium species (purplish red) fall into a subtree of their own, next to a tiny subtree of Methanobrevibacter. This actually makes a good deal of sense, because the two Methanobrevibacter species shown are inhabitants of feces, as are the nearby Clostridium bartlettii and C. diff. The fact that all the salt-loving Archaea members group together (organisms with names starting with 'H') is also indicative of a sound grouping. Overall, the tree looks sound.
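One of the data checks mentioned above, making sure the sequences aren't of vastly unequal lengths, is easy to script. What follows is only a minimal sketch, not the check I actually ran: the function names, the 25% tolerance, and the toy FASTA records are all made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def length_outliers(records, tolerance=0.25):
    """Return headers whose sequence length differs from the median
    length by more than `tolerance` (a fraction of the median).
    For an even number of records the upper median is used."""
    lengths = sorted(len(seq) for seq in records.values())
    median = lengths[len(lengths) // 2]
    return [h for h, seq in records.items()
            if abs(len(seq) - median) > tolerance * median]


# Hypothetical toy input; real gene sequences would be far longer.
example = """>geneA
ATGGCCAAAGGGCACTTTGGA
>geneB
ATGGCCAAAGGGCACTTT
>geneC
ATGGCC
"""

records = read_fasta(example)
outliers = length_outliers(records)
print(outliers)  # sequences worth a second look before building a tree
```

Anything flagged this way isn't necessarily wrong; it's just a candidate for trimming or exclusion before the sequences go to the tree builder.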

If you're wondering what all the numbers are, the scale bar at the bottom (0.4) shows how much DNA sequence difference (in substitutions per site) corresponds to that particular length of branch. The red numbers on the tree branches are support values, indicative of the probability that the immediately underlying nodes are related. Probably the most important thing to know is that the evolutionary distance between any two leaves in the tree is proportional to the sum of the branch lengths connecting them. (The branch lengths are not explicitly specified; you have to eyeball it.) At the top of the diagram, you can see that the branch lengths of the two Drosophila instances are very short. This means they're closely related. By contrast, the branch lengths for Saccharomyces and the ancestor to Drosophila are long, meaning that these organisms are distantly related.
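The "sum of branch lengths" idea can be made concrete with a toy tree. This is a sketch under stated assumptions: the node names and branch lengths below are invented (they are not read off the tree above), and the tree is stored as a simple child-to-parent table rather than in any format Phylogeny.fr produces.

```python
# Hypothetical toy tree: each node maps to (parent, branch length to parent).
TREE = {
    "fly_1": ("fly_anc", 0.02),
    "fly_2": ("fly_anc", 0.03),
    "fly_anc": ("root", 0.50),
    "yeast": ("root", 0.60),
    "root": (None, 0.0),
}


def path_to_root(tree, node):
    """Map each ancestor of `node` (and the node itself) to the summed
    branch length between `node` and that ancestor."""
    distances = {}
    total = 0.0
    while node is not None:
        distances[node] = total
        parent, length = tree[node]
        total += length
        node = parent
    return distances


def leaf_distance(tree, a, b):
    """Sum of branch lengths on the path joining leaves a and b.
    With non-negative branch lengths, the shared ancestor that gives
    the smallest total is the most recent common ancestor."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


print(leaf_distance(TREE, "fly_1", "fly_2"))  # short path: close relatives
print(leaf_distance(TREE, "fly_1", "yeast"))  # long path: distant relatives
```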

Just to give you an idea of the relatedness, I checked the C. botulinum Ogg1 protein amino-acid sequence against C. tetani, and found 63% identity of amino acids. When I compared C. botulinum's enzyme against C. difficile's, there was 52% identity. With Drosophila there is only 32% identity, and even that applies only to a 46% coverage area (versus 90%+ for C. tetani and C. diff). Bottom line, the Blast-wise relatedness does appear to correspond, in sound fashion, to tree-wise relatedness.
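For anyone unsure what "identity" and "coverage" mean here, this is roughly how such numbers can be computed from a pairwise alignment. It's a sketch, not Blast's own code: the aligned fragments and the query length are invented, and I'm assuming identity is counted as matching columns divided by total alignment columns (gap columns included), which is my understanding of how Blast reports it.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the
    same residue. Gap columns ('-') count toward the total but never
    count as matches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def percent_coverage(aligned_query, query_length):
    """Percent of the full query sequence that takes part in the alignment."""
    aligned_residues = sum(1 for x in aligned_query if x != "-")
    return 100.0 * aligned_residues / query_length


# Hypothetical aligned protein fragments (not real Ogg1 sequences).
query = "MKTAYIAK-QR"
subject = "MKSAYLAKHQR"

identity = percent_identity(query, subject)
coverage = percent_coverage(query, query_length=20)
print(round(identity, 1), round(coverage, 1))
```

A high identity over a small coverage area (as with Drosophila above) is weaker evidence of relatedness than the same identity over nearly the whole protein.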

Two things stand out. One is that not all of the Clostridium species group together. (There's a small cluster of Clostridia near the salt-lovers, then a main branch near the methane-producing Archaea. The out-group of Clostridia near the salt-lovers happen to all have chromosomal G+C content of 50% or more, which makes them quite different from the rest of the Clostridia, whose G+C is under 30%.) The other thing that stands out is that it does appear as if Clostridial Ogg1 could be Archaeal in origin, based on the relationship of Methanoplanus and Methanobrevibacter to the main group of Clostridia. (Also, the C. leptum group's Ogg1 may share an ancestor with the halophilic Archaea.) One thing we can say for sure is that Ogg1 is ancient.

It's tempting to speculate that the eukaryotes obtained Ogg1 from early mitochondria, and that early mitochondria were actually Archaeal endosymbionts. The first part is easily true, because we know that early mitochondria quickly exported most of their DNA to the host nucleus. (Today's mitochondrial DNA is vestigial. Well over 90% of mitochondrial genes are actually in the host nucleus. Things like mitochondrial DNA polymerase have to be translated from nucleus-generated RNA.) Whether or not early mitochondria were Archaeal endosymbionts, no one knows.

Anyway, I hope this shows how easy it is to generate phylogenetic trees from the comfort of a living room sofa, using nothing more than a laptop with wireless internet connection. Try making your own phylo-trees using CoGeBlast and Phylogeny.fr—and let me know what you find out.

An Example of Antisense Proteogenesis?

The question of how organisms develop entirely new genes is one of the most important open questions in biology. One possibility is that new genes often develop through accidental translation of antisense strands of DNA.

An example of this can be seen with the S1 protein of the 30S bacterial ribosome. If you take the amino-acid sequence for an S1 gene and use it as the query sequence in a blast-p (protein blast), you'll mostly get back hits on other S1 proteins, but you'll also get minor (low-fidelity) hits on polynucleotide phosphorylase. Why? When you do a blast search, the search engine, by default, looks at both DNA strands of target genes (sense and antisense strands) to see if there's a potential sequence match with the query. If there's a match on the antisense strand, it will be reported along with "sense" matches. In the case of the S1 protein, blast-p searches often report weak antisense hits on polynucleotide phosphorylase in addition to strong sense hits on ribosomal S1.

Ribosomal proteins are, of course, among the most highly conserved proteins in nature. It turns out that polynucleotide phosphorylase (PNPase) is very highly conserved as well. It's an enzyme that occurs in every life form (bacteria, fungi, plants, animals), absent only in a scant handful of microbial endosymbionts that have lost the majority of their genes through deletions. While the chemical function of PNPase is well understood (it catalyzes the interconversion of nucleoside diphosphates to RNA), its physiologic purpose is not well understood, although recent research shows that PNPase-knockout mutants of E. coli exhibit lower mutation rates. (Hence, PNPase may actually be involved in generating mutations.)

The bacterium Rothia mucilaginosa, strain DY18, has a (putative) PNPase gene at a genome offset of 1277514. When this gene is used as the query for a blast-p search, the hits that come back include many strong matches for the S1 ribosomal proteins of various organisms. By "strong match," I mean better than 80% sequence identity coupled with an E-value (expectation value) of zero. (Recall that the E-value represents the approximate odds of the match in question happening due to random chance.)

If we use the Genome Viewer at genomevolution.org to look at the PNPase gene of Rothia mucilaginosa, we see something extraordinarily peculiar.

Overlapping sense and antisense open reading frames on a portion of DNA from Rothia mucilaginosa. The top reading frame contains the gene for polynucleotide phosphorylase. The lower (-1 strand) reading frame contains ribosomal S1.

Notice that there are overlapping genes. On the top strand is the gene for PNPase; on the bottom strand, in the same location, is a gene for ribosomal S1. These are bidirectionally overlapping open reading frames, something occasionally encountered in virus nucleic acids but rarely seen in bacterial or other genomes.

How do we explain this anomaly? It could be just that: an anomaly, two open reading frames that happen to overlap (but that aren't necessarily translated in vivo). Or it could be that at some point, many millions of years ago, the ribosomal S1 gene of a Rothia ancestor was erroneously translated via the antisense strand, producing a protein with PNPase characteristics. We don't know why PNPase confers survival value (its physiologic purpose is not fully understood), but we do know, with a fair degree of certainty, that PNPase does, in fact, confer survival value—because every organism, at every level of the tree of life, has at least one copy of PNPase. Once Rothia's ancestor, through whatever process, opened up a reading frame on the antisense strand of ribosomal S1, the reading frame stayed open, because it conferred survival value. In this way, the first Rothia PNPase was born. (Arguably.)

At some point in its history, Rothia duplicated its PNPase gene and placed a new copy at genome offset 1650959. Over time, this second copy diverged from the original copy, becoming more like E. coli PNPase (which is also to say, less S1-like). Rothia's second PNPase shows a blast-p similarity of 45% (in terms of AA identities) to E. coli PNPase, with E-value 4.0e-147. It shows a blast-p similarity of 26% (AA identities) with E. coli ribosomal S1 (E-value: 4.0e-17). Neither E. coli PNPase nor Rothia PNPase-2 overlaps an S1 gene. However, both are colocated with the ribosomal S15 protein gene. And you'll find (if you look at lots of bacterial genomes) that PNPase is almost always located immediately next to an S15 ribosomal gene.

Rothia PNPase is an example of an enzyme that may very well have started out as an antisense copy of another protein (the S1 ribosomal protein). Of course, the mere presence of bidirectionally overlapping open reading frames doesn't prove that both frames are actually transcribed and translated in vivo. But the fact that blast-p searches using PNPase as the query almost always turn up faint S1 echoes (in a wide variety of organisms) is highly suggestive of an ancestral relationship between the two proteins.


Thoughts on New Gene Origination

The other day, I wrote a damning critique of Darwin's theory and offered nothing in the way of a positive alternative to the traditional view of accumulated-point-mutations as a driving force for evolution. It's easy to take potshots at someone else's theory and walk away. As a rule, I don't like naysayers who criticize something, then offer nothing in return. So I'd like to take a moment to try to offer a different perspective on evolution. In particular, I'd like to offer my own theory as to how new genes arise.

The question of where new genes come from is, of course, one of the foremost open problems in biology. Current theory revolves mostly around gene duplication followed by modification of the duplicated gene (via mutations and deletions) under survival pressure [4]. Gene fusion and fission have also been proposed as mechanisms for gene origination [3]. In addition, genes derived from noncoding DNA have recently been described in Drosophila [1]. Likewise, transposons (genes that jump from one location to another) have been implicated in gene biogenesis [2].

The problem with these theories is that various enzymes are required in order for duplication, transposition, fusion, fission, etc., to occur (to say nothing of transcription, translation initiation, translation elongation, and so on), and existing theories don't explain how these participating enzymes appeared, themselves, in the first place. A fully general theory has to start from the assumption that in pre-cellular, pre-chromosomal, pre-organismic times, genes (if they existed) may have occurred singly, with multiple copies arising through non-enzymatic replication. Likewise, we should assume that early protein-making machinery was probably non-enzymatic, which is to say entirely RNA-based (i.e., ribozymal). If the idea of catalytic RNA is new to you or sounds unreasonably farfetched, please review the 1989 Nobel Prize research by Altman and Cech.

The fundamental mechanisms of de novo gene creation available in pre-enzymatic times might well have been nothing more than ribozymal duplication of nucleic acid sequences followed by erroneous translation. "Erroneous translation" can be of two fundamental types: frameshifted translation, and reverse translation. (Reverse translation here means transcription of the antisense strand of DNA and subsequent translation to a polypeptide.)

DNA is parsed 3 bases at a time (the 3-base combinations are called codons; each codon corresponds to an amino acid). If a single base is spuriously added to, or deleted from, a gene, the reading frame is disrupted and a hugely different amino-acid sequence results. This is called a frameshift error or frameshift mutation.

Spurious addition or deletion of a single base to a free-floating piece of single-stranded genetic material (RNA or DNA) is all that's needed in order to cause frameshifted translation. The protein that results from a frameshift error is, of course, in general, vastly different from the original protein.
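To see just how different a frameshifted product is, here's a minimal sketch in Python. The 18-base "gene" is made up for illustration, the helper names are my own, and the codon table is the standard genetic code written in the usual compact TCAG order.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string three bases at a time (one-letter amino
    acid codes, '*' for stop). Leftover bases at the end are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical toy gene: ATG GCT GAA CGT CTG TAC
original = "ATGGCTGAACGTCTGTAC"

# Delete a single base (the 4th one); every downstream codon shifts.
shifted = original[:3] + original[4:]

print(translate(original))
print(translate(shifted))
```

In this toy case only the first amino acid survives the deletion; everything after the lost base is read in a new frame.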

If pre-organismic nucleic acids were single-stranded, then reverse translation would require 3'-to-5' reading of the nucleic acid as well as 5'-to-3' reading. If, on the other hand, early nucleic acids were double-stranded, then 5'-to-3' (normal direction) translation of each strand would suffice to give one normal and one reverse translation product. (Note for non-biologists: In all known current organisms, reading of DNA and RNA takes place in the 5'-to-3' direction only.)
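The two scenarios just described give different "reverse" products, which a short sketch makes clear. Same caveats as any toy example: the sequence is invented, the function names are mine, and the code says nothing about whether 3'-to-5' reading ever actually happened.

```python
from itertools import product

# Standard genetic code in compact TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon, ignoring leftover bases."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna):
    """Return the opposite strand, written 5'-to-3'."""
    return dna.translate(COMPLEMENT)[::-1]


sense = "ATGGCTGAACGTCTGTAC"  # hypothetical toy gene, written 5'-to-3'

# Double-stranded case: read the opposite strand in the normal direction.
antisense_product = translate(reverse_complement(sense))

# Single-stranded case: read the same strand backward (3'-to-5').
backward_product = translate(sense[::-1])

print(translate(sense), antisense_product, backward_product)
```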

Nucleic acids (RNA and DNA) have directionality, defined by the orientation of sugar backbone molecules in terms of their 5' and 3' carbons.

It's interesting to speculate on the role of reverse translation in production of novel proteins, especially as it applies to early biological systems. We don't know if early systems relied on triplet codons (or even if all four bases—guanine, cytosine, adenine, thymine—existed from the beginning). We also don't know if there were 20 amino acids in the beginning. There may have been fewer (or more).

A novel possibility is that early triplet codons were palindromic (giving identical semantics when read in either direction). There are 16 palindromic codons in the codon lexicon (AGA, GAG, CAC, ACA, ATA, TAT, AAA, and so on) which today encode 15 amino acids out of the 20 commonly used. In a palindromic-codon world, the distinction between "sense" and "antisense" nucleic acid sequences vanishes, because a single-stranded gene made up of palindromic codons could be translated in either direction to give a polypeptide with the same sequence, the only chirality arising from N- to C-terminal polarity. For example, the sequence GGG-CAC-GCG-AAA would give a polypeptide of glycine-histidine-alanine-lysine whether translated forward or backward, the only difference being that the forward version would have glycine at the N-terminus whereas the reverse version would have glycine at the C-terminus. The secondary and tertiary structures of the two versions would be the same. As long as catalytic function didn't directly depend on an amino or carboxy terminus of an end-acid, the two proteins would also be functionally indistinguishable.
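The codon counts above are easy to check by brute force. Here's a small sketch that enumerates the palindromic codons in the standard genetic code, counts the amino acids they encode, and translates the example sequence in both directions. The variable and function names are just for illustration.

```python
from itertools import product

# Standard genetic code in compact TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring leftover bases."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# A codon is palindromic when its first and third bases match.
palindromic = sorted(c for c in CODON_TABLE if c[0] == c[2])
encoded = {CODON_TABLE[c] for c in palindromic}

print(len(palindromic), "palindromic codons")
print(len(encoded), "distinct amino acids:", "".join(sorted(encoded)))

# The example from the text: GGG-CAC-GCG-AAA read in both directions.
gene = "GGGCACGCGAAA"
forward = translate(gene)
backward = translate(gene[::-1])
print(forward, backward)
```

Arginine shows up twice (AGA and CGC), which is why 16 codons yield 15 amino acids. None of the three stop codons is palindromic.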

Codon palindromicity is potentially important in any system in which single-stranded genes are bidirectionally translated, because in the case where a gene does happen to rely heavily on palindromic codons, the reverse-translated product will (for the reasons just explained) have the potential to be functionally paralogous to the forward-translated product (to an extent matching the extent of palindromic-codon usage). But this assumes that in early organisms (or pre-organismic soups), single-stranded genes could be translated in the 5'-to-3' direction or the 3'-to-5' direction.

It turns out modern organisms differ markedly in the degree to which they use palindromic codons, and there are (remarkably) some prokaryotes whose genes use an average of ~40% palindromic codons. The complementary strand of DNA would, of course, contain palindromic complements: AGA opposite TCT, CCC opposite GGG, etc.
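Measuring palindromic-codon usage for a given gene takes only a few lines. This is a sketch of one way to do it, not the exact procedure behind the ~40% figure: the toy coding sequence and the function names are invented, and the sequence is assumed to start in frame.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codons(cds):
    """Split an in-frame coding sequence into codons, dropping leftovers."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def palindromic_fraction(cds):
    """Fraction of codons whose first and third bases are identical."""
    triplets = codons(cds)
    if not triplets:
        return 0.0
    return sum(1 for c in triplets if c[0] == c[2]) / len(triplets)


def reverse_complement(dna):
    """Return the opposite strand, written 5'-to-3'."""
    return dna.translate(COMPLEMENT)[::-1]


# Hypothetical toy gene: ATG GGG CAC GCT AAA TAA
toy_gene = "ATGGGGCACGCTAAATAA"
print(palindromic_fraction(toy_gene))

# A palindromic codon pairs with a palindromic codon on the other strand.
print(reverse_complement("AGA"), reverse_complement("CCC"))
```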

All of this makes for interesting conjecture, but does any of it really apply to the natural world? For example: Do organisms actually employ strategies of "erroneous translation" in creating new proteins? Did today's microbial meta-proteome arise through mechanisms involving frameshifted and/or reverse translation? Is there any evidence of such processes, one way or the other? Tomorrow I want to continue on this theme, presenting a little data to back up some of these strange ideas. Please join me; and bring a biologist-friend with you!


References
1. Begun, D., et al. Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics 176, 1131–1137 (2007).
2. Feschotte, C., & Pritham, E. DNA transposons and the evolution of eukaryotic genomes. Annual Review of Genetics 41, 331–368 (2007).
3. Jones, C. D., & Begun, D. J. Parallel evolution of chimeric fusion genes. Proceedings of the National Academy of Sciences 102, 11373–11378 (2005).
4. Ohno, S. Evolution by Gene Duplication (Springer-Verlag, Berlin, 1970).

The Trouble with Darwin

As a biologist, I find Darwin's theory hugely disappointing. It's better than the alternative (which is to believe in magic, basically), but not by much, sadly.

Charles Darwin died before Mendel's work on inheritance was rediscovered.

As scientific theories go, the theory of evolution is easily the weakest of all major scientific theories. It's a commendable piece of work in its ability to stir discussion, but terrible in most other ways.

To be useful, a scientific theory has to do a minimum of two things: explain what can be observed, and provide testable predictions. Darwin's theory is weak on the first count and useless on the second.

Evolutionary theory explains practically nothing, because every explanation of the theory is rooted in "survival of the fittest," which is a circular notion, utterly content-free. "Fittest" means most able to survive. Survival of the fittest means survival of those who survive.

Ironically, Darwin's landmark work was called On the Origin of Species. Yet it doesn't actually explain speciation, except in the most vacuous and speculative of terms. Of course, we can't set too high an expectation for Darwin, since he didn't live to see the rediscovery of Mendel's work (the word "genetics" wouldn't exist until more than 20 years after Darwin's death), but still. Speciation is portrayed by Darwin as the outcome of the accumulation of small, gradual changes. That's all the explanation he offers.

But the explanation is wrong. Or at least it doesn't accord well with the facts. It doesn't explain the Cambrian Explosion, for example, or the sudden appearance of intelligence in hominids, or the rapid recovery (and net expansion!) of the biosphere in the wake of at least five super-massive extinction events in the most recent 15% of Earth's existence.

One of the most frustrating aspects of evolutionary theory (this is no fault of the theory's, though) is that it is so hard to test in the laboratory. The fact is, no one has ever seen speciation happen in the laboratory, under repeatable conditions, and until that happens we're at a distinct disadvantage for understanding speciation. (Incidentally, I don't count plant hybridization or breeding anomalies in fruit flies whose sexuality is under the control of microbial endosymbionts as examples of speciation.)

When I was in school, we were taught that mutations in DNA were the driving force behind evolution, an idea that is now thoroughly discredited. The overwhelming majority of non-neutral mutations are deleterious (they reduce, not increase, survival). Most mutations lead to loss of function (this is easily demonstrated in the lab), not gain of function. Evolutionary theory is great at explaining things like the loss of eyesight by cave-dwelling creatures (e.g., blind cavefish). It's terrible at explaining gain of function.

Even if mutations were capable of driving evolution, they simply don't happen fast enough to account for observed rates of speciation. In bacteria, the measured rate of 16S rRNA divergence due to point mutations is only 1% per 50 million years. And yet, there were no flowering plants on earth as recently as 150 million years ago! Does it take a biologist to see the disconnect?
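
The arithmetic behind that comparison can be sketched in a few lines of Python. The 1%-per-50-million-years rate and the 150-million-year window are the figures quoted above; the function name and the assumption that divergence accumulates linearly over time are illustrative simplifications, not a measured model.

```python
# Rough sketch: expected 16S rRNA divergence from point mutations alone,
# ASSUMING divergence accumulates linearly with time (a simplification).

RATE_PERCENT_PER_MYR = 1.0 / 50.0  # 1% per 50 million years (figure quoted above)


def expected_divergence_percent(million_years: float) -> float:
    """Percent 16S rRNA divergence expected after the given span of time."""
    return RATE_PERCENT_PER_MYR * million_years


if __name__ == "__main__":
    # 150 million years: roughly the window since flowering plants appeared
    pct = expected_divergence_percent(150)
    print(f"Expected 16S divergence over 150 Myr: about {pct:.0f}%")
```

Under that linear assumption, the whole span since the first flowering plants buys only about 3% divergence in the 16S gene.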

I bring all this up because I've spent some time recently doing genomics research aimed at exploring mechanisms for new-protein creation/differentiation (mechanisms not relying wholly or even mainly on point mutations), and I wanted to set the stage for discussing that research here. Over the next week or so, I'll be presenting some new ideas and findings. Hopefully, we can put some much-needed flesh on Darwin by exploring testable notions of how new protein motifs can arise quickly (without reliance on magic).


Hydrogen Peroxide Powers Evolution

I'm about to offer a conjecture that is a bit preposterous-sounding but could well hold true. I actually think it does.

I propose that evolution, at the level of bacteria (though probably not at higher levels), is driven by hydrogen peroxide.

This theory rests on three assumptions. First, the creation of new bacterial species happens almost entirely via lateral gene transfer, not heritable point mutations. Second, bacteria (marine and terrestrial) are regularly exposed to challenges by hydrogen peroxide in the environment. Third, those challenges drive lateral gene transfer.

Evidence for the first assumption is embarrassingly abundant. If you're not up to speed on the subject, I suggest you read the excellent paper "Lateral Gene Transfer" by Olga Zhaxybayeva and W. Ford Doolittle, Current Biology 21(7), R242–R246 (April 2011). It's now common to find that any given bacterial species can trace a good percentage of its protein base to "ancestors" that are too far removed horizontally to be ancestors in the conventional sense.

Consider E. coli. There are hundreds of strains of E. coli, with genes ranging in number from 4,100 to about 5,300 per strain. The problem is, the various strains of E. coli have only about 900 genes in common (and that's far too few genes to render a fully functional E. coli). The E. coli pan-genome actually takes in more than 15,000 gene families, total. Certainly, you can draw a family tree of E. coli based on 16S ribosomal polymorphisms, but that doesn't explain where the 15,000 pan-genome genes came from. The "family tree" metaphor quickly breaks down if you start drawing trees based on proteins. You get many conflicting trees—all of them correct.
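
The difference between the genes every strain shares and the genes found anywhere in the species is easy to see with sets. The sketch below is a toy illustration only: the strain names and gene labels are invented placeholders, not real E. coli data, and the helper function names are hypothetical.

```python
# Toy illustration of "core genome" vs. "pan-genome" using Python sets.
# Strain names and gene labels are INVENTED placeholders, not real data.

from functools import reduce

strains = {
    "strain_1": {"geneA", "geneB", "geneC", "geneD"},
    "strain_2": {"geneA", "geneB", "geneC", "geneE"},
    "strain_3": {"geneA", "geneB", "geneF"},
}


def core_genome(genomes):
    """Genes present in every strain (set intersection)."""
    return reduce(set.intersection, genomes.values())


def pan_genome(genomes):
    """Genes present in at least one strain (set union)."""
    return reduce(set.union, genomes.values())


core = core_genome(strains)
pan = pan_genome(strains)
print(f"core genome: {len(core)} genes; pan-genome: {len(pan)} genes")
```

Even in this three-strain toy, the shared core (2 genes) is smaller than any single strain's genome, while the pan-genome (6 genes) is larger than any of them. That is the same pattern as 900 shared genes against 15,000 gene families, in miniature.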

Trees like this are fiction where bacteria are concerned. The tree of life is more like a net of life or web of life than a directed acyclic graph.
Where are all of the genes coming from? Other species, of course. They arrive by way of mechanisms like transformation, transduction, and conjugation, all of which allow direct entry of foreign DNA into a bacterial cell. At one time it was thought that conjugation could only occur between bacteria of the same species, but it is now known that cross-species conjugation also occurs (as, for example, between E. coli and Streptomyces or Mycobacterium).

Transduction, in which viruses package an infected host's genes in virus capsids that are then taken up by another cell, occurs naturally in bacterial populations in response to environmental factors like ultraviolet light and hydrogen peroxide. Exposure of a virus-carrying (lysogenic) cell to UV light or peroxide can induce runaway production of virus, and in fact this mechanism is used by Streptococcus to kill competing Staphylococcus cells, in a clever bit of chemical warfare.

It's been known for years that hydrogen peroxide can cause many types of bacteria to shed DNA. Now we know why: Hydrogen peroxide is a signalling molecule. It signals (among other things) lysogenic bacteria to go into a lytic cycle. It also signals cells to mount what's known as the SOS response, which is a global response to DNA damage, including damage from oxidative challenge. Years ago, Bruce Ames and his colleagues showed that exposing Salmonella to very dilute (60 micromolar) hydrogen peroxide caused the cells to differentially express 30 "SOS" proteins, including heat-shock proteins and low-fidelity DNA-repair systems. We know that hydrogen peroxide as dilute as 0.1 micromolar can induce phage (virus) production in up to 11% of marine bacteria. This is significant, because rainwater contains hydrogen peroxide in concentrations of 2 to 40 micromolar, and ocean water has been known to reach millimolar levels of H2O2 after a rain storm.

If you're wondering why rain contains hydrogen peroxide, the peroxide gets there in two ways. One is UV-frequency photochemistry (where water is cleaved to H and OH, then reforms as H2 and H2O2); the other is via ionization reactions caused by lightning. (Lightning is energetic enough to bring airborne oxygen and water to a plasma state. The resulting ionization and rearrangement of free atoms yields a certain amount of hydrogen peroxide.) The presence of H2O2 in rainwater has been confirmed many times, and in fact there's a well-preserved "fossil record" of it in polar icepacks, going back centuries. (Polar snowpacks contain from 10 to 900 ppb of H2O2; it varies seasonally, the max coming in summer.)
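
To compare the snowpack numbers (in ppb) with the micromolar figures quoted for rainwater, a quick unit conversion helps. This is a minimal sketch that assumes the ppb values are by mass and that meltwater has a density of about 1 kg per litre; the function name is illustrative.

```python
# Convert H2O2 concentration from ppb to micromolar.
# ASSUMPTIONS: ppb is parts per billion by mass, and meltwater density is
# about 1 kg/L, so 1 ppb is treated as 1 microgram of H2O2 per litre.

H2O2_MOLAR_MASS = 34.01  # g/mol (2 H + 2 O)


def ppb_to_micromolar(ppb: float) -> float:
    """Micrograms per litre divided by grams per mole gives micromoles per litre."""
    return ppb / H2O2_MOLAR_MASS


if __name__ == "__main__":
    for ppb in (10, 900):  # snowpack range quoted above
        print(f"{ppb} ppb is about {ppb_to_micromolar(ppb):.2f} micromolar")
```

On those assumptions, 10 to 900 ppb works out to roughly 0.3 to 26 micromolar, which overlaps the 2 to 40 micromolar range quoted for rainwater and sits above the 0.1 micromolar level said to induce phage production.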

Bottom line, every rain event (over land, over sea) constitutes a hydrogen peroxide challenge for microbes, which induces viral transduction (and a release of whole-cell DNA through lysis, some of which will inevitably be used in transformation). It also induces low-fidelity DNA repair (which is guaranteed to help evolution along). Every rain event, in other words, is a chance for evolution to do its thing. For bacteria, that means gene-sharing within and across species lines.
Darwin's theory of a tree-like ancestor basis for all living things is dead wrong, at least for bacteria.
W. Ford Doolittle (who wrote a classic book chapter about lateral gene transfer called "If the Tree of Life Fell, Would We Recognize the Sound?") estimates that if a horizontal gene transfer occurs once every ten billion vertical replications, "it would be enough to ensure that no gene in any modern genome has an unbroken history of vertical descent back to some hypothetical last universal common ancestor." (See this article.)

It's obvious (to me, at least) that every rain event carries with it the potential to cause far more gene transfers than are necessary (according to Doolittle) to make vertical inheritance fade into insignificance as a source of evolutionary change. The hydrogen peroxide in rain has been driving lateral gene transfer in bacteria for eons. In fact, it is arguably the dominant driver of evolution in bacteria.

Sorry, Mr. Darwin. Point mutations handed down to sons and daughters just aren't cutting it.