Pages

.

Showing posts with label genetics. Show all posts
Showing posts with label genetics. Show all posts

Decrypting DNA

In a previous post ("Information Theory in Three Minutes"), I hinted at the power of information theory to gage redundancy in a language. A fundamental finding of information theory is that when a language uses symbols in such a way that some symbols appear more often than others (for example when vowels turn up more often than consonants, in English), it's a tipoff to redundancy.

DNA is a language with many hidden redundancies. It's a four-letter language, with symbol choices of A, G, C, and T (adenine, guanine, cytosine, and thymine), which means any given symbol should be able to convey two bits' worth of information, since log2(4) is two. But it turns out, different organisms speak different "dialects" of this language. Some organisms use G and C twice as often as A and T, which (if you do the math) means each symbol is actually carrying a maximum of 1.837 bits (not 2 bits) of information.

Consider how an alien visitor to earth might be able to use information theory to figure out terrestrial molecular biology.

The first thing an alien visitor might notice is that there are four "symbols" in DNA (A, G, C, T).

By analyzing the frequencies of various naturally occurring combinations of these letters, the alien would quickly determine that the natural "word length" of DNA is three.

There are 64 possible 3-letter words that can be spelled with a 4-letter alphabet. So in theory, a 3-letter "word" in DNA should convey 6 bits worth of information (since 2 to the 6th power is 64). But an alien would look at many samples of earthly DNA, from many creatures, and do a summation of -F * log2(F) for every 3-letter "word" used by a given creature's DNA (where F is simply the frequency of usage of the 3-letter combo). From this sort of analysis, the alien would find that even though 64 different codons (3-letter words) are, in fact, being used in earthly DNA, in actuality the entropy per codon in some cases is as little as 4.524 bits. (Or at least, it approaches that value asymptotically.)

Since 2 to the 4.524 power is 23, and since proteins (the predominant macromolecule in earthly biology) are made of amino acids, a canny alien would surmise that there must be around 23 different amino acids; and earthly DNA is a language for mapping 3-letters words to those 23 amino acids.

As it turns out, the genetic code does use 3-letter "words" (codons) to specify amino acids, but there are 20 amino acids (not 23), with 3 "stop codons" reserved for telling the cell's protein-making machinery "this is the end of this protein; stop here."

E. coli codon usage.
The above chart shows the actual codon usage pattern for E. coli. Note that all organisms use the same 3-letter codes for the same amino acids, and most organisms use all 64 possible codons, but the codons are used with vastly unequal frequencies. If you look in the upper right corner of the above chart, for example, you'll see that E. coli uses CTG (one of the six codons for Leucine) far more often than CTA (another codon for Leucine). One of the open questions in biology is why organisms favor certain synonymous codons over others (a phenomenon called codon usage bias).

While DNA's 6-bit codon bandwidth permits 64 different codons, and while organisms do generally make use of all 64 codons, the uneven usage pattern means fewer than 6 bits of information are used per codon. To get the actual codon entropy, all you have to do is take each usage frequency and calculate -F * log2(F) for each codon, then sum. If you do that for E. coli, you get 5.679 bits per codon. As it happens, E. coli actually does make use of almost all the available bandwidth (of 6 bits) in its codons. This turns out not to be true for all organisms, however.
reade more... Résuméabuiyad

Hydrogen Peroxide Powers Evolution

I'm about to offer a conjecture that is a bit preposterous-sounding but could well hold true. I actually think it does.

I propose that evolution, at the level of bacteria (though probably not at higher levels), is driven by hydrogen peroxide.

This theory rests on three assumptions: One is that the creation of new bacterial species happens almost entirely via lateral gene transfer, not heritable point-mutations. Secondly, bacteria (marine and terrestrial) are regularly exposed to challenges by hydrogen peroxide in the environment. Thirdly, those challenges drive lateral gene transfer.

Evidence for the first assumption is embarrassingly abundant. If you're not up to speed on the subject, I suggest you read the excellent paper, "Lateral Gene Transfer," by Olga Zhaxybayeva and W. Ford Doolittle in Current Biology, April 2011, 21:7, pp. R242-246 (unlocked copy here). It's now common to find that any given bacterial species can trace a good percentage of its protein base to "ancestors" that are too far removed horizontally to be ancestors in the conventional sense.

Consider E. coli. There are hundreds of strains of E. coli, with genes ranging in number from 4,100 to about 5,300 per strain. The problem is, the various strains of E. coli have only about 900 genes in common (and that's far too few genes to render a fully functional E. coli). The E. coli pan-genome actually takes in more than 15,000 gene families, total. Certainly, you can draw a family tree of E. coli based on 16S ribosomal polymorphisms, but that doesn't explain where the 15,000 pan-genome genes came from. The "family tree" metaphor quickly breaks down if you start drawing trees based on proteins. You get many conflicting trees—all of them correct.

Trees like this are fiction where bacteria are concerned.
The tree of life is more like a net of life or web
of life than a directed acyclic graph.
Where are all of the genes coming from? Other species, of course. They arrive by way of mechanisms like transformation, transduction, and conjugation. all of which allow direct entry of foreign DNA into a bacterial cell. At one time it was thought that conjugation could only occur between bacteria of the same species, but it is now known that cross-species conjugation also occurs (as, for example, between E. coli and Streptomyces or Mycobacterium).

Transduction, which is where viruses package up an infected host's genes in virus capsules that are then taken up by another cell, occurs naturally in bacterial populations in response to environmental factors like ultraviolet light and hydrogen peroxide. Exposure of a virus-carrying (lysogenic) cell to UV light or peroxide can induce runaway production of virus, and in fact this mechanism is used by Streptococcus to kill competitive Staphylococcus cells, in a clever bit of chemical warfare. It's been known for years that hydrogen peroxide can cause many types of bacteria to shed DNA. Now we know why: Hydrogen peroxide is a signalling molecule. It signals (among other things) lysogenic bacteria to go into a lytic cycle. It also signals cells to mount what's known as the SOS response, which is a global response to oxidative challenge. Years ago, Bruce Ames and his colleagues showed that exposing Salmonella to very dilute (60 micromolar) hydrogen peroxide caused the cells to differentially express 30 "SOS" proteins, including heat-shock proteins and low-fidelity DNA-repair systems. We know that hydrogen peroxide as dilute as 0.1 micromolar can induce phage (virus) production in up to 11% of marine bacteria. This is significant, because rainwater contains hydrogen peroxide in concentrations of 2 to 40 micromolar, and ocean water has been known to reach millimolar levels of H2O2 after a rain storm.

If you're wondering why rain contains hydrogen peroxide, the peroxide gets there in two ways. One is UV-frequency photochemistry (where water is cleaved to H and OH, then reforms as H2 and H2O2); the other is via ionization reactions caused by lightning. (Lightning is energetic enough to bring airborne oxygen and water to a plasma state. The resulting ionization and rearrangement of free atoms yields a certain amount of hydrogen peroxide.) The presence of H2O2 in rainwater has been confirmed many times, and in fact there's a well-preserved "fossil record" of it in polar icepacks, going back centuries. (Polar snowpacks contain from 10 to 900 ppb of H2O2; it varies seasonally, the max coming in summer.)

Bottom line, every rain event (over land, over sea) constitutes a hydrogen peroxide challenge for microbes. Which induces viral transduction (and a release of whole-cell DNA through lysis, some of which will be inevitably be used in transformation). It also induces low-fidelity DNA repair (which is guaranteed to help evolution along). Every rain event, in other words, is a chance for evolution to do its thing. For bacteria, that means gene-sharing within and across species lines.
Darwin's theory of a tree-like ancestor basis
for all living things is dead wrong, at
least for bacteria.
W. Ford Doolittle (who wrote a classic book chapter about lateral gene transfer called "If the Tree of Life Fell, Would We Recognize the Sound?") estimates that if a horizontal gene transfer occurs once every ten billion vertical replications, "it would be enough to ensure that no gene in any modern genome has an unbroken history of vertical descent back to some hypothetical last universal common ancestor." (See this article.)

It's obvious (to me, at least) that every rain event carries with it the potential to cause far more gene transfers than are necessary (according to Doolittle) to make vertical inheritance fade into insignificance as an evolutionary bringer of change. The hydrogen peroxide in rain has been driving lateral gene transfer in bacteria for eons. In fact, it is arguably the dominant driver of evolution in bacteria.

Sorry, Mr. Darwin. Point mutations handed down to sons and daughters just isn't cutting it.
reade more... Résuméabuiyad