Pages

.

Showing posts with label Chargaff. Show all posts
Showing posts with label Chargaff. Show all posts

DNA Strand Asymmetry: More Surprises

The surprises just keep coming. When you start doing comparative genomics on the desktop (which is so easy with all the great tools at genomevolution.org and elsewhere), it's amazing how quickly you run into things that make you slap yourself on the side of the head and go "Whaaaa????"

If you know anything about DNA (or even if you don't), this one will set you back.

I've written before about Chargaff's second parity rule, which (peculiarly) states that A = T and G = C not just for double-stranded DNA (that's the first parity rule) but for bases in a single strand of DNA. The first parity rule is basic: It's what allows one strand of DNA to be complementary to another. The second parity rule is not so intuitive. Why should the amount of adenine have to equal the amount of thymine (or guanine equal cytosine) in a single strand of DNA? The conventional argument is that nature doesn't play favorites with purines and pyrimidines. There's no reason (in theory) why a single strand of DNA should have an excess of purines over pyrimidines or vice versa, all things being equal.

But it turns out, strand asymmetry vis-a-vis purines and pyrimidines is not only not uncommon, it's the rule. (Some call it Szybalski's rule, in fact.) You can prove it to yourself very easily. If you obtain a codon usage chart for a particular organism, then add the frequencies of occurrence of each base in each codon, you can get the relative abundances of the four bases (A, G, T, C) for the coding regions on which the codon chart was based. Let's take a simple example that requires no calculation: Clostridium botulinum. Just by eyeballing the chart below, you can quickly see that (for C. botulinum) codons using purines A and G are way-more-often used than codons containing pyrimidines T and C. (Note the green-highlighted codons.)


If you do the math, you'll find that in C. botulinum, G and A (combined) outnumber T and C by a factor of 1.41. That's a pretty extreme purine:pyrimidine ratio. (Remember that we're dealing with a single strand of DNA here. Codon frequencies are derived from the so-called "message strand" of DNA in coding regions.)

I've done this calculation for 1,373 different bacterial species (don't worry, it's all automated), and the bottom line is, the greater the DNA's A+T content (or, equivalently, the less its G+C content), the greater the purine imbalance. (See this post for a nice graph.)

If you inspect enough codon charts you'll quickly realize that Chargaff's second parity rule never holds true (except now and then by chance). It's a bogus rule, at least in coding regions (DNA that actually gets transcribed in vivo). It may have applicability to pseudogenes or "junk DNA" (but then again, I haven't checked; it may well not apply there either).

If Chargaff's second rule were true, we would expect to find that G = C (and A = T), because that's what the rule says. I went through the codon frequency data for 1,373 different bacterial species and then plotted the ratio of G to C (which Chargaff says should equal 1.0) for each species against the A+T content (which is a kind of phylogenetic signature) for each species. I was shocked by what I found:

Using base abundances derived from codon frequency data, I calculated G/C for 1,373 bacterial species and plotted it against total A+T content. (Each dot represents a genome for a particular organism.) Chargaff's second parity rule predicts a horizontal line at y=1.0. Clearly, that rule doesn't hold. 

I wasn't so much shocked by the fact that Chargaff's rule doesn't hold; I already knew that. What's shocking is that the ratio of G to C goes up as A+T increases, which means G/C is going up even as G+C is going down. (By definition, G+C goes down as A+T goes up.)

Chargaff says G/C should always equal 1.0. In reality, it never does except by chance. What we find is, the less G (or C) the DNA has, the greater the ratio of G to C. To put it differently: At the high-AT end of the phylogenetic scale, cytosine is decreasing faster (much faster) than guanine, as overall G+C content goes down.

When I first plotted this graph, I used a linear regression to get a line that minimizes the sum of squared absolute error. That line turned out to be given by 0.638 + [A+T]. Then I saw that the data looked exponential, not linear. So I refitted the data with a power curve (the red curve shown above) given by

G/C  = 1.0 + 0.587*[A+T] + 1.618*[A+T]2

which fit the data even better (minimum summed error 0.1119 instead of 0.1197). What struck me as strange is that the Golden Ratio (1.618) shows up in the power-curve formula (above), but also, the linear form of the regression has G/C equaliing 1.638 when [A+T] goes to 1.0. Which is almost the Golden Ratio.

In a previous post, I mentioned finding that the ratio A/T tends to approximate the Golden Ratio as A+T approaches 1.0. If this were to hold true, it could mean that A/T and G/C both approach the Golden Ratio as A+T approaches 1.0, which would be weird indeed.

For now, I'm not going to make the claim that the Golden Ratio figures into any of this, because it reeks too much of numerology and Intelligent Design (and I'm a fan of neither). I do think it's mildly interesting that A/T and G/C both approach a similar number as A+T approaches unity.

Comments, as usual, are welcome.
reade more... Résuméabuiyad

Shedding Light on DNA Strand Asymmetry

In 1950, Erwin Chargaff was the first to report that the amount of adenine (A) in DNA equals the amount of thymine (T), and the amount of guanine (G) equals the amount of cytosine (C). This result was instrumental in helping Watson and Crick (and Rosalind Franklin) determine the structure of DNA.

It's pretty easy to understand that every A on one strand of DNA pairs with a T on the other strand (and every G pairs with an opposite-strand C); this explains DNA complementarity and the associated replication model. But somewhere along the line, Chargaff was credited with the much less obvious rule that A = T and G = C even for individual strands of DNA that aren't paired with anything. This is the so-called second parity rule attributed to Chargaff, although I can't find any record of Chargaff himself having postulated such a rule. The Chargaff papers that are so often cited as supporting this rule (in particular the 3-paper series culminating in this report in PNAS) do not, in fact, offer such a rule, and if you read the papers carefully, what Chargaff and colleagues actually found was that one strand of DNA is heavier than the other (they label the strands 'H' and 'L', for Heavy and Light); not only that, but Chargaff et al. reported a consistent difference in purine content between strands (see Table 1 of this paper).

When I interviewed Linus Pauling in 1977, he cautioned me to always read the Results section of a paper carefully, because people will often conclude something entirely different than what the Results actually showed, or cite a paper as showing "ABC" when the data actually showed "XYZ."

How right he was.

At any rate, it turns out that the "message" strand of a gene hardly ever contains equal amounts of purines and pyrimidines. Codon analysis reveals that as genes become richer in A+T content (or as G+C content goes down), the excess of purines on the message strand becomes larger and larger. This is depicted in the following graph, which shows message-strand purine content (A+G) plotted against A+T content, for 1,373 distinct bacterial species. (No species is represented twice.)

Codon analysis reveals that as A+T content increases, message-strand purine content (A+G) increases. Each point on this graph represents a unique bacterial species (N=1373).

It's quite obvious that when A+T content is above approximately 33%, as it is for most bacterial species, the message strand tends to be comparatively purine-rich. Below A+T = 33%, the message strand becomes more pyrimidine-rich than purine-rich. (Note: In bacteria, where most of the DNA is in coding regions, codon-derived A+T content is very close to whole-genome A+T content. I checked the 1,373 species graphed here and found whole-chromosome A+T to differ from codon-derived A+T by an average of less than 7 parts in 10,000.)

The correlation between A+T and purine content is strong (r=0.85). Still, you can see that quite a few points have drifted far from the regression line, especially in the region of x = 0.5 to x = 0.7, where lots of points lie above y = 0.55. What's going on with those organisms? I decided to do some investigating.

First, some basics. Over time, transition mutations (AT↔GC) can change an organism's A+T content and thus move it along the x-axis of the graph, but transitions cannot move an organism higher or lower on the graph, because (by definition) transitions don't affect the strandwise purine balance.

Transversions, on the other hand, can affect strandwise purine balance (in theory, at least), but only if they occur more often on one strand of DNA than the other. (I should say: occur more often, or are fixed more often, on one strand versus the other.) For example, let's say G-to-T transversions are the most common kind of transversion (which is probably true, given that guanine is the most easily oxidized of the four bases and given the fact that failure to repair 8-oxoguanine lesions does lead to eventual replacement with thymine). And let's say G-to-T transversions are most likely to occur on the non-transcribed strand of DNA, at transcription time. (The non-transcribed strand is uncoiled and unprotected while transcription is taking place on the other strand.) Over time, the non-transcribed strand would lose guanines; they'd be replaced by thymines. The message strand, or RNA-synonymous strand (which is also the non-transcribed strand) would become pyrimidine-rich and the other strand would become purine-rich.

Unfortunately, while that's exactly what happens for organisms with A+T content below 33%, precisely the opposite happens (purines accumulate on the message strand) in organisms with A+T above 33%. And in fact, in some high-AT organisms, the purine content of message strands is rather extreme. How can we explain that?

One possibility is that some organisms have evolved extremely effective transversion repair systems for the message (non-transcribed) strand of genes—systems that are so effective, no G-to-T transversions go unrepaired on the message strand. The transcribed strand, on the other hand, doesn't get the benefit of this repair system, possibly because the repair enzymes can't access the strand: it's engulfed in transcription factors, topoisomerases, RNA polymerase, nearby ribosomal machinery, etc.

If the non-transcribed strand never mutates (because all mutations are swiftly repaired), then the transcribed strand will (in the absence of equally effective repairs) eventually accumulate G-to-T mutations, and the message strand will accumulate adenines (purines). Perhaps.

In the graph further above, you'll notice at x = 0.6 a tiny spur of points hangs down at around y = 0.5. These points belong to some Bartonella species, plus a Parachlamydia and another chlamydial organism. These are endosymbionts that have lost a good portion of their genomes over time. It seems likely they've lost some transversion-repair machinery. During transcription, their message strands are going unrepaired. G-to-T transversions happen on the message strand, rendering it light in purines. Such a scenario seems plausible, at least.

By this reasoning, maybe points far above the regression line represent organisms that have gained repair functionality, such that their message strands never undergo G-to-T transversions (although their transcribed strands do). Is this possible?

Examination of the highest points on the graph shows a predominance of Clostridia. (Not just members of the genus Clostridium, but the class Clostridia, which is a large, ancient, and diverse class of anaerobes.) One thing we know about the Clostridia is that unlike all other bacteria (unlike members of the Gammaproteobacteria, the Alpha- and Betaproteobacteria, the Actinomycetes, the Bacteroidetes, etc.), the Clostridia have Ogg1, otherwise known as 8-oxoguanine glycosylase (which specifically prevents G-to-T transversions). They share this capability with all members of the Archaea, and all higher life forms as well.

Note that while non-Ogg1 enzymes exist for correcting 8-oxoguanine lesions (e.g., MutM, MutY, mfd), there is evidence that Ogg1 is specifically involved in repair of 8oxoG lesions in non-transcribed strands of DNA, at transcription time. (The other 8oxoG repair systems may not be strand-specific.)

If Archaea benefit from Ogg1 the way Clostridia do, they too should fall well above the regression line on a graph of A+G versus A+T. And this is exactly what we find. In the graph below, the pink squares are members of Archaea that came up positive in a protein-Blast query against Drosophila Ogg1. (I'll explain why I used Drosophila in a minute.) The red-orange circles are bacterial species (mostly from class Clostridia) that turned up Ogg1-positive in a similar Blast search.

Ogg1-positive organisms are plotted here. The pink squares are Archaea species. Red-orange circles are bacterial species that came up Ogg1-positive in a protein Blast search using a Drosophila Ogg1 amino-acid sequence. In the background (greyed out) is the graph of all 1,373 bacterial species, for comparison. Note how the Ogg1-positive organisms have a higher purine (A+G) content than the vast majority of bacteria.

The points in this plot are significantly higher on the y-axis than points in the all-bacteria plot (and the regression line is steeper), consistent with a different DNA repair profile.

In identifying Ogg1-positive organisms, I wanted to avoid false positives (organisms with enzymes that share characteristics of Ogg1 but that aren't truly Ogg1), so for the Blast query I used Drosophila's Ogg1 as a reference enzyme, since it is well studied (unlike Archaeal or Clostridial Ogg1). I also set the E-value cutoff at 1e-10, to reduce spurious matches with DNA repair enzymes or nucleases that might have domain similarity with Ogg1 but aren't Ogg1. In addition, I did spot checks to be sure the putative Ogg1 matches that came up were not actually matches of Fpg (MutM), RecA, RadA, MutY, DNA-3-methyladenine glycosidase, or other DNA-binding enzymes.

Bottom line, organisms that have an Archaeal 8-oxoguanine glycosylase enzyme (mostly obligate anaerobes) occupy a unique part of the A+G vs. A+T graph. Which makes sense. It's only logical that anaerobes would have different DNA repair strategies (and a different "repairosome") than oxygen-tolerant bacteria, because oxidative stress is, in general, handled much differently in anaerobes. The fact that they bring different repair tactics to bear on DNA shouldn't come as a surprise.


reade more... Résuméabuiyad

A New Biological Constant?

Earlier, I gave evidence for a surprising relationship between the amount of G+C (guanine plus cytosine) in DNA and the amount of "purine loading" on the message strand in coding regions. The fact that message strands are often purine-rich is not new, of course; it's called Szybalski's Rule. What's new and unexpected is that the amount of G+C in the genome lets you predict the amount of purine loading. Also, Szybalski's rule is not always right.

Genome A+T content versus message-strand purine content (A+G) for 260 bacterial genera. Chargaff's second parity rule predicts a horizontal line at Y = 0.50. (Szybalski's rule says that all points should lie at or above 0.50.) Surprisingly, as A+T approaches 1.0, A/T approaches the Golden Ratio.
When you look at coding regions from many different bacterial species, you find that if a species has DNA with a G+C content below about 68%, it tends to have more purines than pyrimidines on the message strand (thus purine-rich mRNA). On the other hand, if an organism has extremely GC-rich DNA (G+C > 68%), a gene's message strand tends to have more pyrimidines than purines. What it means is that Szybalski's Rule is correct only for organisms with genome G+C content less than 68%. And Chargaff's second parity rule (which says that A=T an G=C even within a single strand of DNA) is flat-out wrong all the time, except at the 68% G+C point, where Chargaff is right now and then by chance.

Since the last time I wrote on this subject, I've had the chance to look at more than 1,000 additional genomes. What I've found is that the relationship between purine loading and G+C content applies not only to bacteria (and archaea) and eukaryotes, but to mitochondrial DNA, chloroplast DNA, and virus genomes (plant, animal, phage), as well.

The accompanying graphs tell the story, but I should explain a change in the way these graphs are prepared versus the graphs in my earlier posts. Earlier, I plotted G+C along the X-axis and purine/pyrmidine ratio on the Y-axis. I now plot A+T on the X-axis instead of G+C, in order to convert an inverse relationship to a direct relationship. Also, I now plot A+G (purines, as a mole fraction) on the Y-axis. Thus, X- and Y-axes are now both expressed in mole fractions, hence both are normalized to the unit interval (i.e., all values range from 0..1).

The graph above shows the relationship between genome A+T content and purine content of message strands in genomes for 260 bacterial genera. The straight line is regression-fitted to minimize the sum of squared absolute error. (Software by http://zunzun.com.) The line conforms to:

y = a + bx
 
where:
a =  0.45544384965539358
b = 0.14454244707261443


The line predicts that if a genome were to consist entirely of G+C (guanine and cytosine), it would be 45.54% guanine, whereas if (in some mythical creature) the genome were to consist entirely of A+T (adenine and thymine), adenine would comprise 59.99% of the DNA. Interestingly, the 95% confidence interval permits a value of 0.61803 at X = 1.0, which would mean that as guanine and cytosine diminish to zero, A/T approaches the Golden Ratio.

Do the most primitive bacteria (Archaea) also obey this relationship? Yes, they do. In preparing the graph below, I analyzed codon usage in 122 Archaeal genera to obtain A, G, T,  and C relative proportions in coding regions of genes. As you can see, the same basic relationship exists between purine content and A+T in Archaea as in Eubacteria. Regression analysis yielded a line with a slope of 0.16911 and a vertical offset 0.45865. So again, it's possible (or maybe it's just a very strange coincidence) that A/T approaches the Golden Ratio as A+T approaches unity.

Analysis of coding regions in 122 Archaea reveals that the same relationship exists between A+T content and purine mole-fraction (A+G) as exists in eubacteria.
For the graph below, I analyzed 114 eukaryotic genomes (everything from fungi and protists to insects, fish, worms, flowering and non-flowering plants, mosses, algae, and sundry warm- and cold-blooded animals). The slope of the generated regression line is 0.11567 and the vertical offset is 0.46116.

Eukaryotic organisms (N=114).

Mitochondria and chloroplasts (see the two graphs below) show a good bit more scatter in the data, but regression analysis still comes back with positive slopes (0.06702 and .13188, respectively) for the line of least squared absolute error.

Mitochondrial DNA (N=203).
Chloroplast DNA (N=227).
To see if this same fundamental relationship might hold even for viral genetic material, I looked at codon usage in 229 varieties of bacteriophage and 536 plant and animal viruses ranging in size from 3Kb to over 200 kilobases. Interestingly enough, the relationship between A+T and message-strand purine loading does indeed apply to viruses, despite the absence of dedicated protein-making machinery in a virion.

Plant and animal viruses (N=536).
Bacteriophage (N=229).
For the 536 plant and animal viruses (above left), the regression line has a slope of 0.23707 and meets the Y-axis at 0.62337 when X = 1.0. For bacteriophage (above right), the line's slope is 0.13733 and the vertical offset is 0.46395. (When inspecting the graphs, take note that the vertical-axis scaling is not the same for each graph. Hence the slopes are deceptive.) The Y-intercept at X = 1.0 is 0.60128. So again, it's possible A/T approaches the golden ratio as A+T approaches 100%.

The fact that viral nucleic acids follow the same purine trajectories as their hosts perhaps shouldn't come as a surprise, because viral genetic material is (in general) highly adapted to host machinery. Purine loading appropriate to the A+T milieu is just another adaptation.

It's striking that so many genomes, from so many diverse organisms (eubacteria, archaea, eukaryotes, viruses, bacteriophages, plus organelles), follow the same basic law of approximately

A+G = 0.46 + 0.14 * (A+T)

The above law is as universal a law of biology as I've ever seen. The only question is what to call the slope term. It's clearly a biological constant of considerable significance. Its physical interpretation is clear: It's the rate at which purines are accumulated in mRNA as genome A+T content increases. It says that a 1% increase in A+T content (or a 1% decrease in genome  G+C content) is worth a 0.14% increase in purine content in message strands. Maybe it should be called the purine rise rate? The purine amelioration rate?

Biologists, please feel free to get in touch to discuss. I'm interested in hearing your ideas. Reach out to me on LinkedIn, or simply leave a comment below.





reade more... Résuméabuiyad

A Very Simple Test of Chargaff's Second Rule

We know that for double-stranded DNA, the number of purines (A, G) will always equal the number of pyrimidines (T, C), because complementarity depends on A:T and G:C pairings. But do purines have to equal pyrimidines in single-stranded DNA? Chargaff's second parity rule says yes. Simple observation says no.

Suppose you have a couple thousand single-stranded DNA samples. All you have to do to see if Chargaff's second rule is correct is create a graph of A versus T, where each point represents the A and T (adenine and thymine) amounts in a particular DNA sample. If A = T (as predicted by Chargaff), the graph should look like a straight line with a slope of 1:1.

For fun, I grabbed the sequenced DNA genome of Clostridium botulinum A strain ATCC 19397 (available from the FASTA link on this page; be ready for a several-megabyte text dump), which contains coding sequences for 3552 genes of average length 442 bases each, and for each gene, I plotted the A content versus the T content.

A plot of thymine (T) versus adenine (A) content for all 3552 genes in C. botulinum coding regions. The greyed area represents areas where T/A > 1. Most genes fall in the white area where A/T > 1.

As you can see, the resulting cloud of points not only doesn't form a straight line of slope 1:1, it doesn't even cluster on the 45-degree line at all. The center of the cluster is well below the 45-degree line, and (this is the amazing part) the major axis of the cluster is almost at 90 degrees to the 45-degree line, indicating that the quantity A+T tends to be conserved.

A similar plot of G versus C (below) shows a somewhat different scatter pattern, but again notice that the centroid of the cluster is well off the 45-degree centerline. This means Chargaff's second rule doesn't hold (except for the few genes that randomly fell on the centerline).

A plot of cytosine (C) versus guanine (G) for all genes in all coding regions of C. botulinum. Again, notice that the points cluster well away from the 45-degree line (where they would have been expected to cluster, according to Chargaff).

The numbers of bases of each type in the botulinum genome are:
G: 577108
C: 358170
T: 977095
A: 1274032

Amazingly, there are 296,937 more adenines than thymines in the genome (here, I'm somewhat sloppily equating "genome" with combined coding regions). Likewise, excess guanines number 218,938. On average, each gene contains 73 excess purines (42 adenine and 31 guanine).

The above graphs are in no way unique to C. botulinum. If you do similar plots for other organisms, you'll see similar results, with excess purines being most numerous in organisms that have low G+C content. As explained in my earlier posts on this subject, the purine/pyrimidine ratio (for coding regions) tends to be high in low-GC organisms and low in high-GC organisms, a relationship that holds across all bacterial and eukaryotic domains.
reade more... Résuméabuiyad

Chargaff's Second Parity Rule is Broadly Violated

Erwin Chargaff, working with sea-urchin sperm in the 1950s, observed that within double-stranded DNA, the amount of adenine equals the amount of thymine (A = T) and guanine equals cytosine (G = C), which we now know is the basis of "complementarity" in DNA. But Chargaff later went on to observe the same thing in studies of single-stranded DNA, causing him to postulate that A = T and G = C more generally (within as well as across strands of DNA). The more general postulation is known as Chargaff's second parity rule. It says that A = T and G = C within a single strand of DNA.

The second parity rule seemed to make sense, because there was and is no a priori reason to think that DNA or RNA, whether single-stranded or double-stranded, should contain more purines than pyrimidines (nor vice versa). All other factors being equal, nature should not "favor" one class of nucleotide over another. Therefore, across evolutionary times frames, one would expect purine and pyrimidine prevalences in nucleic acids to equalize.

What we instead find, if we look at real-world DNA and RNA, is that individual strands seldom contain equal amounts of purines and pyrimidines. Szybalski was the first to note that viruses (which usually contain single-stranded nucleic acids) often contain more purines than pyrimidines. Others have since verified what Szybalski found, namely that in many organisms, DNA is purine-heavy on the "sense" strand of coding regions, such that messenger RNA ends up richer in purines than pyrimidines. This is called Szybalski's rule.

In a previous post, I presented evidence (from analysis of the sequenced genomes of 93 bacterial genera) that Szybalski's rule not only is more often true than Chargaff's second parity rule, but in fact purine-loading of coding region "message" strands occurs in direct proportion to the amount of A+T (or in inverse propoertion to the amount of G+C) in the genome. At G+C contents below about 68%, DNA becomes heavier and heavier with purines on the message strand. At G+C contents above 68%, we find organisms in which the message strand is actually pyrimidine-heavy instead of purine-heavy.

I now present evidence that purine loading of message strands in proportion to A+T content is a universal phenomenon, applying to a wide variety of eukaryotic ("higher") life forms as well as bacteria.

According to Chargaff's second parity rule, all points on this graph should fall on a horizontal line at y = 1. Instead, we see that Chargaff's rule is violated for all but a statistically insignificant subset of organisms. Pink/orange points represent eukaryotic species. Dark green data points represent bacterial genera. See text for discussion. Permission to reproduce this graph (with attribution) is granted.

To create the accompanying graph, I did frequency analysis of codons for 58 eukaryotic life forms (pink data points) and 93 prokaryotes (dark green data points) in order to derive prevalences of the four bases (A, G, C, T) in coding regions of DNA. Eukaryotes that were studied included yeast, molds, protists, warm and cold-blooded animals, flowering and non-flowering plants, alga, and insects and crustaceans. The complete list of organisms is shown in a table further below.

It can now be stated definitively that Chargaff's second parity rule is, in general, violated across all major forms of life. Not only that, it is violated in a regular fashion, such that purine loading of mRNA increases with genome A+T content. Significantly, some organisms with very low A+T content (high G+C content) actually have pyrimidine-loaded mRNA, but they are in a small minority.

Purine loading is both common and extreme. For about 20% of organisms, the purine-pyrimidine ratio is above 1.2. For some organisms, the purine excess is more than 40%, which is striking indeed.

Why should purines migrate to one strand of DNA while pyrimidines line up on the other strand? One possibility is that it minimizes spontaneous self-annealing of separated strands into secondary structures. Unrestrained "kissing" of intrastrand regions during transcription might lead to deleterious excisions, inversions, or other events. Poly-purine runs would allow the formation of many loops but few stems; in general, secondary structures would be rare.

The significance of purine loading remains to be elucidated. But in the meantime, there can be no doubt that purine enrichment of message strands is indeed widespread and strongly correlates to genome A+T content. Chargaff's second parity rule is invalid, except in a trivial minority of cases.

The prokaryotic organisms used in this study were presented in a table previously. The eukaryotic organisms are shown in the following table:

Organism Comment G+C% Purine ratio
Chlorella variabilis strain NC64A endosymbiont of Paramecium 68.76 1.1055181128896376
Chlamydomonas reinhardtii strain CC-503 cw92 mt+ unicellular alga 67.96 1.0818749999999997
Micromonas pusilla strain CCMP1545 unicellular alga 67.41 1.1873268193087356
Ectocarpus siliculosus strain Ec 32 alga 62.74 1.2090728330510347
Sporisorium reilianum SRZ2 smut fungus 62.5 0.9776547360094916
Leishmania major strain Friedlin protozoan 62.47 1.0325
Oryza sativa Japonica Group rice 54.77 1.0668412348401317
Takifugu rubripes (torafugu) fish 54.08 1.0655094027691674
Aspergillus fumigatus strain A1163 fungus 53.89 1.013091641490433
Sus scrofa (pig) pig 53.77 1.0680595779892428
Drosophila melanogaster (fruit fly)
53.69 1.0986989367655287
Brachypodium distachyon line Bd21 grass 53.32 1.0764746703677999
Selaginella moellendorffii (Spikemoss) moss 52.83 1.1014492753623195
Equus caballus (horse) horse 52.29 1.0844453711426192
Pongo abelii (Sumatran orangutan) orangutan 52 1.0929015146227405
Homo sapiens human 51.97 1.0939049081896255
Mus musculus (house mouse) strain mixed mouse 51.91 1.0827720297201582
Tuber melanosporum (Perigord truffle) strain Mel28 truffle 51.4 1.0836820083682006
Phaeodactylum tricornutum strain CCAP 1055/1 diatom 51.06 1.0418452745458253
Arthroderma benhamiae strain CBS 112371 fungus 50.99 1.0360268674944024
Ornithorhynchus anatinus (platypus) platypus 50.97 1.1121909993661525
Taeniopygia guttata (Zebra finch) bird 50.81 1.1344717182497328
Trypanosoma brucei TREU927 sleeping sickness protozoan 50.78 1.106974784013486
Danio rerio (zebrafish) strain Tuebingen fish 49.68 1.1195053003533566
Gallus gallus chicken 49.54 1.1265418970650787
Monodelphis domestica (gray short-tailed opossum) opossum 49.07 1.0768110918544194
Sorghum bicolor (sorghum) sorghum 48.93 1.046422719825232
Thalassiosira pseudonana strain CCMP1335 diatom 47.91 1.1403183213189638
Hyaloperonospora arabidopsis mildew 47.75 1.053039546400631
Daphnia pulex (common water flea) water flea 47.57 1.058036633052068
Physcomitrella patens subsp. patens moss 47.33 1.1727134477514667
Anolis carolinensis (green anole) lizard 46.72 1.113765477057538
Brassica rapa flowering plant 46.29 1.1056659411640803
Fragaria vesca (woodland strawberry) strawberry 46.02 1.1052853232259425
Amborella trichopoda flowering shrub 45.88 1.0992441209406494
Citrullus lanatus var. lanatus (watermelon) watermelon 44.5 1.0855134984692458
Capsella rubella mustard-family plant 44.37 1.1041257367387034
Arabidopsis thaliana (thale cress) cress 44.15 1.109853013573388
Lotus Japonicus lotus 44.11 1.0773228019122847
Populus trichocarpa (Populus balsamifera subsp. trichocarpa) tree 43.7 1.1097672456226706
Cucumis sativus (cucumber) cucumber 43.56 1.0823847862298719
Caenorhabditis elegans strain Bristol N2 worm 42.96 1.106320224719101
Vitis vinifera (grape) grape 42.75 1.0859833393697935
Ciona intestinalis tunicate 42.68 1.158652461848546
Solanum lycopersicum (tomato) tomato 41.7 1.1177
Theobroma cacao (chocolate) chocolate 41.31 1.1297481860862142
Medicago truncatula (barrel medic) strain A17 flowering plant 40.78 1.093754366354618
Apis mellifera (honey bee) strain DH4 honey bee 39.76 1.216042543762464
Saccharomyces cerevisiae (bakers yeast) strain S288C yeast 39.63 1.1387641650630744
Acyrthosiphon pisum (pea aphid) strain LSR1 aphid 39.35 1.1651853457619772
Debaryomyces hansenii strain CBS767 yeast 37.32  1.1477345930856775
Pediculus humanus corporis (human body louse) strain USDA louse 36.57 1.2365791828213537
Schistosoma mansoni strain Puerto Rico trematode 35.94 1.0586902800658977
Candida albicans strain WO-1 yeast 35.03 1.1490291609944834
Tetrapisispora phaffii CBS 4417 strain type CBS 4417 yeast 34.69 1.17503805175038
Paramecium tetraurelia strain d4-2 protist 30.03 1.2494922903347117
nucleomorph Guillardia theta endosymbiont 23.87 1.1529462427330803
Plasmodium falciparum 3D7 malaria parasite 23.76 1.4471365638766511
reade more... Résuméabuiyad