Showing posts with label antisense translation.

An Example of Antisense Proteogenesis?

The question of how organisms develop entirely new genes is one of the most important open questions in biology. One possibility is that new genes often develop through accidental translation of antisense strands of DNA.

An example of this can be seen with the S1 protein of the 30S bacterial ribosome. If you take the amino-acid sequence for an S1 gene and use it as the query sequence in a blast-p (protein blast), you'll mostly get back hits on other S1 proteins, but you'll also get minor (low-fidelity) hits on polynucleotide phosphorylase. Why? When you do a blast search, the search engine, by default, looks at both DNA strands of target genes (sense and antisense strands) to see if there's a potential sequence match with the query. If there's a match on the antisense strand, it will be reported along with "sense" matches. In the case of the S1 protein, blast-p searches often report weak antisense hits on polynucleotide phosphorylase in addition to strong sense hits on ribosomal S1.

Ribosomal proteins are, of course, among the most highly conserved proteins in nature. It turns out that polynucleotide phosphorylase (PNPase) is very highly conserved as well. It's an enzyme that occurs in every life form (bacteria, fungi, plants, animals), absent only in a scant handful of microbial endosymbionts that have lost the majority of their genes through deletions. While the chemical function of PNPase is well understood (it catalyzes the interconversion of nucleoside diphosphates to RNA), its physiologic purpose is not well understood, although recent research shows that PNPase-knockout mutants of E. coli exhibit lower mutation rates. (Hence, PNPase may actually be involved in generating mutations.)

The bacterium Rothia mucilaginosa, strain DY18, has a (putative) PNPase gene at a genome offset of 1277514. When this gene is used as the query for a blast-p search, the hits that come back include many strong matches for the S1 ribosomal proteins of various organisms. By "strong match," I mean better than 80% sequence identity coupled with an E-value (expectation value) of zero. (Recall that the E-value is the number of matches at least this good that would be expected to turn up by random chance in a database of the same size, so a value near zero means the match is very unlikely to be a coincidence.)

If we use the Genome Viewer at genomevolution.org to look at the PNPase gene of Rothia mucilaginosa, we see something extraordinarily peculiar.

[Figure: Overlapping sense and antisense open reading frames on a portion of DNA from Rothia mucilaginosa. The top reading frame contains the gene for polynucleotide phosphorylase. The lower (-1 strand) reading frame contains ribosomal S1.]

Notice that there are overlapping genes. On the top strand is the gene for PNPase; on the bottom strand, in the same location, is a gene for ribosomal S1. These are bidirectionally overlapping open reading frames, something occasionally encountered in virus nucleic acids but rarely seen in bacterial or other genomes.

How do we explain this anomaly? It could be just that: an anomaly, two open reading frames that happen to overlap (but that aren't necessarily translated in vivo). Or it could be that at some point, many millions of years ago, the ribosomal S1 gene of a Rothia ancestor was erroneously translated via the antisense strand, producing a protein with PNPase characteristics. We don't know why PNPase confers survival value (its physiologic purpose is not fully understood), but we do know, with a fair degree of certainty, that PNPase does, in fact, confer survival value—because every organism, at every level of the tree of life, has at least one copy of PNPase. Once Rothia's ancestor, through whatever process, opened up a reading frame on the antisense strand of ribosomal S1, the reading frame stayed open, because it conferred survival value. In this way, the first Rothia PNPase was born. (Arguably.)

At some point in its history, Rothia duplicated its PNPase gene and placed a new copy at genome offset 1650959. Over time, this second copy diverged from the original copy, becoming more like E. coli PNPase (which is also to say, less S1-like). Rothia's second PNPase shows a blast-p similarity of 45% (in terms of AA identities) to E. coli PNPase, with E-value 4.0e-147. It shows a blast-p similarity of 26% (AA identities) with E. coli ribosomal S1 (E-value: 4.0e-17). Neither E. coli PNPase nor Rothia PNPase-2 overlaps an S1 gene. However, both are colocated with the ribosomal S15 protein gene. And you'll find (if you look at lots of bacterial genomes) that PNPase is almost always located immediately next to an S15 ribosomal gene.

Rothia PNPase is an example of an enzyme that may very well have started out as an antisense copy of another protein (the S1 ribosomal protein). Of course, the mere presence of bidirectionally overlapping open reading frames doesn't prove that both frames are actually transcribed and translated in vivo. But the fact that blast-p searches using PNPase as the query almost always turn up faint S1 echoes (in a wide variety of organisms) is highly suggestive of an ancestral relationship between the two proteins.


Evolution and Antisense Translation of DNA

Yesterday I offered a theory for new gene creation which might be called the Erroneous Translation Theory. Basically, I proposed that new proteins arise through frameshifted and/or reversed translation of nucleic acids (translation of antisense strands of DNA).

Erroneous translation of DNA offers interesting possibilities for gain of function. (Recall that most point mutations result in loss of function, and one of the major criticisms of Darwinian theory is that evolution based on accumulation of point mutations cannot account for gain-of-function events.) Wholesale mistranslation via frameshift errors and/or wrong-strand transcription allows for the sudden emergence of entirely new classes of proteins. The unit of change is no longer the single base-pair polymorphism but the functional domain or motif.

An important aspect of antisense-strand translation has to do with stop codons. In DNA, the sequences TCA, TTA, and CTA specify amino acids serine, leucine, and leucine, respectively. But when these three codons are complemented, then read in 5'-to-3' direction—in other words, when they're antisense-translated—they form the stop codons TGA, TAA, and TAG, which tell the cell's protein-making machinery to terminate the production of the current polypeptide. Thus, if a typical gene containing codons TCA, TTA, and CTA is translated "backwards," translation will end prematurely: It will end as soon as a stop codon is encountered.
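The codon arithmetic above is easy to check in a few lines of Python. This is only a minimal sketch; the helper name `reverse_complement` is my own and not taken from any particular library. It complements each base and then reverses the result, so the output reads 5'-to-3' on the opposite strand.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The three stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand of `dna`, read in the 5'-to-3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


# The serine and leucine codons discussed in the text.
for codon in ("TCA", "TTA", "CTA"):
    antisense = reverse_complement(codon)
    label = "stop" if antisense in STOP_CODONS else "not a stop"
    print(f"{codon} -> {antisense} ({label})")
```

Each of the three sense codons comes back as one of the three stop codons, which is why a gene rich in TCA, TTA, and CTA is hard to read on the opposite strand.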

How important a consideration is this in the real world? Consider the following DNA sequence, which represents the gene for the cytidine deaminase enzyme of Clostridium botulinum:

>Clostridium botulinum A strain ATCC 19397(v1, unmasked), Name: ABS32549.1, CLB_0040, Type: CDS, Feature Location: (Chr: 1, 37028..37465) Genomic Location: 37028-37465
ATGAATGATTATATAGAATATGCAATAATTGAAGCAAAAAAAGCATTAGCAATAGGAGAAGTACCTGTTGGAGCTATTATAGTTAAAGAAAATAAAATTATAGCAAAAAGTCATAATTTAAAAGAGTCATTGAAGGATCCAACAGCTCATGCAGAGATATTAGCTATAAAAGAAGCTTGCAATACAATACATAATTGGAGATTAAAAGGATGTAAGATGTATGTAACATTAGAACCATGTGCTATGTGTGCTAGTGCAATAATTCAATCTAGAATAAGTGAATTGCATATAGGAACCTTTGATCCAGTGGGAGGGGCTTGTGGATCAGTAGTAAATATAACAAATAATAGTTATTTAAAAAATAATTTAAATATTAAATGGTTATATGATGATGAATGTAGTAGAATAATAACAAATTTTTTTAAAAATATTAGATAA

The above sequence is the "sense" strand of the DNA, in 5'-to-3' direction. The sequence below is the corresponding 3'-to-5' complementary sequence (in other words, what's on the antisense strand of DNA):

TACTTACTAATATATCTTATACGTTATTAACTTCGTTTTTTTCGTAATCGTTATCCTCTTCATGGACAACCTCGATAATATCAATTTCTTTTATTTTAATATCGTTTTTCAGTATTAAATTTTCTCAGTAACTTCCTAGGTTGTCGAGTACGTCTCTATAATCGATATTTTCTTCGAACGTTATGTTATGTATTAACCTCTAATTTTCCTACATTCTACATACATTGTAATCTTGGTACACGATACACACGATCACGTTATTAAGTTAGATCTTATTCACTTAACGTATATCCTTGGAAACTAGGTCACCCTCCCCGAACACCTAGTCATCATTTATATTGTTTATTATCAATAAATTTTTTATTAAATTTATAATTTACCAATATACTACTACTTACATCATCTTATTATTGTTTAAAAAAATTTTTATAATCTATT

When the antisense sequence is translated in the normal 5'-to-3' direction, the following amino acid sequence results:

LSNIFKKICYYSTTFIII*PFNI*IIF*ITIICYIYY*STSPSHWIKGSYMQFTYSRLNYCTSTHSTWF*CYIHLTSF*SPIMYCIASFFYS*YLCMSCWILQ*LF*IMTFCYNFIFFNYNSSNRYFSYC*CFFCFNYCIFYIIIH

This translation of 146 codons (shown here using standard one-letter amino-acid abbreviations) contains 10 stop codons (depicted as asterisks). Any attempt to translate the antisense strand of the C. botulinum cytidine deaminase gene will result in (at best) a series of short oligopeptides.
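For readers who want to repeat the experiment, here is a rough sketch of antisense translation in Python. It assumes the standard genetic code and input containing only A, C, G, and T. The function names are mine, for illustration only. The test fragment is the last 57 bases of the C. botulinum sense strand shown above, written with spaces between codons for readability.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand of `dna`, read 5'-to-3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate in frame from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def antisense_translation(sense: str) -> str:
    """Translate the strand opposite to `sense`, without stopping at stops."""
    return translate(reverse_complement(sense))


# Last 19 codons of the C. botulinum cytidine deaminase gene (sense strand).
TAIL = (
    "TTA TAT GAT GAT GAA TGT AGT AGA ATA ATA ACA "
    "AAT TTT TTT AAA AAT ATT AGA TAA"
).replace(" ", "")

peptide = antisense_translation(TAIL)
print(peptide)
print("stop codons:", peptide.count("*"))
```

Passing the full sense-strand sequence to `antisense_translation` and counting the asterisks is all it takes to repeat the tally for any gene.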

It's tempting to conclude that this is nature's ingenious way of preventing the occurrence of nonsense proteins. Translate the wrong strand of DNA by mistake, and translation quickly terminates. (In the above example, a stop codon occurs every 14 amino acids, on average.) But before you jump to that conclusion, consider the cytidine deaminase gene of Anaeromyxobacter dehalogenans strain 2CP-C:

GTGGACGAGCGCGAGGCGATGCAGGAGGCGCTGGGGCTGGCGCGCGAGGCGGCGGCCCGCGGCGAGGTGCCGGTCGGCGCGGTGGCGCTGTTCGAGGGCCGCGTGGTCGGCCGCGGCGCGAACGCCCGCGAGGCGGCGCGCGATCCCACCGCGCACGCGGAGCTCCTCGCGATCCAGGAGGCGGCGCGCACCCTCGGGCGCTGGCGCCTCACCGGCGTCACGCTGGTGGTGACGCTCGAGCCCTGCGCCATGTGCGCCGGCGCCATGGTGCTCGCCCGCATCGACCGGCTCGTCTACGGGGCGAGCGATCCCAAGGCCGGCTGCACCGGCTCCCTCCAGGACCTGTCGGCGGACCCCCGGCTGAACCACCGGTTCCCGGTGGAGCGCGGCCTGCTGGCCGAGGAGTCCGGCGAGCTCCTCCGGGCCTTCTTCCGGGCCCGCCGGGGCGCCGGGAACGGAAACGGCAACGGCGGCGAGGGTTAG

The translation of the antisense version of this gene is:

LTLAAVAVSVPGAPAGPEEGPEELAGLLGQQAALHREPVVQPGVRRQVLEGAGAAGLGIARPVDEPVDAGEHHGAGAHGAGLERHHQRDAGEAPAPEGARRLLDREELRVRGGIARRLAGVRAAADHAALEQRHRADRHLAAGRRLARQPQRLLHRLALVH

Which contains no stop codons! Why does one version of the gene give ten stop codons when anti-translated, whereas the other version gives zero stop codons? Clostridium botulinum has a genome G+C content of 28% whereas the DNA of Anaeromyxobacter dehalogenans has a G+C content of 74%. The two organisms favor entirely different codons. Anaeromyxobacter uses codons TCA, TTA, and CTA only 0.03%, 0%, and 0.02% of the time, respectively. Clostridium uses the same codons 1.72%, 5.62%, and 4.67% of the time. Taken together, that is about 12% of all codons in Clostridium versus 0.05% in Anaeromyxobacter, a difference of more than 200-fold.
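If you want to measure these quantities for a gene of your own, something like the following sketch would do. Note the assumptions: it works on a single coding sequence read in frame from its first base, whereas genome-wide figures like those quoted above would need every gene in the genome. The function names and the four-codon toy sequence are hypothetical.

```python
from collections import Counter


def gc_content(dna: str) -> float:
    """Fraction of bases in `dna` that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def codon_fraction(dna: str, targets=("TCA", "TTA", "CTA")) -> dict:
    """Fraction of in-frame codons equal to each codon in `targets`."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    return {codon: counts[codon] / len(codons) for codon in targets}


# Hypothetical toy sequence with four codons: ATG TCA TTA GGC
toy_gene = "ATGTCATTAGGC"
print(f"G+C content: {gc_content(toy_gene):.1%}")
print(codon_fraction(toy_gene))
```

The three target codons are the ones that become stop codons on the opposite strand, so their combined fraction is a direct estimate of how often an antisense reading of the gene will terminate.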

Bottom line: Almost any gene in Anaeromyxobacter (or any high-GC organism, it turns out) can be antisense-translated without generating stop codons. As a rule, the higher the G+C content of a gene, the fewer stop codons show up when its antisense strand is translated.
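A back-of-the-envelope calculation shows why G+C content matters so much here. The sketch below uses a deliberately crude null model that is my own assumption, not something measured from real genomes: bases are drawn independently, with G and C equally likely and A and T equally likely. Since all three stop codons (TAA, TAG, TGA) are AT-rich, the chance of hitting one falls quickly as G+C rises.

```python
def stop_probability(gc: float) -> float:
    """Chance that a random codon is TAA, TAG, or TGA at G+C fraction `gc`.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2. Real coding sequences are not random.
    """
    p_at = (1.0 - gc) / 2.0  # probability of A, and of T
    p_gc = gc / 2.0          # probability of G, and of C
    p_taa = p_at ** 3
    p_tag = p_at ** 2 * p_gc
    p_tga = p_at ** 2 * p_gc
    return p_taa + p_tag + p_tga


for gc in (0.28, 0.50, 0.74):
    p = stop_probability(gc)
    print(f"G+C = {gc:.0%}: P(stop) = {p:.4f}, about one stop per {1 / p:.0f} codons")
```

Real genes depart from this model because codon choice is not random, so treat the numbers as a rough baseline for comparison and not as a prediction for any particular gene.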

If it's true that antisense-strand translation is (or has been) an important source of new proteins in nature, the foregoing observation is tremendously relevant, because it means successful reverse translation has likely occurred far more often in high-GC organisms than in low-GC organisms. It suggests that bacteria with high G+C content in their genomes may, in fact, have been the incubators of early proteins. It implies a "GC Eden" scenario in which early life forms had predominantly high-GC genomes. Low-GC organisms then arose through continuous "AT pressure," from large numbers of accumulated GC-to-AT transition mutations. (We know that GC-to-AT transition mutations occur at a much higher rate than AT-to-GC transitions; this fact is not in dispute.)

Even so, we have to ask: What is the evidence for reverse (antisense-strand) translation having occurred in nature? Is there any such evidence?

More on this subject tomorrow.