GeneticCode.translate(sequence, reading_frame=1, start='ignore', stop='ignore')
Translate RNA sequence into protein sequence.
State: Stable as of 0.4.0.
Parameters
sequence (RNA) – RNA sequence to translate.
reading_frame ({1, 2, 3, -1, -2, -3}) – Reading frame to use in translation. 1, 2, and 3 are forward frames and -1, -2, and -3 are reverse frames. If reverse (negative), will reverse complement the sequence before translation.
start ({'ignore', 'require', 'optional'}) –
How to handle start codons:
“ignore”: translation will start from the beginning of the reading frame, regardless of the presence of a start codon.
“require”: translation will start at the first start codon in the reading frame, ignoring all prior positions. The first amino acid in the translated sequence will always be methionine (M character), even if an alternative start codon was used in translation. This behavior most closely matches the underlying biology since fMet doesn’t have a corresponding IUPAC character. If a start codon does not exist, a ValueError is raised.
“optional”: if a start codon exists in the reading frame, matches the behavior of “require”. If a start codon does not exist, matches the behavior of “ignore”.
stop ({'ignore', 'require', 'optional'}) –
How to handle stop codons:
“ignore”: translation will ignore the presence of stop codons and translate to the end of the reading frame.
“require”: translation will terminate at the first stop codon. The stop codon will not be included in the translated sequence. If a stop codon does not exist, a ValueError is raised.
“optional”: if a stop codon exists in the reading frame, matches the behavior of “require”. If a stop codon does not exist, matches the behavior of “ignore”.
Returns
Protein – Translated sequence.
Notes
Input RNA sequence metadata are included in the translated protein sequence. Positional metadata are not included.
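As a rough illustration of how the reading_frame parameter described above selects codons, here is a minimal plain-Python sketch. It is not scikit-bio's implementation: the helper name codons_in_frame is hypothetical, the input is assumed to be an uppercase, non-degenerate RNA string, and dropping an incomplete trailing codon is an assumption of this sketch.

```python
# Illustrative sketch only; codons_in_frame is a hypothetical helper,
# not part of scikit-bio.
COMPLEMENT = str.maketrans('ACGU', 'UGCA')


def codons_in_frame(seq, reading_frame=1):
    """Split an RNA string into codons for the given reading frame."""
    if reading_frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError('reading_frame must be one of 1, 2, 3, -1, -2, -3')
    if reading_frame < 0:
        # Reverse frames: reverse complement the sequence first.
        seq = seq.translate(COMPLEMENT)[::-1]
    offset = abs(reading_frame) - 1
    # Assumption of this sketch: an incomplete trailing codon is dropped.
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


rna = 'AGUAUUCUGCCACUGUAAGAA'
print(codons_in_frame(rna, 1))   # ['AGU', 'AUU', 'CUG', 'CCA', 'CUG', 'UAA', 'GAA']
print(codons_in_frame(rna, 2))   # ['GUA', 'UUC', 'UGC', 'CAC', 'UGU', 'AAG']
print(codons_in_frame(rna, -1))  # ['UUC', 'UUA', 'CAG', 'UGG', 'CAG', 'AAU', 'ACU']
```

Frames 2 and 3 skip one or two leading positions, and the negative frames apply the same offsets to the reverse complement.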
Examples
Translate RNA into protein using NCBI’s standard genetic code (table ID 1, the default genetic code in scikit-bio):
>>> from skbio import RNA, GeneticCode
>>> rna = RNA('AGUAUUCUGCCACUGUAAGAA')
>>> sgc = GeneticCode.from_ncbi()
>>> sgc.translate(rna)
Protein
--------------------------
Stats:
    length: 7
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: True
--------------------------
0 SILPL*E
In this command, we used the default start behavior, which starts translation at the beginning of the reading frame, regardless of the presence of a start codon. If we specify “require”, translation will start at the first start codon in the reading frame (in this example, CUG), ignoring all prior positions:
>>> sgc.translate(rna, start='require')
Protein
--------------------------
Stats:
    length: 5
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: True
--------------------------
0 MPL*E
Note that the codon coding for L (CUG) is an alternative start codon in this genetic code. Since we specified “require” mode, methionine (M) was used in place of the alternative start codon (L). This behavior most closely matches the underlying biology since fMet doesn’t have a corresponding IUPAC character.
Translate the same RNA sequence, also specifying that translation terminate at the first stop codon in the reading frame:
>>> sgc.translate(rna, start='require', stop='require')
Protein
--------------------------
Stats:
    length: 3
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: False
--------------------------
0 MPL
Passing “require” to both start and stop trims the translation to the coding sequence (CDS), and in fact requires that one is present in the reading frame. Changing the reading frame to 2 causes an exception to be raised because a start codon doesn’t exist in the reading frame:
>>> sgc.translate(rna, start='require', stop='require',
...               reading_frame=2)
Traceback (most recent call last):
...
ValueError: ...
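The start and stop modes shown in these examples can be mimicked in plain Python to make the trimming logic explicit. This is a sketch under stated assumptions, not scikit-bio's implementation: translate_sketch is a hypothetical function, the codon table is deliberately partial (only the codons that occur in this example sequence), only forward reading frames are handled, and the start and stop codon sets are those of NCBI table 1.

```python
# Illustrative sketch only; translate_sketch is hypothetical and does not
# call scikit-bio. The codon table covers only the codons in this example.
CODON_TABLE = {
    'AGU': 'S', 'AUU': 'I', 'CUG': 'L', 'CCA': 'P', 'UAA': '*', 'GAA': 'E',
    'GUA': 'V', 'UUC': 'F', 'UGC': 'C', 'CAC': 'H', 'UGU': 'C', 'AAG': 'K',
}
START_CODONS = {'UUG', 'CUG', 'AUG'}  # start codons in NCBI table 1
STOP_CODONS = {'UAA', 'UAG', 'UGA'}   # stop codons in NCBI table 1


def translate_sketch(seq, reading_frame=1, start='ignore', stop='ignore'):
    if reading_frame not in (1, 2, 3):
        raise ValueError('this sketch handles forward frames only')
    offset = reading_frame - 1
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

    began_at_start_codon = False
    if start in ('require', 'optional'):
        idx = next((i for i, c in enumerate(codons) if c in START_CODONS), None)
        if idx is not None:
            codons = codons[idx:]          # ignore all prior positions
            began_at_start_codon = True
        elif start == 'require':
            raise ValueError('no start codon in reading frame')

    if stop in ('require', 'optional'):
        idx = next((i for i, c in enumerate(codons) if c in STOP_CODONS), None)
        if idx is not None:
            codons = codons[:idx]          # stop codon itself is excluded
        elif stop == 'require':
            raise ValueError('no stop codon in reading frame')

    protein = ''.join(CODON_TABLE[c] for c in codons)
    if began_at_start_codon:
        # The first residue is always M, even for an alternative start codon.
        protein = 'M' + protein[1:]
    return protein


rna = 'AGUAUUCUGCCACUGUAAGAA'
print(translate_sketch(rna))                                   # SILPL*E
print(translate_sketch(rna, start='require'))                  # MPL*E
print(translate_sketch(rna, start='require', stop='require'))  # MPL
print(translate_sketch(rna, reading_frame=2,
                       start='optional', stop='optional'))     # VFCHCK
```

In reading frame 2 there is neither a start nor a stop codon, so “optional” falls back to the “ignore” behavior, whereas “require” raises a ValueError as in the example above.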