skbio.sequence.GeneticCode(amino_acids, starts, name='')
Genetic code for translating codons to amino acids.
Parameters
amino_acids (consumable by skbio.Protein constructor) – 64-character vector containing IUPAC amino acid characters. The order of the amino acids should correspond to NCBI’s codon order (see the Notes section below). amino_acids is the “AAs” field in NCBI’s genetic code format [1].
starts (consumable by skbio.Protein constructor) – 64-character vector containing only M and - characters, with start codons indicated by M. The order of the characters should correspond to NCBI’s codon order (see the Notes section below). starts is the “Starts” field in NCBI’s genetic code format [1].
name (str, optional) – Genetic code name. This is metadata only and does not affect how the genetic code behaves.
Notes
The genetic codes available via GeneticCode.from_ncbi and used throughout the examples are defined in [1]. The genetic code strings defined there are directly compatible with the GeneticCode constructor.
The order of amino_acids and starts should correspond to NCBI’s codon order, defined in [1]:
UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Note that scikit-bio displays this ordering using the IUPAC RNA alphabet, while NCBI displays the same ordering using the IUPAC DNA alphabet (for historical reasons).
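As an illustration of the ordering described above, the following minimal sketch rebuilds the 64 codons with the standard library only. It assumes the base order U, C, A, G with the first base varying slowest; codon_index is a hypothetical helper written for this example and is not part of scikit-bio.

```python
from itertools import product

BASES = "UCAG"  # NCBI base order, written with the RNA alphabet

# All 64 codons in NCBI order: first base varies slowest, third base fastest.
codons = ["".join(c) for c in product(BASES, repeat=3)]

# Reading each codon position across all codons gives the three rows above.
base1 = "".join(c[0] for c in codons)
base2 = "".join(c[1] for c in codons)
base3 = "".join(c[2] for c in codons)


def codon_index(codon):
    """Return the position of `codon` in NCBI order (hypothetical helper)."""
    i, j, k = (BASES.index(b) for b in codon)
    return 16 * i + 4 * j + k


print(codon_index("AUG"))  # → 35
```

The character at position codon_index(codon) in amino_acids or starts is the entry for that codon.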
References
Examples
Get NCBI’s standard genetic code (table ID 1, the default genetic code in scikit-bio):
>>> from skbio import GeneticCode
>>> GeneticCode.from_ncbi()
GeneticCode (Standard)
-------------------------------------------------------------------------
AAs = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1 = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3 = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
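To read the output above column by column, the sketch below pairs each codon with its entry in the AAs and Starts rows. The two strings are copied from the standard code shown above; the variable names are chosen for this example only, and nothing here calls scikit-bio.

```python
from itertools import product

# Copied from the standard genetic code (table ID 1) printed above.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STARTS = "---M---------------M---------------M----------------------------"

# Codons in NCBI order (first base varies slowest).
codons = ["".join(c) for c in product("UCAG", repeat=3)]

# Each column of the printed table is one codon.
codon_to_aa = dict(zip(codons, AAS))
start_codons = [c for c, s in zip(codons, STARTS) if s == "M"]
stop_codons = [c for c, aa in codon_to_aa.items() if aa == "*"]

print(start_codons)  # → ['UUG', 'CUG', 'AUG']
print(stop_codons)   # → ['UAA', 'UAG', 'UGA']
```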
Get a different NCBI genetic code (table ID 25):
>>> GeneticCode.from_ncbi(25)
GeneticCode (Candidate Division SR1 and Gracilibacteria)
-------------------------------------------------------------------------
AAs = FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M-------------------------------M---------------M------------
Base1 = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3 = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
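To see where two genetic codes disagree, their AAs strings can be compared position by position. The sketch below does this for the two AAs rows printed above (table IDs 1 and 25), using only the standard library; the variable names are illustrative.

```python
from itertools import product

# AAs rows copied from the two outputs above.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE_25_AAS = "FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Codons in NCBI order (first base varies slowest).
codons = ["".join(c) for c in product("UCAG", repeat=3)]

# Map each codon whose amino acid differs to (standard, table 25).
differences = {
    codon: (std, alt)
    for codon, std, alt in zip(codons, STANDARD_AAS, TABLE_25_AAS)
    if std != alt
}

print(differences)  # → {'UGA': ('*', 'G')}
```

In these two tables the only amino acid difference is UGA, which is a stop codon in the standard code and glycine in table 25. The Starts rows also differ and could be compared the same way.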
Define a custom genetic code:
>>> GeneticCode('M' * 64, '-' * 64)
GeneticCode
-------------------------------------------------------------------------
AAs = MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
Starts = ----------------------------------------------------------------
Base1 = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3 = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Translate an RNA sequence to protein using NCBI’s standard genetic code:
>>> from skbio import RNA
>>> rna = RNA('AUGCCACUUUAA')
>>> GeneticCode.from_ncbi().translate(rna)
Protein
--------------------------
Stats:
length: 4
has gaps: False
has degenerates: False
has definites: True
has stops: True
--------------------------
0 MPL*
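The result above can be checked by hand with a codon lookup table. The following is a simplified sketch of that idea, not scikit-bio's implementation: translate_sketch is a hypothetical function that reads from the first position only, does not handle gaps or degenerate characters, and (as an assumption of this sketch) drops any trailing partial codon.

```python
from itertools import product

# AAs row of the standard genetic code (table ID 1), copied from above.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each codon, in NCBI order, to its amino acid character.
codon_to_aa = dict(zip(("".join(c) for c in product("UCAG", repeat=3)), AAS))


def translate_sketch(rna):
    """Translate an RNA string codon by codon (hypothetical helper)."""
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    return "".join(codon_to_aa[rna[i:i + 3]] for i in range(0, usable, 3))


protein = translate_sketch("AUGCCACUUUAA")
print(protein)  # → MPL*
```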
Attributes
name | Genetic code name.
reading_frames | Six possible reading frames.
Built-ins
__eq__ | Determine if the genetic code is equal to another.
__ne__ | Determine if the genetic code is not equal to another.
__str__ | Return string representation of the genetic code.
Methods
from_ncbi | Return NCBI genetic code specified by table ID.
translate | Translate RNA sequence into protein sequence.
translate_six_frames | Translate RNA into protein using six possible reading frames.