skbio.sequence.RNA(sequence, metadata=None, positional_metadata=None, interval_metadata=None, lowercase=False, validate=True)
Store RNA sequence data and optional associated metadata.
Only characters in the IUPAC RNA character set [1] are supported.
sequence (str, Sequence, or 1D np.ndarray (np.uint8 or '|S1')) – Characters representing the RNA sequence itself.
metadata (dict, optional) – Arbitrary metadata which applies to the entire sequence.
positional_metadata (Pandas DataFrame consumable, optional) – Arbitrary per-character metadata. For example, quality data from sequencing reads. Must be able to be passed directly to the Pandas DataFrame constructor.
interval_metadata (IntervalMetadata) – Arbitrary metadata which applies to intervals within a sequence to store interval features (such as exons or introns on the sequence).
lowercase (bool or str, optional) – If True, lowercase sequence characters will be converted to uppercase characters in order to be valid IUPAC RNA characters. If False, no characters will be converted. If a str, it will be treated as a key into the positional metadata of the object. All lowercase characters will be converted to uppercase, and a True value will be stored in a boolean array in the positional metadata under the key.
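The effect of passing a str for lowercase can be pictured with a plain-Python sketch. This is not scikit-bio's implementation: the helper name uppercase_with_mask is hypothetical, and a plain list stands in for the boolean array that would be stored in the positional metadata.

```python
def uppercase_with_mask(sequence):
    """Return the uppercased sequence and a per-position lowercase mask.

    Hypothetical helper for illustration only; not part of scikit-bio.
    """
    # True wherever the original character was lowercase.
    was_lowercase = [char.islower() for char in sequence]
    return sequence.upper(), was_lowercase


upper, mask = uppercase_with_mask("AcCGaaU")
print(upper)  # ACCGAAU
print(mask)   # [False, True, False, False, True, True, False]
```

The mask records which positions were converted, so the original casing can be recovered later if needed.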
validate (bool, optional) – If True, validation will be performed to ensure that all sequence characters are in the IUPAC RNA character set. If False, validation will not be performed. Turning off validation will improve runtime performance. If invalid characters are present, however, there is no guarantee that operations performed on the resulting object will work or behave as expected. Only turn off validation if you are certain that the sequence characters are valid. To store sequence data that is not IUPAC-compliant, use Sequence.
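As a rough picture of what validation checks, the sketch below tests characters against the IUPAC RNA character set in plain Python. It assumes the set consists of the four definite bases, the eleven IUPAC degenerate codes, and the gap characters '-' and '.'; the helper find_invalid_chars is hypothetical and is not how scikit-bio performs validation internally.

```python
# Assumed IUPAC RNA character set: definite bases, degenerate codes, gaps.
DEFINITE_CHARS = set("ACGU")
DEGENERATE_CHARS = set("RYSWKMBDHVN")
GAP_CHARS = set("-.")
IUPAC_RNA_CHARS = DEFINITE_CHARS | DEGENERATE_CHARS | GAP_CHARS


def find_invalid_chars(sequence):
    """Return the sorted characters of `sequence` outside the RNA set.

    Hypothetical helper for illustration only; not part of scikit-bio.
    """
    return sorted(set(sequence) - IUPAC_RNA_CHARS)


print(find_invalid_chars("ACCGAAU"))  # []
print(find_invalid_chars("ACGTx"))    # ['T', 'x']
```

A check like this is cheap to run on input of uncertain origin before deciding to construct a sequence with validate=False.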
See also
Notes
Subclassing is disabled for RNA, because subclassing makes it possible to change the alphabet, and certain methods rely on the IUPAC alphabet. If a custom sequence alphabet is needed, inherit directly from GrammaredSequence.
References
[1] A. Cornish-Bowden. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. May 10, 1985; 13(9): 3021-3030.
Examples
>>> from skbio import RNA
>>> RNA('ACCGAAU')
RNA
--------------------------
Stats:
    length: 7
    has gaps: False
    has degenerates: False
    has definites: True
    GC-content: 42.86%
--------------------------
0 ACCGAAU
Convert lowercase characters to uppercase:
>>> RNA('AcCGaaU', lowercase=True)
RNA
--------------------------
Stats:
    length: 7
    has gaps: False
    has degenerates: False
    has definites: True
    GC-content: 42.86%
--------------------------
0 ACCGAAU
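The GC-content figure in the output above can be checked by hand: 'ACCGAAU' contains three G or C characters out of seven, and 3/7 is about 42.86%. The sketch below reproduces that arithmetic in plain Python. The function gc_fraction is hypothetical, counts only the definite characters G and C, and assumes a non-empty sequence.

```python
def gc_fraction(sequence):
    """Return the fraction of characters that are G or C.

    Hypothetical helper for illustration only; assumes a non-empty
    sequence and ignores degenerate codes.
    """
    gc_count = sum(1 for char in sequence if char in "GC")
    return gc_count / len(sequence)


print(f"{gc_fraction('ACCGAAU'):.2%}")  # 42.86%
```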
Attributes
Return valid characters.
Return mapping of nucleotide characters to their complements.
Gap character to use when constructing a new gapped sequence.
Return definite characters.
Return degenerate characters.
Return mapping of degenerate to definite characters.
Return characters defined as gaps.
Return non-degenerate characters.
Set of observed characters in the sequence.
Array containing underlying sequence characters.
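For orientation, the two mappings described above can be written out for the IUPAC RNA alphabet as plain dictionaries. This is a sketch based on the IUPAC nomenclature [1]; the names COMPLEMENT_MAP and DEGENERATE_MAP are chosen here for illustration, and the exact types returned by the class attributes may differ.

```python
# Each nucleotide character mapped to its complement (gaps map to themselves).
COMPLEMENT_MAP = {
    "A": "U", "U": "A", "G": "C", "C": "G",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
    "-": "-", ".": ".",
}

# Each degenerate code mapped to the definite characters it can stand for.
DEGENERATE_MAP = {
    "R": set("AG"), "Y": set("CU"), "S": set("GC"), "W": set("AU"),
    "K": set("GU"), "M": set("AC"), "B": set("CGU"), "D": set("AGU"),
    "H": set("ACU"), "V": set("ACG"), "N": set("ACGU"),
}

# Consistency check: complementing every base a code stands for gives
# exactly the bases that the complement code stands for.
for code, bases in DEGENERATE_MAP.items():
    complemented = {COMPLEMENT_MAP[base] for base in bases}
    assert complemented == DEGENERATE_MAP[COMPLEMENT_MAP[code]]

print(COMPLEMENT_MAP["R"], sorted(DEGENERATE_MAP["R"]))  # Y ['A', 'G']
```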
Built-ins
Returns truth value (truthiness) of sequence.
Determine if a subsequence is contained in this sequence.
Return a shallow copy of this sequence.
Return a deep copy of this sequence.
Determine if this sequence is equal to another.
Slice this sequence.
Iterate over positions in this sequence.
Return the number of characters in this sequence.
Determine if this sequence is not equal to another.
Iterate over positions in this sequence in reverse order.
Return sequence characters as a string.
Methods
Return the complement of the nucleotide sequence.
Concatenate an iterable of
Count occurrences of a subsequence in this sequence.
Find positions containing definite characters in the sequence.
Return a new sequence with gap characters removed.
Find positions containing degenerate characters in the sequence.
Compute the distance to another sequence.
Yield all possible definite versions of the sequence.
Search the biological sequence for motifs.
Generate slices for patterns matched by a regular expression.
Compute frequencies of characters in the sequence.
Find positions containing gaps in the biological sequence.
Calculate the relative frequency of G’s and C’s in the sequence.
Calculate frequency of G’s and C’s in the sequence.
Determine if sequence contains one or more definite characters.
Determine if sequence contains one or more degenerate characters.
Determine if the sequence contains one or more gap characters.
Determine if the object has interval metadata.
Determine if the object has metadata.
Determine if sequence contains one or more non-degenerate characters.
Determine if the object has positional metadata.
Find position where subsequence first occurs in the sequence.
Determine if a sequence is the reverse complement of this sequence.
Yield contiguous subsequences based on included.
Generate kmers of length k from this sequence.
Return counts of words of length k from this sequence.
Return a case-sensitive string representation of the sequence.
Return count of positions that are the same between two sequences.
Find positions that match with another sequence.
Return count of positions that differ between two sequences.
Find positions that do not match with another sequence.
Find positions containing non-degenerate characters in the sequence.
Create a new
Replace values in this sequence with a different character.
Return the reverse complement of the nucleotide sequence.
Reverse transcribe RNA into DNA.
Return regular expression object that accounts for degenerate chars.
Translate RNA sequence into protein sequence.
Translate RNA into protein using six possible reading frames.
Write an instance of
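Several of the operations summarized above (reverse complement, reverse transcription, expansion of degenerate characters, and k-mer counting) can be sketched on plain strings. The functions below are hypothetical stand-ins written for illustration. They are not scikit-bio's implementation, they operate on str rather than RNA objects, and their signatures and output ordering are assumptions.

```python
from collections import Counter
from itertools import product

# IUPAC RNA complements; gap characters are left unchanged by translate().
_COMPLEMENT = str.maketrans("ACGURYSWKMBDHVN", "UGCAYRSWMKVHDBN")

# Degenerate code -> definite characters it can stand for.
_DEGENERATE = {
    "R": "AG", "Y": "CU", "S": "GC", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def reverse_complement(seq):
    """Complement every character, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_transcribe(seq):
    """Convert RNA characters to DNA by replacing U with T."""
    return seq.replace("U", "T")


def expand_degenerates(seq):
    """List every definite sequence the degenerate sequence can represent."""
    choices = [sorted(_DEGENERATE.get(char, char)) for char in seq]
    return ["".join(chars) for chars in product(*choices)]


def kmer_counts(seq, k):
    """Count overlapping words of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(reverse_complement("ACCGAAU"))  # AUUCGGU
print(reverse_transcribe("ACCGAAU"))  # ACCGAAT
print(expand_degenerates("ARU"))      # ['AAU', 'AGU']
print(kmer_counts("AUAUA", 2))        # Counter({'AU': 2, 'UA': 2})
```

Note that the number of expansions grows multiplicatively with each degenerate position (an N alone contributes a factor of four), so expanding long, highly degenerate sequences can become expensive.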