RNA.iter_contiguous(included, min_length=1, invert=False)
Yield contiguous subsequences based on included.
State: Stable as of 0.4.0.
Parameters
included (1D array_like (bool) or iterable (slices or ints)) – included is converted to a flat boolean vector in which each position is either included or skipped. Each run of contiguous included positions is yielded as a single region.
min_length (int, optional) – The minimum length a subsequence must have to be yielded. Default is 1.
invert (bool, optional) – Whether to invert included so that it describes what should be skipped instead of what should be included. Default is False.
Yields
Sequence – Contiguous subsequence as indicated by included.
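The behaviour described by these parameters can be sketched in plain Python. This is an illustrative reimplementation on ordinary strings, not scikit-bio's code: the helper names to_mask and contiguous_runs are hypothetical, and treating a full-length list of bools as a ready-made mask is an assumption made for this sketch.

```python
def to_mask(included, length):
    """Flatten `included` into a list of booleans of the given length."""
    items = list(included)
    # Assumption: a full-length list of bools is already a mask.
    if len(items) == length and all(isinstance(x, bool) for x in items):
        return items
    mask = [False] * length
    for item in items:
        if isinstance(item, slice):
            for i in range(*item.indices(length)):
                mask[i] = True
        else:
            mask[item] = True  # a single integer position
    return mask


def contiguous_runs(seq, included, min_length=1, invert=False):
    """Yield each maximal run of included positions in `seq` as a substring."""
    mask = to_mask(included, len(seq))
    if invert:
        mask = [not flag for flag in mask]
    start = None
    # A trailing False sentinel closes a run that reaches the end of `seq`.
    for i, flag in enumerate(mask + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                yield seq[start:i]
            start = None


seq = 'AAA--TT-CCCC-G-'
not_gap = [ch != '-' for ch in seq]
print(list(contiguous_runs(seq, not_gap, min_length=2)))  # ['AAA', 'TT', 'CCCC']
print(list(contiguous_runs(seq, not_gap, invert=True)))   # ['--', '-', '-', '-']
```

With invert=True the same mask selects the gap runs instead, which is the complement described for the invert parameter.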
Notes
If slices provide adjacent ranges, then they will be considered the same contiguous subsequence.
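The note above amounts to an interval merge. The sketch below shows the idea on plain slices and a plain string. The function merge_slices is a hypothetical helper, not part of scikit-bio, and it assumes every slice has a step of 1.

```python
def merge_slices(slices, length):
    """Merge overlapping or adjacent slices into maximal (start, stop) pairs."""
    # Assumption: all slices have step 1, so only start and stop matter.
    spans = sorted(s.indices(length)[:2] for s in slices)
    merged = []
    for start, stop in spans:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous span: extend it.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(span) for span in merged]


seq = 'AAA--TT-CCCC-G-'
pieces = [slice(0, 2), slice(2, 3), slice(5, 7)]
regions = merge_slices(pieces, len(seq))
print(regions)                         # [(0, 3), (5, 7)]
print([seq[a:b] for a, b in regions])  # ['AAA', 'TT']
```

The first two slices end and begin at the same index, so they collapse into a single region, while the third stays separate.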
Examples
Here we use iter_contiguous to find all of the contiguous ungapped sequences using a boolean vector derived from our DNA sequence.
>>> from skbio import DNA
>>> s = DNA('AAA--TT-CCCC-G-')
>>> no_gaps = ~s.gaps()
>>> for ungapped_subsequence in s.iter_contiguous(no_gaps,
...                                               min_length=2):
...     print(ungapped_subsequence)
AAA
TT
CCCC
Note that the last potential subsequence (G) was skipped because its length of 1 is less than min_length, which was set to 2.
We can also use iter_contiguous on a generator of slices, such as the one produced by find_motifs (or find_with_regex).
>>> from skbio import Protein
>>> s = Protein('ACDFNASANFTACGNPNRTESL')
>>> for subseq in s.iter_contiguous(s.find_motifs('N-glycosylation')):
...     print(subseq)
NASANFTA
NRTE
Note how the first subsequence contains two N-glycosylation sites. This happened because they were contiguous.
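To see why the two sites join, the example can be approximated with the standard library alone. This sketch assumes the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}, written here as a regular expression. The exact pattern used by find_motifs is not stated in this section, so treat the regex as an assumption.

```python
import re
from itertools import groupby

seq = 'ACDFNASANFTACGNPNRTESL'

# Assumed pattern: N, anything but P, S or T, anything but P.
# The lookahead lets overlapping sites be reported as well.
pattern = re.compile(r'(?=(N[^P][ST][^P]))')
sites = [slice(m.start(1), m.end(1)) for m in pattern.finditer(seq)]

# Mark every position covered by a site.
mask = [False] * len(seq)
for site in sites:
    for i in range(site.start, site.stop):
        mask[i] = True

# Group consecutive marked positions into regions.
regions = []
pos = 0
for flag, group in groupby(mask):
    n = len(list(group))
    if flag:
        regions.append(seq[pos:pos + n])
    pos += n

print([seq[s] for s in sites])  # ['NASA', 'NFTA', 'NRTE']
print(regions)                  # ['NASANFTA', 'NRTE']
```

Under this assumed pattern, the first site ends at index 8 and the second begins there, so their positions form one unbroken run in the mask and come out as a single region.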