Protein.frequencies(chars=None, relative=False)
Compute frequencies of characters in the sequence.
State: Experimental as of 0.4.1.
chars (str or set of str, optional) – Characters to compute the frequencies of. May be a str containing a single character or a set of single-character strings. If None, frequencies will be computed for all characters present in the sequence.
relative (bool, optional) – If True, return the relative frequency of each character instead of its count. If chars is provided, relative frequencies will be computed with respect to the number of characters in the sequence, not the total count of characters observed in chars. Thus, the relative frequencies will not necessarily sum to 1.0 if chars is provided.
dict – Frequencies of characters in the sequence.
TypeError – If chars is not a str or set of str.
ValueError – If chars is not a single-character str or a set of single-character strings.
ValueError – If chars contains characters outside the allowable range of characters in a Sequence object.
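The type and length rules above can be illustrated with a small pure-Python sketch. The check_chars helper is hypothetical and is not scikit-bio's implementation; it only mirrors the documented TypeError and single-character ValueError conditions. The allowable-range check is left out because the range is not stated here.

```python
def check_chars(chars):
    """Hypothetical validator mirroring the documented error rules."""
    # A plain str is treated as one candidate character.
    if isinstance(chars, str):
        chars = {chars}
    elif not isinstance(chars, set):
        raise TypeError("chars must be a str or a set of str")

    for char in chars:
        if not isinstance(char, str):
            raise TypeError("every element of chars must be a str")
        if len(char) != 1:
            raise ValueError("every element of chars must be a single character")
    return chars


# A list is the wrong container type; 'AC' and 'CT' are not single characters.
raised = []
for bad in (['A', 'C'], 'AC', {'A', 'CT'}):
    try:
        check_chars(bad)
    except (TypeError, ValueError) as exc:
        raised.append(type(exc).__name__)

print(raised)
```

Under these assumptions, a list raises TypeError, while a multi-character str or a set containing a multi-character string raises ValueError.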
See also
Notes
If the sequence is empty (i.e., length zero), relative=True, and chars is provided, the relative frequency of each specified character will be np.nan.
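A minimal sketch of why the empty-sequence case matters to callers: NaN never compares equal to itself, so results should be tested with an isnan check rather than ==. The relative_frequency helper below is hypothetical and uses math.nan as a stand-in for np.nan (both are ordinary float NaN values).

```python
import math


def relative_frequency(count, seq_length):
    """Hypothetical helper: count divided by sequence length, NaN if empty."""
    if seq_length == 0:
        return math.nan
    return count / seq_length


# Simulate requesting characters 'A' and 'C' from an empty sequence.
empty_freqs = {char: relative_frequency(0, 0) for char in ('A', 'C')}

all_nan = all(math.isnan(value) for value in empty_freqs.values())
equal_to_itself = empty_freqs['A'] == empty_freqs['A']

print(all_nan)          # True
print(equal_to_itself)  # False, because NaN != NaN
```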
If chars is not provided, this method is equivalent to, but faster than, seq.kmer_frequencies(k=1).
If chars is not provided, this method is equivalent to, but faster than, passing chars=seq.observed_chars.
Examples
Compute character frequencies of a sequence:
>>> from pprint import pprint
>>> from skbio import Sequence
>>> seq = Sequence('AGAAGACC')
>>> freqs = seq.frequencies()
>>> pprint(freqs) # using pprint to display dict in sorted order
{'A': 4, 'C': 2, 'G': 2}
Compute relative character frequencies:
>>> freqs = seq.frequencies(relative=True)
>>> pprint(freqs)
{'A': 0.5, 'C': 0.25, 'G': 0.25}
Compute relative frequencies of characters A, C, and T:
>>> freqs = seq.frequencies(chars={'A', 'C', 'T'}, relative=True)
>>> pprint(freqs)
{'A': 0.5, 'C': 0.25, 'T': 0.0}
Note that since character T is not in the sequence we receive a relative frequency of 0.0. The relative frequencies of A and C are relative to the number of characters in the sequence (8), not the number of A and C characters (4 + 2 = 6).
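The denominator rule in the note above can be sketched in plain Python. The sketch_frequencies function is a hypothetical stand-in built on collections.Counter, not the library's implementation, and it assumes a non-empty sequence given as a str.

```python
from collections import Counter


def sketch_frequencies(seq, chars=None, relative=False):
    """Hypothetical stand-in for the documented counting behavior."""
    counts = Counter(seq)
    if chars is None:
        # No filter: report every character observed in the sequence.
        selected = dict(counts)
    else:
        if isinstance(chars, str):
            chars = {chars}
        # Requested characters that never occur get a count of 0.
        selected = {char: counts.get(char, 0) for char in chars}

    if not relative:
        return selected

    # Divide by the sequence length, not by the sum of the selected counts.
    total = len(seq)
    return {char: count / total for char, count in selected.items()}


freqs = sketch_frequencies('AGAAGACC', chars={'A', 'C', 'T'}, relative=True)
print(sorted(freqs.items()))  # [('A', 0.5), ('C', 0.25), ('T', 0.0)]
print(sum(freqs.values()))    # 0.75
```

The values sum to 0.75 rather than 1.0 because the two G characters count toward the sequence length of 8 but are not among the requested characters.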