TabularMSA.gap_frequencies(axis='sequence', relative=False)
Compute frequency of gap characters across an axis.
State: Experimental as of 0.4.1.
axis ({'sequence', 'position'}, optional) – Axis to compute gap character frequencies across. If ‘sequence’ or 0, frequencies are computed for each position in the MSA. If ‘position’ or 1, frequencies are computed for each sequence.
relative (bool, optional) – If True, return the relative frequency of gap characters instead of the count.
Vector of gap character frequencies across the specified axis. Will have int dtype if relative=False and float dtype if relative=True.
1D np.ndarray (int or float)
ValueError – If axis is invalid.
Notes
If there are no positions in the MSA, axis='position', and relative=True, the relative frequency of gap characters in each sequence will be np.nan.
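The np.nan result follows from dividing a gap count of zero by a length of zero. The sketch below reproduces that arithmetic with plain NumPy. It is an illustration under assumptions, not scikit-bio's implementation: the boolean array gap_mask and the variables n_sequences and n_positions are hypothetical stand-ins for an MSA with three sequences and no positions.

```python
import numpy as np

# Hypothetical stand-in for an MSA with 3 sequences and 0 positions:
# a boolean mask marking which cells hold a gap character.
n_sequences, n_positions = 3, 0
gap_mask = np.zeros((n_sequences, n_positions), dtype=bool)

# Counting across the position axis gives one count per sequence.
gap_counts = gap_mask.sum(axis=1)

# Relative frequency is count / number of positions, i.e. 0 / 0 here.
# NumPy would emit a RuntimeWarning for this, so silence it locally.
with np.errstate(divide="ignore", invalid="ignore"):
    relative = gap_counts / n_positions

print(gap_counts)
print(relative)
```

Code that consumes relative frequencies per sequence may want to test for this case with np.isnan before aggregating.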
Examples
Compute frequency of gap characters for each position in the MSA (i.e., across the sequence axis):
>>> from skbio import DNA, TabularMSA
>>> msa = TabularMSA([DNA('ACG'),
...                   DNA('A--'),
...                   DNA('AC.'),
...                   DNA('AG.')])
>>> msa.gap_frequencies()
array([0, 1, 3])
Compute relative frequencies across the same axis:
>>> msa.gap_frequencies(relative=True)
array([ 0. , 0.25, 0.75])
Compute frequency of gap characters for each sequence (i.e., across the position axis):
>>> msa.gap_frequencies(axis='position')
array([0, 2, 1, 1])
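To make the axis semantics concrete, the counts above can be reproduced by hand with NumPy. This is a minimal sketch, assuming that '-' and '.' are the gap characters (as they are for DNA); the names rows, chars, and gap_mask are illustrative only and are not part of the TabularMSA API.

```python
import numpy as np

# The same four aligned sequences as in the examples, as plain strings.
rows = ["ACG", "A--", "AC.", "AG."]

# 2D character array: one row per sequence, one column per position.
chars = np.array([list(row) for row in rows])

# Assumption: '-' and '.' are the gap characters.
gap_mask = np.isin(chars, ["-", "."])

# axis='sequence' (0): collapse the sequences, one value per position.
per_position = gap_mask.sum(axis=0)

# axis='position' (1): collapse the positions, one value per sequence.
per_sequence = gap_mask.sum(axis=1)

# relative=True divides by the length of the collapsed axis.
relative_per_position = per_position / gap_mask.shape[0]

print(per_position)           # [0 1 3]
print(per_sequence)           # [0 2 1 1]
print(relative_per_position)
```

The axis argument names the axis that is collapsed, which is why axis='sequence' yields one value per position and axis='position' yields one value per sequence.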