TabularMSA.consensus()
Compute the majority consensus sequence for this MSA.
State: Experimental as of 0.4.1.
The majority consensus sequence contains the most common character at each position in this MSA. Ties will be broken in an arbitrary manner.
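Per-position majority voting can be sketched in plain Python. This is a minimal sketch, not scikit-bio's implementation. `majority_consensus` is a hypothetical helper that takes equal-length strings instead of a `TabularMSA`, and it treats gap characters like any other character. Its tie-breaking rule is one arbitrary choice: the character encountered first in the column wins, which is how `collections.Counter.most_common` orders equal counts. scikit-bio does not guarantee this rule.

```python
from collections import Counter


def majority_consensus(sequences):
    """Return the most common character at each column of equal-length strings.

    Hypothetical helper for illustration only. Counter.most_common orders
    equal counts by first occurrence, so a tie goes to whichever character
    appears first in the column.
    """
    if not sequences:
        return ''
    if len({len(s) for s in sequences}) != 1:
        raise ValueError('all sequences must have the same length')
    # zip(*sequences) yields one tuple of characters per column (position).
    return ''.join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


print(majority_consensus(['ACGT', 'ACGA', 'TCGA']))  # ACGA
```

In this sketch, `majority_consensus(['A', 'C'])` returns `'A'`. Any other tie-breaking rule would be equally valid.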
Returns
The majority consensus sequence for this MSA. The type of sequence returned will be the same as this MSA's dtype, or Sequence if this MSA does not contain any sequences. The majority consensus sequence will have its positional metadata set to this MSA's positional metadata, if present.
Notes
The majority consensus sequence will use this MSA's default gap character (dtype.default_gap_char) to represent gap majority at a position, regardless of the gap characters present at that position.
Different gap characters at a position are not treated as distinct characters. All gap characters at a position contribute to that position’s gap consensus.
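The gap handling described in these notes can be sketched as follows. This is a minimal sketch under stated assumptions, not scikit-bio's code. `GAP_CHARS`, `DEFAULT_GAP`, `column_consensus`, and `gap_aware_consensus` are hypothetical names, and the gap set `{'-', '.'}` with default gap `'-'` is assumed to mirror `DNA.gap_chars` and `DNA.default_gap_char`. Every gap character is mapped to the default gap before counting, so all gaps in a column pool their votes instead of competing with each other.

```python
from collections import Counter

# Hypothetical stand-ins assumed to mirror DNA.gap_chars and DNA.default_gap_char.
GAP_CHARS = frozenset({'-', '.'})
DEFAULT_GAP = '-'


def column_consensus(column, gap_chars=GAP_CHARS, default_gap=DEFAULT_GAP):
    """Return the most common character in one alignment column.

    Each gap character is counted under default_gap, so '-' and '.'
    contribute to a single shared gap count.
    """
    counts = Counter(default_gap if ch in gap_chars else ch for ch in column)
    return counts.most_common(1)[0][0]


def gap_aware_consensus(sequences, **kwargs):
    """Apply column_consensus to every column of equal-length strings."""
    return ''.join(column_consensus(col, **kwargs) for col in zip(*sequences))


print(gap_aware_consensus(['AC---', 'AT-C.', 'TT-CG']))  # AT-C-
```

In the last column, `'-'` and `'.'` together outvote the single `'G'`, and the result uses the default gap `'-'`. Without pooling, the three characters would each have one vote and the outcome would depend only on tie-breaking.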
Examples
>>> from skbio import DNA, TabularMSA
>>> sequences = [DNA('AC---'),
...              DNA('AT-C.'),
...              DNA('TT-CG')]
>>> msa = TabularMSA(sequences,
...                  positional_metadata={'prob': [2, 1, 2, 3, 5]})
>>> msa.consensus()
DNA
--------------------------
Positional metadata:
    'prob': <dtype: int64>
Stats:
    length: 5
    has gaps: True
    has degenerates: False
    has definites: True
    GC-content: 33.33%
--------------------------
0 AT-C-
Note that the last position in the MSA has more than one type of gap character ('-' and '.'). These are not treated as distinct characters; both types of gap characters contribute to the position's consensus. Also note that DNA.default_gap_char ('-') is used to represent gap majority at a position.