skbio.alignment.TabularMSA(sequences, metadata=None, positional_metadata=None, minter=None, index=None)
Store a multiple sequence alignment in tabular (row/column) form.
Parameters
sequences (iterable of GrammaredSequence, TabularMSA) – Aligned sequences in the MSA. Sequences must all be the same type and length. For example, sequences could be an iterable of DNA, RNA, or Protein sequences. If sequences is a TabularMSA, its metadata, positional_metadata, and index will be used unless overridden by parameters metadata, positional_metadata, and minter/index, respectively.
metadata (dict, optional) – Arbitrary metadata which applies to the entire MSA. A shallow copy of the dict will be made.
positional_metadata (pd.DataFrame consumable, optional) – Arbitrary metadata which applies to each position in the MSA. Must be able to be passed directly to the pd.DataFrame constructor. Each column of metadata must be the same length as the number of positions in the MSA. A shallow copy of the positional metadata will be made.
minter (callable or metadata key, optional) – If provided, defines an index label for each sequence in sequences. Can either be a callable accepting a single argument (each sequence) or a key into each sequence’s metadata attribute. Note that minter cannot be combined with index.
index (pd.Index consumable, optional) – Index containing labels for sequences. Must be the same length as sequences. Must be able to be passed directly to the pd.Index constructor. Note that index cannot be combined with minter and the contents of index must be hashable.
Raises
ValueError – If minter and index are both provided.
ValueError – If index is not the same length as sequences.
TypeError – If sequences contains an object that isn’t a GrammaredSequence.
TypeError – If sequences does not contain exactly the same type of GrammaredSequence objects.
ValueError – If sequences does not contain GrammaredSequence objects of the same length.
See also
skbio.sequence.DNA, skbio.sequence.RNA, skbio.sequence.Protein, pandas.DataFrame, pandas.Index, reassign_index
Notes
If neither minter nor index is provided, default index labels will be used: pd.RangeIndex(start=0, stop=len(sequences), step=1).
Examples
Create a TabularMSA object with three DNA sequences and four positions:
>>> from skbio import DNA, TabularMSA
>>> seqs = [
...     DNA('ACGT'),
...     DNA('AG-T'),
...     DNA('-C-T')
... ]
>>> msa = TabularMSA(seqs)
>>> msa
TabularMSA[DNA]
---------------------
Stats:
sequence count: 3
position count: 4
---------------------
ACGT
AG-T
-C-T
Since neither minter nor index was provided, the MSA has default index labels:
>>> msa.index
RangeIndex(start=0, stop=3, step=1)
Create an MSA with metadata, positional metadata, and non-default index labels:
>>> msa = TabularMSA(seqs, index=['seq1', 'seq2', 'seq3'],
...                  metadata={'id': 'msa-id'},
...                  positional_metadata={'prob': [3, 4, 2, 2]})
>>> msa
TabularMSA[DNA]
--------------------------
Metadata:
'id': 'msa-id'
Positional metadata:
'prob': <dtype: int64>
Stats:
sequence count: 3
position count: 4
--------------------------
ACGT
AG-T
-C-T
>>> msa.index
Index(['seq1', 'seq2', 'seq3'], dtype='object')
Attributes
Data type of the stored sequences.
Slice the MSA on either axis by index position.
Index containing labels along the sequence axis.
Slice the MSA on first axis by index label, second axis by position.
Number of sequences (rows) and positions (columns).
Built-ins
Boolean indicating whether the MSA is empty or not.
Determine if an index label is in this MSA.
Return a shallow copy of this MSA.
Return a deep copy of this MSA.
Determine if this MSA is equal to another.
Slice the MSA on either axis.
Iterate over sequences in the MSA.
Number of sequences in the MSA.
Determine if this MSA is not equal to another.
Iterate in reverse order over sequences in the MSA.
String summary of this MSA.
Methods
Append a sequence to the MSA without recomputing alignment.
Compute the majority consensus sequence for this MSA.
Apply metric to compute conservation for all alignment positions.
Extend this MSA with sequences without recomputing alignment.
Create a
Compute frequency of gap characters across an axis.
Determine if the object has metadata.
Determine if the object has positional metadata.
Iterate over positions (columns) in the MSA.
Join this MSA with another by sequence (horizontally).
Create a new
Reassign index labels to sequences in this MSA.
Sort sequences by index label in-place.
Create a
Write an instance of