skbio.stats.distance.permanova(distance_matrix, grouping, column=None, permutations=999)
Test for significant differences between groups using PERMANOVA.
State: Experimental as of 0.4.0.
Permutational Multivariate Analysis of Variance (PERMANOVA) is a non-parametric method that tests whether two or more groups of objects (e.g., samples) are significantly different based on a categorical factor. It is conceptually similar to ANOVA except that it operates on a distance matrix, which allows for multivariate analysis. PERMANOVA computes a pseudo-F statistic.
Statistical significance is assessed via a permutation test. The assignment of objects to groups (grouping) is randomly permuted a number of times (controlled via permutations). A pseudo-F statistic is computed for each permutation, and the p-value is the proportion of permuted pseudo-F statistics that are equal to or greater than the original (unpermuted) pseudo-F statistic.
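The permutation procedure described above can be sketched in plain NumPy. This is a minimal illustration, not scikit-bio's implementation: the helper names `pseudo_f` and `permutation_p_value` are hypothetical, the pseudo-F formula is the sum-of-squares partition from Anderson (2001), and the p-value is the plain proportion stated in the text.

```python
import numpy as np

def pseudo_f(dm, labels):
    """Pseudo-F from a square distance matrix and one group label per object."""
    dm = np.asarray(dm, dtype=float)
    labels = np.asarray(labels)
    n = dm.shape[0]
    groups = np.unique(labels)
    a = len(groups)
    sq = dm ** 2
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares: same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = sq[np.ix_(idx, idx)]
        ss_within += np.triu(sub, k=1).sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permutation_p_value(dm, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, p-value) by shuffling the group labels."""
    observed = pseudo_f(dm, labels)
    if permutations == 0:
        return observed, np.nan  # significance calculation skipped
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    hits = 0
    for _ in range(permutations):
        if pseudo_f(dm, rng.permutation(labels)) >= observed:
            hits += 1
    return observed, hits / permutations

# Four points on a line, forming two well-separated groups.
points = np.array([0.0, 1.0, 10.0, 11.0])
dm = np.abs(np.subtract.outer(points, points))
labels = ["a", "a", "b", "b"]
stat, p = permutation_p_value(dm, labels, permutations=99)
print(stat)  # 200.0
```

Some implementations add one to both the count and the number of permutations so that the p-value can never be exactly zero; the sketch follows the plain proportion given in the description.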
distance_matrix (DistanceMatrix) – Distance matrix containing distances between objects (e.g., distances between samples of microbial communities).
grouping (1-D array_like or pandas.DataFrame) – Vector indicating the assignment of objects to groups. For example, these could be strings or integers denoting which group an object belongs to. If grouping is 1-D array_like, it must be the same length and in the same order as the objects in distance_matrix. If grouping is a DataFrame, the column specified by column will be used as the grouping vector. The DataFrame must be indexed by the IDs in distance_matrix (i.e., the row labels must be distance matrix IDs), but the order of IDs between distance_matrix and the DataFrame need not be the same. All IDs in the distance matrix must be present in the DataFrame. Extra IDs in the DataFrame are allowed (they are ignored in the calculations).
column (str, optional) – Column name to use as the grouping vector if grouping is a DataFrame. Must be provided if grouping is a DataFrame. Cannot be provided if grouping is 1-D array_like.
permutations (int, optional) – Number of permutations to use when assessing statistical significance. Must be greater than or equal to zero. If zero, statistical significance calculations will be skipped and the p-value will be np.nan.
Returns (pandas.Series) – Results of the statistical test, including the test statistic and p-value.
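The DataFrame rules for grouping (indexed by distance matrix IDs, any order, extra IDs ignored, no IDs missing) can be illustrated with pandas alone. The helper `align_grouping` below is hypothetical and only mimics the documented behaviour; it is not the function scikit-bio uses internally, and the sample IDs and the `site` column are invented for the example.

```python
import pandas as pd

def align_grouping(dm_ids, grouping_df, column):
    """Return the grouping vector in distance-matrix order (illustrative only)."""
    # Every ID in the distance matrix must appear in the DataFrame index.
    missing = [i for i in dm_ids if i not in grouping_df.index]
    if missing:
        raise ValueError(f"IDs missing from grouping DataFrame: {missing}")
    # Selecting by label reorders rows and silently drops extra IDs.
    return grouping_df.loc[list(dm_ids), column].tolist()

dm_ids = ["s1", "s2", "s3", "s4"]
metadata = pd.DataFrame(
    {"site": ["gut", "skin", "gut", "skin", "soil"]},
    index=["s3", "s2", "s1", "s4", "extra"],  # different order, one extra ID
)
print(align_grouping(dm_ids, metadata, "site"))
# ['gut', 'skin', 'gut', 'skin']
```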
Notes
See [1] for the original method reference, as well as vegan::adonis, available in R’s vegan package [2].
The p-value will be np.nan if permutations is zero.
References
[1] Anderson, Marti J. “A new method for non-parametric multivariate analysis of variance.” Austral Ecology 26.1 (2001): 32-46.
Examples
See skbio.stats.distance.anosim for usage examples (both functions provide similar interfaces).