statsmodels.stats.dist_dependence_measures.distance_statistics

statsmodels.stats.dist_dependence_measures.distance_statistics(x, y, x_dist=None, y_dist=None)

Calculate various distance dependence statistics.

Calculate several distance dependence statistics as described in [R91].

Parameters

x : array_like, 1-D or 2-D

If x is 1-D then it is assumed to be a vector of observations of a single random variable. If x is 2-D then the rows should be observations and the columns are treated as the components of a random vector, i.e., each column represents a different component of the random vector x.

y : array_like, 1-D or 2-D

Same as x, but only the number of observations has to match that of x. If y is 2-D, note that the number of columns of y (i.e., the number of components in the random vector) does not need to match the number of columns in x.

x_dist : array_like, 2-D, optional

A square 2-D array_like object whose values are the Euclidean distances between x’s rows.

y_dist : array_like, 2-D, optional

A square 2-D array_like object whose values are the Euclidean distances between y’s rows.
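As an illustration of what x_dist and y_dist are described as holding, the following is a minimal NumPy sketch that builds a square matrix of pairwise Euclidean distances between the rows of an input. The helper name pairwise_euclidean is hypothetical and not part of statsmodels; it only shows the expected shape and content of such a matrix.

```python
import numpy as np


def pairwise_euclidean(a):
    """Return the (n, n) matrix of Euclidean distances between rows of a.

    Hypothetical helper for illustration; not a statsmodels function.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        # Treat a 1-D input as n observations of a single variable.
        a = a[:, None]
    # Broadcast to all row pairs: diff[i, j, :] = a[i, :] - a[j, :]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
x = rng.random((5, 2))          # 5 observations of a 2-component vector
x_dist = pairwise_euclidean(x)  # square, symmetric, zero diagonal
print(x_dist.shape)
```

A matrix built this way is square with one row and one column per observation, so x_dist and y_dist have the same shape even when x and y have different numbers of columns.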

Returns

namedtuple

A named tuple of distance dependence statistics (DistDependStat) with the following values:

  • test_statistic : float - The “basic” test statistic, i.e., the one used when the emp method is chosen when calling distance_covariance_test().

  • distance_correlation : float - The distance correlation between x and y.

  • distance_covariance : float - The distance covariance of x and y.

  • dvar_x : float - The distance variance of x.

  • dvar_y : float - The distance variance of y.

  • S : float - The mean of the Euclidean distances in x multiplied by the mean of the Euclidean distances in y. Mostly used internally.
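The returned fields can be related to one another with a short NumPy sketch based on the double-centered distance matrices of [R91]. This is not the library's implementation. The function name distance_statistics_sketch and its helpers are hypothetical, and two choices are assumptions: reporting unsquared quantities, and computing the test statistic as n times the squared distance covariance. Both appear consistent with the sample output in the Examples section.

```python
import numpy as np


def euclidean_distances(a):
    """Pairwise Euclidean distances between rows (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return (d - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True) + d.mean())


def distance_statistics_sketch(x, y):
    """Illustrative only; field names mirror DistDependStat."""
    a, b = euclidean_distances(x), euclidean_distances(y)
    n = a.shape[0]
    A, B = double_center(a), double_center(b)
    # Guard against a tiny negative mean caused by rounding.
    dcov = np.sqrt(max((A * B).mean(), 0.0))
    dvar_x = np.sqrt((A * A).mean())
    dvar_y = np.sqrt((B * B).mean())
    return {
        "test_statistic": n * dcov ** 2,   # assumed form, see lead-in
        "distance_correlation": dcov / np.sqrt(dvar_x * dvar_y),
        "distance_covariance": dcov,
        "dvar_x": dvar_x,
        "dvar_y": dvar_y,
        "S": a.mean() * b.mean(),          # product of mean distances
    }


rng = np.random.default_rng(0)
x = rng.random(200)
stats = distance_statistics_sketch(x, 2.0 * x + 1.0)
print(stats["distance_correlation"])
```

Because y is an exact linear function of x here, the printed distance correlation should be 1 up to floating-point rounding, which gives a quick sanity check on the sketch.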

References

[R91]

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007) “Measuring and testing dependence by correlation of distances”. Annals of Statistics, Vol. 35 No. 6, pp. 2769-2794.

Examples

>>> import numpy as np
>>> from statsmodels.stats.dist_dependence_measures import (
...     distance_statistics)
>>> distance_statistics(np.random.random(1000), np.random.random(1000))
DistDependStat(test_statistic=0.07948284320205831,
distance_correlation=0.04269511890990793,
distance_covariance=0.008915315092696293,
dvar_x=0.20719027438266704, dvar_y=0.21044934264957588,
S=0.10892061635588891)