skbio.diversity.alpha.gini_index(data, method='rectangles')
Calculate the Gini index.
State: Experimental as of 0.4.0.
The Gini index is defined as
\[G = \frac{A}{A + B}\]
where \(A\) is the area between \(y=x\) and the Lorenz curve and \(B\) is the area under the Lorenz curve. This simplifies to \(1-2B\) since \(A+B=0.5\).
Parameters:
data (1-D array_like) – Vector of counts, abundances, proportions, etc. All entries must be non-negative.
method ({'rectangles', 'trapezoids'}) – Method for calculating the area under the Lorenz curve. If 'rectangles', connects the Lorenz curve points by lines parallel to the x axis. This is the correct method (in our opinion), though 'trapezoids' might be desirable in some circumstances. If 'trapezoids', connects the Lorenz curve points by linear segments between them. This assumes that the given sampling is accurate and that additional features of the data would fall on linear gradients between the observed values.
Returns: Gini index.
Return type: double
Raises: ValueError – If method isn’t one of the supported methods for calculating the area under the curve.
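The two `method` options described above can be illustrated with a minimal pure-Python sketch. This is not scikit-bio's implementation: the names `lorenz_heights` and `gini_sketch` are hypothetical, and the sketch assumes that the Lorenz curve heights are the cumulative shares of the sorted data, that the points are spaced `dx = 1/n` apart, and that the curve starts at height 0. Clamping the result at zero and rejecting all-zero input are also choices made here for illustration.

```python
def lorenz_heights(data):
    """Cumulative shares h_1..h_n of the sorted data (assumed Lorenz curve points)."""
    values = sorted(data)
    if any(v < 0 for v in values):
        raise ValueError("All entries must be non-negative.")
    total = sum(values)
    if total == 0:
        # Assumption of this sketch: refuse input with no mass at all.
        raise ValueError("Data must contain at least one positive entry.")
    heights = []
    running = 0.0
    for v in values:
        running += v
        heights.append(running / total)
    return heights


def gini_sketch(data, method="rectangles"):
    """Hypothetical helper: Gini index as 1 - 2B, with B the area under the Lorenz curve."""
    heights = lorenz_heights(data)
    dx = 1.0 / len(heights)  # assumed equal spacing between Lorenz points
    if method == "rectangles":
        # Rectangles of width dx whose tops are parallel to the x axis.
        area = dx * sum(heights)
    elif method == "trapezoids":
        # Linear segments between neighbouring points, starting from h_0 = 0.
        h_0 = 0.0
        h_n = heights[-1]
        area = dx * ((h_0 + h_n) / 2 + sum(heights[:-1]))
    else:
        raise ValueError("method must be 'rectangles' or 'trapezoids'")
    # Right-edge rectangles overestimate the area, so 1 - 2B can fall
    # below zero for nearly even data; this sketch clamps it at zero.
    return max(0.0, 1 - 2 * area)
```

With this sketch, perfectly even data such as `[1, 1, 1, 1]` yields 0 under either method, while `[0, 0, 0, 1]` yields 0.5 with `'rectangles'` and 0.75 with `'trapezoids'`, which shows how much the choice of integration method can matter for small samples.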
Notes
The Gini index was introduced in [1]. The formula for method='rectangles' is
\[dx \sum_{i=1}^{n} h_i\]
The formula for method='trapezoids' is
\[dx \left(\frac{h_0 + h_n}{2} + \sum_{i=1}^{n-1} h_i\right)\]
References
[1] Gini, C. (1912). “Variability and Mutability”, C. Cuppini, Bologna, 156 pages. Reprinted in Memorie di metodologica statistica (Eds. Pizetti, E. and Salvemini, T.). Rome: Libreria Eredi Virgilio Veschi (1955).