Quality metrics

optunity.metrics.absolute_error(y, yhat)[source]

Returns the maximal absolute error between y and yhat.

Parameters:
  • y – true function values
  • yhat – predicted function values

Lower is better.

>>> absolute_error([0,1,2,3], [0,0,1,1])
2.0
optunity.metrics.accuracy(y, yhat)[source]

Returns the accuracy. Higher is better.

Parameters:
  • y – true function values
  • yhat – predicted function values
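
For illustration (assuming accuracy is the fraction of correct predictions, i.e. the complement of error_rate()):

>>> accuracy([0, 0, 1, 1], [0, 0, 0, 1])
0.75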
optunity.metrics.auc(curve)[source]

Computes the area under the specified curve.

Parameters:
  • curve ([(x, y), ...]) – a curve, specified as a list of (x, y) tuples

See also

optunity.metrics.compute_curve()
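
For illustration (a sketch, assuming the area is computed with the trapezoidal rule over the given points, consistent with the roc_auc() doctest below):

>>> auc([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)])
0.5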

optunity.metrics.brier(y, yhat, positive=True)[source]

Returns the Brier score between y and yhat.

Parameters:
  • y – true function values
  • yhat – predicted function values
  • positive – the positive label
Returns:

\[\frac{1}{n} \sum_{i=1}^n \big[(\hat{y}_i-y_i)^2\big]\]

yhat must be a vector of probabilities, i.e. elements in [0, 1].

Lower is better.

Note

This loss function should only be used for probabilistic models.
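
For illustration (a sketch, assuming y is reduced to an indicator of the positive label before applying the formula above):

>>> round(brier([True, True, False, False], [0.9, 0.8, 0.3, 0.2]), 3)
0.045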

optunity.metrics.compute_curve(ys, decision_values, xfun, yfun, positive=True, presorted=False)[source]

Computes a curve based on contingency tables at different decision values.

Parameters:
  • ys (iterable) – true labels
  • decision_values (iterable) – decision values
  • positive – positive label
  • xfun (callable) – function to compute x values, based on contingency tables
  • yfun (callable) – function to compute y values, based on contingency tables
  • presorted (bool) – whether or not ys and decision_values are already sorted
Returns:

the resulting curve, as a list of (x, y)-tuples
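
For illustration, a sketch of an ROC curve built with this function, assuming each contingency table is the (TP, FP, TN, FN) tuple produced by contingency_tables() below (data taken from its doctest):

>>> fpr = lambda ct: float(ct[1]) / (ct[1] + ct[2])  # FP / (FP + TN)
>>> tpr = lambda ct: float(ct[0]) / (ct[0] + ct[3])  # TP / (TP + FN)
>>> curve = compute_curve([0, 0, 0, 0, 1, 1, 1, 1], [2, 2, 1, 1, 1, 2, 3, 3], fpr, tpr)
>>> auc(curve)
0.75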

optunity.metrics.contingency_table(ys, yhats, positive=True)[source]

Computes a contingency table for given predictions.

Parameters:
  • ys (iterable) – true labels
  • yhats (iterable) – predicted labels
  • positive – the positive label
Returns:

TP, FP, TN, FN

>>> ys =    [True, True, True, True, True, False]
>>> yhats = [True, True, False, False, False, True]
>>> tab = contingency_table(ys, yhats, 1)
>>> print(tab)
(2, 1, 0, 3)
optunity.metrics.contingency_tables(ys, decision_values, positive=True, presorted=False)[source]

Computes contingency tables for every unique decision value.

Parameters:
  • ys (iterable) – true labels
  • decision_values (iterable) – decision values (higher = stronger positive)
  • positive – the positive label
  • presorted (bool) – whether or not ys and decision_values are already sorted
Returns:

a list of contingency tables (TP, FP, TN, FN) and the corresponding thresholds.

Contingency tables are built based on the decision rule \(decision\_value \geq threshold\).

The first contingency table corresponds with a (potentially unseen) threshold that yields all negatives.

>>> y = [0, 0, 0, 0, 1, 1, 1, 1]
>>> d = [2, 2, 1, 1, 1, 2, 3, 3]
>>> tables, thresholds = contingency_tables(y, d, 1)
>>> print(tables)
[(0, 0, 4, 4), (2, 0, 4, 2), (3, 2, 2, 1), (4, 4, 0, 0)]
>>> print(thresholds)
[None, 3, 2, 1]
optunity.metrics.error_rate(y, yhat)[source]

Returns the error rate (lower is better).

Parameters:
  • y – true function values
  • yhat – predicted function values
>>> error_rate([0,0,1,1], [0,0,0,1])
0.25
optunity.metrics.fbeta(y, yhat, beta, positive=True)[source]

Returns the \(F_\beta\)-score.

Parameters:
  • y – true function values
  • yhat – predicted function values
  • beta (float (positive)) – the value of beta to use
  • positive – the positive label
Returns:

\[(1 + \beta^2)\cdot\frac{precision\cdot recall}{(\beta^2 \cdot precision)+recall}\]
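
For illustration, with beta = 1 this reduces to the F1-score (a sketch, reusing the labels from the contingency_table() doctest above as 0/1, where precision = 2/3 and recall = 2/5):

>>> round(fbeta([1, 1, 1, 1, 1, 0], [1, 1, 0, 0, 0, 1], beta=1, positive=1), 2)
0.5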

optunity.metrics.logloss(y, yhat)[source]

Returns the log loss between labels and predictions.

Parameters:
  • y – true function values
  • yhat – predicted function values
Returns:

\[-\frac{1}{n}\sum_{i=1}^n\big[y_i \times \log \hat{y}_i+(1-y_i) \times \log (1-\hat{y}_i)\big]\]

y must be a binary vector, i.e. elements in {True, False}. yhat must be a vector of probabilities, i.e. elements in [0, 1].

Lower is better.

Note

This loss function should only be used for probabilistic models.
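
For illustration (a sketch, assuming the natural logarithm in the formula above):

>>> round(logloss([True, False], [0.5, 0.5]), 3)
0.693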

optunity.metrics.mse(y, yhat)[source]

Returns the mean squared error between y and yhat.

Parameters:
  • y – true function values
  • yhat – predicted function values
Returns:

\[\frac{1}{n} \sum_{i=1}^n \big[(\hat{y}_i-y_i)^2\big]\]

Lower is better.

>>> mse([0, 0], [2, 3])
6.5
optunity.metrics.npv(y, yhat, positive=True)[source]

Returns the negative predictive value (higher is better).

Parameters:
  • y – true function values
  • yhat – predicted function values
  • positive – the positive label
Returns:

number of true negative predictions / number of negative predictions
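
For illustration (a sketch of the ratio above; two of the three negative predictions are correct):

>>> round(npv([0, 0, 1, 1], [0, 0, 0, 1], 1), 2)
0.67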

optunity.metrics.pr_auc(ys, yhat, positive=True, presorted=False, return_curve=False)[source]

Computes the area under the precision-recall curve (higher is better).

Parameters:
  • ys – true labels
  • yhat – decision values
  • positive – the positive label
  • presorted (bool) – whether or not ys and yhat are already sorted
  • return_curve (bool) – whether or not the curve should be returned
>>> pr_auc([0, 0, 1, 1], [0, 0, 1, 1], 1)
1.0
>>> round(pr_auc([0,0,1,1], [0,1,1,2], 1), 2)
0.92

Note

Precision is undefined at recall = 0. In this case, we set precision equal to the precision that was obtained at the lowest non-zero recall.

optunity.metrics.precision(y, yhat, positive=True)[source]

Returns the precision (higher is better).

Parameters:
  • y – true function values
  • yhat – predicted function values
  • positive – the positive label
Returns:

number of true positive predictions / number of positive predictions
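
For illustration (a sketch of the ratio above; two of the three positive predictions are correct):

>>> round(precision([0, 0, 1, 1], [0, 1, 1, 1], 1), 2)
0.67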

optunity.metrics.pu_score(y, yhat)[source]

Returns a score used for PU learning as introduced in [LEE2003].

Parameters:
  • y – true function values
  • yhat – predicted function values
Returns:

\[\frac{P(\hat{y}=1 | y=1)^2}{P(\hat{y}=1)}\]

y and yhat must be boolean vectors.

Higher is better.
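
For illustration (a sketch of the formula above; half of the positives are predicted positive and half of all predictions are positive):

>>> pu_score([True, True, False, False], [True, False, True, False])
0.5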

[LEE2003] Wee Sun Lee and Bing Liu. Learning with positive and unlabeled examples using weighted logistic regression. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003.
optunity.metrics.r_squared(y, yhat)[source]

Returns the R squared statistic, also known as coefficient of determination (higher is better).

Parameters:
  • y – true function values
  • yhat – predicted function values
Returns:

\[R^2 = 1-\frac{SS_{res}}{SS_{tot}} = 1-\frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}\]
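
For illustration (a sketch of the definition above, with SS_res = 2 and SS_tot = 5):

>>> round(r_squared([1, 2, 3, 4], [2, 2, 3, 3]), 2)
0.6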

optunity.metrics.recall(y, yhat, positive=True)[source]

Returns the recall (higher is better).

Parameters:
  • y – true function values
  • yhat – predicted function values
  • positive – the positive label
Returns:

number of true positive predictions / number of actual positives
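
For illustration (a sketch of the ratio above; one of the two actual positives is recovered):

>>> recall([0, 0, 1, 1], [0, 0, 0, 1], 1)
0.5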

optunity.metrics.roc_auc(ys, yhat, positive=True, presorted=False, return_curve=False)[source]

Computes the area under the receiver operating characteristic curve (higher is better).

Parameters:
  • ys – true labels
  • yhat – decision values
  • positive – the positive label
  • presorted (bool) – whether or not ys and yhat are already sorted
  • return_curve (bool) – whether or not the curve should be returned
>>> roc_auc([0, 0, 1, 1], [0, 0, 1, 1], 1)
1.0
>>> roc_auc([0,0,1,1], [0,1,1,2], 1)
0.875