Quality metrics

optunity.metrics.absolute_error(y, yhat)[source]

Returns the maximal absolute error between y and yhat.

Parameters:
    y – true function values
    yhat – predicted function values

Lower is better.

>>> absolute_error([0,1,2,3], [0,0,1,1])
2.0

optunity.metrics.accuracy(y, yhat)[source]

Returns the accuracy. Higher is better.

Parameters:
    y – true function values
    yhat – predicted function values

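Accuracy is the fraction of predictions that match the true labels. A minimal illustrative sketch (not optunity's actual implementation; the function name is hypothetical):

```python
def accuracy_sketch(y, yhat):
    # Fraction of predictions equal to the true labels.
    return sum(t == p for t, p in zip(y, yhat)) / len(y)

print(accuracy_sketch([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.75
```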
optunity.metrics.auc(curve)[source]

Computes the area under the specified curve.

Parameters:
    curve ([(x, y), ...]) – a curve, specified as a list of (x, y) tuples

See also: optunity.metrics.compute_curve()
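With the curve given as (x, y) tuples sorted by x, the area can be approximated with the trapezoidal rule. A hedged sketch (illustrative only, not optunity's internals):

```python
def auc_sketch(curve):
    # Trapezoidal rule over consecutive (x, y) points of the curve.
    area = 0.0
    for (x1, y1), (x2, y2) in zip(curve, curve[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area

print(auc_sketch([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]))  # 0.5
```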

optunity.metrics.brier(y, yhat, positive=True)[source]

Returns the Brier score between y and yhat.

Parameters:
    y – true function values
    yhat – predicted function values
    positive – the positive label

$\frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i)^2$

yhat must be a vector of probabilities, e.g. elements in [0, 1]

Lower is better.

Note

This loss function should only be used for probabilistic models.
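The Brier score is the mean squared difference between the predicted probability and the (0/1-coded) outcome. A minimal sketch of the formula above (illustrative; it ignores the positive parameter and assumes y is already 0/1-coded):

```python
def brier_sketch(y, yhat):
    # Mean squared difference between predicted probability and outcome.
    # Assumes y contains 0/1 outcomes and yhat probabilities in [0, 1].
    return sum((p - t) ** 2 for t, p in zip(y, yhat)) / len(y)

print(brier_sketch([1, 1], [0.5, 0.5]))  # 0.25
```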

optunity.metrics.compute_curve(ys, decision_values, xfun, yfun, positive=True, presorted=False)[source]

Computes a curve based on contingency tables at different decision values.

Parameters:
    ys (iterable) – true labels
    decision_values (iterable) – decision values
    positive – the positive label
    xfun (callable) – function to compute x values, based on contingency tables
    yfun (callable) – function to compute y values, based on contingency tables
    presorted (bool) – whether or not ys and decision_values are already sorted
Returns:
    the resulting curve, as a list of (x, y)-tuples

optunity.metrics.contingency_table(ys, yhats, positive=True)[source]

Computes a contingency table for given predictions.

Parameters:
    ys (iterable) – true labels
    yhats (iterable) – predicted labels
    positive – the positive label
Returns:
    (TP, FP, TN, FN)
>>> ys =    [True, True, True, True, True, False]
>>> yhats = [True, True, False, False, False, True]
>>> tab = contingency_table(ys, yhats, 1)
>>> print(tab)
(2, 1, 0, 3)

optunity.metrics.contingency_tables(ys, decision_values, positive=True, presorted=False)[source]

Computes contingency tables for every unique decision value.

Parameters:
    ys (iterable) – true labels
    decision_values (iterable) – decision values (higher = stronger positive)
    positive – the positive label
    presorted (bool) – whether or not ys and decision_values are already sorted
Returns:
    a list of contingency tables (TP, FP, TN, FN) and the corresponding thresholds

Contingency tables are built based on the decision rule $decision\_value \geq threshold$.

The first contingency table corresponds with a (potentially unseen) threshold that yields all negatives.

>>> y = [0, 0, 0, 0, 1, 1, 1, 1]
>>> d = [2, 2, 1, 1, 1, 2, 3, 3]
>>> tables, thresholds = contingency_tables(y, d, 1)
>>> print(tables)
[(0, 0, 4, 4), (2, 0, 4, 2), (3, 2, 2, 1), (4, 4, 0, 0)]
>>> print(thresholds)
[None, 3, 2, 1]

optunity.metrics.error_rate(y, yhat)[source]

Returns the error rate (lower is better).

Parameters:
    y – true function values
    yhat – predicted function values
>>> error_rate([0,0,1,1], [0,0,0,1])
0.25

optunity.metrics.fbeta(y, yhat, beta, positive=True)[source]

Returns the $F_\beta$-score.

Parameters:
    y – true function values
    yhat – predicted function values
    beta (float, positive) – the value of beta to use
    positive – the positive label

$(1 + \beta^2)\frac{precision \cdot recall}{\beta^2 \cdot precision + recall}$

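A sketch of the formula, computing precision and recall from the labels first (illustrative only; the name and structure are not optunity's implementation):

```python
def fbeta_sketch(y, yhat, beta, positive=True):
    # Counts from the contingency table.
    tp = sum(1 for t, p in zip(y, yhat) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y, yhat) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y, yhat) if t == positive and p != positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 1 this reduces to the usual F1 score, the harmonic mean of precision and recall.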
optunity.metrics.logloss(y, yhat)[source]

Returns the log loss between labels and predictions.

Parameters:
    y – true function values
    yhat – predicted function values

$-\frac{1}{n}\sum_{i=1}^n\big[y_i \log \hat{y}_i + (1-y_i) \log (1-\hat{y}_i)\big]$

y must be a binary vector, e.g. elements in {True, False}
yhat must be a vector of probabilities, e.g. elements in [0, 1]

Lower is better.

Note

This loss function should only be used for probabilistic models.
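A direct sketch of the formula above (illustrative; assumes yhat probabilities are strictly inside (0, 1) so the logarithms are defined):

```python
import math

def logloss_sketch(y, yhat):
    # Negative mean log-likelihood of a Bernoulli model.
    # Assumes 0 < p < 1 for every predicted probability p.
    n = len(y)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y, yhat)) / n

print(logloss_sketch([1, 0], [0.5, 0.5]))  # ln(2) ~ 0.693
```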

optunity.metrics.mse(y, yhat)[source]

Returns the mean squared error between y and yhat.

Parameters:
    y – true function values
    yhat – predicted function values

$\frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i)^2$

Lower is better.

>>> mse([0, 0], [2, 3])
6.5

optunity.metrics.npv(y, yhat, positive=True)[source]

Returns the negative predictive value (higher is better).

Parameters:
    y – true function values
    yhat – predicted function values
    positive – the positive label
Returns:
    number of true negative predictions / number of negative predictions

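The ratio above can be sketched directly from counts (illustrative, not optunity's implementation; it does not guard against zero negative predictions):

```python
def npv_sketch(y, yhat, positive=True):
    # True negatives divided by all negative predictions.
    tn = sum(1 for t, p in zip(y, yhat) if t != positive and p != positive)
    predicted_neg = sum(1 for p in yhat if p != positive)
    return tn / predicted_neg
```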
optunity.metrics.pr_auc(ys, yhat, positive=True, presorted=False, return_curve=False)[source]

Computes the area under the precision-recall curve (higher is better).

Parameters:
    ys – true function values
    yhat – predicted function values
    positive – the positive label
    presorted (bool) – whether or not ys and yhat are already sorted
    return_curve (bool) – whether or not the curve should be returned
>>> pr_auc([0, 0, 1, 1], [0, 0, 1, 1], 1)
1.0

>>> round(pr_auc([0,0,1,1], [0,1,1,2], 1), 2)
0.92


Note

Precision is undefined at recall = 0. In this case, we set precision equal to the precision that was obtained at the lowest non-zero recall.

optunity.metrics.precision(y, yhat, positive=True)[source]

Returns the precision (higher is better).

Parameters:
    y – true function values
    yhat – predicted function values
    positive – the positive label
Returns:
    number of true positive predictions / number of positive predictions

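As with npv, this is a simple ratio of counts; a hedged sketch (illustrative only, no guard against zero positive predictions):

```python
def precision_sketch(y, yhat, positive=True):
    # True positives divided by all positive predictions.
    tp = sum(1 for t, p in zip(y, yhat) if t == positive and p == positive)
    predicted_pos = sum(1 for p in yhat if p == positive)
    return tp / predicted_pos
```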
optunity.metrics.pu_score(y, yhat)[source]

Returns a score used for PU learning as introduced in [LEE2003].

Parameters:
    y – true function values
    yhat – predicted function values

$\frac{P(\hat{y}=1 | y=1)^2}{P(\hat{y}=1)}$

y and yhat must be boolean vectors.

Higher is better.

 [LEE2003] Wee Sun Lee and Bing Liu. Learning with positive and unlabeled examples using weighted logistic regression. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003.
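The formula is recall squared divided by the fraction of positive predictions; a sketch under the assumption that y and yhat are boolean vectors (illustrative, not optunity's implementation):

```python
def pu_score_sketch(y, yhat):
    # recall^2 / P(yhat = 1), estimated from the sample.
    # Assumes y and yhat are boolean vectors with at least one True in y.
    n = len(y)
    recall = sum(1 for t, p in zip(y, yhat) if t and p) / sum(y)
    prob_pos = sum(yhat) / n
    return recall ** 2 / prob_pos
```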
optunity.metrics.r_squared(y, yhat)[source]

Returns the R squared statistic, also known as coefficient of determination (higher is better).

Parameters:
    y – true function values
    yhat – predicted function values

$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

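A direct sketch of the formula (illustrative; assumes y is not constant, so the denominator is nonzero):

```python
def r_squared_sketch(y, yhat):
    # R^2 = 1 - SS_res / SS_tot
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, yhat))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

print(r_squared_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```

Predicting the mean of y for every point yields R^2 = 0; a perfect fit yields R^2 = 1.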
optunity.metrics.recall(y, yhat, positive=True)[source]

Returns the recall (higher is better).

Parameters:
    y – true function values
    yhat – predicted function values
    positive – the positive label
Returns:
    number of true positive predictions / number of actual positives

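A sketch mirroring the precision one, but normalizing by the actual positives rather than the positive predictions (illustrative only):

```python
def recall_sketch(y, yhat, positive=True):
    # True positives divided by all actual positives.
    tp = sum(1 for t, p in zip(y, yhat) if t == positive and p == positive)
    actual_pos = sum(1 for t in y if t == positive)
    return tp / actual_pos
```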
optunity.metrics.roc_auc(ys, yhat, positive=True, presorted=False, return_curve=False)[source]

Computes the area under the receiver operating characteristic curve (higher is better).

Parameters:
    ys – true function values
    yhat – predicted function values
    positive – the positive label
    presorted (bool) – whether or not ys and yhat are already sorted
    return_curve (bool) – whether or not the curve should be returned
>>> roc_auc([0, 0, 1, 1], [0, 0, 1, 1], 1)
1.0

>>> roc_auc([0,0,1,1], [0,1,1,2], 1)
0.875