Quality metrics
optunity.metrics.absolute_error(y, yhat)
Returns the maximal absolute error between y and yhat.
Parameters: - y – true function values
- yhat – predicted function values
Lower is better.
>>> absolute_error([0, 1, 2, 3], [0, 0, 1, 1])
2.0
optunity.metrics.accuracy(y, yhat)
Returns the accuracy. Higher is better.
Parameters: - y – true function values
- yhat – predicted function values
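As a rough illustration (not optunity's actual implementation), accuracy is simply the fraction of predictions that match the true values:

```python
def accuracy(y, yhat):
    # Fraction of predictions that exactly match the true values.
    return sum(1.0 for t, p in zip(y, yhat) if t == p) / len(y)

accuracy([0, 0, 1, 1], [0, 0, 0, 1])  # 0.75
```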
optunity.metrics.auc(curve)
Computes the area under the specified curve.
Parameters: curve ([(x, y), ...]) – a curve, specified as a list of (x, y) tuples
See also optunity.score_functions.compute_curve()
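A minimal sketch of this computation (assuming trapezoidal integration over the sorted curve; not necessarily how the library implements it):

```python
def auc(curve):
    # Sort by x, then integrate with the trapezoidal rule.
    curve = sorted(curve)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

auc([(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)])  # 0.75
```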
optunity.metrics.brier(y, yhat, positive=True)
Returns the Brier score between y and yhat.
Parameters: - y – true function values
- yhat – predicted function values
Returns: \[\frac{1}{n} \sum_{i=1}^n \big[(\hat{y}_i-y_i)^2\big]\]
yhat must be a vector of probabilities, i.e. elements in [0, 1].
Lower is better.
Note
This loss function should only be used for probabilistic models.
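The Brier score can be sketched in a few lines of plain Python (for illustration only, not the library's code):

```python
def brier(y, yhat):
    # Mean squared difference between predicted probabilities and outcomes.
    return sum((p - t) ** 2 for t, p in zip(y, yhat)) / len(y)

brier([0, 1], [0.0, 0.5])  # 0.125
```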
optunity.metrics.compute_curve(ys, decision_values, xfun, yfun, positive=True, presorted=False)
Computes a curve based on contingency tables at different decision values.
Parameters: - ys (iterable) – true labels
- decision_values (iterable) – decision values
- positive – positive label
- xfun (callable) – function to compute x values, based on contingency tables
- yfun (callable) – function to compute y values, based on contingency tables
- presorted (bool) – whether or not ys and decision_values are already sorted
Returns: the resulting curve, as a list of (x, y)-tuples
optunity.metrics.contingency_table(ys, yhats, positive=True)
Computes a contingency table for given predictions.
Parameters: - ys (iterable) – true labels
- yhats (iterable) – predicted labels
- positive – the positive label
Returns: TP, FP, TN, FN
>>> ys = [True, True, True, True, True, False]
>>> yhats = [True, True, False, False, False, True]
>>> tab = contingency_table(ys, yhats, 1)
>>> print(tab)
(2, 1, 0, 3)
optunity.metrics.contingency_tables(ys, decision_values, positive=True, presorted=False)
Computes contingency tables for every unique decision value.
Parameters: - ys (iterable) – true labels
- decision_values (iterable) – decision values (higher = stronger positive)
- positive – the positive label
- presorted (bool) – whether or not ys and decision_values are already sorted
Returns: a list of contingency tables (TP, FP, TN, FN) and the corresponding thresholds.
Contingency tables are built based on decision \(decision\_value \geq threshold\).
The first contingency table corresponds with a (potentially unseen) threshold that yields all negatives.
>>> y = [0, 0, 0, 0, 1, 1, 1, 1]
>>> d = [2, 2, 1, 1, 1, 2, 3, 3]
>>> tables, thresholds = contingency_tables(y, d, 1)
>>> print(tables)
[(0, 0, 4, 4), (2, 0, 4, 2), (3, 2, 2, 1), (4, 4, 0, 0)]
>>> print(thresholds)
[None, 3, 2, 1]
optunity.metrics.error_rate(y, yhat)
Returns the error rate (lower is better).
Parameters: - y – true function values
- yhat – predicted function values
>>> error_rate([0, 0, 1, 1], [0, 0, 0, 1])
0.25
optunity.metrics.fbeta(y, yhat, beta, positive=True)
Returns the \(F_\beta\)-score.
Parameters: - y – true function values
- yhat – predicted function values
- beta (float (positive)) – the value for beta to be used
- positive – the positive label
Returns: \[(1 + \beta^2)\frac{precision\cdot recall}{\beta^2\cdot precision+recall}\]
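A self-contained sketch of the \(F_\beta\) computation from counts (illustrative only; it assumes no degenerate cases such as zero predicted positives):

```python
def fbeta(y, yhat, beta, positive=True):
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y, yhat) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y, yhat) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y, yhat) if t == positive and p != positive)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * prec * rec / (b2 * prec + rec)

fbeta([1, 1, 1, 0], [1, 1, 0, 0], beta=1)  # 0.8 (the F1 score)
```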
optunity.metrics.logloss(y, yhat)
Returns the log loss between labels and predictions.
Parameters: - y – true function values
- yhat – predicted function values
Returns: \[-\frac{1}{n}\sum_{i=1}^n\big[y_i \times \log \hat{y}_i+(1-y_i) \times \log (1-\hat{y}_i)\big]\]
y must be a binary vector, e.g. elements in {True, False}.
yhat must be a vector of probabilities, e.g. elements in [0, 1].
Lower is better.
Note
This loss function should only be used for probabilistic models.
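The formula translates directly into Python; this sketch (not the library's implementation) assumes probabilities strictly inside (0, 1) to keep the logarithms finite:

```python
import math

def logloss(y, yhat):
    # Average negative log-likelihood of the true labels under yhat.
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y, yhat)) / len(y)

logloss([1, 0], [0.9, 0.1])  # ~0.105
```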
optunity.metrics.mse(y, yhat)
Returns the mean squared error between y and yhat.
Parameters: - y – true function values
- yhat – predicted function values
Returns: \[\frac{1}{n} \sum_{i=1}^n \big[(\hat{y}_i-y_i)^2\big]\]
Lower is better.
>>> mse([0, 0], [2, 3])
6.5
optunity.metrics.npv(y, yhat, positive=True)
Returns the negative predictive value (higher is better).
Parameters: - y – true function values
- yhat – predicted function values
- positive – the positive label
Returns: number of true negative predictions / number of negative predictions
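A minimal count-based sketch of this ratio (illustrative; assumes at least one negative prediction):

```python
def npv(y, yhat, positive=True):
    # TN / (TN + FN): how trustworthy a negative prediction is.
    tn = sum(1 for t, p in zip(y, yhat) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y, yhat) if t == positive and p != positive)
    return tn / (tn + fn)

npv([0, 0, 1, 1], [0, 0, 0, 1])  # 2/3: two TN, one FN
```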
optunity.metrics.pr_auc(ys, yhat, positive=True, presorted=False, return_curve=False)
Computes the area under the precision-recall curve (higher is better).
Parameters: - ys – true function values
- yhat – predicted function values
- positive – the positive label
- presorted (bool) – whether or not ys and yhat are already sorted
- return_curve (bool) – whether or not the curve should be returned
>>> pr_auc([0, 0, 1, 1], [0, 0, 1, 1], 1)
1.0
>>> round(pr_auc([0, 0, 1, 1], [0, 1, 1, 2], 1), 2)
0.92
Note
Precision is undefined at recall = 0. In this case, we set precision equal to the precision that was obtained at the lowest non-zero recall.
optunity.metrics.precision(y, yhat, positive=True)
Returns the precision (higher is better).
Parameters: - y – true function values
- yhat – predicted function values
- positive – the positive label
Returns: number of true positive predictions / number of positive predictions
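The same counting pattern gives precision (a sketch, not the library's code; assumes at least one positive prediction):

```python
def precision(y, yhat, positive=True):
    # TP / (TP + FP): how trustworthy a positive prediction is.
    tp = sum(1 for t, p in zip(y, yhat) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y, yhat) if t != positive and p == positive)
    return tp / (tp + fp)

precision([0, 1, 1, 1], [1, 1, 1, 0])  # 2/3: two TP, one FP
```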
optunity.metrics.pu_score(y, yhat)
Returns a score used for PU learning as introduced in [LEE2003].
Parameters: - y – true function values
- yhat – predicted function values
Returns: \[\frac{P(\hat{y}=1 | y=1)^2}{P(\hat{y}=1)}\]
y and yhat must be boolean vectors.
Higher is better.
[LEE2003] Wee Sun Lee and Bing Liu. Learning with positive and unlabeled examples using weighted logistic regression. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003.
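Reading the probabilities as empirical frequencies, the score could be sketched as follows (an illustration under that assumption, not optunity's implementation):

```python
def pu_score(y, yhat):
    # recall on the positives, squared, over the fraction predicted positive
    rec = sum(1 for t, p in zip(y, yhat) if t and p) / sum(1 for t in y if t)
    frac_pred_pos = sum(1 for p in yhat if p) / len(yhat)
    return rec ** 2 / frac_pred_pos

pu_score([True, True, False, False], [True, False, True, False])  # 0.5
```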
optunity.metrics.r_squared(y, yhat)
Returns the R squared statistic, also known as the coefficient of determination (higher is better).
Parameters: - y – true function values
- yhat – predicted function values
Returns: \[R^2 = 1-\frac{SS_{res}}{SS_{tot}} = 1-\frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}\]
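The formula maps directly onto a few lines of Python (a sketch, not the library's code; assumes y is not constant, so \(SS_{tot} > 0\)):

```python
def r_squared(y, yhat):
    # 1 - (residual sum of squares) / (total sum of squares about the mean)
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, yhat))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

r_squared([1, 2, 3], [2, 2, 2])  # 0.0: no better than predicting the mean
```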
optunity.metrics.recall(y, yhat, positive=True)
Returns the recall (higher is better).
Parameters: - y – true function values
- yhat – predicted function values
- positive – the positive label
Returns: number of true positive predictions / number of actual positive instances
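Recall follows the same counting pattern as precision, but with false negatives in the denominator (a sketch, assuming at least one actual positive):

```python
def recall(y, yhat, positive=True):
    # TP / (TP + FN): fraction of actual positives that were found.
    tp = sum(1 for t, p in zip(y, yhat) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y, yhat) if t == positive and p != positive)
    return tp / (tp + fn)

recall([1, 1, 1, 0], [1, 0, 0, 0])  # 1/3: one of three positives found
```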
optunity.metrics.roc_auc(ys, yhat, positive=True, presorted=False, return_curve=False)
Computes the area under the receiver operating characteristic curve (higher is better).
Parameters: - ys – true function values
- yhat – predicted function values
- positive – the positive label
- presorted (bool) – whether or not ys and yhat are already sorted
- return_curve (bool) – whether or not the curve should be returned
>>> roc_auc([0, 0, 1, 1], [0, 0, 1, 1], 1)
1.0
>>> roc_auc([0, 0, 1, 1], [0, 1, 1, 2], 1)
0.875