sklearn: SVM classification

In this example we will use Optunity to optimize hyperparameters for a support vector machine classifier (SVC) in scikit-learn. We will build models to distinguish the digits 8 and 9 in scikit-learn's handwritten digits data set, in two settings:

  • tune an SVM with an RBF kernel
  • tune an SVM with an RBF, polynomial, or linear kernel, i.e., choose the kernel function and its hyperparameters at once

import optunity
import optunity.metrics

import sklearn.svm
import numpy as np

Create the data set: we load scikit-learn's handwritten digits data set and build models to distinguish digits 8 and 9.

from sklearn.datasets import load_digits
digits = load_digits()
n = digits.data.shape[0]

positive_digit = 8
negative_digit = 9

positive_idx = [i for i in range(n) if digits.target[i] == positive_digit]
negative_idx = [i for i in range(n) if digits.target[i] == negative_digit]

# add some noise to the data to make the problem a little more challenging
original_data = digits.data[positive_idx + negative_idx, ...]
data = original_data + 5 * np.random.randn(original_data.shape[0], original_data.shape[1])
labels = [True] * len(positive_idx) + [False] * len(negative_idx)
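
To sanity-check the data we just created, we can inspect its shape and class balance; a quick check that is not part of the original example:

print(data.shape)                 # roughly 350 samples with 64 pixel features each
print(sum(labels), len(labels))   # number of positive examples vs. total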

First, let's see how an SVC with default hyperparameters performs.

# compute area under ROC curve of default parameters
@optunity.cross_validated(x=data, y=labels, num_folds=5)
def svm_default_auroc(x_train, y_train, x_test, y_test):
    model = sklearn.svm.SVC().fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    auc = optunity.metrics.roc_auc(y_test, decision_values)
    return auc

svm_default_auroc()
0.7328666183635757

Tune SVC with RBF kernel

In order to use Optunity to optimize hyperparameters, we start by defining the objective function. We will use 5-fold cross-validated area under the ROC curve (AUROC). For now, let's restrict ourselves to the RBF kernel and optimize \(C\) and \(\gamma\).

We start by defining the objective function svm_rbf_tuned_auroc(), which accepts \(C\) and \(\log_{10} \gamma\) (the argument logGamma). Searching over \(\log \gamma\) rather than \(\gamma\) itself is convenient because useful values of \(\gamma\) span several orders of magnitude.

# we will make the cross-validation decorator once, so we can reuse it later for the other tuning task
# by reusing the decorator, we get the same folds etc.
cv_decorator = optunity.cross_validated(x=data, y=labels, num_folds=5)

def svm_rbf_tuned_auroc(x_train, y_train, x_test, y_test, C, logGamma):
    model = sklearn.svm.SVC(C=C, gamma=10 ** logGamma).fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    auc = optunity.metrics.roc_auc(y_test, decision_values)
    return auc

svm_rbf_tuned_auroc = cv_decorator(svm_rbf_tuned_auroc)
# this is equivalent to the more common syntax below
# @optunity.cross_validated(x=data, y=labels, num_folds=5)
# def svm_rbf_tuned_auroc...

svm_rbf_tuned_auroc(C=1.0, logGamma=0.0)
0.5

With these arbitrary hyperparameters (\(C = 1\), \(\gamma = 1\)) the model does no better than random guessing. Now we can use Optunity to find the hyperparameters that maximize AUROC.

optimal_rbf_pars, info, _ = optunity.maximize(svm_rbf_tuned_auroc, num_evals=150, C=[0, 10], logGamma=[-5, 0])
# when running this outside of IPython we can parallelize via optunity.pmap
# optimal_rbf_pars, _, _ = optunity.maximize(svm_rbf_tuned_auroc, num_evals=150, C=[0, 10], logGamma=[-5, 0], pmap=optunity.pmap)

print("Optimal parameters: " + str(optimal_rbf_pars))
print("AUROC of tuned SVM with RBF kernel: %1.3f" % info.optimum)
Optimal parameters: {'logGamma': -3.0716796875000005, 'C': 3.3025997497032007}
AUROC of tuned SVM with RBF kernel: 0.987
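
If we want a model to use afterwards, a natural next step (not shown in the original example) is to retrain on the full data set with the tuned hyperparameters; remember that the tuned value is stored as logGamma:

# retrain on all data using the tuned hyperparameters (a sketch)
optimal_model = sklearn.svm.SVC(C=optimal_rbf_pars['C'],
                                gamma=10 ** optimal_rbf_pars['logGamma']).fit(data, labels)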

We can turn the call log into a pandas dataframe to efficiently inspect the solver trace.

import pandas
df = optunity.call_log2dataframe(info.call_log)
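
Before looking at the raw numbers, a quick scatter plot of the trace shows where the solver concentrated its evaluations; a minimal sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt
plt.scatter(df['logGamma'], df['value'], c=df['C'])
plt.xlabel('logGamma')
plt.ylabel('AUROC')
plt.colorbar(label='C')
plt.show()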

Let's look at the 10 best sets of hyperparameters to make sure the results are somewhat stable.

df.sort_values('value', ascending=False)[:10]
            C   logGamma     value
149  3.822811  -3.074680  0.987413
 92  3.302600  -3.071680  0.987413
145  3.259690  -3.033531  0.987252
 14  3.542839  -3.080013  0.987237
131  3.232732  -3.080968  0.987237
 53  7.328411  -3.103471  0.987237
 70  3.632562  -3.088346  0.987237
146  3.067660  -3.091143  0.987237
124  2.566381  -3.114649  0.987237
100  3.340268  -3.092535  0.987237

Tune SVC without deciding the kernel in advance

In the previous part we chose to use an RBF kernel. Even though the RBF kernel is known to work well for a large variety of problems (and yielded good accuracy here), our choice was somewhat arbitrary.

We will now use Optunity’s conditional hyperparameter optimization feature to optimize over all kernel functions and their associated hyperparameters at once. This requires us to define the search space.

space = {'kernel': {'linear': {'C': [0, 2]},
                    'rbf': {'logGamma': [-5, 0], 'C': [0, 10]},
                    'poly': {'degree': [2, 5], 'C': [0, 5], 'coef0': [0, 2]}
                    }
         }

We will also have to modify the objective function to cope with conditional hyperparameters. The reason we need to do this explicitly is that scikit-learn does not accept None values for irrelevant hyperparameters (e.g. degree when using an RBF kernel). Optunity sets every hyperparameter that is irrelevant in a given evaluation to None.

def train_model(x_train, y_train, kernel, C, logGamma, degree, coef0):
    """A generic SVM training function, with arguments based on the chosen kernel."""
    if kernel == 'linear':
        model = sklearn.svm.SVC(kernel=kernel, C=C)
    elif kernel == 'poly':
        model = sklearn.svm.SVC(kernel=kernel, C=C, degree=degree, coef0=coef0)
    elif kernel == 'rbf':
        model = sklearn.svm.SVC(kernel=kernel, C=C, gamma=10 ** logGamma)
    else:
        raise ValueError("Unknown kernel function: %s" % kernel)
    model.fit(x_train, y_train)
    return model

def svm_tuned_auroc(x_train, y_train, x_test, y_test, kernel='linear', C=0, logGamma=0, degree=0, coef0=0):
    model = train_model(x_train, y_train, kernel, C, logGamma, degree, coef0)
    decision_values = model.decision_function(x_test)
    return optunity.metrics.roc_auc(y_test, decision_values)

svm_tuned_auroc = cv_decorator(svm_tuned_auroc)
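
As a quick sanity check of the plumbing, we can evaluate one arbitrary (hypothetical) configuration per kernel; the decorated function only takes hyperparameters, since cross-validation supplies the data:

print(svm_tuned_auroc(kernel='linear', C=1.0))
print(svm_tuned_auroc(kernel='rbf', C=1.0, logGamma=-3))
print(svm_tuned_auroc(kernel='poly', C=1.0, degree=2, coef0=1.0))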

Now we are ready to go and optimize both kernel function and associated hyperparameters!

optimal_svm_pars, info, _ = optunity.maximize_structured(svm_tuned_auroc, space, num_evals=150)
print("Optimal parameters" + str(optimal_svm_pars))
print("AUROC of tuned SVM: %1.3f" % info.optimum)
Optimal parameters{'kernel': 'rbf', 'C': 3.634209495387873, 'coef0': None, 'degree': None, 'logGamma': -3.6018043228483627}
AUROC of tuned SVM: 0.990
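
Since train_model() already ignores the irrelevant (None) hyperparameters of the chosen kernel, we can rebuild the tuned model on the full data set directly from the optimal parameter dictionary; a sketch that is not part of the original example:

tuned_model = train_model(data, labels, **optimal_svm_pars)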

Again, we can have a look at the best sets of hyperparameters based on the call log.

df = optunity.call_log2dataframe(info.call_log)
df.sort_values('value', ascending=False)
            C  coef0  degree  kernel   logGamma     value
147  3.806445    NaN     NaN     rbf  -3.594290  0.990134
124  3.634209    NaN     NaN     rbf  -3.601804  0.990134
144  4.350397    NaN     NaN     rbf  -3.539531  0.990128
 82  5.998112    NaN     NaN     rbf  -3.611495  0.989975
 75  2.245622    NaN     NaN     rbf  -3.392871  0.989965
139  4.462613    NaN     NaN     rbf  -3.391728  0.989965
111  2.832370    NaN     NaN     rbf  -3.384538  0.989965
 92  5.531445    NaN     NaN     rbf  -3.378162  0.989965
121  3.299037    NaN     NaN     rbf  -3.617871  0.989818
 99  2.812451    NaN     NaN     rbf  -3.547038  0.989810
129  4.212451    NaN     NaN     rbf  -3.518478  0.989809
135  3.921212    NaN     NaN     rbf  -3.422389  0.989800
 90  3.050174    NaN     NaN     rbf  -3.431659  0.989800
103  3.181445    NaN     NaN     rbf  -3.525796  0.989650
 93  2.714779    NaN     NaN     rbf  -3.292463  0.989641
 89  2.345784    NaN     NaN     rbf  -3.313704  0.989641
149  3.995946    NaN     NaN     rbf  -3.303042  0.989641
100  3.516840    NaN     NaN     rbf  -3.664992  0.989500
119  3.745784    NaN     NaN     rbf  -3.678403  0.989500
125  4.387879    NaN     NaN     rbf  -3.486348  0.989485
 24  1.914779    NaN     NaN     rbf  -3.476204  0.989484
136  5.865572    NaN     NaN     rbf  -3.226204  0.989483
 80  2.583507    NaN     NaN     rbf  -3.198326  0.989482
146  5.398905    NaN     NaN     rbf  -3.459538  0.989325
102  5.558878    NaN     NaN     rbf  -3.467218  0.989325
108  2.721828    NaN     NaN     rbf  -3.463704  0.989325
 98  2.255162    NaN     NaN     rbf  -3.230371  0.989324
 64  1.686680    NaN     NaN     rbf  -3.240209  0.989320
140  3.965939    NaN     NaN     rbf  -3.241095  0.989320
 34  2.381445    NaN     NaN     rbf  -3.242871  0.989320
 ..       ...    ...     ...     ...        ...       ...
 68  1.608145    NaN     NaN     rbf  -2.530371  0.979475
106  5.681445    NaN     NaN     rbf  -2.526204  0.979156
 50  1.477928    NaN     NaN     rbf  -2.498326  0.977076
 35  2.081445    NaN     NaN     rbf  -2.459538  0.974526
 15  3.014779    NaN     NaN     rbf  -2.459538  0.974526
 71  1.464779    NaN     NaN     rbf  -2.451204  0.973405
 49  2.239779    NaN     NaN     rbf  -2.380371  0.969723
  9  4.106445    NaN     NaN     rbf  -2.380371  0.969723
 53  3.648112    NaN     NaN     rbf  -2.359129  0.968756
 17  0.131419    NaN     NaN  linear        NaN  0.967925
  6  1.913086    NaN     NaN  linear        NaN  0.967925
 26  1.726419    NaN     NaN  linear        NaN  0.967925
  7  0.038086    NaN     NaN  linear        NaN  0.967925
 27  0.224753    NaN     NaN  linear        NaN  0.967925
 16  1.819753    NaN     NaN  linear        NaN  0.967925
 37  0.318086    NaN     NaN  linear        NaN  0.967925
 58  2.074811    NaN     NaN     rbf  -2.297038  0.964444
 61  1.931445    NaN     NaN     rbf  -2.217871  0.960290
 19  3.639779    NaN     NaN     rbf  -2.147038  0.958086
 39  2.706445    NaN     NaN     rbf  -2.147038  0.958086
 43  4.114779    NaN     NaN     rbf  -2.125796  0.957296
 48  2.541478    NaN     NaN     rbf  -2.063704  0.954737
 51  2.398112    NaN     NaN     rbf  -1.984538  0.951944
 29  3.173112    NaN     NaN     rbf  -1.913704  0.942719
 41  2.864779    NaN     NaN     rbf  -1.751204  0.634160
 11  4.264779    NaN     NaN     rbf  -1.051204  0.500000
 31  3.331445    NaN     NaN     rbf  -1.517871  0.500000
  1  4.731445    NaN     NaN     rbf  -0.817871  0.500000
  8  1.606445    NaN     NaN     rbf  -1.130371  0.500000
 21  3.798112    NaN     NaN     rbf  -1.284538  0.500000

150 rows × 6 columns