sklearn: SVM classification

In this example we will use Optunity to optimize hyperparameters for a support vector machine classifier (SVC) in scikit-learn. We will build models to distinguish the digits 8 and 9 in scikit-learn's handwritten digits data set, in two settings:

  • tune an SVM with an RBF kernel
  • tune an SVM with an RBF, polynomial, or linear kernel, i.e., choose the kernel function and its hyperparameters at once

import optunity
import optunity.metrics

import sklearn.svm
import numpy as np

Create the data set: we load scikit-learn's handwritten digits data set and build models to distinguish digits 8 and 9.

from sklearn.datasets import load_digits
digits = load_digits()
n = digits.data.shape[0]

positive_digit = 8
negative_digit = 9

positive_idx = [i for i in range(n) if digits.target[i] == positive_digit]
negative_idx = [i for i in range(n) if digits.target[i] == negative_digit]

# add some noise to the data to make the problem a little more challenging
original_data = digits.data[positive_idx + negative_idx, ...]
data = original_data + 5 * np.random.randn(original_data.shape[0], original_data.shape[1])
labels = [True] * len(positive_idx) + [False] * len(negative_idx)
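
To sanity-check the data we just created, we can inspect its shape and class balance; a quick check that is not part of the original example:

print(data.shape)                 # roughly 350 samples with 64 pixel features each
print(sum(labels), len(labels))   # number of positive examples vs. total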

First, let's see how an SVC with default hyperparameters performs.

# compute area under ROC curve of default parameters
@optunity.cross_validated(x=data, y=labels, num_folds=5)
def svm_default_auroc(x_train, y_train, x_test, y_test):
    model = sklearn.svm.SVC().fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    auc = optunity.metrics.roc_auc(y_test, decision_values)
    return auc

svm_default_auroc()
0.7328666183635757

Tune SVC with RBF kernel

In order to use Optunity to optimize hyperparameters, we start by defining the objective function. We will use 5-fold cross-validated area under the ROC curve (AUROC). For now, let's restrict ourselves to the RBF kernel and optimize \(C\) and \(\gamma\).

We start by defining the objective function svm_rbf_tuned_auroc(), which accepts \(C\) and \(\log_{10} \gamma\) (the argument logGamma). Searching over \(\log \gamma\) rather than \(\gamma\) itself is convenient because useful values of \(\gamma\) span several orders of magnitude.

# we will make the cross-validation decorator once, so we can reuse it later for the other tuning task
# by reusing the decorator, we get the same folds etc.
cv_decorator = optunity.cross_validated(x=data, y=labels, num_folds=5)

def svm_rbf_tuned_auroc(x_train, y_train, x_test, y_test, C, logGamma):
    model = sklearn.svm.SVC(C=C, gamma=10 ** logGamma).fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    auc = optunity.metrics.roc_auc(y_test, decision_values)
    return auc

svm_rbf_tuned_auroc = cv_decorator(svm_rbf_tuned_auroc)
# this is equivalent to the more common syntax below
# @optunity.cross_validated(x=data, y=labels, num_folds=5)
# def svm_rbf_tuned_auroc...

svm_rbf_tuned_auroc(C=1.0, logGamma=0.0)
0.5

With these arbitrary hyperparameters (\(C = 1\), \(\gamma = 1\)) the model does no better than random guessing. Now we can use Optunity to find the hyperparameters that maximize AUROC.

optimal_rbf_pars, info, _ = optunity.maximize(svm_rbf_tuned_auroc, num_evals=150, C=[0, 10], logGamma=[-5, 0])
# when running this outside of IPython we can parallelize via optunity.pmap
# optimal_rbf_pars, _, _ = optunity.maximize(svm_rbf_tuned_auroc, num_evals=150, C=[0, 10], logGamma=[-5, 0], pmap=optunity.pmap)

print("Optimal parameters: " + str(optimal_rbf_pars))
print("AUROC of tuned SVM with RBF kernel: %1.3f" % info.optimum)
Optimal parameters: {'logGamma': -3.0716796875000005, 'C': 3.3025997497032007}
AUROC of tuned SVM with RBF kernel: 0.987
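
If we want a model to use afterwards, a natural next step (not shown in the original example) is to retrain on the full data set with the tuned hyperparameters; remember that the tuned value is stored as logGamma:

# retrain on all data using the tuned hyperparameters (a sketch)
optimal_model = sklearn.svm.SVC(C=optimal_rbf_pars['C'],
                                gamma=10 ** optimal_rbf_pars['logGamma']).fit(data, labels)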

We can turn the call log into a pandas dataframe to efficiently inspect the solver trace.

import pandas
df = optunity.call_log2dataframe(info.call_log)
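
Before looking at the raw numbers, a quick scatter plot of the trace shows where the solver concentrated its evaluations; a minimal sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt
plt.scatter(df['logGamma'], df['value'], c=df['C'])
plt.xlabel('logGamma')
plt.ylabel('AUROC')
plt.colorbar(label='C')
plt.show()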

Let's look at the 10 best sets of hyperparameters to make sure the results are somewhat stable.

df.sort_values('value', ascending=False)[:10]
            C   logGamma     value
149  3.822811  -3.074680  0.987413
 92  3.302600  -3.071680  0.987413
145  3.259690  -3.033531  0.987252
 14  3.542839  -3.080013  0.987237
131  3.232732  -3.080968  0.987237
 53  7.328411  -3.103471  0.987237
 70  3.632562  -3.088346  0.987237
146  3.067660  -3.091143  0.987237
124  2.566381  -3.114649  0.987237
100  3.340268  -3.092535  0.987237

Tune SVC without deciding the kernel in advance

In the previous part we chose to use an RBF kernel. Even though the RBF kernel is known to work well for a large variety of problems (and yielded good accuracy here), our choice was somewhat arbitrary.

We will now use Optunity’s conditional hyperparameter optimization feature to optimize over all kernel functions and their associated hyperparameters at once. This requires us to define the search space.

space = {'kernel': {'linear': {'C': [0, 2]},
                    'rbf': {'logGamma': [-5, 0], 'C': [0, 10]},
                    'poly': {'degree': [2, 5], 'C': [0, 5], 'coef0': [0, 2]}
                    }
         }

We will also have to modify the objective function to cope with conditional hyperparameters. The reason we need to do this explicitly is that scikit-learn does not accept None values for irrelevant hyperparameters (e.g. degree when using an RBF kernel). Optunity sets every hyperparameter that is irrelevant in a given evaluation to None.

def train_model(x_train, y_train, kernel, C, logGamma, degree, coef0):
    """A generic SVM training function, with arguments based on the chosen kernel."""
    if kernel == 'linear':
        model = sklearn.svm.SVC(kernel=kernel, C=C)
    elif kernel == 'poly':
        model = sklearn.svm.SVC(kernel=kernel, C=C, degree=degree, coef0=coef0)
    elif kernel == 'rbf':
        model = sklearn.svm.SVC(kernel=kernel, C=C, gamma=10 ** logGamma)
    else:
        raise ValueError("Unknown kernel function: %s" % kernel)
    model.fit(x_train, y_train)
    return model

def svm_tuned_auroc(x_train, y_train, x_test, y_test, kernel='linear', C=0, logGamma=0, degree=0, coef0=0):
    model = train_model(x_train, y_train, kernel, C, logGamma, degree, coef0)
    decision_values = model.decision_function(x_test)
    return optunity.metrics.roc_auc(y_test, decision_values)

svm_tuned_auroc = cv_decorator(svm_tuned_auroc)
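
As a quick sanity check of the plumbing, we can evaluate one arbitrary (hypothetical) configuration per kernel; the decorated function only takes hyperparameters, since cross-validation supplies the data:

print(svm_tuned_auroc(kernel='linear', C=1.0))
print(svm_tuned_auroc(kernel='rbf', C=1.0, logGamma=-3))
print(svm_tuned_auroc(kernel='poly', C=1.0, degree=2, coef0=1.0))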

Now we are ready to go and optimize both kernel function and associated hyperparameters!

optimal_svm_pars, info, _ = optunity.maximize_structured(svm_tuned_auroc, space, num_evals=150)
print("Optimal parameters" + str(optimal_svm_pars))
print("AUROC of tuned SVM: %1.3f" % info.optimum)
Optimal parameters{'kernel': 'rbf', 'C': 3.634209495387873, 'coef0': None, 'degree': None, 'logGamma': -3.6018043228483627}
AUROC of tuned SVM: 0.990
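
Since train_model() already ignores the irrelevant (None) hyperparameters of the chosen kernel, we can rebuild the tuned model on the full data set directly from the optimal parameter dictionary; a sketch that is not part of the original example:

tuned_model = train_model(data, labels, **optimal_svm_pars)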

Again, we can have a look at the best sets of hyperparameters based on the call log.

df = optunity.call_log2dataframe(info.call_log)
df.sort_values('value', ascending=False)
            C  coef0  degree  kernel   logGamma     value
147  3.806445    NaN     NaN     rbf  -3.594290  0.990134
124  3.634209    NaN     NaN     rbf  -3.601804  0.990134
144  4.350397    NaN     NaN     rbf  -3.539531  0.990128
 82  5.998112    NaN     NaN     rbf  -3.611495  0.989975
 75  2.245622    NaN     NaN     rbf  -3.392871  0.989965
139  4.462613    NaN     NaN     rbf  -3.391728  0.989965
111  2.832370    NaN     NaN     rbf  -3.384538  0.989965
 92  5.531445    NaN     NaN     rbf  -3.378162  0.989965
121  3.299037    NaN     NaN     rbf  -3.617871  0.989818
 99  2.812451    NaN     NaN     rbf  -3.547038  0.989810
129  4.212451    NaN     NaN     rbf  -3.518478  0.989809
135  3.921212    NaN     NaN     rbf  -3.422389  0.989800
 90  3.050174    NaN     NaN     rbf  -3.431659  0.989800
103  3.181445    NaN     NaN     rbf  -3.525796  0.989650
 93  2.714779    NaN     NaN     rbf  -3.292463  0.989641
 89  2.345784    NaN     NaN     rbf  -3.313704  0.989641
149  3.995946    NaN     NaN     rbf  -3.303042  0.989641
100  3.516840    NaN     NaN     rbf  -3.664992  0.989500
119  3.745784    NaN     NaN     rbf  -3.678403  0.989500
125  4.387879    NaN     NaN     rbf  -3.486348  0.989485
 24  1.914779    NaN     NaN     rbf  -3.476204  0.989484
136  5.865572    NaN     NaN     rbf  -3.226204  0.989483
 80  2.583507    NaN     NaN     rbf  -3.198326  0.989482
146  5.398905    NaN     NaN     rbf  -3.459538  0.989325
102  5.558878    NaN     NaN     rbf  -3.467218  0.989325
108  2.721828    NaN     NaN     rbf  -3.463704  0.989325
 98  2.255162    NaN     NaN     rbf  -3.230371  0.989324
 64  1.686680    NaN     NaN     rbf  -3.240209  0.989320
140  3.965939    NaN     NaN     rbf  -3.241095  0.989320
 34  2.381445    NaN     NaN     rbf  -3.242871  0.989320
 ..       ...    ...     ...     ...        ...       ...
 68  1.608145    NaN     NaN     rbf  -2.530371  0.979475
106  5.681445    NaN     NaN     rbf  -2.526204  0.979156
 50  1.477928    NaN     NaN     rbf  -2.498326  0.977076
 35  2.081445    NaN     NaN     rbf  -2.459538  0.974526
 15  3.014779    NaN     NaN     rbf  -2.459538  0.974526
 71  1.464779    NaN     NaN     rbf  -2.451204  0.973405
 49  2.239779    NaN     NaN     rbf  -2.380371  0.969723
  9  4.106445    NaN     NaN     rbf  -2.380371  0.969723
 53  3.648112    NaN     NaN     rbf  -2.359129  0.968756
 17  0.131419    NaN     NaN  linear        NaN  0.967925
  6  1.913086    NaN     NaN  linear        NaN  0.967925
 26  1.726419    NaN     NaN  linear        NaN  0.967925
  7  0.038086    NaN     NaN  linear        NaN  0.967925
 27  0.224753    NaN     NaN  linear        NaN  0.967925
 16  1.819753    NaN     NaN  linear        NaN  0.967925
 37  0.318086    NaN     NaN  linear        NaN  0.967925
 58  2.074811    NaN     NaN     rbf  -2.297038  0.964444
 61  1.931445    NaN     NaN     rbf  -2.217871  0.960290
 19  3.639779    NaN     NaN     rbf  -2.147038  0.958086
 39  2.706445    NaN     NaN     rbf  -2.147038  0.958086
 43  4.114779    NaN     NaN     rbf  -2.125796  0.957296
 48  2.541478    NaN     NaN     rbf  -2.063704  0.954737
 51  2.398112    NaN     NaN     rbf  -1.984538  0.951944
 29  3.173112    NaN     NaN     rbf  -1.913704  0.942719
 41  2.864779    NaN     NaN     rbf  -1.751204  0.634160
 11  4.264779    NaN     NaN     rbf  -1.051204  0.500000
 31  3.331445    NaN     NaN     rbf  -1.517871  0.500000
  1  4.731445    NaN     NaN     rbf  -0.817871  0.500000
  8  1.606445    NaN     NaN     rbf  -1.130371  0.500000
 21  3.798112    NaN     NaN     rbf  -1.284538  0.500000

150 rows × 6 columns