One of our most important tasks as machine learning experts is optimising models. For artificial neural networks in particular, this task can seem very confusing. Over the coming weeks, we will give you some insight into how we work here.

Optimizing a neural network is still more art than science, even with the latest technology. There are no "recipes" that reliably determine the right structure for a neural network. Skilled data scientists draw on a wealth of experience in model selection, parameterization, and training. Advances in these areas often come from academic and commercial research institutions. Apart from the discovery of completely new algorithms, models, or techniques, however, breakthroughs are typically achieved by modifying, extending, and recombining existing structures. New, improved, and more efficient network architectures emerge from empirical insights gained through large-scale experiments. Many of the techniques used to improve neural networks today can be applied even by non-scientists. In this blog post, we take a closer look at one relatively simple optimization method.

Hyperparameter Optimization using Grid Search

Machine learning distinguishes between two types of parameters. First, there are the parameters that the model learns from the given dataset, such as the weights of a neural network. Second, there are the so-called hyperparameters.

Hyperparameters are set before training begins instead of being learned from the data. They define the properties of the chosen model, for example the activation functions used or the number of hidden layers in a neural network.
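To make the distinction concrete, here is a minimal sketch in plain NumPy. It is not a neural network and not our project code, just a straight line fitted by gradient descent: the learning rate and the number of epochs are hyperparameters chosen before training, while the slope `w` and the intercept `b` are parameters learned from the data. The function name `train_linear_model` and the toy data are purely illustrative.

```python
import numpy as np


def train_linear_model(x, y, learning_rate=0.1, epochs=200):
    """Fit y ≈ w * x + b with gradient descent on the mean squared error.

    learning_rate and epochs are hyperparameters: we choose them before training.
    w and b are parameters: gradient descent learns them from the data.
    """
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = w * x + b - y
        # gradients of the mean squared error with respect to w and b
        grad_w = 2.0 / n * np.dot(error, x)
        grad_b = 2.0 / n * error.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# toy data lying exactly on the line y = 3x + 1
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 1.0
w, b = train_linear_model(x, y, learning_rate=0.5, epochs=2000)
print(w, b)  # close to 3.0 and 1.0
```

With `learning_rate=1.0`, the same training on the same data diverges instead of converging, which illustrates why hyperparameters have to be chosen with care.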

Hyperparameters have a strong influence on the training of a neural network, above all on computing time, memory requirements and network accuracy. By running training with different configurations, we can find a good selection of hyperparameters. These repeated training runs often take a lot of time, which matters all the more if that time has to be paid for. However, many findings can also be transferred to other, similar problems. To illustrate this, let us optimise a concrete network. This network will ultimately be used to predict breast cancer on the Breast Cancer Wisconsin (Diagnostic) Data Set. (Source: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data)

The dataset contains various measurements that help to determine breast cancer, together with the diagnosis, i.e. whether a finding is actually malignant or benign. Using these properties, we want a neural network to predict the diagnosis. To do this, we split the dataset into training and test data so that we can evaluate our model only on data it has never seen. Then we determine which hyperparameters each training run will use. The parameters we vary, and which thus span our "grid" in five dimensions, are:

Number of nodes in the hidden layer: 1, 32, 64

Activation functions: linear, relu

Dropout percentage in the hidden layer: 0, 0.3, 0.6

Batch size: 4, 16, 32

Learning rate: 0.01, 0.1, 0.5, 1
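The data preparation and the size of this grid can be sketched with scikit-learn, which ships its own copy of the Wisconsin diagnostic data via `load_breast_cancer`. This is a minimal sketch, not our project code: the 80/20 split, the stratification and the dictionary keys are our assumptions. It shows how quickly a grid grows, since every combination of values is one training configuration.

```python
import itertools

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# scikit-learn's copy of the Breast Cancer Wisconsin (Diagnostic) data:
# 569 samples with 30 numeric features, labels 0 = malignant, 1 = benign
X, y = load_breast_cancer(return_X_y=True)

# hold back test data that the model never sees during the search
# (the 80/20 ratio is an assumption); stratify keeps the class ratio equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# the five hyperparameter dimensions listed above (key names are illustrative)
grid = {
    "hidden_units": [1, 32, 64],
    "activation": ["linear", "relu"],
    "dropout": [0, 0.3, 0.6],
    "batch_size": [4, 16, 32],
    "learning_rate": [0.01, 0.1, 0.5, 1],
}

# every point of the grid is one combination of values
configurations = list(itertools.product(*grid.values()))
print(len(configurations))  # 3 * 2 * 3 * 3 * 4 = 216
```

Adding a single further value to any one dimension multiplies the number of training runs accordingly, which is the main cost of grid search.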

This grid is easier to imagine in two dimensions, so let us consider only two of the parameters. The parameter space spanned by batch size and dropout would then look like this:

Implementing Grid Search with Keras

For our example, a simple neural network is sufficient. We build it with Keras, a high-level API for neural networks.

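```
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import SGD

# properties of the dataset: 30 numeric features, two classes (malignant/benign)
FEATURE_DIMENSIONALITY = 30
NUM_CLASSES = 2


def get_model(hidden_layer_units=32, activation='relu', learning_rate=0.01, dropout_value=0) -> keras.Model:
    input_layer = layers.Input(shape=(FEATURE_DIMENSIONALITY,))
    # dropout is applied to the inputs of the hidden layer
    hidden_layer = layers.Dropout(rate=dropout_value)(input_layer)
    hidden_layer = layers.Dense(units=hidden_layer_units, activation=activation)(hidden_layer)
    # one softmax output per class, so the labels must be one-hot encoded
    output_layer = layers.Dense(units=NUM_CLASSES, activation='softmax')(hidden_layer)
    model = keras.Model(inputs=[input_layer], outputs=[output_layer])
    optimizer = SGD(learning_rate=learning_rate)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy', metrics=['categorical_accuracy'])
    return model
```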

A grid search over all of these combinations can then be implemented in Python as follows:

```
import itertools
import random

import numpy as np
import tensorflow as tf

# our parameters (values from the list above)
ACTIVATIONS = ['linear', 'relu']
BATCH_SIZE = [4, 16, 32]
DROPOUT_VALUES = [0, 0.3, 0.6]
LEARNING_RATE = [0.01, 0.1, 0.5, 1]
UNIT_LIST = [1, 32, 64]
repetitions = 3  # example value, choose as needed

grid = [ACTIVATIONS, BATCH_SIZE, DROPOUT_VALUES, LEARNING_RATE, UNIT_LIST]
# create all combinations of the parameters (2 * 3 * 3 * 4 * 3 = 216)
params = itertools.product(*grid)
# get_data() and custom_callback are our own helpers and are not shown here
features, labels, weights = get_data()
# for every parameter combination...
for i, current_params in enumerate(params):
    # random seeds must be set for reproducible results
    np.random.seed(42)
    random.seed(42)
    tf.random.set_seed(42)
    activation, batch_size, dropout_value, learning_rate, num_hidden_units = current_params
    # build the model
    model = get_model(
        num_hidden_units,
        activation,
        learning_rate,
        dropout_value
    )
    # we store the initial weights (parameters)
    # so that we don't have to build a completely new model,
    # which can take quite a while
    initial_weights = model.get_weights()
    # the training run is repeated to obtain a more reliable result
    for _ in range(repetitions):
        # actual training of the network
        model.fit(
            features,
            labels,
            batch_size=batch_size,
            epochs=25,
            validation_split=.2,
            callbacks=[custom_callback],
            class_weight=weights
        )
        # reset the model
        model.set_weights(initial_weights)
```
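The loop above leaves the bookkeeping to `custom_callback`. As a framework-independent sketch of the same idea, the following function averages the scores of repeated runs for each configuration and returns the best one. The function `grid_search` and the callback `train_and_evaluate` are hypothetical names introduced for illustration; the callback stands in for building, training and evaluating a model.

```python
import itertools
import statistics


def grid_search(train_and_evaluate, grid, repetitions=3):
    """Evaluate every combination in `grid` and return the best one.

    `train_and_evaluate` is a hypothetical callback: it receives one
    hyperparameter combination as keyword arguments, trains a fresh model
    and returns a validation score (higher is better).
    """
    names = list(grid)
    mean_scores = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # repeat the run and average, so one lucky or unlucky run does not decide
        scores = [train_and_evaluate(**params) for _ in range(repetitions)]
        mean_scores[values] = statistics.mean(scores)
    best_values = max(mean_scores, key=mean_scores.get)
    return dict(zip(names, best_values)), mean_scores
```

In our setting, `train_and_evaluate` could build the network with `get_model()`, train it with `fit()` and return the final validation accuracy. Keep in mind that the number of training runs is the grid size times the number of repetitions: with the five-dimensional grid above (216 combinations) and three repetitions, that is 648 runs.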

Simpler grid search with scikit-learn

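This hand-written procedure can be implemented more quickly with the help of scikit-learn. To do this, however, the neural network must be wrapped as a scikit-learn estimator, and the hyperparameters must be given in a specific form: a dictionary that maps each parameter name to the list of values to try. Fortunately, this is very simple:

```
# this wrapper ships with older Keras versions; newer versions rely on the
# separate SciKeras package instead
from keras.wrappers.scikit_learn import KerasClassifier

# keyword arguments that match get_model()'s signature are passed on to it;
# here they are fixed values, since we only search over the batch size
sklearn_estimator = KerasClassifier(
    build_fn=get_model,
    hidden_layer_units=num_hidden_units,
    activation=activation,
    learning_rate=learning_rate,
    dropout_value=dropout_value
)
# parameter name -> list of values to try
sklearn_hyper_params = {'batch_size': BATCH_SIZE}
```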

With this estimator, scikit-learn's GridSearchCV can now search for the best hyperparameter configuration. More detailed information can be found in the scikit-learn documentation.

```
from sklearn.model_selection import GridSearchCV

sklearn_gridsearch = GridSearchCV(
    estimator=sklearn_estimator,
    param_grid=sklearn_hyper_params,
    n_jobs=1,
    cv=3
)
search_result = sklearn_gridsearch.fit(features, labels)
# the best combination can be found here
best_params = search_result.best_params_
```
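Since the Keras wrapper is not available in every version, here is a self-contained sketch of the same GridSearchCV workflow that runs with scikit-learn alone. It swaps our Keras network for scikit-learn's `MLPClassifier`, which has no dropout option, so only activation and batch size are searched. The grid values, the 80/20 split, the feature scaling and `max_iter` are illustrative assumptions, not our original setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# scaling the features helps the network train; the step name "mlp"
# becomes the prefix of the hyperparameter names in the grid below
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=42)),
])

# parameter name -> list of values to try;
# "identity" is scikit-learn's name for a linear activation
param_grid = {
    "mlp__activation": ["identity", "relu"],
    "mlp__batch_size": [16, 32],
}

search = GridSearchCV(pipeline, param_grid=param_grid, cv=3, n_jobs=1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
# accuracy of the refitted best model on data it has never seen
print(search.score(X_test, y_test))
```

By default, GridSearchCV refits the best configuration on the whole training set, so the final `score` call evaluates that model on the held-out test data.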

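The parameter cv sets the number of cross-validation folds. Cross-validation splits the data into that many parts and trains and evaluates the model repeatedly, each time holding back a different part for evaluation, which allows more reliable statements about the accuracy of trained models. We will publish a detailed article about this on this website in the near future.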
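To show what the cv parameter does, here is a minimal k-fold cross-validation sketch using scikit-learn's `StratifiedKFold`. A scaled logistic regression stands in for the neural network to keep the example fast, and the helper name `cross_validate_accuracy` is illustrative. For classifiers, GridSearchCV with an integer cv also uses stratified folds by default, though without shuffling.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cross_validate_accuracy(model, X, y, n_folds=3, seed=42):
    """Train on n_folds - 1 parts, evaluate on the remaining part, n_folds times."""
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in folds.split(X, y):
        # fit() starts from scratch, so each fold trains an independent model
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))
    return np.array(scores)


X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_validate_accuracy(model, X, y)
print(scores, scores.mean())
```

The mean over the folds is the number GridSearchCV compares between configurations; the spread of the individual fold scores gives a rough idea of how stable that estimate is.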

Visualising the results of a grid search

For each configuration, i.e. each node of this grid, the model is trained and the error, or the accuracy, is recorded. For our example, we can then plot this data. The plot becomes clearer if only two hyperparameters vary at a time while the others are held fixed. In the picture below, batch size and dropout are plotted against each other.
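A minimal sketch for preparing such a plot, assuming the accuracies of a two-dimensional slice of the grid have been collected in a dict keyed by `(batch_size, dropout)` pairs. The function name `results_to_matrix` and this dict layout are our assumptions, not part of the code above.

```python
import numpy as np


def results_to_matrix(results, batch_sizes, dropout_values):
    """Arrange the accuracies of a two-dimensional slice of the grid as a matrix.

    `results` maps (batch_size, dropout) pairs to the accuracy observed for that
    configuration, with all other hyperparameters held fixed. Rows correspond to
    batch sizes, columns to dropout values; combinations without a result stay NaN.
    """
    matrix = np.full((len(batch_sizes), len(dropout_values)), np.nan)
    for row, batch_size in enumerate(batch_sizes):
        for col, dropout in enumerate(dropout_values):
            accuracy = results.get((batch_size, dropout))
            if accuracy is not None:
                matrix[row, col] = accuracy
    return matrix
```

The resulting matrix can be passed to a heatmap function such as matplotlib's `imshow`, with the batch sizes and dropout values as tick labels.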

From this plot we can conclude that, for our example, the choice of dropout strongly influences the final accuracy of the model. Likewise, a larger batch size makes the model more accurate. Of course, your conclusions may differ greatly from ours, depending on the hyperparameters you are tuning and on your specific use case.

Did this article help you? Do you use grid search differently? We look forward to hearing from you at blog@neuroforge.de!
