Hyperparameter Tuning in CNNs - Grid Search and Random Search
Convolutional Neural Networks (CNNs) are powerful models for tasks like image classification and object detection. However, their performance heavily depends on the choice of hyperparameters, such as learning rate, batch size, number of filters, and kernel size. Hyperparameter tuning is crucial for optimizing model performance, but the process can be time-consuming and computationally expensive.
In this article, we’ll explore common methods for hyperparameter tuning in CNNs, including Grid Search, Random Search, and using Keras Tuner. We’ll also provide practical examples and discuss real-world applications.
Table of Contents
- Key Hyperparameters in CNNs
- Grid Search
- Random Search
- Keras Tuner
- Best Practices for Hyperparameter Tuning
- Summary of Hyperparameter Tuning Techniques
- Conclusion
1. Key Hyperparameters in CNNs
Before diving into tuning techniques, it’s important to understand some key hyperparameters in CNNs that significantly impact model performance:
- Learning Rate: Controls the step size of the optimizer during training.
- Batch Size: Determines the number of samples per gradient update.
- Number of Filters: Specifies the number of filters in convolutional layers, controlling the depth of feature extraction.
- Kernel Size: Defines the size of the convolutional filter used to extract features.
- Dropout Rate: The fraction of neurons randomly dropped during training to prevent overfitting.
- Number of Epochs: The number of times the entire dataset passes through the model during training.
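As a point of reference, the minimal sketch below (with hypothetical values, not tuned ones) shows where each of these hyperparameters plugs into a Keras model; the tuning techniques in the rest of the article search over exactly these knobs.
import tensorflow as tf

# Hypothetical values, purely to show where each hyperparameter appears
learning_rate = 1e-3
batch_size = 64
num_filters = 32
kernel_size = (3, 3)
dropout_rate = 0.5
num_epochs = 10

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(num_filters, kernel_size, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(dropout_rate),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Batch size and number of epochs only come into play at training time:
# model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs)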
2. Grid Search
Grid Search is an exhaustive search technique that evaluates the performance of a CNN model for every combination of hyperparameter values in a specified range. It is a brute-force approach that tries all possible values, providing a comprehensive overview of the model’s performance across the hyperparameter space.
Formula for Grid Search:
In Grid Search, the number of model evaluations is given by:
$$N_{\text{evaluations}} = \prod_{i=1}^{k} n_i$$
Where:
- $n_i$ represents the number of values to search for the $i$-th hyperparameter, and
- $k$ is the total number of hyperparameters.
Why Grid Search is Not Suitable for Large Hyperparameter Spaces: Grid Search becomes inefficient as the number of hyperparameters increases, leading to an exponential increase in the number of model evaluations. For example, tuning five hyperparameters, each with ten possible values, would require 100,000 evaluations, making it impractical for large datasets or complex models.
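The arithmetic is easy to check; the short sketch below counts the evaluations for a hypothetical grid of five hyperparameters with ten candidate values each (cross-validation multiplies the total further by the number of folds).
from math import prod

# Hypothetical grid: 5 hyperparameters, 10 candidate values each
grid_sizes = {'learning_rate': 10, 'batch_size': 10, 'num_filters': 10,
              'kernel_size': 10, 'dropout_rate': 10}
evaluations = prod(grid_sizes.values())
print(evaluations)      # 100000 model evaluations
print(evaluations * 3)  # 300000 training runs with 3-fold cross-validation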
Example:
Let’s tune the learning rate, batch size, and dropout rate of a CNN using Grid Search:
import tensorflow as tf
from sklearn.model_selection import GridSearchCV
# Note: this wrapper was removed from recent TensorFlow releases; the
# scikeras package provides a maintained replacement.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def build_model(learning_rate=0.001, dropout_rate=0.5):
    # batch_size is deliberately not a build argument: the KerasClassifier
    # wrapper forwards it to fit(), where it actually controls training.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Define hyperparameter grid
param_grid = {
    'learning_rate': [0.001, 0.01, 0.1],
    'dropout_rate': [0.3, 0.5],
    'batch_size': [32, 64]
}

# x_train / y_train are assumed to be preprocessed image data,
# e.g. MNIST digits of shape (num_samples, 28, 28, 1)
model = KerasClassifier(build_fn=build_model, epochs=10)
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(x_train, y_train)

# Best hyperparameters
print(f"Best params: {grid_result.best_params_}")
Why These Specific Hyperparameter Values?
The values for learning rate (0.001, 0.01, 0.1) are common starting points in practice, representing low, medium, and high rates. The dropout rates (0.3, 0.5) are frequently used for regularization, and the batch sizes (32, 64) are practical choices for many CNN models.
Real-World Example:
Grid Search is suitable when working with smaller datasets where computational resources are not a limiting factor. For example, when tuning a CNN on the MNIST dataset, where the data is small and training is relatively fast, Grid Search can be used to find optimal hyperparameters.
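The Grid Search snippet above assumes x_train and y_train already exist; one straightforward way to prepare them for this MNIST scenario is:
import tensorflow as tf

# Load MNIST and reshape/scale it into the (28, 28, 1) float inputs the model expects
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0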
3. Random Search
Random Search is a more efficient alternative to Grid Search. Instead of exhaustively searching every combination, Random Search samples random combinations of hyperparameters within the specified range. It is often more effective than Grid Search, especially when the hyperparameter space is large.
Formula for Random Search:
The number of model evaluations in Random Search is:
$$N_{\text{evaluations}} = n_{\text{samples}}$$
Where $n_{\text{samples}}$ is the number of random combinations sampled. Unlike Grid Search, $n_{\text{samples}}$ does not depend on the size of the hyperparameter grid.
Why Random Search is More Efficient:
Random Search is more efficient because it does not evaluate every possible combination of hyperparameters. Instead, it explores the space by sampling random values, so each new trial probes a fresh value of every hyperparameter. Research (Bergstra & Bengio, 2012) shows that Random Search can achieve similar or better results than Grid Search with fewer model evaluations.
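The intuition can be seen in a toy comparison (a standalone sketch, not tied to the CNN example in this article): with a fixed budget of 16 trials, a 4 × 4 grid only ever tries 4 distinct values of each hyperparameter, while 16 random draws try up to 16 distinct values per hyperparameter, which matters when one or two hyperparameters dominate performance.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Grid: 16 trials, but only 4 distinct values are ever tried per hyperparameter
grid_trials = list(product([1e-4, 1e-3, 1e-2, 1e-1], [0.2, 0.3, 0.4, 0.5]))

# Random: the same budget of 16 trials, but every trial draws fresh values,
# so up to 16 distinct values are explored along each axis
random_trials = [(10 ** rng.uniform(-4, -1), rng.uniform(0.2, 0.5)) for _ in range(16)]

print(len({lr for lr, _ in grid_trials}))    # 4 distinct learning rates tried
print(len({lr for lr, _ in random_trials}))  # 16 distinct learning rates tried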
Example:
Let’s use Random Search to tune the learning rate, dropout rate, and batch size of a CNN:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

# Define the parameter distributions for random search.
# scipy's uniform(loc, scale) samples from [loc, loc + scale], so
# uniform(0.3, 0.4) draws dropout rates between 0.3 and 0.7.
param_dist = {
    'learning_rate': uniform(0.001, 0.1),  # samples in [0.001, 0.101]
    'dropout_rate': uniform(0.3, 0.4),     # samples in [0.3, 0.7]
    'batch_size': [32, 64, 128]
}

# Randomized search, reusing the KerasClassifier wrapper defined above
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=10, cv=3)
random_result = random_search.fit(x_train, y_train)

# Best hyperparameters
print(f"Best params: {random_result.best_params_}")
Why These Specific Hyperparameter Distributions?
The uniform distribution for the learning rate and dropout rate allows Random Search to sample values anywhere within the specified range, increasing the chances of finding the optimal values. The batch size values are typical choices for many deep learning models.
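To make the loc/scale convention concrete, the short check below (hypothetical draws, seeded for reproducibility) confirms that uniform(0.3, 0.4) produces dropout rates between 0.3 and 0.7:
from scipy.stats import uniform

# scipy's uniform(loc, scale) is the uniform distribution on [loc, loc + scale]
samples = uniform(loc=0.3, scale=0.4).rvs(size=5, random_state=0)
print(samples)  # five values, all between 0.3 and 0.7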
Real-World Example:
Random Search is particularly useful for larger datasets, such as CIFAR-10 or ImageNet, where training CNN models can take hours or days. By sampling a subset of hyperparameter combinations, Random Search can lead to good results in a shorter amount of time.
4. Keras Tuner
Keras Tuner is a specialized library for hyperparameter tuning in Keras models. It allows for easy implementation of both Grid Search and Random Search, as well as Bayesian Optimization. It automates the tuning process and provides a user-friendly interface.
Bayesian Optimization vs. Random Search:
Bayesian Optimization builds a probabilistic model of the objective function and uses this model to select hyperparameters that are more likely to improve model performance. Unlike Random Search, which samples hyperparameters randomly, Bayesian Optimization uses prior knowledge from previous evaluations to inform the search, making it more efficient in many cases.
Example:
Let’s use Keras Tuner to perform hyperparameter tuning on a CNN:
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(hp.Float('dropout_rate', min_value=0.3, max_value=0.7, step=0.1)),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(hp.Choice('learning_rate', values=[0.001, 0.01, 0.1])),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Instantiate a tuner that samples hyperparameter combinations at random
tuner = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=5)

# Perform tuning; x_val / y_val are a held-out validation split
tuner.search(x_train, y_train, epochs=10, validation_data=(x_val, y_val))

# Get the best model found across all trials
best_model = tuner.get_best_models(num_models=1)[0]
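If you would rather use the Bayesian Optimization strategy described earlier, only the tuner class changes; a minimal variant of the same example (same build_model, hypothetical trial budget of 10) looks like this:
# Swap in Bayesian Optimization: keras_tuner fits a probabilistic model to the
# results of past trials and proposes the next hyperparameters from it
bayes_tuner = kt.BayesianOptimization(build_model, objective='val_accuracy', max_trials=10)
bayes_tuner.search(x_train, y_train, epochs=10, validation_data=(x_val, y_val))
best_bayes_model = bayes_tuner.get_best_models(num_models=1)[0]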
Why These Specific Hyperparameter Ranges?
The dropout rate range (0.3 to 0.7) reflects typical regularization strengths, while the learning rate choices (0.001, 0.01, 0.1) span a common range for many CNN optimizers.
Real-World Example:
Keras Tuner is often used for hyperparameter tuning in deep learning competitions, such as those on Kaggle, where finding the right combination of hyperparameters can make a significant difference in model performance. Its flexibility allows for faster experimentation and automated fine-tuning, helping data scientists save time while optimizing their models.
5. Best Practices for Hyperparameter Tuning
- Start with Random Search: For large models and datasets, Random Search can quickly narrow down the hyperparameter space before performing Grid Search for fine-tuning.
- Use Cross-Validation: Always use cross-validation to evaluate the performance of different hyperparameter combinations. This helps avoid overfitting to the training set.
- Optimize Key Hyperparameters First: Focus on tuning critical hyperparameters, such as the learning rate and batch size, before adjusting less impactful ones like dropout rate or kernel size.
- Monitor Early Stopping: Implement early stopping during the tuning process so that training halts once performance on the validation set stops improving; this saves compute and reduces overfitting (a minimal callback sketch follows this list).
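As a concrete illustration of the last point, the sketch below (reusing the tuner and imports from the Keras Tuner example, with a hypothetical patience of 3 and a longer epoch cap since early stopping cuts training short) stops any trial whose validation loss has not improved for three consecutive epochs:
# Stop a trial once validation loss has not improved for 3 consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                              restore_best_weights=True)

# Works with plain model.fit(...) as well as with Keras Tuner's search
tuner.search(x_train, y_train, epochs=30,
             validation_data=(x_val, y_val),
             callbacks=[early_stop])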
6. Summary of Hyperparameter Tuning Techniques

| Technique | Strengths | Weaknesses | Best Use Cases |
|---|---|---|---|
| Grid Search | Exhaustive search ensures the best combination is found within the specified range; simple to implement and interpret. | Computationally expensive for large hyperparameter spaces; slow with large datasets or models. | Suitable for small datasets and simple models; fine-tuning after random search. |
| Random Search | More efficient than Grid Search for large hyperparameter spaces; finds good hyperparameters with fewer evaluations. | May miss the optimal combination; requires limiting the number of iterations. | Ideal for large datasets and complex models; useful when quick results are needed or when resources are limited. |
| Keras Tuner (Random Search + Bayesian Optimization) | Automates tuning with flexibility; supports random search and efficient Bayesian optimization; user-friendly interface with tracking and visualization. | More complex to set up; Bayesian optimization may be slower for certain models. | Great for deep learning models with complex architectures; ideal when tuning needs automation and fine-tuning over multiple trials. |
7. Conclusion
Hyperparameter tuning is a crucial step in optimizing CNNs for real-world applications. Techniques like Grid Search, Random Search, and Keras Tuner offer different trade-offs between thoroughness and efficiency.
Keras Tuner adds flexibility and ease to the tuning process by automating the search and offering both random and Bayesian optimization options. This makes it especially beneficial in situations where time and computational resources are limited. The built-in visualization tools in Keras Tuner also allow for easier tracking of model performance across different hyperparameter combinations.
By using Random Search to quickly explore a large hyperparameter space and Grid Search for fine-tuning, you can optimize your CNN for tasks like image classification, object detection, and more. Keras Tuner adds flexibility to this process, making it a valuable tool for deep learning practitioners.