What is hyperparameter tuning?
When you’re training machine learning models, each combination of dataset and model typically needs its own set of hyperparameters: configuration variables that you choose before training starts. In practice, you determine good values through multiple experiments, where you pick a set of hyperparameters, train your model with it, evaluate the result, and repeat. This is called hyperparameter tuning. In essence, you're training your model repeatedly with different sets of hyperparameters. This process can be manual, or you can pick one of several automated hyperparameter tuning methods.
Whichever method you use, you need to track the results of your experiments. You compare them with an evaluation metric, such as the loss or accuracy on validation data, to determine which set of hyperparameters gives the best result. Because every experiment involves training a model, hyperparameter tuning is an important but computationally intensive process.
What are hyperparameters?
Hyperparameters are external configuration variables that data scientists use to manage machine learning model training. Sometimes called model hyperparameters, they are set before training a model, either manually or by a tuning algorithm. They're different from model parameters, such as the weights of a neural network, which the model learns automatically from the data during training and which data scientists don't set directly.
Examples of hyperparameters include the number of nodes and layers in a neural network and the number of branches in a decision tree. Hyperparameters determine key characteristics such as model architecture and model complexity, and they also control the training process itself, for example through the learning rate.
How do you identify hyperparameters?
Selecting the right set of hyperparameters is important for model performance and accuracy. Unfortunately, there are no fixed rules for which hyperparameters work best or what their optimal or default values should be. You need to experiment to find the optimal hyperparameter set. This activity is known as hyperparameter tuning or hyperparameter optimization.
Why is hyperparameter tuning important?
Hyperparameters directly control model structure, function, and performance. Hyperparameter tuning allows data scientists to adjust these values to get the best results from a model. This process is an essential part of machine learning, and choosing appropriate hyperparameter values is crucial for success.
For example, assume you're tuning the learning rate of the model. If the value is too high, training may overshoot good solutions, oscillate, or converge too quickly to a suboptimal result. If the value is too low, training takes too long and may not converge within the time available. A well-balanced choice of hyperparameters helps produce accurate models with strong performance.
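To make the learning-rate trade-off concrete, here is a minimal sketch that runs plain gradient descent on a toy one-dimensional loss, f(w) = (w - 3)², rather than on a real model. The loss function, the starting point, the step count, and the three learning rates are illustrative assumptions chosen only to show the three regimes.

```python
def gradient_descent(learning_rate: float, steps: int = 50, start: float = 0.0) -> float:
    """Minimize the toy loss f(w) = (w - 3)**2 with gradient descent; return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)       # derivative of (w - 3)^2
        w -= learning_rate * grad    # step against the gradient, scaled by the learning rate
    return w


for lr in (0.01, 0.4, 1.1):
    w = gradient_descent(lr)
    print(f"learning_rate={lr}: w={w:.4f}, loss={(w - 3.0) ** 2:.4g}")
```

With 0.01, w is still far from the minimum at 3 after 50 steps (too slow); with 0.4, it lands essentially on 3; with 1.1, every step overshoots by more than the previous one and w diverges.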
How does hyperparameter tuning work?
As previously stated, hyperparameter tuning can be manual or automated. Manual tuning is slow and tedious, but one benefit is that you gain a better understanding of how hyperparameter values affect the model. In most cases, though, you would use one of the well-known hyperparameter tuning algorithms.
Hyperparameter tuning is an iterative process in which you try out different combinations of hyperparameters and values. You generally start by choosing an objective metric, such as accuracy, that you intend to maximize (or, for a metric such as loss, minimize). It’s a good idea to use cross-validation techniques, so that your evaluation doesn't depend on a single split of your data and the chosen hyperparameters aren't tailored to one portion of it.
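The following is a minimal sketch of scoring candidate hyperparameter values with k-fold cross-validation. It assumes a deliberately simple stand-in model, k-nearest-neighbors regression on synthetic noisy sine data, where the number of neighbors `k` is the hyperparameter being tuned; the helper names, the data, and the interleaved fold assignment are all illustrative choices, not part of any particular library.

```python
import math
import random


def make_data(n=60, seed=0):
    # Hypothetical synthetic dataset: noisy samples of sin(x).
    rng = random.Random(seed)
    xs = [i / 6 for i in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.2) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    # Average the targets of the k training points closest to x.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def cross_val_mse(xs, ys, k, folds=5):
    """Mean squared error for hyperparameter k, averaged over `folds` held-out folds."""
    n = len(xs)
    fold_errors = []
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]   # this fold is held out
        train_idx = [i for i in range(n) if i % folds != f]  # the rest is used for training
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        sq_err = [(knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(sq_err) / len(sq_err))
    return sum(fold_errors) / folds


xs, ys = make_data()
for k in (1, 5, 40):
    print(f"k={k}: cross-validated MSE={cross_val_mse(xs, ys, k):.4f}")
```

Because every candidate is scored on all five folds, a value of `k` only looks good if it generalizes across the whole dataset. A tuning loop would then keep the `k` with the lowest cross-validated error.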
What are the hyperparameter tuning techniques?
Numerous hyperparameter tuning algorithms exist, although the most commonly used are Bayesian optimization, grid search, and random search.
Bayesian optimization is a technique based on Bayes’ theorem, which describes the probability of an event based on prior knowledge. When this is applied to hyperparameter optimization, the algorithm builds a probabilistic model (often called a surrogate) of how the chosen metric depends on the hyperparameters. It updates this regression model after each trial and uses it to iteratively choose the most promising set of hyperparameters to evaluate next.
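The loop can be sketched in pure Python under several simplifying assumptions. There is a single hyperparameter searched over a fixed grid in [-1, 1]. `objective` is a hypothetical stand-in for an expensive training run. The surrogate is a Gaussian process with a fixed squared-exponential kernel. The acquisition rule is a lower confidence bound (predicted loss minus `kappa` times its uncertainty). Real tools may use other surrogates or acquisition functions, such as expected improvement; this is only one common combination.

```python
import math


def objective(x):
    # Hypothetical stand-in for an expensive training run: the "validation loss"
    # as a function of one hyperparameter x (for example, log10 of a learning rate).
    return (x - 0.3) ** 2


def rbf(a, b, length_scale=0.3):
    # Squared-exponential kernel: nearby hyperparameter values get similar losses.
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))


def cholesky(a):
    # Lower-triangular L with L @ L.T == a (a must be symmetric positive definite).
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    # Solve L z = b for z, where L is lower triangular.
    z = []
    for i, bi in enumerate(b):
        z.append((bi - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return z


def gp_posterior(xs, ys, x, noise=1e-4):
    """Gaussian-process surrogate: predicted mean and std. dev. of the loss at x."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    v = forward_sub(L, [rbf(a, x) for a in xs])    # L^-1 k(xs, x)
    w = forward_sub(L, [y - y_mean for y in ys])   # L^-1 (ys - mean)
    mean = y_mean + sum(p * q for p, q in zip(v, w))
    var = max(rbf(x, x) - sum(p * p for p in v), 0.0)
    return mean, math.sqrt(var)


def bayesian_optimize(n_iter=10, kappa=2.0):
    candidates = [i / 20 for i in range(-20, 21)]  # search space: [-1, 1] in 0.05 steps
    xs = [-1.0, 1.0]                               # two initial evaluations
    ys = [objective(x) for x in xs]

    def lcb(c):
        # Lower confidence bound: favor low predicted loss or high uncertainty.
        mean, std = gp_posterior(xs, ys, c)
        return mean - kappa * std

    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        nxt = min(remaining, key=lcb)              # most promising untried value
        xs.append(nxt)
        ys.append(objective(nxt))                  # "train" and record the result
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]


best_x, best_loss = bayesian_optimize()
print(f"best x = {best_x:.2f}, loss = {best_loss:.4f}")
```

The design choice that matters is that each new trial is picked by consulting the surrogate fitted to all previous results, instead of by a fixed schedule. This is what lets the method spend its limited training budget on promising regions.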
With grid search, you specify a list of candidate values for each hyperparameter and a performance metric, and the algorithm works through every possible combination to determine the best fit. Grid search works well, but it’s relatively tedious and computationally intensive, because the number of combinations multiplies with every hyperparameter you add.
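Grid search can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation. `validation_loss` is a hypothetical stand-in for training a model and returning its validation loss, and the hyperparameter names and candidate values are made up for the example.

```python
import itertools


def validation_loss(learning_rate, num_layers):
    # Hypothetical stand-in for "train a model and return its validation loss".
    return (learning_rate - 0.01) ** 2 * 1e4 + (num_layers - 3) ** 2


def grid_search(grid, evaluate):
    """Evaluate every combination of candidate values and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[n] for n in names)):  # every combination
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {"learning_rate": [0.001, 0.01, 0.1], "num_layers": [1, 2, 3, 4]}
best, score = grid_search(grid, validation_loss)  # 3 x 4 = 12 evaluations
print(best, score)  # {'learning_rate': 0.01, 'num_layers': 3} 0.0
```

Adding a third hyperparameter with five candidate values would multiply the work to 60 training runs, which is why grid search becomes expensive quickly.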
Although based on principles similar to those of grid search, random search evaluates a fixed number of iterations and, on each one, selects a combination of hyperparameter values at random from the ranges you specify. It works well when a relatively small number of the hyperparameters primarily determine the model outcome.
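Here is a minimal random search sketch. As before, `validation_loss` is a hypothetical stand-in for a real training run. The sampling distributions (log-uniform for the learning rate, uniform integers for the layer count) and the budget of 20 iterations are illustrative assumptions.

```python
import random


def validation_loss(learning_rate, num_layers):
    # Hypothetical stand-in for "train a model and return its validation loss".
    return (learning_rate - 0.01) ** 2 * 1e4 + (num_layers - 3) ** 2


def random_search(space, evaluate, n_iter=20, seed=0):
    """Evaluate n_iter randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        params = {name: sample(rng) for name, sample in space.items()}
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


space = {
    # Learning rate drawn log-uniformly from [1e-4, 1e-1].
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, -1),
    # Layer count drawn uniformly from the integers 1 to 6.
    "num_layers": lambda rng: rng.randint(1, 6),
}
best, score = random_search(space, validation_loss)
print(best, score)
```

The cost is fixed by `n_iter` rather than by the size of a grid. Every trial also tries a fresh value of each hyperparameter, which helps when only a few of them really matter.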
What are examples of hyperparameters?
While some hyperparameters are common, in practice you'll find that algorithms use specific sets of hyperparameters. For example, you can read how Amazon SageMaker uses image classification hyperparameters and read how SageMaker uses XGBoost algorithm hyperparameters.
Here are some examples of common hyperparameters:
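- Learning rate is the step size an algorithm uses when it updates its estimates
- Learning rate decay is a gradual reduction in the learning rate over time, which speeds up learning early on while letting training settle into a good solution later
- Momentum is the fraction of the previous update carried into the next step, so that each step's direction depends partly on the previous step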
- Neural network nodes refers to the number of nodes in each hidden layer
- Neural network layers refers to the number of hidden layers in a neural network
- Mini-batch size is the number of training examples processed in each update step
- Epochs is the number of times the entire training dataset is shown to the network during training
- Eta (in XGBoost, for example) is the step size shrinkage applied to each update to prevent overfitting
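Several of the hyperparameters listed above can be seen together in a minimal mini-batch gradient descent loop. The sketch fits a straight line to noiseless synthetic data. It assumes one common momentum formulation (a velocity term that keeps a fraction of the previous step) and an inverse-time decay schedule. Other libraries use different but related formulas. Eta is XGBoost-specific and is not shown.

```python
import random


def train_linear_model(data, learning_rate=0.05, decay=0.01, momentum=0.9,
                       batch_size=8, epochs=50, seed=0):
    """Fit y ~ w*x + b with mini-batch gradient descent; every keyword is a hyperparameter."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0      # model parameters: learned during training, not set by hand
    vw, vb = 0.0, 0.0    # momentum "velocity" for each parameter
    step = 0
    data = list(data)
    for _ in range(epochs):                          # one epoch = one pass over all the data
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]   # one mini-batch
            # Gradient of the mean squared error over the mini-batch.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            lr = learning_rate / (1 + decay * step)  # inverse-time learning rate decay
            vw = momentum * vw - lr * gw             # keep a fraction of the previous step
            vb = momentum * vb - lr * gb
            w += vw
            b += vb
            step += 1
    return w, b


data = [(x / 10, 2 * (x / 10) + 1) for x in range(-20, 21)]  # points on y = 2x + 1
w, b = train_linear_model(data)
print(f"w={w:.3f}, b={b:.3f}")  # should approach w=2, b=1
```

Here `w` and `b` are the model parameters, while everything passed as a keyword argument is a hyperparameter that a tuning method could search over.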
How can AWS help with hyperparameter tuning?
At Amazon Web Services (AWS), we offer Amazon SageMaker, a fully managed machine learning (ML) platform that allows you to perform automatic model tuning. Amazon SageMaker Automatic Model Tuning finds the best version of your ML model by running multiple training jobs on your dataset, using the algorithm and hyperparameter ranges that you specify.
SageMaker offers an intelligent hyperparameter tuning method based on Bayesian search theory, designed to find the best model in the shortest time. It starts with a random search and then learns how the model behaves with respect to hyperparameter values. For more information, read how hyperparameter tuning works in SageMaker.
SageMaker Automatic Model Tuning also supports Hyperband, a new search strategy. Hyperband can find the optimal set of hyperparameters up to three times faster than Bayesian search for large-scale models such as deep neural networks that address computer vision problems.
You can also read how to perform automatic model tuning with SageMaker. You can use the SageMaker hyperparameter tuning module with built-in SageMaker algorithms, with custom algorithms, and with SageMaker prebuilt containers. The webpage provides comprehensive self-paced tutorials and exercises to help you learn to perform hyperparameter optimization.