AzureML Classic Neural Network Regression GridSearch Hyperparameter Tuning
I would like to share my personal experience of hyperparameter tuning NNs with AzureML. I have been using AzureML as my primary technology for model training, both personally and professionally, for more than 3 years. AzureML has a learning curve, but once mastered, it takes me about 10–15 minutes to build a full pipeline, so it really comes in handy for fast prototyping, POCs, etc. NNs have been the hardest models to tune, and in this article I hope to demonstrate that clearly. NNs nonetheless have a lot of utility in predicting continuous and smoothly varying functions. In my view, the AzureML community does not share enough about NNs, and the official NN documentation on AzureML is overcomplicated and aimed at advanced NN-focused users rather than general practitioners. The use case I will demo is Meta stock forecasting, using data that can be downloaded from Yahoo Finance.
Hyperparameter Tuning with Grid Search
Upon initializing a Neural Network Regression module and selecting the “Parameter Range” option, the following default hyperparameter ranges appear (a rough code analogue follows the list):
Learning Rate: 0.01, 0.02, 0.04
Iterations: 20, 40, 80, 160
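AzureML classic is drag-and-drop, so there is no code behind the module to show, but for intuition here is a rough scikit-learn analogue of the same grid. The MLPRegressor architecture, SGD solver, and synthetic data are my assumptions for illustration; the classic module's internals are not public.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in for the Meta closing-price features;
# the real experiment uses the Yahoo Finance download instead.
rng = np.random.default_rng(0)
X = rng.random((500, 5))
y = X @ rng.random(5) + rng.normal(0, 0.05, 500)

# The classic module's default "Parameter Range" values.
param_grid = {
    "learning_rate_init": [0.01, 0.02, 0.04],
    "max_iter": [20, 40, 80, 160],
}

search = GridSearchCV(
    MLPRegressor(solver="sgd", random_state=0),  # assumed architecture
    param_grid,
    scoring="r2",  # AzureML reports the coefficient of determination
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```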
The rest of the parameters are set to defaults, with no recommendations on changing them, so I left them as is. On running grid search, we get the results in Fig 1.1 and Fig 1.2. In Fig 1.1, R² is negative, which is not a good sign, but it suggests potential improvement if the number of iterations is increased. The tuning time was as short as 18 seconds, which is of course very fast.
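As a quick aside on why R² can go negative at all: R² = 1 − SS_res/SS_tot, so any model that fits worse than a constant predictor of the mean of y scores below zero. A minimal illustration with scikit-learn's r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_mean = np.full_like(y_true, y_true.mean())  # constant-mean baseline -> R² = 0
y_bad = np.array([4.0, 3.0, 2.0, 1.0])        # worse than the baseline -> R² < 0

print(r2_score(y_true, y_mean))  # 0.0
print(r2_score(y_true, y_bad))   # -3.0
```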
Fig 1.2 indicates that only an LR of 0.01 was used, ignoring 0.02 and 0.04. In the output logs, you can see a warning that any LR above 0.01 will not be considered! This makes me wonder why higher values are included in the default “Parameter Range” in the first place!! I may never know!
[ModuleOutput] Warning: Lowering learning rate to 0.01.
[ModuleOutput] Warning: Lowering learning rate to 0.01.
In any case, we saw in the first run that iterations are incremented in a ×2 geometric series, so we will follow the same logic in run 2. We notice the same geometric pattern for the learning rate, but there we will step down by a factor of 1/2 instead (the sketch below spells out the rule).
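Written out as code, the grid update between runs is just a geometric shift. This is purely my own bookkeeping, not anything the AzureML module exposes:

```python
def next_grid(lrs, iters):
    """Shift the grid one 'octave': LRs step down by powers of 2 from the
    previous minimum, iterations step up by powers of 2 from the previous
    maximum, so consecutive grids share one corner point."""
    new_lrs = [min(lrs) / 4, min(lrs) / 2, min(lrs)]
    new_iters = [max(iters) * 2**k for k in range(4)]
    return new_lrs, new_iters

run1 = ([0.01, 0.02, 0.04], [20, 40, 80, 160])
print(next_grid(*run1))
# ([0.0025, 0.005, 0.01], [160, 320, 640, 1280]) -> exactly the run 2 grid
```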
run 1
Learning Rate: 0.01, 0.02, 0.04
Iterations: 20, 40, 80, 160
time: 18 seconds
max R²: -1

run 2
Learning Rate: 0.0025, 0.005, 0.01
Iterations: 160, 320, 640, 1280
time: 1 minute and 3 seconds
max R²: 0.7
The results of run 2 are much more promising than those of run 1. The iterations graph in Fig 2.1 shows a drastic improvement going from the minimum to the maximum iterations in the grid. However, the improvement in R² diminishes rapidly as we move towards more iterations. Fig 2.2 also shows a steep improvement in R² as we move towards lower LRs. All in all, a small number of iterations with a fast learning rate cannot work.
run 3
Learning Rate: 0.000625, 0.00125, 0.0025
Iterations: 1280, 2560, 5120, 10240
time: 6 minutes and 30 seconds
max R²: 0.875 at 2560 iterations
run 4
In run 4, we keep halving the LR, but instead of doubling the iterations we switch to finer steps, increasing by 25%, then 25%, then 12.5% (see the sketch after the run summary).
Learning Rate: 0.00015625, 0.0003125, 0.000625
Iterations: 2560, 3200 (+25%), 4000 (+25%), 4500 (+12.5%)
time: 4 minutes and 22 seconds
max R²: 0.875 at 2560 iterations and an LR of 0.000625
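The finer iteration steps of run 4 can be generated the same way, growing from the best point of run 3 by progressively smaller multiplicative factors (again, just my own bookkeeping, not an AzureML feature):

```python
def refine_iterations(best, factors=(1.25, 1.25, 1.125)):
    """Zoom in around the best iteration count with shrinking
    multiplicative steps instead of doubling."""
    grid = [best]
    for f in factors:
        grid.append(round(grid[-1] * f))
    return grid

print(refine_iterations(2560))  # [2560, 3200, 4000, 4500]
```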
Conclusion
Finally, in 4 runs of grid search, we saw R² go from -1 to 0.875. The main point is to fix a default NN architecture so that you can see clear improvement trends as you double the number of iterations up and halve the learning rate down between runs. Good luck, and thank you for your time and interest. Here is the link to the experiment: https://gallery.azure.ai/Experiment/Meta-Forecast-on-2-2-2023-Day-Day-0-8-r2