AzureML Classic Neural Network Regression GridSearch Hyperparameter Tuning

--

I would like to share my personal experience of hyperparameter tuning neural networks (NNs) with AzureML. I have been using AzureML as my primary technology for model training, both personally and professionally, for over 3 years. AzureML has a learning curve, but once mastered it takes me about 10–15 minutes to build a full pipeline, so it comes in very handy for fast prototyping, POCs, etc. NNs have been the hardest to tune, and in this article I hope to demonstrate that clearly. Nonetheless, NNs have a lot of utility in predicting continuous and gradient-based functions. In my view, the AzureML community does not share enough on NNs, and the official NN documentation on AzureML is overcomplicated, aimed at advanced NN-focused users rather than general practitioners. The use case I will use as a demo is Meta stock forecasting, with data that can be downloaded from Yahoo Finance.
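For reference, the data can also be pulled programmatically. Below is a minimal sketch assuming the third-party yfinance package (the experiment itself uses a CSV downloaded manually from the Yahoo Finance site); the end date matches the experiment's 2023-02-02 cutoff, while the start date is an arbitrary choice here:

# Sketch: pull Meta (META) daily prices from Yahoo Finance.
# Assumes the third-party yfinance package; the original experiment
# used a CSV downloaded manually from the Yahoo Finance site.
import yfinance as yf

meta = yf.download("META", start="2020-01-01", end="2023-02-02")
print(meta.tail())
meta.to_csv("meta_stock.csv")  # upload this CSV as an AzureML dataset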

Hyperparameter Tuning with Grid Search

Upon initializing a NN Regression module and selecting the “parameter range” option, the following default hyperparameter ranges appear:

Learning Rate: 0.01, 0.02, 0.04

Iterations: 20, 40, 80, 160

The rest of the parameters are set to their defaults, with no recommendations on changing them, so I left them as is. On running grid search, we get the results in Fig 1.1 and Fig 1.2. In Fig 1.1, R² is negative, which is not a good sign: a negative R² means the model predicts worse than simply outputting the mean. Still, the trend suggests potential improvement if the number of iterations is increased. The tuning time was as short as 18 seconds, which is of course very fast.
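Since AzureML Studio (classic) is drag-and-drop, there is no code behind the module to show, but the same grid can be expressed as a rough scikit-learn analog. A sketch, with MLPRegressor standing in for the NN Regression module and synthetic data standing in for the prepared stock features:

# Sketch: a scikit-learn analog of AzureML's "Tune Model Hyperparameters"
# over the default NN Regression ranges above. MLPRegressor and the
# synthetic data are stand-ins, not the AzureML module or the Meta data.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)

param_grid = {
    "learning_rate_init": [0.01, 0.02, 0.04],  # Learning Rate
    "max_iter": [20, 40, 80, 160],             # Iterations
}

search = GridSearchCV(
    MLPRegressor(solver="sgd", random_state=0),
    param_grid,
    scoring="r2",  # AzureML reports the coefficient of determination
    cv=3,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))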

Fig 1.1 — R² with #iterations
Fig 1.2 — R² with Learning Rate

Fig 1.2 indicates that only an LR of 0.01 was used, ignoring 0.02 and 0.04. In the output logs, you can see a warning that any LR above 0.01 will not be considered! This makes me wonder why higher values are even included when “parameter range” is selected in the first place. I still don't know!

[ModuleOutput] Warning: Lowering learning rate to 0.01.
[ModuleOutput] Warning: Lowering learning rate to 0.01.

In any case, we saw in the first run that the iterations step up in a ×2 geometric series, so we will follow the same logic in run 2. The learning rate follows the same pattern, but we will halve it instead and move down.
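Expressed in code, the stepping rule between runs is just two geometric series. A small sketch (next_grid is my own hypothetical helper, not anything in AzureML):

# Sketch: build the next run's grid by doubling iterations upward and
# halving the learning rate downward. next_grid is a hypothetical
# helper, not an AzureML function.
def next_grid(learning_rates, iterations):
    lo_lr = min(learning_rates)  # best LR so far sits at the bottom of the range
    hi_it = max(iterations)      # best iteration count sits at the top
    return [lo_lr / 4, lo_lr / 2, lo_lr], [hi_it, hi_it * 2, hi_it * 4, hi_it * 8]

# run 1 -> run 2
lrs, its = next_grid([0.01, 0.02, 0.04], [20, 40, 80, 160])
print(lrs)  # [0.0025, 0.005, 0.01]
print(its)  # [160, 320, 640, 1280]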

run 1

Learning Rate: 0.01, 0.02, 0.04

Iterations: 20, 40, 80, 160

Time: 18 seconds

Max R²: -1

run 2

Learning Rate: 0.0025, 0.005, 0.01

Iterations: 160, 320, 640, 1280

Time: 1 minute and 3 seconds

Max R²: 0.7

The results of run 2 are much more promising than run 1. The #iterations graph in Fig 2.1 shows a drastic improvement going from the minimum to the maximum iterations in the grid, although the rate of improvement in R² decays as the iterations grow. Fig 2.2 likewise shows a steep improvement in R² moving towards lower LRs. All in all, a small number of iterations with a fast learning rate cannot work.

Fig 2.1 — R² with #iterations for run 2
Fig 2.2 — R² with Learning Rate for run 2

run 3

Learning Rate: 0.000625, 0.00125, 0.0025

Iterations: 1280, 2560, 5120, 10240

Time: 6 minutes and 30 seconds

Max R²: 0.875 at 2560 iterations

Fig 3.1
Fig 3.2

run 4

In run 4, we keep halving the LR but slow down the growth in iterations, increasing by 25% and then 12.5% instead of doubling (3200 = 2560 × 1.25, 4000 = 3200 × 1.25, 4500 = 4000 × 1.125).

Learning Rate: 0.00015625, 0.0003125, 0.000625

Iterations: 2560, 3200 (+25%), 4000 (+25%), 4500 (+12.5%)

Time: 4 minutes and 22 seconds

Max R²: 0.875 at 2560 iterations and an LR of 0.000625
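Taken together, runs 1 through 4 amount to a manual refinement loop: evaluate a grid, recenter on the best cell, shrink the LR range, grow the iterations, and stop when the gain plateaus. A sketch of how that loop could be automated, where evaluate(lr, iters) is a hypothetical stand-in for a full AzureML training run returning R²:

# Sketch: automate the run 1-4 refinement procedure. evaluate() is a
# hypothetical stand-in for "train the NN and return R²"; here each
# call would really be an AzureML pipeline run for one grid cell.
import itertools

def refine(evaluate, lrs, its, rounds=4, min_gain=0.01):
    best_r2 = float("-inf")
    for run in range(1, rounds + 1):
        scores = {(lr, it): evaluate(lr, it)
                  for lr, it in itertools.product(lrs, its)}
        (lr, it), r2 = max(scores.items(), key=lambda kv: kv[1])
        print(f"run {run}: best R² = {r2:.3f} at lr={lr}, iterations={it}")
        if r2 - best_r2 < min_gain:  # improvement has plateaued, stop
            break
        best_r2 = r2
        lrs = [lr / 4, lr / 2, lr]   # halve the LR and move down
        its = [it, it * 2, it * 4]   # double the iterations and move up
    return best_r2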

Fig 4.1
Fig 4.2

Conclusion

Finally, in 4 runs of GridSearch, we saw R² go from -1 to 0.875. The main point is to start from a default NN architecture and watch the improvement trends as you double up or down on #iterations and learning rate. Good luck, and thank you for your time and interest. The experiment is available at https://gallery.azure.ai/Experiment/Meta-Forecast-on-2-2-2023-Day-Day-0-8-r2.

Fig5 — run1 through 4 from left to right

--

Written by Emad Ezzeldin, Sr. Data Scientist @ UnitedHealthGroup

5 years as a Data Scientist, with an MSc in Data Analytics from George Mason University. I enjoy experimenting with data science tools. emad.ezzeldin4@gmail.com
