What
This page reports the results of DenseNet121 hyperparameter tuning with our advanced options enabled.
Things I considered in this experiment
- Using advanced options: ASL, random augmentation
  Label smoothing could not be used because this experiment started before the conflict between ASL and label smoothing was fixed.
- As many trials as possible
  I ran 30 trials. That cannot guarantee the tuning is optimal, but the experiment still took almost a week even though I used only half of the CheXpert dataset.
- Enough epochs
  Empirically, I observed the best score still improving after 15 epochs, so I ran 20 epochs. Going further was hard due to the time limit.
Experiment settings
| Model | Loss | Raytune trials | stop_patience | Epoch | Train size | Optimizer | Dataset |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DenseNet121 (ImageNet pretrained) | ASL | 30 | 10000 (no early stopping) | 20 | 50% | Adam | CheXpert-pad224 |
Experiment result
Link : https://wandb.ai/snuh_interns/kdg_tune_densenet121/groups/trainval_2023-01-27_08-17-57/workspace?workspace=user-snuh_interns

Figure: parallel coordinates plot with the top 9 trials highlighted

Figure: parallel coordinates plot with the bottom 10 trials highlighted
Key observations
- A high lr combined with a low weight_decay appears to be a bad combination.
- Performance is not sensitive to lr unless lr is too high. Empirically, above 1e-2 looks too high, and below 1e-3 seems adequate.
- weight_decay and batch_size do not look worth tuning. Among the top 9, weight_decay spans a very wide range (note that the weight_decay axis is on a log scale), and batch_size shows no distinct pattern either.
- The results for the ASL factors are in line with intuition.
  - gamma_neg in the top 9 ranges from around 2.2 to 4.6, which is neither too low nor too high. If gamma_neg is to be fixed, 3 to 3.5 is probably adequate. Values above 4 seem risky because some low-ranked trials have gamma_neg between 4 and 4.5.
  - ps_factor (the probability shifting factor) in the top 9 ranges from around 0.05 to 0.18. It is reassuring that the top 9 do not go above 0.2, which would be suspiciously high. However, many low-ranked trials sit around 0.12, so the picture is less clear than for asl_gamma_neg. If resources for tuning are limited, removing ps_factor from the search space might be better.
- It is hard to identify a good combination of Random Augmentation's magnitude and number of operations, but avoiding overly strong augmentation at least seems wise.
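To make the roles of the two ASL factors concrete, here is a minimal per-label sketch of the asymmetric loss. It assumes that asl_gamma_neg is the focusing exponent for negative labels and that asl_ps_factor is the probability margin from the ASL paper. The function name, the `gamma_pos` default of 0, and the `eps` clamp are illustrative assumptions, and the implementation used in this experiment may differ.

```python
import math

def asymmetric_loss(p, y, gamma_neg=4.0, gamma_pos=0.0, ps_factor=0.05, eps=1e-8):
    """Per-label asymmetric loss for predicted probability p and binary target y.

    Hypothetical helper for illustration; parameter names mirror the search space.
    """
    if y == 1:
        # Positive label: focal-style term, (1 - p)^gamma_pos * -log(p).
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negative label: shift the probability down by ps_factor and clamp at 0,
    # so easy negatives with p <= ps_factor contribute no loss at all.
    p_m = max(p - ps_factor, 0.0)
    # A larger gamma_neg down-weights the remaining negatives more strongly.
    return -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))

# A negative predicted below the margin is ignored entirely.
easy_negative = asymmetric_loss(0.04, 0, ps_factor=0.05)
# The same hard negative under a weak and a strong focusing exponent.
hard_negative_g2 = asymmetric_loss(0.5, 0, gamma_neg=2.0)
hard_negative_g4 = asymmetric_loss(0.5, 0, gamma_neg=4.0)
```

Under these assumptions, raising ps_factor discards more negatives outright, while raising gamma_neg shrinks the contribution of every remaining negative. That is one way to read why very high values of either could under-train on negative labels.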
Why
To figure out moderate hyperparameters and get some hints for future experiments.
How
- Hyperparameter search space
  - lr : loguniform (0.00001, 0.1)
  - weight_decay : loguniform (0.00001, 0.1)
  - batch_size : categorical [256, 512]
  - asl_gamma_neg : uniform (1, 5)
  - asl_ps_factor : uniform (0.05, 0.25)
  - ra_num_ops : randint (2, 14)
  - ra_magnitude : randint (5, 20)
- Hyperparameter search algorithm
  - Algo : Hyperopt
  - Metric : loss
  - Mode : min
  - Seed : 12345
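The search space above can be sketched with the standard library alone, which helps show what each distribution means. This is plain random sampling, not Hyperopt's search algorithm, so it will not reproduce the actual trials. The helper names are hypothetical, and the sketch assumes that randint excludes its upper bound, as Ray Tune's `tune.randint` does.

```python
import math
import random

def loguniform(rng, low, high):
    """Sample so that log(value) is uniform: each decade is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_config(rng):
    """Draw one configuration from the search space listed above."""
    return {
        "lr": loguniform(rng, 1e-5, 1e-1),
        "weight_decay": loguniform(rng, 1e-5, 1e-1),
        "batch_size": rng.choice([256, 512]),
        "asl_gamma_neg": rng.uniform(1.0, 5.0),
        "asl_ps_factor": rng.uniform(0.05, 0.25),
        # Assumption: upper bounds are exclusive, so ra_num_ops is in 2..13
        # and ra_magnitude is in 5..19.
        "ra_num_ops": rng.randrange(2, 14),
        "ra_magnitude": rng.randrange(5, 20),
    }

rng = random.Random(12345)  # same seed value as above, but a different sampler
configs = [sample_config(rng) for _ in range(30)]
```

Because lr and weight_decay are sampled on a log scale, roughly a quarter of the draws fall in each decade between 1e-5 and 1e-1. With 30 trials, that leaves only a handful of samples per decade, so patterns along those axes should be read with caution.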