
Discussion: Densenet121 tuning result and future experimental plan #91

Description

@kdg1993

What

This is the result of DenseNet121 hyperparameter tuning with our advanced options.

Things I considered in this experiment

  • Using advanced options: ASL, random augmentation
    Label smoothing could not be used because this experiment started before the conflict between ASL and label smoothing was fixed
  • As many trials as possible
    I ran 30 trials. This cannot guarantee that the tuning is optimal, but the experiment still took almost a week, even though I used only half of the CheXpert dataset
  • Enough epochs
    Empirically, I observed the best score still improving after 15 epochs, so I used 20 epochs; running more was hard due to the time limit
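To plan follow-up runs, the trial and epoch budget above can be turned into a rough time estimate. The sketch below is illustrative only: it treats "almost a week" as 7 days and assumes every trial runs all epochs with no early stopping, with `concurrent_trials` trials running side by side without slowing each other. The function names are hypothetical and not part of our codebase.

```python
def minutes_per_epoch(total_days: float, trials: int, epochs: int,
                      concurrent_trials: int = 1) -> float:
    """Average wall-clock minutes available per training epoch.

    Assumes every trial trains for the full `epochs` (no early stopping) and
    that `concurrent_trials` trials run side by side without slowing each other.
    """
    total_minutes = total_days * 24 * 60
    # Epochs that effectively run back to back once parallel trials are accounted for.
    serial_epochs = trials * epochs / concurrent_trials
    return total_minutes / serial_epochs


def trials_that_fit(total_days: float, epoch_minutes: float, epochs: int,
                    concurrent_trials: int = 1) -> int:
    """How many full-length trials fit into the budget under the same assumptions."""
    total_minutes = total_days * 24 * 60
    trial_minutes = epoch_minutes * epochs
    return int(total_minutes // trial_minutes) * concurrent_trials


# Illustrative numbers: ~7 days, 30 trials, 20 epochs, one trial at a time.
print(round(minutes_per_epoch(7, 30, 20), 1))  # 16.8
# If an epoch took 21 minutes, 40-epoch trials would leave room for 12 trials.
print(trials_that_fit(7, 21, 40))  # 12
```

Under these assumptions, doubling the epochs roughly halves the number of trials that fit in the same week. This is why narrowing the search space is attractive when more epochs are needed.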

Experiment settings

| Model | Loss | Raytune trials | stop_patience | Epoch | Train size | Optimizer | Dataset |
|---|---|---|---|---|---|---|---|
| Densenet121 (ImageNet pretrained) | ASL | 30 | 10000 (not stop) | 20 | 50% | Adam | CheXpert-pad224 |
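Setting stop_patience to 10000 with only 20 epochs means patience-based early stopping can never trigger, so every trial runs to the end. The sketch below illustrates that with a generic "stop after N epochs without improvement" rule. `PatienceStopper` and `epochs_run` are hypothetical helpers, and the project's actual stop_patience logic may differ (for example, in how ties or a minimum delta are handled).

```python
class PatienceStopper:
    """Stop once the monitored loss has not improved for `patience` epochs.

    Hypothetical helper for illustration; not the project's implementation.
    """

    def __init__(self, patience: int, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.epochs_without_improvement = 0

    def step(self, loss: float) -> bool:
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(losses, patience: int) -> int:
    """Number of epochs actually trained before the stopper fires."""
    stopper = PatienceStopper(patience)
    for epoch, loss in enumerate(losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(losses)


losses = [0.50, 0.40, 0.45, 0.46, 0.47, 0.30]  # made-up validation losses
print(epochs_run(losses, patience=3))      # 5: stops before the late improvement
print(epochs_run(losses, patience=10000))  # 6: never stops, as with stop_patience=10000
```

The made-up losses also show why a small patience can be risky here: a late improvement, like the one observed after epoch 15, would be cut off.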

Experiment result
Link: https://wandb.ai/snuh_interns/kdg_tune_densenet121/groups/trainval_2023-01-27_08-17-57/workspace?workspace=user-snuh_interns
*Figure: parallel coordinates plot with the top 9 trials highlighted*

*Figure: parallel coordinates plot with the bottom 10 trials highlighted*
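The observations below read value ranges off the highlighted top 9 and bottom 10 trials. The same summary can be computed directly from exported trial results. The sketch below assumes each trial is a flat dict holding the final `loss` and its hyperparameters, which is an assumption about the export format. The toy trials are made up and are not results from this experiment.

```python
from typing import Dict, List, Tuple

Trial = Dict[str, float]


def top_k_ranges(trials: List[Trial], params: List[str], k: int,
                 metric: str = "loss", best_first: bool = True
                 ) -> Dict[str, Tuple[float, float]]:
    """(min, max) of each hyperparameter over the k best (or worst) trials.

    Lower `metric` is treated as better, matching mode="min" on the loss.
    With best_first=False the k worst trials are summarized instead.
    """
    ranked = sorted(trials, key=lambda t: t[metric], reverse=not best_first)
    selected = ranked[:k]
    return {p: (min(t[p] for t in selected), max(t[p] for t in selected))
            for p in params}


# Made-up trials for illustration only.
toy_trials = [
    {"loss": 0.30, "asl_gamma_neg": 3.1, "asl_ps_factor": 0.10},
    {"loss": 0.32, "asl_gamma_neg": 2.5, "asl_ps_factor": 0.15},
    {"loss": 0.35, "asl_gamma_neg": 4.4, "asl_ps_factor": 0.12},
    {"loss": 0.50, "asl_gamma_neg": 1.2, "asl_ps_factor": 0.24},
]
params = ["asl_gamma_neg", "asl_ps_factor"]
print(top_k_ranges(toy_trials, params, k=2))
# {'asl_gamma_neg': (2.5, 3.1), 'asl_ps_factor': (0.1, 0.15)}
print(top_k_ranges(toy_trials, params, k=1, best_first=False))
# {'asl_gamma_neg': (1.2, 1.2), 'asl_ps_factor': (0.24, 0.24)}
```

A min/max summary hides overlap between good and bad trials (as with ps_factor around 0.12), so it complements rather than replaces the parallel coordinates plot.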

Key observations

  • A high lr and a low weight_decay might be a bad idea
    • Performance is not sensitive to lr unless lr is too high. Empirically, an lr above 1e-2 may be too high, while an lr below 1e-3 seems adequate
    • weight_decay and batch_size do not look worth tuning. In the top 9, weight_decay spans a very wide range (note that the weight_decay axis is log-scaled), and batch_size shows no distinct pattern either
  • The results for the ASL factors are in line with intuition
    • The gamma_neg values of the top 9 range from about 2.2 to 4.6, i.e. neither too low nor too high. A fixed gamma_neg of 3 to 3.5 could be adequate. Values above 4 seem risky because some low-ranked trials have gamma_neg between 4 and 4.5
    • The ps_factor (probability shifting factor) values of the top 9 range from about 0.05 to 0.18. It is good that none of the top 9 exceeds 0.2, which could be suspiciously high. However, many low-ranked trials are also around 0.12, so the signal is less clear than for asl_gamma_neg. If resources for tuning are limited, removing ps_factor from the search space might be better
  • It is hard to find a good combination of Random Augmentation's magnitude and number of operations, but at least avoiding overly strong augmentation seems wise
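For reference on what gamma_neg and ps_factor control, here is a per-label sketch of the asymmetric loss as described in the ASL paper (Ridnik et al.). Positives use the weight (1 - p)^gamma_pos. Negatives use a shifted probability p_m = max(p - m, 0) weighted by p_m^gamma_neg. This sketch assumes our ps_factor plays the role of the margin m. It is a plain-Python illustration on probabilities, not our training code, which presumably operates on logits in PyTorch.

```python
import math
from typing import Sequence


def asymmetric_loss(probs: Sequence[float], targets: Sequence[int],
                    gamma_neg: float = 4.0, gamma_pos: float = 0.0,
                    ps_factor: float = 0.05, eps: float = 1e-8) -> float:
    """Asymmetric loss summed over labels (sketch; ps_factor assumed = margin m)."""
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: focal-style weight on -log(p).
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin, so very
            # easy negatives (p <= ps_factor) contribute exactly zero.
            p_m = max(p - ps_factor, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total


print(asymmetric_loss([0.03], [0]))  # 0.0: easy negative is discarded
# A larger gamma_neg down-weights the same negative more strongly.
print(asymmetric_loss([0.6], [0], gamma_neg=4) < asymmetric_loss([0.6], [0], gamma_neg=2))  # True
```

This shows why very high values of either factor are suspicious. A large ps_factor discards more and more negatives entirely, and a large gamma_neg shrinks the gradient from all but the hardest negatives.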

Why

To find reasonable hyperparameter ranges and gather hints for future experiments.

How

  • Hyperparameter search space
    • lr : loguniform (0.00001, 0.1)
    • weight_decay : loguniform (0.00001, 0.1)
    • batch_size : categorical [256, 512]
    • asl_gamma_neg : uniform (1, 5)
    • asl_ps_factor : uniform (0.05, 0.25)
    • ra_num_ops : randint (2, 14)
    • ra_magnitude : randint (5, 20)
  • Hyperparameter search algorithm
    • Algo : Hyperopt
    • Metric : loss
    • Mode : min
    • Seed : 12345
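The search space above can be sanity-checked by drawing configurations from it. The sketch below uses only Python's standard library with the same seed. It is plain random sampling: it does not reproduce Hyperopt's TPE, which adapts later suggestions to earlier results, so the sampled configs will not match the actual trials. In Ray Tune these dimensions would presumably be declared with `tune.loguniform`, `tune.choice`, `tune.uniform` and `tune.randint`. The randint bounds here assume Ray Tune's convention of an exclusive upper bound. `SEARCH_SPACE` and `sample_configs` are hypothetical names.

```python
import math
import random


def loguniform(rng: random.Random, low: float, high: float) -> float:
    """Sample uniformly in log space between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


# Mirrors the search space listed above (randint upper bounds assumed exclusive).
SEARCH_SPACE = {
    "lr": lambda rng: loguniform(rng, 1e-5, 1e-1),
    "weight_decay": lambda rng: loguniform(rng, 1e-5, 1e-1),
    "batch_size": lambda rng: rng.choice([256, 512]),
    "asl_gamma_neg": lambda rng: rng.uniform(1, 5),
    "asl_ps_factor": lambda rng: rng.uniform(0.05, 0.25),
    "ra_num_ops": lambda rng: rng.randrange(2, 14),
    "ra_magnitude": lambda rng: rng.randrange(5, 20),
}


def sample_configs(n: int, seed: int = 12345) -> list:
    """Draw n independent random configs; the same seed gives the same configs."""
    rng = random.Random(seed)
    return [{name: sampler(rng) for name, sampler in SEARCH_SPACE.items()}
            for _ in range(n)]


configs = sample_configs(30)
# Fraction of random draws with lr <= 1e-3, the range that looked adequate.
print(sum(c["lr"] <= 1e-3 for c in configs) / len(configs))
```

Because lr is log-uniform over four decades, only about half of purely random draws fall at or below 1e-3. Narrowing the lr range, or fixing gamma_neg as suggested in the observations, would concentrate the limited trial budget where the good configurations appeared.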

Metadata

Labels: Discussion (Extra attention is needed), RayTune (About RayTune (Hyperparameter Tuning))

Project status: 🤔 Discussion
