Discussion: Densenet121 tuning result and future experimental plan #91

## What
This is the result of the Densenet121 hyperparameter tuning with our advanced options.

**Things I considered in this experiment**
* **_Using advanced options: ASL and random augmentation_** Label smoothing could not be used because this experiment started before the conflict between ASL and label smoothing was fixed.
* _**As many trials as possible**_ I ran 30 trials. This cannot guarantee that the tuning is optimal, but the experiment still took almost a week even though I used only half of the CheXpert dataset.
* **_Enough epochs_** Empirically, I observed improvements in the best score after 15 epochs, so I used 20 epochs. More epochs were not feasible due to the time limit.

**Experiment settings**
|Model|Loss|Ray Tune trials|stop_patience|Epochs|Train size|Optimizer|Dataset|
|---|---|---|---|---|---|---|---|
|Densenet121 (ImageNet pretrained)|ASL|30|10000 (no early stopping)|20|50%|Adam|CheXpert-pad224|
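For readers less familiar with ASL (asymmetric loss), the sketch below shows how its two tuned knobs could act on individual labels. It follows the standard ASL formulation and assumes that `asl_ps_factor` plays the role of the probability margin m (negatives use the shifted probability max(p - m, 0)) and that gamma_pos is 0. The function name and defaults are illustrative only; our actual training code works on batched tensors and may differ.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def asymmetric_loss(logits, targets, gamma_neg=4.0, gamma_pos=0.0, ps_factor=0.05, eps=1e-8):
    """Mean asymmetric loss over flat lists of logits and 0/1 targets.

    Assumption: ps_factor acts as ASL's probability margin m, so negatives use
    the shifted probability p_m = max(p - m, 0).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = _sigmoid(z)
        if y == 1:
            # Positive label: focusing term (1 - p)^gamma_pos; with gamma_pos=0 this is plain BCE.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative label: shift p down by ps_factor, then down-weight easy negatives via gamma_neg.
            p_m = max(p - ps_factor, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total / len(logits)
```

Raising gamma_neg shrinks the contribution of easy negatives, and under this formulation any negative whose predicted probability falls below ps_factor contributes nothing at all, so a large ps_factor silently discards more negatives.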

**Experiment result**
Link: https://wandb.ai/snuh_interns/kdg_tune_densenet121/groups/trainval_2023-01-27_08-17-57/workspace?workspace=user-snuh_interns
![image](https://user-images.githubusercontent.com/50651319/216863349-39ac4f17-ce49-4801-b0cd-b8e22cf68a9b.png)

$$\text{Parallel coordinates plot with the top 9 trials highlighted}$$

![image](https://user-images.githubusercontent.com/50651319/216868535-d3d3fce4-bb59-42f8-a5c7-9573986accc7.png)

$$\text{Parallel coordinates plot with the bottom 10 trials highlighted}$$
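The top-9 and bottom-10 groupings in these plots, and the per-hyperparameter ranges quoted in the observations below, can also be computed directly from the raw trial table. A minimal sketch in plain Python, assuming each trial is a dict holding the metric (`loss`, lower is better) and its hyperparameter values; the helper name is hypothetical. With Ray Tune, such records could be obtained from `ResultGrid.get_dataframe()`, whose config columns are typically prefixed with `config/`.

```python
def summarize_ranges(trials, params, k, metric="loss", best=True):
    """Return {param: (min, max)} over the k best (or k worst) trials.

    `trials` is a list of dicts holding `metric` and each hyperparameter.
    Lower `metric` is treated as better, matching mode="min".
    """
    ranked = sorted(trials, key=lambda t: t[metric])
    subset = ranked[:k] if best else ranked[-k:]
    return {p: (min(t[p] for t in subset), max(t[p] for t in subset)) for p in params}
```

For example, `summarize_ranges(trials, ["asl_gamma_neg", "asl_ps_factor"], k=9)` gives the top-9 ranges, and passing `k=10, best=False` gives the bottom-10 ranges.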

_**Key observations**_
* A high lr and a low weight_decay might be a bad idea
  - Performance is not sensitive to lr unless lr is too high. Empirically, values above 1e-2 may be too high, while values below 1e-3 seem adequate.
  - weight_decay and batch_size do not look worth tuning. Among the top 9, weight_decay spans a very wide range (note that the weight_decay axis is on a log scale), and batch_size shows no distinct pattern either.
* The results for the ASL factors are in line with intuition
  - Among the top 9, gamma_neg ranges from about 2.2 to 4.6, which is neither too low nor too high. If gamma_neg is fixed, a value of 3 to 3.5 is probably adequate. Values above 4 seem undesirable because some low-ranked trials have gamma_neg between 4 and 4.5.
  - Among the top 9, ps_factor (probability shifting factor) ranges from about 0.05 to 0.18. It is good that none of the top 9 exceeds 0.2, which would be suspiciously high. However, many low-ranked trials also sit around 0.12, so the pattern is less clear than for asl_gamma_neg. If resources for tuning are limited, removing ps_factor from the search space might be better.
* It is hard to find a good combination of Random Augmentation's magnitude and number of operations, but avoiding overly strong augmentation is probably wise
 

## Why
To find reasonable hyperparameters and gather hints for future experiments.
 

## How
* Hyperparameter search space
  - lr: loguniform(0.00001, 0.1)
  - weight_decay: loguniform(0.00001, 0.1)
  - batch_size: categorical [256, 512]
  - asl_gamma_neg: uniform(1, 5)
  - asl_ps_factor: uniform(0.05, 0.25)
  - ra_num_ops: randint(2, 14)
  - ra_magnitude: randint(5, 20)
* Hyperparameter search algorithm
  - Algo: Hyperopt
  - Metric: loss
  - Mode: min
  - Seed: 12345
