Conversation
Thanks, Michael, this is definitely needed. About 1.5 years ago I started an Initializers PR (#151) but forgot about it. Basically it follows a similar pattern to how activations and optimizers are done in NF, which allows complete customization if specified, and sane defaults (like the ones you have here) if unspecified. Do you think it would work well?
@milancurcic Yes, I think #151 will work!
```fortran
if (&
    self % activation_name == 'relu' &
    .or. self % activation_name == 'leaky_relu' &
    .or. self % activation_name == 'celu' &
) then
  call random_he(self % weights, self % input_size)
elseif (self % activation_name == 'sigmoid' .or. self % activation_name == 'tanhf') then
  call random_xavier(self % weights, self % input_size)
```
Should these be the defaults? Or should the user be able to choose another pseudo-random generator?
I like how it's done here: #151
In the DL framework of my dreams, I would have an option to pass the weight-initialization algorithm into a layer's constructor. So:

- What I want: the Initializers stub (#151) with Kaiming weights by default, but it requires a lot of refactoring
- Why I made this PR draft: it is correct from the mathematical standpoint, Xavier for S-shaped activations and He for the `.*elu` family. It will probably resolve #145 (CNN training on MNIST does not converge) if added to the Conv layer
- How Torch does it: Kaiming for everything. Not ideal, but it covers the vast majority of cases
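The constructor-argument pattern described above can be sketched roughly like this. This is a hypothetical Python illustration, not NF code: the `Dense`, `he_uniform`, and `xavier_uniform` names are made up for this example, and the bounds follow the common fan-in formulas (He uniform: sqrt(6/n_prev); the simpler 1/sqrt(n_prev) bound for the Xavier-style variant used in this PR).

```python
import math
import random
from typing import Callable

def he_uniform(n_prev: int) -> float:
    # He/Kaiming uniform sample: U(-sqrt(6/n_prev), +sqrt(6/n_prev))
    bound = math.sqrt(6.0 / n_prev)
    return random.uniform(-bound, bound)

def xavier_uniform(n_prev: int) -> float:
    # Uniform sample with the PR-style bound: U(-1/sqrt(n_prev), +1/sqrt(n_prev))
    bound = 1.0 / math.sqrt(n_prev)
    return random.uniform(-bound, bound)

class Dense:
    """Toy dense layer: Kaiming by default, but any initializer can be injected."""
    def __init__(self, n_prev: int, n_out: int,
                 initializer: Callable[[int], float] = he_uniform):
        self.weights = [[initializer(n_prev) for _ in range(n_prev)]
                        for _ in range(n_out)]

# The caller can override the default on a per-layer basis:
layer = Dense(64, 32, initializer=xavier_uniform)
```

The point of the design is that the default stays sane for the common case, while a user who knows better can swap in any sampling function without touching the layer code.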
```fortran
impure elemental subroutine random_he(x, n_prev)
  !! Kaiming weight initialization
  real, intent(in out) :: x
```

Suggested change:

```suggestion
  real, intent(out) :: x
```
```fortran
impure elemental subroutine random_xavier(x, n_prev)
  !! Xavier weight initialization
  real, intent(in out) :: x
```

Suggested change:

```suggestion
  real, intent(out) :: x
```
```fortran
lower = -(1. / sqrt(real(n_prev)))
upper = 1. / sqrt(real(n_prev))
```

Suggested change:

```suggestion
  upper = 1. / sqrt(real(n_prev))
  lower = -upper
```
```fortran
lower = -(1. / sqrt(real(n_prev)))
upper = 1. / sqrt(real(n_prev))
call random_number(x)
x = lower + x * (upper - lower)
```
Is this correct if lower == -upper?
Weights Initialization
Added functions for Xavier and Kaiming initialization. The rule of thumb here:

- S-shaped activations (`tanh`, `sigmoid`, etc.) => Xavier
- Rectified activations (`relu`, `gelu`, `silu`, etc.) => Kaiming

For networks without Layer or Batch Normalization, this simple tweak will significantly improve convergence.
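As a sanity check on the rule of thumb, the variance targets of the two schemes can be verified empirically. This is a stdlib-only Python sketch of the textbook variants (He normal with std = sqrt(2/n_prev); Glorot uniform with bound = sqrt(6/(n_prev+n_out))); note the PR itself uses a simpler U(-1/sqrt(n), 1/sqrt(n)) uniform scheme, so the exact formulas below are the common textbook forms, not a transcription of the PR's code.

```python
import math
import random

random.seed(12345)

n_prev = 256
n_out = 128
n_samples = 200_000

# He/Kaiming normal: zero mean, std = sqrt(2 / n_prev), suited to the relu family.
he_std = math.sqrt(2.0 / n_prev)
he = [random.gauss(0.0, he_std) for _ in range(n_samples)]

# Xavier/Glorot uniform: U(-b, b) with b = sqrt(6 / (n_prev + n_out)),
# suited to tanh/sigmoid; Var(U(-b, b)) = b^2 / 3 = 2 / (n_prev + n_out).
xav_bound = math.sqrt(6.0 / (n_prev + n_out))
xavier = [random.uniform(-xav_bound, xav_bound) for _ in range(n_samples)]

# Empirical variances should land close to the theoretical targets.
he_var = sum(w * w for w in he) / n_samples
xav_var = sum(w * w for w in xavier) / n_samples
print(he_var, 2.0 / n_prev)
print(xav_var, 2.0 / (n_prev + n_out))
```

Both schemes aim at the same thing: keeping the variance of activations (and gradients) roughly constant from layer to layer, which is exactly why the right choice speeds up convergence when no normalization layers are present.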