I noticed that you run the attention through a sigmoid because you were having numerical problems:
https://github.com/codekansas/keras-language-modeling/blob/master/attention_lstm.py#L54
This may work, but I think it should actually be a softmax. In the paper you cite, it only says that the attention weight should be proportional to the exponential of the score (i.e., normalized over the timesteps).
In another paper [1], they explicitly say it should be a softmax over the attention scores.
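As a minimal sketch (not the code in this repo, and assuming the raw scores come in as a `(batch_size, timesteps)` tensor), a numerically stable softmax over the timesteps can be written with the Keras backend like this; subtracting the per-row max before exponentiating avoids the overflow that the sigmoid was presumably working around:

```python
from keras import backend as K

def softmax_attention(scores):
    # scores: (batch_size, timesteps) raw attention scores
    # Subtract the row-wise max for numerical stability, then normalize.
    scores = scores - K.max(scores, axis=-1, keepdims=True)
    weights = K.exp(scores)
    return weights / K.sum(weights, axis=-1, keepdims=True)
```

With this, the weights for each sample sum to 1 across timesteps, which matches the softmax formulation in [1], rather than each weight being squashed independently as with a sigmoid.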
[1] https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf