I noticed that you run the attention through a sigmoid because you were having numerical problems:
https://github.com/codekansas/keras-language-modeling/blob/master/attention_lstm.py#L54
This may work, but I think it should actually be a softmax. In the paper you cite, it only says that the attention weight should be proportional to the exponential of the score (i.e., normalized over the timesteps).
In another paper [1], they explicitly say it should be a softmax over the attention scores.
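As a minimal sketch (not the code in this repo, and assuming the raw scores come in as a `(batch_size, timesteps)` tensor), a numerically stable softmax over the timesteps can be written with the Keras backend like this; subtracting the per-row max before exponentiating avoids the overflow that the sigmoid was presumably working around:

```python
from keras import backend as K

def softmax_attention(scores):
    # scores: (batch_size, timesteps) raw attention scores
    # Subtract the row-wise max for numerical stability, then normalize.
    scores = scores - K.max(scores, axis=-1, keepdims=True)
    weights = K.exp(scores)
    return weights / K.sum(weights, axis=-1, keepdims=True)
```

With this, the weights for each sample sum to 1 across timesteps, which matches the softmax formulation in [1], rather than each weight being squashed independently as with a sigmoid.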
[1] https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf