Commit 6f07be3
committed
Update on "[llm] Support different shape of input_pos"
For huggingface models, `forward()` is taking `tokens` as well as `cache_positions`, which is a list of cache indices. This is different than the .pte files `export_llama` gives, which are taking `tokens` and `input_pos` where `input_pos` is a scalar tensor.
This PR adds support inside `text_decoder_runner.cpp` to handle both shapes of `input_pos`/`cache_positions`.
To make the logic more generic without relying on extra metadata, here I'm adding the logic of inspecting method meta and input tensor info, to make a decision if we want to feed in `input_pos` or `cache_position`.
Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)
[ghstack-poisoned]3 files changed
Lines changed: 8 additions & 5 deletions
File tree
- examples/models/llava/runner
- extension/llm/runner
- runtime/executor
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
11 | 11 | | |
12 | 12 | | |
13 | 13 | | |
| 14 | + | |
14 | 15 | | |
15 | 16 | | |
16 | 17 | | |
17 | 18 | | |
18 | 19 | | |
19 | 20 | | |
20 | 21 | | |
21 | | - | |
| 22 | + | |
22 | 23 | | |
23 | 24 | | |
24 | 25 | | |
25 | | - | |
| 26 | + | |
26 | 27 | | |
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
| 31 | + | |
| 32 | + | |
30 | 33 | | |
31 | 34 | | |
32 | | - | |
| 35 | + | |
33 | 36 | | |
34 | 37 | | |
35 | 38 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
24 | | - | |
| 24 | + | |
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
44 | 44 | | |
45 | 45 | | |
46 | 46 | | |
47 | | - | |
| 47 | + | |
48 | 48 | | |
49 | 49 | | |
50 | 50 | | |
| |||
0 commit comments