Update on "[llm] Support different shape of input_pos"

larryliu0820 · commit 3b95ef798ecf · 2025-06-24T21:57:50.000-07:00
For Hugging Face models, `forward()` takes `tokens` as well as `cache_positions`, which is a list of cache indices. This differs from the .pte files produced by `export_llama`, which take `tokens` and `input_pos`, where `input_pos` is a scalar tensor.

This PR adds support in `text_decoder_runner.cpp` for both shapes of `input_pos`/`cache_positions`. To keep the logic generic without relying on extra metadata, the runner inspects the method metadata and input tensor info to decide whether to feed `input_pos` or `cache_position`.

Differential Revision: [D77203700](https://our.internmc.facebook.com/intern/diff/D77203700/)

[ghstack-poisoned]
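The shape-based decision described in the commit message can be illustrated with a minimal standalone sketch. It does not use the ExecuTorch API: `PositionInputMeta` and `make_position_input` are hypothetical names standing in for whatever the runner reads from method metadata. The sketch assumes that a declared position input with a single element (shape `{1}` or a 0-dim scalar) means an `export_llama`-style `input_pos`, and that anything larger means an HF-style per-token `cache_position`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the position input's declared shape as read from
// method metadata (not a real ExecuTorch type). With dynamic shapes this is
// assumed to be an upper bound rather than the per-call size.
struct PositionInputMeta {
  std::vector<int64_t> sizes;
};

// Builds the values to feed as the position input for one forward() call.
// Assumption: numel <= 1 means a scalar-style `input_pos`; numel > 1 means a
// per-token `cache_position`.
std::vector<int64_t> make_position_input(const PositionInputMeta& meta,
                                         int64_t start_pos,
                                         int64_t num_tokens) {
  assert(num_tokens >= 0);

  // An empty sizes vector (0-dim scalar) yields numel == 1.
  int64_t numel = 1;
  for (int64_t s : meta.sizes) {
    numel *= s;
  }

  if (numel <= 1) {
    // export_llama-style: only the starting position is passed.
    return {start_pos};
  }

  // HF-style: one cache index per token, start_pos .. start_pos + num_tokens - 1.
  // Sized from num_tokens, not from the declared (upper-bound) shape.
  std::vector<int64_t> positions(static_cast<std::size_t>(num_tokens));
  std::iota(positions.begin(), positions.end(), start_pos);
  return positions;
}
```

For example, with a declared `cache_position` shape of `{1, 128}`, a 3-token prefill at `start_pos = 5` produces `{5, 6, 7}`, while a model with a scalar `input_pos` receives just `{5}`. Sizing the output from the actual token count rather than from the declared shape keeps the sketch correct when the metadata reports a maximum length.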
diff --git a/kernels/portable/cpu/util/arange_util.cpp b/kernels/portable/cpu/util/arange_util.cpp
@@ -38,11 +38,13 @@ void arange_out_impl(
     double end,
     double step,
     Tensor& out) {
+  (void)ctx;
   Tensor::SizesType numel = compute_arange_out_size(start, end, step);
   ET_ARANGE_IMPL(ctx, start, numel, step, out, "arange.start_out");
 }
 
 void arange_out_impl(KernelRuntimeContext& ctx, double end, Tensor& out) {
+  (void)ctx;
   ET_ARANGE_IMPL(ctx, 0.0, end, 1.0, out, "arange.out");
 }
 