Hello, thank you for releasing the code. I have been trying to reproduce the ALFWorld experiments and have a design question.
At every step, ALFWorld's env exposes the full list of legal actions via info['admissible_commands']. The current prompt setup relies on the two few-shot trajectories to implicitly teach the action vocabulary (go to cabinet 1, take potato 1 from countertop 2, etc.), and the model is expected to produce syntactically valid actions from that demonstration alone.
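To quantify "syntactically valid", one option is to check each generated action against the legal list for that step. The sketch below is a hypothetical helper, not part of this repo. It assumes `admissible_commands` is a plain list of strings for a single environment (with a batched env you would index into `info['admissible_commands']` first), and it assumes that trimming whitespace and lowercasing is an acceptable normalization:

```python
def is_admissible(action: str, admissible_commands: list[str]) -> bool:
    """Hypothetical check: True if the model's action matches a legal action
    after trimming surrounding whitespace and lowercasing (assumed normalization)."""
    normalized = action.strip().lower()
    return normalized in {cmd.strip().lower() for cmd in admissible_commands}


def validity_rate(actions: list[str], admissible_per_step: list[list[str]]) -> float:
    """Fraction of steps whose generated action was admissible at that step."""
    if not actions:
        return 0.0
    valid = sum(
        is_admissible(action, legal)
        for action, legal in zip(actions, admissible_per_step)
    )
    return valid / len(actions)


legal = ["go to cabinet 1", "go to countertop 2", "look"]
print(is_admissible(" Go to cabinet 1", legal))  # True
print(is_admissible("goto cabinet 1", legal))    # False
print(validity_rate(["look", "goto cabinet 1"], [legal, legal]))  # 0.5
```

Comparing this rate across repeated runs would make the inconsistency concrete.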
I was curious why the admissible actions aren't explicitly included in the prompt at each step (e.g., appended as `Valid actions: [...]`). In my experiments with more recent models, I noticed that large instruction-tuned LLMs (Llama 70B/90B) produce valid actions inconsistently across repeated runs.
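As a rough sketch of what I mean, the observation fed back to the model could be extended with the legal actions before it is appended to the prompt. The helper name and the exact `Valid actions: [...]` formatting are my own assumptions, not something the repo or ALFWorld defines:

```python
def append_valid_actions(observation: str, admissible_commands: list[str]) -> str:
    """Hypothetical formatting: append every legal action after the observation."""
    actions = ", ".join(admissible_commands)
    return f"{observation}\nValid actions: [{actions}]"


obs = "You arrive at loc 5. On the cabinet 1, you see a potato 1."
print(append_valid_actions(obs, ["take potato 1 from cabinet 1", "look"]))
# You arrive at loc 5. On the cabinet 1, you see a potato 1.
# Valid actions: [take potato 1 from cabinet 1, look]
```

One trade-off is prompt length: the admissible list can be long in rooms with many receptacles, and that cost is repeated on every step of the trajectory.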
I went through past issues in this repository, where it was noted that recent GPT models perform much worse on this type of task. I thought that passing the admissible actions in the prompt might improve this. I also noticed that the Reflexion paper follows the same approach, which makes me wonder whether this was a deliberate design choice.