Machine Learning Researcher | Embodied AI | Vision-Language Models | Agentic Systems
- Embodied Long-Horizon Reasoning
- Vision-Language Navigation (VLN)
- Agentic Systems with Reflection & Critic
- World Models for Robotics
- Models: Qwen-VL, InternVL, LLaVA
- Frameworks: PyTorch, Transformers
- Systems: Slurm, Docker, Multi-GPU Training
- 🧠 Research Areas: Embodied AI · World Models · Agentic Systems
- 🎯 Focus: Long-horizon reasoning, memory-driven embodied intelligence, and world models
My research explores how to enable embodied agents to perform long-horizon reasoning through:
- Structured memory (Mem2Ego)
- Task-level planning (ET-Plan-Bench)
- Reflection-driven agentic systems
- World Models
ET-Plan-Bench: Embodied Task-level Planning Benchmark Towards Spatial-Temporal Cognition with Foundation Models (IROS 2025 Oral)
A benchmark for long-horizon embodied planning with spatiotemporal reasoning.
- Integrated into Embodied Arena
- Evaluates SOTA models on long-horizon tasks
📄 Publication
Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation (CVPR 2025 Workshop)
Mem2Ego addresses a fundamental limitation in embodied agents: the inability to maintain consistent long-term spatial memory while acting in an ego-centric manner.
We introduce a global-to-ego memory mechanism that unifies:
- Global scene accumulation for long-horizon reasoning
- Ego-centric perception for action grounding
Key contributions:
- A structured memory architecture that prevents memory fragmentation in long trajectories
- Improved alignment between language instructions, spatial context, and action decisions
- Significant gains in navigation success and robustness in complex environments
This work advances embodied AI from short-horizon imitation toward memory-driven decision making, a critical step toward scalable agentic systems.
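The global-to-ego idea can be illustrated with a minimal sketch (all class and method names here are hypothetical, for illustration only, and do not reflect the actual Mem2Ego implementation): observations accumulate once in a persistent global map, and at each step the nearby ones are projected into the agent's current ego frame for action grounding.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Landmark:
    """A remembered observation anchored at a global (x, y) position."""
    x: float
    y: float
    label: str

@dataclass
class GlobalToEgoMemory:
    """Toy global-to-ego memory (illustrative only, not the paper's code).

    Global scene accumulation: every observation is stored once, in world
    coordinates, so long trajectories do not fragment memory. Ego-centric
    retrieval: at decision time, stored landmarks are transformed into the
    agent's current frame so actions can be grounded locally.
    """
    landmarks: list = field(default_factory=list)

    def add(self, x: float, y: float, label: str) -> None:
        # Accumulate into the persistent global map.
        self.landmarks.append(Landmark(x, y, label))

    def ego_view(self, agent_x: float, agent_y: float,
                 agent_heading: float, radius: float = 5.0) -> list:
        # Rotate/translate each stored landmark into the agent's ego frame
        # and keep only those within `radius` of the current pose.
        visible = []
        for lm in self.landmarks:
            dx, dy = lm.x - agent_x, lm.y - agent_y
            if math.hypot(dx, dy) > radius:
                continue
            c, s = math.cos(-agent_heading), math.sin(-agent_heading)
            ex, ey = c * dx - s * dy, s * dx + c * dy
            visible.append((lm.label, round(ex, 2), round(ey, 2)))
        return visible

mem = GlobalToEgoMemory()
mem.add(2.0, 0.0, "door")
mem.add(100.0, 0.0, "far landmark")
print(mem.ego_view(0.0, 0.0, 0.0))  # only the nearby "door" survives the ego projection
```

The key design point the sketch mirrors is the separation of concerns: writes always target the global map, while reads are always ego-centric, so long-horizon accumulation and short-horizon action grounding never interfere.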
📄 arXiv
- Email: zlf465074419@gmail.com
- Medium Blogs: https://medium.com/@zlf465074419

