Skip to content

zjunlp/SkillX

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

12 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ‘‰ SkillX πŸ‘ˆ

SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Awesome License: MIT

Table of Contents

πŸ“– Overview

SkillX is a fully automated framework that constructs a reusable, plug-and-play skill knowledge base for LLM agents from experience.

Instead of storing raw trajectories, workflows, or loosely structured reflections, SkillX distills agent experience into a three-level skill hierarchy:

  • Planning Skills for high-level task organization
  • Functional Skills for reusable tool-based subroutines
  • Atomic Skills for execution-oriented tool usage patterns

Built with a strong backbone agent, SkillX produces a transferable skill library that can be directly plugged into weaker base agents and new environments. Across challenging long-horizon, user-interactive benchmarks such as AppWorld, BFCL-v3, and Ο„2-Bench, SkillX consistently improves both task success and execution efficiency.

Case GIF

πŸ€– Key Features

Hierarchical Multi-Level Skill Design

SkillX transforms raw trajectories into a structured three-tier skill space:

  • Planning Skills capture high-level decomposition and ordering
  • Functional Skills represent reusable multi-step tool subroutines
  • Atomic Skills encode practical tool usage constraints and patterns

Fully Automated Skill KB Construction

SkillX provides an end-to-end automated pipeline that:

  • rolls out agents on training tasks,
  • extracts reusable skills from successful trajectories,
  • consolidates and filters low-quality skills,
  • and builds a reusable plug-and-play skill knowledge base.

Iterative Skill Refinement

SkillX continuously improves the skill library through:

  • skill merging for consolidating redundant behaviors,
  • quality filtering for removing brittle or hallucinated skills,
  • and iterative updates that add, modify, or keep skills based on execution feedback.

Exploratory Skill Expansion

Beyond seed demonstrations, SkillX proactively discovers new skills by:

  • identifying under-used and failure-prone tools,
  • guiding environment exploration,
  • synthesizing new tasks from exploratory trajectories,
  • and expanding skill coverage beyond the original training distribution.

Plug-and-Play Transfer Across Agents

The resulting skill library can be directly injected into different base agents, enabling strong-to-weak transfer without retraining the underlying model.

Better Performance and Efficiency

SkillX consistently improves:

  • task success rate on challenging benchmarks,
  • execution efficiency by reducing unnecessary exploration and tool misuse,
  • and generalization through structured, reusable experience abstraction.

πŸ“Š Highlights

  • ~10% absolute improvement for weaker base agents on multiple benchmarks
  • Consistent gains on AppWorld, BFCL-v3, and Ο„2-Bench
  • Stronger transferability than trajectory-based, workflow-based, and memory-based baselines
  • Improved execution efficiency with fewer redundant steps
  • Effective even when the skill library is built by a stronger model and used by weaker ones

🧠 Why SkillX?

Existing experience-learning methods often suffer from:

  • Isolated learning: agents repeatedly rediscover similar behaviors
  • Weak transferability: raw trajectories and reflections often do not generalize well
  • Capability bottlenecks: self-extracted experience is limited by the agent’s own strength

SkillX addresses these issues by building a structured skill knowledge base that is:

  • reusable across tasks
  • transferable across agents
  • lightweight to retrieve
  • easy to inject into prompts
  • more robust than long-context progressive skill formats

πŸ—οΈ Method Overview

SkillX consists of three core components:

1. Multi-Level Skills Extraction

From successful trajectories, SkillX automatically extracts:

  • Planning skills: concise, reusable task plans
  • Functional skills: reusable tool-composition procedures
  • Atomic skills: tool-specific usage guidance, constraints, and failure notes

2. Iterative Skills Refinement

SkillX improves library quality through:

  • Skills Merge: cluster and consolidate similar skills
  • Skills Filter: remove non-portable, hallucinated, or invalid skills
  • Skills Update: add, modify, or keep skills across iterations

3. Exploratory Skills Expansion

SkillX expands beyond observed demonstrations by:

  • guiding exploration toward under-covered tools and failure modes,
  • synthesizing new tasks from exploration,
  • and rerunning extraction + refinement to grow the skill library.

πŸ“ˆ Main Results

SkillX improves agentic performance across multiple LLM backbones and benchmarks.

Representative gains

  • On Qwen3-32B, SkillX brings around 10-point improvements on several benchmarks
  • On Kimi-K2-Instruct-0905, SkillX yields clear gains especially on AppWorld
  • On GLM-4.6, SkillX still improves performance and execution efficiency despite the model already being strong

Benchmarks

  • AppWorld
  • BFCL-v3
  • Ο„2-Bench

Key takeaway

SkillX outperforms strong experience-learning baselines such as:

  • A-Mem
  • AWM
  • ExpeL
  • No-memory

This shows that how experience is represented matters as much as, or more than, where it comes from.


πŸ” What Makes SkillX Different?

Compared with prior experience formats:

  • Raw trajectories are verbose and difficult to transfer
  • Insights/reflections are often too abstract
  • Workflows may miss low-level tool constraints
  • Claude-style skills rely on long-context progressive disclosure and complex environment support

In contrast, SkillX offers:

  • hierarchical, itemized, reusable skills
  • one-time prompt injection
  • lightweight retrieval
  • strong transfer across agents and environments

πŸš€ Use Cases

SkillX is especially useful for:

  • tool-using LLM agents
  • long-horizon task execution
  • interactive application environments
  • cross-agent knowledge transfer
  • building reusable agent skill libraries from experience

πŸ§ͺ Benchmarks Used

AppWorld

A realistic ecosystem of apps and APIs for long-horizon agent execution.

BFCL-v3

A challenging benchmark for multi-turn function calling and tool use.

Ο„2-Bench

A user-interactive benchmark focused on conversational tool-using agents.


πŸ“¦ Planned Release

We will publicly release:

  • the SkillX codebase
  • the automatically constructed skill knowledge base
  • and supporting resources for skill extraction, refinement, and retrieval

πŸ“š Citation

If you find this work helpful, please consider citing:

@article{wang2026skillx,
  author     = {Chenxi Wang and 
                Zhuoyun Yu and 
                Xin Xie and 
                Wuguannan Yao and 
                Runnan Fang and 
                Shuofei Qiao and 
                Kexin Cao and 
                Guozhou Zheng and 
                Xiang Qi and 
                Peng Zhang and 
                Shumin Deng},
  title      = {SkillX: Automatically Constructing Skill Knowledge Bases for Agents},
  year       = {2026},
  eprint     = {2604.04804},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url        = {https://arxiv.org/abs/2604.04804}
}

πŸ™ Acknowledgement

This repository builds upon code from ReMe and AgentEvolver. The baseline implementations are adapted from AMEM, AWM, and Expel. We sincerely thank all contributors for their outstanding work!

Releases

No releases published

Packages

 
 
 

Contributors

Languages