Can RWKV state encode abstract behavioral dispositions? #338

@icophy

Motivation

After running State Tuning experiments (details in Joluck/RWKV-PEFT), we found a clear capability boundary: State Tuning reliably transfers style/tone but cannot inject specific facts. This raises a more interesting question.

The Question

time_state encodes distributional priors, not discrete symbols. Could it serve as a "kernel function" — a compact, session-persistent representation of abstract behavioral dispositions?

Not facts ("my creator is X"), but tendencies:

  • Prefer formal register over casual
  • Treat safety constraints as hard limits, not soft suggestions
  • When uncertain, express uncertainty rather than guess

These are exactly the distributional shifts that State Tuning seems suited for.
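One way to make "distributional shift" measurable is to compare the next-token distribution for the same prompt with and without the tuned state. The following is a minimal sketch, assuming logits can be extracted under both conditions; the logit values are made-up placeholders, and `softmax` and `kl_divergence` are hypothetical helpers written here, not part of RWKV-PEFT.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

# Placeholder logits over a 4-token toy vocabulary (illustrative values only).
base_logits = [2.0, 1.0, 0.5, 0.1]    # same prompt, untuned initial state
tuned_logits = [1.0, 2.0, 0.5, 0.1]   # same prompt, tuned initial state

p_base = softmax(base_logits)
p_tuned = softmax(tuned_logits)

# A larger value means the tuned state moved the distribution further from the base.
shift = kl_divergence(p_tuned, p_base)
```

Averaging `shift` over many prompts would give a rough per-disposition measure of how strongly a tuned state moves the model, without saying anything yet about whether the movement is in the intended direction.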

Why This Matters

If state can encode stable behavioral dispositions, it becomes a lightweight alternative to:

  • System prompt injection (consumes context-window tokens)
  • LoRA for behavioral alignment (requires retraining per disposition)
  • Runtime rule enforcement (brittle, easy to override)

The key properties we want to test: composability (can you combine dispositions from multiple trained states?) and stability (does the disposition hold under adversarial prompting or long conversations?).
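For composability, one simple candidate to test is a weighted sum of each tuned state's offset from the untuned state. This is a sketch under stated assumptions: states are treated as flat lists of floats of equal length, `combine_states` is a hypothetical helper, and the numbers are toy values. Whether such a linear combination preserves either disposition is exactly the open question; the code only sets up the experiment.

```python
def combine_states(base, tuned_states, weights):
    """Return base + sum_i weights[i] * (tuned_states[i] - base), elementwise."""
    if len(tuned_states) != len(weights):
        raise ValueError("need one weight per tuned state")
    combined = list(base)  # copy so the caller's base state is not mutated
    for state, w in zip(tuned_states, weights):
        if len(state) != len(base):
            raise ValueError("state shape mismatch")
        for j, (s, b) in enumerate(zip(state, base)):
            combined[j] += w * (s - b)
    return combined

# Toy, flattened states (hypothetical values; real states are much larger).
base = [0.0, 0.0, 0.0]       # untuned initial state
formal = [0.4, 0.0, -0.2]    # assumed: tuned for formal register
hedging = [0.0, 0.6, 0.1]    # assumed: tuned for expressing uncertainty

both = combine_states(base, [formal, hedging], [1.0, 1.0])
```

Sweeping the weights (for example 0.0 to 1.0 for each disposition) and scoring the outputs would show whether the dispositions add, interfere, or cancel.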

Current Status

Experiments are in progress. Three disposition types are being tested:

  1. Style disposition: concise vs. verbose
  2. Epistemic disposition: express uncertainty rather than guess
  3. Value disposition: prioritize honesty in ambiguous situations
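To compare runs, each disposition needs a measurable proxy. The sketch below is illustrative only: the hedge-phrase list, the helper names, and the sample replies are all assumptions, and it covers only the style and epistemic dispositions plus a simple drift score for stability across turns. The value disposition (honesty) would likely need a labeled evaluation set and is not covered here.

```python
import re

# Assumed hedge phrases; a real evaluation would need a more careful list.
HEDGES = ("i'm not sure", "i am not sure", "i don't know",
          "uncertain", "might", "possibly")

def word_count(reply):
    """Style proxy: fewer words suggests a more concise reply."""
    return len(re.findall(r"\S+", reply))

def hedge_rate(replies):
    """Epistemic proxy: fraction of replies containing at least one hedge phrase."""
    if not replies:
        return 0.0
    hits = sum(any(h in r.lower() for h in HEDGES) for r in replies)
    return hits / len(replies)

def drift(scores, window=2):
    """Stability proxy: mean of the last `window` scores minus mean of the first."""
    if len(scores) < 2 * window:
        raise ValueError("need at least 2 * window scores")
    head = sum(scores[:window]) / window
    tail = sum(scores[-window:]) / window
    return tail - head

# Made-up replies standing in for model output at successive turns.
replies = [
    "I'm not sure, but it might be 1987.",
    "Possibly; I don't know the exact figure.",
    "It is 42.",
    "Definitely Paris.",
]
per_turn = [hedge_rate([r]) for r in replies]
```

A negative `drift(per_turn)` would indicate that the disposition weakens later in the conversation; a value near zero would indicate that it holds.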

Will update with results.

Has anyone explored this direction? Any known results on state composability or stability under adversarial prompting?


Context: we are building Cophy and exploring RWKV state as a persistent identity substrate for AI agents.



