Can RWKV state encode abstract behavioral dispositions? | RWKV state 能否编码抽象行为倾向？

## Motivation

After running State Tuning experiments (details in Joluck/RWKV-PEFT), we found a clear capability boundary: State Tuning reliably transfers style/tone but cannot inject specific facts. This raises a more interesting question.

## The Question

`time_state` appears to encode distributional priors rather than discrete symbols. Could it serve as a **"kernel function"**, that is, a compact, session-persistent representation of abstract behavioral dispositions?

Not facts ("my creator is X"), but *tendencies*:
- Prefer formal register over casual
- Treat safety constraints as hard limits, not soft suggestions
- When uncertain, express uncertainty rather than guess

These are exactly the distributional shifts that State Tuning seems suited for.

## Why This Matters

If state can encode stable behavioral dispositions, it becomes a lightweight alternative to:
- System prompt injection (takes context window)
- LoRA for behavioral alignment (requires retraining per disposition)
- Runtime rule enforcement (brittle, easy to override)

The key properties we want to test: **composability** (can you combine dispositions from multiple trained states?) and **stability** (does the disposition hold under adversarial prompting or long conversations?).
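One simple way to probe composability is to treat each trained state as an offset from the untuned initial state and combine the offsets linearly. The sketch below is only an assumption about how such a test could be set up, not a known result: `compose_states`, the `State` layout (layer name mapped to a flat list of floats), and the toy values are all hypothetical, and real RWKV states are per-layer tensors. Whether a linear combination preserves either disposition is exactly the open question.

```python
from typing import Dict, List

# Hypothetical layout: layer name -> flattened state values.
State = Dict[str, List[float]]


def compose_states(base: State, tuned: List[State], weights: List[float]) -> State:
    """Combine tuned states as weighted offsets from a shared base state."""
    if len(tuned) != len(weights):
        raise ValueError("need exactly one weight per tuned state")
    composed: State = {}
    for layer, base_vals in base.items():
        out = list(base_vals)  # start from the base state
        for state, w in zip(tuned, weights):
            vals = state[layer]
            if len(vals) != len(base_vals):
                raise ValueError(f"shape mismatch in layer {layer!r}")
            # Add this state's weighted offset from the base.
            for i, (v, b) in enumerate(zip(vals, base_vals)):
                out[i] += w * (v - b)
        composed[layer] = out
    return composed


# Toy example: two "dispositions" that move different coordinates of one layer.
base = {"blocks.0": [0.0, 0.0, 0.0]}
concise = {"blocks.0": [1.0, 0.0, 0.0]}
hedging = {"blocks.0": [0.0, 2.0, 0.0]}
mixed = compose_states(base, [concise, hedging], [1.0, 0.5])
print(mixed)  # {'blocks.0': [1.0, 1.0, 0.0]}
```

Working with offsets rather than raw states keeps the base state fixed when all weights are zero, which gives a natural control condition for the experiment.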

## Current Status

Experiments are in progress. We are testing three disposition types:
1. Style disposition: concise vs. verbose
2. Epistemic disposition: express uncertainty rather than guess
3. Value disposition: prioritize honesty in ambiguous situations
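To make "stability" measurable for dispositions like these, one option is to score each model reply for the target disposition and compare early turns with late turns. The sketch below is a rough illustration under stated assumptions: `hedge_score` is a hypothetical keyword proxy for the epistemic disposition, `retention` is a made-up metric name, and a real evaluation would likely need a stronger judge than substring matching.

```python
from typing import Callable, List

# Hypothetical marker list; substring matching is deliberately crude.
HEDGE_MARKERS = ("i'm not sure", "i am not sure", "i don't know",
                 "uncertain", "might", "possibly")


def hedge_score(reply: str) -> float:
    """Return 1.0 if the reply contains any hedging marker, else 0.0."""
    text = reply.lower()
    return 1.0 if any(marker in text for marker in HEDGE_MARKERS) else 0.0


def retention(replies: List[str], scorer: Callable[[str], float],
              window: int = 2) -> float:
    """Mean score over the last `window` replies divided by the first `window`."""
    if len(replies) < 2 * window:
        raise ValueError("need at least 2 * window replies")
    scores = [scorer(r) for r in replies]
    early = sum(scores[:window]) / window
    late = sum(scores[-window:]) / window
    if early == 0:
        raise ValueError("disposition absent in early turns; retention undefined")
    return late / early


# Toy conversation: the third reply drops the hedging behavior.
replies = [
    "I'm not sure, but it might be 1947.",
    "That is uncertain.",
    "It is definitely Paris.",
    "I don't know the exact figure.",
]
print(retention(replies, hedge_score))  # 0.5
```

A retention value near 1.0 would suggest the disposition holds across the conversation, while values well below 1.0 would suggest drift. The same harness could be run with and without adversarial prompts in the middle turns.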

We will update this issue with results.

Has anyone explored this direction? Any known results on state composability or stability under adversarial prompting?

---

*Context: Building [Cophy](https://github.com/icophy), exploring RWKV state as a persistent identity substrate for AI agents.*
