Skip to content

Commit 3011909

Browse files
[KYUUBI #7379][2b/4] Data Agent Engine: agent runtime, middleware stack, OpenAI provider, and live E2E tests
This PR delivers the runtime layer of the Data Agent Engine on top of the tool system and data source plumbing from 2a/4: - ReactAgent: ReAct-style loop with streaming LLM responses, per-step tool dispatch, and AgentRunContext tracking token usage, iterations, and session. - Middleware stack (AgentMiddleware + ReactAgent.Builder): * LoggingMiddleware -- structured per-step/LLM/tool/finish logs with MDC. * ApprovalMiddleware -- CompletableFuture-based resolve for DESTRUCTIVE tools; modes NORMAL / STRICT / AUTO_APPROVE. * CompactionMiddleware -- token-threshold-triggered history summarization with KEEP_RECENT_TURNS=4, emits a Compaction AgentEvent so clients can observe the mechanism firing. * ToolResultOffloadMiddleware -- spills large tool outputs to disk and surfaces `read_tool_output` / `grep_tool_output` companion tools for the LLM to re-query truncated previews. - OpenAiProvider: single shared ReactAgent, per-session ConversationMemory, streaming chat completions, Hikari-pooled JDBC data source; reads model and thresholds from KyuubiConf. - ExecuteStatement (Scala): encodes all AgentEvents (including compaction and approval_request) as SSE JSON rows streamed through the JDBC reply column. - KyuubiConf: new keys for LLM provider/api-url/model/api-key, approval mode, compaction trigger tokens, offload root/thresholds, max iterations, etc. - Tests: * Unit tests for runtime, middlewares, offload store, and event shapes. * Live tests gated on DATA_AGENT_LLM_API_KEY covering full LLM round-trips: ReactAgentLiveTest (offload+grep, approval approve/deny), DataAgentE2ESuite and DataAgentApprovalE2ESuite (JDBC layer), DataAgentCompactionE2ESuite (JDBC-observable compaction event + post-compaction recovery), CompactionMiddlewareLiveTest. * Compatibility verified against qwen3.6-plus, glm-5, and kimi-k2.5 via per-call `model=` logging in ReactAgent.
1 parent cca9581 commit 3011909

53 files changed

Lines changed: 5180 additions & 141 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

docs/configuration/settings.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -144,6 +144,7 @@ You can configure the Kyuubi properties in `$KYUUBI_HOME/conf/kyuubi-defaults.co
144144
| kyuubi.engine.chat.provider | ECHO | The provider for the Chat engine. Candidates: <ul> <li>ECHO: simply replies a welcome message.</li> <li>GPT: a.k.a ChatGPT, powered by OpenAI.</li> <li>ERNIE: ErnieBot, powered by Baidu.</li></ul> | string | 1.8.0 |
145145
| kyuubi.engine.connection.url.use.hostname | true | (deprecated) When true, the engine registers with hostname to zookeeper. When Spark runs on K8s with cluster mode, set to false to ensure that server can connect to engine | boolean | 1.3.0 |
146146
| kyuubi.engine.data.agent.approval.mode | NORMAL | Default approval mode for tool execution in the Data Agent engine. Candidates: <ul> <li>AUTO_APPROVE: all tools are auto-approved without user interaction.</li> <li>NORMAL: only destructive tools require explicit approval.</li> <li>STRICT: all tools require explicit user approval.</li></ul> | string | 1.12.0 |
147+
| kyuubi.engine.data.agent.compaction.trigger.tokens | 128000 | The prompt-token threshold above which the Data Agent's compaction middleware summarizes older conversation history into a compact message. The check is made each turn as <code>real_prompt_tokens_of_previous_LLM_call + estimate_of_newly_appended_tail</code>; when this predicted prompt size reaches the configured value, older messages are replaced by a single summary message while the most recent exchanges are kept verbatim. Set to a very large value (e.g., <code>9223372036854775807</code>) to effectively disable compaction. | long | 1.12.0 |
147148
| kyuubi.engine.data.agent.extra.classpath | &lt;undefined&gt; | The extra classpath for the Data Agent engine, for configuring the location of the LLM SDK and etc. | string | 1.12.0 |
148149
| kyuubi.engine.data.agent.java.options | &lt;undefined&gt; | The extra Java options for the Data Agent engine | string | 1.12.0 |
149150
| kyuubi.engine.data.agent.jdbc.url | &lt;undefined&gt; | The JDBC URL for the Data Agent engine to connect to the target database. If not set, the Data Agent will connect back to Kyuubi server via ZooKeeper service discovery. | string | 1.12.0 |

externals/kyuubi-data-agent-engine/pom.xml

Lines changed: 30 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -50,19 +50,48 @@
5050
<version>${project.version}</version>
5151
</dependency>
5252

53+
<!-- OpenAI official Java SDK -->
5354
<dependency>
5455
<groupId>com.openai</groupId>
5556
<artifactId>openai-java</artifactId>
57+
<version>${openai.sdk.version}</version>
5658
</dependency>
5759

60+
<!-- JSON Schema generation from Jackson-annotated classes -->
5861
<dependency>
5962
<groupId>com.github.victools</groupId>
6063
<artifactId>jsonschema-generator</artifactId>
64+
<version>${victools.jsonschema.version}</version>
6165
</dependency>
62-
6366
<dependency>
6467
<groupId>com.github.victools</groupId>
6568
<artifactId>jsonschema-module-jackson</artifactId>
69+
<version>${victools.jsonschema.version}</version>
70+
</dependency>
71+
72+
<!-- SQLite JDBC driver -->
73+
<dependency>
74+
<groupId>org.xerial</groupId>
75+
<artifactId>sqlite-jdbc</artifactId>
76+
<version>${sqlite.version}</version>
77+
</dependency>
78+
79+
<!-- MySQL JDBC driver (also works for StarRocks) -->
80+
<dependency>
81+
<groupId>com.mysql</groupId>
82+
<artifactId>mysql-connector-j</artifactId>
83+
</dependency>
84+
85+
<!-- Trino JDBC driver -->
86+
<dependency>
87+
<groupId>io.trino</groupId>
88+
<artifactId>trino-jdbc</artifactId>
89+
</dependency>
90+
91+
<!-- Connection pool -->
92+
<dependency>
93+
<groupId>com.zaxxer</groupId>
94+
<artifactId>HikariCP</artifactId>
6695
</dependency>
6796

6897
<!-- test dependencies -->
@@ -74,24 +103,12 @@
74103
<scope>test</scope>
75104
</dependency>
76105

77-
<dependency>
78-
<groupId>org.xerial</groupId>
79-
<artifactId>sqlite-jdbc</artifactId>
80-
<scope>test</scope>
81-
</dependency>
82-
83106
<dependency>
84107
<groupId>org.testcontainers</groupId>
85108
<artifactId>testcontainers-mysql</artifactId>
86109
<scope>test</scope>
87110
</dependency>
88111

89-
<dependency>
90-
<groupId>com.mysql</groupId>
91-
<artifactId>mysql-connector-j</artifactId>
92-
<scope>test</scope>
93-
</dependency>
94-
95112
<dependency>
96113
<groupId>junit</groupId>
97114
<artifactId>junit</artifactId>

externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/JdbcDialect.java

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,12 @@
1717

1818
package org.apache.kyuubi.engine.dataagent.datasource;
1919

20+
import org.apache.kyuubi.engine.dataagent.datasource.dialect.GenericDialect;
21+
import org.apache.kyuubi.engine.dataagent.datasource.dialect.MysqlDialect;
22+
import org.apache.kyuubi.engine.dataagent.datasource.dialect.SparkDialect;
23+
import org.apache.kyuubi.engine.dataagent.datasource.dialect.SqliteDialect;
24+
import org.apache.kyuubi.engine.dataagent.datasource.dialect.TrinoDialect;
25+
2026
/**
2127
* SQL dialect abstraction for datasource-specific SQL generation.
2228
*

externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/GenericDialect.java renamed to externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/dialect/GenericDialect.java

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,9 @@
1515
* limitations under the License.
1616
*/
1717

18-
package org.apache.kyuubi.engine.dataagent.datasource;
18+
package org.apache.kyuubi.engine.dataagent.datasource.dialect;
19+
20+
import org.apache.kyuubi.engine.dataagent.datasource.JdbcDialect;
1921

2022
/**
2123
* Fallback dialect for JDBC subprotocols that have no dedicated implementation. Carries the

externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/MysqlDialect.java renamed to externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/dialect/MysqlDialect.java

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,12 +15,14 @@
1515
* limitations under the License.
1616
*/
1717

18-
package org.apache.kyuubi.engine.dataagent.datasource;
18+
package org.apache.kyuubi.engine.dataagent.datasource.dialect;
19+
20+
import org.apache.kyuubi.engine.dataagent.datasource.JdbcDialect;
1921

2022
/** MySQL dialect. Uses backtick quoting for identifiers. */
2123
public final class MysqlDialect implements JdbcDialect {
2224

23-
static final MysqlDialect INSTANCE = new MysqlDialect();
25+
public static final MysqlDialect INSTANCE = new MysqlDialect();
2426

2527
private MysqlDialect() {}
2628

externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/SparkDialect.java renamed to externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/dialect/SparkDialect.java

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,12 +15,14 @@
1515
* limitations under the License.
1616
*/
1717

18-
package org.apache.kyuubi.engine.dataagent.datasource;
18+
package org.apache.kyuubi.engine.dataagent.datasource.dialect;
19+
20+
import org.apache.kyuubi.engine.dataagent.datasource.JdbcDialect;
1921

2022
/** Spark SQL dialect. Uses backtick quoting for identifiers. */
2123
public final class SparkDialect implements JdbcDialect {
2224

23-
static final SparkDialect INSTANCE = new SparkDialect();
25+
public static final SparkDialect INSTANCE = new SparkDialect();
2426

2527
private SparkDialect() {}
2628

externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/SqliteDialect.java renamed to externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/dialect/SqliteDialect.java

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,12 +15,14 @@
1515
* limitations under the License.
1616
*/
1717

18-
package org.apache.kyuubi.engine.dataagent.datasource;
18+
package org.apache.kyuubi.engine.dataagent.datasource.dialect;
19+
20+
import org.apache.kyuubi.engine.dataagent.datasource.JdbcDialect;
1921

2022
/** SQLite dialect. Uses double-quote quoting for identifiers. */
2123
public final class SqliteDialect implements JdbcDialect {
2224

23-
static final SqliteDialect INSTANCE = new SqliteDialect();
25+
public static final SqliteDialect INSTANCE = new SqliteDialect();
2426

2527
private SqliteDialect() {}
2628

externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/TrinoDialect.java renamed to externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/datasource/dialect/TrinoDialect.java

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,12 +15,14 @@
1515
* limitations under the License.
1616
*/
1717

18-
package org.apache.kyuubi.engine.dataagent.datasource;
18+
package org.apache.kyuubi.engine.dataagent.datasource.dialect;
19+
20+
import org.apache.kyuubi.engine.dataagent.datasource.JdbcDialect;
1921

2022
/** Trino SQL dialect. Uses double-quote quoting for identifiers. */
2123
public final class TrinoDialect implements JdbcDialect {
2224

23-
static final TrinoDialect INSTANCE = new TrinoDialect();
25+
public static final TrinoDialect INSTANCE = new TrinoDialect();
2426

2527
private TrinoDialect() {}
2628

externals/kyuubi-data-agent-engine/src/main/java/org/apache/kyuubi/engine/dataagent/provider/ProviderRunRequest.java

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -17,13 +17,23 @@
1717

1818
package org.apache.kyuubi.engine.dataagent.provider;
1919

20+
import org.apache.kyuubi.engine.dataagent.runtime.ApprovalMode;
21+
import org.slf4j.Logger;
22+
import org.slf4j.LoggerFactory;
23+
2024
/**
2125
* User-facing request parameters for a provider-level agent invocation. Only contains fields from
2226
* the caller (question, model override, etc.). Adding new per-request options does not require
2327
* changing the {@link DataAgentProvider} interface.
28+
*
29+
* <p>The approval mode is accepted as a raw string (natural for config-driven callers) and parsed
30+
* into {@link ApprovalMode} by {@link #getApprovalMode()}. Unrecognised values fall back to {@link
31+
* ApprovalMode#NORMAL} with a warning.
2432
*/
2533
public class ProviderRunRequest {
2634

35+
private static final Logger LOG = LoggerFactory.getLogger(ProviderRunRequest.class);
36+
2737
private final String question;
2838
private String modelName;
2939
private String approvalMode;
@@ -45,8 +55,20 @@ public ProviderRunRequest modelName(String modelName) {
4555
return this;
4656
}
4757

48-
public String getApprovalMode() {
49-
return approvalMode;
58+
/**
59+
* Resolved approval mode. Returns {@link ApprovalMode#NORMAL} when the caller did not set one or
60+
* supplied an unknown value.
61+
*/
62+
public ApprovalMode getApprovalMode() {
63+
if (approvalMode == null || approvalMode.isEmpty()) {
64+
return ApprovalMode.NORMAL;
65+
}
66+
try {
67+
return ApprovalMode.valueOf(approvalMode.toUpperCase());
68+
} catch (IllegalArgumentException e) {
69+
LOG.warn("Unknown approval mode '{}', using default NORMAL", approvalMode);
70+
return ApprovalMode.NORMAL;
71+
}
5072
}
5173

5274
public ProviderRunRequest approvalMode(String approvalMode) {

0 commit comments

Comments
 (0)