45 changes: 14 additions & 31 deletions sdk_v2/cpp/src/inferencing/generative/chat/chat_session.cc
@@ -117,51 +117,34 @@ ToolCallContext ChatSession::BuildToolCallContext(const Request& request) const
tool_ctx.tool_call_start = get_param(FOUNDRY_LOCAL_MODEL_PROP_TOOL_CALL_START_STR);
tool_ctx.tool_call_end = get_param(FOUNDRY_LOCAL_MODEL_PROP_TOOL_CALL_END_STR);

- // Fall back to model info properties if not specified in the request
- const auto& info = CatalogModel().Info();

- // Check if the model supports tool calling
- const auto* tool_calling_val = info.GetPropertyInt(FOUNDRY_LOCAL_MODEL_PROP_SUPPORTS_TOOL_CALLING_INT);
- if (tool_calling_val && *tool_calling_val == 1) {
- tool_ctx.supports_tool_calling = true;
- }

+ // Fall back to GenAI model API (reads genai_config.json + model-family fallback map)

Collaborator

It feels a little clunky to have some metadata coming from the catalog and some from genai_config.json, with the two only merged when a session is created.

Should we instead update the ModelInfo for a cached model when populating the catalog models, so it incorporates the genai_config.json metadata at that point? The same process would apply after downloading a new model. That way the rest of the code doesn't need to know or care about the multiple sources of metadata; it just uses the ModelInfo as-is.

Author

Good call. Refactored in the latest commit. Model::Load() now populates ModelInfo with tool/reasoning tags from the GenAI model immediately after loading. BuildToolCallContext() is unchanged; it just reads from ModelInfo as before. Downstream code doesn't need to know about multiple metadata sources anymore.
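
For readers following the thread, here is a minimal sketch of what a load-time merge like the one described in this reply could look like. It is not the code from the commit: `ModelInfo` is reduced to a hypothetical string property bag, `TagLookup` and `MergeGenerationTags` are illustrative names, and the rule that an existing catalog value wins over the model's tag is an assumption.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the SDK's ModelInfo: a flat string property bag.
struct ModelInfo {
  std::map<std::string, std::string> props;
};

// Hypothetical lookup into the loaded model. Returns the tag value,
// or an empty string when the model does not define the tag.
using TagLookup = std::function<std::string(const char*)>;

// Copy tags from the loaded model into ModelInfo once, at load time.
// Assumption: a value already supplied by the catalog is not overwritten.
// Returns the number of properties added.
inline int MergeGenerationTags(ModelInfo& info, const TagLookup& lookup) {
  assert(lookup && "lookup must be callable");
  static constexpr std::array<const char*, 4> kTagNames = {
      "tool_call_start", "tool_call_end", "reasoning_start", "reasoning_end"};
  int added = 0;
  for (const char* name : kTagNames) {
    if (info.props.count(name) != 0) continue;  // keep the catalog value
    std::string value = lookup(name);
    if (value.empty()) continue;                // model has no such tag
    info.props.emplace(name, std::move(value));
    ++added;
  }
  return added;
}
```

With a merge like this done once after loading, session code can keep reading a single property bag and never needs to query the model directly.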

if (tool_ctx.tool_call_start.empty()) {
- const auto* val = info.GetPropertyStr(FOUNDRY_LOCAL_MODEL_PROP_TOOL_CALL_START_STR);
- if (val) {
- tool_ctx.tool_call_start = *val;
- }
+ tool_ctx.tool_call_start = Model().GetGenerationTag("tool_call_start");
}

if (tool_ctx.tool_call_end.empty()) {
- const auto* val = info.GetPropertyStr(FOUNDRY_LOCAL_MODEL_PROP_TOOL_CALL_END_STR);
- if (val) {
- tool_ctx.tool_call_end = *val;
- }
+ tool_ctx.tool_call_end = Model().GetGenerationTag("tool_call_end");
}

- // Check if the model supports chain-of-thought reasoning
- const auto* reasoning_val = info.GetPropertyInt(FOUNDRY_LOCAL_MODEL_PROP_SUPPORTS_REASONING_INT);
- if (reasoning_val && *reasoning_val == 1) {
- tool_ctx.supports_reasoning = true;
+ // Non-empty tool_call_start implies the model supports tool calling
+ if (!tool_ctx.tool_call_start.empty()) {
+ tool_ctx.supports_tool_calling = true;
}

// Read reasoning marker tokens — same pattern as tool_call tokens
tool_ctx.reasoning_start = get_param(FOUNDRY_LOCAL_MODEL_PROP_REASONING_START_STR);
tool_ctx.reasoning_end = get_param(FOUNDRY_LOCAL_MODEL_PROP_REASONING_END_STR);

+ // Fall back to GenAI model API for reasoning tokens
if (tool_ctx.reasoning_start.empty()) {
- const auto* val = info.GetPropertyStr(FOUNDRY_LOCAL_MODEL_PROP_REASONING_START_STR);
- if (val) {
- tool_ctx.reasoning_start = *val;
- }
+ tool_ctx.reasoning_start = Model().GetGenerationTag("reasoning_start");
}

if (tool_ctx.reasoning_end.empty()) {
- const auto* val = info.GetPropertyStr(FOUNDRY_LOCAL_MODEL_PROP_REASONING_END_STR);
- if (val) {
- tool_ctx.reasoning_end = *val;
- }
+ tool_ctx.reasoning_end = Model().GetGenerationTag("reasoning_end");
}

+ // Non-empty reasoning_start implies the model supports reasoning
+ if (!tool_ctx.reasoning_start.empty()) {
+ tool_ctx.supports_reasoning = true;
+ }

// Accumulate tool definitions from the session.
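
The resolution order in this hunk (request parameter first, model tag second, capability flag derived from a non-empty start marker) can be sketched in isolation as below. This is a simplified illustration, not the SDK code: `ToolCallContext` is trimmed to four fields, `Source`, `Resolve`, and `BuildContext` are hypothetical names, and both sources are assumed to share the same key strings, whereas the real code uses `FOUNDRY_LOCAL_MODEL_PROP_*` constants for request parameters.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical, trimmed-down version of the per-request context.
struct ToolCallContext {
  std::string tool_call_start;
  std::string reasoning_start;
  bool supports_tool_calling = false;
  bool supports_reasoning = false;
};

// Each source returns an empty string when it has no value for the key.
using Source = std::function<std::string(const char*)>;

// Prefer the request-level value; fall back to the model-level tag.
inline std::string Resolve(const char* key, const Source& request, const Source& model) {
  assert(request && model);
  std::string value = request(key);
  return value.empty() ? model(key) : value;
}

inline ToolCallContext BuildContext(const Source& request, const Source& model) {
  ToolCallContext ctx;
  ctx.tool_call_start = Resolve("tool_call_start", request, model);
  ctx.reasoning_start = Resolve("reasoning_start", request, model);
  // A non-empty start marker is treated as evidence of the capability.
  ctx.supports_tool_calling = !ctx.tool_call_start.empty();
  ctx.supports_reasoning = !ctx.reasoning_start.empty();
  return ctx;
}
```

One consequence of deriving the flags this way is that a caller can switch a capability on for a single request just by passing a start marker, even when the model itself defines none.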
9 changes: 9 additions & 0 deletions sdk_v2/cpp/src/inferencing/generative/genai_model_instance.cc
@@ -111,6 +111,15 @@ OgaModel& GenAIModelInstance::GetOgaModel() {
return *oga_model_;
}

+ std::string GenAIModelInstance::GetGenerationTag(const char* tag_name) const {

Collaborator

nit: GetTag would be more generic unless there's an expected use-case where we would need to separate 'generation' tags from others. I assume the tag name could provide any separation required though.

Author

Agreed, renamed to GetTag in both GenAI and FL. I just need to update the GenAI version used here in FL once the GenAI PR is merged and included.

+ if (!oga_model_) {
+ return {};
+ }
+ OgaString tag = oga_model_->GetGenerationTag(tag_name);
+ const char* p = tag;
+ return p ? std::string(p) : std::string();
+ }

OgaTokenizer& GenAIModelInstance::GetOgaTokenizer() {
if (!tokenizer_) {
FL_THROW(FOUNDRY_LOCAL_ERROR_INTERNAL, "OGA tokenizer is null");
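
The null check before building the `std::string` in `GetGenerationTag` matters because constructing a `std::string` from a null `const char*` is undefined behavior. The sketch below isolates that pattern. `OwnedCString` is a hypothetical RAII wrapper written for this example; it only mimics a library-owned string that converts implicitly to `const char*` and is not the real `OgaString`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical RAII wrapper over a heap-allocated C string. It may hold null.
class OwnedCString {
 public:
  explicit OwnedCString(const char* src) : p_(src ? Duplicate(src) : nullptr) {}
  ~OwnedCString() { std::free(p_); }
  OwnedCString(const OwnedCString&) = delete;
  OwnedCString& operator=(const OwnedCString&) = delete;
  operator const char*() const { return p_; }

 private:
  static char* Duplicate(const char* src) {
    assert(src != nullptr);
    const std::size_t n = std::strlen(src) + 1;  // include the terminator
    char* copy = static_cast<char*>(std::malloc(n));
    if (copy) std::memcpy(copy, src, n);
    return copy;
  }
  char* p_;
};

// Copy the characters out while the wrapper is still alive.
// A null pointer maps to an empty string instead of reaching std::string(nullptr).
inline std::string ToStdString(const OwnedCString& s) {
  const char* p = s;
  return p ? std::string(p) : std::string();
}
```

Returning an owning `std::string` rather than the raw pointer also means the caller never holds memory that the wrapper frees when it goes out of scope.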
5 changes: 5 additions & 0 deletions sdk_v2/cpp/src/inferencing/generative/genai_model_instance.h
@@ -36,6 +36,11 @@ class GenAIModelInstance {
ExecutionProvider EP() const { return ep_; }
bool IsMultiModal() const;

+ /// Query a generation tag from the GenAI model (reads genai_config.json with model-family fallback).
+ /// Returns the tag value or empty string if not found. Supported keys:
+ /// "tool_call_start", "tool_call_end", "reasoning_start", "reasoning_end".
+ std::string GetGenerationTag(const char* tag_name) const;

/// Access the underlying OGA objects (for future chat generation work).
OgaModel& GetOgaModel();
OgaTokenizer& GetOgaTokenizer();
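
The doc comment describes a two-level lookup: the model's own config first, then a model-family fallback. How the GenAI library implements this is not shown in the diff, so the following is only a sketch of that lookup order under assumed data structures. `TagResolver`, `TagMap`, and the family names used with it are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using TagMap = std::map<std::string, std::string>;

// Hypothetical resolver: per-model config tags first, then per-family defaults.
class TagResolver {
 public:
  TagResolver(TagMap config_tags, std::map<std::string, TagMap> family_defaults)
      : config_tags_(std::move(config_tags)),
        family_defaults_(std::move(family_defaults)) {}

  // Returns the tag value, or an empty string if neither source defines it.
  std::string GetTag(const std::string& family, const char* tag_name) const {
    assert(tag_name != nullptr);
    if (auto it = config_tags_.find(tag_name); it != config_tags_.end()) {
      return it->second;  // explicit config entry wins
    }
    if (auto fam = family_defaults_.find(family); fam != family_defaults_.end()) {
      if (auto it = fam->second.find(tag_name); it != fam->second.end()) {
        return it->second;  // family-level default
      }
    }
    return {};
  }

 private:
  TagMap config_tags_;
  std::map<std::string, TagMap> family_defaults_;
};
```

Returning an empty string for an unknown key matches the contract stated in the doc comment, so callers can treat "not found" and "not supported" the same way.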