The namer expects each LLM call to return JSON of the form {"topic_name": <topic_name>, "topic_score": <topic_score>} (or something like this in the new version). When a reasoning model exhausts its token budget, it returns an empty string ('') with finish_reason="length", but we currently do nothing to catch, warn, or log this. The failure passes through silently, making it hard to tell that breakage is happening at the LLM prompt/response layer. This affects any reasoning-capable model where the token budget can be consumed entirely by reasoning, leaving no tokens for the final answer.
Problem
Two distinct failure modes go undetected:
- Empty / off-format content — responses like
'' or {} that clearly don't match the expected schema are passed downstream without any signal that something went wrong.
- Non-
stop finish reasons — when finish_reason is "length" (or anything other than "stop"), it's a strong indicator the response is truncated or unusable, but we don't inspect it. The combination of an empty string and a non-stop finish reason is the specific signature of the reasoning-token-exhaustion bug.
Debugging this case also surfaced that the current debug tooling doesn't expose enough to diagnose problems like this efficiently.
Proposed changes
Response validation
- Check
finish_reason on each response. If it isn't "stop", at minimum log it.
- Escalate the response: if
finish_reason is not "stop" and the content is empty/off-format, warn (or raise) more aggressively, since this is almost certainly a real failure rather than a benign edge case.
- Validate that the content parses into the expected
{"topic_name", "topic_score"} shape; if it's very off-format (empty string, {}, unparseable), let the user know breakage is happening at the LLM layer rather than failing silently downstream.
Debug tooling
- Include
finish_reason in the debug callback.
- Include other useful response metadata that would help diagnose what's going wrong (e.g. token usage, raw content, model, refusal/stop details).
- Include a test with a user facing hook (like .connectivity_status()) that uses a real prompt to make it easier to surface what's going on
Documenting debug tooling/process
- Include debugging tooling and how to use it in the docs (currently there exists some tooling but it is not referenced in teh docs)
The namer expects each LLM call to return JSON of the form
{"topic_name": <topic_name>, "topic_score": <topic_score>}(or something like this in the new version). When a reasoning model exhausts its token budget, it returns an empty string ('') withfinish_reason="length", but we currently do nothing to catch, warn, or log this. The failure passes through silently, making it hard to tell that breakage is happening at the LLM prompt/response layer. This affects any reasoning-capable model where the token budget can be consumed entirely by reasoning, leaving no tokens for the final answer.Problem
Two distinct failure modes go undetected:
''or{}that clearly don't match the expected schema are passed downstream without any signal that something went wrong.stopfinish reasons — whenfinish_reasonis"length"(or anything other than"stop"), it's a strong indicator the response is truncated or unusable, but we don't inspect it. The combination of an empty string and a non-stopfinish reason is the specific signature of the reasoning-token-exhaustion bug.Debugging this case also surfaced that the current debug tooling doesn't expose enough to diagnose problems like this efficiently.
Proposed changes
Response validation
finish_reasonon each response. If it isn't"stop", at minimum log it.finish_reasonis not"stop"and the content is empty/off-format, warn (or raise) more aggressively, since this is almost certainly a real failure rather than a benign edge case.{"topic_name", "topic_score"}shape; if it's very off-format (empty string,{}, unparseable), let the user know breakage is happening at the LLM layer rather than failing silently downstream.Debug tooling
finish_reasonin the debug callback.Documenting debug tooling/process