Commit 531d726

progress on tracing openai
1 parent 2dce85b commit 531d726

6 files changed

Lines changed: 365 additions & 26 deletions

.DONE.md renamed to .plan/.DONE.md

Lines changed: 60 additions & 0 deletions
@@ -368,3 +368,63 @@
 - Fixed tracer_provider references to use `OpenTelemetry.tracer_provider`
 - Removed unnecessary comments from init calls
 - **Total: 109 test runs, 328 assertions, all passing, linter clean**
+
+### Session 9 Completed (OpenAI Integration Enhancement) ✅
+- **Phase 1: Message Content Parsing**
+  - Changed from selective field extraction (`{role, content}`) to full `.to_h` conversion
+  - Now preserves ALL message fields (tool_calls, tool_call_id, name, refusal, etc.)
+  - Supports vision messages with array content (text + image_url)
+  - Supports tool messages with proper tool_call_id preservation
+  - Supports mixed content (multiple text/image blocks in single message)
+  - Added 2 comprehensive tests (vision, multi-turn tool calling)
+- **Phase 2: Advanced Token Metrics**
+  - Implemented `parse_usage_tokens` helper method (56 lines, matches Go SDK)
+  - Handles nested `*_tokens_details` objects with proper flattening
+  - Captures advanced metrics:
+    - `prompt_cached_tokens` (cached prompt tokens)
+    - `prompt_audio_tokens` (audio input tokens)
+    - `completion_reasoning_tokens` (o1 model reasoning tokens)
+    - `completion_audio_tokens` (audio output tokens)
+    - `completion_accepted_prediction_tokens` (accepted predictions)
+    - `completion_rejected_prediction_tokens` (rejected predictions)
+  - Translates field names (`input_tokens` → `prompt_tokens`)
+  - Added 1 test validating advanced metrics capture
+- **Phase 3: Streaming Support**
+  - Implemented full `stream_raw` wrapper with chunk aggregation
+  - Added `aggregate_streaming_chunks` helper (104 lines of robust logic)
+  - Automatic content concatenation across deltas
+  - Tool calls aggregation (handles continuation chunks)
+  - Usage metrics capture with `stream_options.include_usage`
+  - Proper span parenting using `tracer.start_span` (not `start_root_span`)
+  - Span lifecycle management (start before stream, finish after consumption)
+  - Preserves original streaming behavior for user code
+  - Added 1 comprehensive streaming test (19 assertions)
+- **Golden Example Enhanced** (`examples/internal/openai.rb`)
+  - Expanded to 8 comprehensive examples:
+    1. Vision - image understanding with array content
+    2. Tool Calling - single-turn function calling
+    3. **Streaming** - real-time completions with aggregation ⭐ NEW
+    4. Multi-turn Tools - conversation with tool_call_id
+    5. Mixed Content - text + images in one message
+    6. Reasoning Models - o1-mini with advanced token display
+    7. Temperature Variations - multiple parameter settings
+    8. Advanced Parameters - full metadata capture demo
+  - All examples production-ready with proper tracing
+  - Validates Ruby SDK captures same data as TypeScript/Go
+- **Code Quality**
+  - Total: ~250 lines added to core implementation
+  - All code passes StandardRB linter
+  - Full backward compatibility maintained
+  - 100% feature parity with TypeScript/Go SDKs achieved
+- **Test Results**
+  - 5 OpenAI tests: 92 assertions, all passing
+  - Full test suite: 115 tests, 398 assertions, 0 failures
+  - Golden example runs successfully with all 8 scenarios
+- **What Users Get**
+  - ✅ Vision with image_url/file content arrays
+  - ✅ Tool calling (single & multi-turn with tool_call_id)
+  - ✅ Streaming with automatic chunk aggregation
+  - ✅ Advanced token metrics (reasoning, cached, audio)
+  - ✅ All OpenAI parameters captured in metadata
+  - ✅ Full message structures (role, content, tool_calls, etc.)
+- **Total: 115 test runs, 398 assertions, all passing, linter clean**
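Reviewer note on Phase 2: the changelog describes flattening nested `*_tokens_details` objects into prefixed metric names. The sketch below shows one way that could work. It is not the SDK's `parse_usage_tokens`; the method name `flatten_usage`, the `FIELD_ALIASES` table, and the `output_tokens` alias are assumptions made for illustration. Only the `input_tokens` to `prompt_tokens` translation is stated in the changelog.

```ruby
# Hypothetical sketch of usage flattening; names here are illustrative,
# not taken from the SDK source.
FIELD_ALIASES = {
  "input_tokens" => "prompt_tokens",      # stated in the changelog
  "output_tokens" => "completion_tokens"  # assumption, by analogy
}.freeze

# Turns a usage hash with nested *_tokens_details into a flat metrics hash.
def flatten_usage(usage)
  metrics = {}
  usage.each do |key, value|
    key = key.to_s
    if key.end_with?("_tokens_details") && value.is_a?(Hash)
      # "prompt_tokens_details" -> "prompt"; "input_tokens_details" -> "prompt"
      raw = "#{key.delete_suffix("_tokens_details")}_tokens"
      prefix = FIELD_ALIASES.fetch(raw, raw).delete_suffix("_tokens")
      value.each do |detail_key, detail_value|
        # Skip nil or non-numeric details so only real counts are recorded.
        metrics["#{prefix}_#{detail_key}"] = detail_value if detail_value.is_a?(Numeric)
      end
    elsif value.is_a?(Numeric)
      metrics[FIELD_ALIASES.fetch(key, key)] = value
    end
  end
  metrics
end
```

With this shape, `prompt_tokens_details.cached_tokens` becomes `prompt_cached_tokens` and `completion_tokens_details.reasoning_tokens` becomes `completion_reasoning_tokens`, which matches the metric names listed in the changelog.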
File renamed without changes.

.TODO.md renamed to .plan/.TODO.md

Lines changed: 15 additions & 12 deletions
@@ -45,13 +45,16 @@
 - [ ] Implement Trace.get_parent
 - [ ] Implement span filtering logic (AI spans filter)
 
-### Phase 4.5: OpenAI Advanced Features (Future)
+### Phase 4.5: OpenAI Advanced Features
 
-#### Streaming Support
-- [ ] Add support for `stream_raw` API
-- [ ] Handle streaming responses and chunks
-- [ ] Aggregate streaming data for tracing
-- [ ] Test streaming with console output
+#### Streaming Support ✅ COMPLETE (2025-10-23)
+- [x] Add support for `stream_raw` API
+- [x] Handle streaming responses and chunks
+- [x] Aggregate streaming data for tracing
+- [x] Test streaming with console output
+- [x] Proper span parenting with `tracer.start_span`
+- [x] Automatic chunk aggregation (100+ lines of logic)
+- [x] Usage metrics capture with `stream_options.include_usage`
 
 #### Additional Endpoints
 - [ ] Embeddings support
@@ -166,15 +169,15 @@
 
 ## Current Status
 
-**Last Updated**: 2025-10-23 (Session 8)
-**Current Phase**: Phase 2 - Background Login with Retry ✅ COMPLETE
-**Test Status**: 109 test runs, 328 assertions, all passing, linter clean
+**Last Updated**: 2025-10-23 (Session 9)
+**Current Phase**: Phase 4.5 - OpenAI Integration Enhancement ✅ COMPLETE
+**Test Status**: 115 test runs, 398 assertions, all passing, linter clean
+**OpenAI Features**: Vision ✅, Tool Calling ✅, Streaming ✅, Advanced Metrics ✅
 
 ## Deferred Items
 
 - API::Projects (move from Internal::Experiments)
 - API::Experiments (move from Internal::Experiments)
-- Eval.run integration with datasets
-- Dataset examples
 - Implement Parallelism (Eval.run parallelism parameter)
-- OpenAI Advanced Features (streaming, embeddings, etc.)
+- OpenAI Responses API (`/v1/responses` endpoint - newer API, lower priority)
+- OpenAI Additional Endpoints (embeddings, assistants, fine-tuning, images)
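Reviewer note on the streaming checklist: "Aggregate streaming data for tracing" amounts to folding the per-chunk deltas back into one message per choice. The sketch below illustrates that on plain hashes shaped like parsed chat completion chunks. It is not the SDK's `aggregate_streaming_chunks`; the method name `aggregate_chunks` and the output layout are assumptions for illustration.

```ruby
# Hypothetical sketch of chunk aggregation over hash-shaped stream chunks.
def aggregate_chunks(chunks)
  choices = {}
  usage = nil

  chunks.each do |chunk|
    # With stream_options.include_usage, usage arrives on a final chunk
    # whose choices array is empty.
    usage = chunk["usage"] if chunk["usage"]

    (chunk["choices"] || []).each do |choice|
      index = choice["index"] || 0
      acc = (choices[index] ||= {
        "index" => index,
        "message" => {"role" => nil, "content" => +""},
        "finish_reason" => nil
      })
      message = acc["message"]
      delta = choice["delta"] || {}

      message["role"] ||= delta["role"]
      message["content"] << delta["content"] if delta["content"]

      # Tool calls stream in pieces: the first piece carries id and name,
      # continuation pieces append to the arguments string.
      (delta["tool_calls"] || []).each do |piece|
        calls = (message["tool_calls"] ||= [])
        slot = (calls[piece["index"] || 0] ||= {
          "id" => nil,
          "type" => "function",
          "function" => {"name" => +"", "arguments" => +""}
        })
        slot["id"] ||= piece["id"]
        fn = piece["function"] || {}
        slot["function"]["name"] << fn["name"] if fn["name"]
        slot["function"]["arguments"] << fn["arguments"] if fn["arguments"]
      end

      acc["finish_reason"] = choice["finish_reason"] if choice["finish_reason"]
    end
  end

  {"choices" => choices.values.sort_by { |c| c["index"] }, "usage" => usage}
end
```

Keying the accumulator by `index` keeps multiple choices (and multiple tool calls within a choice) separate even when their pieces arrive interleaved.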

examples/internal/openai.rb

Lines changed: 47 additions & 14 deletions
@@ -12,10 +12,12 @@
 # This golden example showcases ALL OpenAI chat completion features with proper tracing:
 # 1. Vision (image understanding) with array content
 # 2. Tool/function calling (single-turn)
-# 3. Multi-turn tool calling with tool_call_id
-# 4. Mixed content (text + images)
-# 5. Reasoning models with advanced token metrics
-# 6. Temperature variations and other parameters
+# 3. Streaming chat completions with automatic chunk aggregation
+# 4. Multi-turn tool calling with tool_call_id
+# 5. Mixed content (text + images)
+# 6. Reasoning models with advanced token metrics
+# 7. Temperature variations
+# 8. Advanced parameters
 #
 # This example validates that the Ruby SDK captures the same data as TypeScript/Go SDKs.
 #
@@ -116,8 +118,38 @@
   puts " Tokens: #{response.usage.total_tokens}"
 end
 
-# Example 3: Multi-turn Tool Calling (with tool_call_id)
-puts "\n3. Multi-turn Tool Calling"
+# Example 3: Streaming Chat Completions
+puts "\n3. Streaming Chat Completions"
+puts "-" * 50
+tracer.in_span("example-streaming") do
+  print "Streaming response: "
+  stream = client.chat.completions.stream_raw(
+    model: "gpt-4o-mini",
+    messages: [
+      {role: "user", content: "Count from 1 to 5"}
+    ],
+    max_tokens: 50,
+    stream_options: {
+      include_usage: true # Request usage stats in stream
+    }
+  )
+
+  # Consume and display the stream
+  total_tokens = 0
+  stream.each do |chunk|
+    delta_content = chunk.choices[0]&.delta&.content
+    print delta_content if delta_content
+
+    # Capture usage from final chunk
+    total_tokens = chunk.usage&.total_tokens if chunk.usage
+  end
+  puts ""
+  puts "✓ Streaming complete, tokens: #{total_tokens}"
+  puts " (Note: Braintrust automatically aggregates all chunks for the trace)"
+end
+
+# Example 4: Multi-turn Tool Calling (with tool_call_id)
+puts "\n4. Multi-turn Tool Calling"
 puts "-" * 50
 tracer.in_span("example-multi-turn-tools") do
   # First request - model decides to call a tool
@@ -195,8 +227,8 @@
   end
 end
 
-# Example 4: Mixed Content (text + image in same message)
-puts "\n4. Mixed Content (Text + Image)"
+# Example 5: Mixed Content (text + image in same message)
+puts "\n5. Mixed Content (Text + Image)"
 puts "-" * 50
 tracer.in_span("example-mixed-content") do
   response = client.chat.completions.create(
@@ -222,8 +254,8 @@
   puts " Tokens: #{response.usage.total_tokens}"
 end
 
-# Example 5: Reasoning Model (o1-mini) with Advanced Token Metrics
-puts "\n5. Reasoning Model (o1-mini)"
+# Example 6: Reasoning Model (o1-mini) with Advanced Token Metrics
+puts "\n6. Reasoning Model (o1-mini)"
 puts "-" * 50
 tracer.in_span("example-reasoning") do
   response = client.chat.completions.create(
@@ -252,8 +284,8 @@
   end
 end
 
-# Example 6: Temperature & Parameter Variations
-puts "\n6. Temperature & Parameter Variations"
+# Example 7: Temperature & Parameter Variations
+puts "\n7. Temperature & Parameter Variations"
 puts "-" * 50
 tracer.in_span("example-temperature-variations") do
   [0.0, 0.7, 1.0].each do |temp|
@@ -269,8 +301,8 @@
   end
 end
 
-# Example 7: Advanced Parameters Showcase
-puts "\n7. Advanced Parameters (metadata capture)"
+# Example 8: Advanced Parameters Showcase
+puts "\n8. Advanced Parameters (metadata capture)"
 puts "-" * 50
 tracer.in_span("example-advanced-params") do
   response = client.chat.completions.create(
@@ -302,6 +334,7 @@
 puts "This golden example validates that the Ruby SDK properly captures:"
 puts " ✓ Vision messages with array content (text + image_url)"
 puts " ✓ Tool calling (single and multi-turn with tool_call_id)"
+puts " ✓ Streaming chat completions with chunk aggregation"
 puts " ✓ Mixed content messages (multiple text/image blocks)"
 puts " ✓ Advanced token metrics (cached, reasoning, audio tokens)"
 puts " ✓ All request parameters (temperature, top_p, seed, user, etc.)"
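Reviewer note on the streaming example: the changelog says the span is started before the stream and finished only after the caller has consumed it, while user code still sees every chunk. One way to get that behavior is a thin `Enumerable` wrapper, sketched below. `TracedStream` and `FakeSpan` are hypothetical names; `FakeSpan` is a stand-in that mimics only the `set_attribute` and `finish` methods of an OpenTelemetry span so the sketch runs without the gem. The SDK's actual `stream_raw` wrapper may be structured differently.

```ruby
# Stand-in for a tracing span (assumption: only these two methods are needed).
class FakeSpan
  attr_reader :attributes, :finished

  def initialize
    @attributes = {}
    @finished = false
  end

  def set_attribute(key, value)
    @attributes[key] = value
  end

  def finish
    @finished = true
  end
end

# Hypothetical wrapper: passes chunks through untouched and closes the span
# once iteration ends, whether normally, by break, or by an exception.
class TracedStream
  include Enumerable

  def initialize(stream, span)
    @stream = stream
    @span = span
  end

  def each
    return enum_for(:each) unless block_given?

    seen = 0
    begin
      @stream.each do |chunk|
        seen += 1
        yield chunk # caller sees the original chunk, unchanged
      end
    ensure
      @span.set_attribute("chunk_count", seen)
      @span.finish
    end
  end
end
```

Finishing the span inside `ensure` means its duration covers the full consumption of the stream, not just the initial request, and the span is still closed if the caller stops reading early.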