Fix: inconsistency between FuzzIntrospector API and YAML files #1183 #1220
Open
dev-kvt wants to merge 1 commit into google:main from
Fix: Dynamic resolution for mismatched API/YAML function signatures (#1183)
Summary
This PR introduces a dynamic signature resolution strategy within ContextRetriever. It addresses critical discrepancies between static benchmark definitions (YAML) and the FuzzIntrospector (FI) API, ensuring accurate context retrieval for LLM prompt generation.
Root Cause Analysis
The oss-fuzz-gen pipeline utilizes function signatures defined in benchmark-sets/*.yaml to query the FI API. However, significant schema drift exists between these static YAML files and the internal representation in FuzzIntrospector.
YAML: GObex * g_obex_new(GIOChannel *, GObexTransportType, gssize, gssize)
FI API: GObex *g_obex_new(GIOChannel *io, GObexTransportType transport_type, ...)
Because the FI API matches signatures by exact string comparison, these semantically equivalent signatures do not match. Source code and cross-reference queries then return 404 errors, so the LLM receives no context during the prompt construction phase.
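The mismatch can be reproduced with the two signatures quoted above. This is a minimal sketch, not code from the PR: `function_name` is a hypothetical helper using a simple regex heuristic, and the FI signature is copied verbatim, including its elided trailing parameters.

```python
import re

# Signatures copied verbatim from the example above.
YAML_SIG = "GObex * g_obex_new(GIOChannel *, GObexTransportType, gssize, gssize)"
FI_SIG = "GObex *g_obex_new(GIOChannel *io, GObexTransportType transport_type, ...)"


def function_name(signature: str) -> str:
    """Return the identifier immediately before the first '('.

    Hypothetical helper: a heuristic that is good enough for plain C
    declarations, but not for function pointers or macros.
    """
    match = re.search(r"([A-Za-z_]\w*)\s*\(", signature)
    if match is None:
        raise ValueError(f"no function name found in {signature!r}")
    return match.group(1)


# Exact string comparison fails, although both describe the same function.
exact_match = YAML_SIG == FI_SIG
# The function name is identical in both representations.
same_name = function_name(YAML_SIG) == function_name(FI_SIG)
```

Here `exact_match` is false while `same_name` is true, which is why a lookup keyed on the function name can succeed where a lookup keyed on the full signature string cannot.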
Proposed Solution
I have implemented a priority-based lookup mechanism that treats the FuzzIntrospector API as the canonical source of truth for signatures, while retaining the YAML configuration as a secondary fallback.
Workflow:
Query: The system attempts to resolve the canonical signature via the FI API using the function name, which remains stable across versions.
Canonicalization: If a signature is returned, it is cached as the _real_function_signature and used for all subsequent API operations (cross-references, headers, and implementation retrieval).
Graceful Degradation: If the API returns no result or is unreachable, the system falls back to the original signature provided in the benchmark YAML.
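The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `SignatureResolver` and the injected `lookup` callable are hypothetical stand-ins for the real class and the FI API client, and network failures are assumed to surface as `OSError`.

```python
from typing import Callable, Optional


class SignatureResolver:
    """Hypothetical stand-in for the resolution logic described above."""

    def __init__(
        self,
        function_name: str,
        yaml_signature: str,
        lookup: Callable[[str], Optional[str]],
    ) -> None:
        self._function_name = function_name
        self._yaml_signature = yaml_signature
        # Assumed interface: returns the API signature, or None if unknown.
        self._lookup = lookup
        self._real_function_signature = ""

    def resolve(self) -> str:
        # Canonicalization: reuse the cached API signature if present.
        if self._real_function_signature:
            return self._real_function_signature
        # Query: look the function up by name.
        try:
            api_signature = self._lookup(self._function_name)
        except OSError:
            # Assumption: an unreachable API raises an OSError subclass.
            api_signature = None
        if api_signature:
            self._real_function_signature = api_signature
            return api_signature
        # Graceful degradation: fall back to the YAML signature.
        return self._yaml_signature
```

In this sketch only a signature returned by the API is cached, so a failed lookup is retried on the next call. Caching the fallback as well would avoid repeated requests, at the cost of never picking up the API signature once the service recovers.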
Implementation Details
File: data_prep/project_context/context_introspector.py
Centralized Resolution: Introduced the _get_real_function_signature method to abstract the lookup and caching logic.
Integration: Refactored _get_function_implementation, _get_xrefs_to_function, and _get_embeddable_declaration to utilize this resolved signature instead of accessing the raw benchmark data directly.
`def _get_real_function_signature(self) -> str:
    """
    Resolves the authoritative function signature.
    Prioritizes the FuzzIntrospector API representation; falls back to
    the benchmark configuration if the API is unreachable or returns None.
    """
    if self._real_function_signature:
        return self._real_function_signature