
5. Limited Documentation and Low Observability in Core Functions #6

Description

@sniperx-19

Problem

The codebase has minimal documentation and relies heavily on print() statements for operational visibility.

This creates problems in production:

  • Harder onboarding for new engineers
  • Inconsistent debugging, since print() output carries no log level, timestamp, or source context
  • No structured logs for monitoring or incident response

Fix

Add clear docstrings, type hints, and structured logging in high-value functions (especially LLM analysis and orchestration paths).

This improves maintainability, debuggability, and production observability.

Updated Code

import logging
from typing import Any, Dict
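from openai import OpenAI

# parse_llm_response_generic is assumed to be defined elsewhere in this codebase;
# import it here from whichever module provides it.

logger = logging.getLogger(__name__)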

def analyze_genai_relevance(llm: OpenAI, title: str) -> Dict[str, Any]:
    """
    Determine whether a Reddit post title is related to Generative AI.

    Args:
        llm: OpenAI-compatible client instance.
        title: Reddit submission title.

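    Returns:
        Dictionary with:
        - thinking: model reasoning text (if returned)
        - response: parsed JSON payload with relevance fields

        If the LLM call or parsing raises, the error is logged and a fallback
        is returned with is_genai_related=False and relevance_type="none".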

    Expected response shape:
        {
            "thinking": "...",
            "response": {
                "is_genai_related": bool,
                "relevance_type": "direct" | "indirect" | "none"
            }
        }
    """
    system_prompt = """You are a helpful AI assistant. Based on the title
of the article, determine if the content relates to Generative AI.

Return JSON:
{
  "is_genai_related": true/false,
  "relevance_type": "direct/indirect/none"
}
"""

    try:
        logger.info("Analyzing GenAI relevance for title", extra={"title": title})

        response = llm.chat.completions.create(
            model="deepseek-r1",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": title},
            ],
            max_tokens=500,
        )

        raw_content = response.choices[0].message.content or ""
        parsed = parse_llm_response_generic(raw_content)

        logger.info(
            "GenAI relevance analysis completed",
            extra={
                "title": title,
                "is_genai_related": parsed.get("response", {}).get("is_genai_related"),
                "relevance_type": parsed.get("response", {}).get("relevance_type"),
            },
        )

        return parsed

    except Exception:
        logger.exception("GenAI relevance analysis failed", extra={"title": title})
        return {"thinking": "", "response": {"is_genai_related": False, "relevance_type": "none"}}
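One caveat on the logging calls above: with the default logging configuration, fields passed through extra= are attached to the record but never printed, so the logs are not yet structured in practice. A minimal sketch of a formatter that emits one JSON object per line could look like this. JsonFormatter and configure_logging are hypothetical names, not part of the current codebase, and the output field names are an assumption.

```python
import json
import logging

# Attributes present on every LogRecord; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(logging.makeLogRecord({}).__dict__) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy fields passed through `extra=` (e.g. title, relevance_type).
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        # logger.exception() sets exc_info; include the traceback as text.
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON-serializable extras from raising.
        return json.dumps(payload, default=str)


def configure_logging(level: int = logging.INFO) -> None:
    """Replace the root logger's handlers with one JSON stream handler."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

Calling configure_logging() once at startup would be enough for the logger.info(...) calls in analyze_genai_relevance to show title, is_genai_related, and relevance_type as top-level JSON keys.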

Benefits

  • Maintainability

    • Clear function behavior and expected return structure
    • Easier for collaborators to extend safely
  • Observability

    • Structured logs improve debugging and monitoring
    • Better support for production incident investigation
  • Reliability

    • Safer fallback behavior on LLM failure
    • Reduces downstream crashes from malformed outputs
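Note that the fallback in the updated code only triggers when an exception is raised. If parse_llm_response_generic returns a dictionary with missing or unexpected fields, it is passed through unchanged. A rough sketch of a shape check that would close that gap is below. normalize_relevance, fallback_result, and ALLOWED_RELEVANCE_TYPES are hypothetical names introduced here for illustration.

```python
from typing import Any, Dict

# Values permitted by the docstring's "Expected response shape".
ALLOWED_RELEVANCE_TYPES = {"direct", "indirect", "none"}


def fallback_result() -> Dict[str, Any]:
    """Same fallback payload the except branch returns."""
    return {
        "thinking": "",
        "response": {"is_genai_related": False, "relevance_type": "none"},
    }


def normalize_relevance(parsed: Any) -> Dict[str, Any]:
    """Return parsed output in the documented shape, or the fallback."""
    if not isinstance(parsed, dict):
        return fallback_result()
    response = parsed.get("response")
    if not isinstance(response, dict):
        return fallback_result()

    is_related = response.get("is_genai_related")
    relevance_type = response.get("relevance_type")
    # Reject anything that is not exactly a bool plus one of the allowed strings.
    if not isinstance(is_related, bool):
        return fallback_result()
    if not isinstance(relevance_type, str) or relevance_type not in ALLOWED_RELEVANCE_TYPES:
        return fallback_result()

    thinking = parsed.get("thinking")
    return {
        "thinking": thinking if isinstance(thinking, str) else "",
        "response": {
            "is_genai_related": is_related,
            "relevance_type": relevance_type,
        },
    }
```

In analyze_genai_relevance this would replace return parsed with return normalize_relevance(parsed), so callers can index into the result without defensive checks.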

Notes (Recommended Next Step)

Replace remaining print() statements in the agent class with logging and add request IDs / run IDs for end-to-end traceability.
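For the run ID part, one possible approach (a sketch, not a finished design) is to keep the ID in a context variable and stamp it onto every record with a logging filter, so individual call sites do not need to pass it around. run_id_var, RunIdFilter, start_run, and build_handler are hypothetical names, and the log format string is an assumption.

```python
import contextvars
import logging
import uuid

# Holds the ID of the current run; "-" means no run has been started.
run_id_var = contextvars.ContextVar("run_id", default="-")


class RunIdFilter(logging.Filter):
    """Attach the current run ID to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.run_id = run_id_var.get()
        return True  # never drop records, only annotate them


def start_run() -> str:
    """Generate a new run ID and make it current for this context."""
    run_id = uuid.uuid4().hex
    run_id_var.set(run_id)
    return run_id


def build_handler() -> logging.Handler:
    """Stream handler whose output includes the run ID on each line."""
    handler = logging.StreamHandler()
    handler.addFilter(RunIdFilter())
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s %(levelname)s run_id=%(run_id)s %(name)s: %(message)s"
        )
    )
    return handler
```

The agent's entry point would call start_run() once per invocation and attach build_handler() to the root logger. Every log line from that run, including those from analyze_genai_relevance, then shares the same run_id and can be filtered together during an incident.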
