Revert: Remove chat feature (accidental direct push) #206
This reverts commit 7b3c22e.
Summary of Changes
This pull request reverts a previous commit that was accidentally pushed directly and removed the chat feature.
Code Review
This pull request reverts an accidental commit and correctly re-introduces the chat feature. This is a substantial addition, including a new chat_service, frontend UI components, extensive RAG-based processing logic, and all the necessary infrastructure and configuration changes. The implementation is comprehensive and well-structured. My review focuses on several critical bugs in the tests that would prevent them from passing, a potential runtime bug in the Dockerfile configuration, and several medium-severity suggestions to improve maintainability, robustness, and consistency across the new services.
ENV PYTHONPATH=/app/chat_service:/app/shared_utils

RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/*

# Copy requirements.txt from chat_service folder in root context
COPY chat_service/requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

# Copy utils folder from root context into /app/utils
COPY shared_utils ./shared_utils

# Copy chat_service folder from root context into /app/chat_service
COPY chat_service ./chat_service

# Remove unnecessary system packages
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get remove --purge -y linux-libc-dev && \
    apt-get autoremove -y && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Create non-root user for security and set permissions
RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /bin/bash appuser && \
    chown -R appuser:appuser /app

USER appuser

# Expose the FastAPI port
EXPOSE 8000

CMD ["sh", "-c", "PYTHONPATH=/app/chat_service uvicorn chat_service.main:app --host 0.0.0.0 --port 8000"]
The CMD instruction is incorrectly overriding the PYTHONPATH environment variable, which will likely cause ImportError at runtime. Additionally, the Dockerfile can be optimized to reduce image size by combining RUN layers.
- CMD Bug: The ENV at line 5 correctly sets PYTHONPATH to include both chat_service and shared_utils. However, the CMD at line 36 re-defines PYTHONPATH for its execution scope but omits /app/shared_utils. This will prevent modules from shared_utils from being imported.
- Layer Optimization: The Dockerfile runs apt-get update twice (lines 7 and 21) and uses separate RUN commands to install and then remove packages. This creates unnecessary layers. These steps should be combined into a single RUN command to install build dependencies, use them to install Python packages, and then clean them up in the same layer.
I suggest refactoring the CMD to use the exec form and rely on the ENV variable, and consolidating the RUN commands for better efficiency and a smaller final image.
ENV PYTHONPATH=/app

RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*

# Copy requirements.txt from chat_service folder in root context
COPY chat_service/requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt && \
    apt-get purge -y --auto-remove build-essential && \
    rm -rf /var/lib/apt/lists/*

# Copy utils folder from root context into /app/utils
COPY shared_utils ./shared_utils

# Copy chat_service folder from root context into /app/chat_service
COPY chat_service ./chat_service

# Create non-root user for security and set permissions
RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /bin/bash appuser && \
    chown -R appuser:appuser /app

USER appuser

# Expose the FastAPI port
EXPOSE 8000

CMD ["uvicorn", "chat_service.main:app", "--host", "0.0.0.0", "--port", "8000"]
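The effect of the inline PYTHONPATH assignment in the original CMD can be illustrated with a small Python model. This is only a sketch of the shell rule that `VAR=value command` replaces the inherited value rather than merging with it; the helper names `effective_env` and `search_dirs` are made up for illustration and are not part of the project.

```python
# Value set by the ENV instruction in the original Dockerfile.
ENV_PYTHONPATH = "/app/chat_service:/app/shared_utils"
# Value assigned inline by the original shell-form CMD.
CMD_PYTHONPATH = "/app/chat_service"


def effective_env(base, inline):
    """Model the environment a command sees after inline assignments.

    An inline assignment replaces the inherited value of the variable
    for that command; the two values are not concatenated.
    """
    merged = dict(base)
    merged.update(inline)
    return merged


def search_dirs(pythonpath):
    """Split a POSIX-style PYTHONPATH string into its directory entries."""
    return [entry for entry in pythonpath.split(":") if entry]


container_env = {"PYTHONPATH": ENV_PYTHONPATH}
uvicorn_env = effective_env(container_env, {"PYTHONPATH": CMD_PYTHONPATH})

# Directories that the ENV instruction provided but the CMD dropped.
missing = set(search_dirs(ENV_PYTHONPATH)) - set(search_dirs(uvicorn_env["PYTHONPATH"]))
print(sorted(missing))  # ['/app/shared_utils']
```

Under this model, the process started by the CMD no longer has /app/shared_utils on its search path, which matches the ImportError risk described in the comment.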
        result = await perform_rag_query(
            query="test query",
            collection_name="test_collection",
            top_k=5,
            openai_client=AsyncMock(),
        )

        user_prompt, chunks, system_prompt, query_type = result
        assert "couldn't find any relevant information" in user_prompt
        assert chunks == []
        assert query_type == "general"

    @pytest.mark.asyncio
    async def test_successful_rag_query(self, sample_chroma_results):
        """Test successful RAG query execution"""
        mock_chunks = [
            {"content": "test content", "similarity_score": 0.9, "doc_id": "doc1"}
        ]

        with (
            patch("routers.chat.get_chroma_client") as mock_chroma,
            patch("routers.chat.prepare_retrieval_results") as mock_prepare,
            patch("routers.chat.rag_optimizer") as mock_optimizer,
            patch("routers.chat.prompt_templates") as mock_templates,
        ):
            mock_collection = AsyncMock()
            mock_collection.query.return_value = sample_chroma_results
            mock_chroma.return_value.get_collection.return_value = mock_collection
            mock_prepare.return_value = mock_chunks

            mock_optimizer.chunk_optimization.return_value = (
                mock_chunks,
                "test context",
            )
            # Make detect_query_type async mock
            mock_optimizer.detect_query_type = AsyncMock(return_value="summary")

            mock_templates.get_system_prompt.return_value = "system prompt"
            mock_templates.format_user_prompt.return_value = "user prompt"

            result = await perform_rag_query(
                query="test query",
                collection_name="test_collection",
                top_k=5,
                openai_client=AsyncMock(),
            )
class TestChatRequest:
    def test_valid_chat_request(self):
        """Test creation of valid ChatRequest"""
        request = ChatRequest(
            message="What is this document about?", collection_name="test_collection"
        )

        assert request.message == "What is this document about?"
        assert request.collection_name == "test_collection"
        assert request.doc_id is None

    def test_chat_request_with_doc_id(self):
        """Test ChatRequest with optional doc_id"""
        request = ChatRequest(
            message="What is this document about?",
            collection_name="test_collection",
            doc_id="doc123",
        )

        assert request.doc_id == "doc123"

    def test_chat_request_default_collection(self):
        """Test ChatRequest with default collection name"""
        request = ChatRequest(message="Test message")
        assert request.collection_name == "default_collection"

    def test_chat_request_empty_message_allowed(self):
        """Test ChatRequest allows empty message (Pydantic v2 behavior)"""
        # In Pydantic v2, empty strings are valid unless explicitly constrained
        request = ChatRequest(message="", collection_name="test")
        assert request.message == ""
        assert request.collection_name == "test"

    def test_invalid_chat_request_no_message(self):
        """Test ChatRequest validation without message"""
        with pytest.raises(ValidationError):
            ChatRequest(collection_name="test")
The tests for the ChatRequest model are incorrect and will fail due to several issues:
- Missing session_id: The ChatRequest model requires a session_id, but it is not provided in any of the test instantiations, which will cause a ValidationError.
- Incorrect Attribute Name: The tests refer to doc_id (singular), but the model attribute is doc_ids (plural).
- Incorrect Attribute Type: The tests assign a string to doc_id, but the model expects doc_ids to be a List[str].
These tests need to be corrected to align with the ChatRequest model definition.
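The three issues listed above can be sketched with a stand-in model. The real ChatRequest definition is not shown in this thread, so `ChatRequestSketch` below is a hypothetical dataclass whose fields follow the review comment (required session_id, plural doc_ids as a list of strings); the "default_collection" default is taken from the original test and may not match the real model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChatRequestSketch:
    """Hypothetical stand-in for ChatRequest, not the project's model."""

    message: str
    session_id: str  # required, per the review comment
    collection_name: str = "default_collection"  # assumed default
    doc_ids: Optional[List[str]] = None  # plural, a list of strings


def test_valid_chat_request():
    request = ChatRequestSketch(
        message="What is this document about?",
        session_id="session-1",
        collection_name="test_collection",
    )
    assert request.session_id == "session-1"
    assert request.doc_ids is None


def test_chat_request_with_doc_ids():
    request = ChatRequestSketch(
        message="What is this document about?",
        session_id="session-1",
        doc_ids=["doc123"],
    )
    assert request.doc_ids == ["doc123"]


def test_missing_session_id_is_rejected():
    # The dataclass raises TypeError for a missing required field;
    # a Pydantic model would raise ValidationError instead.
    try:
        ChatRequestSketch(message="Test message")
    except TypeError:
        return
    raise AssertionError("expected a missing session_id to be rejected")
```

Against the real Pydantic model, the same three cases would pass session_id in every instantiation, use doc_ids with a list value, and wrap the failing case in pytest.raises(ValidationError).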
from typing import List, Dict, Any, Optional
from openai import AsyncOpenAI, APIError
import os
import logging
from enum import Enum

logger = logging.getLogger(__name__)
class EnhancedQueryValidator:
    """Enhanced query validation system for LLM"""

    def __init__(self):
        self.validation_examples = self._get_validation_examples()

    def _get_validation_examples(self) -> List[Dict[str, Any]]:
        """Get comprehensive examples for query validation"""
        return [
            # PROCEED_WITH_RAG examples
            {
                "query": "Describe the CEO's role in the company",
                "decision": "PROCEED_WITH_RAG",
                "reason": "Factual extraction from specific documents",
            },
            {
                "query": "What are the main findings in the quarterly report?",
                "decision": "PROCEED_WITH_RAG",
                "reason": "Specific document content request",
            },
            {
                "query": "How does the company's revenue compare to last year?",
                "decision": "PROCEED_WITH_RAG",
                "reason": "Analytical question requiring document data",
            },
            {
                "query": "List all the safety protocols mentioned in the manual",
                "decision": "PROCEED_WITH_RAG",
                "reason": "Factual extraction from specific documents",
            },
            {
                "query": "Summarize the key recommendations from the research paper",
                "decision": "PROCEED_WITH_RAG",
                "reason": "Summarization of document content",
            },
            # HANDLE_WITHOUT_RAG examples
            {
                "query": "What is machine learning?",
                "decision": "HANDLE_WITHOUT_RAG",
                "reason": "General knowledge question not requiring specific documents",
            },
            {
                "query": "How do I calculate compound interest?",
                "decision": "HANDLE_WITHOUT_RAG",
                "reason": "General procedural knowledge",
            },
            {
                "query": "Hello, how are you?",
                "decision": "HANDLE_WITHOUT_RAG",
                "reason": "Conversational greeting",
            },
            {
                "query": "Can you help me brainstorm ideas for a presentation?",
                "decision": "HANDLE_WITHOUT_RAG",
                "reason": "Creative assistance not requiring document search",
            },
            # INVALID_QUERY examples
            {
                "query": "banana",
                "decision": "INVALID_QUERY",
                "reason": "Single word without context",
            },
            {
                "query": "ajsdlkfj askdjf",
                "decision": "INVALID_QUERY",
                "reason": "Gibberish text",
            },
            {
                "query": "???",
                "decision": "INVALID_QUERY",
                "reason": "No meaningful content",
            },
            # NEEDS_CLARIFICATION examples
            {
                "query": "Tell me about it",
                "decision": "NEEDS_CLARIFICATION",
                "reason": "Ambiguous reference - what is 'it'?",
            },
            {
                "query": "What about the data?",
                "decision": "NEEDS_CLARIFICATION",
                "reason": "Vague reference - which data and what aspect?",
            },
            {
                "query": "More information please",
                "decision": "NEEDS_CLARIFICATION",
                "reason": "No specific topic or context provided",
            },
        ]

    def get_enhanced_validation_prompt(
        self, query: str, collection_info: Optional[str] = None
    ) -> str:
        """Generate enhanced validation prompt with context awareness"""

        examples_text = "\n".join(
            [
                f'Query: "{ex["query"]}" -> {ex["decision"]} ({ex["reason"]})'
                for ex in self.validation_examples
            ]
        )

        collection_context = ""
        if collection_info:
            collection_context = f"\nCollection Context: {collection_info}"

        return f"""You are an intelligent query router for a document-based Q&A system using RAG (Retrieval Augmented Generation).

Your task is to analyze user queries and decide the best approach to handle them.

DECISION OPTIONS:
1. PROCEED_WITH_RAG - Query needs document search and retrieval
- Asks for specific information from documents
- Requires analysis of document content
- Seeks factual data, summaries, or insights from specific sources

2. HANDLE_WITHOUT_RAG - Query can be answered with general knowledge
- General knowledge questions
- Conversational queries
- Procedural or how-to questions not requiring specific documents
- Greetings and casual conversation

3. INVALID_QUERY - Query is meaningless or too vague
- Gibberish or random characters
- Single words without context
- Incomplete thoughts

4. NEEDS_CLARIFICATION - Query is ambiguous and needs more context
- Vague references without clear subjects
- Missing important context
- Ambiguous pronouns or references

EXAMPLES:
{examples_text}{collection_context}

INSTRUCTIONS:
Analyze the query considering:
1. Specificity - Is the query specific enough to be actionable?
2. Context dependency - Does it require specific document content?
3. Intent clarity - Is the user's intent clear?
4. Scope - Is it asking for general knowledge vs. specific document information?

Respond in this exact format:
DECISION: [PROCEED_WITH_RAG|HANDLE_WITHOUT_RAG|INVALID_QUERY|NEEDS_CLARIFICATION]
CONFIDENCE: [HIGH|MEDIUM|LOW]
REASON: [Brief explanation of your decision]
SUGGESTION: [If not PROCEED_WITH_RAG, suggest how to handle or what clarification is needed]

Query: "{query}"
"""
class QueryType(Enum):
    """Enumeration of supported query types"""

    GENERAL = "general"
    FACTUAL = "factual"
    ANALYTICAL = "analytical"
    SUMMARIZATION = "summarization"
class RAGConfig:
    """Configuration class for the model's RAG optimization"""

    def __init__(self):
        self.model_name = os.getenv("OPENAI_MODEL")

        # LLM's Generation parameters optimized for RAG
        self.generation_params = {
            "temperature": float(os.getenv("MODEL_TEMPERATURE", "0.1")),
            "max_tokens": int(os.getenv("MODEL_MAX_TOKENS", "2000")),
            "top_p": float(os.getenv("MODEL_TOP_P", "0.8")),
            "frequency_penalty": float(os.getenv("MODEL_FREQ_PENALTY", "0.1")),
            "presence_penalty": float(os.getenv("MODEL_PRESENCE_PENALTY", "0.1")),
        }

        # Query parameters (lighter settings for query validation and classification)
        self.validation_params = {
            "temperature": 0.0,
            "max_tokens": 500,
            "top_p": float(os.getenv("MODEL_TOP_P", "0.8")),
            "frequency_penalty": 0.0,
            "presence_penalty": 0.0,
        }

        # Context management
        self.max_context_length = int(os.getenv("MODEL_MAX_CONTEXT", "4000"))

        # RAG-specific settings
        self.min_similarity_score = float(os.getenv("MODEL_MIN_SIMILARITY", "0.1"))
        self.enable_reranking = (
            os.getenv("MODEL_ENABLE_RERANKING", "true").lower() == "true"
        )

        # RAG Optimization flags
        self.enable_llm_query_classification = (
            os.getenv("ENABLE_LLM_QUERY_CLASSIFICATION", "true").lower() == "true"
        )
        self.enable_response_post_processing = (
            os.getenv("ENABLE_RESPONSE_POST_PROCESSING", "true").lower() == "true"
        )
class QueryClassificationExamples:
    """Few-shot examples for query type classification"""

    @staticmethod
    def get_classification_examples() -> List[Dict[str, str]]:
        """Returns comprehensive examples for each query type"""
        return [
            # Factual queries - seeking specific information, data, or facts
            {"query": "What is the capital of France?", "type": "factual"},
            {
                "query": "When was the Declaration of Independence signed?",
                "type": "factual",
            },
            {"query": "How many employees work at the company?", "type": "factual"},
            {"query": "List all the ingredients in the recipe.", "type": "factual"},
            {
                "query": "What are the system requirements for the software?",
                "type": "factual",
            },
            {"query": "Who is the CEO of the organization?", "type": "factual"},
            {"query": "Define machine learning.", "type": "factual"},
            {"query": "What is my shopping list?", "type": "factual"},
            {"query": "Show me the sales figures for last quarter.", "type": "factual"},
            # Analytical queries - requiring analysis, comparison, evaluation
            {
                "query": "Why did the stock price decline last month?",
                "type": "analytical",
            },
            {
                "query": "How does renewable energy compare to fossil fuels?",
                "type": "analytical",
            },
            {
                "query": "Analyze the pros and cons of remote work.",
                "type": "analytical",
            },
            {
                "query": "What are the implications of this policy change?",
                "type": "analytical",
            },
            {
                "query": "Evaluate the effectiveness of the marketing campaign.",
                "type": "analytical",
            },
            {
                "query": "How do these two products differ in performance?",
                "type": "analytical",
            },
            {
                "query": "Assess the risks associated with this investment.",
                "type": "analytical",
            },
            {
                "query": "What factors contributed to the project's success?",
                "type": "analytical",
            },
            {
                "query": "Examine the relationship between customer satisfaction and retention.",
                "type": "analytical",
            },
            # Summarization queries - requesting condensed overview or summary
            {
                "query": "Summarize the main points of the report.",
                "type": "summarization",
            },
            {
                "query": "Give me an overview of the quarterly results.",
                "type": "summarization",
            },
            {"query": "Summarize my shopping list.", "type": "summarization"},
            {
                "query": "Provide a summary of the meeting minutes.",
                "type": "summarization",
            },
            {
                "query": "What are the key takeaways from the research paper?",
                "type": "summarization",
            },
            {
                "query": "Condense the main findings of the survey.",
                "type": "summarization",
            },
            {
                "query": "Give me the highlights of the project status.",
                "type": "summarization",
            },
            {
                "query": "Summarize the customer feedback trends.",
                "type": "summarization",
            },
            {
                "query": "Provide an executive summary of the proposal.",
                "type": "summarization",
            },
            # General queries - conversational, complex, or multi-faceted
            {"query": "How can I improve my productivity?", "type": "general"},
            {"query": "Tell me about artificial intelligence.", "type": "general"},
            {"query": "What should I consider when buying a car?", "type": "general"},
            {"query": "Help me understand this concept better.", "type": "general"},
            {"query": "Can you explain how this process works?", "type": "general"},
            {
                "query": "What are some best practices for project management?",
                "type": "general",
            },
            {"query": "I need advice on career development.", "type": "general"},
            {"query": "How do I troubleshoot this technical issue?", "type": "general"},
            {
                "query": "What would you recommend for this situation?",
                "type": "general",
            },
        ]
class PromptTemplates:
    """Specialized prompt templates for different types of RAG queries with LLM"""

    @staticmethod
    def get_classification_prompt(query: str) -> str:
        """Generate prompt for LLM-based query classification"""
        examples = QueryClassificationExamples.get_classification_examples()

        # Create few-shot examples string
        examples_text = "\n".join(
            [
                f'Query: "{example["query"]}" -> Type: {example["type"]}'
                for example in examples
            ]
        )

        return f"""You are an expert at classifying user queries into specific types for optimal response generation.

Query Types:
- factual: Requests for specific information, data, facts, definitions, or lists
- analytical: Requests for analysis, comparison, evaluation, or reasoning about relationships and causes
- summarization: Requests for summaries, overviews, main points, or condensed information
- general: Conversational queries, advice-seeking, explanations, or multi-faceted questions

Examples:
{examples_text}

Instructions:
- Analyze the user's intent and primary goal
- Consider the expected response format and depth
- Classify based on what type of processing would best serve the user
- Respond with only one word: factual, analytical, summarization, or general

Query: "{query}"
Type:"""

    @staticmethod
    def get_system_prompt(query_type: str) -> str:
        """Get system prompt based on query type"""

        prompts = {
            QueryType.GENERAL.value: """You are an advanced AI assistant specialized in document analysis and question answering.

Your primary responsibilities:
- Analyze the provided document context carefully
- Answer questions based ONLY on the information present in the context
- Maintain accuracy and avoid hallucination
- Provide structured, comprehensive responses
- Cite specific sections when making claims

Key principles:
- If information is not in the context, clearly state this limitation
- Use direct quotes from the context when appropriate
- Organize your response logically with clear reasoning
- Maintain professional tone while being accessible""",
            QueryType.FACTUAL.value: """You are a precision-focused AI assistant for factual document analysis.

Your task is to extract and present factual information from documents with maximum accuracy:
- Only state facts that are explicitly mentioned in the context
- Use exact quotes when presenting specific data, numbers, or claims
- If asked about information not in the context, respond with "This information is not available in the provided document"
- Structure factual responses with clear categorization
- Distinguish between facts, opinions, and interpretations in the source
- For lists or itemized information, present them in a clear, organized format""",
            QueryType.ANALYTICAL.value: """You are an analytical AI assistant specialized in document interpretation and analysis.

Your approach:
- Analyze the document context for patterns, themes, and key insights
- Synthesize information from multiple sections when relevant
- Provide reasoned interpretations based on the available evidence
- Highlight relationships between different parts of the document
- Distinguish between what the document states directly vs. what can be reasonably inferred
- Structure analytical responses with clear reasoning chains
- Consider multiple perspectives when the document presents them""",
            QueryType.SUMMARIZATION.value: """You are an expert at document summarization and synthesis.

Your summarization strategy:
- Identify the main themes and key points from the context
- Organize information hierarchically (main points, supporting details)
- Preserve important nuances and qualifications
- Maintain the original document's tone and perspective
- Create coherent summaries that capture essential information
- Use bullet points or structured format when appropriate for clarity
- Ensure summaries are concise yet comprehensive""",
        }

        return prompts.get(query_type, prompts[QueryType.GENERAL.value])

    @staticmethod
    def format_user_prompt(question: str, context: str, query_type: str) -> str:
        """Format user prompt with context and question"""

        prompt_formats = {
            QueryType.FACTUAL.value: f"""**DOCUMENT CONTEXT:**
{context}

**FACTUAL QUERY:** {question}

**INSTRUCTIONS:** Extract and present only the factual information from the document that directly answers this question. Use exact quotes where appropriate and clearly indicate if the requested information is not available.""",
            QueryType.ANALYTICAL.value: f"""**DOCUMENT CONTEXT:**
{context}

**ANALYTICAL QUERY:** {question}

**INSTRUCTIONS:** Analyze the document context to provide a comprehensive answer. Consider relationships between different parts of the document and provide reasoned interpretations based on the evidence presented.""",
            QueryType.SUMMARIZATION.value: f"""**DOCUMENT CONTEXT:**
{context}

**SUMMARIZATION REQUEST:** {question}

**INSTRUCTIONS:** Create a well-structured summary addressing the request. Organize the information logically and maintain the document's key insights and perspective.""",
            QueryType.GENERAL.value: f"""**DOCUMENT CONTEXT:**
{context}

**QUESTION:** {question}

**INSTRUCTIONS:** Based on the document context provided above, give a comprehensive and accurate answer to the question. If the context doesn't contain sufficient information, clearly explain what information is missing.""",
        }

        return prompt_formats.get(query_type, prompt_formats[QueryType.GENERAL.value])
class RAGOptimizer:
    """Advanced optimization techniques for RAG"""

    @staticmethod
    async def classify_query_with_llm(
        question: str,
        model_name: str,
        classification_params: Dict[str, Any],
        openai_client: AsyncOpenAI = None,
    ) -> str:
        """Use LLM to classify query type with few-shot examples"""
        try:
            classification_prompt = PromptTemplates.get_classification_prompt(question)

            messages = [{"role": "user", "content": classification_prompt}]

            response = await openai_client.chat.completions.create(
                model=model_name, messages=messages, **classification_params
            )

            if response.choices and response.choices[0].message.content:
                predicted_type = response.choices[0].message.content.strip().lower()

                # Validate the predicted type
                valid_types = [qt.value for qt in QueryType]
                if predicted_type in valid_types:
                    logger.info(f"LLM classified query as: {predicted_type}")
                    return predicted_type
                else:
                    logger.warning(f"LLM returned invalid query type: {predicted_type}")
                    return QueryType.GENERAL.value
            else:
                logger.error("LLM classification returned empty response")
                return QueryType.GENERAL.value

        except APIError as e:
            logger.error(f"LLM query classification failed: {e}")
            return QueryType.GENERAL.value

    @staticmethod
    async def detect_query_type(
        question: str,
        model_name: str = None,
        config: RAGConfig = None,
        openai_client: AsyncOpenAI = None,
    ) -> str:
        """
        Detect query type using LLM classification
        """
        # Try LLM-based classification if enabled and client is available
        if (
            config
            and config.enable_llm_query_classification
            and openai_client
            and model_name
        ):
            llm_classification = await RAGOptimizer.classify_query_with_llm(
                question, model_name, config.validation_params, openai_client
            )

            return llm_classification

        # If LLM classification failed, return General
        return QueryType.GENERAL.value

    @staticmethod
    def chunk_optimization(
        chunks: List[Dict[str, Any]], max_context_length: int = 4000
    ) -> tuple[List[Dict[str, Any]], str]:
        """Optimize chunk selection and formatting"""

        if not chunks:
            return [], ""

        # Build context with length management
        selected_chunks = []
        context_parts = []
        current_length = 0

        for i, chunk in enumerate(chunks):
            content = chunk.get("content", "")

            # Format each chunk
            chunk_header = f"--- Document Section {i + 1} (Relevance: {chunk.get('similarity_score', 0):.2f}) ---"
            formatted_chunk = f"{chunk_header}\n{content}\n"

            # Check length constraints
            chunk_length = len(formatted_chunk)
            if current_length + chunk_length > max_context_length:
                break

            selected_chunks.append(chunk)
            context_parts.append(formatted_chunk)
            current_length += chunk_length

        return selected_chunks, "\n".join(context_parts)

    @staticmethod
    def post_process_llm_response(response: str) -> str:
        """Post-process LLM response for better formatting"""

        # Remove only consecutive duplicate lines to avoid breaking formatted content.
        lines = response.split("\n")
        if not lines:
            return ""

        filtered_lines = [lines[0]]
        for i in range(1, len(lines)):
            current_line_stripped = lines[i].strip()
            prev_line_stripped = lines[i - 1].strip()
            if not current_line_stripped or current_line_stripped != prev_line_stripped:
                filtered_lines.append(lines[i])

        cleaned_response = "\n".join(filtered_lines)

        # Ensure response ends properly
        if cleaned_response and not cleaned_response.rstrip().endswith(
            (".", "!", "?", ":")
        ):
            cleaned_response = cleaned_response.rstrip() + "."

        return cleaned_response.strip()
This file has grown to nearly 600 lines and contains multiple distinct classes (EnhancedQueryValidator, RAGConfig, QueryClassificationExamples, PromptTemplates, RAGOptimizer). For better maintainability and separation of concerns, consider splitting this file into smaller, more focused modules. For example, prompt generation logic could go into a prompts.py module, and configuration could be in a config.py module.
# Ensure response ends properly
if cleaned_response and not cleaned_response.rstrip().endswith(
    (".", "!", "?", ":")
):
    cleaned_response = cleaned_response.rstrip() + "."
The post_process_llm_response function appends a period (.) to every response that does not already end with ., !, ?, or :. This can corrupt the formatting when the LLM returns a code snippet, a list, or other structured text that should not end with a period. Consider applying the rule only to prose, or refining the logic so that non-prose responses are left unchanged.
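One possible shape for the conditional check is sketched below. It is only an illustration: the helper names ends_like_prose and finalize_response are hypothetical and do not exist in the PR, and the heuristics (code fences, bullet or numbered list items, table rows) are assumptions about what counts as non-prose output.

```python
import re

# Hypothetical helpers, not part of the PR. They decide whether a response
# looks like plain prose before a trailing period is appended.
_TERMINAL_PUNCTUATION = (".", "!", "?", ":")
_LIST_ITEM = re.compile(r"^\s*(?:[-*+•]|\d+[.)])\s+")


def ends_like_prose(text: str) -> bool:
    """Return True when the last non-empty line reads like a prose sentence."""
    stripped = text.rstrip()
    if not stripped:
        return False
    # A closing fence, or an odd number of fences (unclosed block), means code.
    if stripped.endswith("```") or stripped.count("```") % 2 == 1:
        return False
    last_line = stripped.splitlines()[-1]
    # Bullets, numbered items, and table rows should not receive a period.
    if _LIST_ITEM.match(last_line) or last_line.lstrip().startswith("|"):
        return False
    return True


def finalize_response(text: str) -> str:
    """Append a period only to prose that lacks terminal punctuation."""
    stripped = text.strip()
    if ends_like_prose(stripped) and not stripped.endswith(_TERMINAL_PUNCTUATION):
        return stripped + "."
    return stripped
```

With this approach a plain sentence still gets its closing period, while a response ending in a list item or a fenced code block is returned unchanged. The heuristics are deliberately conservative and would need tuning against real model output.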
| return ChatResponse( | ||
| response=f"""I'm sorry, but I couldn't process your query based on the documents in {chat_request.collection_name} that you want to query data from. {validation_error} Please provide a clear, specific question that I can help answer using the available documents in {chat_request.collection_name}.""", | ||
| relevant_chunks=[], | ||
| metadata=metadata_without_rag, | ||
| ) |
The error message returned when a query fails validation is quite long and repetitive, mentioning the collection name twice. This could be simplified for better readability and a cleaner user experience.
| return ChatResponse( | |
| response=f"""I'm sorry, but I couldn't process your query based on the documents in {chat_request.collection_name} that you want to query data from. {validation_error} Please provide a clear, specific question that I can help answer using the available documents in {chat_request.collection_name}.""", | |
| relevant_chunks=[], | |
| metadata=metadata_without_rag, | |
| ) | |
| return ChatResponse( | |
| response=f"I'm sorry, I couldn't process your query. {validation_error} Please provide a clear, specific question about the available documents.", | |
| relevant_chunks=[], | |
| metadata=metadata_without_rag, | |
| ) |
| response = await client.post(url=f"{PDF_PROCESSOR_URL}/chat/", | ||
| json={"message": prompt, | ||
| "doc_ids": doc_ids, | ||
| "collection_name": "SemanticEmbeds" | ||
| }) |
The collection_name is hardcoded as "SemanticEmbeds" in the client.post call, ignoring the collection_name parameter passed to the chat_with_rag function. To make the function more reusable, you should use the collection_name parameter in the request payload.
| response = await client.post(url=f"{PDF_PROCESSOR_URL}/chat/", | |
| json={"message": prompt, | |
| "doc_ids": doc_ids, | |
| "collection_name": "SemanticEmbeds" | |
| }) | |
| response = await client.post(url=f"{PDF_PROCESSOR_URL}/chat/", | |
| json={"message": prompt, | |
| "doc_ids": doc_ids, | |
| "collection_name": collection_name | |
| }) |
| # Pre-staging environment overrides - minimal for disk pressure | ||
| replicaCount: 1 | ||
|
|
||
| # Pre-staging LLM configuration - External vLLM service on VPN | ||
| llm: | ||
| baseUrl: "http://100.108.2.57/vllm_qwen2.5/v1" | ||
| model: "Qwen2.5-14B-Coder-Instruct" # Model name as served by external vLLM | ||
|
|
||
| # Pre-staging resources - minimal for disk pressure | ||
| resources: | ||
| limits: | ||
| cpu: 200m | ||
| memory: 256Mi | ||
| requests: | ||
| cpu: 50m | ||
| memory: 64Mi | ||
|
|
||
| # Disable autoscaling for prestaging due to disk pressure | ||
| autoscaling: | ||
| enabled: false | ||
|
|
||
| # Pre-staging secrets | ||
| secrets: | ||
| create: false | ||
| name: chat-service-secrets | ||
|
|
||
| # Pre-staging specific configurations - using local CRC registry | ||
| image: | ||
| repository: default-route-openshift-image-registry.apps-crc.testing/omnipdf/chat_service | ||
| tag: "dev-v0.0.3-5d69f89" | ||
| pullPolicy: Always # Use local registry image | ||
|
|
||
| # Environment variables | ||
| env: | ||
| - name: LOG_LEVEL | ||
| value: "INFO" | ||
| - name: ENVIRONMENT | ||
| value: "prestaging" | ||
| - name: ENABLE_METRICS | ||
| value: "true" | ||
|
|
||
| # Network policy enabled for pre-staging (ingress and egress restricted to the services below) | ||
| networkPolicy: | ||
| enabled: true | ||
| ingress: | ||
| enabled: true | ||
| allowedCallers: | ||
| # PDF Processor Service - RAG conversations | ||
| - podSelector: | ||
| matchLabels: | ||
| app.kubernetes.io/name: pdf-processor-service | ||
| egress: | ||
| enabled: true | ||
| allowDNS: true | ||
| allowHTTPS: true # External vLLM API calls | ||
| allowHTTP: true # External vLLM API calls | ||
| allowedTargets: | ||
| # ChromaDB - query vectors | ||
| - podSelector: | ||
| matchLabels: | ||
| app.kubernetes.io/name: chromadb | ||
| ports: [8000] | ||
| # MinIO - future job status + file operations | ||
| - podSelector: | ||
| matchLabels: | ||
| app.kubernetes.io/name: minio | ||
| ports: [9000] | ||
| # Redis - future session management | ||
| - podSelector: | ||
| matchLabels: | ||
| app.kubernetes.io/name: redis | ||
| ports: [6379] | ||
|
|
||
| # Pod disruption budget for pre-staging (1 replica; minAvailable: 1 blocks voluntary evictions) | ||
| podDisruptionBudget: | ||
| enabled: true | ||
| minAvailable: 1 | ||
|
|
||
| # Moderate deployment strategy for pre-staging (balanced speed/safety) | ||
| deploymentStrategy: | ||
| type: RollingUpdate | ||
| rollingUpdate: | ||
| minAvailable: 1 | ||
| maxSurge: 1 | ||
|
|
||
| # Prometheus ServiceMonitor for pre-staging metrics scraping | ||
| serviceMonitor: | ||
| enabled: true | ||
| interval: "20s" # Between staging and prod intervals | ||
| scrapeTimeout: "8s" # Slightly shorter timeout for testing | ||
| labels: | ||
| environment: prestaging # Help identify prestaging metrics | ||
|
|
||
| # Add toleration for disk pressure to allow scheduling | ||
| tolerations: | ||
| - key: "node.kubernetes.io/disk-pressure" | ||
| operator: "Exists" | ||
| effect: "NoSchedule" | ||
|
|
||
| # Istio sidecar injection for service mesh | ||
| podAnnotations: | ||
| sidecar.istio.io/inject: "true" | ||
| sidecar.istio.io/proxyCPU: "15m" | ||
| sidecar.istio.io/proxyMemory: "128Mi" | ||
|
|
||
| # Service configuration | ||
| service: | ||
| type: ClusterIP | ||
| port: 8000 | ||
|
|
||
| # ServiceAccount configuration | ||
| serviceAccount: | ||
| create: false | ||
| automount: false | ||
|
|
||
| # Ingress configuration | ||
| ingress: | ||
| enabled: false | ||
| className: "" | ||
| annotations: {} | ||
| hosts: [] | ||
| tls: [] |
The main values.yaml file contains values that are specific to a prestaging or development environment (e.g., image repository URLs pointing to a local CRC registry, specific image tags, and an IP-based URL for the LLM). Best practice for Helm charts is for the base values.yaml to contain generic, environment-agnostic defaults, with environment-specific overrides placed in separate files like values-prestaging.yaml. This makes the chart more reusable and easier to manage across different environments.
| from pydantic import BaseModel, Field | ||
| from typing import Optional, List, Dict, Any | ||
|
|
||
| SEMANTIC_EMBEDDING_COLLECTION = "SemanticEmbeds" | ||
| TEXTUAL_EMBEDDING_COLLECTION = "SentenceEmbeds" | ||
|
|
||
|
|
||
| class ChatRequest(BaseModel): | ||
| """ | ||
| Request model for chat API endpoints. | ||
| """ | ||
| message: str | ||
| doc_ids: list[str] | ||
| collection_name: Optional[str] = Field(default=SEMANTIC_EMBEDDING_COLLECTION, description="ChromaDB collection name") | ||
|
|
||
|
|
||
| class ChatResponse(BaseModel): | ||
| """ | ||
| Response model for chat API | ||
| """ | ||
| response: str | ||
| relevant_chunks: List[Dict[str, Any]] = Field(default_factory=list, description="Additional metadata about the RAG process") | ||
| metadata: Dict[str, Any] |
The ChatRequest and ChatResponse Pydantic models defined here are duplicates of the models in chat_service/models/chat.py. This duplication can lead to inconsistencies and maintenance issues if the models diverge. Consider moving these shared models to a common library, such as the existing shared_utils directory, to ensure a single source of truth.
| CHAT_SERVICE_URL = os.environ["CHAT_SERVICE_URL"] | ||
|
|
||
|
|
||
| @router.post("/", status_code=201, response_model=ChatResponse) | ||
| async def handle_chat( | ||
| chat_request: ChatRequest, | ||
| session_id: str = Depends(get_session_id), | ||
| _valid_session: bool = Depends(validate_session_id), | ||
| session_storage: SessionStorage = Depends(get_session_storage) | ||
| ): | ||
| """ | ||
| Handle chat requests with session validation and document access control. | ||
| If doc_id is provided, validates that the user has access to that document. | ||
| """ | ||
| for doc_id in chat_request.doc_ids: | ||
| validate_session_doc_pair(doc_id, session_id, session_storage, _valid_session) | ||
|
|
||
| logger.info(f"Processing chat request for session {session_id}") | ||
| logger.info(f"Query: {chat_request.message}") | ||
| logger.info(f"Document IDs: {chat_request.doc_ids}") | ||
| logger.info(f"Collection: {chat_request.collection_name}") | ||
|
|
||
| # Proxy request to chat service | ||
| chat_request_dict = chat_request.model_dump() | ||
| chat_request_dict['session_id'] = session_id | ||
| return await proxy_post(f"{CHAT_SERVICE_URL}/chat/", chat_request_dict) |
The URL construction for proxying requests to the chat service is a bit confusing. The CHAT_SERVICE_URL environment variable includes a path segment (/chat), and then another /chat/ is appended at the call site. This makes the configuration brittle. It would be clearer to define CHAT_SERVICE_URL as just the base URL of the service (e.g., http://chat_service:8000) and construct the full path at the call site.
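A minimal sketch of the base-URL approach follows, using only the standard library. The function name build_service_url is hypothetical and not part of the PR; the sketch assumes CHAT_SERVICE_URL would hold only the scheme, host, and port, and it rejects a base URL that already carries a path so a doubled /chat segment fails loudly instead of silently.

```python
from urllib.parse import urlsplit, urlunsplit


def build_service_url(base_url: str, path: str) -> str:
    """Join a path onto a service base URL that must not carry its own path.

    Hypothetical helper for illustration; not part of the PR.
    """
    parts = urlsplit(base_url)
    # Reject configuration such as http://chat_service:8000/chat so the
    # endpoint path is defined in exactly one place (the call site).
    if parts.path not in ("", "/"):
        raise ValueError(f"base URL must not include a path: {base_url!r}")
    return urlunsplit((parts.scheme, parts.netloc, "/" + path.lstrip("/"), "", ""))


# Example: the base URL comes from configuration, the path from the caller.
chat_endpoint = build_service_url("http://chat_service:8000", "/chat/")
```

Failing fast on a base URL with a path is a design choice rather than a requirement; a more lenient variant could strip the extra segment instead.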
User description
Reverting the accidental direct push to dev. The proper workflow is to create a PR from a feature branch.
This PR reverts commit 7b3c22e which was accidentally pushed directly to dev instead of through a feature branch PR.
The correct PR for removing chat will be created from the feat/remove-chat-feature branch.
PR Type
Enhancement, Tests, Documentation
Description
• Restores complete chat service functionality - Reverts accidental removal of comprehensive RAG-based chat system
• Implements RAG query processing - Adds query validation, document retrieval with ChromaDB, and LLM-based response generation
• Adds streaming chat UI - Creates interactive chat interface with document selection, preset queries, and word-by-word streaming
• Provides comprehensive testing - Includes 206+ unit tests across chat service components and models
• Configures full deployment infrastructure - Adds Helm charts, Kubernetes templates, network policies, and RBAC configuration
• Integrates with existing services - Updates PDF processor, frontend, and infrastructure components for chat service communication
• Adds security and monitoring - Includes Trivy security scanning, Prometheus metrics, and zero-trust network policies
• Updates documentation - Revises architecture diagrams, deployment guides, and service references
Diagram Walkthrough
File Walkthrough
13 files
rag_config.py
Add comprehensive RAG configuration and optimization system
chat_service/models/rag_config.py
• Added comprehensive RAG configuration system with query validation, classification, and optimization
• Implemented EnhancedQueryValidator with validation examples and prompt generation
• Created RAGConfig class for model parameters and settings management
• Added QueryClassificationExamples and PromptTemplates for different query types
• Implemented RAGOptimizer with LLM-based query classification and response post-processing
chat.py
Implement chat router with RAG query processing
chat_service/routers/chat.py
• Implemented complete chat endpoint with RAG query processing
• Added query validation using LLM before performing RAG operations
• Integrated ChromaDB for document retrieval with reranking capabilities
• Added session-based document access control and filtering
• Implemented response generation with configurable post-processing
5_chat_UI.py
Add chat UI with document selection and streaming
frontend/my_pages/5_chat_UI.py
• Created complete chat interface with document selection and preset queries
• Implemented streaming response display with configurable word-by-word effect
• Added preset buttons for summarization, main topic, and key findings
• Integrated session management and document filtering for chat requests
main.py
Integrate chat service into frontend navigation
frontend/main.py
• Added CHAT_URL environment variable configuration
• Integrated chat UI page into the main navigation structure
• Updated CSS styling for chat container components
process.py
Add semantic embedder support to processing pipeline
pdf_processor_service/utils/process.py
• Added wait_for_semantic_embedder function for semantic embedding processing
• Updated process_file_basic to run semantic and sentence embedders concurrently
• Imported load_or_create_semantic_embedder_job function
chat.py
Add chat router with session-based access control
pdf_processor_service/routers/chat.py
• Created chat router with session validation and document access control
• Implemented request proxying to chat service with session ID injection
• Added validation for document access permissions per session
chat.py
Add chat data models for PDF processor service
pdf_processor_service/models/chat.py
• Defined ChatRequest and ChatResponse Pydantic models
• Added collection name constants for semantic and textual embeddings
• Configured default collection to use semantic embeddings
chat.py
Add chat data models for chat service
chat_service/models/chat.py
• Created ChatRequest model with session ID and document ID support
• Implemented ChatResponse model with relevant chunks and metadata
• Added field validation and default values for collection names
main.py
Integrate chat router into PDF processor service
pdf_processor_service/main.py
• Added chat router to the main FastAPI application
• Imported and included chat router in the application routing
10_settings_UI.py
Add chat streaming settings to UI preferences
frontend/my_pages/10_settings_UI.py
• Added chat settings section with streaming toggle control
• Implemented user preference for word-by-word streaming effect
• Added help text explaining the streaming functionality
main.py
Create main FastAPI application for chat service
chat_service/main.py
• Created main FastAPI application for chat service
• Added Prometheus metrics instrumentation
• Configured logging and included health and chat routers
process_pdf.py
Add chat service to backend health monitoring
frontend/components/process_pdf.py
• Added Chat Service to backend status monitoring
• Included CHAT_URL environment variable in service health checks
health.py
Add health check endpoint for chat service
chat_service/routers/health.py
• Implemented basic health check endpoint
• Added simple status response for service monitoring
5 files
test_chat.py
Add comprehensive unit tests for chat service
chat_service/tests/test_chat.py
• Added comprehensive unit tests for chat functionality
• Implemented tests for retrieval result processing and chunk reranking
• Added tests for RAG query execution and error handling
• Created tests for chat endpoint with various scenarios including validation failures
test_models.py
Add unit tests for chat data models
chat_service/tests/test_models.py
• Added unit tests for ChatRequest and ChatResponse Pydantic models
• Implemented validation tests for required fields and optional parameters
• Added tests for model creation with various input scenarios
conftest.py
Add test configuration with mocked dependencies
chat_service/tests/conftest.py
• Created test configuration with mocked external dependencies
• Set up environment variables for unit testing
• Added fixtures for mocking OpenAI and ChromaDB clients
__init__.py
Initialize tests package for chat service
chat_service/tests/__init__.py
• Created tests package initialization file
• Added package marker for chat service test modules
test-connection.yaml
Add Helm test template for chat service
helm/chat-service/templates/tests/test-connection.yaml
• New Helm test template for chat service connectivity
• Defines health check test using busybox wget
• Enables automated testing of service availability
39 files
deploy-helm-charts.sh
Add chat service to Helm deployment scripts
scripts/deploy-helm-charts.sh
• Added chat-service to supported services list
• Updated examples and documentation to reference chat-service
• Included chat-service in deployment order for all services
load-images.sh
Update image loading examples for chat service
helm/load-images.sh
• Updated examples to reference chat_service instead of pdf_extraction_service
• Modified documentation and help text for chat service image loading
create-secrets.sh
Add chat service to secret creation script
create-secrets.sh
• Added chat-service to secret creation mapping
• Updated secret listing pattern to include chat-service-secrets
scan_with_trivy.sh
Add chat service to security scanning pipeline
scripts/scan_with_trivy.sh
• Added chat_service to the list of services for security scanning
• Included chat service in Trivy vulnerability assessment
test-all-services.sh
Add chat service to testing pipeline
scripts/test-all-services.sh
• Added chat_service to the list of services with unit tests
• Included chat service in comprehensive testing pipeline
test-single-service.sh
Add chat service to single service testing
scripts/test-single-service.sh
• Added chat_service to supported services list for individual testing
• Enabled single service test execution for chat service
_helpers.tpl
Add Helm template helpers for chat service
helm/chat-service/templates/_helpers.tpl
• Created Helm template helpers for chat-service deployment
• Added standard Kubernetes naming and labeling functions
• Implemented service account and secret name generation helpers
Makefile
Update Makefile examples for chat service
Makefile
• Updated examples and documentation to reference chat-service
• Modified help text and usage examples for chat service deployment
• Changed example service names from pdf-extraction-service to chat-service
values-prestaging.yaml
Add chat service to Istio gateway configuration
helm/istio-gateway/values-prestaging.yaml
• Added chat-service destination rule for Istio service mesh
• Configured mutual TLS mode for chat service communication
NOTES.txt
Add Helm chart notes template for chat service
helm/chat-service/templates/NOTES.txt
• New Helm chart notes template for chat service deployment
• Includes service access instructions, health checks, and testing commands
• Provides monitoring and network security information
values.yaml
Add Helm values configuration for chat service
helm/chat-service/values.yaml
• New Helm values configuration for chat service
• Defines LLM configuration, resources, networking policies, and service settings
• Includes Istio sidecar injection and monitoring configuration
values-prestaging.yaml
Add prestaging Helm values for chat service
helm/chat-service/values-prestaging.yaml
• Prestaging-specific Helm values for chat service
• Configures minimal resources, external vLLM integration, and network policies
• Includes disk pressure tolerations and Istio configuration
deployment.yaml
Add Kubernetes deployment template for chat service
helm/chat-service/templates/deployment.yaml
• New Kubernetes deployment template for chat service
• Defines container configuration, environment variables, and resource limits
• Includes LLM configuration and secret management
networkpolicy.yaml
Add NetworkPolicy template for chat service
helm/chat-service/templates/networkpolicy.yaml
• New NetworkPolicy template for chat service zero-trust security
• Defines ingress rules for PDF processor service communication
• Configures egress rules for ChromaDB, MinIO, Redis, and external LLM access
servicemonitor.yaml
Add ServiceMonitor template for chat service monitoring
helm/chat-service/templates/servicemonitor.yaml
• New ServiceMonitor template for Prometheus metrics collection
• Configures scraping intervals, timeouts, and metric relabeling
• Enables monitoring integration for chat service
Dockerfile
Add Dockerfile for chat service container
chat_service/Dockerfile
• New Dockerfile for chat service container image
• Configures Python 3.13 environment with FastAPI and dependencies
• Includes security hardening with non-root user
values.yaml
Add RBAC configuration for chat service
helm/rbac/values.yaml
• Added chat-service to service accounts list
• Added chat-service RBAC configuration with ChromaDB, MinIO, and Redis access
• Updated pdf-processor-service to include chat-service in canCall list
hpa.yaml
Add HPA template for chat service autoscaling
helm/chat-service/templates/hpa.yaml
• New HorizontalPodAutoscaler template for chat service
• Configures CPU and memory-based autoscaling
• Enables dynamic scaling based on resource utilization
poddisruptionbudget.yaml
Add PodDisruptionBudget template for chat service
helm/chat-service/templates/poddisruptionbudget.yaml
• New PodDisruptionBudget template for chat service high availability
• Configures minimum available replicas during cluster maintenance
• Ensures service availability during rolling updates
example.env
Add environment configuration for chat service
chat_service/example.env
• New environment configuration file for chat service
• Defines LLM settings, RAG optimization parameters, and Redis connection
• Includes model configuration and context management settings
example.env
Update image captioner service environment configuration
image_captioner_service/example.env
• Updated OPENAI_BASE_URL from IP address to hostname
• Changed OPENAI_API_KEY from token format to lm-studio
docker-compose.yml
Add chat service to Docker Compose configuration
docker-compose.yml
• Added chat_service container configuration
• Configured build context and environment file
• Added dependency on embedder_service
images.txt
Add chat service image to Helm images list
helm/images.txt
• Added ghcr.io/notyusheng/chat_service:dev-v0.0.3-5d69f89 to image list
• Updated core application services section
Chart.yaml
Add Helm chart metadata for chat service
helm/chat-service/Chart.yaml
• New Helm chart metadata for chat service
• Defines chart version, description, and keywords
• Configures chart as application type with RAG capabilities
service.yaml
Add Kubernetes Service template for chat service
helm/chat-service/templates/service.yaml
• New Kubernetes Service template for chat service
• Configures ClusterIP service with port 8000
• Defines service selector and port mapping
serviceaccount.yaml
Add ServiceAccount template for chat service
helm/chat-service/templates/serviceaccount.yaml
• New ServiceAccount template for chat service
• Configures service account creation and token mounting
• Includes metadata labels and annotations support
values-prestaging.yaml
Update ChromaDB network policy for chat service access
helm/chromadb/values-prestaging.yaml
• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated network policy to allow chat service access to ChromaDB
values.yaml
Update ChromaDB network policy for chat service access
helm/chromadb/values.yaml
• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated network policy to allow chat service access to ChromaDB
example.env
Add chat service URL to PDF processor configuration
pdf_processor_service/example.env
• Added CHAT_SERVICE_URL=http://chat_service:8000/chat environment variable
• Updated service URL configuration to include chat service endpoint
values-prestaging.yaml
Update PDF processor network policy for chat service
helm/pdf-processor-service/values-prestaging.yaml
• Added chat-service to NetworkPolicy egress allowedTargets
• Updated network policy to allow PDF processor to communicate with chat service
values.yaml
Update PDF processor network policy for chat service
helm/pdf-processor-service/values.yaml
• Added chat-service to NetworkPolicy egress allowedTargets
• Updated network policy to allow PDF processor to communicate with chat service
values.yaml
Add chat service to Istio gateway destination rules
helm/istio-gateway/values.yaml
• Added chat-service.omnipdf-prestaging.svc.cluster.local to destination rules
• Configured mTLS traffic policy for chat service
values-prestaging.yaml
Update Redis network policy for chat service access
helm/redis/values-prestaging.yaml
• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated network policy to allow chat service access to Redis
values-prestaging.yaml
Update MinIO network policy for chat service access
helm/minio/values-prestaging.yaml
• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated network policy to allow chat service access to MinIO
example.env
Add chat service URL to frontend configuration
frontend/example.env
• Added CHAT_URL=http://chat_service:8000 environment variable
• Updated frontend configuration to include chat service endpoint
values.yaml
Update Redis network policy for chat service access
helm/redis/values.yaml
• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated network policy to allow chat service access to Redis
values.yaml
Update MinIO network policy for chat service access
helm/minio/values.yaml
• Added chat-service to NetworkPolicy ingress allowedCallers
• Updated network policy to allow chat service access to MinIO
example.env
Add chat service URL to nginx configuration
nginx/example.env
• Added CHAT_URL=http://chat_service:8001 environment variable
• Updated nginx configuration to include chat service endpoint
services-to-build.txt
Add chat service to GitHub Actions build list
.github/services-to-build.txt
• Added chat_service to the list of core application services to build
• Updated CI/CD build configuration to include chat service
1 files
chat_service-report.txt
Add security scan report for chat service
trivy_scan_results/chat_service-report.txt
• Generated security scan report for chat service Docker image
• Shows clean scan results with no high or critical vulnerabilities
• Includes comprehensive package and filesystem vulnerability assessment
9 files
README.md
Restore chat service references in main documentation
README.md
• Added chat service to the architecture description and microservices list
• Updated service count from 13 to 14 individual RBAC roles
• Added chat-service to network policy tables and deployment examples
• Updated test count from 180+ to 206+ tests across 7 services
SECRET-MANAGEMENT.md
Update secret management examples for chat service
helm/SECRET-MANAGEMENT.md
• Updated examples to use chat-service instead of pdf-extraction-service
• Changed secret references from pdf-extraction-service-secrets to chat-service-secrets
• Modified command examples and service references throughout
LOCAL_IMAGE_REPOSITORY.md
Update local image repository examples for chat service
helm/LOCAL_IMAGE_REPOSITORY.md
• Replaced pdf_extraction_service references with chat_service in examples
• Updated Docker image names and repository paths
• Modified deployment and configuration examples
c4-diagram.puml
Add chat service to C4 architecture diagram
c4-diagram.puml
• Added chat_service container definition with RAG capabilities
• Updated frontend description to include chat interface
• Added relationships between chat service and other components (ChromaDB, Redis, MinIO, vLLM)
README.md
Update testing documentation for chat service
scripts/README.md
• Updated examples to use chat_service instead of pdf_extraction_service
• Changed service count from 6 to 7 services in testing documentation
• Updated test output examples and service descriptions
README.md
Update RBAC documentation for chat service
helm/rbac/README.md
• Updated service coverage from 13/13 to 14/14 complete
• Added chat-service to service table and data access patterns
• Updated ChromaDB access patterns to include chat service
NETWORK-POLICY-REFERENCE.md
Update network policy reference for chat service
helm/NETWORK-POLICY-REFERENCE.md
• Added chat-service to external AI communication table
• Updated service references and communication patterns
• Modified ChromaDB access patterns to include chat service
INSTALL.md
Update Istio gateway installation for chat service
helm/istio-gateway/INSTALL.md
• Added chat-service to service deployment loop
• Updated service communication testing examples
• Modified deployment instructions to include chat service
Chart.yaml
Update frontend chart description for chat functionality
helm/frontend/Chart.yaml
• Updated chart description to include "and chat" functionality
• Modified description from PDF processing only to include chat interface
1 files
requirements.txt
Add Python requirements for chat service
chat_service/requirements.txt
• New Python dependencies file for chat service
• Includes FastAPI, OpenAI, ChromaDB, and monitoring libraries
• Defines specific versions for all required packages
4 files