
migrate content to s3 #191

Merged
ArnavAgrawal03 merged 2 commits into main from smaller-chunks-1 on Jun 8, 2025

Conversation

@ArnavAgrawal03

Collaborator

No description provided.

@jazzberry-ai

jazzberry-ai Bot commented Jun 8, 2025


Bug Report

| Name | Severity | Example test case | Description |
| --- | --- | --- | --- |
| Incorrect S3 Key Generation | High | Configure the system to use S3 storage, ingest a document, and query the chunks. Verify that content retrieval fails. | Inconsistent storage key generation and retrieval logic in `core/vector_store/multi_vector_store.py` and `core/storage/s3_storage.py` lead to incorrect S3 paths. |
| File Extension Detection Errors | Medium | Ingest a document with text chunks and verify if the file extension has been incorrectly assigned, preventing correct retrieval. | Misidentification of file types by `detect_file_type` in `_determine_file_extension` may lead to incorrect file extensions. |
| Inaccurate Migration Check | Medium | Set up a local storage path that results in long storage keys and run the migration script. | Erroneous logic in the data migration script may cause some chunks to not be migrated to external storage. |
| Poor Error Handling in Content Retrieval | Low | Manually corrupt a chunk file in external storage and query the multi-vector store to retrieve it. | The error handling in `_retrieve_content_from_storage` provides a poor user experience by displaying storage keys when content retrieval fails. |
| Potential Race Condition | Low | Concurrent requests to retrieve the same `app_id` may cause a race condition. | The cache may be overwritten multiple times, potentially reducing its efficiency. |
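
One way to rule out the key mismatch described in the first row is to route both the write path and the read path through a single key-building helper. The following is a minimal sketch under stated assumptions, not the project's actual code: `build_chunk_key`, `InMemoryStorage`, `store_chunk`, `load_chunk`, and the `app_id/document_id/chunk_number` key layout are all hypothetical, and a plain dict stands in for S3.

```python
from dataclasses import dataclass, field
from typing import Dict


def build_chunk_key(app_id: str, document_id: str, chunk_number: int,
                    extension: str = ".txt") -> str:
    """Hypothetical helper: the only place where a chunk's storage key is built."""
    # Normalize the extension so "png" and ".png" yield the same key.
    ext = extension if extension.startswith(".") else f".{extension}"
    return f"{app_id}/{document_id}/{chunk_number}{ext}"


@dataclass
class InMemoryStorage:
    """Stand-in for an S3-backed store (assumption): a dict keyed by storage key."""
    objects: Dict[str, bytes] = field(default_factory=dict)

    def upload(self, key: str, data: bytes) -> str:
        self.objects[key] = data
        return key

    def download(self, key: str) -> bytes:
        return self.objects[key]


def store_chunk(storage: InMemoryStorage, app_id: str, document_id: str,
                chunk_number: int, content: str) -> str:
    # Write path: derive the key from the shared helper.
    key = build_chunk_key(app_id, document_id, chunk_number)
    return storage.upload(key, content.encode("utf-8"))


def load_chunk(storage: InMemoryStorage, app_id: str, document_id: str,
               chunk_number: int) -> str:
    # Read path: derive the key from the same helper, so it cannot drift.
    key = build_chunk_key(app_id, document_id, chunk_number)
    return storage.download(key).decode("utf-8")
```

Because neither path formats the key by hand, a change to the key layout applies to ingestion and retrieval together.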

Comments? Email us.

@jazzberry-ai

jazzberry-ai Bot commented Jun 8, 2025


Bug Report

| Name | Severity | Example test case | Description |
| --- | --- | --- | --- |
| Incorrect File Extension in S3Storage | High | Upload a base64 encoded image to S3 storage. | The `upload_from_base64` method in `S3Storage` did not correctly append the file extension to the storage key, resulting in files being stored without an extension. |
| Non-ASCII Characters in Image Data | Medium | Upload a base64 encoded image containing non-ASCII characters. | The base64 encoded image data caused errors when attempting to store it due to non-ASCII characters. |
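
Both rows concern how a base64 payload is decoded and how its storage key is named. The sketch below illustrates one possible approach using only the Python standard library. It is not the actual `upload_from_base64` implementation: `parse_base64_payload`, `key_with_extension`, and the MIME-to-extension table are hypothetical names introduced for illustration, and the sketch assumes the payload may arrive as a `data:` URI.

```python
import base64
import binascii
from typing import Tuple

# Hypothetical mapping from MIME type to file extension (assumption).
_MIME_EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def parse_base64_payload(payload: str) -> Tuple[bytes, str]:
    """Decode a base64 string or data URI; return (raw bytes, extension or "")."""
    extension = ""
    if payload.startswith("data:"):
        # A data URI looks like "data:image/png;base64,<encoded data>".
        header, _, payload = payload.partition(",")
        mime = header[len("data:"):].split(";", 1)[0]
        extension = _MIME_EXTENSIONS.get(mime, "")
    try:
        # Encoding to ASCII first turns stray non-ASCII characters into a clear error.
        raw = base64.b64decode(payload.encode("ascii"), validate=True)
    except (UnicodeEncodeError, binascii.Error) as exc:
        raise ValueError("payload is not valid base64") from exc
    return raw, extension


def key_with_extension(key: str, extension: str) -> str:
    """Append the extension to the storage key unless it is already present."""
    if extension and not key.endswith(extension):
        return key + extension
    return key
```

Storing the decoded bytes rather than the encoded text, and deriving the key suffix from the declared MIME type, keeps the stored object and its key consistent with each other.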


ArnavAgrawal03 merged commit 41a6972 into main on Jun 8, 2025
2 checks passed