
sanitise keywords query param to prevent IndexError crash and abuse (#140) #172

Open
ahmedk20 wants to merge 1 commit into c2siorg:main from ahmedk20:fix/keywords-input-sanitisation

@ahmedk20


Description

Added a _validate_keyword() helper in routes/NewsRoutes.py that validates
the ?keywords= query parameter before it reaches the controller or database.
Both /<llm_name>/news_keywords and /raw/news_keywords now use it.

Related Issue

Fixes #140

Motivation and Context

The keywords endpoint passed user input directly to Pinecone with zero
validation, causing three problems:

1. Crash — IndexError on missing param

GET /mistralai/news_keywords        ← no ?keywords= provided
→ user_keywords = []
→ user_keywords[0]                  ← IndexError → unhandled 500

2. Empty input passes through

GET /mistralai/news_keywords?keywords=   ← empty or whitespace-only value
→ blank string gets embedded and sent to Pinecone
→ wastes API quota and returns garbage results

3. No length limit

GET /mistralai/news_keywords?keywords=<10,000 chars>
→ goes straight into HuggingFace embedding call
→ risks hitting token limits and slowing the server
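Failure mode 1 can be reproduced without Flask or any external service. The sketch below models the query string as a plain dict of lists, the shape `request.args.getlist("keywords")` would return. Both function names are hypothetical, and the "unchecked" version is reconstructed from the trace above rather than copied from the project.

```python
from typing import Dict, List, Optional


def first_keyword_unchecked(args: Dict[str, List[str]]) -> str:
    # Pre-fix pattern (hypothetical reconstruction): index without checking.
    user_keywords = args.get("keywords", [])
    return user_keywords[0]  # raises IndexError when the list is empty


def first_keyword_guarded(args: Dict[str, List[str]]) -> Optional[str]:
    # Guarded pattern: reject a missing or whitespace-only value up front.
    user_keywords = args.get("keywords", [])
    if not user_keywords or not user_keywords[0].strip():
        return None  # the route would turn this into an HTTP 400
    return user_keywords[0].strip()


try:
    first_keyword_unchecked({})  # simulates GET .../news_keywords with no ?keywords=
except IndexError as exc:
    print("unchecked:", type(exc).__name__)  # in Flask this surfaces as a 500

print("guarded:", first_keyword_guarded({}))
```

Running it prints `unchecked: IndexError` followed by `guarded: None`. The guard also covers failure mode 2, because a whitespace-only value strips to an empty string. Failure mode 3 additionally needs the length check.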

How Has This Been Tested?

Added 4 unit tests in tests/unitTest.py covering each validation case:

Test                              Input                                Expected status
test_keywords_missing_param       no ?keywords=                        400
test_keywords_empty_string        ?keywords=                           400
test_keywords_too_long            keyword > 200 chars                  400
test_raw_keywords_missing_param   /raw/news_keywords, no ?keywords=    400

Results:

tests/unitTest.py::FlaskAppTestCase::test_keywords_missing_param     PASSED
tests/unitTest.py::FlaskAppTestCase::test_keywords_empty_string      PASSED
tests/unitTest.py::FlaskAppTestCase::test_keywords_too_long          PASSED
tests/unitTest.py::FlaskAppTestCase::test_raw_keywords_missing_param PASSED

4 passed in 3.29s

Testing environment: Python 3.14.2, Flask test client.
Validation runs entirely before any external call is made, so dummy
env vars for Pinecone/HuggingFace are sufficient to run these tests.
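The claim that validation runs before any external call can also be checked directly with `unittest.mock` from the standard library. In this sketch `handle_keywords` and `embed` are hypothetical stand-ins for the route and the HuggingFace/Pinecone call. If validation ever fell through on bad input, the mock would record the call and `assert_not_called()` would fail.

```python
from unittest import mock

MAX_KEYWORD_LENGTH = 200  # cap described in this PR


def handle_keywords(raw, embed):
    """Hypothetical handler: validate first, call the external service second."""
    keyword = raw.strip() if raw is not None else ""
    if not keyword or len(keyword) > MAX_KEYWORD_LENGTH:
        return {"error": "invalid 'keywords' parameter"}, 400
    return {"results": embed(keyword)}, 200


# A Mock stands in for the embedding/Pinecone call and records every call made.
embed = mock.Mock(return_value=["match"])

for bad in (None, "   ", "x" * 201):
    body, status = handle_keywords(bad, embed)
    assert status == 400

embed.assert_not_called()  # validation short-circuited every bad request

body, status = handle_keywords(" ai ", embed)
assert status == 200
embed.assert_called_once_with("ai")
```

Asserting on the mock is stronger than only checking the 400 status code. It proves that no quota-consuming request would have been sent.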

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

- Add _validate_keyword() helper that checks for a missing param,
  an empty string after strip, and length > 200 characters
- Both /<llm_name>/news_keywords and /raw/news_keywords now return
  HTTP 400 with a JSON error instead of crashing with IndexError
- Add 4 unit tests covering each validation case

Fixes c2siorg#140
@arjunKalare


Hi @ahmedk20, I raised this issue, so it is up to our mentor @hardik1408 to decide who gets assigned to it. Opening a PR on this issue without being assigned goes directly against the protocol outlined in the CONTRIBUTING document.

Development

Successfully merging this pull request may close these issues.

Lack of input sanitisation on keywords endpoint
