Add benchmark tool for superseded additions detection (#113) #122
xenacode-art wants to merge 6 commits into Wikimedia-Suomi:main
…i#113)

I've implemented a Django management command to compare two methods of detecting superseded additions in pending revisions:

1. Current similarity-based method (using SequenceMatcher)
2. Proposed word-level diff method (using MediaWiki REST API)

## What I Added

### Management Command

- app/reviews/management/commands/benchmark_superseded.py (450 lines)
  - Compares both methods across sample revisions
  - Generates detailed statistics and JSON output
  - Provides diff URLs for manual review
  - Configurable sample size, threshold, and wiki

### Documentation

- BENCHMARK_SUPERSEDED.md (comprehensive guide)
  - Explains current implementation (autoreview.py:755-813)
  - Documents word-level diff approach
  - Usage examples and interpretation guide
  - Performance considerations and integration path

### Supporting Files

- app/reviews/management/__init__.py (package marker)
- app/reviews/management/commands/__init__.py (package marker)
- benchmark_results_example.json (sample output format)

## Usage

    python manage.py benchmark_superseded --wiki=1 --sample-size=50 --threshold=0.2 --output=results.json

## Key Features

Similarity Method (Current):

- Character-level text matching with SequenceMatcher
- Normalizes wikitext (removes refs, templates, formatting)
- Fast, no external dependencies

Word-Level Method (Proposed):

- Uses MediaWiki REST API visual diff endpoint
- Tracks word-level changes and block moves
- More precise semantic understanding

Comparison Output:

- Agreement rate between methods
- Disagreement breakdown (similarity-only vs word-level-only approvals)
- Per-revision results with diff URLs
- JSON export for further analysis

## Testing

I validated the command structure with:

- Python AST syntax checking (passed)
- Django package structure (proper __init__.py files)

Addresses issue Wikimedia-Suomi#113
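For readers who want a feel for what such a comparison involves, here is a minimal sketch. It is not the code from this PR: `normalize`, `superseded_by_similarity`, `superseded_by_words`, and `agreement_rate` are hypothetical names, the regexes are only a rough stand-in for the project's wikitext normalization, and the word-level check diffs tokens locally with `difflib` instead of calling the MediaWiki REST API. The rule "an addition is superseded when less than `threshold` of it survives" is also an assumption.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Rough stand-in for wikitext normalization (assumption, not the project's code)."""
    text = re.sub(r"<ref[^>]*/>", " ", text)  # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", " ", text, flags=re.DOTALL)  # paired refs
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)  # non-nested templates
    text = re.sub(r"'{2,}", "", text)  # bold / italic quotes
    return re.sub(r"\s+", " ", text).strip().lower()


def superseded_by_similarity(addition: str, latest: str, threshold: float = 0.2) -> bool:
    """Character-level check: is the longest surviving run of the addition too short?"""
    a, b = normalize(addition), normalize(latest)
    if not a:
        return False
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    longest = matcher.find_longest_match(0, len(a), 0, len(b))
    return longest.size / len(a) < threshold


def superseded_by_words(addition: str, latest: str, threshold: float = 0.2) -> bool:
    """Word-level check done locally; the PR uses a REST API diff instead."""
    a, b = normalize(addition).split(), normalize(latest).split()
    if not a:
        return False
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    kept = sum(block.size for block in blocks)
    return kept / len(a) < threshold


def agreement_rate(verdicts: list[tuple[bool, bool]]) -> float:
    """Fraction of revisions on which both methods return the same verdict."""
    if not verdicts:
        return 0.0
    return sum(1 for sim, word in verdicts if sim == word) / len(verdicts)
```

Running both checks over a sample and passing the verdict pairs to `agreement_rate` gives the kind of headline number the description lists under "Comparison Output"; the pairs that differ are the ones worth opening in a diff view.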
zache-fi left a comment:
I did not verify the results, but I was able to run the command. Based on that, it would be useful if:
- the command warned that data must first be loaded through the web interface, because currently the user has to guess this.
- the wiki parameter took a language code instead of a NUMBER and resolved the correct wiki from it.
    from django.core.management.base import BaseCommand
    from pywikibot.comms import http

    from app.reviews.models import PendingPage, PendingRevision, Wiki
The `app.reviews.models` import throws `ModuleNotFoundError: No module named 'app'`. It is easy to fix by removing the `app.` prefix from the import, but I am not sure why the prefix was needed in the first place.
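One possible explanation, offered as an assumption and not a confirmed fact about this repository: if `manage.py` lives inside the `app/` directory, Python puts `app/` itself on `sys.path`, so `reviews.models` resolves while `app.reviews.models` does not. The sketch below reproduces that effect with a throwaway directory tree; `app_demo`, `reviews_demo`, `build_fake_project`, and `can_import` are hypothetical names chosen to avoid clashing with real packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_fake_project(root: Path) -> Path:
    """Create root/app_demo/reviews_demo/models.py and return the app_demo/ directory."""
    reviews = root / "app_demo" / "reviews_demo"
    reviews.mkdir(parents=True)
    (root / "app_demo" / "__init__.py").write_text("")
    (reviews / "__init__.py").write_text("")
    (reviews / "models.py").write_text("NAME = 'models'\n")
    return root / "app_demo"


def can_import(module_name: str, path_entry: Path) -> bool:
    """Report whether module_name imports when path_entry is first on sys.path."""
    sys.path.insert(0, str(path_entry))
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path.remove(str(path_entry))
        # Forget the demo modules so the next check starts from a clean state.
        for name in list(sys.modules):
            if name.split(".")[0] in {"app_demo", "reviews_demo"}:
                del sys.modules[name]


if __name__ == "__main__":
    project_root = Path(tempfile.mkdtemp())
    app_dir = build_fake_project(project_root)
    for entry in (app_dir, project_root):
        for module in ("reviews_demo.models", "app_demo.reviews_demo.models"):
            print(entry.name, module, can_import(module, entry))
```

Under that assumption, the prefixed import would only work when the command is run from the repository root with the root on the path, which may be how the original file was checked.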
        self, revision: PendingRevision, wiki: Wiki, threshold: float
    ) -> dict[str, Any]:
        """Compare similarity-based vs word-level diff methods."""
        from app.reviews.autoreview import (
These were refactored in 3a7f185, so the private `_extract_additions`, `_get_parent_wikitext`, etc. functions no longer exist; they are now the public utility functions `extract_additions`, `get_parent_wikitext`, and so on. I tried to fix these, but as far as I can tell the return values of the functions changed as well, so I did not test any further.
Address review comments from PR Wikimedia-Suomi#122:

1. Fix import statements - Remove 'app.' prefix
   - Changed: from app.reviews.models to reviews.models
   - Changed: from app.reviews.autoreview imports
2. Update to use refactored autoreview utility functions
   - Use extract_additions (was _extract_additions)
   - Use get_parent_wikitext (was _get_parent_wikitext)
   - Use normalize_wikitext (was _normalize_wikitext)
   - Use is_addition_superseded (was _is_addition_superseded)
   - Import from reviews.autoreview.utils.wikitext and reviews.autoreview.utils.similarity
   - Functions were refactored in commit 3a7f185
3. Change --wiki parameter to accept language code
   - Now accepts language codes (e.g., 'fi', 'en') instead of numeric Wiki ID
   - More user-friendly and intuitive
   - Provides helpful error message with available wiki codes if not found
4. Add data loading requirement warnings
   - Added note about needing to load data via web interface first
   - Improved error message when no suitable revisions found
   - Explains possible reasons for empty results
5. Update documentation
   - Updated BENCHMARK_SUPERSEDED.md to reflect all changes
   - Fixed function references (removed underscores)
   - Updated file location references for refactored code
   - Updated all usage examples to use language codes
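The language-code lookup in item 3 could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this commit: the `Wiki` dataclass stands in for the project's Django model, `build_parser` and `resolve_wiki` are hypothetical helpers, and the defaults shown for `--sample-size` and `--threshold` are simply taken from the usage example. Plain `argparse` is used so the sketch runs without Django; inside a management command the same arguments would go in `add_arguments`, and the failure would more naturally be raised as `CommandError`.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Wiki:
    """Hypothetical stand-in for the project's Wiki model."""

    pk: int
    code: str


def build_parser() -> argparse.ArgumentParser:
    """Declare the options named in the PR; the defaults are assumptions."""
    parser = argparse.ArgumentParser(prog="benchmark_superseded")
    parser.add_argument("--wiki", required=True, help="Wiki language code, e.g. 'fi' or 'en'")
    parser.add_argument("--sample-size", type=int, default=50)
    parser.add_argument("--threshold", type=float, default=0.2)
    parser.add_argument("--output", default=None, help="Optional path for JSON results")
    return parser


def resolve_wiki(code: str, wikis: list[Wiki]) -> Wiki:
    """Return the wiki with this language code, or fail listing the known codes."""
    by_code = {wiki.code: wiki for wiki in wikis}
    if code in by_code:
        return by_code[code]
    available = ", ".join(sorted(by_code)) or "none loaded"
    raise LookupError(f"Wiki '{code}' not found. Available codes: {available}")
```

Listing the available codes in the error also covers the case where no data has been loaded yet, since the message then says so instead of failing silently.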
Hi @zache-fi, I've addressed all the review comments you provided. The command now uses the refactored utility

Thanks for the detailed feedback!

Hi @zache-fi! 👋 Just wanted to follow up on this PR since I've addressed all the review feedback you provided:

✅ Fixed all import issues - Removed `app.` prefix

The command is now more user-friendly and follows the current codebase architecture. All syntax checks

How it supports the roadmap:

Ready for another review when you have time! Let me know if there's anything else needed.