Add benchmark tool for superseded additions detection (#113)#122

Open
xenacode-art wants to merge 6 commits into Wikimedia-Suomi:main from xenacode-art:feature/113-benchmark-superseded-clean

Conversation

@xenacode-art
Contributor

@xenacode-art xenacode-art commented Oct 27, 2025

Hi @zache-fi,
I've implemented the Django management command to compare two methods of detecting superseded additions in pending revisions:

  1. Current similarity-based method (using SequenceMatcher)
  2. Proposed word-level diff method (using MediaWiki REST API)

What I Added

Management Command

  • app/reviews/management/commands/benchmark_superseded.py (450 lines)
    • Compares both methods across sample revisions
    • Generates detailed statistics and JSON output
    • Provides diff URLs for manual review
    • Configurable sample size, threshold, and wiki

Documentation

  • BENCHMARK_SUPERSEDED.md (comprehensive guide)
    • Explains current implementation (autoreview.py:755-813)
    • Documents word-level diff approach
    • Usage examples and interpretation guide
    • Performance considerations and integration path

Supporting Files

  • app/reviews/management/__init__.py (package marker)
  • app/reviews/management/commands/__init__.py (package marker)
  • benchmark_results_example.json (sample output format)

Usage

python manage.py benchmark_superseded --wiki=1 --sample-size=50 --threshold=0.2 --output=results.json

Key Features

Similarity Method (Current):

  • Character-level text matching with SequenceMatcher
  • Normalizes wikitext (removes refs, templates, formatting)
  • Fast, no external dependencies
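As a rough sketch of how the similarity-based check works (the function names and the exact normalization rules here are illustrative, not the actual autoreview.py implementation):

```python
import re
from difflib import SequenceMatcher

def normalize_wikitext(text: str) -> str:
    """Strip refs, templates and basic formatting before comparison (simplified)."""
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)  # <ref>...</ref> pairs
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)  # non-nested {{templates}}
    text = re.sub(r"'{2,}", "", text)  # bold/italic quote markup
    return re.sub(r"\s+", " ", text).strip()

def is_addition_superseded(addition: str, current_text: str, threshold: float = 0.2) -> bool:
    """Treat an addition as superseded when its similarity to the current text
    falls below the threshold (hypothetical decision rule for illustration)."""
    ratio = SequenceMatcher(
        None, normalize_wikitext(addition), normalize_wikitext(current_text)
    ).ratio()
    return ratio < threshold
```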

Word-Level Method (Proposed):

  • Uses MediaWiki REST API visual diff endpoint
  • Tracks word-level changes and block moves
  • More precise semantic understanding
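A minimal sketch of fetching a structured diff from the MediaWiki REST API compare endpoint. The type-code interpretation in the comments reflects my reading of the API docs and should be verified against the benchmark's actual parsing:

```python
import json
from urllib.request import urlopen

def fetch_word_level_diff(lang: str, from_rev: int, to_rev: int) -> list[dict]:
    """Fetch the structured diff between two revisions from the MediaWiki REST API."""
    url = f"https://{lang}.wikipedia.org/w/rest.php/v1/revision/{from_rev}/compare/{to_rev}"
    with urlopen(url, timeout=30) as resp:
        # Each diff entry carries a "type": 0 = context, 1 = added, 2 = deleted,
        # 3 = changed (with word-level highlightRanges), 4/5 = moved paragraphs.
        return json.load(resp)["diff"]

def count_changes(diff: list[dict]) -> dict[str, int]:
    """Summarize how many diff entries were added, deleted, changed, or moved."""
    labels = {1: "added", 2: "deleted", 3: "changed", 4: "moved", 5: "moved"}
    counts = {"added": 0, "deleted": 0, "changed": 0, "moved": 0}
    for entry in diff:
        label = labels.get(entry.get("type"))
        if label:
            counts[label] += 1
    return counts
```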

Comparison Output:

  • Agreement rate between methods
  • Disagreement breakdown (similarity-only vs word-level-only approvals)
  • Per-revision results with diff URLs
  • JSON export for further analysis
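The agreement statistics could be computed along these lines; the per-revision result keys (`similarity`, `word_level`) are hypothetical stand-ins for whatever shape the command actually emits:

```python
def summarize_agreement(results: list[dict]) -> dict:
    """Compare per-revision verdicts from the two methods.

    Each result is assumed to hold boolean "superseded" verdicts under the
    keys "similarity" and "word_level" (illustrative names).
    """
    agree = sum(1 for r in results if r["similarity"] == r["word_level"])
    sim_only = sum(1 for r in results if r["similarity"] and not r["word_level"])
    word_only = sum(1 for r in results if r["word_level"] and not r["similarity"])
    total = len(results)
    return {
        "total": total,
        "agreement_rate": agree / total if total else 0.0,
        "similarity_only": sim_only,   # flagged only by the similarity method
        "word_level_only": word_only,  # flagged only by the word-level method
    }
```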

Testing

I validated the command structure with:

  • Python AST syntax checking (passed)
  • Django package structure (proper __init__.py files)

Addresses issue #113

Kaja Obinna and others added 3 commits October 27, 2025 10:11
…i#113)

Contributor

@zache-fi zache-fi left a comment

Even though I didn't verify the results, I was able to run the command, and based on that it would be useful:

  • if there were a warning that data must first be loaded via the web interface, because right now the user has to guess this
  • the --wiki parameter currently requires a NUMBER. It would be better if it accepted a language code and resolved the correct wiki from that.
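The language-code resolution could be sketched roughly as follows; in the real command the lookup table would come from the Wiki model rather than a dict, and the field name is an assumption:

```python
def resolve_wiki(code: str, wikis: dict[str, int]) -> int:
    """Map a language code (e.g. 'fi') to a wiki ID, with a helpful error.

    `wikis` stands in for a lookup built from the Wiki table; the real
    command would query the model (e.g. Wiki.objects.get(code=code)) instead.
    """
    try:
        return wikis[code]
    except KeyError:
        available = ", ".join(sorted(wikis))
        raise ValueError(f"Unknown wiki code '{code}'. Available codes: {available}")
```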

from django.core.management.base import BaseCommand
from pywikibot.comms import http

from app.reviews.models import PendingPage, PendingRevision, Wiki
Contributor


The app.reviews.models import throws ModuleNotFoundError: No module named 'app'. It is easy to fix by removing the app. part from the import, but I am not sure why it was required in the first place.

self, revision: PendingRevision, wiki: Wiki, threshold: float
) -> dict[str, Any]:
"""Compare similarity-based vs word-level diff methods."""
from app.reviews.autoreview import (
Contributor


These were refactored in 3a7f185, so the private _extract_additions, _get_parent_wikitext, etc. functions no longer exist; they are now the public utility functions extract_additions, get_parent_wikitext, and so on. I tried fixing these, but as far as I can tell the functions' return values changed as well, so I didn't continue testing further.

Address review comments from PR Wikimedia-Suomi#122:

1. Fix import statements - Remove 'app.' prefix
   - Changed: from app.reviews.models to reviews.models
   - Changed: from app.reviews.autoreview imports

2. Update to use refactored autoreview utility functions
   - Use extract_additions (was _extract_additions)
   - Use get_parent_wikitext (was _get_parent_wikitext)
   - Use normalize_wikitext (was _normalize_wikitext)
   - Use is_addition_superseded (was _is_addition_superseded)
   - Import from reviews.autoreview.utils.wikitext and reviews.autoreview.utils.similarity
   - Functions were refactored in commit 3a7f185

3. Change --wiki parameter to accept language code
   - Now accepts language codes (e.g., 'fi', 'en') instead of numeric Wiki ID
   - More user-friendly and intuitive
   - Provides helpful error message with available wiki codes if not found

4. Add data loading requirement warnings
   - Added note about needing to load data via web interface first
   - Improved error message when no suitable revisions found
   - Explains possible reasons for empty results

5. Update documentation
   - Updated BENCHMARK_SUPERSEDED.md to reflect all changes
   - Fixed function references (removed underscores)
   - Updated file location references for refactored code
   - Updated all usage examples to use language codes
@xenacode-art
Contributor Author

Hi @zache-fi,

I've addressed all the review comments you provided. The command now uses the refactored utility
functions, accepts language codes instead of numeric IDs, and includes warnings about data requirements.
All imports have been fixed and the documentation has been updated accordingly. Please let me know if
there's anything else that needs adjustment.

Thanks for the detailed feedback!

@xenacode-art
Contributor Author

Hi @zache-fi! 👋

Just wanted to follow up on this PR since I've addressed all the review feedback you provided:

✅ Fixed all import issues - Removed app. prefix
✅ Updated to refactored functions - Now using public utility functions from the new modular structure
✅ Changed --wiki parameter - Now accepts language codes (e.g., --wiki=fi) instead of numeric IDs
✅ Added data loading warnings - Users get clear messaging about prerequisites

The command is now more user-friendly and follows the current codebase architecture. All syntax checks are passing.

How it supports the roadmap:
This benchmark tool helps validate the effectiveness of superseded additions detection, which is part of
the autoreview pipeline that will run on Toolforge (Goal #1). Understanding which detection method works
best will help us optimize the production deployment.

Ready for another review when you have time! Let me know if there's anything else needed.
