
Add reference-only edit detection and auto-approval#127

Open
miraclousharshita wants to merge 10 commits into Wikimedia-Suomi:main from miraclousharshita:miraclousharshita/feat/adding-autoreview-for-ref-links


@miraclousharshita
Contributor

Closes #24

Implements automatic approval for edits that only add or modify references, reducing manual review workload for citation improvements.

Auto-approve when:

  • Edit only adds or modifies references (no other content changes)
  • All external link domains have been previously used on Wikipedia

Require manual review when:

  • References are removed without replacement
  • New/unknown external domains are added
  • Edit changes content beyond references
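The approval rules above can be summarized as a small decision function. This is only a sketch: `EditSummary` and `classify_edit` are hypothetical names, not the PR's code, and the sketch assumes the three underlying facts (reference-only change, references removed, unknown domains) have already been computed elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class EditSummary:
    """Hypothetical pre-computed facts about one diff (not the PR's actual types)."""

    only_reference_changes: bool  # nothing outside references changed
    references_removed_without_replacement: bool
    unknown_domains: set[str] = field(default_factory=set)  # never seen on the wiki


def classify_edit(summary: EditSummary) -> tuple[bool, str]:
    """Return (auto_approve, reason) following the rules in the PR description."""
    if not summary.only_reference_changes:
        return False, "Edit changes content beyond references"
    if summary.references_removed_without_replacement:
        return False, "References removed without replacement"
    if summary.unknown_domains:
        domains = ", ".join(sorted(summary.unknown_domains))
        return False, f"New/unknown external domains: {domains}"
    return True, "Reference-only edit using previously seen domains"


# Example: a reference-only edit that adds a link to an unseen domain
print(classify_edit(EditSummary(True, False, {"example.org"})))
# (False, 'New/unknown external domains: example.org')
```

Any single failing condition sends the edit to manual review; auto-approval is only the fall-through case.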

@System625
Contributor

Hi @miraclousharshita

Great work on this PR! The implementation is clean, well-structured, and has excellent test coverage (22 test cases covering all the requirements). Really impressive! 🎉

Issue to Fix

There's one bug that I believe needs to be addressed:

In app/reviews/services/wiki_client.py:368 - the has_domain_been_used() method:

ext_url_usage = self.site.exturlusage(
    url=domain, protocol="http", namespaces=[0], total=1
)

Problem: protocol="http" only checks HTTP URLs, not HTTPS. Modern Wikipedia uses HTTPS extensively, so domains that have only been used with HTTPS will incorrectly show as "never used", causing unnecessary manual reviews.

Fix: Remove the protocol parameter:

ext_url_usage = self.site.exturlusage(
    url=domain, namespaces=[0], total=1
)

According to the MediaWiki API docs, when protocol is omitted, it defaults to checking both HTTP and HTTPS.


Once this is fixed, I think it will be good to go! 🚀
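The protocol fix keeps the usage lookup scheme-agnostic. The same idea applies when collecting the domains of newly added links: reducing each URL to its hostname makes `http://` and `https://` links to the same site count as one domain. A minimal sketch, assuming the links are pulled out of the wikitext with a simple regex (`URL_RE` and `extract_domains` are hypothetical names, not the PR's code; the pattern ignores protocol-relative `//` links and other MediaWiki link forms):

```python
import re
from urllib.parse import urlsplit

# Assumed, simplified pattern: http(s) URLs up to whitespace or wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s\[\]<>|{}\"]+", re.IGNORECASE)


def extract_domains(wikitext: str) -> set[str]:
    """Return lowercase hostnames, so http:// and https:// links share one domain."""
    domains = set()
    for match in URL_RE.finditer(wikitext):
        host = urlsplit(match.group(0)).hostname  # urllib lowercases the hostname
        if host:
            domains.add(host.rstrip("."))  # drop a trailing sentence period
    return domains
```

Note that `www.example.org` and `example.org` remain distinct hosts in this sketch.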

Contributor

@zache-fi zache-fi left a comment


Hmm, I tested this. From a coding-style point of view, and otherwise, it is very nice.

However, I noticed some irregularities in how it passes and fails changes. Here is an example list for some of the changes. My guess is that the check for whether an edit is a reference edit fails because it picks up more text than expected. The check for whether a link is new at least partially fails because, if the link is still in the latest revision, it becomes a self-match.

Correctly detects the ref change

Correctly detects that the domain was new

Incorrectly detects that the edit modifies content beyond references

Incorrectly detects whether the domain was new or old

Incorrectly detects whether it is a ref change
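One way to test the hypothesis that the reference check picks up more text than expected is to compare the two revisions' full wikitext after removing every `<ref>` element: if the remainders match and the reference lists differ, only references changed. This is a sketch of that approach, not the PR's implementation; `REF_RE` and the helper names are hypothetical, and the regex deliberately ignores template-generated refs such as `{{sfn}}` and refs inside comments or `<nowiki>`.

```python
import re

# Self-closing <ref ... /> first, then paired <ref ...>...</ref>.
REF_RE = re.compile(
    r"<ref\b[^>]*?/>|<ref\b[^>]*>.*?</ref\s*>",
    re.IGNORECASE | re.DOTALL,
)


def _text_outside_refs(wikitext: str) -> str:
    """Remove all <ref> elements and collapse runs of whitespace."""
    return " ".join(REF_RE.sub("", wikitext).split())


def is_reference_only_edit(old: str, new: str) -> bool:
    """True if the references changed and nothing outside them did."""
    refs_changed = REF_RE.findall(old) != REF_RE.findall(new)
    return refs_changed and _text_outside_refs(old) == _text_outside_refs(new)


def references_removed(old: str, new: str) -> bool:
    """True if the new revision has fewer <ref> elements than the old one."""
    return len(REF_RE.findall(new)) < len(REF_RE.findall(old))
```

Comparing the whole text outside refs avoids attributing nearby sentence text to a reference, at the cost of re-scanning the full page on each check.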

Comment thread: app/reviews/services/wiki_client.py (outdated)
return False

try:
ext_url_usage = self.site.exturlusage(url=domain, namespaces=[0], total=1)
Contributor


If total=1, this will match itself if the link still exists in the latest revision. You can partly mitigate self-references by using total=2 and checking that there are at least 2 links.
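Another way to sidestep the self-match is to skip the page under review explicitly, so that a single other page counts as evidence that the domain has been used before. A sketch under that assumption (`domain_used_elsewhere` is a hypothetical helper, not part of the PR):

```python
from collections.abc import Iterable


def domain_used_elsewhere(usage_titles: Iterable[str], page_title: str) -> bool:
    """Return True if a page other than the one under review links to the domain.

    usage_titles: titles of pages returned by an external-link usage lookup for
    the domain (hypothetical input; see the pywikibot note below).
    """
    # Skipping the reviewed page removes the self-match described above.
    return any(title != page_title for title in usage_titles)
```

With pywikibot, the titles could come from something like `(p.title() for p in self.site.exturlusage(url=domain, namespaces=[0], total=5))`, where the limit of 5 is an arbitrary assumption; `any()` stops at the first page that is not the one being reviewed. If the domain only ever appeared on the reviewed page, the check stays conservative and the edit goes to manual review.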

@miraclousharshita
Contributor Author

@zache-fi, can you please review it again?

@zache-fi
Contributor

zache-fi commented Nov 5, 2025

Thanks, there are still some incorrect review results. Note: this is probably not caused by your code/changes; it was already producing wrong reviews, which weren't noticed because there weren't that many test cases.

In any case, I have now added a management command that runs reference-only-edit tests against diffs.

Usage
python manage.py run_wiki_diff_tests

The configuration page for the diff tests is here.

Note: I have only tested the management command with the reference-only-edit autoreview check. Currently it also pollutes the database with orphan revisions, so the data needs to be refreshed if you want a clean database.

@miraclousharshita
Contributor Author

@zache-fi, can you please recheck? I updated it to fix the abnormalities.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Check for Reference-Only Changes

3 participants