Skip to content

Add global bot and former global bot detection#79

Open
Ezi-code wants to merge 8 commits intoWikimedia-Suomi:mainfrom
Ezi-code:feature/6
Open

Add global bot and former global bot detection#79
Ezi-code wants to merge 8 commits intoWikimedia-Suomi:mainfrom
Ezi-code:feature/6

Conversation

@Ezi-code
Copy link
Copy Markdown
Contributor

Description

This PR enhances the bot detection logic to recognize and auto-approve edits made by global bots and former global bots. This improves the accuracy of the autoreview process by correctly identifying legitimate automated edits from across the Wikimedia ecosystem.

Implementation Details:

  • Database Caching: Adds is_global_bot and is_former_global_bot fields to the EditorProfile model. This allows the bot status to be cached, avoiding redundant API calls and improving performance.

  • Service Layer: The ensure_editor_profile method in reviews/services.py now queries the Meta-Wiki API to fetch a user's global groups and updates the profile with their global bot status.

  • Autoreview Logic: The _is_bot_user function in autoreview.py has been updated to use these new database flags, ensuring that users identified as global or former global bots are approved.

  • Testing: Includes comprehensive unit tests to verify the new detection logic and ensure the changes are working as expected.

issue #6

@Teja-Sri-Surya
Copy link
Copy Markdown
Contributor

Code Review - PR #79: Global Bot Detection

Hi @Ezi-code! 👋

I've reviewed your implementation for adding global bot detection. Here's my detailed feedback:

Strengths:

  1. Correct API Usage

    • Uses globaluserinfo Meta-Wiki API (efficient approach)
    • Proper error handling with try/except
    • Returns sensible defaults on failure
  2. Good Database Design 📊

    • Added is_global_bot and is_former_global_bot fields
    • Migration file included
    • Properly integrated into EditorProfile model
  3. Clean Integration

    • Updated _is_bot_user() to check global bot flags (lines 462-463)
    • Caching in database avoids repeated API calls
    • Follows existing bot detection pattern
  4. Test Coverage 🧪

    • Tests for global bots
    • Tests for former global bots
    • Tests for service layer API calls

⚠️ Issues & Recommendations:

1. Former Global Bot Detection Missing 🔴

Problem: Lines 65-66 in services.py:

is_global_bot = "global-bot" in current_groups
is_former_global_bot = False  # ❌ Always False!

Issue: The API doesn't provide former groups in globaluserinfo!

Solution: According to the issue description, you need to fetch former global groups. Check the API documentation - there might be a guiprop=unattachedaccounts or similar parameter, or you may need a different approach.

Recommendation:

# Check if API supports former groups
parameters={
    "action": "query",
    "meta": "globaluserinfo",
    "guiuser": username,
    "guiprop": "groups",  # May need additional props for former groups
}

2. Merge Conflicts (Critical) 🔴

Problem: PR #96 restructured the codebase:

  • autoreview.pyautoreview/ directory with separate check modules
  • models.pymodels/ directory
  • services.pyservices/ directory

What needs updating:

  1. Create app/reviews/autoreview/checks/bot_user.py (new file)
  2. Update to use CheckContext pattern
  3. Update imports throughout

Example new structure:

# app/reviews/autoreview/checks/bot_user.py
from ..base import CheckResult
from ..context import CheckContext
from ..decision import AutoreviewDecision

def check_bot_user(context: CheckContext) -> CheckResult:
    """Check if user is a bot (local, former, or global)."""
    profile = context.profile
    
    if not profile:
        return CheckResult(
            check_id="bot-user",
            check_title="Bot user check",
            status="pass",
        )
    
    is_bot = (
        profile.is_bot
        or profile.is_former_bot
        or profile.is_global_bot
        or profile.is_former_global_bot
    )
    
    if is_bot:
        return CheckResult(
            check_id="bot-user",
            check_title="Bot user check",
            status="pass",
            message=f"User {profile.username} is a bot",
            decision=AutoreviewDecision(
                status="approve",
                label="Auto-approved",
                reason="Edit made by bot user",
            ),
        )
    
    return CheckResult(
        check_id="bot-user",
        check_title="Bot user check",
        status="pass",
    )

3. Performance Consideration

Line 339: check_global_bot_user() is called for EVERY user profile refresh

Concern: This makes an API call to Meta-Wiki for every user!

Recommendation: Add caching or rate limiting:

# Option 1: Cache results
@lru_cache(maxsize=1000)
def check_global_bot_user_cached(self, username: str) -> tuple[bool, bool]:
    ...

# Option 2: Only check if not already cached
if profile.is_global_bot or profile.is_former_global_bot:
    # Already know they're a global bot, skip API call
    pass
else:
    # Only check if we haven't checked before
    is_global_bot, is_former_global_bot = self.check_global_bot_user(username)

4. Test Updates Needed 🧪

After PR #96 restructuring, tests need to:

  • Move to app/reviews/tests/autoreview/test_bot_user.py
  • Update imports to match new structure
  • Use CheckContext in test setup

📋 Action Items:

  1. Investigate former global bot API

    • Check if globaluserinfo supports former groups
    • Or find alternative API endpoint
  2. Resolve merge conflicts

    • Rebase on latest main
    • Migrate to restructured codebase
  3. Add caching/optimization

    • Prevent redundant Meta-Wiki API calls
    • Consider LRU cache
  4. Update tests

    • Move to new test structure
    • Update to CheckContext pattern

🤝 Offer to Help:

I'm familiar with the restructured codebase (worked on PR #99). Happy to help with:

  • Resolving merge conflicts
  • Migrating to new CheckContext structure
  • Performance optimization

Let me know! 😊

Copy link
Copy Markdown
Contributor

@zache-fi zache-fi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented this but it seems that i left the comment pending without noticing it, sorry about that.

About the code : In any case it technically worked, but requesting global bot rights with every user was little bit slow and it would be faster to fetch all with single request.

  • it should be now refactored latest main.

Comment thread app/reviews/services/wiki_client.py Outdated
Copy link
Copy Markdown
Contributor

@zache-fi zache-fi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also python manage.py test reviews.tests.test_services fails after the changes

Comment thread app/reviews/services/wiki_client.py Outdated
)

response = request.submit()
user_info = response.get("query", {}).get("globaluserinfo", {})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query returns "globalallusers" list and not globaluserinfo tms.

{'batchcomplete': '', 'limits': {'globalallusers': 500}, 'query': {'globalallusers': [{'id': 12943, 'name': 'Chobot', 'groups': ['global-bot'], 'existslocally': ''}, {'id': 13387278, 'name': 'Dexbot', 'groups': ['global-bot'], 'existslocally': ''}, {'id': 5195574, 'name': 'EmausBot', 'groups': ['global-bot'], 'existslocally': ''}, {'id': 44776378, 'name': 'InternetArchiveBot', 'groups': ['global-bot', 'oathauth-tester'], 'existslocally': ''}, {'id': 3114, 'name': 'JAnDbot', 'groups': ['global-bot'], 'existslocally': ''}, {'id': 6223, 'name': 'JhsBot', 'groups': ['global-bot'], 'existslocally': ''}, {'id': 53086278, 'name': 'KiranBOT', 'groups': ['global-bot'], 'existslocally': ''}, {'id': 54496224, 'name': 'Lingua Libre Bot', 'groups': ['global-bot'], 'existslocally': ''}, {'id': 28974611, 'name': 'Revibot', 'groups': ['global-bot', 'oathauth-tester'], 'existslocally': ''}, {'id': 583857, 'name': 'Xqbot', 'groups': ['global-bot'], 'existslocally': ''}, {'id': 27868374, 'name': 'YiFeiBot', 'groups': ['global-bot'], 'existslocally': ''}], 'userinfo': {'id': 1051277, 'name': 'Zache-test'}}}

self.site = pywikibot.Site(code=wiki.code, fam=wiki.family)

@lru_cache(maxsize=1000)
def check_global_bot_user(self, username: str) -> tuple[bool, bool]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that it would do the slow API query only once and then it would use cached result. Now it it still makes one http query for every user even every time the result from http query is identical.

@Ezi-code
Copy link
Copy Markdown
Contributor Author

also python manage.py test reviews.tests.test_services fails after the changes

Thanks for the feedback. I was still working on the feature at the time. We experienced network disruptions in my region, so I pushed the code to GitHub. However, I have made the changes, and I am having a bit of a problem with the tests. I get an AttributeError that indicates protocol is not a valid method of FakeSite, which is a class in the test_services file. After adding the protocol method, I get additional methods that were needed to test the check_global_bot_user method.

I'm stuck with that at the moment and would be glad if you could give me some pointers.

@Ezi-code
Copy link
Copy Markdown
Contributor Author

Ezi-code commented Nov 2, 2025

also python manage.py test reviews.tests.test_services fails after the changes

Thanks for the feedback. I was still working on the feature at the time. We experienced network disruptions in my region, so I pushed the code to GitHub. However, I have made the changes, and I am having a bit of a problem with the tests. I get an AttributeError that indicates protocol is not a valid method of FakeSite, which is a class in the test_services file. After adding the protocol method, I get additional methods that were needed to test the check_global_bot_user method.

I'm stuck with that at the moment and would be glad if you could give me some pointers.

@zache-fi, about the issues I was facing, I discovered something:

In the check_global_bot_user method, we are using request = pywikibot.data.api.Request(site=site, parameters={}) to make the API request, which returns globalalluser, but looking at the request patterns in the wiki_client.py file, requests are made using self.site.simple_request, which is a high-level wrapper around request = pywikibot.data.api.Request(site=site, parameters={}).

This means that if the pattern for making requests changed, the test cases will expect us to explicitly define all the methods the low-level Request object needs to simulate an exact api call. After changing the API call to use self.site.simple_request, the method returns globalallusers data, and the test cases passed after a few minor refactoring.

[{'id': 12943, 'name': 'Chobot', 'groups': ['global-bot']}, {'id': 13387278, 'name': 'Dexbot', 'groups': ['global-bot']}, {'id': 5195574, 'name': 'EmausBot', 'groups': ['global-bot']}, {'id': 44776378, 'name': 'InternetArchiveBot', 'groups': ['global-bot', 'oathauth-tester']}, {'id': 3114, 'name': 'JAnDbot', 'groups': ['global-bot']}, {'id': 6223, 'name': 'JhsBot', 'groups': ['global-bot']}, {'id': 53086278, 'name': 'KiranBOT', 'groups': ['global-bot']}, {'id': 54496224, 'name': 'Lingua Libre Bot', 'groups': ['global-bot']}, {'id': 28974611, 'name': 'Revibot', 'groups': ['global-bot', 'oathauth-tester']}, {'id': 583857, 'name': 'Xqbot', 'groups': ['global-bot']}, {'id': 27868374, 'name': 'YiFeiBot', 'groups': ['global-bot']}]

I'm still trying to figure out how every component of the codebase works explicitly, but I think if we update the API call, we should acheive the same results.

@zache-fi
Copy link
Copy Markdown
Contributor

zache-fi commented Nov 3, 2025

Yes, you can use simple requests for this

@Ezi-code
Copy link
Copy Markdown
Contributor Author

Ezi-code commented Nov 3, 2025

Yes, you can use simple requests for this

Ok, just did.

@Ezi-code Ezi-code requested a review from zache-fi November 3, 2025 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants