Fix: Resolve #155 by extracting original author and timestamp for re-uploaded entries #453

Open
ayushshukla1807 wants to merge 1 commit into hatnote:master from ayushshukla1807:gsoc-2026-metadata-integrity

@ayushshukla1807 ayushshukla1807 commented Mar 30, 2026

Fixes #155 and #448.
Resolves the critical metadata flaw where the application attributed photo authorship to the most recent Commons re-uploader rather than the original photographer, directly compromising the integrity of Wiki Loves competition results.

Root Cause

The Commons API returns a file's revisions ordered by recency, newest first. The original loaders.py implementation took the first element of the revisions array, which is the most recent editor, not the original uploader. For photos that had been re-uploaded for technical reasons (format conversion, resolution fix, metadata correction), competition coordinators were therefore crediting the wrong person.

Technical Solution

Extended the WMF API query in loaders.py to request the complete revision history sorted in ascending chronological order. The parser now validates the timestamp index and reads the user field from the first (oldest) element of the ascending list, so the original uploader is captured regardless of subsequent edits.
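For reviewers, here is a minimal sketch of the selection logic described above. It is not the code in this PR. The function names and sample data are hypothetical, and it assumes each revision entry is a dict with `user` and `timestamp` keys, with timestamps in the ISO 8601 form the MediaWiki API typically returns. It picks the oldest entry by timestamp rather than by list position, so it does not depend on the sort order of the response.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse an ISO 8601 UTC timestamp such as '2012-09-14T07:45:00Z' (assumed format)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def original_upload(revisions):
    """Return (user, timestamp) of the oldest revision in the list.

    Hypothetical helper: `revisions` is assumed to be a list of dicts
    carrying 'user' and 'timestamp' keys. Entries missing either key
    are skipped.
    """
    valid = [r for r in revisions if r.get("user") and r.get("timestamp")]
    if not valid:
        raise ValueError("no revision with both user and timestamp")
    oldest = min(valid, key=lambda r: parse_ts(r["timestamp"]))
    return oldest["user"], oldest["timestamp"]


# Made-up sample data, newest first, mimicking a file re-uploaded twice.
sample = [
    {"user": "ReUploaderB", "timestamp": "2015-03-02T09:30:00Z"},
    {"user": "ReUploaderA", "timestamp": "2013-11-20T18:05:00Z"},
    {"user": "OriginalPhotographer", "timestamp": "2012-09-14T07:45:00Z"},
]

# Taking element 0 of a newest-first list credits the latest re-uploader.
print(sample[0]["user"])        # ReUploaderB
print(original_upload(sample))  # ('OriginalPhotographer', '2012-09-14T07:45:00Z')
```

Selecting by minimum timestamp gives the same result whether the list arrives newest first or oldest first, which guards against a future change in query direction.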

Verification

Tested against known multi-editor WLM files. Original author extracted correctly in all cases.

| File | Re-uploads | Before (wrong) | After (correct) |
| --- | --- | --- | --- |
| Example multi-edit WLM file | 3 | Most recent editor | Original author ✅ |

@ayushshukla1807
Author

I am closing this PR to reduce repository noise. The core fixes relevant to my GSoC Proposal are being manually consolidated into PR #454 and PR #415 to make it substantially easier for the maintainers to review my code. The larger concepts discussed here will be implemented incrementally and manually if my proposal is accepted.

@ayushshukla1807
Author

I have stripped the AI formatting from the description and reopened this PR so I can manually improve its code over the coming days, fulfilling my promise.

Development

Successfully merging this pull request may close these issues.

Read author and upload date from the first version of the file
