Description:
The scheduled sync job (TokenMetadataSyncCronJob, every 60 minutes) reprocesses every mapping file in the repository on each cycle, even though the git pull --rebase that precedes it fetches only the new changes.
Current behavior:
- GitService.cloneCardanoTokenRegistryGitRepository() — does git pull --rebase (incremental, efficient)
- TokenMetadataSyncService.synchronizeDatabase() lines 59-90 — calls mappings.listFiles() and iterates over all files
- For each file: parses the JSON mapping, runs git log -n 1 to get author/timestamp, then calls tokenMetadataRepository.save()
- No commit hash or timestamp is tracked between sync cycles
- No git diff is used to identify only changed files
Impact:
- Every hour, syncStatus stays in the in-progress state for longer than necessary, and during that time the API status is not ready for queries.
- The cardano-token-registry repo has thousands of mapping files. Every hour, all of them are re-parsed, each triggers a separate git log subprocess, and each is upserted into the database via JPA .save()
- The per-file git log call (GitService.getMappingDetails(), lines 99-101) spawns a shell process for every single file, so each cycle launches O(n) subprocesses, where n is the total number of mapping files
- Database receives unnecessary UPDATE statements for unchanged records
Suggested improvement:
Track the last synced commit hash (e.g., in a database table or in memory) and, after pulling, run git diff <last-synced-commit>..HEAD --name-only to identify only the files that changed. Then process only those files.
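A minimal sketch of the diff step, to make the suggestion concrete. It is not taken from the project's code: the class name ChangedMappingFiles, its methods, and the assumption that mapping files are .json files under a top-level "mappings" directory are all hypothetical. It only shows how the output of git diff --name-only could be turned into the list of files to reprocess.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the existing code base.
public class ChangedMappingFiles {

    // Assumption: mapping files live under a top-level "mappings" directory.
    static final String MAPPINGS_DIR = "mappings";

    // Builds the git command that lists paths changed between the last
    // synced commit and HEAD, restricted to the mappings directory.
    static List<String> diffCommand(String lastSyncedCommit) {
        return List.of("git", "diff", "--name-only",
                lastSyncedCommit + "..HEAD", "--", MAPPINGS_DIR);
    }

    // Keeps only lines that look like mapping files; anything else
    // (blank lines, warnings, unrelated paths) is dropped.
    static List<String> parseChangedMappings(String diffOutput) {
        return diffOutput.lines()
                .map(String::trim)
                .filter(line -> line.startsWith(MAPPINGS_DIR + "/") && line.endsWith(".json"))
                .collect(Collectors.toList());
    }

    // Runs git diff once in the cloned repository and returns the changed mapping paths.
    static List<String> changedSince(Path repoDir, String lastSyncedCommit)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(diffCommand(lastSyncedCommit))
                .directory(repoDir.toFile())
                .redirectErrorStream(true)
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        if (process.waitFor() != 0) {
            throw new IOException("git diff failed: " + output);
        }
        return parseChangedMappings(output);
    }
}
```

With something like this, a sync cycle would run one git subprocess instead of one per file. Some cases would still need handling: the first run (no stored commit hash, so a full scan is required), deleted mapping files (git diff --name-only lists them too; --diff-filter can exclude or isolate them), and storing the new HEAD hash (for example from git rev-parse HEAD) only after the changed files have been processed successfully.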