When a retrieval job runs for a given SP, it selects a deal to retrieve using ORDER BY RANDOM() LIMIT 1 from the eligible pool. As the pool grows over time, this produces uneven coverage: some deals are retrieved repeatedly while others are never selected.
In a 3-day run with ~144 stored deals and ~266 retrieval records, only 92 unique deals were ever retrieved; the remaining ~52 (36%) were never sampled despite being eligible and old enough. The 266 retrievals represent ~2.9 draws per deal on average for the sampled subset.
(This is the coupon collector problem. Under uniform random sampling, covering all N deals at least once takes about N × ln(N) draws in expectation, and about N × (ln(N) + 4.6) draws to reach 99% probability. At N=144 that's ~720 and ~1,380 draws respectively, against 266 actual retrievals; the pool also keeps growing as new deals are created, so the required draw count grows faster than the retrieval rate.)
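As a rough check on the draw counts involved, the sketch below computes the standard coupon collector quantities for a fixed pool of n deals sampled uniformly at random. The function names are illustrative, and the 99% figure uses the usual asymptotic approximation P(T ≤ n·ln(n) + c·n) ≈ exp(−e^(−c)), so treat it as an estimate rather than an exact bound.

```python
import math


def expected_draws(n: int) -> float:
    """Expected number of uniform draws to see all n items at least once (n * H_n)."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return n * harmonic


def draws_for_coverage(n: int, p: float) -> int:
    """Approximate draws needed so that all n items are seen with probability p.

    Uses the asymptotic coupon collector tail: P(T <= n*ln(n) + c*n) ~ exp(-exp(-c)).
    """
    c = -math.log(-math.log(p))
    return math.ceil(n * (math.log(n) + c))


if __name__ == "__main__":
    n = 144  # approximate pool size reported for the 3-day run
    print(f"expected draws for full coverage: {expected_draws(n):.0f}")
    print(f"draws for 99% coverage (approx):  {draws_for_coverage(n, 0.99)}")
```

For n = 144 this gives roughly 800 expected draws and roughly 1,380 draws for 99% coverage, both well above the 266 retrievals recorded in the run.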
Proposed fix: change the deal selection query to prefer deals with the fewest prior retrievals, falling back to random for ties:
ORDER BY (
    SELECT COUNT(*) FROM retrievals r
    WHERE r.deal_id = deal.id
      AND r.service_type = 'ipfs_pin'
) ASC, RANDOM()
LIMIT 1
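This ensures every eligible deal gets at least one retrieval before any deal gets a second. Covering a pool of N eligible deals then takes N retrievals rather than roughly N × ln(N), and newly created deals (which have zero prior retrievals) are picked up first, so coverage stays predictable as the pool grows.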
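The behaviour of the proposed ordering can be checked with a small simulation. This is a minimal sketch that uses an in-memory SQLite database as a stand-in: the `deals` table, its `id` column, and the `simulate` helper are assumptions made for illustration (only `retrievals`, `deal_id`, and `service_type` appear in the query above), and the production schema and database engine may differ.

```python
import sqlite3

# Selection query from the proposal, wrapped in a minimal SELECT.
# "deals AS deal" is a hypothetical table/alias chosen to match "deal.id".
SELECT_NEXT_DEAL = """
SELECT deal.id
FROM deals AS deal
ORDER BY (
    SELECT COUNT(*) FROM retrievals r
    WHERE r.deal_id = deal.id
      AND r.service_type = 'ipfs_pin'
) ASC, RANDOM()
LIMIT 1
"""


def simulate(num_deals: int, num_jobs: int) -> dict:
    """Run num_jobs retrieval jobs over num_deals deals; return retrievals per deal."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE deals (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE retrievals (deal_id INTEGER NOT NULL, service_type TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO deals (id) VALUES (?)", [(i,) for i in range(1, num_deals + 1)]
    )
    for _ in range(num_jobs):
        # Pick the least-retrieved deal (random among ties) and record the retrieval.
        (deal_id,) = conn.execute(SELECT_NEXT_DEAL).fetchone()
        conn.execute(
            "INSERT INTO retrievals (deal_id, service_type) VALUES (?, 'ipfs_pin')",
            (deal_id,),
        )
    rows = conn.execute(
        "SELECT d.id, COUNT(r.deal_id) FROM deals d "
        "LEFT JOIN retrievals r ON r.deal_id = d.id GROUP BY d.id"
    ).fetchall()
    conn.close()
    return {deal_id: count for deal_id, count in rows}


if __name__ == "__main__":
    counts = simulate(num_deals=144, num_jobs=266)
    print("never retrieved:", sum(1 for c in counts.values() if c == 0))
    print("min/max retrievals per deal:", min(counts.values()), max(counts.values()))
```

With a fixed pool, per-deal retrieval counts never differ by more than one, so 144 jobs are enough to touch all 144 deals, whichever deal the random tie-break picks at each step.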