Retrieval job selection skips deals due to random sampling from growing pool

When a retrieval job runs for a given SP, it selects a deal to retrieve using `ORDER BY RANDOM() LIMIT 1` from the eligible pool. As the pool grows over time, this produces uneven coverage: some deals are retrieved repeatedly while others are never selected.

In a 3-day run with ~144 stored deals and ~266 retrieval records, only 92 unique deals were ever retrieved. The remaining ~52 (36%) were never sampled despite being eligible and old enough. The 266 retrievals work out to ~2.9 draws per deal across the 92 deals that were sampled.

(This is the coupon collector problem. With uniform sampling, covering all N deals at least once takes about N × ln(N) draws in expectation, and roughly N × (ln(N) + 4.6) draws for a 99% chance of full coverage. At N=144 that's ~720 and ~1,380 draws respectively, against 266 actual retrievals; the pool also keeps growing as new deals are created, so the required draw count grows faster than the retrieval rate.)
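To put numbers on the coupon collector estimate, here is a minimal sketch that assumes a fixed pool sampled uniformly at random (the real pool grows, so this is only an approximation). The function names are illustrative and not part of the codebase; the pool size and retrieval count are the figures from the 3-day run above.

```python
import math


def expected_draws_for_full_coverage(n: int) -> float:
    """Expected uniform draws needed to see all n items at least once: n * H_n."""
    return n * sum(1.0 / i for i in range(1, n + 1))


def expected_unique_after(n: int, draws: int) -> float:
    """Expected number of distinct items seen after `draws` uniform draws from n items."""
    return n * (1.0 - (1.0 - 1.0 / n) ** draws)


pool_size = 144   # stored deals in the 3-day run
retrievals = 266  # retrieval records in the same run

# Common approximation N * ln(N) versus the exact expectation N * H_N.
print("N ln N:          ", round(pool_size * math.log(pool_size)))
print("N H_N (exact):   ", round(expected_draws_for_full_coverage(pool_size)))
# How many distinct deals 266 uniform draws would be expected to touch.
print("expected unique: ", round(expected_unique_after(pool_size, retrievals)))
```

Under these assumptions, 266 uniform draws from a fixed pool of 144 would be expected to touch roughly 120 distinct deals, well short of full coverage. The observed 92 is lower still, possibly because deals created later in the run had fewer chances to be drawn.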

Proposed fix: change the deal selection query to prefer deals with the fewest prior retrievals, falling back to random for ties:

```SQL
  ORDER BY (
      SELECT COUNT(*) FROM retrievals r
      WHERE r.deal_id = deal.id
        AND r.service_type = 'ipfs_pin'
  ) ASC, RANDOM()
  LIMIT 1
```

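This ensures every currently eligible deal gets at least one retrieval before any deal gets a second. Full coverage of N deals then takes N retrievals instead of roughly N × ln(N), which makes retrieval coverage predictable as the pool grows: deals that become eligible later start at zero retrievals and are selected ahead of deals that have already been covered.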
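The behaviour can be checked with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for the real one, and the `deals` table name and minimal schema are assumptions made for illustration; only the `ORDER BY` clause is taken from the proposal above.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real tables will have more columns.
conn.executescript("""
    CREATE TABLE deals (id INTEGER PRIMARY KEY);
    CREATE TABLE retrievals (
        id INTEGER PRIMARY KEY,
        deal_id INTEGER NOT NULL REFERENCES deals(id),
        service_type TEXT NOT NULL
    );
    CREATE INDEX retrievals_deal_service ON retrievals (deal_id, service_type);
""")
conn.executemany("INSERT INTO deals (id) VALUES (?)", [(i,) for i in range(1, 145)])

# Fewest prior retrievals first, random among ties (the proposed ordering).
SELECT_NEXT = """
    SELECT deal.id FROM deals AS deal
    ORDER BY (
        SELECT COUNT(*) FROM retrievals r
        WHERE r.deal_id = deal.id
          AND r.service_type = 'ipfs_pin'
    ) ASC, RANDOM()
    LIMIT 1
"""


def run_retrieval_job(db: sqlite3.Connection) -> int:
    """Pick the next deal and record a retrieval for it."""
    (deal_id,) = db.execute(SELECT_NEXT).fetchone()
    db.execute(
        "INSERT INTO retrievals (deal_id, service_type) VALUES (?, 'ipfs_pin')",
        (deal_id,),
    )
    return deal_id


# Same number of retrievals as the 3-day run, against a fixed pool of 144 deals.
picked = [run_retrieval_job(conn) for _ in range(266)]
counts = Counter(picked)
print("deals covered after 144 jobs:", len(set(picked[:144])))
print("min/max retrievals per deal: ", min(counts.values()), max(counts.values()))
```

With a fixed pool, the first 144 jobs each select a different deal, and after 266 jobs every deal has either one or two retrievals. The correlated `COUNT(*)` is evaluated once per candidate deal, so an index on `(deal_id, service_type)` like the one in the sketch is likely worth having on a large `retrievals` table.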

Retrieval job selection skips deals due to random sampling from growing pool #584