perf: pool http connections per miner scoring iteration#687
perf: pool http connections per miner scoring iteration#687seroperson wants to merge 4 commits intoentrius:testfrom
Conversation
|
Can you prove the actual performance gain from this change? Otherwise it seems like changing something that already is proven to work |
Not a game-changing x10 of course, but still something. Considering how many requests it does during a round, it should result into noticeable gain. |
1b07936 to
53b897e
Compare
53b897e to
eaae8ae
Compare
|
Updated to match current |
anderdc
left a comment
There was a problem hiding this comment.
Bench is convincing — happy to take this. Three asks before merge:
-
Mirror parity.
MirrorClientalready holds arequests.Session(gittensor/utils/mirror/client.py:44), but inevaluate_miners_pull_requests(reward.py:75-83)load_mirror_miner_prsandscore_mirror_miner_prseach defaultclient = client or MirrorClient(), so per miner there are two separate Session pools tomirror.gittensor.ioand no connection reuse across the load→score boundary — the same waste this PR fixes for the PAT path. Instantiate oneMirrorClient()inevaluate_miners_pull_requestsand thread it through both calls via the existingclient=kwarg. -
Accept-header bleed into GraphQL.
_build_sessionappliesmake_headers(token)tosession.headers, which setsAccept: application/vnd.github.v3+json. Inexecute_graphql_queryandget_github_graphql_querythe per-requestheaders=make_graphql_headers(token)only overridesAuthorization/Content-Type, so GraphQL POSTs now carry the v3 RESTAcceptthey didn't before. GitHub tolerates it, but it's an unintended drift. Either dropAcceptfrom_build_session(let call sites own it) or addAccept: application/jsontomake_graphql_headers. -
Cover `session_scope` directly. The new `tests/conftest.py` shim swaps `get_session` autouse, so the suite doesn't exercise `_session_cache`, the re-entrance guard, or close-on-exit. Add a small test: same Session returned for same token within a scope, distinct per token, re-entrance raises, sessions `.close()`d on exit. While there: the shim calls `make_headers(token)` unconditionally, but prod uses `make_anonymous_headers()` for `token=''` — minor test/prod drift on the unauth REST search fallback.
Validator scoring makes many GitHub REST and GraphQL requests per miner and each of those previously opened a new TCP socket and negotiated a new TLS session. This PR introduces a
requests.Session-based pool scoped to one miner's fetch iteration, so every http call within that iteration reuses the same kept-alive TCP connection.Re-using existing connections makes things bit faster and overall it's a good practice when you connect to the same host again and again.
Summary
session_scope()context manager holds a module-level_session_cache: Optional[Dict[str, requests.Session]], populated on enter and cleaned up (sessions.close()d, cache set back toNone) on exit.get_session(token)returns a per-PAT cachedSessioninside a scope and a fresh uncached session outside it. Tokenless calls route through the same helper viaget_session(token or '').make_anonymous_headers()mirrors the existingmake_headers/make_graphql_headerspattern for unauthenticated GitHub calls.session = get_session(token)session_scope()wrapsload_miners_prs+score_miner_prsinsideevaluate_miners_pull_requestsso one cache covers an entire miner's load+score cycle; connections close before the function returns.Testing
No new tests are introduced. Added the
tests/conftest.py, which installs an autouse_ForwardingSessionshim that delegates.get/.postback to the module-levelrequests.get/post, so existing dozens of tests which use@patch('requests.get'/'requests.post')keep working unchanged.