fix: add contributor churn analysis script #112
Conversation
exploreriii
left a comment
Hi @prajeeta15
Thanks!
I wonder how much of this script's logic we could abstract using the existing project utils, so there is less to maintain over time as the repo scales.
Also, can you push the output PNGs?
Signed-off-by: Prajeeta Pal <prajeetapal@gmail.com>
new changes made:
@exploreriii I don't have dev access for
Walkthrough
Runs contributor churn/progression analysis strictly from real GitHub data (requires GITHUB_TOKEN). Fetches merged PRs via GraphQL, maps issue labels to difficulty levels, computes per-author progression and transition metrics, filters Good First Issue starters, and emits a CSV plus funnel and retention charts.
Changes
Contributor churn analysis + plotting
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 317249d4-7bea-4225-8a3f-81a07a800a37
⛔ Files ignored due to path filters (2)
outputs/charts/repo/hiero-ledger_hiero-sdk-python/contributor_churn_funnel.png is excluded by !**/*.png
outputs/charts/repo/hiero-ledger_hiero-sdk-python/contributor_retention.png is excluded by !**/*.png
📒 Files selected for processing (1)
src/hiero_analytics/run_contributor_churn_analysis.py
MonaaEid
left a comment
Nice work overall. I have one concern: the summary counts are based on max_level thresholds, which works for a funnel, but no transition metrics are computed across levels. Is this the final approach you plan to go with?
Adding transition metrics to capture movement between levels across contributor journeys. This will show how contributors progress, not just how many reach each stage.
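A minimal sketch of what such a transition count could look like, assuming each contributor's journey is a list of difficulty levels ordered by merge date. The names `LEVEL_RANK`, `upward_transitions`, and `count_transitions` are hypothetical and not taken from the PR, and the sketch counts only moves to a new highest level.

```python
from collections import Counter

# Assumed ordering of difficulty levels, lowest to highest.
LEVEL_RANK = {
    "Good First Issue": 0,
    "Beginner": 1,
    "Intermediate": 2,
    "Advanced": 3,
}


def upward_transitions(levels):
    """Return (from, to) pairs each time a contributor sets a new highest level."""
    transitions = []
    highest = None
    for level in levels:
        if level not in LEVEL_RANK:  # skip "Unknown" or unlabeled PRs
            continue
        if highest is None:
            highest = level
        elif LEVEL_RANK[level] > LEVEL_RANK[highest]:
            transitions.append((highest, level))
            highest = level
    return transitions


def count_transitions(journeys):
    """Aggregate upward transitions over a mapping of author -> ordered levels."""
    counts = Counter()
    for levels in journeys.values():
        counts.update(upward_transitions(levels))
    return counts


# Hypothetical journeys, ordered by merge date.
journeys = {
    "alice": ["Good First Issue", "Beginner", "Good First Issue", "Advanced"],
    "bob": ["Good First Issue", "Unknown", "Intermediate"],
    "carol": ["Beginner", "Beginner"],
}
transition_counts = count_transitions(journeys)
for (src, dst), n in sorted(transition_counts.items()):
    print(f"{src} -> {dst}: {n}")
```

Because a transition is recorded from the previous highest level, a contributor who goes Good First Issue, Beginner, Advanced contributes a Beginner -> Advanced pair rather than a Good First Issue -> Advanced pair.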
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: fede5a09-03a8-4428-9df8-55562f2ae243
📒 Files selected for processing (2)
src/hiero_analytics/analysis/contributor_churn.py
src/hiero_analytics/run_contributor_churn_analysis.py
MonaaEid
left a comment
LGTM. Could you please push a commit with the updated PNGs? Thanks.
MonaaEid
left a comment
LGTM. We need a second opinion from @exploreriii @Adityarya11 @danielmarv
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: a7a7e511-9a44-4e19-8f6e-8276696cf7f6
⛔ Files ignored due to path filters (2)
outputs/charts/repo/hiero-ledger_hiero-sdk-python/contributor_churn_funnel.png is excluded by !**/*.png
outputs/charts/repo/hiero-ledger_hiero-sdk-python/contributor_retention.png is excluded by !**/*.png
📒 Files selected for processing (2)
src/hiero_analytics/plotting/scatter.py
src/hiero_analytics/run_contributor_churn_analysis.py
exploreriii
left a comment
Good start. Looking at this, I think what you have plotted is fairly accurate!
For this reason we can merge as is, then make some tweaks.
Alternatively:
I would strongly suggest simplifying this.
Right now your code captures:
1. progression (e.g. beginner -> advanced)
2. regression (e.g. advanced -> beginner)
3. prediction based on a hard-coded rule
I think we can delete the functionality for 2 and 3.
Advanced users with merged PRs have already demonstrated competence at that level. Even if they later move to beginner issues, it still shows they completed an advanced PR and can work at that level.
As for the prediction, we can do this formally at a later date by fitting a model rather than hard-coding one.
Recommended definition of progression that demonstrates the strength of the GFI pipeline:
A contributor progresses when they reach a higher level than any level they have previously reached. (this will include people who never did a GFI)
or
A GFI starter progresses when, after their first Good First Issue PR, they reach a level with a higher rank than any level they previously reached. (possibly my preference?)
Core metrics to report
GFI starters
Progressed to Beginner+
Progressed to Intermediate+
Progressed to Advanced
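The second definition and the four core metrics could be sketched roughly as follows. This assumes a GFI starter is a contributor whose first known-level merged PR is a Good First Issue, and that journeys are already ordered by merge date; `highest_after_gfi_start`, `funnel_counts`, and the sample data are hypothetical.

```python
LEVELS = ["Good First Issue", "Beginner", "Intermediate", "Advanced"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def highest_after_gfi_start(levels):
    """Return the highest rank reached by a GFI starter, or None for non-starters.

    Assumption: a GFI starter is a contributor whose first known-level PR is
    a Good First Issue; "Unknown" entries are ignored.
    """
    known = [RANK[lvl] for lvl in levels if lvl in RANK]
    if not known or known[0] != RANK["Good First Issue"]:
        return None
    return max(known)


def funnel_counts(journeys):
    """Count GFI starters and how many reached each higher level or above."""
    highest = [
        h for h in map(highest_after_gfi_start, journeys.values()) if h is not None
    ]
    return {
        "GFI starters": len(highest),
        "Progressed to Beginner+": sum(h >= RANK["Beginner"] for h in highest),
        "Progressed to Intermediate+": sum(h >= RANK["Intermediate"] for h in highest),
        "Progressed to Advanced": sum(h >= RANK["Advanced"] for h in highest),
    }


# Hypothetical journeys, ordered by merge date.
journeys = {
    "alice": ["Good First Issue", "Beginner", "Advanced", "Beginner"],
    "bob": ["Good First Issue", "Good First Issue"],
    "carol": ["Beginner", "Intermediate"],
    "dave": ["Good First Issue", "Unknown", "Intermediate"],
}
counts = funnel_counts(journeys)
for name, value in counts.items():
    print(f"{name}: {value}")
```

Using the highest level reached means a later return to easier issues (alice's final Beginner PR) does not lower the count, which matches the point above about regression.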
You do log output, but it is aggregated, so there is no way to verify that the names being recorded, etc., are as expected. I would suggest adding more logs, or writing to a CSV first and then plotting from that.
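One way to make the per-contributor data inspectable is to write one row per author before plotting. This is only a sketch using the standard `csv` module; the column names in `FIELDS` and the helper `write_contributor_rows` are assumptions, not the PR's actual schema.

```python
import csv
import io

# Hypothetical column layout: one row per contributor.
FIELDS = ["author", "levels", "pr_count", "max_level", "start_level"]


def write_contributor_rows(rows, handle):
    """Write one CSV row per contributor so individual journeys can be checked."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Flatten the ordered level list into a single readable cell.
        writer.writerow({**row, "levels": " > ".join(row["levels"])})


rows = [
    {
        "author": "alice",
        "levels": ["Good First Issue", "Beginner"],
        "pr_count": 2,
        "max_level": "Beginner",
        "start_level": "Good First Issue",
    },
]

# An in-memory buffer stands in for a real file opened with newline="".
buffer = io.StringIO()
write_contributor_rows(rows, buffer)
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```

Plotting from the written file, rather than from in-memory aggregates, keeps the charts and the reviewable data in sync.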
What would you like to do?
I agree with simplifying the logic to focus exclusively on progression. Tracking the "highest level reached" is a more robust indicator of contributor growth than handling regressions or using hard-coded predictions. My recommendation for the next steps:
I would like to proceed with these logic updates and the addition of CSV output for better data transparency.
Yep, sounds great!
exploreriii
left a comment
This logged output does not make sense given what you plotted:
--- Contributor Churn Analysis ---
GFI Starters: 132 (100.0%)
Progressed to Beginner+: 39 (29.5%)
Progressed to Intermediate+: 24 (18.2%)
Progressed to Advanced: 6 (4.5%)
--- Level Transition Metrics ---
from              to            count
Beginner          Advanced      1
Beginner          Intermediate  12
Good First Issue  Advanced      1
Good First Issue  Beginner      25
Good First Issue  Intermediate  14
Intermediate      Advanced      10
GFI to advanced = 1, why plot 4?
There are also a few unknowns:
prajeeta15,"['Good First Issue', 'Unknown', 'Unknown', 'Intermediate', 'Intermediate']",2025-10-22 21:59:46+00:00,2025-12-29 10:26:56+00:00,5,Intermediate,Good First Issue,67
has exploreriii never completed a GFI?
contributor_transitions.png: Shows the specific upward paths contributors took.
exploreriii
left a comment
Thank you very much @prajeeta15
You have added some new functionality and thank you for making the corrections.
I have some questions about how some users end up going from beginner -> advanced, etc., rather than gfi -> advanced. I would like to learn more about how the pipeline identifies a contributor's first issue-linked, difficulty-labeled, merged PR, given that the start level is defined as the first non-unknown level. Or perhaps I'm not fully understanding that chart, as it shows different data from the contributor churn funnel.
In this case, we can merge and then open a new issue to investigate
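To illustrate the "first non-unknown level" rule described above, here is a small sketch. It is an assumption about how such a rule might behave, not the PR's actual code; `start_level` and the second journey are hypothetical, while the first journey is the level list from the CSV row quoted earlier in the thread.

```python
LEVELS = ["Good First Issue", "Beginner", "Intermediate", "Advanced"]


def start_level(levels):
    """Return the first known level in merge order, or None if all are unknown."""
    return next((lvl for lvl in levels if lvl in LEVELS), None)


# Level list from the CSV row quoted earlier in the thread.
quoted = ["Good First Issue", "Unknown", "Unknown", "Intermediate", "Intermediate"]
# Hypothetical journey whose earliest PRs carry no difficulty label.
early_unlabeled = ["Unknown", "Unknown", "Beginner", "Advanced"]

print(start_level(quoted))           # Good First Issue
print(start_level(early_unlabeled))  # Beginner
```

Under this rule, a contributor whose earliest merged PRs are unlabeled would appear to start at whatever labeled level comes first, which is one possible reason a journey might show up as beginner -> advanced.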
exploreriii
left a comment
Could you please correct the DCO sign-off? Then this can be merged.
Subsequent issues to investigate, either in this PR or a follow-up PR:
Mounil2005, aceppaluni, emiliyank, exploreriii, Akshat8510 <-- I think these are shown as starting from beginner/intermediate but actually started from unknown (before we had the issue requirement bots, it seems)
It looks like some issues are being labelled after closure, which bypasses the issue guard bots. Maybe we should skip these labels in that case, as it was probably a tidy-up of issues by difficulty.
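Skipping labels added after closure could look something like the sketch below. It assumes each label comes with the time it was applied and that the issue's close time is known; how those timestamps are fetched is not shown, and `labels_applied_before_close` plus the sample events are hypothetical.

```python
from datetime import datetime, timezone


def labels_applied_before_close(label_events, closed_at):
    """Keep label names whose timestamp is not later than the issue's close time.

    label_events is a list of (label_name, labeled_at) tuples.
    """
    if closed_at is None:  # issue still open: nothing to filter out
        return [name for name, _ in label_events]
    return [name for name, labeled_at in label_events if labeled_at <= closed_at]


# Hypothetical issue closed on 2025-11-01, with a difficulty label added later.
closed = datetime(2025, 11, 1, tzinfo=timezone.utc)
events = [
    ("bug", datetime(2025, 10, 20, tzinfo=timezone.utc)),
    ("Intermediate", datetime(2025, 11, 15, tzinfo=timezone.utc)),
]
kept = labels_applied_before_close(events, closed)
print(kept)  # ['bug']
```

With the late difficulty label dropped, such an issue would fall back to Unknown instead of counting toward a level.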
Title : Add contributor churn analysis and progression visualization
issue : #76
Description :
This PR introduces a contributor churn analysis pipeline to measure how users progress through contribution difficulty levels:
Good First Issue → Beginner → Intermediate → Advanced
The implementation focuses on quantifying drop-offs, conversion rates, and progression patterns, along with a clear, high-level presentation of the data.
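As a rough illustration of the conversion-rate arithmetic, the sketch below takes the stage counts from the logged output quoted in the review thread and derives each stage's share of GFI starters and of the previous stage. The helper `conversion_rates` is hypothetical and not part of the PR.

```python
def conversion_rates(stage_counts):
    """Return (stage, count, share_of_first_stage, share_of_previous_stage) rows."""
    first = stage_counts[0][1]
    rows = []
    previous = None
    for name, count in stage_counts:
        overall = count / first if first else 0.0
        # No previous stage for the first row; avoid dividing by zero later on.
        step = count / previous if previous else None
        rows.append((name, count, overall, step))
        previous = count
    return rows


# Counts as logged in the review thread.
stages = [
    ("GFI Starters", 132),
    ("Progressed to Beginner+", 39),
    ("Progressed to Intermediate+", 24),
    ("Progressed to Advanced", 6),
]
rates = conversion_rates(stages)
for name, count, overall, step in rates:
    step_text = "n/a" if step is None else f"{step:.1%}"
    print(f"{name}: {count} ({overall:.1%} of starters, {step_text} of previous stage)")
```

The share-of-starters column reproduces the logged percentages (29.5%, 18.2%, 4.5%), while the share-of-previous-stage column shows where the drop-off between adjacent stages is largest.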
Changes Made :
src/hiero_analytics/run_contributor_churn_analysis.py
Impact :