Commit 3b6071c

abhijna and khorne3 authored
[TEC-545] Customer facing docs for how contributors are counted (#2545)
* How contributors are counted
* bots language softened
* Update docs/usage-and-billing/contributors.md
  Co-authored-by: Katie Horne <katie.horne@semgrep.com>
* Update docs/usage-and-billing/contributors.md
  Co-authored-by: Katie Horne <katie.horne@semgrep.com>
* Update docs/usage-and-billing/contributors.md
  Co-authored-by: Katie Horne <katie.horne@semgrep.com>
* Update docs/usage-and-billing/contributors.md
  Co-authored-by: Katie Horne <katie.horne@semgrep.com>
* Update docs/usage-and-billing/contributors.md
  Co-authored-by: Katie Horne <katie.horne@semgrep.com>
---------
Co-authored-by: Katie Horne <katie.horne@semgrep.com>
1 parent 18eaae9 commit 3b6071c

3 files changed: +76 -1 lines changed

docs/usage-and-billing/contributors.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
1+
---
2+
slug: contributor-count-explained
3+
append_help_link: true
4+
title: Calculate Semgrep contributor count
5+
description: Learn how Semgrep calculates contributor count, including deduplication, bot filtering, and repository visibility rules.
6+
displayed_sidebar: aboutSidebar
7+
tags:
8+
- Support
9+
- Semgrep AppSec Platform
10+
hide_title: true
11+
---
12+
13+
# How Semgrep calculates contributor count
14+
15+
This page explains how Semgrep calculates contributor count beyond the [basic billing definition](/usage-and-billing/overview#contributor-counts). It is intended to help explain why Semgrep's contributor count may differ from your organization's internal estimate, and how Semgrep reduces double-counting when using repository history.
16+
17+
## Why contributor counts can be hard to calculate
18+
19+
Raw commit history does not always cleanly map to unique people. The same contributor can appear under multiple identities over time, including:
20+
- Multiple company email addresses
21+
- Email aliases or formatting variations
22+
- GitHub-generated noreply addresses used in merge commits
23+
24+
Repository history can also include bots and automation accounts that should not count as human contributors. To make contributor counts more accurate, Semgrep applies normalization, deduplication, and filtering steps to the underlying commit data.
25+
26+
## How Semgrep reduces double-counting
27+
28+
Semgrep uses commit metadata from scanned repositories to identify likely duplicate identities and count them once.
29+
30+
This process can include:
31+
- Normalizing common email variations
32+
- Matching contributors who appear under multiple company domains
33+
- Resolving GitHub noreply addresses back to known contributor identities when possible
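
The steps above could be sketched roughly as follows. This is a minimal illustration, not Semgrep's implementation: the function names, the `known_logins` mapping (GitHub login to canonical email), the `domain_aliases` mapping (alternate company domain to primary domain), and the specific normalization rules (lowercasing, dropping a `+tag` suffix) are all assumptions made for the example.

```python
import re

# GitHub noreply addresses look like "login@users.noreply.github.com"
# or "12345+login@users.noreply.github.com".
NOREPLY = re.compile(r"^(?:\d+\+)?([^@+]+)@users\.noreply\.github\.com$")


def normalize_email(email, domain_aliases=None):
    """Lowercase, trim, drop a "+tag" suffix, and fold aliased domains (assumed rules)."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]
    domain = (domain_aliases or {}).get(domain, domain)
    return f"{local}@{domain}"


def identity_key(email, known_logins, domain_aliases=None):
    """Return one key per likely person (hypothetical helper)."""
    cleaned = email.strip().lower()
    match = NOREPLY.match(cleaned)
    if match and match.group(1) in known_logins:
        # Resolve the noreply address back to a known identity.
        return normalize_email(known_logins[match.group(1)], domain_aliases)
    if match:
        # Unknown login: keep the noreply address as its own identity.
        return cleaned
    return normalize_email(cleaned, domain_aliases)


def count_unique_contributors(emails, known_logins, domain_aliases=None):
    """Count distinct identity keys across all raw commit emails."""
    return len({identity_key(e, known_logins, domain_aliases) for e in emails})
```

In this sketch, a noreply address that cannot be resolved stays a separate identity, which mirrors the "when possible" qualifier above.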
34+
35+
The goal is to better reflect distinct human contributors rather than counting every raw identity in commit history as a separate person.
36+
37+
## How Semgrep handles personal email addresses
38+
39+
Personal email addresses sometimes appear in repository history alongside company-managed identities. Personal emails are weak identifiers and are harder to match reliably across environments. Semgrep applies the following filtering rules to reduce overcounting, and also keeps a copy of the data as it was before filtering for auditing and comparison.
40+
41+
- If the primary domain for the deployment is a company domain, Semgrep does not count contributors who appear only with personal email addresses. It still counts contributors who have at least one company email address.
42+
- If the primary domain for the deployment is a personal email domain, such as gmail.com, Semgrep counts only contributors whose email matches that domain. It does not count contributors who appear only with other personal email domains.
43+
- If Semgrep cannot identify a primary domain, it does not apply personal email filtering.
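
One way to read the three rules above is as a single decision function applied to each deduplicated contributor. The sketch below is illustrative only: `PERSONAL_DOMAINS` is a made-up sample list, `counts_toward_total` is a hypothetical name, and it assumes that any address outside the personal-domain list counts as a company address.

```python
# Illustrative sample only; not Semgrep's actual list of personal domains.
PERSONAL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}


def domain_of(email):
    """Return the lowercased domain part of an email address."""
    return email.rsplit("@", 1)[-1].strip().lower()


def counts_toward_total(emails, primary_domain):
    """Decide whether one contributor (all of their known emails) is counted."""
    domains = {domain_of(e) for e in emails}
    if primary_domain is None:
        # No primary domain identified: personal email filtering is not applied.
        return True
    primary = primary_domain.lower()
    if primary in PERSONAL_DOMAINS:
        # Personal primary domain: count only contributors with an address there.
        return primary in domains
    # Company primary domain: require at least one non-personal address
    # (assumption: "company email" means any domain not in PERSONAL_DOMAINS).
    return any(d not in PERSONAL_DOMAINS for d in domains)
```

For example, under these assumptions a contributor seen only as `a@gmail.com` is not counted when the primary domain is `acme.com`, but is counted when the primary domain is `gmail.com`.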
44+
45+
46+
## How Semgrep handles bots and automation accounts
47+
48+
49+
Contributor count is intended to measure human contributors, not automated systems. Semgrep excludes known bot and automation accounts from the calculation using maintained exclusion lists informed by bot-related patterns in commit metadata.
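
As a rough illustration of pattern-based bot filtering, the sketch below checks commit author names and emails against a small exclusion list and two patterns. The patterns, the `EXCLUDED_EMAILS` entry, and the function name are assumptions chosen for the example; they are not Semgrep's maintained lists.

```python
import re

# Illustrative patterns only.
BOT_PATTERNS = [
    # GitHub App accounts carry a "[bot]" suffix, such as "dependabot[bot]".
    re.compile(r"\[bot\]", re.IGNORECASE),
    # A standalone "bot" token, such as "ci-bot@example.com".
    re.compile(r"(^|[-_.])bot([-_.@]|$)", re.IGNORECASE),
]

# Hypothetical explicit exclusion list.
EXCLUDED_EMAILS = {"automation@example.com"}


def is_probable_bot(name, email):
    """Return True if the author looks like a bot or automation account."""
    email = email.strip().lower()
    if email in EXCLUDED_EMAILS:
        return True
    return any(p.search(name) or p.search(email) for p in BOT_PATTERNS)
```

Pattern matching of this kind is heuristic, so some automation accounts may slip through and an explicit list is still useful for the accounts that patterns miss.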
50+
51+
## Public and private repositories
52+
53+
Public GitHub repositories that are explicitly set to be visible to everyone are excluded from contributor count calculations.
54+
55+
All GitHub Enterprise Server repositories are treated as private for this purpose, regardless of visibility.
56+
57+
## Why your internal estimate might differ
58+
59+
Your internal estimate of contributors may differ from Semgrep’s for the following reasons:
60+
- One person appears under multiple identities in commit history
61+
- Bots or service accounts are present in raw repository data
62+
- Public repositories are excluded
63+
- Personal email addresses cannot always be matched reliably
64+
- Limited git history reduces the set of visible contributors
65+
66+
Because of this, contributor count should be understood as a usage metric based on observed repository activity over a defined period.
67+
68+
## Why git history matters
69+
70+
Contributor count depends on the commit history available at scan time. If a checkout includes limited history, Semgrep might not see every contributor active during the full 90-day lookback window.
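
To see what a given checkout exposes, you could inspect it along these lines. This is a sketch for local troubleshooting, not how Semgrep computes the count: the function names are hypothetical, and it only lists raw author emails from `git log` without any of the deduplication or filtering described earlier.

```python
import subprocess


def is_shallow(repo_path="."):
    """Return True if the checkout at repo_path is a shallow clone."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "--is-shallow-repository"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "true"


def parse_author_emails(log_output):
    """Turn `git log --format=%ae` output into a set of lowercased emails."""
    return {line.strip().lower() for line in log_output.splitlines() if line.strip()}


def recent_author_emails(repo_path=".", days=90):
    """List raw author emails visible in the last `days` days of history."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={days} days ago", "--format=%ae"],
        capture_output=True, text=True, check=True,
    )
    return parse_author_emails(result.stdout)
```

If `is_shallow()` returns `True`, running `git fetch --unshallow` before the scan retrieves the full history, so that commits from the whole lookback window are available.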
71+
72+
## Questions about your contributor count
73+
74+
If you have questions about your contributor count, contact [Semgrep support](/support) or your account manager.

docs/usage-and-billing/overview.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ Semgrep calculates contributor counts using information from the `git log` over
3636
- The date of your license purchase
3737
- The date of your account creation, if you and your team are within usage limits
3838

39-
**Bots** and other automations are excluded from the contributor count.
39+
Semgrep tries to exclude **bots** and other automations as much as possible. Learn more about [how Semgrep calculates contributors](/docs/usage-and-billing/contributor-count-explained).
4040

4141
#### Contributor usage across multiple Semgrep organizations
4242

sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -757,6 +757,7 @@ module.exports = {
757757
items: [
758758
'deployment/claim-a-license',
759759
'usage-and-billing/plan-changes-and-payments',
760+
'usage-and-billing/contributors',
760761
'usage-and-billing/reconciliation'
761762
]
762763
},
