gh-repo-stats-plus

A GitHub CLI extension for gathering comprehensive repository statistics from GitHub organizations. This TypeScript implementation builds upon the solid foundation of mona-actions/gh-repo-stats, adding modern features and performance improvements for enterprise-scale repository analysis.

🚀 Quick Start

Install the extension:

gh extension install mona-actions/gh-repo-stats-plus

Authenticate with GitHub:
```
gh auth login
```

Collect repository statistics:

gh repo-stats-plus repo-stats --org-name my-org

The tool will generate a CSV file with comprehensive repository statistics in the ./output/ directory (or a custom directory you specify).

Key Features

This TypeScript rewrite offers several advantages:

Octokit SDK Integration: Built on GitHub's official Octokit.js SDK, providing:
- Token renewal
- Built-in retries
- Rate limit handling
- Pagination
- GraphQL and REST API support
Streaming Processing with Async Generators: Writes results incrementally as they're processed rather than collecting everything up front, resulting in better memory management and reliability.
State Persistence with Multi-Organization Support: Saves processing state to organization-specific files (e.g., last_known_state_<org>.json) after each successful repository, storing the current cursor position and processed repositories. Each organization maintains its own isolated state, allowing sequential or parallel processing of multiple organizations without conflicts.
Resume Capability: Can resume operations from the last saved state in case of interruptions or failures.
Smart Duplicate Avoidance: Skips already processed repositories when resuming to prevent duplicates and save processing time.
Advanced Retry Logic: Implements exponential backoff strategy for retries to gracefully handle rate limits and transient errors.
Enhanced Debugging: Easier to debug and maintain with modern TypeScript development tools like VS Code.
Comprehensive Logging: Detailed logs stored in log files for later review and troubleshooting.
Missing Repositories Detection: Dedicated command to identify repositories that might have been missed during processing.
Configurable Output Directory: Control where output files and state files are saved with the --output-dir option (defaults to ./output/) for organized file management.
Project Stats Tracking: Counts unique ProjectsV2 linked to repositories via issues and directly, based on jcantosz/Count-repo-projects.
Batch Processing: Split large organizations into parallel batches using --batch-size and --batch-index, ideal for GitHub Actions matrix strategies. Includes a combine-stats command to merge batch results. See the Batch Processing Guide.
CSV Post-Processing: Transform and standardize CSV data using configurable rules for pattern matching, value replacement, and indicator column generation. Based on jcantosz/generate-repo-report/post-process. See the Post-Process Command Reference.
Rows-to-Columns Pivot: Convert rows from an additional CSV (e.g., migration audit data) into new columns in a base CSV by matching rows and pivoting values. Based on jcantosz/generate-repo-report/rows-to-columns. See the Rows-to-Columns Command Reference.
Package Stats: Retrieve package statistics (Maven, npm, Docker, NuGet, RubyGems, PyPI) for organizations, including version counts, file counts, download counts, and total storage sizes. Based on scottluskcis/gh-data-fetch. See the Package Stats Command Reference.
Codespace Stats: Retrieve codespace usage statistics for organizations, including machine details (CPU, memory, storage), ownership information, and lifecycle timestamps. Based on scottluskcis/gh-data-fetch. See the Codespace Stats Command Reference.
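
The state persistence, resume, and duplicate-avoidance features above can be pictured with a minimal sketch. The `ProcessingState` shape and the `stateFileName` and `processWithResume` helpers are hypothetical names used for illustration. Only the `last_known_state_<org>.json` naming pattern comes from this README, and the real state file may hold different fields.

```typescript
// Hypothetical state shape; the extension's real state file may differ.
interface ProcessingState {
  cursor: string | null; // last pagination position seen
  processedRepos: string[]; // repositories already written to the CSV
}

// Builds a per-organization state file name following the README's pattern.
function stateFileName(org: string): string {
  return `last_known_state_${org}.json`;
}

// Processes repositories in order and skips any recorded in the saved state.
// `save` runs after each successful repository so a later run can resume.
function processWithResume(
  repos: string[],
  saved: ProcessingState | null,
  handle: (repo: string) => void,
  save: (state: ProcessingState) => void,
): ProcessingState {
  const state: ProcessingState = saved
    ? { cursor: saved.cursor, processedRepos: [...saved.processedRepos] }
    : { cursor: null, processedRepos: [] };
  const done = new Set(state.processedRepos);

  for (const repo of repos) {
    if (done.has(repo)) continue; // duplicate avoidance on resume
    handle(repo);
    done.add(repo);
    state.processedRepos.push(repo);
    state.cursor = repo; // stand-in for a real API cursor
    save({ cursor: state.cursor, processedRepos: [...state.processedRepos] });
  }
  return state;
}
```

Saving after every repository trades a little extra I/O for the guarantee that an interrupted run loses at most the repository that was in flight.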

Technical Implementation

The extension is built using modern TypeScript patterns with:

Async Generators for streaming large datasets
Retry Logic with exponential backoff
Rate Limit Handling via GitHub Octokit SDK
State Persistence for resumable operations
Comprehensive Logging with Winston
Type Safety throughout the codebase
On-demand Building for clean installation without pre-built artifacts
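
As a rough illustration of the async generator pattern listed above, the sketch below streams rows one at a time and writes each as soon as it is available. The `RepoRow` type and the `streamRepos` and `writeIncrementally` functions are hypothetical and are not taken from the extension's source. The in-memory `pages` array stands in for paginated API responses.

```typescript
// Hypothetical row type; the real CSV has many more columns.
interface RepoRow {
  Org_Name: string;
  Repo_Name: string;
}

// Simulates a paginated source: yields one row at a time, page by page,
// instead of building the full result list in memory first.
async function* streamRepos(
  org: string,
  pages: string[][],
): AsyncGenerator<RepoRow> {
  for (const page of pages) {
    for (const name of page) {
      yield { Org_Name: org, Repo_Name: name };
    }
  }
}

// Consumes the stream and hands each CSV line to `writeLine` immediately.
async function writeIncrementally(
  rows: AsyncIterable<RepoRow>,
  writeLine: (line: string) => void,
): Promise<number> {
  let count = 0;
  for await (const row of rows) {
    writeLine(`${row.Org_Name},${row.Repo_Name}`);
    count += 1;
  }
  return count;
}
```

Because the consumer pulls rows on demand, memory use stays roughly constant regardless of how many repositories the organization has.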

GitHub Action

This project also ships as a GitHub Action so you can gather repository statistics directly from your workflows:

- name: Gather Repository Stats
  uses: mona-actions/gh-repo-stats-plus@v1
  with:
    github-token: ${{ github.token }}
    access-token: ${{ secrets.ACCESS_TOKEN }}
    organization: my-org
    repository: my-repo

The action supports all stats types (repository, organization, project-stats, app-install-stats, migration-audit, combine), batch processing with matrix strategy, resume on failure, post-processing, rows-to-columns pivoting, and GitHub Enterprise environments.

See the GitHub Action documentation for full inputs/outputs reference, authentication options, and example workflows.

Documentation

| Guide | Description |
| --- | --- |
| Installation | Prerequisites and installation methods |
| Usage Guide | Authentication and usage examples |
| Commands | Complete command reference |
| GitHub Action | Using as a GitHub Action in workflows |
| LFS Sizing | Git LFS storage analysis per repo |
| Development | Setup and development workflow |
| Batch Processing | Parallel batch processing with GitHub Actions |
| Post-Processing | CSV transformation with configurable rules |
| Rows-to-Columns | Pivot additional CSV rows into columns |
| Package Stats | Retrieve package statistics for organizations |
| Codespace Stats | Retrieve codespace usage for organizations |

Common Usage Examples

Basic Organization Analysis

# Generate repository statistics (output saved to ./output/ directory)
gh repo-stats-plus repo-stats --org-name my-org

Multiple Organizations

Process multiple organizations from a single file:

# Create an org list file (one org per line)
cat > orgs.txt << EOF
Org1
Org2
Org3
EOF

# Process all organizations with a single command
gh repo-stats-plus repo-stats --org-list orgs.txt

# Add delays between organizations (default: 5 seconds)
gh repo-stats-plus repo-stats --org-list orgs.txt --delay-between-orgs 10

# Continue processing other orgs if one fails
gh repo-stats-plus repo-stats --org-list orgs.txt --continue-on-error

# Combine options
gh repo-stats-plus repo-stats \
  --org-list orgs.txt \
  --delay-between-orgs 10 \
  --continue-on-error \
  --output-dir ./reports

Note

Organizations are processed strictly sequentially. This design choice is intentional to respect GitHub API rate limits and provide predictable resource usage. For large organization lists, consider the configurable delay between organizations and the estimated processing time logged at startup.

Or process organizations individually:

# Process multiple organizations sequentially (each maintains its own state)
gh repo-stats-plus repo-stats --org-name org1
gh repo-stats-plus repo-stats --org-name org2
gh repo-stats-plus repo-stats --org-name org3

# Use custom output directory (state files are stored here too)
gh repo-stats-plus repo-stats --org-name my-org --output-dir ./reports

# Clean up state file after successful completion
gh repo-stats-plus repo-stats --org-name my-org --clean-state
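
The sequential multi-organization behavior described above (one organization at a time, a delay between organizations, optional continue-on-error) could be sketched roughly as follows. `processOrgsSequentially`, `MultiOrgOptions`, and the injected `sleep` function are hypothetical names. Whether the real tool also waits after a failed organization is an assumption here.

```typescript
// Hypothetical options mirroring --delay-between-orgs and --continue-on-error.
interface MultiOrgOptions {
  delaySeconds: number;
  continueOnError: boolean;
}

// Processes organizations strictly one after another.
// `sleep` is injected so the delay can be replaced in tests.
async function processOrgsSequentially(
  orgs: string[],
  processOrg: (org: string) => Promise<void>,
  options: MultiOrgOptions,
  sleep: (seconds: number) => Promise<void>,
): Promise<{ succeeded: string[]; failed: string[] }> {
  const succeeded: string[] = [];
  const failed: string[] = [];

  for (const [index, org] of orgs.entries()) {
    // Assumption: wait between organizations, but not before the first one.
    if (index > 0) await sleep(options.delaySeconds);
    try {
      await processOrg(org);
      succeeded.push(org);
    } catch (error) {
      failed.push(org);
      if (!options.continueOnError) throw error; // stop at the first failure
    }
  }
  return { succeeded, failed };
}
```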

Custom Output Directory

# Save output files to a custom directory
gh repo-stats-plus repo-stats --org-name my-org --output-dir /path/to/my/reports

# Use relative path from current directory
gh repo-stats-plus repo-stats --org-name my-org --output-dir reports

Resume Long-Running Collection

gh repo-stats-plus repo-stats --org-name my-org --resume-from-last-save

High-Volume Processing with GitHub App

gh repo-stats-plus repo-stats \
  --org-name my-org \
  --app-id 12345 \
  --private-key-file app.pem \
  --app-installation-id 67890 \
  --output-dir /path/to/reports

Find and Process Missing Data

# Check for missing repositories (looks for CSV in ./output/ by default)
gh repo-stats-plus missing-repos --org-name my-org --file results.csv

# Use custom output directory for missing repos check
gh repo-stats-plus missing-repos \
  --org-name my-org \
  --file results.csv \
  --output-dir /path/to/reports

# Auto-process missing repositories
gh repo-stats-plus repo-stats --org-name my-org --auto-process-missing

Batch Processing

Split a large organization into parallel batches (e.g., for GitHub Actions matrix jobs):

# Use a dedicated directory for this workflow/run to avoid mixing CSVs from other commands
RUN_OUTPUT_DIR="output/run-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$RUN_OUTPUT_DIR"

# Process batch 0 of 50 repos each
gh repo-stats-plus repo-stats \
  --org-name my-org \
  --batch-size 50 \
  --batch-index 0 \
  --output-dir "$RUN_OUTPUT_DIR"

# Combine only this run's batch CSV files after all batches complete
gh repo-stats-plus combine-stats \
  --files "$RUN_OUTPUT_DIR"/*.csv \
  --output-dir "$RUN_OUTPUT_DIR" \
  --output-file-name combined-stats.csv

See the Batch Processing Guide for complete GitHub Actions workflow examples.

Project Statistics

# Count ProjectsV2 linked to repositories via issues
gh repo-stats-plus project-stats --org-name my-org

# Process specific repos from a file
gh repo-stats-plus project-stats --org-name my-org --repo-list repos.txt

# Multiple organizations
gh repo-stats-plus project-stats --org-list orgs.txt --continue-on-error

# Resume interrupted processing
gh repo-stats-plus project-stats --org-name my-org --resume-from-last-save

Post-Processing

# Post-process a combined CSV with rules
gh repo-stats-plus post-process \
  --input output/combined-stats.csv \
  --rules-file post-process.rules.json

# Custom output file and directory
gh repo-stats-plus post-process \
  --input output/combined-stats.csv \
  --rules-file rules.json \
  --output-file-name final-report.csv \
  --output-dir ./reports

See the Post-Process Command Reference for rules configuration details and examples. A sample rules file is included at docs/examples/post-process.rules.json.

Rows-to-Columns

# Combine repo-stats with migration audit data
gh repo-stats-plus rows-to-columns \
  --base-csv-file output/combined-stats.csv \
  --additional-csv-file output/migration-audit.csv \
  --header-column-keys type \
  --header-column-values message

# Custom column mappings and output
gh repo-stats-plus rows-to-columns \
  --base-csv-file output/stats.csv \
  --additional-csv-file output/audit.csv \
  --header-column-keys type \
  --header-column-values message \
  --base-csv-columns Org_Name,Repo_Name \
  --additional-csv-columns owner,name \
  --output-file-name final-report.csv

See the Rows-to-Columns Command Reference for details on how values are parsed and examples.
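
To make the pivot idea concrete, here is a minimal sketch of matching rows from an additional CSV to a base CSV and turning each `type` value into a new column holding the `message`. The `pivotRowsToColumns` function is hypothetical, and joining repeated values with `;` is an assumption. The actual command may parse and combine values differently, so treat the Command Reference as authoritative.

```typescript
type Row = Record<string, string>;

// Pivots rows of `additional` into new columns on matching rows of `base`.
// Rows match when the base key columns equal the additional key columns.
function pivotRowsToColumns(
  base: Row[],
  additional: Row[],
  baseKeys: string[],
  additionalKeys: string[],
  headerKey: string,
  valueKey: string,
): Row[] {
  const keyOf = (row: Row, cols: string[]): string =>
    cols.map((c) => row[c] ?? "").join("\u0000");

  const extras = new Map<string, Row>();
  for (const row of additional) {
    const key = keyOf(row, additionalKeys);
    const cols = extras.get(key) ?? {};
    const header = row[headerKey] ?? "";
    const value = row[valueKey] ?? "";
    // Assumption: repeated headers for one repository are joined with ";".
    cols[header] = header in cols ? `${cols[header]};${value}` : value;
    extras.set(key, cols);
  }

  // Base rows without a match are returned unchanged.
  return base.map((row) => ({ ...row, ...(extras.get(keyOf(row, baseKeys)) ?? {}) }));
}
```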

Repo Stats Options

Organization Selection (one required):

-o, --org-name <org>: Process a single organization
--org-list <file>: Process multiple organizations from a file (one org per line)

Multi-Organization Options:

--delay-between-orgs <seconds>: Delay between processing organizations (Default: 5)
--continue-on-error: Continue processing other organizations if one fails

Authentication:

-t, --access-token <token>: GitHub access token
--app-id <id>: GitHub App ID
--private-key <key>: GitHub App private key
--private-key-file <file>: Path to GitHub App private key file
--app-installation-id <id>: GitHub App installation ID

Processing Options:

--resume-from-last-save: Resume from the last saved state
--repo-list <file>: Path to file containing list of repositories to process (format: owner/repo_name)
--auto-process-missing: Automatically process any missing repositories when main processing is complete
--clean-state: Remove state file after successful completion

Batch Processing:

--batch-size <size>: Number of repositories per batch
--batch-index <index>: Zero-based index of the batch to process
--batch-delay <seconds>: Delay before starting a batch (multiplied by batch index to stagger parallel runs)
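
The arithmetic behind these batch options can be sketched as below. The helper names are hypothetical, and the assumption that batches are contiguous slices of the repository list is mine. The README only states that the index is zero-based and that the delay is multiplied by the batch index.

```typescript
// Returns the items belonging to a zero-based batch index,
// assuming batches are contiguous slices of the full list.
function selectBatch<T>(items: T[], batchSize: number, batchIndex: number): T[] {
  if (batchSize <= 0 || batchIndex < 0) {
    throw new RangeError("batchSize must be > 0 and batchIndex must be >= 0");
  }
  const start = batchIndex * batchSize;
  return items.slice(start, start + batchSize);
}

// Number of batches needed to cover `total` repositories.
function batchCount(total: number, batchSize: number): number {
  return Math.ceil(total / batchSize);
}

// Start delay for a batch: the configured delay multiplied by the batch index,
// so parallel jobs begin at staggered times.
function staggerDelaySeconds(batchDelay: number, batchIndex: number): number {
  return batchDelay * batchIndex;
}
```

For example, an organization with 120 repositories and a batch size of 50 needs three matrix jobs with batch indexes 0, 1, and 2, and the last job handles the remaining 20 repositories.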

Configuration:

-u, --base-url <url>: GitHub API base URL (Default: https://api.github.com)
--proxy-url <url>: Proxy URL if required
--output-dir <dir>: Output directory for generated files (Default: ./output)
-v, --verbose: Enable verbose logging

Performance Tuning:

--page-size <size>: Number of items per page (Default: 10)
--extra-page-size <size>: Extra page size (Default: 50)
--rate-limit-check-interval <seconds>: Interval for rate limit checks (Default: 60)
--retry-max-attempts <attempts>: Maximum number of retry attempts (Default: 3)
--retry-initial-delay <milliseconds>: Initial delay for retry (Default: 1000)
--retry-max-delay <milliseconds>: Maximum delay for retry (Default: 30000)
--retry-backoff-factor <factor>: Backoff factor for retry delays (Default: 2)
--retry-success-threshold <count>: Successful operations before resetting retry count (Default: 5)
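
To show how the retry defaults above interact, the following sketch computes a capped exponential backoff schedule. This is the common textbook form and not necessarily the exact formula the extension uses (for instance, it may add jitter). `RetryOptions`, `retryDelayMs`, and `retrySchedule` are hypothetical names.

```typescript
interface RetryOptions {
  maxAttempts: number; // --retry-max-attempts
  initialDelayMs: number; // --retry-initial-delay
  maxDelayMs: number; // --retry-max-delay
  backoffFactor: number; // --retry-backoff-factor
}

// Defaults as documented in this README.
const retryDefaults: RetryOptions = {
  maxAttempts: 3,
  initialDelayMs: 1000,
  maxDelayMs: 30000,
  backoffFactor: 2,
};

// Delay before retry number `attempt` (1-based), capped at maxDelayMs.
function retryDelayMs(attempt: number, options: RetryOptions = retryDefaults): number {
  const raw = options.initialDelayMs * Math.pow(options.backoffFactor, attempt - 1);
  return Math.min(raw, options.maxDelayMs);
}

// Full list of delays for the configured number of attempts.
function retrySchedule(options: RetryOptions = retryDefaults): number[] {
  return Array.from({ length: options.maxAttempts }, (_, i) => retryDelayMs(i + 1, options));
}
```

Under these assumptions the default schedule is 1000 ms, 2000 ms, then 4000 ms. The 30000 ms cap only matters once `--retry-max-attempts` is raised to six or more.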

Project Stats Options

The project-stats command supports the same authentication, retry, multi-org, and processing options as repo-stats above, with these differences:

--page-size <size>: Number of issues per page (Default: 100)
No --extra-page-size or --auto-process-missing options

See the Commands Reference for the complete list of project-stats options.

Permissions

The permissions needed by gh-repo-stats-plus depend on the authentication method:

For Personal Access Token (PAT)

repo: Full control of private repositories
read:org: Read organization membership
read:project: Read project information
read:user: Read user information

For GitHub App

The app requires Read-only permissions to the following:

Repository Administration
Repository Contents
Repository Issues
Repository Metadata
Repository Projects
Repository Pull requests
Organization Members

Output

The tool generates:

A CSV file with repository statistics (or project statistics for the project-stats command)
A last_known_state_<org>.json file per organization with the current processing state
Log files in the logs/ directory

CSV Output Columns

The CSV output includes detailed information about each repository:

Org_Name: Organization login
Repo_Name: Repository name
Is_Empty: Whether the repository is empty
Last_Push: Date/time when a push was last made
Last_Update: Date/time when an update was last made
isFork: Whether the repository is a fork
isArchived: Whether the repository is archived
isTemplate: Whether the repository is a template repository
Visibility: Repository visibility (e.g., PUBLIC, PRIVATE, INTERNAL)
Repo_Size_mb: Size of the repository in megabytes
Record_Count: Total number of database records this repository represents
Collaborator_Count: Number of users who have contributed to this repository
Protected_Branch_Count: Number of branch protection rules on this repository
Ruleset_Count: Number of rulesets that apply to this repository, inclusive of active rulesets defined at the organization level
PR_Review_Count: Number of pull request reviews
Milestone_Count: Number of issue milestones
Issue_Count: Number of issues
PR_Count: Number of pull requests
PR_Review_Comment_Count: Number of pull request review comments
Commit_Comment_Count: Number of commit comments
Issue_Comment_Count: Number of issue comments
Issue_Event_Count: Number of issue events
Release_Count: Number of releases
Project_Count: Number of projects
Branch_Count: Number of branches
Tag_Count: Number of tags
Discussion_Count: Number of discussions
Star_Count: Number of stargazers
Fork_Count: Number of forks
Watcher_Count: Number of watchers
Has_Wiki: Whether the repository has wiki feature enabled
Has_LFS: Whether the repository has Git LFS tracking configured (see LFS Detection Limitations)
Default_Branch: Name of the default branch
Primary_Language: Primary programming language of the repository
Languages: Semicolon-separated list of languages with usage percentages (e.g., TypeScript:85.2%;JavaScript:14.8%)
License: License identifier (SPDX ID or name)
Topics: Semicolon-separated list of repository topics
Description: Repository description
Homepage_URL: Repository homepage URL
Auto_Merge_Allowed: Whether auto-merge is enabled for pull requests
Delete_Branch_On_Merge: Whether branches are automatically deleted after merging
Merge_Commit_Allowed: Whether merge commits are allowed
Squash_Merge_Allowed: Whether squash merging is allowed
Rebase_Merge_Allowed: Whether rebase merging is allowed
Full_URL: Repository URL
Migration_Issue: Indicates whether the repository might have problems during migration due to:
- 60,000 or more objects being imported
- 1.5 GB or larger size on disk
Created: Date/time when the repository was created
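
The two Migration_Issue thresholds listed above can be expressed as a small predicate. This is only a sketch: that the check reads Record_Count and Repo_Size_mb, and that 1.5 GB means 1.5 × 1024 MB, are assumptions. `hasMigrationIssue` is a hypothetical name.

```typescript
// Thresholds from the Migration_Issue description.
const MAX_RECORDS = 60000;
const MAX_SIZE_MB = 1.5 * 1024; // assumes 1 GB = 1024 MB

// Flags a repository when either threshold is met or exceeded.
function hasMigrationIssue(recordCount: number, repoSizeMb: number): boolean {
  return recordCount >= MAX_RECORDS || repoSizeMb >= MAX_SIZE_MB;
}
```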

LFS Detection Limitations

The Has_LFS column indicates whether the repository's .gitattributes file on the default branch contains filter=lfs entries. This is a lightweight check performed as part of the existing GraphQL query with no additional API calls.

Limitations to be aware of:

Default branch only: The check reads .gitattributes from HEAD (the default branch). LFS tracking configured only on other branches will not be detected.
Root .gitattributes only: Nested .gitattributes files in subdirectories are not inspected.
Detection, not sizing: This column only indicates whether LFS is configured — it does not report the number or size of LFS objects.
Empty repositories: Empty repositories will always report FALSE since there is no .gitattributes file to read.

For actual LFS sizing, use the standalone script/lfs-size.sh script to inspect individual repositories where Has_LFS is TRUE. This performs a shallow bare clone and reports per-file LFS sizes and totals. See the LFS Sizing Guide for prerequisites and usage.
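
The detection rule described in this section amounts to a text check on the root .gitattributes content. A minimal sketch, assuming the file content has already been fetched (or is `null` when missing), might look like this. `hasLfsTracking` is a hypothetical name, and skipping comment lines is my assumption and not documented behavior.

```typescript
// Returns true when any non-comment line of .gitattributes sets filter=lfs.
// A null or empty input (for example, an empty repository) yields false.
function hasLfsTracking(gitattributes: string | null): boolean {
  if (!gitattributes) return false;
  return gitattributes
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"))
    .some((line) => /(^|\s)filter=lfs(\s|$)/.test(line));
}
```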

Project Stats CSV Output Columns

The project-stats command generates a separate CSV file with the following columns:

Org_Name: Organization login
Repo_Name: Repository name
Issues_Linked_To_Projects: Number of issues that have at least one linked ProjectV2
Unique_Projects_Linked_By_Issues: Count of distinct ProjectV2 items found across all issues
Projects_Linked_To_Repo: Total count of projects directly associated with the repository
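
The difference between Issues_Linked_To_Projects and Unique_Projects_Linked_By_Issues is easiest to see with a small example. The sketch below uses a hypothetical `IssueProjects` shape and `summarizeIssueProjects` helper. It counts issues that have at least one linked project and separately de-duplicates project IDs across all issues.

```typescript
// Hypothetical input shape: one entry per issue with its linked project IDs.
interface IssueProjects {
  issueNumber: number;
  projectIds: string[];
}

// Computes the two issue-based columns of the project-stats CSV.
function summarizeIssueProjects(issues: IssueProjects[]): {
  Issues_Linked_To_Projects: number;
  Unique_Projects_Linked_By_Issues: number;
} {
  let issuesLinked = 0;
  const uniqueProjects = new Set<string>();
  for (const issue of issues) {
    if (issue.projectIds.length > 0) issuesLinked += 1;
    for (const id of issue.projectIds) uniqueProjects.add(id);
  }
  return {
    Issues_Linked_To_Projects: issuesLinked,
    Unique_Projects_Linked_By_Issues: uniqueProjects.size,
  };
}
```

A project shared by several issues is counted once in the unique total, while each of those issues still counts toward the linked-issue total.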

🛠️ Development Quick Start

git clone https://github.com/mona-actions/gh-repo-stats-plus.git
cd gh-repo-stats-plus
npm install
npm run build
npm test

See the Development Guide for detailed setup instructions.

Requirements

Node.js 20 or later
GitHub CLI (latest version recommended)
GitHub Authentication (personal token, GitHub App, or GitHub CLI)

Contributing

We welcome contributions! Please see our Development Guide for setup instructions and guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.
