Feature Request: Support Git Submodules with Per-Submodule Commit References #326

@dreambuildor

Summary

Currently, CodeTour does not support repositories that use Git submodules.
When a project contains submodules, tour steps that reference files inside
submodule directories cannot properly track code changes, because the ref
field only records the parent repository's commit SHA, not the individual
submodule commits.

Motivation

Git submodules are widely used in large projects to organize code across
multiple repositories. A common example is a monorepo-style setup where:

project-root/
├── .gitmodules
├── src/                  # parent repo code
└── libs/
    ├── foo/              # submodule A (has its own commit history)
    └── bar/              # submodule B (has its own commit history)

When creating a CodeTour for such a project, steps may reference files
from both the parent repo and submodules. However, since each submodule
maintains its own independent commit history, the current single ref
field is insufficient to accurately pin the exact version of code being
referenced in submodule files.

This means that after code changes, the "view at ref" feature breaks
for submodule files, making the tour outdated and unreliable.

Expected Behavior

CodeTour should:

  1. Detect submodules in the repository by reading .gitmodules
  2. Automatically record each submodule's commit SHA when a tour is
    created or a step is added
  3. Correctly restore the referenced version of submodule files when
    viewing a tour step, using the submodule's own commit SHA
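
For item 1, detection could look roughly like the sketch below. This is not CodeTour's actual code: parseSubmodulePaths is a hypothetical helper name, and it assumes the simple [submodule "name"] / path = ... layout that git submodule add writes, ignoring the quoting and escape rules of the full Git config format.

```typescript
// Hypothetical helper: extracts submodule paths from the text of a .gitmodules file.
// Assumption: each submodule is a [submodule "name"] section with a plain `path = ...` line.
function parseSubmodulePaths(gitmodulesText: string): string[] {
  const paths: string[] = [];
  let inSubmoduleSection = false;

  for (const rawLine of gitmodulesText.split(/\r?\n/)) {
    const line = rawLine.trim();
    const first = line.charAt(0);

    // Skip blank lines and comments (# or ;).
    if (line === "" || first === "#" || first === ";") continue;

    // A new section header: remember whether it is a submodule section.
    if (first === "[") {
      inSubmoduleSection = /^\[submodule\s+"[^"]*"\]$/.test(line);
      continue;
    }

    const match = /^path\s*=\s*(.+)$/.exec(line);
    const value = match === null ? undefined : match[1];
    if (inSubmoduleSection && value !== undefined) {
      // Normalize by dropping any trailing slashes.
      paths.push(value.trim().replace(/\/+$/, ""));
    }
  }
  return paths;
}
```

A more robust alternative might be to ask Git itself, for example with git config --file .gitmodules --get-regexp path, at the cost of spawning a process.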

Proposed Schema Change

To maintain backward compatibility, I suggest adding an optional
submoduleRefs field at the tour level:

Current schema:

{
  "title": "My Tour",
  "ref": "abc1234",         // parent repo commit SHA
  "steps": [
    { "file": "src/main.ts", "line": 10, "description": "..." },
    { "file": "libs/foo/bar.ts", "line": 42, "description": "..." }
  ]
}

Proposed schema:

{
  "title": "My Tour",
  "ref": "abc1234",         // parent repo commit SHA (unchanged)
  "submoduleRefs": {        // NEW: per-submodule commit SHAs
    "libs/foo": "def5678",
    "libs/bar": "ghi9012"
  },
  "steps": [
    { "file": "src/main.ts", "line": 10, "description": "..." },
    { "file": "libs/foo/bar.ts", "line": 42, "description": "..." }
  ]
}

Why this approach:

  • ✅ Fully backward compatible (existing .tour files are unaffected)
  • ✅ Aligns with how Git natively tracks submodules
    (parent repo stores submodule commit SHA in its tree object)
  • ✅ Minimal schema change
  • ✅ No need for step-level changes; submodule membership can be
    inferred from file path + .gitmodules
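
To illustrate the backward-compatibility point, the proposed field could be typed as optional. The shapes below are a simplified sketch that covers only the fields shown in the examples above (the real CodeTour types may differ), and getSubmoduleRefs is a hypothetical helper name.

```typescript
// Simplified, assumed shapes: only the fields used in the examples above.
interface TourStep {
  file: string;
  line: number;
  description: string;
}

interface Tour {
  title: string;
  ref?: string; // parent repo commit SHA (treated as optional in this sketch)
  submoduleRefs?: Record<string, string>; // NEW, optional: submodule path -> commit SHA
  steps: TourStep[];
}

// Hypothetical helper: tours written before this change simply yield an empty map,
// so callers never need to special-case older .tour files.
function getSubmoduleRefs(tour: Tour): Record<string, string> {
  return tour.submoduleRefs ?? {};
}
```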

Rough Implementation Idea

  1. On tour/step creation:

    • Parse .gitmodules to get all submodule paths
    • For each step's file path, check if it falls under a submodule path
    • Run git rev-parse HEAD inside each relevant submodule directory
    • Store the results in submoduleRefs
  2. On tour playback (view at ref):

    • For a given step's file, determine if it belongs to a submodule
    • If yes, use submoduleRefs[submodulePath] as the ref for that file
    • If no, use the existing top-level ref as before
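
The two phases above can be sketched as follows. This is only an illustration under stated assumptions: all function names are hypothetical, file paths are assumed to be repo-relative with "/" separators, and the Git call is injected as a callback (assumed to run git rev-parse HEAD inside the given submodule directory) rather than spawned here.

```typescript
type RefMap = Record<string, string>;

// Own-property lookup, so inherited keys such as "constructor" never match.
function ownRef(refs: RefMap, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(refs, key) ? refs[key] : undefined;
}

// Returns the submodule path containing `file`, or undefined for parent-repo files.
function findSubmodulePath(file: string, submodulePaths: string[]): string | undefined {
  let best: string | undefined;
  for (const path of submodulePaths) {
    // Match on a whole path segment, so "libs/foo" does not claim "libs/foobar/x.ts".
    const contains = file === path || file.indexOf(path + "/") === 0;
    if (contains && (best === undefined || path.length > best.length)) {
      best = path; // prefer the longest matching path
    }
  }
  return best;
}

// Creation: record one SHA per submodule that at least one step touches.
function collectSubmoduleRefs(
  stepFiles: string[],
  submodulePaths: string[],
  revParseHead: (submodulePath: string) => string
): RefMap {
  const refs: RefMap = {};
  for (const file of stepFiles) {
    const sub = findSubmodulePath(file, submodulePaths);
    if (sub !== undefined && ownRef(refs, sub) === undefined) {
      refs[sub] = revParseHead(sub).trim();
    }
  }
  return refs;
}

// Playback: choose the ref for one step's file.
function resolveRef(
  file: string,
  topLevelRef: string | undefined,
  submoduleRefs: RefMap,
  submodulePaths: string[]
): string | undefined {
  const sub = findSubmodulePath(file, submodulePaths);
  return sub === undefined ? topLevelRef : ownRef(submoduleRefs, sub);
}
```

In this sketch, a submodule file with no recorded SHA resolves to undefined instead of falling back to the top-level ref, since the parent commit would not exist in the submodule's history. When actually reading the file at that ref, the path would presumably also need to be made relative to the submodule root.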

Additional Considerations

  • If a submodule has not been initialized (i.e. git submodule update --init
    has not been run), CodeTour could show a warning rather than failing
    silently
  • submoduleRefs could potentially be updated incrementally as new steps
    are added to a tour
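
For the uninitialized-submodule warning, one possible approach is to parse the output of git submodule status, which prefixes each line with - when the submodule is not initialized, + when the checked-out commit differs from the one recorded in the parent's index, and U on merge conflicts. The sketch below is an assumption about how this could be wired up; the type and function names are hypothetical, and it does not handle unusual paths (for example, ones ending in parenthesized text).

```typescript
type SubmoduleState = "in-sync" | "uninitialized" | "different-commit" | "merge-conflict";

interface SubmoduleStatus {
  path: string;
  sha: string;
  state: SubmoduleState;
}

// Parses the text printed by `git submodule status`.
// Expected line shape: <prefix><sha> <path>[ (<describe output>)]
function parseSubmoduleStatus(output: string): SubmoduleStatus[] {
  const statuses: SubmoduleStatus[] = [];
  for (const line of output.split(/\r?\n/)) {
    // Do not trim: a leading space is the "in sync" prefix.
    const match = /^([ \-+U])([0-9a-f]+) (.+?)(?: \(.*\))?$/.exec(line);
    if (match === null) continue;
    const prefix = match[1];
    const sha = match[2];
    const path = match[3];
    if (prefix === undefined || sha === undefined || path === undefined) continue;

    const state: SubmoduleState =
      prefix === "-" ? "uninitialized"
      : prefix === "+" ? "different-commit"
      : prefix === "U" ? "merge-conflict"
      : "in-sync";
    statuses.push({ path, sha, state });
  }
  return statuses;
}

// Builds one human-readable warning per uninitialized submodule.
function uninitializedWarnings(statuses: SubmoduleStatus[]): string[] {
  return statuses
    .filter((s) => s.state === "uninitialized")
    .map((s) => `Submodule "${s.path}" is not initialized; run: git submodule update --init -- ${s.path}`);
}
```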

Willingness to Contribute

I am willing to submit a Pull Request for this feature if the maintainers
agree with the proposed direction. Happy to discuss any design concerns
before starting implementation.
