This document provides technical details about the codebase, CLI reference, architecture, and implementation specifics.
- CLI Reference
- Architecture
- Project Structure
- Configuration System
- Transport Layer
- Conversion Pipeline
- Building from Source
./markdowninthemiddle [flags]

Description: Start the HTTP/HTTPS forward proxy with optional HTML-to-Markdown conversion.
Flags:
| Flag | Type | Default | Description |
|---|---|---|---|
| `--config` | string | `./config.yml` | Path to config file |
| `--addr` | string | `:8080` | Proxy listen address |
| `--tls` | bool | `false` | Enable TLS on proxy listener |
| `--auto-cert` | bool | `false` | Auto-generate self-signed certificate |
| `--tls-insecure` | bool | `false` | Skip upstream TLS verification |
| `--negotiate-only` | bool | `false` | Only convert when client requests Markdown |
| `--cache-dir` | string | `""` | Enable HTML caching to directory |
| `--output-dir` | string | `""` | Write Markdown files to directory |
| `--max-body-size` | int64 | `10485760` | Max response body size (bytes) |
| `--transport` | string | `http` | Transport type: `http` or `chromedp` |
| `--convert-json` | bool | `false` | Enable JSON-to-Markdown conversion |
| `--template-dir` | string | `""` | Directory with Mustache templates |
| `--allow` | []string | `[]` | Regex patterns for allowed URLs (repeatable) |
Generate a self-signed TLS certificate.
./markdowninthemiddle gencert [flags]

Flags:

| Flag | Type | Default | Description |
|---|---|---|---|
| `--host` | string | `localhost` | Hostname/IP for certificate |
| `--dir` | string | `./certs` | Output directory for cert/key |
Example:
./markdowninthemiddle gencert --host myhost.local --dir ./certs
# Creates: ./certs/cert.pem and ./certs/key.pem

Request → Filter (regex) → Transport (http/cdp) → Response Processor (decompress, convert, cache, count tokens) → Client
- Filter Middleware - Check if URL matches allow-list (if configured)
  - Returns 403 Forbidden if blocked
  - Continues to transport if allowed or no filter
- Transport Selection - Choose between HTTP or chromedp
  - HTTP - Standard `net.Dial` + TLS for upstream
  - chromedp - Headless Chrome browser pool with semaphore
- Response Processing - PostProcessor middleware
  - Decompress body (`gzip`, `deflate`)
  - Check content-type and size limits
  - Cache HTML to disk (if enabled)
  - Convert HTML → Markdown (if applicable)
  - Count tokens via TikToken
  - Write Markdown to files (if enabled)
  - Add `X-Token-Count` header
  - Add `Vary: accept` header
- Proxy listener - Single goroutine per connection (Go HTTP server)
- Browser pool - Semaphore-bounded concurrent tabs (chromedp)
  - Default: 5 concurrent tabs
  - Each tab: ~20-50MB memory
  - Configurable via `pool_size`
markdowninthemiddle/
├── go.mod, go.sum  # Dependencies
├── README.md  # Project overview
├── main.go  # Entry point
│
├── docs/  # Documentation
│   ├── HTTPS_SETUP.md  # TLS/HTTPS configuration
│   ├── CHROMEDP.md  # JavaScript rendering setup
│   ├── CODE_DETAILS.md  # This file
│   ├── DOCKER.md  # Docker deployment guide
│   ├── MCP_SERVER.md  # MCP server integration
│   ├── MITM_SETUP.md  # Client certificate setup
│   ├── MITM_IMPLEMENTATION.md  # MITM technical details
│   ├── PACKAGING.md  # Release and packaging
│   └── UPSTREAM_PROXY.md  # Upstream proxy configuration
│
├── docker/  # Docker configuration
│   ├── README.md  # Docker quick start
│   ├── Dockerfile  # Multi-stage build
│   ├── docker-compose.yml  # Services (proxy + Chrome + MCP)
│   ├── .dockerignore  # Docker build exclude patterns
│   └── .env.example  # Example environment variables
│
├── examples/  # Example configurations
│   ├── README.md  # Examples guide
│   ├── config.example.yml  # Example configuration file
│   └── mustache-templates/  # JSON-to-Markdown templates
│       ├── _default.mustache  # Generic JSON template
│       └── api.github.com__users.mustache  # GitHub API example
│
├── scripts/
│   ├── start-chrome.sh  # macOS/Linux Chrome launcher
│   ├── start-chrome.bat  # Windows Chrome launcher
│   └── docker-compose.sh  # Docker helper script
│
├── cmd/
│   ├── main.go  # Cobra root command
│   ├── root.go  # Proxy startup logic
│   ├── mcp.go  # MCP server command
│   └── gencert.go  # Certificate generation
│
├── certs/  # Runtime TLS certificates (generated)
├── output/  # Runtime markdown output (if enabled)
│
└── internal/
    ├── banner/
    │   └── banner.go  # ASCII art startup banner
    │
    ├── browser/
    │   ├── browser.go  # chromedp pool (http.RoundTripper)
    │   └── browser_test.go  # Browser pool tests
    │
    ├── cache/
    │   ├── cache.go  # Disk cache with RFC 7234
    │   └── cache_test.go  # Cache tests
    │
    ├── certs/
    │   ├── certs.go  # TLS cert generation/loading
    │   └── certs_test.go  # Cert tests
    │
    ├── config/
    │   ├── config.go  # Viper config loader
    │   └── config_test.go  # Config tests
    │
    ├── converter/
    │   ├── converter.go  # HTML→Markdown conversion
    │   └── converter_test.go  # Converter tests
    │
    ├── filter/
    │   ├── filter.go  # Regex URL filtering
    │   └── filter_test.go  # Filter tests
    │
    ├── middleware/
    │   ├── logger.go  # Custom request logging
    │   ├── decompress.go  # Content-Encoding decompression
    │   └── middleware.go  # Response processing RoundTripper
    │
    ├── output/
    │   ├── output.go  # Markdown file writer
    │   └── output_test.go  # Output tests
    │
    ├── proxy/
    │   ├── proxy.go  # Chi HTTP server/router
    │   └── proxy_test.go  # Proxy tests
    │
    ├── templates/
    │   ├── templates.go  # Mustache template loader
    │   └── templates_test.go  # Template tests
    │
    └── tokens/
        ├── tokens.go  # TikToken counter wrapper
        └── tokens_test.go  # Token counter tests
Configuration sources, highest precedence first:
- CLI Flags - Command-line arguments
- Environment Variables - `MITM_` prefixed (underscore for nesting)
- config.yml - Local YAML file
- Built-in Defaults - Hardcoded defaults in code
type Config struct {
	Proxy       ProxyConfig      // Listener settings
	TLS         TLSConfig        // TLS/HTTPS settings
	Conversion  ConversionConfig // HTML conversion settings
	MaxBodySize int64            // Body size limit
	Cache       CacheConfig      // Disk cache settings
	Output      OutputConfig     // Markdown file output
	LogLevel    string           // Log level (info, warn, error, debug)
	Transport   TransportConfig  // HTTP vs chromedp
	Filter      FilterConfig     // Request filtering
}

MITM_PROXY_ADDR → proxy.addr
MITM_TLS_ENABLED → tls.enabled
MITM_TLS_AUTO_CERT → tls.auto_cert
MITM_CONVERSION_ENABLED → conversion.enabled
MITM_TRANSPORT_TYPE → transport.type
MITM_TRANSPORT_CHROMEDP_URL → transport.chromedp.url
MITM_CACHE_ENABLED → cache.enabled
MITM_FILTER_ALLOWED → filter.allowed (comma-separated)
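The mapping above follows a mechanical rule, which can be sketched in the key-to-variable direction (the reverse is ambiguous, because keys such as `auto_cert` already contain underscores). Both `envName` and `splitList` are hypothetical helpers; how the project actually wires this up through Viper is not shown in this document.

```go
package main

import "strings"

// envName converts a dotted config key to its environment variable name,
// e.g. "tls.auto_cert" becomes "MITM_TLS_AUTO_CERT".
func envName(key string) string {
	return "MITM_" + strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

// splitList parses a comma-separated value such as MITM_FILTER_ALLOWED,
// trimming whitespace and dropping empty entries.
func splitList(v string) []string {
	var out []string
	for _, part := range strings.Split(v, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}
```

One caveat of comma-separated regex lists: a pattern that itself contains a comma, such as `a{1,3}`, would be split in two, so such patterns are safer in `config.yml` or as repeated `--allow` flags.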
- Package: `net/http`
- Implementation: Standard library `http.Transport`
- Features:
  - Keep-alive connections
  - Connection pooling
  - Automatic decompression (disabled in proxy)
  - TLS with configurable min version
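&http.Transport{
	TLSClientConfig: &tls.Config{
		MinVersion:         tls.VersionTLS12,
		InsecureSkipVerify: opts.TLSInsecure,
	},
	// false leaves net/http's transparent gzip decoding on for requests that
	// carry no Accept-Encoding header; true turns it off.
	DisableCompression: false,
	IdleConnTimeout:    90 * time.Second,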
}

- Package: `github.com/chromedp/chromedp`
- Implementation: `browser.Pool` (implements `http.RoundTripper`)
- Features:
  - WebSocket connection to Chrome DevTools Protocol (CDP)
  - Semaphore-bounded concurrent tabs
  - Automatic tab cleanup after request
  - Exponential backoff on startup
Pool Configuration:
type Pool struct {
	allocCtx    context.Context     // Parent context
	allocCancel context.CancelFunc  // Cancel function
	sem         *semaphore.Weighted // Concurrency limiter
	timeout     time.Duration       // Request timeout
	wsURL       string              // Chrome DevTools URL
	healthy     bool                // Health flag
}

RoundTrip Flow:
- Acquire semaphore slot (wait if at limit)
- Create child context with timeout
- Create chromedp context (opens new tab)
- Navigate to URL
- Wait for `body` element
- Extract outer HTML
- Release semaphore slot
- Return response
Library: github.com/JohannesKaufmann/html-to-markdown
Process:
- Check Content-Type
  - Only process `text/html` responses
  - Respect `negotiate_only` flag (check `Accept: text/markdown`)
- Decompress
  - Handle `gzip` and `deflate` encoding
  - Preserve original for caching
- Convert
  - Parse HTML with `html.Parse`
  - Walk DOM tree and convert to Markdown
  - Handle tables, lists, links, images, etc.
- Count Tokens
  - Use TikToken library to count tokens
  - Default encoding: `cl100k_base` (GPT-4; only an approximation for Claude, which uses its own tokenizer)
  - Add to `X-Token-Count` header
- Cache & Output
  - Cache original HTML (if enabled, respects RFC 7234)
  - Write Markdown to files (if enabled)
  - Add cache headers to response
Library: github.com/pkoukk/tiktoken-go
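Encoding Options:
- `cl100k_base` - GPT-4 and GPT-3.5 Turbo (only an approximation for Claude, which uses its own tokenizer)
- `p50k_base` - Codex and text-davinci-002/003
- `r50k_base` - Older GPT-3 models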
Counting Method:
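enc, err := tiktoken.GetEncoding("cl100k_base")
if err != nil {
	log.Fatal(err) // encoding data could not be loaded
}
tokens := enc.EncodeOrdinary(text) // special tokens are treated as plain text
count := len(tokens)

- Go 1.21 or later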
- git
- (Optional) Docker for containerized build
# Clone repository
git clone https://github.com/rickcrawford/markdowninthemiddle.git
cd markdowninthemiddle
# Download dependencies
go mod download
# Build binary
go build -o markdowninthemiddle .
# Verify build
./markdowninthemiddle --version

# Strip debug info and reduce binary size
go build -ldflags="-s -w" -o markdowninthemiddle .

# macOS binary on Linux
GOOS=darwin GOARCH=arm64 go build -o markdowninthemiddle-macos .
# Linux binary
GOOS=linux GOARCH=amd64 go build -o markdowninthemiddle-linux .
# Windows binary
GOOS=windows GOARCH=amd64 go build -o markdowninthemiddle.exe .

# Build local image
docker build -t markdowninthemiddle:latest .
# Build with specific Go version
docker build --build-arg GO_VERSION=1.23 -t markdowninthemiddle:latest .
# Multi-arch build
docker buildx build --platform linux/amd64,linux/arm64 -t markdowninthemiddle:latest .

go test ./...

go test -cover ./...
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

go test -run TestFilterMiddleware ./internal/filter/...

go test -bench=. -benchmem ./...

| Package | Version | Purpose |
|---|---|---|
| `github.com/go-chi/chi/v5` | Latest | HTTP router & middleware |
| `github.com/spf13/cobra` | Latest | CLI framework |
| `github.com/spf13/viper` | Latest | Configuration |
| `github.com/JohannesKaufmann/html-to-markdown` | Latest | HTML conversion |
| `github.com/pkoukk/tiktoken-go` | Latest | Token counting |
| `github.com/chromedp/chromedp` | Latest | Browser automation |
| `golang.org/x/sync` | Latest | Semaphore |
go mod tidy      # Add missing and drop unused dependencies
go mod download  # Download to cache
go mod verify    # Verify checksums

Per Request:
- HTTP: <1MB
- chromedp: 20-50MB (depends on page complexity)
Reduce chromedp Memory:
transport:
  chromedp:
    pool_size: 3 # Reduce concurrent tabs

HTTP Transport:
# Adjust idle connections
MITM_HTTP_MAX_IDLE_CONNS=100

Avoid re-creating encoder:
// Good: create once, reuse
enc, _ := tiktoken.GetEncoding("cl100k_base")
// Cache 'enc' for multiple uses

./markdowninthemiddle --config config.yml --log-level debug

# Add to code
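// Release returns nothing, so log a separately tracked counter (inUse is a hypothetical atomic.Int64 field)
log.Printf("Pool state: %d/%d slots in use", p.inUse.Load(), poolSize)

# Via proxy with verbose logging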
./markdowninthemiddle --log-level debug 2>&1 | tee proxy.log
# Monitor with tcpdump (lo0 is the macOS loopback interface; use lo on Linux)
sudo tcpdump -i lo0 -n "port 8080"

openssl x509 -in ./certs/cert.pem -text -noout

Problem: "Failed to open new tab - no browser is open"
Solution:
- Verify Chrome is running: `curl http://localhost:9222/json/version`
- Check Docker container health: `docker compose ps`
- Increase startup timeout in `browser.go`
Problem: Memory usage grows over time with chromedp
Solution:
- Reduce `pool_size` to limit concurrent tabs
- Ensure tab contexts are properly closed
- Monitor with: `docker stats markdowninthemiddle-proxy`
Problem: HTML to Markdown conversion is slow
Causes:
- Very large HTML (>10MB) - increase `max_body_size`
- chromedp overhead - use HTTP transport for non-JS sites
- Network latency - enable caching
- Format: `gofmt` (enforced)
- Linting: `golangci-lint`
- Tests: Minimum 80% coverage
go fmt ./...
golangci-lint run ./...
go test -cover ./...

- Create feature branch: `git checkout -b feature/my-feature`
- Add tests: `*_test.go`
- Document: Update README if user-facing
- Submit PR with description
- Go Documentation: https://golang.org/doc
- chromedp: https://github.com/chromedp/chromedp
- Chrome DevTools Protocol: https://chromedevtools.github.io/devtools-protocol/
- html-to-markdown: https://github.com/JohannesKaufmann/html-to-markdown
- TikToken: https://github.com/pkoukk/tiktoken-go