2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,8 @@

### Added
- Gmail: add `gmail autoreply` to reply once to matching messages, label the thread for dedupe, and optionally archive/mark read. Includes docs and regression coverage for skip/reply flows.
- Raw API dumps: new `gog docs raw`, `gog sheets raw`, `gog slides raw`, and `gog drive raw` subcommands emit the canonical Google API response as JSON for lossless scripting and LLM consumption. Compact by default; `--pretty` for indented output. `sheets raw` gates cell-level data behind `--include-grid-data` (off by default) and warns on stderr when grid data or `developerMetadata` is present. `drive raw` defaults to `fields=*` and redacts a small set of capability/token-shaped fields (thumbnailLink, webContentLink, exportLinks, resourceKey, appProperties, properties, contentHints.thumbnail.image); passing `--fields` honors the request verbatim so users who name a field get it. See `docs/raw-audit.md` for the redaction rationale.
- Drive: `drive ls` and `drive get` now accept a `--fields` flag mapping directly to the Drive API field mask, so callers can request fields beyond the hard-coded default set (thumbnailLink, imageMediaMetadata, etc.). Closes #486.

## 0.12.0 - 2026-03-09

33 changes: 32 additions & 1 deletion README.md
@@ -914,6 +914,15 @@ gog drive unshare <fileId> --permission-id <permissionId>

# Shared drives (Team Drives)
gog drive drives --max 100

# Request extra fields from the Drive API (closes #486)
gog drive ls --fields "files(id,name,thumbnailLink),nextPageToken"
gog drive get <fileId> --fields "id,name,thumbnailLink,imageMediaMetadata"

# Raw API dump (lossless JSON for scripting/LLMs)
gog drive raw <fileId> # fields=*, sensitive fields redacted by default
gog drive raw <fileId> --fields "id,name,thumbnailLink" # honors user-named fields verbatim
gog drive raw <fileId> --pretty
```

### Docs / Slides / Sheets
@@ -938,6 +947,8 @@ gog docs write <docId> --text "Rewrite one tab" --tab-id t.notes
gog docs write <docId> --file ./body.txt --append --pageless
gog docs find-replace <docId> "old" "new"
gog docs find-replace <docId> "old" "new" --tab-id t.notes
gog docs raw <docId> # Lossless JSON dump of Documents.Get (LLM/scripting)
gog docs raw <docId> --pretty

# Slides
gog slides info <presentationId>
@@ -950,6 +961,7 @@ gog slides list-slides <presentationId>
gog slides add-slide <presentationId> ./slide.png --notes "Speaker notes"
gog slides update-notes <presentationId> <slideId> --notes "Updated notes"
gog slides replace-slide <presentationId> <slideId> ./new-slide.png --notes "New notes"
gog slides raw <presentationId> # Lossless JSON dump of Presentations.Get

# Sheets
gog sheets copy <spreadsheetId> "My Sheet Copy"
@@ -969,7 +981,26 @@ gog sheets links <spreadsheetId> 'Sheet1!A1:B10'
gog sheets add-tab <spreadsheetId> <tabName>
gog sheets rename-tab <spreadsheetId> <oldName> <newName>
gog sheets delete-tab <spreadsheetId> <tabName> --force
gog sheets raw <spreadsheetId>                     # Lossless JSON dump of Spreadsheets.Get
gog sheets raw <spreadsheetId> --include-grid-data # Include cell-level data (off by default)
```

**Raw vs other read subcommands.** Use `raw` when you need the full
canonical Google API response as JSON (e.g. feeding a doc into an LLM,
or scripting against structural fields that `cat`/`structure` drop). Other
read commands are lossier on purpose:

- `docs info`, `sheets metadata`, `slides info`, `drive get` → metadata only (cheap)
- `docs cat`, `docs structure` → plain text / simplified structure
- `docs export`, `sheets export`, `slides export`, `drive download` → converted file formats (pdf/docx/xlsx/pptx/md)
- `<group> raw` → full API response as JSON (verbose, lossless)

`drive raw` defaults to `fields=*` and redacts a small set of
capability/token-shaped fields (thumbnailLink, webContentLink,
exportLinks, resourceKey, appProperties, properties,
contentHints.thumbnail.image). When `--fields` is supplied, the response
is returned verbatim: a caller who names a field gets it. See
[`docs/raw-audit.md`](docs/raw-audit.md) for the redaction rationale.
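
As an illustration of scripting against `raw` output, the sketch below pulls the plain text out of a `gog docs raw` payload. It is a minimal example and not part of `gog`: the `document` struct and `plainText` helper are hypothetical names, only the `body.content[].paragraph.elements[].textRun.content` path of the Docs API response is modeled, and the sample JSON is hand-written rather than real output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// document models only the part of the Docs API response this sketch reads.
// Every other field in the raw dump is ignored by encoding/json.
type document struct {
	Title string `json:"title"`
	Body  struct {
		Content []struct {
			Paragraph *struct {
				Elements []struct {
					TextRun *struct {
						Content string `json:"content"`
					} `json:"textRun"`
				} `json:"elements"`
			} `json:"paragraph"`
		} `json:"content"`
	} `json:"body"`
}

// plainText (hypothetical helper) concatenates every textRun in the body.
// Structural elements without a paragraph (tables, section breaks) are skipped.
func plainText(raw []byte) (string, error) {
	var doc document
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", fmt.Errorf("decode raw doc: %w", err)
	}
	var sb strings.Builder
	for _, el := range doc.Body.Content {
		if el.Paragraph == nil {
			continue
		}
		for _, pe := range el.Paragraph.Elements {
			if pe.TextRun != nil {
				sb.WriteString(pe.TextRun.Content)
			}
		}
	}
	return sb.String(), nil
}

func main() {
	// Hand-written sample shaped like a compact `gog docs raw <docId>` dump.
	sample := []byte(`{"documentId":"doc1","title":"Full Doc","body":{"content":[{"paragraph":{"elements":[{"textRun":{"content":"hello world\n"}}]}}]}}`)
	text, err := plainText(sample)
	if err != nil {
		panic(err)
	}
	fmt.Print(text)
}
```

In a real script the bytes would come from the command's stdout (for example via `io.ReadAll(os.Stdin)` behind a pipe) instead of the inline sample.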

### Contacts

101 changes: 101 additions & 0 deletions docs/raw-audit.md
@@ -0,0 +1,101 @@
# `gog <group> raw` — Sensitive Field Audit

This document records the security audit performed before shipping the
`gog <group> raw <id>` subcommands, which dump the canonical Google API
response as JSON for programmatic / LLM consumption.

## Redaction rule

`raw` applies field-level redaction **only when the user did not explicitly
name a field via `--fields`**. Rationale:

- The default (implicit `fields=*` for Drive, or no mask for the other
APIs) pulls in capability URLs and metadata stashed by third-party apps.
Callers rarely want these fields, and they can leak when output is piped
into an LLM, shared in a bug report, or committed to a repo.
- When a caller writes `--fields "id,name,thumbnailLink"`, they named
`thumbnailLink` deliberately. Redacting a user-named field would be
surprising and user-hostile.

Summary: **redact what the user didn't ask for; honor what they did.**
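
The rule can be sketched in a few lines of Go. This is an illustrative sketch under stated assumptions, not the shipped implementation: `sensitiveDrivePaths`, `deletePath`, and `redactDrive` are hypothetical names, the response is treated as a generic decoded JSON map, and "redact" is assumed to mean dropping the key (the real command may substitute a placeholder instead).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sensitiveDrivePaths mirrors the Drive fields this audit marks "Redact".
// Paths are dot-separated; the list name is hypothetical.
var sensitiveDrivePaths = []string{
	"thumbnailLink", "webContentLink", "exportLinks", "resourceKey",
	"appProperties", "properties", "contentHints.thumbnail.image",
}

// deletePath removes the value at a dot-separated path, if it exists.
func deletePath(m map[string]any, path string) {
	parts := strings.Split(path, ".")
	for _, p := range parts[:len(parts)-1] {
		next, ok := m[p].(map[string]any)
		if !ok {
			return // intermediate object missing: nothing to redact
		}
		m = next
	}
	delete(m, parts[len(parts)-1])
}

// redactDrive applies the rule: redact only when no --fields value was given.
// Assumption: redaction drops the key entirely.
func redactDrive(file map[string]any, userFields string) {
	if strings.TrimSpace(userFields) != "" {
		return // the caller named fields explicitly; honor them verbatim
	}
	for _, p := range sensitiveDrivePaths {
		deletePath(file, p)
	}
}

func main() {
	raw := `{"id":"f1","name":"a.png","thumbnailLink":"https://example.invalid/t","contentHints":{"thumbnail":{"image":"AAAA","mimeType":"image/png"}}}`
	var file map[string]any
	if err := json.Unmarshal([]byte(raw), &file); err != nil {
		panic(err)
	}
	redactDrive(file, "") // no --fields: default redaction applies
	out, err := json.Marshal(file)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keying the decision on whether `--fields` was supplied, rather than on which fields it names, keeps the behavior easy to predict: a caller either gets the default redacted dump or exactly what they asked for.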

## Per-endpoint findings (PR #1 scope)

### 1. `docs.Documents.Get` — `gog docs raw`

REST ref: <https://developers.google.com/docs/api/reference/rest/v1/documents/get>
Go type: <https://pkg.go.dev/google.golang.org/api/docs/v1#Document>

| Field | Risk | Default handling |
|---|---|---|
| `inlineObjects.*.embeddedObject.imageProperties.contentUri` | Short-lived (~30 min) bearer-style authenticated image URL | **Ship as-is** |
| `inlineObjects.*.embeddedObject.imageProperties.sourceUri` | May reference private source URLs | **Ship as-is** |

**Why not redact:** `docs raw` exposes no `--fields` flag, so the principled
"redact only what the user didn't name" rule has no escape hatch.
These image URIs are short-lived (~30 min) and require the caller's
auth anyway, which makes them substantially lower risk than Drive's
`thumbnailLink`. The lossless guarantee is valued more than the marginal
hardening.

No credentials, tokens, or OAuth metadata in the response.

### 2. `sheets.Spreadsheets.Get` — `gog sheets raw`

REST ref: <https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get>
Go type: <https://pkg.go.dev/google.golang.org/api/sheets/v4#Spreadsheet>

| Field | Risk | Default handling |
|---|---|---|
| `developerMetadata` | Third-party apps may stash arbitrary KV including secrets | **Warn on stderr if present**, do not redact |
| `sheets[].data.rowData.values[].userEnteredValue.formulaValue` (only with `--include-grid-data`) | Formulas can embed API keys via `IMPORTRANGE`, hardcoded tokens in cells | **Warn on stderr when `--include-grid-data` is set**, do not redact |

`--include-grid-data` is **off by default** because grid payloads can be
multi-MB and are the primary leakage vector.

No redaction applied to Sheets output; warnings only. Redacting cell
content would defeat the purpose of a lossless dump.
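
The warn-but-do-not-redact behavior can be sketched as below. This is a rough sketch under assumptions, not the project's code: `sheetsWarnings` and `hasNonEmptyList` are hypothetical names, the warning texts are invented, and only the spreadsheet-level and sheet-level `developerMetadata` lists are checked.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// hasNonEmptyList reports whether m[key] is a JSON array with at least one item.
func hasNonEmptyList(m map[string]any, key string) bool {
	list, ok := m[key].([]any)
	return ok && len(list) > 0
}

// sheetsWarnings (hypothetical) returns the stderr warnings for a decoded
// Spreadsheets.Get response. It never modifies the payload.
func sheetsWarnings(ss map[string]any, includeGridData bool) []string {
	var warns []string
	found := hasNonEmptyList(ss, "developerMetadata")
	if sheets, ok := ss["sheets"].([]any); ok {
		for _, s := range sheets {
			if sheet, ok := s.(map[string]any); ok && hasNonEmptyList(sheet, "developerMetadata") {
				found = true
			}
		}
	}
	if found {
		warns = append(warns, "warning: developerMetadata present; it may contain app-stashed secrets")
	}
	if includeGridData {
		warns = append(warns, "warning: grid data included; cell values and formulas may contain secrets")
	}
	return warns
}

func main() {
	raw := `{"spreadsheetId":"s1","sheets":[{"properties":{"title":"Sheet1"},"developerMetadata":[{"metadataKey":"k","metadataValue":"v"}]}]}`
	var ss map[string]any
	if err := json.Unmarshal([]byte(raw), &ss); err != nil {
		panic(err)
	}
	// Warnings go to stderr so stdout stays a clean JSON stream.
	for _, w := range sheetsWarnings(ss, false) {
		fmt.Fprintln(os.Stderr, w)
	}
	fmt.Println(raw)
}
```

Writing warnings to stderr keeps stdout parseable when the dump is piped into another tool.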

### 3. `slides.Presentations.Get` — `gog slides raw`

REST ref: <https://developers.google.com/slides/api/reference/rest/v1/presentations/get>
Go type: <https://pkg.go.dev/google.golang.org/api/slides/v1#Presentation>

| Field | Risk | Default handling |
|---|---|---|
| `slides[].pageElements[].image.contentUrl` | Short-lived authenticated image URL (same class as Docs `contentUri`) | **Ship as-is** |
| `slides[].pageElements[].image.sourceUrl` | Possibly private origin URL | **Ship as-is** |
| `slides[].pageElements[].video.url` | Drive video refs may carry signed access | **Ship as-is** |

Same rationale as Docs: `slides raw` exposes no `--fields` flag, the URLs
are short-lived and auth-gated, and the risk is lower than Drive's. The
lossless guarantee is preferred.

### 4. `drive.Files.Get` with `fields=*` — `gog drive raw` *(highest risk)*

REST ref: <https://developers.google.com/drive/api/reference/rest/v3/files/get>
Go type: <https://pkg.go.dev/google.golang.org/api/drive/v3#File>

| Field | Risk | Default handling |
|---|---|---|
| `thumbnailLink` | Time-limited signed URL that bypasses normal auth for ~hours. Classic leak vector. | Redact |
| `webContentLink` | Direct download URL; capability URL | Redact |
| `exportLinks` | Per-MIME authenticated export URLs | Redact |
| `resourceKey` | Capability token for link-shared files; effectively a shared secret | Redact |
| `appProperties` | Arbitrary app-stashed KV; apps commonly misuse for secrets | Redact |
| `properties` | Public custom properties, still frequently (mis)used for tokens | Redact |
| `contentHints.thumbnail.image` | Base64 thumbnail bytes; large and unnecessary | Redact |
| `permissions[].emailAddress`, `owners[].emailAddress`, `sharingUser`, `lastModifyingUser`, `trashingUser` | Non-collaborator emails (PII) when `fields=*` enumerates full ACL | Not redacted — caller already has access to the file; enumeration is a conscious `--fields` choice |

**Reminder:** all of the above are redacted **only when `--fields` is not
set**. Passing `--fields "id,name,thumbnailLink"` returns `thumbnailLink`
verbatim — the user named it.

## Cross-cutting observations

- Google APIs never return OAuth access tokens, refresh tokens, or client
secrets in resource responses. The risk is capability URLs and
app-stashed custom metadata, not credential disclosure in the API
contract itself.
- `gog drive raw` is the most dangerous command; the others are modest in
comparison.
- PR #2 will extend this audit to gmail / calendar / people / contacts /
tasks / forms before those commands ship.
38 changes: 38 additions & 0 deletions internal/cmd/docs.go
@@ -34,6 +34,44 @@ type DocsCmd struct {
	Sed       DocsSedCmd       `cmd:"" name:"sed" help:"Regex find/replace (sed-style: s/pattern/replacement/g)"`
	Clear     DocsClearCmd     `cmd:"" name:"clear" help:"Clear all content from a Google Doc"`
	Structure DocsStructureCmd `cmd:"" name:"structure" aliases:"struct" help:"Show document structure with numbered paragraphs"`
	Raw       DocsRawCmd       `cmd:"" name:"raw" help:"Dump raw Google Docs API response as JSON (Documents.Get; lossless; for scripting and LLM consumption)"`
}

// DocsRawCmd dumps the full Documents.Get response as JSON, with no Fields
// restriction. Intended for programmatic / LLM consumption where the caller
// wants the canonical Google Docs API tree (tables, suggestions, per-run
// styling, list nesting, named ranges, inline objects) that `info` drops.
//
// REST reference: https://developers.google.com/docs/api/reference/rest/v1/documents/get
// Go type: https://pkg.go.dev/google.golang.org/api/docs/v1#Document
type DocsRawCmd struct {
	DocID  string `arg:"" name:"docId" help:"Doc ID"`
	Pretty bool   `name:"pretty" help:"Pretty-print JSON (default: compact single-line)"`
}

func (c *DocsRawCmd) Run(ctx context.Context, flags *RootFlags) error {
	id := strings.TrimSpace(c.DocID)
	if id == "" {
		return usage("empty docId")
	}

	svc, err := requireDocsService(ctx, flags)
	if err != nil {
		return err
	}

	doc, err := svc.Documents.Get(id).Context(ctx).Do()
	if err != nil {
		if isDocsNotFound(err) {
			return fmt.Errorf("doc not found or not a Google Doc (id=%s)", id)
		}
		return err
	}
	if doc == nil {
		return errors.New("doc not found")
	}

	return outfmt.WriteRaw(ctx, os.Stdout, doc, outfmt.RawOptions{Pretty: c.Pretty})
}

type DocsExportCmd struct {
172 changes: 172 additions & 0 deletions internal/cmd/docs_raw_test.go
@@ -0,0 +1,172 @@
package cmd

import (
	"context"
	"encoding/json"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"

	"google.golang.org/api/docs/v1"
	"google.golang.org/api/option"

	"github.com/steipete/gogcli/internal/ui"
)

// fullDocResponse returns a richer Document payload than DocsInfoCmd would
// ever request (DocsInfoCmd restricts via Fields). This proves `raw` drops
// that restriction and exposes the full API tree.
func fullDocResponse(id string) map[string]any {
	return map[string]any{
		"documentId": id,
		"title":      "Full Doc",
		"revisionId": "rev1",
		"body": map[string]any{
			"content": []any{
				map[string]any{
					"startIndex": 1,
					"endIndex":   10,
					"paragraph": map[string]any{
						"elements": []any{
							map[string]any{
								"textRun": map[string]any{
									"content": "hello world\n",
								},
							},
						},
					},
				},
			},
		},
		"namedStyles": map[string]any{
			"styles": []any{map[string]any{"namedStyleType": "NORMAL_TEXT"}},
		},
	}
}

// newDocsRawTestServer builds a test Docs server; if status != 0 it returns
// that status instead of a successful response.
func newDocsRawTestServer(t *testing.T, status int, body map[string]any) *httptest.Server {
	t.Helper()
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.HasPrefix(r.URL.Path, "/v1/documents/") || r.Method != http.MethodGet {
			http.NotFound(w, r)
			return
		}
		// Set the header before WriteHeader so error bodies are also served as JSON.
		w.Header().Set("Content-Type", "application/json")
		if status != 0 {
			w.WriteHeader(status)
			_ = json.NewEncoder(w).Encode(map[string]any{
				"error": map[string]any{"code": status, "message": "mock error"},
			})
			return
		}
		_ = json.NewEncoder(w).Encode(body)
	}))
}

func installMockDocsService(t *testing.T, srv *httptest.Server) {
	t.Helper()
	orig := newDocsService
	t.Cleanup(func() { newDocsService = orig })

	docSvc, err := docs.NewService(context.Background(),
		option.WithoutAuthentication(),
		option.WithHTTPClient(srv.Client()),
		option.WithEndpoint(srv.URL+"/"),
	)
	if err != nil {
		t.Fatalf("NewService: %v", err)
	}
	newDocsService = func(context.Context, string) (*docs.Service, error) { return docSvc, nil }
}

func rawTestContext(t *testing.T) context.Context {
	t.Helper()
	u, err := ui.New(ui.Options{Stdout: io.Discard, Stderr: io.Discard, Color: "never"})
	if err != nil {
		t.Fatalf("ui.New: %v", err)
	}
	return ui.WithUI(context.Background(), u)
}

func TestDocsRaw_HappyPath(t *testing.T) {
	srv := newDocsRawTestServer(t, 0, fullDocResponse("doc1"))
	defer srv.Close()
	installMockDocsService(t, srv)

	ctx := rawTestContext(t)
	flags := &RootFlags{Account: "a@b.com"}

	out := captureStdout(t, func() {
		cmd := &DocsRawCmd{}
		if err := runKong(t, cmd, []string{"doc1"}, ctx, flags); err != nil {
			t.Fatalf("run: %v", err)
		}
	})

	var got map[string]any
	if err := json.Unmarshal([]byte(out), &got); err != nil {
		t.Fatalf("output is not valid JSON: %v\nraw: %s", err, out)
	}
	// Bare struct: top-level keys must be Document fields, not a wrapper.
	if got["documentId"] != "doc1" {
		t.Fatalf("expected documentId=doc1, got: %v", got["documentId"])
	}
	// The whole point of raw: body.content must be present (info -j drops it).
	body, ok := got["body"].(map[string]any)
	if !ok {
		t.Fatalf("expected body object in output, got: %v", got["body"])
	}
	if _, ok := body["content"]; !ok {
		t.Fatalf("expected body.content in raw output")
	}
}

func TestDocsRaw_APIError(t *testing.T) {
	srv := newDocsRawTestServer(t, http.StatusInternalServerError, nil)
	defer srv.Close()
	installMockDocsService(t, srv)

	ctx := rawTestContext(t)
	flags := &RootFlags{Account: "a@b.com"}

	_ = captureStdout(t, func() {
		cmd := &DocsRawCmd{}
		err := runKong(t, cmd, []string{"doc1"}, ctx, flags)
		if err == nil {
			t.Fatalf("expected error on 500, got nil")
		}
	})
}

func TestDocsRaw_NotFound(t *testing.T) {
	srv := newDocsRawTestServer(t, http.StatusNotFound, nil)
	defer srv.Close()
	installMockDocsService(t, srv)

	ctx := rawTestContext(t)
	flags := &RootFlags{Account: "a@b.com"}

	_ = captureStdout(t, func() {
		cmd := &DocsRawCmd{}
		err := runKong(t, cmd, []string{"doc1"}, ctx, flags)
		if err == nil {
			t.Fatalf("expected error on 404")
		}
		if !strings.Contains(err.Error(), "not found") {
			t.Fatalf("expected 'not found' in error, got: %v", err)
		}
	})
}

func TestDocsRaw_EmptyDocID(t *testing.T) {
	ctx := rawTestContext(t)
	flags := &RootFlags{Account: "a@b.com"}
	err := (&DocsRawCmd{}).Run(ctx, flags)
	if err == nil {
		t.Fatalf("expected error on empty docId")
	}
}