Skip to content

Add scripted-input variant of the database updater#88

Merged
oschwald merged 2 commits into
mainfrom
wstorey/scripted-input
Jun 30, 2026
Merged

Add scripted-input variant of the database updater#88
oschwald merged 2 commits into
mainfrom
wstorey/scripted-input

Conversation

@horgh

@horgh horgh commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

On Splunk Cloud Victoria, the geoipupdate_input modular input only runs on one search head cluster member, so other members never download the .mmdb files and the geoip command fails on them. Prior attempts to make the modular input run on every member (run_only_one in inputs.conf.spec and as a scheme argument) did not fix it.

In the Splunk community Slack, it was suggested trying a scripted input, which may run on every SHC member. As an easily-revertible experiment, disable the modular ://default instance and add a [script://] stanza pointing at a thin wrapper (geoipupdate_script.py) that reads a session key from stdin (provided via passAuth) and calls the shared run_database_update(). The core update logic in geoipupdate_input.py is unchanged.

This is unverified: Splunk's docs confirm the mechanics (passAuth -> stdin session key, python.required honored for [script://]) but do not state that scripted inputs run on every SHC member, so it is pending a customer test.

Summary by CodeRabbit

  • New Features
    • Added an alternate hourly update mechanism for GeoIP database updates in Splunk Cloud using a scripted-input approach.
    • Updated configuration to support cluster-friendly execution while keeping updates functional.
  • Bug Fixes
    • Disabled the default hourly update input by default during validation.
    • Improved behavior and logging when the required session input is missing or blank.
  • Documentation
    • Updated “Automatic Updates” guidance and added instructions for switching/reverting update modes.
  • Tests
    • Added coverage for scripted-input session handling and update invocation behavior.

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds geoipupdate_script.py, a scripted-input wrapper that reads a session key from stdin and calls run_database_update(). The modular input default instance is disabled in inputs.conf and replaced with a script:// stanza. Tests and docs were updated for the new path.

Changes

Scripted Input Variant for GeoIP Updater

Layer / File(s) Summary
Scripted entry point and inputs wiring
geoip/package/bin/geoipupdate_script.py, geoip/package/default/inputs.conf
The new script reads a session key from stdin, logs and exits on empty input, calls run_database_update(session_key) on valid input, and is wired in inputs.conf alongside the disabled default modular input.
stdin handling tests
tests/geoipupdate_script_test.py
The test module sets up imports for geoipupdate_script and verifies the valid, empty, and whitespace stdin cases with mocked logging and update calls.
Docs and changelog updates
CLAUDE.md, CHANGELOG.md
CLAUDE.md updates the automatic update description and adds the scripted-input variant section; CHANGELOG.md adds the unreleased 1.1.3 entry.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • oschwald

Poem

🐇 I hop through stdin, key in tow,
A script now starts the update flow.
One stanza sleeps, one stanza sings,
And every search head does its things.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a scripted-input variant of the database updater.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch wstorey/scripted-input

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a scripted-input variant (geoipupdate_script.py) as an alternative to the modular input to test whether it runs on all search head cluster members in Splunk Cloud. The modular input is disabled by default, and the scripted input is enabled in inputs.conf. Feedback suggests improving the portability of the scripted input configuration by using a relative path (./bin/geoipupdate_script.py) instead of a hardcoded absolute path, and updating the documentation in CLAUDE.md to reflect this change.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread geoip/package/default/inputs.conf Outdated
Comment thread CLAUDE.md Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Around line 65-85: The new scripted-input notes conflict with the existing
Search Head Cluster guidance in CLAUDE.md, so update the “Automatic Updates”
section to match the current behavior and remove the claim that run_only_one =
false guarantees every SHC node downloads its own databases. Make the documented
SHC update behavior consistent with the scripted-input experiment description,
and keep the guidance in sync using the relevant CLAUDE.md sections about
automatic updates and the scripted input variant.

In `@tests/geoipupdate_script_test.py`:
- Around line 15-20: The import-time path/module patching in
geoipupdate_script_test.py is leaking the fake solnlib into other tests because
sys.path and sys.modules are mutated during collection. Move the setup into a
scoped fixture that uses monkeypatch.syspath_prepend() and monkeypatch.setitem()
for the mocked modules, then import or reload geoipupdate_script inside that
fixture so each test gets an isolated environment.
- Around line 51-103: The main stdin-handling tests for geoipupdate_script.main
currently verify update and logging behavior but do not assert the wrapper stays
silent on stdout. Update the tests around geoipupdate_script.main to use capsys
and assert no stdout output on both the successful session-key path and the
empty/whitespace stdin paths, while still checking run_database_update and the
logger calls remain unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: abdf0623-8b4c-4131-b44a-059f2a86ead5

📥 Commits

Reviewing files that changed from the base of the PR and between 2a42f85 and f70a104.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • CLAUDE.md
  • geoip/package/bin/geoipupdate_script.py
  • geoip/package/default/inputs.conf
  • tests/geoipupdate_script_test.py

Comment thread CLAUDE.md
Comment on lines +15 to +20
# Add the lib and bin directories to the path for testing
repo_root = Path(__file__).parent.parent
lib_dir = repo_root / "geoip" / "package" / "lib"
bin_dir = repo_root / "geoip" / "package" / "bin"
sys.path.insert(0, str(lib_dir))
sys.path.insert(0, str(bin_dir))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Keep the import-time mocks scoped to each test.

This module mutates sys.path and sys.modules during collection, so the fake solnlib can leak into unrelated tests after this file is imported. Move the path/module patching into a fixture with monkeypatch.syspath_prepend() / monkeypatch.setitem(...) and import or reload geoipupdate_script inside that fixture instead.

Suggested direction
+import importlib
+import pytest
-
-# Add the lib and bin directories to the path for testing
-repo_root = Path(__file__).parent.parent
-lib_dir = repo_root / "geoip" / "package" / "lib"
-bin_dir = repo_root / "geoip" / "package" / "bin"
-sys.path.insert(0, str(lib_dir))
-sys.path.insert(0, str(bin_dir))
+repo_root = Path(__file__).parent.parent
+lib_dir = repo_root / "geoip" / "package" / "lib"
+bin_dir = repo_root / "geoip" / "package" / "bin"
@@
-mock_solnlib = MagicMock()
-mock_solnlib.conf_manager = mock_conf_manager
-mock_solnlib.soln_exceptions = mock_soln_exceptions
-sys.modules["solnlib"] = mock_solnlib
-sys.modules["solnlib.conf_manager"] = mock_conf_manager
-sys.modules["solnlib.soln_exceptions"] = mock_soln_exceptions
-
-# Now we can import geoipupdate_script since solnlib is mocked
-import geoipupdate_script  # noqa: E402  # type: ignore[import-not-found]
+@pytest.fixture
+def geoipupdate_script_module(monkeypatch: MonkeyPatch):
+    mock_solnlib = MagicMock()
+    mock_solnlib.conf_manager = mock_conf_manager
+    mock_solnlib.soln_exceptions = mock_soln_exceptions
+    monkeypatch.syspath_prepend(str(lib_dir))
+    monkeypatch.syspath_prepend(str(bin_dir))
+    monkeypatch.setitem(sys.modules, "solnlib", mock_solnlib)
+    monkeypatch.setitem(sys.modules, "solnlib.conf_manager", mock_conf_manager)
+    monkeypatch.setitem(sys.modules, "solnlib.soln_exceptions", mock_soln_exceptions)
+    import geoipupdate_script
+    return importlib.reload(geoipupdate_script)

Also applies to: 40-48

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/geoipupdate_script_test.py` around lines 15 - 20, The import-time
path/module patching in geoipupdate_script_test.py is leaking the fake solnlib
into other tests because sys.path and sys.modules are mutated during collection.
Move the setup into a scoped fixture that uses monkeypatch.syspath_prepend() and
monkeypatch.setitem() for the mocked modules, then import or reload
geoipupdate_script inside that fixture so each test gets an isolated
environment.

Comment thread tests/geoipupdate_script_test.py
On Splunk Cloud Victoria, the geoipupdate_input modular input only runs on
one search head cluster member, so other members never download the .mmdb
files and the geoip command fails on them. Prior attempts to make the
modular input run on every member (run_only_one in inputs.conf.spec and as a
scheme argument) did not fix it.

In the Splunk community Slack, it was suggested trying a scripted input,
which may run on every SHC member. As an easily-revertible experiment,
disable the modular ://default instance and add a [script://] stanza
pointing at a thin wrapper (geoipupdate_script.py) that reads a session
key from stdin (provided via passAuth) and calls the shared
run_database_update(). The core update logic in geoipupdate_input.py is
unchanged.

This is unverified: Splunk's docs confirm the mechanics (passAuth -> stdin
session key, python.required honored for [script://]) but do not state that
scripted inputs run on every SHC member, so it is pending a customer test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@horgh horgh force-pushed the wstorey/scripted-input branch from f70a104 to 8b26cd0 Compare June 30, 2026 15:58
oschwald
oschwald previously approved these changes Jun 30, 2026

@oschwald oschwald left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me for a release to test out whether it helps. Claude had a few comments, but they seem intentional or inconsequential.

Comment thread CLAUDE.md

The `geoipupdate_input` modular input downloads and updates databases automatically:
- A default input (`geoipupdate_input://default`) is enabled out of the box (no UI, runs in background)
- A default input (`geoipupdate_input://default`) provides the update logic (no UI, runs in background)

@oschwald oschwald Jun 30, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code, on behalf of Greg:

This one-line softening ("provides the update logic") isn't enough — the rest of this Automatic Updates section still reads as present-tense fact while the modular input is now disabled = 1:

  • L55: "The geoipupdate_input modular input downloads and updates databases automatically" — it no longer does; the scripted wrapper does.
  • L60: "Default update interval is 3600 seconds" — the active path now runs at interval = 60.
  • L61: "run_only_one = false ... ensures each node in a SHC downloads its own databases" — this directly contradicts the new section's premise (that it only runs on one member, which is why the experiment exists).

Suggest adding a forward-pointer at the top of this section and qualifying these bullets as the modular input's intended behavior when enabled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can clean this up more if we keep this. I don't think we'll keep it though.

Comment thread geoip/package/default/inputs.conf Outdated
# The leading "./" resolves relative to this app's directory, so the path keeps
# working regardless of the installed app folder name (Splunk's documented form).
[script://./bin/geoipupdate_script.py]
interval = 60

@oschwald oschwald Jun 30, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code, on behalf of Greg:

interval = 60 is 60x the documented/modular value (3600s). pygeoipupdate does hash/up-to-date checks so it rarely re-downloads, but this still polls MaxMind every minute per SHC member. If 60 was chosen for fast feedback during the experiment, a one-line comment saying so would keep it from reading as a leftover from local testing — otherwise consider aligning to 3600.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this one I actually meant to leave at 3600.

Comment on lines +22 to +25
# For Splunk Free, we apparently must use "admin", otherwise we get no session
# key when running the input. For Splunk Enterprise (and I think Splunk Cloud),
# we can use splunk-system-user.
passAuth = splunk-system-user

@oschwald oschwald Jun 30, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code, on behalf of Greg:

This comment describes passAuth = admin but the stanza uses splunk-system-user, and it's doubly hedged ("apparently", "I think"). It documents an out-of-scope, unverified path (the experiment targets Splunk Cloud Victoria) — the kind of comment that rots. Suggest removing it, or rewording to a verified, de-hedged statement if the Free-vs-Enterprise distinction is worth keeping.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I am not sure about exact behaviour unfortunately. The comment describes how I think it works based on slack searches. I think what we have is the best thing to try.

I didn't intend to commit this as 60. I had it 60 for testing.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@geoip/package/bin/geoipupdate_script.py`:
- Around line 24-25: The stdin parsing in geoipupdate_script.py is treating the
full passAuth payload as the session key, so update the input handling to
extract the token from the sessionKey= prefix before calling
run_database_update(). Locate the logic around the raw stdin read and
session_key assignment, and make sure only the token value is kept after
stripping whitespace and removing the sessionKey= marker.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 18dad701-3a70-46a5-bf8b-8632a4c2b2e0

📥 Commits

Reviewing files that changed from the base of the PR and between f70a104 and 2a6228e.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • CLAUDE.md
  • geoip/package/bin/geoipupdate_script.py
  • geoip/package/default/inputs.conf
  • tests/geoipupdate_script_test.py

Comment on lines +24 to +25
raw = sys.stdin.read()
session_key = raw.strip()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Splunk scripted input passAuth stdin format sessionKey= prefix or raw token

💡 Result:

When you enable passAuth in your Splunk scripted input configuration, splunkd provides a session key to your script via stdin [1][2]. The exact format delivered is sessionKey=YOUR_TOKEN_HERE [3]. To use this token in your script, you must read the line from stdin and parse it to remove the sessionKey= prefix [3][1]. Common implementation patterns include: 1. Reading the stream: Since the token is delivered as the first (and often only) line, you can capture it using standard input reading methods [1]. 2. Parsing the prefix: Because the input arrives exactly as sessionKey=, your script should programmatically strip the sessionKey= string to extract the raw token [3]. A simple approach in Python is to use.replace('sessionKey=', '') or string slicing if you are certain of the format [1]. 3. Utilization: Once extracted, this raw session key can be used to authenticate requests to the Splunk REST API [1][2]. Note that this behavior applies to traditional scripted inputs [2]. If you are developing Modular Inputs, the configuration and session key are passed to your script via stdin in XML format, which is parsed differently [4][5][6]. In those cases, the session key is contained within a <session_key> XML tag, which should be parsed using an XML library rather than manual string manipulation [4].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the target file and locate related parsing logic.
git ls-files geoip/package/bin/geoipupdate_script.py
wc -l geoip/package/bin/geoipupdate_script.py
cat -n geoip/package/bin/geoipupdate_script.py | sed -n '1,220p'

# Search for any handling of sessionKey/passAuth in the repo.
rg -n "sessionKey|passAuth|stdin" geoip -S

Repository: maxmind/geoip-splunk

Length of output: 3098


Parse the sessionKey= prefix from stdin
passAuth sends sessionKey=<token> on stdin for scripted inputs, so raw.strip() passes the literal prefix through to run_database_update() and turns a valid token into an auth failure. Strip the sessionKey= prefix before using the value.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@geoip/package/bin/geoipupdate_script.py` around lines 24 - 25, The stdin
parsing in geoipupdate_script.py is treating the full passAuth payload as the
session key, so update the input handling to extract the token from the
sessionKey= prefix before calling run_database_update(). Locate the logic around
the raw stdin read and session_key assignment, and make sure only the token
value is kept after stripping whitespace and removing the sessionKey= marker.

@oschwald oschwald merged commit d69c898 into main Jun 30, 2026
10 checks passed
@oschwald oschwald deleted the wstorey/scripted-input branch June 30, 2026 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants