Add scripted-input variant of the database updater#88
Conversation
📝 WalkthroughWalkthroughAdds ChangesScripted Input Variant for GeoIP Updater
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a scripted-input variant (geoipupdate_script.py) as an alternative to the modular input to test whether it runs on all search head cluster members in Splunk Cloud. The modular input is disabled by default, and the scripted input is enabled in inputs.conf. Feedback suggests improving the portability of the scripted input configuration by using a relative path (./bin/geoipupdate_script.py) instead of a hardcoded absolute path, and updating the documentation in CLAUDE.md to reflect this change.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CLAUDE.md`:
- Around line 65-85: The new scripted-input notes conflict with the existing
Search Head Cluster guidance in CLAUDE.md, so update the “Automatic Updates”
section to match the current behavior and remove the claim that run_only_one =
false guarantees every SHC node downloads its own databases. Make the documented
SHC update behavior consistent with the scripted-input experiment description,
and keep the guidance in sync using the relevant CLAUDE.md sections about
automatic updates and the scripted input variant.
In `@tests/geoipupdate_script_test.py`:
- Around line 15-20: The import-time path/module patching in
geoipupdate_script_test.py is leaking the fake solnlib into other tests because
sys.path and sys.modules are mutated during collection. Move the setup into a
scoped fixture that uses monkeypatch.syspath_prepend() and monkeypatch.setitem()
for the mocked modules, then import or reload geoipupdate_script inside that
fixture so each test gets an isolated environment.
- Around line 51-103: The main stdin-handling tests for geoipupdate_script.main
currently verify update and logging behavior but do not assert the wrapper stays
silent on stdout. Update the tests around geoipupdate_script.main to use capsys
and assert no stdout output on both the successful session-key path and the
empty/whitespace stdin paths, while still checking run_database_update and the
logger calls remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: abdf0623-8b4c-4131-b44a-059f2a86ead5
📒 Files selected for processing (5)
CHANGELOG.mdCLAUDE.mdgeoip/package/bin/geoipupdate_script.pygeoip/package/default/inputs.conftests/geoipupdate_script_test.py
| # Add the lib and bin directories to the path for testing | ||
| repo_root = Path(__file__).parent.parent | ||
| lib_dir = repo_root / "geoip" / "package" / "lib" | ||
| bin_dir = repo_root / "geoip" / "package" / "bin" | ||
| sys.path.insert(0, str(lib_dir)) | ||
| sys.path.insert(0, str(bin_dir)) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Keep the import-time mocks scoped to each test.
This module mutates sys.path and sys.modules during collection, so the fake solnlib can leak into unrelated tests after this file is imported. Move the path/module patching into a fixture with monkeypatch.syspath_prepend() / monkeypatch.setitem(...) and import or reload geoipupdate_script inside that fixture instead.
Suggested direction
+import importlib
+import pytest
-
-# Add the lib and bin directories to the path for testing
-repo_root = Path(__file__).parent.parent
-lib_dir = repo_root / "geoip" / "package" / "lib"
-bin_dir = repo_root / "geoip" / "package" / "bin"
-sys.path.insert(0, str(lib_dir))
-sys.path.insert(0, str(bin_dir))
+repo_root = Path(__file__).parent.parent
+lib_dir = repo_root / "geoip" / "package" / "lib"
+bin_dir = repo_root / "geoip" / "package" / "bin"
@@
-mock_solnlib = MagicMock()
-mock_solnlib.conf_manager = mock_conf_manager
-mock_solnlib.soln_exceptions = mock_soln_exceptions
-sys.modules["solnlib"] = mock_solnlib
-sys.modules["solnlib.conf_manager"] = mock_conf_manager
-sys.modules["solnlib.soln_exceptions"] = mock_soln_exceptions
-
-# Now we can import geoipupdate_script since solnlib is mocked
-import geoipupdate_script # noqa: E402 # type: ignore[import-not-found]
+@pytest.fixture
+def geoipupdate_script_module(monkeypatch: MonkeyPatch):
+ mock_solnlib = MagicMock()
+ mock_solnlib.conf_manager = mock_conf_manager
+ mock_solnlib.soln_exceptions = mock_soln_exceptions
+ monkeypatch.syspath_prepend(str(lib_dir))
+ monkeypatch.syspath_prepend(str(bin_dir))
+ monkeypatch.setitem(sys.modules, "solnlib", mock_solnlib)
+ monkeypatch.setitem(sys.modules, "solnlib.conf_manager", mock_conf_manager)
+ monkeypatch.setitem(sys.modules, "solnlib.soln_exceptions", mock_soln_exceptions)
+ import geoipupdate_script
+ return importlib.reload(geoipupdate_script)Also applies to: 40-48
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/geoipupdate_script_test.py` around lines 15 - 20, The import-time
path/module patching in geoipupdate_script_test.py is leaking the fake solnlib
into other tests because sys.path and sys.modules are mutated during collection.
Move the setup into a scoped fixture that uses monkeypatch.syspath_prepend() and
monkeypatch.setitem() for the mocked modules, then import or reload
geoipupdate_script inside that fixture so each test gets an isolated
environment.
On Splunk Cloud Victoria, the geoipupdate_input modular input only runs on one search head cluster member, so other members never download the .mmdb files and the geoip command fails on them. Prior attempts to make the modular input run on every member (run_only_one in inputs.conf.spec and as a scheme argument) did not fix it. In the Splunk community Slack, it was suggested trying a scripted input, which may run on every SHC member. As an easily-revertible experiment, disable the modular ://default instance and add a [script://] stanza pointing at a thin wrapper (geoipupdate_script.py) that reads a session key from stdin (provided via passAuth) and calls the shared run_database_update(). The core update logic in geoipupdate_input.py is unchanged. This is unverified: Splunk's docs confirm the mechanics (passAuth -> stdin session key, python.required honored for [script://]) but do not state that scripted inputs run on every SHC member, so it is pending a customer test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f70a104 to
8b26cd0
Compare
oschwald
left a comment
There was a problem hiding this comment.
This looks fine to me for a release to test out whether it helps. Claude had a few comments, but they seem intentional or inconsequential.
|
|
||
| The `geoipupdate_input` modular input downloads and updates databases automatically: | ||
| - A default input (`geoipupdate_input://default`) is enabled out of the box (no UI, runs in background) | ||
| - A default input (`geoipupdate_input://default`) provides the update logic (no UI, runs in background) |
There was a problem hiding this comment.
Claude Code, on behalf of Greg:
This one-line softening ("provides the update logic") isn't enough — the rest of this Automatic Updates section still reads as present-tense fact while the modular input is now disabled = 1:
- L55: "The
geoipupdate_inputmodular input downloads and updates databases automatically" — it no longer does; the scripted wrapper does. - L60: "Default update interval is 3600 seconds" — the active path now runs at
interval = 60. - L61: "
run_only_one = false... ensures each node in a SHC downloads its own databases" — this directly contradicts the new section's premise (that it only runs on one member, which is why the experiment exists).
Suggest adding a forward-pointer at the top of this section and qualifying these bullets as the modular input's intended behavior when enabled.
There was a problem hiding this comment.
Yeah, we can clean this up more if we keep this. I don't think we'll keep it though.
| # The leading "./" resolves relative to this app's directory, so the path keeps | ||
| # working regardless of the installed app folder name (Splunk's documented form). | ||
| [script://./bin/geoipupdate_script.py] | ||
| interval = 60 |
There was a problem hiding this comment.
Claude Code, on behalf of Greg:
interval = 60 is 60x the documented/modular value (3600s). pygeoipupdate does hash/up-to-date checks so it rarely re-downloads, but this still polls MaxMind every minute per SHC member. If 60 was chosen for fast feedback during the experiment, a one-line comment saying so would keep it from reading as a leftover from local testing — otherwise consider aligning to 3600.
There was a problem hiding this comment.
Oh, this one I actually meant to leave at 3600.
| # For Splunk Free, we apparently must use "admin", otherwise we get no session | ||
| # key when running the input. For Splunk Enterprise (and I think Splunk Cloud), | ||
| # we can use splunk-system-user. | ||
| passAuth = splunk-system-user |
There was a problem hiding this comment.
Claude Code, on behalf of Greg:
This comment describes passAuth = admin but the stanza uses splunk-system-user, and it's doubly hedged ("apparently", "I think"). It documents an out-of-scope, unverified path (the experiment targets Splunk Cloud Victoria) — the kind of comment that rots. Suggest removing it, or rewording to a verified, de-hedged statement if the Free-vs-Enterprise distinction is worth keeping.
There was a problem hiding this comment.
Yeah, I am not sure about exact behaviour unfortunately. The comment describes how I think it works based on slack searches. I think what we have is the best thing to try.
I didn't intend to commit this as 60. I had it 60 for testing.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@geoip/package/bin/geoipupdate_script.py`:
- Around line 24-25: The stdin parsing in geoipupdate_script.py is treating the
full passAuth payload as the session key, so update the input handling to
extract the token from the sessionKey= prefix before calling
run_database_update(). Locate the logic around the raw stdin read and
session_key assignment, and make sure only the token value is kept after
stripping whitespace and removing the sessionKey= marker.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 18dad701-3a70-46a5-bf8b-8632a4c2b2e0
📒 Files selected for processing (5)
CHANGELOG.mdCLAUDE.mdgeoip/package/bin/geoipupdate_script.pygeoip/package/default/inputs.conftests/geoipupdate_script_test.py
| raw = sys.stdin.read() | ||
| session_key = raw.strip() |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🌐 Web query:
Splunk scripted input passAuth stdin format sessionKey= prefix or raw token
💡 Result:
When you enable passAuth in your Splunk scripted input configuration, splunkd provides a session key to your script via stdin [1][2]. The exact format delivered is sessionKey=YOUR_TOKEN_HERE [3]. To use this token in your script, you must read the line from stdin and parse it to remove the sessionKey= prefix [3][1]. Common implementation patterns include: 1. Reading the stream: Since the token is delivered as the first (and often only) line, you can capture it using standard input reading methods [1]. 2. Parsing the prefix: Because the input arrives exactly as sessionKey=, your script should programmatically strip the sessionKey= string to extract the raw token [3]. A simple approach in Python is to use.replace('sessionKey=', '') or string slicing if you are certain of the format [1]. 3. Utilization: Once extracted, this raw session key can be used to authenticate requests to the Splunk REST API [1][2]. Note that this behavior applies to traditional scripted inputs [2]. If you are developing Modular Inputs, the configuration and session key are passed to your script via stdin in XML format, which is parsed differently [4][5][6]. In those cases, the session key is contained within a <session_key> XML tag, which should be parsed using an XML library rather than manual string manipulation [4].
Citations:
- 1: https://community.splunk.com/t5/Splunk-Dev/Scripted-Input-Python-SDK-passAuth-Not-Working/m-p/147796
- 2: https://community.splunk.com/t5/Alerting/Use-an-alert-to-trigger-scripted-input/m-p/98136
- 3: https://community.splunk.com/t5/Security/How-to-get-session-key-auth-token-for-mod-input-script/m-p/561575
- 4: https://help.splunk.com/en/splunk-cloud-platform/developing-views-and-apps-for-splunk-web/10.1.2507/modular-inputs/create-modular-inputs
- 5: https://help.splunk.com/en/splunk-enterprise/developing-views-and-apps-for-splunk-web/9.0/modular-inputs/create-modular-inputs
- 6: https://help.splunk.com/en/splunk-cloud-platform/developing-views-and-apps-for-splunk-web/10.3.2512/modular-inputs/modular-inputs-examples
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the target file and locate related parsing logic.
git ls-files geoip/package/bin/geoipupdate_script.py
wc -l geoip/package/bin/geoipupdate_script.py
cat -n geoip/package/bin/geoipupdate_script.py | sed -n '1,220p'
# Search for any handling of sessionKey/passAuth in the repo.
rg -n "sessionKey|passAuth|stdin" geoip -SRepository: maxmind/geoip-splunk
Length of output: 3098
Parse the sessionKey= prefix from stdin
passAuth sends sessionKey=<token> on stdin for scripted inputs, so raw.strip() passes the literal prefix through to run_database_update() and turns a valid token into an auth failure. Strip the sessionKey= prefix before using the value.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@geoip/package/bin/geoipupdate_script.py` around lines 24 - 25, The stdin
parsing in geoipupdate_script.py is treating the full passAuth payload as the
session key, so update the input handling to extract the token from the
sessionKey= prefix before calling run_database_update(). Locate the logic around
the raw stdin read and session_key assignment, and make sure only the token
value is kept after stripping whitespace and removing the sessionKey= marker.
On Splunk Cloud Victoria, the geoipupdate_input modular input only runs on one search head cluster member, so other members never download the .mmdb files and the geoip command fails on them. Prior attempts to make the modular input run on every member (run_only_one in inputs.conf.spec and as a scheme argument) did not fix it.
In the Splunk community Slack, it was suggested trying a scripted input, which may run on every SHC member. As an easily-revertible experiment, disable the modular ://default instance and add a [script://] stanza pointing at a thin wrapper (geoipupdate_script.py) that reads a session key from stdin (provided via passAuth) and calls the shared run_database_update(). The core update logic in geoipupdate_input.py is unchanged.
This is unverified: Splunk's docs confirm the mechanics (passAuth -> stdin session key, python.required honored for [script://]) but do not state that scripted inputs run on every SHC member, so it is pending a customer test.
Summary by CodeRabbit