Releases: domainaware/parsedmarc
10.1.1
10.1.0
Changes
- Graceful shutdown on SIGTERM/SIGINT. Previously SIGTERM (sent by systemctl stop, docker stop, and Kubernetes pod termination) killed the process mid-batch, tearing output writes and silently dropping Kafka producer buffers. parsedmarc now sets a shutdown flag that is polled at safe boundaries: the one-shot CLI checks it between batches, and the watcher passes it as config_reloading so the mailbox backend (including the IMAP IDLE loop) returns once the current batch is processed. Either way the current batch and its output writes finish before the process exits. Ctrl-C is a double-tap: the first press is graceful, the second short-circuits to os._exit(130). This requires the config_reloading-aware IMAP IDLE loop from the pinned mailsuite fork; with a stock mailsuite the IDLE watcher cannot be interrupted cleanly. (PR #794)
  - Output clients are now closed on every exit path via atexit plus a trailing close at the end of _main(), fixing a long-standing leak where one-shot CLI runs and graceful shutdowns never flushed Kafka / closed Elasticsearch / S3 / etc. clients.
  - Example systemd unit in docs/source/usage.md now sets KillSignal=SIGTERM and TimeoutStopSec=60 so systemd waits long enough for the watcher to drain (keep it above mailbox_check_timeout).
- Switch the Kafka client dependency from kafka-python-ng back to kafka-python>=2.3.2 (#795). kafka-python-ng was a fork created while kafka-python was unmaintained; upstream kafka-python is active again, and the now-archived fork is vulnerable to CVE-2026-10142 and CVE-2026-10143, both fixed in kafka-python 2.3.2. Both packages install the same kafka module, so if you are upgrading an existing environment in place with pip, run pip uninstall kafka-python-ng before upgrading parsedmarc so the two distributions don't conflict with each other's files.
  - parsedmarc is compatible with both kafka-python 2.3.2+ and 3.x: kafka-python 3.0 removed the NoBrokersAvailable exception (a failed bootstrap now raises KafkaTimeoutError), and parsedmarc handles whichever the installed version provides.
- The whole codebase (library and tests) now passes pyright with zero errors and warnings, and CI enforces this (plus ruff format --check) on every push and pull request. Pyright is configured in pyproject.toml ([tool.pyright]) and pinned in the [build] extra. The fixes are annotation-level only (Optional parameters, TypedDict-aware signatures on the syslog/GELF save methods, TYPE_CHECKING-aware optional imports for psycopg and the kafka-python 2.x/3.x bootstrap-error fallback); runtime behavior is unchanged apart from the SyslogClient fix below. Builds on the class-body alias declarations from #797.
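The shutdown behavior described above (a flag polled between batches, with a second Ctrl-C exiting immediately) can be sketched roughly as follows. This is a minimal illustration, not parsedmarc's actual code: the names shutdown_requested, handle_signal, process_batches, and install_handlers are hypothetical.

```python
import os
import signal
import threading

# Hypothetical names for illustration; parsedmarc's internals may differ.
shutdown_requested = threading.Event()


def handle_signal(signum, frame):
    """First signal requests a graceful stop; a second SIGINT exits at once."""
    if signum == signal.SIGINT and shutdown_requested.is_set():
        os._exit(130)  # second Ctrl-C: skip cleanup and exit immediately
    shutdown_requested.set()


def process_batches(batches, handle_batch):
    """Run handle_batch on each batch, checking the flag only between batches."""
    done = 0
    for batch in batches:
        if shutdown_requested.is_set():
            break  # safe boundary: the previous batch finished completely
        handle_batch(batch)
        done += 1
    return done


def install_handlers():
    """Route SIGTERM and SIGINT to the shared handler (call from the main thread)."""
    signal.signal(signal.SIGTERM, handle_signal)
    signal.signal(signal.SIGINT, handle_signal)
```

Because the flag is only read at the top of the loop, a signal that arrives while a batch is being handled lets that batch and its writes complete before the loop stops.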
Bug fixes
- SyslogClient: constructing a TCP/TLS client with retry_attempts < 1 now raises ValueError instead of silently skipping the connection loop and registering a broken (None) log handler.
10.0.4
Docker
- PostgreSQL backend in the prebuilt image (#792): the image now installs parsedmarc[postgresql], so the optional PostgreSQL output backend (psycopg) works out of the box in the container without a separate pip install.
- arm64 images (#789): the release workflow now builds and publishes a multi-arch manifest for linux/amd64 and linux/arm64, so ghcr.io/domainaware/parsedmarc runs natively on 64-bit ARM hosts such as a Raspberry Pi. All compiled dependencies (psycopg[binary], lxml, maxminddb, cryptography) ship prebuilt aarch64 wheels, so the build adds no source-compilation step.
10.0.3
This release also includes the changes from 10.0.1 and 10.0.2, which were never published as separate GitHub releases.
10.0.3
Bug fixes
- Fix Reply-To (and Delivered-To) addresses being dropped from failure-report samples. parse_email() looked up mailparser's underscored reply_to / delivered_to keys, but mail_json names those headers reply-to / delivered-to, so the lookup always missed and parsed_sample["reply_to"] was always [] regardless of the message. Failure samples now carry their parsed Reply-To addresses through to JSON/CSV output and the Elasticsearch/OpenSearch nested sample.reply_to field.
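The key mismatch behind this bug is easy to reproduce with a plain dictionary. The sketch below uses a simplified, hypothetical stand-in for the parsed header mapping (the real mail_json structure may carry more fields) and a hypothetical helper, get_addresses.

```python
# Hypothetical, simplified stand-in for a parsed message's header mapping.
mail_json = {
    "from": [["Example Sender", "sender@example.com"]],
    "reply-to": [["Support", "support@example.com"]],
}


def get_addresses(headers, key):
    """Return the [name, address] pairs stored under key, or [] if absent."""
    return headers.get(key, [])


# The underscored key never exists, so the old lookup always came back empty.
underscored = get_addresses(mail_json, "reply_to")
# The hyphenated key is the one the header is actually stored under.
hyphenated = get_addresses(mail_json, "reply-to")
```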
Dashboard fixes
All failure (RUF) dashboards now render every displayed address (From, To, Reply-To) the same way: Display Name <addr>, or the bare address when there is no display name. The format is assembled at query time from fields (display_name / address) that already exist on previously-indexed reports, so the panels work on historical data, not only on reports stored after upgrading — with one unavoidable exception: a report's Reply-To only appears for reports parsed by 10.0.3 or later. Earlier versions discarded it at parse time (the bug above), so it is absent from older stored reports; recovering it requires re-parsing the original samples.
- Splunk failure dashboard: the email-samples panel showed empty from and reply_to columns. It renamed parsed_sample.headers.from{}{} / parsed_sample.headers.reply-to{}{}, which are mis-cased (the header keys are From / Reply-To) and array-of-array shaped. The panel now builds from and reply_to with an eval that coalesces display_name <address> down to the bare address when there is no display name. (A multi-address Reply-To falls back to addresses-only: a Splunk multi-value-rendering limitation, not a data-loss one.)
- OpenSearch failure dashboard: the column labelled reply_to aggregated sample.headers.in-reply-to.keyword, which is the In-Reply-To threading header, not the Reply-To address. It now aggregates sample.headers.reply-to.keyword, and that field was added to the dmarc_f* index pattern. To support it, the Elasticsearch/OpenSearch failure writer now flattens the Reply-To header into a display string on sample.headers["reply-to"], mirroring the existing From/To handling. (Re-import the dashboards, or refresh the dmarc_f* index pattern, to pick up the new field.)
- Grafana (Elasticsearch) dashboard: the Failure Samples panel already read sample.headers.reply-to.keyword, but that field previously held the raw [[name, address]] array (split into separate name/address terms). The failure-writer flattening above makes the existing ReplyTo column render a clean Name <address> string; no dashboard change is required.
- Grafana (PostgreSQL) dashboard: the Failure Reports panel did not surface the message From header or Reply-To at all (it showed only the envelope Mail From / Rcpt To). Added From (from sample_from) and Reply To (aggregated from dmarc_failure_sample_address) columns.
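The display rule the dashboards now share (Display Name <addr>, or the bare address when there is no display name) can be expressed as a small function. This is only a sketch of the rule in Python; the dashboards implement it in their own query languages, and format_address is a hypothetical name.

```python
def format_address(display_name, address):
    """Render 'Display Name <address>', or just the address without a name."""
    name = (display_name or "").strip()
    if name:
        return f"{name} <{address}>"
    return address
```

For example, format_address("Support", "support@example.com") gives "Support <support@example.com>", while an empty or missing display name yields only "support@example.com".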
10.0.2
Changes
- Bump the mailsuite requirement to >=2.2.1, which raises the transitive mail-parser floor to >=4.2.1. This pulls in two upstream fixes:
  - mail-parser 4.2.1 stops returning a phantom ('', '') entry for absent address headers, so parsedmarc no longer indexes an empty Cc/Bcc address ({address: ""}) for every DMARC failure-report sample in Elasticsearch/OpenSearch, and no longer emits it in JSON, S3, or Kafka output.
  - mail-parser 4.2.1 also adopts the stricter address parsing that hardens against CVE-2023-27043, a Python email-module flaw where an RFC 2822 header containing a special character has the wrong portion identified as the addr-spec, which can let a crafted address bypass email-domain verification.

  (The Reply-To parsing for failure samples and the failure dashboards are tracked separately.)
10.0.1
Changes
- Bump mailsuite requirement to >=2.2.0 to fix an upstream Reply-To header parsing bug for failure samples
10.0.0
Enhancements
Support for RFC 9989 / RFC 9990 / RFC 9991 reports
Adds parsing support for the final DMARC specification (RFC 9989), the new aggregate-report schema (RFC 9990), and the new failure-report format (RFC 9991), while preserving full RFC 7489 / RFC 6591 backward compatibility.
New aggregate-report fields surfaced from the RFC 9990 XSD — added to types, parsing, CSV output, and Elasticsearch/OpenSearch mappings:
- np: non-existent subdomain policy (none / quarantine / reject)
- testing: testing mode flag (n / y); reports whether the published DMARC record sets t=y. It is a new field, not a replacement for pct; the pct mechanism was removed entirely by RFC 9989 Appendix A.6 with no per-message replacement.
- discovery_method: policy discovery method (psl / treewalk)
- generator: report generator software identifier, in report_metadata
- human_result: optional descriptive text on DKIM/SPF auth results (langAttrString; a possible lang attribute is automatically unwrapped)
- xml_namespace: the XML namespace declared on the <feedback> root, if any. RFC 9990 reports declare urn:ietf:params:xml:ns:dmarc-2.0.
pct is no longer present in RFC 9990's PolicyPublishedType and parses as None when absent. fo is still part of RFC 9990 and is preserved when set; it parses as None only when the reporter omits it.
The parser detects an RFC 9990 report from the dmarc-2.0 XML namespace or the presence of any RFC 9990-only field, so namespaceless reports that follow the RFC 9990 shape still receive RFC 9990-aware validation warnings (missing required DKIM selector, removed-in-RFC-9990 policy-override types forwarded / sampled_out). RFC 9990 also added policy_test_mode to the policy-override enumeration; it is parsed and stored unchanged.
For failure reports (RFC 9991), Identity-Alignment and Auth-Failure are split on CFWS-aware commas (whitespace is stripped from each token, per the RFC 9991 ABNF) and a warning is logged when either REQUIRED field is missing.
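A rough sketch of the comma splitting described above, assuming the field value is already available as a string. It only strips surrounding whitespace from each token and does not handle parenthesized comments, which full CFWS also allows; split_tokens is a hypothetical name, not parsedmarc's function.

```python
def split_tokens(value):
    """Split a comma-separated header value into whitespace-stripped tokens."""
    if value is None:
        return []  # a missing field yields no tokens
    return [token.strip() for token in value.split(",") if token.strip()]
```

With this, a value such as "dkim , spf" yields ["dkim", "spf"] rather than tokens with stray spaces.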
Several elements that became langAttrString in RFC 9990 (extra_contact_info, error, comment, human_result) are now safely unwrapped when the reporter sends them with a lang attribute.
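The unwrapping can be pictured as below, assuming the XML has been converted to dictionaries in which an element with a lang attribute arrives as {"#text": ..., "@lang": ...} and a plain element arrives as a string. The function name unwrap_lang_string is hypothetical.

```python
def unwrap_lang_string(value):
    """Return the text of a langAttrString, whether or not it carried a lang."""
    if isinstance(value, dict):
        # Element had attributes: keep only its text content.
        return value.get("#text")
    return value  # already a plain string (or None when absent)
```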
Backward compatibility with RFC 7489 is maintained.
PostgreSQL storage backend
New optional PostgreSQL output backend as a lighter-weight alternative to Elasticsearch/OpenSearch, configured via a [postgresql] section (host/port/user/password/database or a libpq connection_string), or equivalently through PARSEDMARC_POSTGRESQL_* environment variables and their _FILE Docker-secret variants like every other backend. Tables are created automatically on first run, and the schema captures the RFC 9990 aggregate fields (np, testing, discovery_method, generator, xml_namespace, and per-result human_result). A Grafana dashboard (dashboards/grafana/Grafana-DMARC_Reports-PostgreSQL.json) is included. Aggregate and SMTP-TLS reports are de-duplicated via ON CONFLICT; failure reports via an arrival-date / From / To / Subject check mirroring the Elasticsearch backend.
The backend is opt-in: install it with pip install parsedmarc[postgresql] (it pulls in psycopg). It is not a mandatory dependency because the prebuilt psycopg binary wheels are not available for every platform.
Docker-secret support via _FILE env vars
Any PARSEDMARC_{SECTION}_{KEY} environment variable can now also be supplied via a file by appending _FILE to its name (e.g. PARSEDMARC_IMAP_PASSWORD_FILE=/run/secrets/imap_password). The file's contents (with trailing CR/LF stripped) are used as the value. This is the same convention used by the official Postgres, MariaDB, and Redis container images, so credentials no longer have to appear in plain environment: blocks where docker inspect, container logs, and /proc/<pid>/environ would expose them.
When both the direct var and its _FILE companion are set, the file wins. A missing or unreadable file raises ConfigurationError rather than silently falling back to an empty value. The four pre-existing *_file config keys ([general] log_file, [msgraph] token_file, [gmail_api] credentials_file, [gmail_api] token_file) keep their direct-path semantics; wrap them in a Docker secret by doubling the suffix (PARSEDMARC_GMAIL_API_CREDENTIALS_FILE_FILE).
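The precedence rules above can be sketched as a small resolver. This is an illustrative approximation, not parsedmarc's implementation: resolve_env is a hypothetical helper, and the ConfigurationError class here is a local stand-in for the project's exception.

```python
import os


class ConfigurationError(Exception):
    """Local stand-in for parsedmarc's configuration exception."""


def resolve_env(name, environ=None):
    """Return the value for name, preferring its _FILE companion when set."""
    env = os.environ if environ is None else environ
    file_var = name + "_FILE"
    path = env.get(file_var)
    if path is not None:
        try:
            with open(path, encoding="utf-8") as handle:
                # Only trailing CR/LF is stripped; other whitespace is kept.
                return handle.read().rstrip("\r\n")
        except OSError as exc:
            # Never fall back silently to an empty or direct value.
            raise ConfigurationError(f"cannot read {file_var}={path}: {exc}") from exc
    return env.get(name)
```

Calling resolve_env("PARSEDMARC_IMAP_PASSWORD") with PARSEDMARC_IMAP_PASSWORD_FILE pointing at a readable secret returns the file's contents even if the direct variable is also set.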
Elastic Cloud Serverless compatibility
New [elasticsearch] serverless config flag (env var PARSEDMARC_ELASTICSEARCH_SERVERLESS). Elastic Cloud Serverless manages sharding and replication itself and rejects the number_of_shards / number_of_replicas index settings with HTTP 400 — previously every write into a Serverless project failed at index-creation time. With the flag set, create_indexes strips those two keys from the settings sent to Elasticsearch and passes any other settings (e.g. refresh_interval) through unchanged. Non-Serverless deployments are unaffected.
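In outline, the flag filters the index settings before they are sent. The sketch below shows that filtering on a plain dictionary; index_settings and SERVERLESS_REJECTED are hypothetical names, and the real create_indexes function does more than this.

```python
# Settings that Elastic Cloud Serverless rejects, per the note above.
SERVERLESS_REJECTED = ("number_of_shards", "number_of_replicas")


def index_settings(settings, serverless=False):
    """Return a copy of settings, minus the rejected keys when serverless."""
    if not serverless:
        return dict(settings)
    return {
        key: value
        for key, value in settings.items()
        if key not in SERVERLESS_REJECTED
    }
```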
Bug fixes
- save_smtp_tls_report_to_s3 was completely broken. parsedmarc/s3.py:save_report_to_s3 unconditionally read report["report_metadata"] when assembling S3 object metadata, but SMTP TLS reports are flat per RFC 8460 §4.3 (they have no report_metadata sub-object), and parse_smtp_tls_report_json correctly stores begin_date as the raw ISO-8601 string from the report. The S3 path branch also assumed begin_date was a datetime and did .year / .month / .day on it. The CLI's surrounding try/except silently swallowed the resulting KeyError, so every SMTP-TLS report quietly failed to upload to S3 in production. Both issues are fixed: SMTP-TLS metadata is now built from the flat report fields directly, and the date is normalized via human_timestamp_to_datetime.
- append_json corrupted JSON output files on the second write. The original implementation opened files in "a+" mode, then seek()ed backwards to overwrite the trailing ] with ,\n before appending more elements. Python's docs are explicit: on POSIX, writes in "a" / "a+" mode always go to EOF regardless of seek position. The result was that every second call onto an existing file produced [...]\n],\n[...]-style corrupted output instead of a single merged JSON array. Anyone running parsedmarc in watch mode with JSON output enabled had aggregate.json / failure.json / smtp_tls.json quietly turning into invalid JSON after the first overlap. Replaced with a read-merge-write pattern: load the existing array (if any), append the new elements, rewrite the whole file. append_csv was not affected; it doesn't seek backwards.
- Removed redundant try/except in parsedmarc/webhook.py. save_aggregate_report_to_webhook / save_failure_report_to_webhook / save_smtp_tls_report_to_webhook each wrapped self._send_to_webhook(...) in a try/except, but _send_to_webhook already catches every Exception itself, so the outer except blocks were unreachable dead code.
- Report files whose names contain glob metacharacters were silently skipped. The CLI expanded every file argument with glob() (parsedmarc/cli.py), which interprets [, ], *, and ? as pattern syntax (see the glob docs). A literal path such as [Netease DMARC Failure Report] Rent Reminder.eml, the bracketed shape many providers use for emailed failure reports, was treated as a character class, matched nothing, and was dropped before reaching the parser, with no error. File arguments that already exist on disk are now taken literally; only non-existent paths are treated as glob patterns, so shell-style wildcards (samples/*.xml) still expand.
- OpenSearch Dashboards reported a mapping conflict on the aggregate index pattern's org_email field. The shipped dashboards/opensearch/opensearch_dashboards.ndjson froze a cached field-list snapshot in which org_email was a text/object conflict, alongside leftover org_email.#text and org_email.#text.keyword subfields, artifacts of a cluster that had once indexed a langAttrString email dict ({"#text": …, "@lang": …}) before the parser unwrapped it. org_email is mapped as Text() and the parser now unwraps a dict email to a plain string, so live data is consistent; cleared the stale conflict and the two artifact subfields from the index pattern, leaving org_email (text) and org_email.keyword so importers no longer see the warning.
- dashboard-dev-bootstrap.sh imported the OpenSearch Dashboards saved objects into the wrong tenant. The script sent securitytenant: global_tenant, but the OpenSearch security plugin reads that header as a tenant name, and global_tenant is a sample custom tenant shipped in the security demo config, not the shared Global tenant, whose token is the literal global. The import succeeded into a separate global_tenant tenant (its own .kibana_<hash>_globaltenant_1 index), so the dashboards were invisible to anyone viewing the Global tenant in OpenSearch Dashboards. Changed the default OSD_TENANT to global. (An empty/omitted securitytenant header is not equivalent: it falls back to the user's configured default tenant, not Global.) This affects the contributor dev stack only, not the shipped dashboards.
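Two of the fixes above lend themselves to a short sketch: the read-merge-write replacement for append_json, and taking existing paths literally before falling back to glob expansion. Both functions below are simplified approximations written for illustration; expand_file_args is a hypothetical name, and the real code may differ in signatures and error handling.

```python
import glob
import json
import os


def append_json(path, new_items):
    """Merge new_items into the JSON array stored at path and rewrite the file."""
    existing = []
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, encoding="utf-8") as handle:
            existing = json.load(handle)
    merged = existing + list(new_items)
    # Rewrite the whole file instead of seeking in append mode.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(merged, handle, indent=2)
    return merged


def expand_file_args(args):
    """Take existing paths literally; treat only non-existent ones as patterns."""
    paths = []
    for arg in args:
        if os.path.exists(arg):
            paths.append(arg)  # brackets and wildcards in the name are literal
        else:
            paths.extend(sorted(glob.glob(arg)))
    return paths
```

Rewriting the full file costs more I/O as the output grows, but it keeps the file a single valid JSON array after every write.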
Breaking changes
Forensic reports have been renamed to failure reports
Forensic reports have been renamed to failure reports throughout the project to reflect the proper naming of the reports since RFC 7489.
- Core (types.py, __init__.py): ForensicReport → FailureReport, parse_forensic_report → parse_failure_report, report type "failure"
- Output modules: elastic.py, opensearch.py, splunk.py, kafkaclient.py, syslog.py, gelf.py, webhook.py, loganalytics.py, s3.py
- CLI (cli.py): args, config keys, index names (`dmarc_fail...
9.11.2
Changes
- base_reverse_dns_types.txt removed; sortlists.py now reads the authoritative type list directly from parsedmarc/resources/maps/README.md. The README's industry list (between new <!-- types-list:start --> / <!-- types-list:end --> HTML-comment markers) is now the single source of truth, eliminating the drift risk between the data file and the documented list. Before validating the map, sortlists.py also normalizes the README block in place: trims whitespace, deduplicates case-insensitively (errors on case-conflicting entries), and sorts entries alphabetically, so adding a new type is just inserting a - New Type line anywhere inside the markers. Also fixes a pre-existing typo in the precedence rules where rule 4 said Web Hosting but the canonical type used in 4,176 map rows is Web Host.
- Maintenance tooling no longer ships in the wheel/sdist. The Python scripts under parsedmarc/resources/maps/ (collect_domain_info.py, classify_unknown_domains.py, detect_psl_overrides.py, detect_rebrands.py, sortlists.py, plus the previously-already-excluded find_bad_utf8.py and find_unknown_base_reverse_dns.py) are maintainer-only batch tooling, not parsedmarc runtime code. They have always been in the repository for convenience but were unnecessarily included in distributions, pulling reviewer attention and contributing nothing to end-user functionality. The build now excludes any .py file under parsedmarc/resources/maps/ whose name doesn't start with an underscore via a single glob pattern (parsedmarc/resources/maps/[!_]*.py), so future maintainer scripts added to that directory are excluded automatically while __init__.py continues to ship. The directory's __init__.py and the runtime data files (base_reverse_dns_map.csv, known_unknown_base_reverse_dns.txt, psl_overrides.txt) continue to ship; they're loaded at runtime via importlib.resources.files(parsedmarc.resources.maps).
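To see which file names the [!_]*.py pattern selects, Python's fnmatch can stand in for the build tool's matcher. That substitution is an assumption: the build backend's glob semantics may differ in edge cases, and is_excluded is a hypothetical helper that checks only the file-name part of the pattern.

```python
from fnmatch import fnmatchcase

# File-name part of the exclusion pattern quoted above.
PATTERN = "[!_]*.py"


def is_excluded(filename):
    """True when filename is a .py file that does not start with an underscore."""
    return fnmatchcase(filename, PATTERN)
```

Under this matcher, sortlists.py is excluded, while __init__.py and the .csv / .txt data files are not.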
9.11.1
9.11.0
Changes
- Mailbox backends now live in mailsuite>=2.0.0. The IMAPConnection, MSGraphConnection, GmailConnection, MaildirConnection, and MailboxConnection implementations were extracted into mailsuite 2.0.0 so other projects can reuse the same provider-agnostic interface. parsedmarc's parsedmarc.mail package is now a thin re-export of mailsuite.mailbox; existing imports (from parsedmarc.mail import IMAPConnection, etc.) continue to work unchanged. The CLI passes token_cache_name="parsedmarc" and forwards the existing [msgraph] graph_url config knob so cached AuthenticationRecords and tokens carry over without re-prompting.
- MSGraph backend rewritten on msgraph-sdk (kiota-based). mailsuite 2.0 replaces the retired msgraph-core==0.2.2 REST wrapper with the supported msgraph-sdk client. End-user behavior is unchanged.
- Direct dependencies on msgraph-core, imapclient, google-api-core, google-api-python-client, google-auth-httplib2, google-auth-oauthlib, and google-auth removed. They are now installed transitively via mailsuite[gmail,msgraph]>=2.0.0, which is included as a non-optional dependency so Gmail and Microsoft Graph support remain available out of the box.
9.10.3
Fixed
- Bundled OSD aggregate dashboard reported source-row counts as message volume. Pies, tables, and the choropleth aggregated with count instead of sum(message_count), so panels titled "Message volume…", "Reporting organizations", etc. counted distinct sources rather than emails. Bug present since the dashboard shipped in 9.4.0. Line-chart timeseries, SMTP TLS, and forensic panels were already correct.
- Splunk aggregate "Map of message sources by country" widget had the same count-instead-of-sum(message_count) bug.
- Splunk forensic-samples table dropped events with null From / To / Subject because the base search required those fields to exist (field=*). Replaced with a null-tolerant filter pattern.
- Splunk SMTP TLS Failure details panel returned no rows; Splunk doesn't evaluate field>0 against multivalued JSON-array paths at search time. Switched to a presence filter plus post-stats where failed_sessions>0.
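The difference between the two aggregations is easiest to see on a few sample rows. The data below is made up for illustration (documentation-range IP addresses, arbitrary counts), and the helper names are hypothetical; the dashboards express the same idea in their own query languages.

```python
# Made-up aggregate rows: one row per reported source.
rows = [
    {"source_ip": "192.0.2.1", "message_count": 120},
    {"source_ip": "192.0.2.2", "message_count": 3},
    {"source_ip": "198.51.100.7", "message_count": 1},
]


def row_count(rows):
    """What the buggy panels showed: the number of source rows."""
    return len(rows)


def message_volume(rows):
    """What the panels should show: the total number of messages."""
    return sum(row["message_count"] for row in rows)
```

Here row_count(rows) is 3 while message_volume(rows) is 124, which is the gap the corrected panels close.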
Changes
- Aligned the Splunk dashboards with the OSD source-of-truth: new "Message sources by Autonomous System" panel; added missing dkim_aligned column to DKIM details; green/red colors for true / false on alignment pies and the DMARC-passage timechart; forensic dashboard simplified to OSD's two-panel layout (markdown + samples table); policy_type bucket added to SMTP TLS Domains; minor column / title alignments throughout.
Upgrade notes
Action required — re-import the dashboards. Stored saved objects don't auto-update on parsedmarc upgrade.
- OSD: Stack Management → Saved Objects → Import the new dashboards/opensearch/opensearch_dashboards.ndjson. Switch the import mode from the default "Create new objects with unique IDs" to "Check for existing objects" and enable "Automatically overwrite conflicts". The default mode would import the corrected viz under fresh UUIDs and leave the buggy originals in place, so the dashboards would keep rendering the wrong numbers.
- Splunk: paste each XML in dashboards/splunk/ into the corresponding dashboard's Source editor.
9.10.2
Fixed
- MaildirConnection.fetch_message() now marks messages as read after reading them (sets the S flag and moves the file from new/ to cur/), unless --test is in effect. Previously, a message was processed but its on-disk maildir state was unchanged, so an MUA scanning the same maildir kept showing it as unread. Mirrors the existing mark_read=not test pattern used for MSGraphConnection.
- get_ip_address_info() no longer caches weak-fallback attributions (no PTR + no ASN-domain map match → raw as_name used as source_name, source_type left null). get_reverse_dns() swallows every DNSException as None, so a transient PTR lookup failure (timeout, SERVFAIL, socket error) is indistinguishable from a genuine no-PTR case at that layer; caching the weak result would poison the 4-hour cache with a misattribution that persisted even after the PTR became resolvable again. PTR-backed matches and ASN-domain matches (both stable attributions) are still cached as before; only the specific reverse_dns=None AND type=None AND name=as_name state skips the cache write so the next lookup retries.
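The cache-skip condition in the second item can be sketched as a predicate over the lookup result. The dictionary keys follow the state named above (reverse_dns, type, name, as_name); the function names and the plain-dict cache are hypothetical simplifications of whatever get_ip_address_info() actually uses.

```python
def is_weak_fallback(info):
    """True for the no-PTR, no-type result whose name is just the raw as_name."""
    return (
        info.get("reverse_dns") is None
        and info.get("type") is None
        and info.get("name") == info.get("as_name")
    )


def cache_if_stable(cache, ip_address, info):
    """Store info unless it is a weak fallback; return whether it was cached."""
    if is_weak_fallback(info):
        return False  # leave the cache empty so the next lookup retries
    cache[ip_address] = info
    return True
```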