Releases: domainaware/parsedmarc

10.1.1

14 Jun 00:48

Changes

  • Require mailsuite>=2.2.2.2 to honor config reloading in the IMAP IDLE loop

10.1.0

13 Jun 01:59
8337b67

Changes

  • Graceful shutdown on SIGTERM / SIGINT. Previously SIGTERM (sent by systemctl stop, docker stop, and Kubernetes pod termination) killed the process mid-batch, tearing output writes and silently dropping Kafka producer buffers. parsedmarc now sets a shutdown flag that is polled at safe boundaries: the one-shot CLI checks it between batches, and the watcher passes it as config_reloading so the mailbox backend — including the IMAP IDLE loop — returns once the current batch is processed. Either way the current batch and its output writes finish before the process exits. Ctrl-C is a double-tap: the first press is graceful, the second short-circuits to os._exit(130). This requires the config_reloading-aware IMAP IDLE loop from the pinned mailsuite fork; with a stock mailsuite the IDLE watcher cannot be interrupted cleanly. (PR #794)
    • Output clients are now closed on every exit path via atexit plus a trailing close at the end of _main(), fixing a long-standing leak where one-shot CLI runs and graceful shutdowns never flushed Kafka / closed Elasticsearch / S3 / etc. clients.
    • Example systemd unit in docs/source/usage.md now sets KillSignal=SIGTERM and TimeoutStopSec=60 so systemd waits long enough for the watcher to drain (keep it above mailbox_check_timeout).
  • Switch the Kafka client dependency from kafka-python-ng back to kafka-python>=2.3.2 (#795). kafka-python-ng was a fork created while kafka-python was unmaintained; upstream kafka-python is active again, and the now-archived fork is vulnerable to CVE-2026-10142 and CVE-2026-10143, both fixed in kafka-python 2.3.2. Both packages install the same kafka module, so if you are upgrading an existing environment in place with pip, run pip uninstall kafka-python-ng before upgrading parsedmarc so the two distributions don't conflict with each other's files.
    • parsedmarc is compatible with both kafka-python 2.3.2+ and 3.x: kafka-python 3.0 removed the NoBrokersAvailable exception (a failed bootstrap now raises KafkaTimeoutError), and parsedmarc handles whichever the installed version provides.
  • The whole codebase (library and tests) now passes pyright with zero errors and warnings, and CI enforces this (plus ruff format --check) on every push and pull request. Pyright is configured in pyproject.toml ([tool.pyright]) and pinned in the [build] extra. The fixes are annotation-level only (Optional parameters, TypedDict-aware signatures on the syslog/GELF save methods, TYPE_CHECKING-aware optional imports for psycopg and the kafka-python 2.x/3.x bootstrap-error fallback) — runtime behavior is unchanged apart from the SyslogClient fix below. Builds on the class-body alias declarations from #797.
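The shutdown-flag behavior described in the first item of this list can be sketched roughly as follows. This is not parsedmarc's code: `ShutdownFlag`, `process_batches`, and `install` are hypothetical names, and the batch "work" is a stand-in. Only the behavior comes from the notes above: the flag is polled between batches, and a second Ctrl-C calls os._exit(130).

```python
import os
import signal
import threading


class ShutdownFlag:
    """Hypothetical helper: records that a shutdown was requested."""

    def __init__(self):
        self._event = threading.Event()

    def handle(self, signum, frame):
        # A second SIGINT (Ctrl-C) while already shutting down exits at once.
        if signum == signal.SIGINT and self._event.is_set():
            os._exit(130)
        self._event.set()

    def __call__(self):
        return self._event.is_set()


def process_batches(batches, should_stop):
    """Process whole batches, checking the flag only between batches."""
    done = []
    for batch in batches:
        if should_stop():
            break
        # Stand-in for parsing a batch and writing its output.
        done.append([item.upper() for item in batch])
    return done


def install(flag):
    """Route SIGTERM and SIGINT to the same handler (main thread only)."""
    signal.signal(signal.SIGTERM, flag.handle)
    signal.signal(signal.SIGINT, flag.handle)
```

Because the flag is only read at the top of the loop, a batch that has already started always runs to completion before the loop exits.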

Bug fixes

  • SyslogClient: constructing a TCP/TLS client with retry_attempts < 1 now raises ValueError instead of silently skipping the connection loop and registering a broken (None) log handler.
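The failure mode is easy to reproduce in isolation. The sketch below is not the SyslogClient source: `connect_with_retries` and its `connect` callable are hypothetical. It only illustrates why a retry loop that never runs would leave the handler as None, and how an up-front check turns that into a ValueError.

```python
def connect_with_retries(connect, retry_attempts):
    """Call connect() up to retry_attempts times and return its result."""
    if retry_attempts < 1:
        # Without this check, range(0) skips the loop entirely and the
        # caller would end up registering None as its handler.
        raise ValueError("retry_attempts must be at least 1")
    last_error = None
    for _ in range(retry_attempts):
        try:
            return connect()
        except OSError as error:
            last_error = error
    raise last_error
```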

10.0.4

08 Jun 21:44
b869235

Docker

  • PostgreSQL backend in the prebuilt image (#792): the image now installs parsedmarc[postgresql], so the optional PostgreSQL output backend (psycopg) works out of the box in the container without a separate pip install.
  • arm64 images (#789): the release workflow now builds and publishes a multi-arch manifest for linux/amd64 and linux/arm64, so ghcr.io/domainaware/parsedmarc runs natively on 64-bit ARM hosts such as a Raspberry Pi. All compiled dependencies (psycopg[binary], lxml, maxminddb, cryptography) ship prebuilt aarch64 wheels, so the build adds no source-compilation step.

10.0.3

24 May 17:56
e104f11

This release also includes the changes from 10.0.1 and 10.0.2, which were never published as separate GitHub releases.

10.0.3

Bug fixes

  • Fix Reply-To (and Delivered-To) addresses being dropped from failure-report samples. parse_email() looked up mailparser's underscored reply_to / delivered_to keys, but mail_json names those headers reply-to / delivered-to, so the lookup always missed and parsed_sample["reply_to"] was always [] regardless of the message. Failure samples now carry their parsed Reply-To addresses through to JSON/CSV output and the Elasticsearch/OpenSearch nested sample.reply_to field.
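The key mismatch can be illustrated with a plain dictionary. This is a simplified sketch, not the parse_email() code: `get_addresses` is a hypothetical helper, and the sample message is invented, shaped as a list of [name, address] pairs.

```python
def get_addresses(mail_json, header):
    """Return the address list for a header, using the hyphenated key."""
    # Header keys are hyphenated ("reply-to"), so normalize underscores.
    key = header.replace("_", "-").lower()
    return mail_json.get(key, [])


# Hypothetical parsed message with a hyphenated header key.
mail_json = {"reply-to": [["Support", "support@example.com"]]}

old_lookup = mail_json.get("reply_to", [])         # underscored key: always misses
new_lookup = get_addresses(mail_json, "reply_to")  # hyphenated key: finds the header
```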

Dashboard fixes

All failure (RUF) dashboards now render every displayed address (From, To, Reply-To) the same way: Display Name <addr>, or the bare address when there is no display name. The format is assembled at query time from fields (display_name / address) that already exist on previously-indexed reports, so the panels work on historical data, not only on reports stored after upgrading — with one unavoidable exception: a report's Reply-To only appears for reports parsed by 10.0.3 or later. Earlier versions discarded it at parse time (the bug above), so it is absent from older stored reports; recovering it requires re-parsing the original samples.

  • Splunk failure dashboard: the email-samples panel showed empty from and reply_to columns — it renamed parsed_sample.headers.from{}{} / parsed_sample.headers.reply-to{}{}, which are mis-cased (the header keys are From / Reply-To) and array-of-array shaped. The panel now builds from and reply_to with an eval that coalesces display_name <address> down to the bare address when there is no display name. (A multi-address Reply-To falls back to addresses-only — a Splunk multi-value-rendering limitation, not a data-loss one.)
  • OpenSearch failure dashboard: the column labelled reply_to aggregated sample.headers.in-reply-to.keyword — the In-Reply-To threading header, not the Reply-To address. It now aggregates sample.headers.reply-to.keyword, and that field was added to the dmarc_f* index pattern. To support it, the Elasticsearch/OpenSearch failure writer now flattens the Reply-To header into a display string on sample.headers["reply-to"], mirroring the existing From / To handling. (Re-import the dashboards, or refresh the dmarc_f* index pattern, to pick up the new field.)
  • Grafana (Elasticsearch) dashboard: the Failure Samples panel already read sample.headers.reply-to.keyword, but that field previously held the raw [[name, address]] array (split into separate name/address terms). The failure-writer flattening above makes the existing ReplyTo column render a clean Name <address> string — no dashboard change required.
  • Grafana (PostgreSQL) dashboard: the Failure Reports panel did not surface the message From header or Reply-To at all (it showed only the envelope Mail From / Rcpt To). Added From (from sample_from) and Reply To (aggregated from dmarc_failure_sample_address) columns.
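The shared rendering rule is small enough to state as code. The dashboards assemble the string in their own query languages; the Python below is only an illustration of the rule, and `format_address` is a hypothetical name.

```python
def format_address(display_name, address):
    """Render 'Display Name <addr>', or the bare address without a name."""
    name = (display_name or "").strip()
    return f"{name} <{address}>" if name else address
```

For example, `format_address("Jane Doe", "jane@example.com")` yields `Jane Doe <jane@example.com>`, while an empty or missing display name yields just `jane@example.com`.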

10.0.2

Changes

  • Bump the mailsuite requirement to >=2.2.1, which raises the transitive mail-parser floor to >=4.2.1. This pulls in two upstream fixes:

    • mail-parser 4.2.1 stops returning a phantom ('', '') entry for absent address headers, so parsedmarc no longer indexes an empty Cc/Bcc address ({address: ""}) for every DMARC failure-report sample in Elasticsearch/OpenSearch — and no longer emits it in JSON, S3, or Kafka output.
    • mail-parser 4.2.1 also adopts the stricter address parsing that hardens against CVE-2023-27043 — a Python email-module flaw where an RFC 2822 header containing a special character has the wrong portion identified as the addr-spec, which can let a crafted address bypass email-domain verification.

    (The Reply-To parsing for failure samples and the failure dashboards are tracked separately.)

10.0.1

Changes

  • Bump mailsuite requirement to >=2.2.0 to fix an upstream Reply-To header parsing bug for failure samples

10.0.0

21 May 21:20

Enhancements

Support for RFC 9989 / RFC 9990 / RFC 9991 reports

Adds parsing support for the final DMARC specification (RFC 9989), the new aggregate-report schema (RFC 9990), and the new failure-report format (RFC 9991), while preserving full RFC 7489 / RFC 6591 backward compatibility.

New aggregate-report fields surfaced from the RFC 9990 XSD — added to types, parsing, CSV output, and Elasticsearch/OpenSearch mappings:

  • np — non-existent subdomain policy (none/quarantine/reject)
  • testing — testing mode flag (n/y); reports whether the published DMARC record sets t=y. It is a new field, not a replacement for pct; the pct mechanism was removed entirely by RFC 9989 Appendix A.6 with no per-message replacement.
  • discovery_method — policy discovery method (psl/treewalk)
  • generator — report generator software identifier, in report_metadata
  • human_result — optional descriptive text on DKIM/SPF auth results (langAttrString; a possible lang attribute is automatically unwrapped)
  • xml_namespace — the XML namespace declared on the <feedback> root, if any. RFC 9990 reports declare urn:ietf:params:xml:ns:dmarc-2.0.

pct is no longer present in RFC 9990's PolicyPublishedType and parses as None when absent. fo is still part of RFC 9990 and is preserved when set; it parses as None only when the reporter omits it.

The parser detects an RFC 9990 report from the dmarc-2.0 XML namespace or the presence of any RFC 9990-only field, so namespaceless reports that follow the RFC 9990 shape still receive RFC 9990-aware validation warnings (missing required DKIM selector, removed-in-RFC-9990 policy-override types forwarded / sampled_out). RFC 9990 also added policy_test_mode to the policy-override enumeration; it is parsed and stored unchanged.
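A rough sketch of this two-way detection, using only the standard library. It is not the parser's implementation: `looks_like_rfc9990` is a hypothetical name, and the set of "RFC 9990-only" element names is an assumption taken from the field list above (the real XML element names may differ).

```python
import xml.etree.ElementTree as ET

DMARC_2_NS = "urn:ietf:params:xml:ns:dmarc-2.0"

# Assumption: element names that only appear in RFC 9990 reports.
RFC9990_ONLY_FIELDS = {"np", "testing", "discovery_method", "generator"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def looks_like_rfc9990(xml_text):
    """True if the root declares dmarc-2.0 or any RFC 9990-only field exists."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{" + DMARC_2_NS + "}"):
        return True
    return any(local_name(el.tag) in RFC9990_ONLY_FIELDS for el in root.iter())
```

Checking for fields as well as the namespace is what lets a namespaceless report that carries, say, an `np` element still be treated as RFC 9990.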

For failure reports (RFC 9991), Identity-Alignment and Auth-Failure are split on CFWS-aware commas (whitespace is stripped from each token, per the RFC 9991 ABNF) and a warning is logged when either REQUIRED field is missing.
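A simplified sketch of the splitting and the required-field check. `split_tokens` and `missing_required` are hypothetical names, and this version only strips whitespace (including folded line breaks); full CFWS also permits parenthesized comments, which are not handled here.

```python
REQUIRED_FIELDS = ("Identity-Alignment", "Auth-Failure")


def split_tokens(value):
    """Split a comma-separated field value, stripping whitespace per token."""
    return [token.strip() for token in value.split(",") if token.strip()]


def missing_required(fields):
    """Return the REQUIRED field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```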

Several elements that became langAttrString in RFC 9990 (extra_contact_info, error, comment, human_result) are now safely unwrapped when the reporter sends them with a lang attribute.
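As an illustration, assuming the XML-to-dict step represents an element with a lang attribute as `{"#text": ..., "@lang": ...}` (the shape mentioned for org_email later in these notes), unwrapping could look like this. `unwrap_lang_string` is a hypothetical name.

```python
def unwrap_lang_string(value):
    """Return the text of a langAttrString, whether wrapped or plain."""
    if isinstance(value, dict):
        # Element carried a lang attribute: keep only its text content.
        return value.get("#text")
    return value
```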

Backward compatibility with RFC 7489 is maintained.

PostgreSQL storage backend

New optional PostgreSQL output backend as a lighter-weight alternative to Elasticsearch/OpenSearch, configured via a [postgresql] section (host/port/user/password/database or a libpq connection_string), or equivalently through PARSEDMARC_POSTGRESQL_* environment variables and their _FILE Docker-secret variants like every other backend. Tables are created automatically on first run, and the schema captures the RFC 9990 aggregate fields (np, testing, discovery_method, generator, xml_namespace, and per-result human_result). A Grafana dashboard (dashboards/grafana/Grafana-DMARC_Reports-PostgreSQL.json) is included. Aggregate and SMTP-TLS reports are de-duplicated via ON CONFLICT; failure reports via an arrival-date / From / To / Subject check mirroring the Elasticsearch backend.

The backend is opt-in: install it with pip install parsedmarc[postgresql] (it pulls in psycopg). It is not a mandatory dependency because the prebuilt psycopg binary wheels are not available for every platform.

Docker-secret support via _FILE env vars

Any PARSEDMARC_{SECTION}_{KEY} environment variable can now also be supplied via a file by appending _FILE to its name (e.g. PARSEDMARC_IMAP_PASSWORD_FILE=/run/secrets/imap_password). The file's contents (with trailing CR/LF stripped) are used as the value. This is the same convention used by the official Postgres, MariaDB, and Redis container images, so credentials no longer have to appear in plain environment: blocks where docker inspect, container logs, and /proc/<pid>/environ would expose them.

When both the direct var and its _FILE companion are set, the file wins. A missing or unreadable file raises ConfigurationError rather than silently falling back to an empty value. The four pre-existing *_file config keys ([general] log_file, [msgraph] token_file, [gmail_api] credentials_file, [gmail_api] token_file) keep their direct-path semantics; wrap them in a Docker secret by doubling the suffix (PARSEDMARC_GMAIL_API_CREDENTIALS_FILE_FILE).
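The resolution rules can be sketched as follows. This is an illustration rather than parsedmarc's loader: `resolve_env` is a hypothetical function, and `ConfigurationError` is redefined locally as a stand-in so the example runs on its own.

```python
import os


class ConfigurationError(Exception):
    """Local stand-in for parsedmarc's ConfigurationError."""


def resolve_env(name, environ=None):
    """Return the value for name, preferring the NAME_FILE companion."""
    environ = os.environ if environ is None else environ
    file_path = environ.get(name + "_FILE")
    if file_path is not None:
        try:
            with open(file_path, encoding="utf-8") as handle:
                # Trailing CR/LF is stripped; other whitespace is kept.
                return handle.read().rstrip("\r\n")
        except OSError as error:
            # Missing or unreadable file: fail loudly, never fall back.
            raise ConfigurationError(f"cannot read {name}_FILE: {error}") from error
    return environ.get(name)
```

Raising instead of falling back to the direct variable means a mistyped secret path fails at startup rather than surfacing later as an authentication error.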

Elastic Cloud Serverless compatibility

New [elasticsearch] serverless config flag (env var PARSEDMARC_ELASTICSEARCH_SERVERLESS). Elastic Cloud Serverless manages sharding and replication itself and rejects the number_of_shards / number_of_replicas index settings with HTTP 400 — previously every write into a Serverless project failed at index-creation time. With the flag set, create_indexes strips those two keys from the settings sent to Elasticsearch and passes any other settings (e.g. refresh_interval) through unchanged. Non-Serverless deployments are unaffected.
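The filtering step amounts to dropping two keys. A minimal sketch, with `index_settings` as a hypothetical function name (the actual logic lives in create_indexes):

```python
# Settings that Elastic Cloud Serverless rejects, per the notes above.
SERVERLESS_REJECTED = ("number_of_shards", "number_of_replicas")


def index_settings(settings, serverless=False):
    """Return the index settings to send, minus keys Serverless rejects."""
    if not serverless:
        return dict(settings)
    return {k: v for k, v in settings.items() if k not in SERVERLESS_REJECTED}
```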

Bug fixes

  • save_smtp_tls_report_to_s3 was completely broken. parsedmarc/s3.py:save_report_to_s3 unconditionally read report["report_metadata"] when assembling S3 object metadata, but SMTP TLS reports are flat per RFC 8460 §4.3 — they have no report_metadata sub-object — and parse_smtp_tls_report_json correctly stores begin_date as the raw ISO-8601 string from the report. The S3 path branch also assumed begin_date was a datetime and did .year / .month / .day on it. The CLI's surrounding try/except silently swallowed the resulting KeyError, so every SMTP-TLS report quietly failed to upload to S3 in production. Both issues are fixed: SMTP-TLS metadata is now built from the flat report fields directly, and the date is normalized via human_timestamp_to_datetime.
  • append_json corrupted JSON output files on the second write. The original implementation opened files in "a+" mode, then seek()ed backwards to overwrite the trailing ] with ,\n before appending more elements. Python's docs are explicit: on POSIX, writes in "a"/"a+" mode always go to EOF regardless of seek position. The result was that every second call onto an existing file produced [...]\n],\n[...]-style corrupted output instead of a single merged JSON array. Anyone running parsedmarc in watch mode with JSON output enabled had aggregate.json / failure.json / smtp_tls.json quietly turning into invalid JSON after the first overlap. Replaced with a read-merge-write pattern: load the existing array (if any), append the new elements, rewrite the whole file. append_csv was not affected — it doesn't seek backwards.
  • Removed redundant try/except in parsedmarc/webhook.py. save_aggregate_report_to_webhook / save_failure_report_to_webhook / save_smtp_tls_report_to_webhook each wrapped self._send_to_webhook(...) in a try/except, but _send_to_webhook already catches every Exception itself, so the outer except blocks were unreachable dead code.
  • Report files whose names contain glob metacharacters were silently skipped. The CLI expanded every file argument with glob() (parsedmarc/cli.py), which interprets [, ], *, and ? as pattern syntax (see the glob docs). A literal path such as [Netease DMARC Failure Report] Rent Reminder.eml — the bracketed shape many providers use for emailed failure reports — was treated as a character class, matched nothing, and was dropped before reaching the parser, with no error. File arguments that already exist on disk are now taken literally; only non-existent paths are treated as glob patterns, so shell-style wildcards (samples/*.xml) still expand.
  • OpenSearch Dashboards reported a mapping conflict on the aggregate index pattern's org_email field. The shipped dashboards/opensearch/opensearch_dashboards.ndjson froze a cached field-list snapshot in which org_email was a text / object conflict, alongside leftover org_email.#text and org_email.#text.keyword subfields — artifacts of a cluster that had once indexed a langAttrString email dict ({"#text": …, "@lang": …}) before the parser unwrapped it. org_email is mapped as Text() and the parser now unwraps a dict email to a plain string, so live data is consistent; cleared the stale conflict and the two artifact subfields from the index pattern, leaving org_email (text) and org_email.keyword so importers no longer see the warning.
  • dashboard-dev-bootstrap.sh imported the OpenSearch Dashboards saved objects into the wrong tenant. The script sent securitytenant: global_tenant, but the OpenSearch security plugin reads that header as a tenant name, and global_tenant is a sample custom tenant shipped in the security demo config — not the shared Global tenant, whose token is the literal global. The import succeeded into a separate global_tenant tenant (its own .kibana_<hash>_globaltenant_1 index), so the dashboards were invisible to anyone viewing the Global tenant in OpenSearch Dashboards. Changed the default OSD_TENANT to global. (An empty/omitted securitytenant header is not equivalent — it falls back to the user's configured default tenant, not Global.) This affects the contributor dev stack only, not the shipped dashboards.
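The read-merge-write pattern adopted for the append_json fix can be sketched like this. It is an illustration of the pattern, not the project's function: `merge_into_json_array` is a hypothetical name, and error handling for a corrupt existing file is omitted.

```python
import json
import os


def merge_into_json_array(path, new_items):
    """Merge new_items into the JSON array stored at path, rewriting the file."""
    existing = []
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, encoding="utf-8") as handle:
            existing = json.load(handle)
    existing.extend(new_items)
    # "w" truncates and rewrites, so the file is always one valid array.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(existing, handle, indent=2)
```

The cost is rewriting the whole file on every call, in exchange for output that is always a single well-formed array regardless of how the platform treats seeks in append mode.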

Breaking changes

Forensic reports have been renamed to failure reports

Forensic reports have been renamed to failure reports throughout the project to reflect the proper naming of the reports since RFC 7489.

  • Core: types.py, __init__.py: ForensicReport → FailureReport, parse_forensic_report → parse_failure_report, report type "failure"
  • Output modules: elastic.py, opensearch.py, splunk.py, kafkaclient.py, syslog.py, gelf.py, webhook.py, loganalytics.py, s3.py
  • CLI: cli.py — args, config keys, index names (`dmarc_fail...

9.11.2

08 May 16:44
ff6f75d

Changes

  • base_reverse_dns_types.txt removed; sortlists.py now reads the authoritative type list directly from parsedmarc/resources/maps/README.md. The README's industry list (between new <!-- types-list:start --> / <!-- types-list:end --> HTML-comment markers) is now the single source of truth, eliminating the drift risk between the data file and the documented list. Before validating the map, sortlists.py also normalizes the README block in place: trims whitespace, deduplicates case-insensitively (errors on case-conflicting entries), and sorts entries alphabetically — so adding a new type is just inserting a - New Type line anywhere inside the markers. Also fixes a pre-existing typo in the precedence rules where rule 4 said Web Hosting but the canonical type used in 4,176 map rows is Web Host.
  • Maintenance tooling no longer ships in the wheel/sdist. The Python scripts under parsedmarc/resources/maps/ (collect_domain_info.py, classify_unknown_domains.py, detect_psl_overrides.py, detect_rebrands.py, sortlists.py, plus the already-excluded find_bad_utf8.py and find_unknown_base_reverse_dns.py) are maintainer-only batch tooling, not parsedmarc runtime code. They have always lived in the repository for convenience, but shipping them in distributions drew reviewer attention and added nothing for end users. The build now excludes any .py file under parsedmarc/resources/maps/ whose name doesn't start with an underscore via a single glob pattern (parsedmarc/resources/maps/[!_]*.py), so future maintainer scripts added to that directory are excluded automatically. The directory's __init__.py and the runtime data files (base_reverse_dns_map.csv, known_unknown_base_reverse_dns.txt, psl_overrides.txt) continue to ship; they are loaded at runtime via importlib.resources.files(parsedmarc.resources.maps).
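The README normalization described in the first item (trim, de-duplicate, reject case conflicts, sort) might look roughly like the sketch below. It is not the sortlists.py code: `normalize_types` is a hypothetical name, the case-insensitive sort order is an assumption, and "Email Provider" in the usage note is sample data. Only the marker strings and the "- Type" line format come from the notes.

```python
START = "<!-- types-list:start -->"
END = "<!-- types-list:end -->"


def normalize_types(readme_text):
    """Return the sorted, de-duplicated '- Type' entries between the markers."""
    block = readme_text.split(START, 1)[1].split(END, 1)[0]
    seen = {}
    for line in block.splitlines():
        entry = line.strip()
        if not entry.startswith("- "):
            continue
        name = entry[2:].strip()
        key = name.casefold()
        if key in seen and seen[key] != name:
            # Same type spelled with different casing: refuse to guess.
            raise ValueError(f"case-conflicting entries: {seen[key]!r} / {name!r}")
        seen[key] = name
    return sorted(seen.values(), key=str.casefold)
```

Given a block containing `- Web Host`, `- Email Provider`, and a second `- Web Host`, this returns `["Email Provider", "Web Host"]`; a block containing both `- Web Host` and `- web host` raises ValueError.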

9.11.1

30 Apr 16:06
397378d

Fixed

  • Bump required mailsuite version to >=2.0.2 to address RuntimeError: Event loop is closed failures in the Microsoft Graph mailbox backend (#742).

9.11.0

28 Apr 05:06
5d816a4

Changes

  • Mailbox backends now live in mailsuite>=2.0.0. The IMAPConnection, MSGraphConnection, GmailConnection, MaildirConnection, and MailboxConnection implementations were extracted into mailsuite 2.0.0 so other projects can reuse the same provider-agnostic interface. parsedmarc's parsedmarc.mail package is now a thin re-export of mailsuite.mailbox; existing imports (from parsedmarc.mail import IMAPConnection, etc.) continue to work unchanged. The CLI passes token_cache_name="parsedmarc" and forwards the existing [msgraph] graph_url config knob so cached AuthenticationRecords and tokens carry over without re-prompting.
  • MSGraph backend rewritten on msgraph-sdk (kiota-based). mailsuite 2.0 replaces the retired msgraph-core==0.2.2 REST wrapper with the supported msgraph-sdk client. End-user behavior is unchanged.
  • Direct dependencies on msgraph-core, imapclient, google-api-core, google-api-python-client, google-auth-httplib2, google-auth-oauthlib, and google-auth removed. They are now installed transitively via mailsuite[gmail,msgraph]>=2.0.0, which is included as a non-optional dependency so Gmail and Microsoft Graph support remain available out of the box.

9.10.3

27 Apr 04:42
826e78c

Fixed

  • Bundled OSD aggregate dashboard reported source-row counts as message volume. Pies, tables, and the choropleth aggregated with count instead of sum(message_count), so panels titled "Message volume…", "Reporting organizations", etc. counted distinct sources rather than emails. Bug present since the dashboard shipped in 9.4.0. Line-chart timeseries, SMTP TLS, and forensic panels were already correct.
  • Splunk aggregate "Map of message sources by country" widget had the same count-instead-of-sum(message_count) bug.
  • Splunk forensic-samples table dropped events with null From/To/Subject because the base search required those fields to exist (field=*). Replaced with a null-tolerant filter pattern.
  • Splunk SMTP TLS Failure details panel returned no rows; Splunk doesn't evaluate field>0 against multivalued JSON-array paths at search time. Switched to a presence filter plus post-stats where failed_sessions>0.
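The difference between the two aggregations in the first two items is easiest to see on a toy dataset. The rows below are invented; only the message_count field name comes from the notes.

```python
# Each row is one reported source; message_count is how many emails it sent.
rows = [
    {"source": "192.0.2.1", "message_count": 40},
    {"source": "192.0.2.2", "message_count": 2},
    {"source": "198.51.100.7", "message_count": 1},
]

row_count = len(rows)                                      # what "count" showed
message_volume = sum(row["message_count"] for row in rows)  # what the panel title promised
```

Here the buggy aggregation reports 3 while the actual message volume is 43, so a single high-volume source was weighted the same as a source that sent one email.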

Changes

  • Aligned the Splunk dashboards with the OSD source-of-truth: new "Message sources by Autonomous System" panel; added missing dkim_aligned column to DKIM details; green/red colors for true/false on alignment pies and the DMARC-passage timechart; forensic dashboard simplified to OSD's two-panel layout (markdown + samples table); policy_type bucket added to SMTP TLS Domains; minor column / title alignments throughout.

Upgrade notes

Action required — re-import the dashboards. Stored saved objects don't auto-update on parsedmarc upgrade.

  • OSD: Stack Management → Saved Objects → Import the new dashboards/opensearch/opensearch_dashboards.ndjson. Switch the import mode from the default "Create new objects with unique IDs" to "Check for existing objects" and enable "Automatically overwrite conflicts". The default mode would import the corrected viz under fresh UUIDs and leave the buggy originals in place, so the dashboards would keep rendering the wrong numbers.
  • Splunk: paste each XML in dashboards/splunk/ into the corresponding dashboard's Source editor.

9.10.2

24 Apr 23:24
342b467

Fixed

  • MaildirConnection.fetch_message() now marks messages as read after reading them (sets the S flag and moves the file from new/ to cur/), unless --test is in effect. Previously, a message was processed but its on-disk maildir state was unchanged, so an MUA scanning the same maildir kept showing it as unread. Mirrors the existing mark_read=not test pattern used for MSGraphConnection.
  • get_ip_address_info() no longer caches weak-fallback attributions (no PTR + no ASN-domain map match → raw as_name used as source_name, source_type left null). get_reverse_dns() swallows every DNSException as None, so a transient PTR lookup failure (timeout, SERVFAIL, socket error) is indistinguishable from a genuine no-PTR case at that layer — caching the weak result would poison the 4-hour cache with a misattribution that persisted even after the PTR became resolvable again. PTR-backed matches and ASN-domain matches (both stable attributions) are still cached as before; only the specific reverse_dns=None AND type=None AND name=as_name state skips the cache write so the next lookup retries.
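The cache-skip condition for get_ip_address_info() can be written as a small predicate. This is a sketch under assumptions: `should_cache` is a hypothetical name, and the dictionary keys follow the shorthand used in the note above (reverse_dns, type, name, as_name) rather than the library's actual result structure.

```python
def should_cache(info):
    """Cache every attribution except the weak as_name fallback."""
    weak_fallback = (
        info.get("reverse_dns") is None
        and info.get("type") is None
        and info.get("name") == info.get("as_name")
    )
    return not weak_fallback
```

Skipping the write only for this one state keeps the cache effective for stable attributions while letting a transient PTR failure be retried on the next lookup.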