datawrapper_normalize.php is a WP-CLI helper script for auditing and repairing broken or outdated Datawrapper embeds stored in WordPress post and page content.
It was created to support content cleanup in cases where Datawrapper embeds were saved in inconsistent formats, where legacy markup still points to old iframe/script patterns, or where published assets need to be restored before the embed can be replaced safely.
The script is WordPress-aware and WPML-aware:
- It runs through wp eval-file.
- It scans only post and page entries.
- It preserves localized permalinks when WPML is active.
- It can work in inventory mode or normalization mode.
WPML compatibility matters because the script writes a CSV report with the permalink for each inspected entry. When WPML is active, the script asks WPML for the language assigned to the current post/page and then resolves the permalink in that language. This means the report points editors to the correct localized URL instead of always falling back to the default-language permalink.
For each WordPress post or page that contains a datawrapper.dwcdn.net reference, the script:
- Extracts candidate Datawrapper chart IDs from the stored HTML.
- Requests fresh embed code from the Datawrapper API.
- Checks whether the public embed asset is available.
- If the asset is missing, tries to republish the chart through the API.
- If the API returns 404, tries a public CDN fallback embed.
- Replaces matching legacy embed markup in the post or page content.
- Removes redundant full.png fallback images when a working embed has been restored.
- Writes a CSV report with the result for every chart it inspects.
- Creates an HTML backup before updating any content in normalize mode.
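The first step above, extracting candidate chart IDs, can be approximated from the shell when spot-checking exported content. This is a minimal sketch, not the script's actual logic: it assumes a chart ID is the five-character alphanumeric path segment that follows datawrapper.dwcdn.net/, and extract_chart_ids is a hypothetical helper name.

```shell
# Hypothetical helper: print unique candidate chart IDs found in HTML on stdin.
# Assumption: an ID is the 5-character alphanumeric segment right after the host.
extract_chart_ids() {
  grep -oE 'datawrapper\.dwcdn\.net/[A-Za-z0-9]{5}' \
    | sed 's#.*/##' \
    | LC_ALL=C sort -u
}

# Example: one iframe embed piped through the helper.
printf '%s\n' '<iframe src="https://datawrapper.dwcdn.net/aB3dE/2/"></iframe>' \
  | extract_chart_ids
```

Because the pattern only looks at the first five characters after the host, any ID it reports should still be confirmed against the Datawrapper API before it is acted on.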
This script is meant for editorial maintenance and migration cleanup.
Typical problems it helps resolve:
- Old Datawrapper embeds saved as legacy iframe markup.
- Broken or malformed embed script URLs in existing content.
- Posts or pages where the public Datawrapper asset is missing even though the chart still exists.
- Mixed embed states where a chart embed and a redundant static full.png image are both present.
Requirements:
- WordPress with WP-CLI available.
- PHP with cURL enabled.
- Access to the WordPress database through the normal WP runtime.
- A Datawrapper API token in the DATAWRAPPER_API_TOKEN environment variable.
The script can still scan content without a token, but it will not be able to fetch official embed codes or republish charts, so the report will mostly document missing credentials.
The script is compatible with sites that use WPML.
It does not translate or duplicate content by itself, but it is aware of the language context of each post/page when generating report output:
- It uses the wpml_post_language_details filter to determine the language of the current post/page.
- It uses the wpml_permalink filter to resolve the permalink in that same language.
- If WPML is not active, it falls back to the normal WordPress permalink.
This is especially useful during editorial cleanup because the CSV report can be used as an action list, and editors need links to the exact localized entry they are reviewing.
The script relies on the official Datawrapper API for two operations:
- GET /v3/charts/{id}/embed-codes: retrieves the available embed code for a chart.
- POST /v3/charts/{id}/publish: republishes a chart when its public embed asset appears to be missing.
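For manual checks outside the script, the same two endpoints can be called with curl. This is a minimal sketch, assuming the API base URL https://api.datawrapper.de/v3 and bearer-token authentication as described in the Datawrapper API documentation. The function names are hypothetical, and this is not necessarily how the script itself issues requests.

```shell
# Hypothetical helpers for manual checks; not part of datawrapper_normalize.php.
DW_API_BASE="https://api.datawrapper.de/v3"

# Build the endpoint URL for a chart ID and an operation (embed-codes or publish).
dw_api_url() {
  printf '%s/charts/%s/%s\n' "$DW_API_BASE" "$1" "$2"
}

# GET /v3/charts/{id}/embed-codes
dw_embed_codes() {
  curl -sS -H "Authorization: Bearer ${DATAWRAPPER_API_TOKEN}" \
    "$(dw_api_url "$1" embed-codes)"
}

# POST /v3/charts/{id}/publish
dw_publish() {
  curl -sS -X POST -H "Authorization: Bearer ${DATAWRAPPER_API_TOKEN}" \
    "$(dw_api_url "$1" publish)"
}
```

With DATAWRAPPER_API_TOKEN exported, calling dw_embed_codes with a chart ID prints the raw API response, which is a quick way to confirm that a token has the scopes it needs before a long run.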
It also validates the public embed asset on the Datawrapper CDN before deciding whether to use the returned embed code as-is, republish the chart, or fall back to a public script embed.
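The decision described above can be sketched as a small function. This is an illustration only: the two inputs and the returned labels (use-embed, republish, cdn-fallback, skip) are hypothetical names, not the values the script writes to its CSV report, and treating every non-200, non-404 API status as a skip is an assumption.

```shell
# Hypothetical sketch of the per-chart decision; labels are illustrative only.
decide_action() {
  dw_api_status="$1"   # HTTP status of the embed-codes request, e.g. 200 or 404
  dw_asset_ok="$2"     # 1 if the public CDN asset responded, 0 otherwise

  if [ "$dw_api_status" = "200" ] && [ "$dw_asset_ok" = "1" ]; then
    echo "use-embed"      # API embed code works as returned
  elif [ "$dw_api_status" = "200" ]; then
    echo "republish"      # chart exists but its public asset is missing
  elif [ "$dw_api_status" = "404" ] && [ "$dw_asset_ok" = "1" ]; then
    echo "cdn-fallback"   # API cannot resolve the chart, CDN still serves it
  else
    echo "skip"           # nothing usable: report the issue, leave content alone
  fi
}

decide_action 200 0
```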
Relevant official documentation:
- Get embed codes for a chart
- Publish a chart
- Getting started with the Datawrapper API
- Responsive iframe embedding guide
Because the script fetches embed codes and may republish charts, the token should have enough permissions for both operations.
If you want the script to run with all recovery features enabled, including republishing charts when public assets are missing, the token should include all of these scopes:
- chart:read is needed to request embed codes.
- chart:write is needed to publish the chart again.
- theme:read is required by Datawrapper's publish endpoint.
- visualization:read is required by Datawrapper's publish endpoint.
According to the official Datawrapper API reference:
- GET /v3/charts/{id}/embed-codes requires chart:read.
- POST /v3/charts/{id}/publish requires chart:read, chart:write, theme:read, and visualization:read.
If you only want to audit content in inventory mode and do not care about republishing missing assets, chart:read is the minimum useful scope.
However, if a chart needs republishing during a later normalize run and your token does not include the publish-related scopes, the script will report the failure and skip updating that embed.
Datawrapper explains token creation in its official getting started guide, which points to the token management page in the Datawrapper app.
In short:
- Open the API token page in Datawrapper.
- Click Create new Access Token.
- Give the token a name.
- Select the scopes listed above.
- Generate the token and store it safely, because Datawrapper will not show it again after the page is refreshed.
Run the script with wp eval-file from the WordPress environment:
    wp eval-file datawrapper_normalize.php -- inventory

The -- separator is important because it tells WP-CLI to pass the remaining arguments to the script.
The script supports two modes:
- inventory: read-only audit. It validates charts and writes the CSV report, but does not update content.
- normalize: attempts to replace stored embed markup with a fresh working embed code.
The script accepts up to five arguments after --:
<mode> <limit> <dry-run> <start-after-id> <batch-size>
- mode: inventory or normalize.
- limit: maximum number of posts/pages to process. Use 0 for no limit.
- dry-run: 1 means simulate updates without writing to WordPress. 0 means write changes to the database.
- start-after-id: optional. Process only posts/pages with an ID greater than this value. Use 0 to start from the beginning. This is useful when a long production run is interrupted and you want to resume after the last completed post.
- batch-size: optional. Number of candidate posts/pages to fetch per database batch. Smaller batches reduce memory pressure during long runs. The default is 20.
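Positional arguments are easy to misorder, so a small wrapper that validates the mode and fills in defaults can help. This is a sketch under stated assumptions: dw_args is a hypothetical helper, the defaults for limit, start-after-id, and batch-size follow the descriptions above, and defaulting dry-run to 1 is a conservative choice made here, not the script's documented default.

```shell
# Hypothetical helper: print the argument string to pass after `--`.
# Usage: dw_args <mode> [limit] [dry-run] [start-after-id] [batch-size]
dw_args() {
  dw_mode="${1:-inventory}"
  case "$dw_mode" in
    inventory|normalize) ;;
    *) echo "mode must be inventory or normalize" >&2; return 1 ;;
  esac
  # Assumed defaults: no limit, dry run on, start from the beginning, batches of 20.
  printf '%s %s %s %s %s\n' "$dw_mode" "${2:-0}" "${3:-1}" "${4:-0}" "${5:-20}"
}

# Example: arguments for resuming a real run after post 167685 in batches of 10.
dw_args normalize 0 0 167685 10
```

The printed string can then be appended to the command, for example wp eval-file datawrapper_normalize.php -- $(dw_args normalize 5).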
Inventory of all matching posts and pages:

    export DATAWRAPPER_API_TOKEN="YOUR_TOKEN_HERE"
    wp eval-file datawrapper_normalize.php -- inventory

Inventory limited to 20 posts/pages:

    export DATAWRAPPER_API_TOKEN="YOUR_TOKEN_HERE"
    wp eval-file datawrapper_normalize.php -- inventory 20

Dry-run normalization of up to 5 posts/pages:

    export DATAWRAPPER_API_TOKEN="YOUR_TOKEN_HERE"
    wp eval-file datawrapper_normalize.php -- normalize 5 1

Full normalization that writes changes to the database:

    export DATAWRAPPER_API_TOKEN="YOUR_TOKEN_HERE"
    wp eval-file datawrapper_normalize.php -- normalize 0 0

Resuming a normalization run after a given post ID:

    export DATAWRAPPER_API_TOKEN="YOUR_TOKEN_HERE"
    wp eval-file datawrapper_normalize.php -- normalize 0 0 167685

This skips everything up to and including post 167685 and continues with higher IDs only.

Resuming with a smaller batch size:

    export DATAWRAPPER_API_TOKEN="YOUR_TOKEN_HERE"
    wp eval-file datawrapper_normalize.php -- normalize 0 0 167685 10

This is a good recovery pattern after a fatal memory error because it both resumes the run and lowers the amount of content loaded per batch.

Running the script by absolute path:

    export DATAWRAPPER_API_TOKEN="YOUR_TOKEN_HERE"
    wp eval-file /Users/stefano/Datawrapper/datawrapper_normalize.php -- inventory

The script generates two kinds of output in the current working directory:
- datawrapper_normalize_report.csv: a per-chart report describing what happened for each inspected embed.
- dw-normalize-backups/: a directory of HTML backups, one file per updated post/page, written before any database update.
The CSV report contains:
- post_id
- post_type
- post_status
- post_title
- permalink
- chart_id
- api_status
- asset_status
- republish_status
- fallback_source
- action
- reason
- script_src
These fields make it easier to review what was changed, what failed, and which charts may need manual follow-up.
The script is designed to replace several common stored embed variants found in WordPress content, including:
- Datawrapper iframe embeds
- Script-based embeds
- Broken legacy script URL variants
- Gutenberg wrapper blocks that contain Datawrapper embeds
For each detected chart, it updates only the first matching block at a time, which keeps replacements predictable and makes the CSV easier to interpret.
The script includes several safeguards:
- inventory mode is read-only.
- normalize mode supports dry-run operation.
- Backups are written before content updates.
- Each chart is processed independently, so one failure does not stop the whole post/page.
- Candidate posts/pages are processed in batches instead of being loaded all at once.
- Long runs can be resumed with start-after-id.
- On fatal shutdown, the script reports the current post and the last fully processed post.
- The script logs per-chart outcomes to CSV for review after the run.
For production use, a safe workflow is:
- Run inventory first.
- Review datawrapper_normalize_report.csv.
- Run normalize with a small limit and dry-run=1.
- Inspect the intended changes.
- Run normalize again with dry-run=0 once the results look correct.
- If a long production run is interrupted, resume it with start-after-id set to the last fully processed post ID and consider lowering batch-size.
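As a sketch, the workflow above can be wrapped in a few shell functions so that each stage is an explicit, separate step. The function names and the WP_BIN override are hypothetical conveniences; the commands they run are the ones documented earlier.

```shell
# Hypothetical wrappers; WP_BIN lets you point at a different wp binary.
dw_run() {
  "${WP_BIN:-wp}" eval-file datawrapper_normalize.php -- "$@"
}

# Read-only audit that writes the CSV report.
dw_audit() { dw_run inventory; }

# Dry-run normalize on a small sample (default: 5 posts/pages).
dw_preview() { dw_run normalize "${1:-5}" 1; }

# Real normalize run with no limit; writes to the database.
dw_apply() { dw_run normalize 0 0; }

# Resume after the last fully processed post ID, with a smaller batch (default: 10).
dw_resume() {
  dw_run normalize 0 0 "${1:?last fully processed post ID required}" "${2:-10}"
}
```

Keeping the write step in its own function (dw_apply) means it is never run by accident while reviewing the inventory or dry-run output.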
Notes and limitations:
- The script only scans WordPress post and page content.
- It does not scan custom post types.
- It expects Datawrapper embeds to exist in post_content.
- If WPML is not active, permalinks are handled normally.
- If a chart cannot be resolved through the API and the public CDN asset is unavailable, the script will report the issue and skip updating that embed.
🤖 Created with Codex.