wrong document type error #95

@tacman

I'm trying to fetch https://www.privatdozent.co/p/the-battle-line-at-louvain-1914 with graby, and I get the following error:

15:13:45 INFO      [graby] Opengraph "article:" data: [] ["ogData" => []]
15:13:45 INFO      [graby] JSON-LD data: ["@context" => "https://schema.org","@type" => "NewsArticle","url" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","mainEntityOfPage" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","headline" => "The Battle Line at Louvain (1914)","description" => "“Where they burn books, they will also burn people” — Heinrich Heine","image" => [["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca982839-4161-4d7b-90a3-9ff1bdeca5f0_1280x939.jpeg"]],"datePublished" => "2024-11-15T09:42:48+00:00","dateModified" => "2024-11-15T09:42:48+00:00","isAccessibleForFree" => true,"author" => [["@type" => "Person","name" => "Jørgen Veisdal","url" => "https://substack.com/@privatdozent","description" => "Author of Privatdozent. Associate Professor.","identifier" => "user:3088938","sameAs" => ["https://twitter.com/JorgenVeisdal"],"image" => ["@type" => "ImageObject","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg"]]],"publisher" => ["@type" => "Organization","name" => "Privatdozent","url" => "https://www.privatdozent.co","description" => "Essays on the history of mathematics. 10k+ subscribers. 
Substack Bestseller (2024) 🥇, Grow Feature (2022) 📈, Featured Substack Newsletter (2021) 🌟","interactionStatistic" => ["@type" => "InteractionCounter","name" => "Subscribers","interactionType" => "https://schema.org/SubscribeAction","userInteractionCount" => 10000],"identifier" => "pub:14134","logo" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"image" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"sameAs" => ["https://twitter.com/dozentprivat"]]] ["JsonLdData" => ["@context" => "https://schema.org","@type" => "NewsArticle","url" => 
"https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","mainEntityOfPage" => "https://www.privatdozent.co/p/the-battle-line-at-louvain-1914","headline" => "The Battle Line at Louvain (1914)","description" => "“Where they burn books, they will also burn people” — Heinrich Heine","image" => [["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca982839-4161-4d7b-90a3-9ff1bdeca5f0_1280x939.jpeg"]],"datePublished" => "2024-11-15T09:42:48+00:00","dateModified" => "2024-11-15T09:42:48+00:00","isAccessibleForFree" => true,"author" => [["@type" => "Person","name" => "Jørgen Veisdal","url" => "https://substack.com/@privatdozent","description" => "Author of Privatdozent. Associate Professor.","identifier" => "user:3088938","sameAs" => ["https://twitter.com/JorgenVeisdal"],"image" => ["@type" => "ImageObject","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F86ca7756-940a-4d3c-affc-fd5e6a968f2a_2653x2653.jpeg"]]],"publisher" => ["@type" => "Organization","name" => "Privatdozent","url" => "https://www.privatdozent.co","description" => "Essays on the history of mathematics. 10k+ subscribers. 
Substack Bestseller (2024) 🥇, Grow Feature (2022) 📈, Featured Substack Newsletter (2021) 🌟","interactionStatistic" => ["@type" => "InteractionCounter","name" => "Subscribers","interactionType" => "https://schema.org/SubscribeAction","userInteractionCount" => 10000],"identifier" => "pub:14134","logo" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"image" => ["@type" => "ImageObject","url" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","contentUrl" => "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png","thumbnailUrl" => "https://substackcdn.com/image/fetch/w_128,h_128,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0a20ecc-bc78-4d0a-bed0-7901fce9e3e8_1280x1280.png"],"sameAs" => ["https://twitter.com/dozentprivat"]]]]
15:13:45 INFO      [graby] date matched from JsonLd: 2024-11-15T09:42:48+00:00 ["date" => "2024-11-15T09:42:48+00:00"]
15:13:45 INFO      [graby] date matched from JsonLd: 2024-11-15T09:42:48+00:00 ["date" => "2024-11-15T09:42:48+00:00"]
15:13:45 INFO      [graby] author matched from JsonLd: Jørgen Veisdal ["author" => "Jørgen Veisdal"]
15:13:45 INFO      [graby] title matched from JsonLd: {The Battle Line at Louvain (1914)} ["title" => "The Battle Line at Louvain (1914)"]
15:13:45 INFO      [graby] Trying //meta[@property="og:title"]/@content for title ["pattern" => "//meta[@property="og:title"]/@content"]
15:13:45 INFO      [graby] title matched: The Battle Line at Louvain (1914) ["title" => "The Battle Line at Louvain (1914)"]
15:13:45 INFO      [graby] ...XPath match: {pattern} ["pattern","//meta[@property="og:title"]/@content"]
15:13:45 INFO      [graby] Trying //meta[@property="article:published_time"]/@content for date ["pattern" => "//meta[@property="article:published_time"]/@content"]
15:13:45 INFO      [graby] Trying //html[@lang]/@lang for language ["pattern" => "//html[@lang]/@lang"]
15:13:45 INFO      [graby] Trying //meta[@name="DC.language"]/@content for language ["pattern" => "//meta[@name="DC.language"]/@content"]
15:13:45 INFO      [graby] Trying //*[contains(@class, 'google-dfp-ad-wrapper')] to strip element ["pattern" => "//*[contains(@class, 'google-dfp-ad-wrapper')]"]
15:13:45 INFO      [graby] Trying //iframe/@srcdoc to strip element ["pattern" => "//iframe/@srcdoc"]
15:13:45 INFO      [graby] Trying sharedaddy to strip element ["string" => "sharedaddy"]
15:13:45 INFO      [graby] Trying i-amphtml-replaced-content to strip element ["string" => "i-amphtml-replaced-content"]
15:13:45 INFO      [graby] Using Readability

In Readability.php line 268:
                        
  [DOMException (4)]    
  Wrong Document Error  
                        

Exception trace:
  at /home/tac/g/sites/feeds/vendor/j0k3r/php-readability/src/Readability.php:268
 DOMNode->appendChild() at /home/tac/g/sites/feeds/vendor/j0k3r/php-readability/src/Readability.php:268
 Readability\Readability->init() at /home/tac/g/tacman/graby/src/Extractor/ContentExtractor.php:484
 Graby\Extractor\ContentExtractor->process() at /home/tac/g/tacman/graby/src/Graby.php:352
 Graby\Graby->doFetchContent() at /home/tac/g/tacman/graby/src/Graby.php:177
 Graby\Graby->fetchContent() at /home/tac/g/sites/feeds/src/Parser/Internal.php:25
 App\Parser\Internal->parse() at /home/tac/g/sites/feeds/src/Content/Extractor.php:117
 App\Content\Extractor->parseContent() at /home/tac/g/sites/feeds/src/Content/Import.php:97
 App\Content\Import->process() at /home/tac/g/sites/feeds/src/Command/FetchItemsCommand.php:155
 App\Command\FetchItemsCommand->execute() at /home/tac/g/sites/feeds/vendor/symfony/console/Command/Command.php:279
 Symfony\Component\Console\Command\Command->run() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:1094
 Symfony\Component\Console\Application->doRunCommand() at /home/tac/g/sites/feeds/vendor/symfony/framework-bundle/Console/Application.php:123
 Symfony\Bundle\FrameworkBundle\Console\Application->doRunCommand() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:342
 Symfony\Component\Console\Application->doRun() at /home/tac/g/sites/feeds/vendor/symfony/framework-bundle/Console/Application.php:77
 Symfony\Bundle\FrameworkBundle\Console\Application->doRun() at /home/tac/g/sites/feeds/vendor/symfony/console/Application.php:193
 Symfony\Component\Console\Application->run() at /home/tac/g/sites/feeds/vendor/symfony/runtime/Runner/Symfony/ConsoleApplicationRunner.php:49
 Symfony\Component\Runtime\Runner\Symfony\ConsoleApplicationRunner->run() at /home/tac/g/sites/feeds/vendor/autoload_runtime.php:29
 require_once() at /home/tac/g/sites/feeds/c:11

feed:fetch-items [--slug [SLUG]] [--use_queue] [--] [<age>]

The exception is thrown when graby calls into this library (`Readability->init()`, at Readability.php line 268), but I'm stuck because I don't really understand DOM manipulation in PHP.

I'm running PHP 8.3, and I'm wondering if it's stricter about adding DOM elements.
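
For reference, here is a minimal sketch of how the same exception can be triggered with plain PHP DOM, independent of graby and php-readability. It assumes (unconfirmed) that something similar happens at Readability.php:268, namely that the node passed to `appendChild()` was created by a different `DOMDocument` than the one it is appended to. The variable names are made up for illustration.

```php
<?php
// Two unrelated documents: $target receives nodes, $other creates one.
$target = new DOMDocument();
$other  = new DOMDocument();

$root    = $target->appendChild($target->createElement('root'));
$foreign = $other->createElement('p', 'hello');

$message = null;
$code    = null;
try {
    // $foreign belongs to $other, so appending it under $root fails.
    $root->appendChild($foreign);
} catch (DOMException $e) {
    $message = $e->getMessage(); // "Wrong Document Error"
    $code    = $e->getCode();    // DOM_WRONG_DOCUMENT_ERR
}

// Importing the node first yields a copy owned by $target,
// which can then be appended without an exception.
$imported = $target->importNode($foreign, true);
$root->appendChild($imported);

$xml = $target->saveXML($root);
echo $message, ' (', $code, ')', PHP_EOL;
echo $xml, PHP_EOL;
```

If that assumption holds, the problem would be about which document owns the node rather than about the PHP version, but I have not verified this against the library code.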
