    Repositories list

    • web-poet
      Public
      Web scraping Page Objects core library
      Python
      BSD 3-Clause "New" or "Revised" License
      191051715 Updated Apr 10, 2026
    • Jupyter Notebook
      1401 Updated Apr 10, 2026
    • Jupyter Notebook
      MIT License
      11601 Updated Apr 10, 2026
    • Article extraction benchmark: dataset and evaluation scripts
      Python
      MIT License
      3736212 Updated Apr 10, 2026
    • shub
      Public
      Scrapinghub Command Line Client
      Python
      BSD 3-Clause "New" or "Revised" License
      791304714 Updated Apr 10, 2026
    • python parser for human readable dates
      Python
      BSD 3-Clause "New" or "Revised" License
      4932.8k29951 Updated Apr 9, 2026
    • Page Object pattern for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      29127114 Updated Apr 9, 2026
    • spidermon
      Public
      Scrapy Extension for monitoring spiders execution.
      Python
      BSD 3-Clause "New" or "Revised" License
      102552438 Updated Apr 8, 2026
    • Software stack with latest Scrapy and updated deps
      Dockerfile
      BSD 3-Clause "New" or "Revised" License
      206411 Updated Apr 7, 2026
    • A client interface for Scrapinghub's API
      Python
      BSD 3-Clause "New" or "Revised" License
      61205242 Updated Apr 7, 2026
    • extruct
      Public
      Extract embedded metadata from HTML markup
      Python
      BSD 3-Clause "New" or "Revised" License
      1199613917 Updated Apr 1, 2026
    • scrapyrt
      Public
      HTTP API for Scrapy spiders
      Python
      BSD 3-Clause "New" or "Revised" License
      162879245 Updated Mar 20, 2026
    • Extract price amount and currency symbol from a raw text string
      Python
      BSD 3-Clause "New" or "Revised" License
      54345179 Updated Mar 19, 2026
    • andi
      Public
      Library for annotation-based dependency injection
      Python
      BSD 3-Clause "New" or "Revised" License
      62451 Updated Mar 3, 2026
    • Scrapy entrypoint for Scrapinghub job runner
      Python
      BSD 3-Clause "New" or "Revised" License
      162561 Updated Feb 26, 2026
    • Python
      BSD 3-Clause "New" or "Revised" License
      141421 Updated Jan 21, 2026
    • A python binding for crfsuite
      Python
      MIT License
      225771463 Updated Dec 23, 2025
    • marathon-apps-collectd-plugin
      Python
      GNU General Public License v2.0
      96201 Updated Nov 27, 2025
    • Crawl Frontier HCF backend
      Python
      BSD 3-Clause "New" or "Revised" License
      6821 Updated Oct 31, 2025
    • Dockerfile
      83305 Updated Oct 20, 2025
    • Formasaurus tells you the type of an HTML form and its fields using machine learning
      HTML
      45811 Updated Sep 4, 2025
    • scikit-learn inspired API for CRFsuite
      Python
      209101 Updated Sep 4, 2025
    • More flexible and featured Frontera scheduler for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      53621 Updated Jun 6, 2025
    • frontera
      Public
      A scalable frontier for web crawlers
      Python
      BSD 3-Clause "New" or "Revised" License
      2161.3k7817 Updated Jun 6, 2025
    • Python Social Auth - Application - Django
      Python
      BSD 3-Clause "New" or "Revised" License
      394201 Updated Nov 18, 2024
    • Parse numbers written in natural language
      Python
      BSD 3-Clause "New" or "Revised" License
      25126146 Updated Oct 23, 2024
    • streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.
      Python
      Apache License 2.0
      221201 Updated Sep 20, 2024
    • splash
      Public
      Lightweight, scriptable browser as a service with an HTTP API
      Python
      BSD 3-Clause "New" or "Revised" License
      5204.2k37326 Updated Aug 2, 2024
    • A Postgres-backed ContentsManager implementation for IPython
      Python
      Apache License 2.0
      86201 Updated Jul 18, 2024
    • shublang
      Public
      Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
      Python
      BSD 3-Clause "New" or "Revised" License
      816236 Updated Jul 9, 2024