SPARQL Benchmark Runner

This is a simple tool that runs a query set against a given SPARQL endpoint and measures the execution time of each query.

Concretely, the query set is a directory containing any number of files, where each file contains a number of SPARQL queries separated by empty lines.

Example directory of a query set:

watdiv-10M/
  C1.txt
  C2.txt
  C3.txt
  F1.txt
  ...

Example contents of C1.txt:

SELECT * WHERE {
  ?v0 <http://schema.org/caption> ?v1 .
  ?v0 <http://schema.org/text> ?v2 .
}

SELECT * WHERE {
  ?v0 <http://schema.org/caption> ?v1 .
  ?v0 <http://schema.org/text> ?v2 .
}

SELECT * WHERE {
  ?v0 <http://schema.org/caption> ?v1 .
  ?v0 <http://schema.org/text> ?v2 .
}
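The file format above can be illustrated with a minimal sketch that splits the text of one query file into individual queries. The helper name splitQueries is hypothetical and is not part of this package; the sketch assumes that queries are separated by one or more blank lines and that no query contains a blank line itself.

```javascript
// Hypothetical helper (not part of sparql-benchmark-runner):
// split the text of one query file into its individual queries.
// Assumption: queries are separated by one or more blank lines.
function splitQueries(fileContents) {
  return fileContents
    .split(/\n\s*\n/u) // a newline, optional whitespace, another newline
    .map((query) => query.trim())
    .filter((query) => query.length > 0);
}

// Shortened stand-in for a file such as C1.txt.
const exampleFile = `SELECT * WHERE {
  ?v0 <http://schema.org/caption> ?v1 .
}

SELECT * WHERE {
  ?v0 <http://schema.org/text> ?v2 .
}
`;

const queries = splitQueries(exampleFile);
console.log(`Found ${queries.length} queries`);
```

In practice the file contents would be read from disk first; the package's own QueryLoaderFile handles loading, and its exact splitting rules may differ from this sketch.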

By default, it generates CSV output in a form similar to:

name;id;error;errorDescription;failures;hash;replication;results;resultsMax;resultsMin;time;timeMax;timeMin;times;timestamps;timestampsMax;timestampsMin;timestampsStd;timeStd;timestampsAll
C1;0;false;;0;6e0f167d2eb0e61af0673275ee8f935f;5;5;5;5;25.8;33;20;28 33 26 22 20;25.4 25.4 25.4 25.4 25.4;32 32 32 32 32;20 20 20 20 20;4.176122603564219 4.176122603564219 4.176122603564219 4.176122603564219 4.176122603564219;4.578209256903839;"[[32, 32, 32, 32, 32],[20, 20, 20, 20, 20], [21, 21, 21, 21, 21], [30, 30, 30, 30, 30], [26, 26, 26, 26, 26]]"
C1;1;false;;0;3e279701df97583c2f296ac0c2e5b877;5;5;5;5;38.6;90;20;27 28 28 20 90;38.4 38.4 38.6 38.6 38.6;89 89 90 90 90;20 20 20 20 20;25.476263462289754 25.476263462289754 25.873538606073968 25.873538606073968 25.873538606073968;25.873538606073968;"[[38.4, 38.4, 38.6, 38.6, 38.6],[50, 50, 51, 52, 53.3], [20, 20, 20, 20, 20], [89, 89, 90, 90, 90], [66, 76, 76, 76, 77]]"
C1;2;false;;0;4783aeaa4ce9950eafd3a623e1a537f6;5;5;5;5;35.8;80;20;28 26 80 20 25;35.8 35.8 35.8 35.8 35.8;80 80 80 80 80;20 20 20 20 20;22.25668438918969 22.25668438918969 22.25668438918969 22.25668438918969 22.25668438918969;22.25668438918969;"[[35.4, 35.4, 35.6, 35.6, 35.6],[50, 50, 51, 52, 53.3], [20, 20, 20, 20, 20], [80, 80, 80, 80, 80], [66, 76, 76, 76, 77]]"
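To show how the timing columns relate to each other, the following sketch parses one row and recomputes the summary values from the times column. The helpers parseRow and summarize are hypothetical and are not the runner's own aggregation code. The sketch assumes that fields contain no semicolons (true for the sample rows above) and that timeStd is a population standard deviation, which is what reproduces the value in the first sample row.

```javascript
// Hypothetical helper: turn one semicolon-separated row into an object.
// Assumption: no field contains a semicolon.
function parseRow(headerLine, rowLine) {
  const keys = headerLine.split(';');
  const values = rowLine.split(';');
  return Object.fromEntries(keys.map((key, index) => [key, values[index]]));
}

// Hypothetical helper: mean, min, max and population standard deviation.
function summarize(times) {
  const mean = times.reduce((sum, t) => sum + t, 0) / times.length;
  const variance =
    times.reduce((sum, t) => sum + (t - mean) ** 2, 0) / times.length;
  return {
    mean,
    min: Math.min(...times),
    max: Math.max(...times),
    std: Math.sqrt(variance),
  };
}

// A few columns taken from the first sample row (C1, id 0).
const row = parseRow(
  'name;id;replication;time;timeMax;timeMin;times;timeStd',
  'C1;0;5;25.8;33;20;28 33 26 22 20;4.578209256903839',
);

// The times column holds one space-separated value per replication.
const times = row.times.split(' ').map(Number);
const summary = summarize(times);
console.log(summary);
```

The recomputed mean, min, max and std correspond to the time, timeMin, timeMax and timeStd columns of that row.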

Installation

npm install sparql-benchmark-runner

Usage

Command Line Interface (CLI)

sparql-benchmark-runner can be used from the CLI with the following options.

Options:
  --version      Show version number                                   [boolean]
  --endpoint     URL of the SPARQL endpoint to send queries to
                                                             [string] [required]
  --queries      Directory of the queries                    [string] [required]
  --replication  Number of replication runs                [number] [default: 5]
  --warmup       Number of warmup runs                     [number] [default: 1]
  --output       Destination for the output CSV file
                                               [string] [default: "./output.csv"]
  --outputRaw    Destination for the raw JSON output file                 [string]
  --metadata     Load query metadata files (*.metadata.json) and enable
                 sequence aggregation                   [boolean] [default: false]
  --invalidateCacheAfterQuerySet
                 Send a cache invalidation request after each query set
                 execution                              [boolean] [default: false]
  --timeout      Timeout value in seconds to use for individual queries [number]
  --help

An example invocation:

sparql-benchmark-runner \
  --endpoint http://example.org/sparql \
  --queries ./watdiv-10M/ \
  --output ./output.csv \
  --replication 5 \
  --warmup 1

As a JavaScript Library

When used as a JavaScript library, the runner can be configured with different query loaders, result aggregators and result serializers to accommodate special use cases. When no result aggregator is provided, the runner uses ResultAggregatorComunica, which handles basic aggregation as well as the httpRequests metadata field from a Comunica SPARQL endpoint, if such metadata is provided. Metadata-based sequence aggregation is optional and can be enabled by passing query metadata to the runner together with ResultAggregatorComunicaQuerySequence.

import {
  SparqlBenchmarkRunner,
  ResultSerializerCsv,
  ResultAggregatorComunica,
  QueryLoaderFile,
} from 'sparql-benchmark-runner';

async function executeQueries(pathToQueries, pathToOutputCsv) {
  const queryLoader = new QueryLoaderFile(pathToQueries);
  const resultSerializer = new ResultSerializerCsv();
  const resultAggregator = new ResultAggregatorComunica();

  const querySets = await queryLoader.loadQueries();

  const runner = new SparqlBenchmarkRunner({
    endpoint: 'https://localhost:8080/sparql',
    querySets,
    replication: 4,
    warmup: 1,
    timeout: 60_000,
    availabilityCheckTimeout: 1_000,
    logger: (message) => console.log(message),
    resultAggregator,
  });

  const results = await runner.run();

  await resultSerializer.serialize(pathToOutputCsv, results.aggregateResults);
}

Sequence Metadata

To enable sequence metadata from the CLI, add --metadata. Raw JSON output is written in two cases: to the given path when --outputRaw is provided, or automatically to ./output-raw.json when --metadata is enabled. Cache invalidation requests are disabled by default and can be enabled with --invalidateCacheAfterQuerySet.
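The rule for where raw output ends up can be summarized in a small sketch. The function resolveRawOutputPath is hypothetical and only restates the behaviour described above; it is not taken from the runner's source. It additionally assumes that an explicit --outputRaw path takes precedence when both options are given.

```javascript
// Hypothetical helper: decide where raw JSON output is written, if at all.
// Assumption: an explicit outputRaw path wins over the --metadata default.
function resolveRawOutputPath({ outputRaw, metadata = false } = {}) {
  if (outputRaw) {
    return outputRaw; // --outputRaw was provided
  }
  if (metadata) {
    return './output-raw.json'; // default location when --metadata is enabled
  }
  return undefined; // no raw output is written
}

console.log(resolveRawOutputPath({ outputRaw: './raw.json' }));
console.log(resolveRawOutputPath({ metadata: true }));
console.log(resolveRawOutputPath({}));
```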

An example metadata file for a two-query sequence is structured as follows:

{
  "user": {
    "user": "http://solidbench-server:3000/pods/00000000000000000933/profile/card#me",
    "transitionProbability": 0.09954255358650524
  },
  "sequenceElements": [
    {
      "session": {
        "task": "Messages",
        "sessionLength": 5,
        "sessionId": 0
      },
      "template": "interactive-short-6",
      "nOpenSessions": 1
    },
    {
      "session": {
        "task": "Messages",
        "sessionLength": 3,
        "sessionId": 1
      },
      "template": "interactive-discover-2",
      "nOpenSessions": 2
    }
  ]
}

Here, the "user" field provides sequence-level metadata, while "sequenceElements" contains metadata for individual queries within that sequence.

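If raw output is written (for example via --outputRaw), the runner adds this metadata to each raw result: the sequence-level "user" field is copied as-is, and each query receives its own entry from "sequenceElements" as "sequenceElement". Based on the previous example, the resulting output is structured as follows: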

[
  {
    "name": "<querySequenceName>",
    "id": "0",
    "hash": "d41d8cd98f00b204e9800998ecf8427e",
    "results": 0,
    "time": 3400,
    "timestamps": [],
    "httpRequests": 2579,
    "user": {
      "user": "http://solidbench-server:3000/pods/00000000000000000933/profile/card#me",
      "transitionProbability": 0.09954255358650524
    },
    "sequenceElement": {
      "session": {
        "task": "Messages",
        "sessionLength": 5,
        "sessionId": 0
      },
      "template": "interactive-short-6",
      "nOpenSessions": 1
    }
  },
  {
    "name": "<querySequenceName>",
    "id": "1",
    "hash": "8493b4f999ae7bae8fd62f6d7734e0ac",
    "results": 0,
    "time": 43888,
    "timestamps": [
      300, 3500, 40000
    ],
    "httpRequests": 2579,
    "user": {
      "user": "http://solidbench-server:3000/pods/00000000000000000933/profile/card#me",
      "transitionProbability": 0.09954255358650524
    },
    "sequenceElement": {
      "session": {
        "task": "Messages",
        "sessionLength": 3,
        "sessionId": 1
      },
      "template": "interactive-discover-2",
      "nOpenSessions": 2
    }
  }
]
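The mapping from the metadata file to the raw output can be sketched as follows. The helper attachSequenceMetadata is hypothetical and is not the runner's implementation; it assumes that the "id" of each raw result is the zero-based position of the query within its sequence, as in the example above.

```javascript
// Hypothetical helper: copy sequence-level and per-query metadata
// onto each raw result of one query sequence.
// Assumption: result.id is the zero-based index into sequenceElements.
function attachSequenceMetadata(rawResults, metadata) {
  return rawResults.map((result) => ({
    ...result,
    user: metadata.user,
    sequenceElement: metadata.sequenceElements[Number(result.id)],
  }));
}

// Shortened versions of the metadata and raw results shown above.
const metadata = {
  user: { transitionProbability: 0.09954255358650524 },
  sequenceElements: [
    { template: 'interactive-short-6', nOpenSessions: 1 },
    { template: 'interactive-discover-2', nOpenSessions: 2 },
  ],
};
const rawResults = [
  { name: '<querySequenceName>', id: '0', time: 3400 },
  { name: '<querySequenceName>', id: '1', time: 43888 },
];

const enriched = attachSequenceMetadata(rawResults, metadata);
console.log(JSON.stringify(enriched, null, 2));
```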

Docker

This tool is also available as a Docker image:

touch output.csv
docker run \
  --rm \
  --interactive \
  --tty \
  --volume "$(pwd)/output.csv":/output.csv \
  --volume "$(pwd)/queries":/queries \
  comunica/sparql-benchmark-runner \
  --endpoint https://dbpedia.org/sparql \
  --queries /queries \
  --output /output.csv \
  --replication 5 \
  --warmup 1

License

This code is copyrighted by Ghent University – imec and released under the MIT license.
