Introduction to rbench

We defined a benchmarking framework and database rbench in order to benchmark multiple database systems with many problems and problem instances. rbench is a collection of programs for solving problems in the database systems and bash scripts that start the database systems and run the problem instances.

This work is based on the OpenRuleBench benchmarking suite.

We define some problems in a declarative way. We are interested in how different (kinds of) database systems faire in solving the problems. Of course, problem-specific implementations and algorithms are often faster, but we're interested in the declarative programming and specifications-as-programs.

A problem is a mapping from input to output. Where applicable we map from input to size of output in order to skip measuring the time it takes to write to standard output or a file. A problem has one or more instances (input data) that can be solved using the same program or solver.

Publications

Source code

Measurement Results

The resource usages for the benchmark runs are available from the db/results directory in our git repository. We are working on a way to explore the full dataset online. In the meantime we give visualisations for the runtimes for benchmarks of specific input data on different systems.

References and Related Work

Systems

Our benchmarking system supports the following systems:

Cost Measure

As a first try to estimate the runtime, we look at the number of applicable rule instances in the given logic program.

I.e. this is the total number of variable assignments that satisfy the body (condition) of the rules. The standard seminaive evaluation would look at each such rule instance exactly once.

Note that this cost measure is purely declarative, and does not look at the sequence in which facts are computed. For the applicable rule instances, we only need the final version of the relations in the minimal model.

Problem Instances

We use our instance generator graphgen to generate graph instances. The repository includes shell scripts to generate the instances we used for specific publications or experiments.

Graphgen generates problem instances in the following formats:

We provide our current working set of instances to download. We also provide specific sets of instances:

Problem: Transitive Closure (tcff)

The TCFF problem is generating a Transitive closure of a directed graph. The output is potentially quadratic in the input size.

tc(A, B) :- par(A, B).
tc(A, B) :- par(A, X), tc(X, B).

The instances are generated using our graphgen tool.

SQL Formulation

WITH RECURSIVE tc(a,b) AS (
  SELECT par.a, par.b from par
  UNION
  SELECT par.a, tc.b from par JOIN tc ON par.b = tc.a
) SELECT Count(*) FROM tc;

with the table definition

CREATE TEMP TABLE par (
  a INT NOT NULL, b INT NOT NULL,
  CONSTRAINT par_pk PRIMARY KEY (a, b)
) WITHOUT ROWID;
CREATE INDEX par_fb ON par (b);

SPARQL Formulation

SELECT (count(*) as ?resultcount) WHERE {?a :par+ ?b}

CYPHER Formulation

MATCH (a)-[*]->(b)
RETURN DISTINCT a,b;

Input Graphs

Graph Nodes Edges In-Deg. Out-Deg. Cyc. TC Size. Iter. Rule Inst.
k50 50 2500 50 50 yes 2500 1 127500
k100 100 10000 100 100 yes 10000 1 1010000
k500 500 250000 500 500 yes 250000 1 125250000
k1k 1000 1000000 1000 1000 yes 1000000 1 1001000000
k2k 2000 4000000 2000 2000 yes 4000000 1 8004000000
t50 50 1225 0-49 0-49 no 1225 1 20825
t100 100 4950 0-99 0-99 no 4950 1 166650
t500 500 124750 0-499 0-499 no 124750 1 20833250
t1k 1000 499500 0-999 0-999 no 499500 1 166666500
t2k 2000 1999000 0-1999 0-1999 no 1999000 1 1333333000
c1k 1000 1000 1 1 yes 1000000 1000 1001000
c2k 2000 2000 1 1 yes 4000000 2000 4002000
c3k 3000 3000 1 1 yes 9000000 3000 9003000
c4k 4000 4000 1 1 yes 16000000 4000 16004000
s1k_1 1000 2000 2 2 yes 1000000 500 2002000
s1k_2 1000 3000 3 3 yes 1000000 201 3003000
s1k_3 1000 4000 4 4 yes 1000000 250 4004000
s1k_4 1000 5000 5 5 yes 1000000 200 5005000
s1k_5 1000 6000 6 6 yes 1000000 42 6006000
s2k_1 2000 4000 2 2 yes 4000000 1000 8004000
s2k_2 2000 6000 3 3 yes 4000000 288 12006000
s2k_3 2000 8000 4 4 yes 4000000 500 16008000
s2k_4 2000 10000 5 5 yes 4000000 400 20010000
s2k_5 2000 12000 6 6 yes 4000000 127 24012000
p1k 1000 999 0-1 0-1 no 499500 999 499500
p2k 2000 1999 0-1 0-1 no 1999000 1999 1999000
p3k 3000 2999 0-1 0-1 no 4498500 2999 4498500
p4k 4000 3999 0-1 0-1 no 7998000 3999 7998000
m4_2ki 8192 6144 0-1 0-1 no 12288 3 12288
m16_512 8192 7680 0-1 0-1 no 61440 15 61440
m64_128 8192 8064 0-1 0-1 no 258048 63 258048
m256_32 8192 8160 0-1 0-1 no 1044480 255 1044480
m1ki_8 8192 8184 0-1 0-1 no 4190208 1023 4190208
m4ki_2 8192 8190 0-1 0-1 no 16773120 4095 16773120
b17 131071 131070 0-1 0-2 no 1966082 16 1966082
b18 262143 262142 0-1 0-2 no 4194306 17 4194306
b19 524287 524286 0-1 0-2 no 8912898 18 8912898
v10 1023 1022 0-1 0-2 no 8194 9 8194
v11 2047 2046 0-1 0-2 no 18434 10 18434
v12 4095 4094 0-1 0-2 no 40962 11 40962
v17 131071 131070 0-1 0-2 no 1966082 16 1966082
v18 262143 262142 0-1 0-2 no 4194306 17 4194306
v19 524287 524286 0-1 0-2 no 8912898 18 8912898
y500_4k 4500 4499 0-500 0-1 no 9998000 4000 9998000
y500_8k 8500 8499 0-500 0-1 no 35996000 8000 35996000
y1k_4k 5000 4999 0-1000 0-1 no 11998000 4000 11998000
y1k_8k 9000 8999 0-1000 0-1 no 39996000 8000 39996000
u1k_50k 1000 50000 46-51 46-51 yes 1000000 3 50050000
u1k_125k 1000 125000 122-127 122-127 yes 1000000 2 125125000
u1k_250k 1000 250000 248-251 248-251 yes 1000000 2 250250000
u2k_200k 2000 200000 98-101 98-101 yes 4000000 3 400200000
u2k_500k 2000 500000 248-251 248-251 yes 4000000 2 1000500000
u2k_1m 2000 1000000 499-502 499-502 yes 4000000 2 2001000000
a1k_50k 1000 50000 0-102 0-102 no 471621 8 15269270
a1k_125k 1000 125000 0-250 0-250 no 492680 5 40853746
a1k_250k 1000 250000 0-500 0-501 no 497763 4 82941061
a2k_200k 2000 200000 0-201 0-201 no 1944729 8 128420988
a2k_500k 2000 500000 0-500 0-500 no 1985321 6 330666855
a2k_1m 2000 1000000 0-1000 0-1000 no 1995698 4 666033613
w1k_1k 2000 1000000 0-1000 0-1000 no 1000000 1 1000000
x10k 20001 20000 0-10000 0-10000 no 100020000 2 100020000

Measurements

For XSB we show the absolute time and for all other systems the relative time as a factor compared with XSB (e.g. the value 0.5 means that the system is twice as fast as XSB).

Graph XSB (s) BAM YAP SQLite PostgreSQL MariaDB Jena Soufflé
k500 13.342 s 0.30 0.66 5.07 3.62 3.75 1.92 0.20
k1k 103.266 s 0.31 0.67 5.56 3.82 3.76 1.82 0.18
k2k 810.945 s 0.42 0.66 5.94 3.90 160.08 1.87 0.18
t500 2.301 s 0.62 0.76 4.52 2.84 4.58 3.09 0.31
t1k 16.302 s 0.65 0.76 5.55 3.66 4.07 2.30 0.22
t2k 127.584 s 0.99 0.75 6.12 4.17 4.12 2.02 0.20
c1k 0.547 s 0.11 0.66 3.93 2.77 2.38 6.66 1.46
c2k 1.597 s 0.18 1.18 4.93 3.28 7.87 5.14 1.55
c3k 3.196 s 0.28 1.41 5.49 3.36 40.17 4.60 1.66
c4k 5.605 s 0.51 1.58 5.64 3.60 54.40 4.18 1.68
s1k_1 0.633 s 0.16 0.93 4.21 2.93 2.64 6.22 1.63
s1k_3 0.686 s 0.25 1.42 5.11 3.52 3.49 6.22 1.76
s1k_4 0.846 s 0.24 1.42 4.66 3.26 3.26 5.16 1.51
s2k_1 1.844 s 0.28 1.47 5.43 3.39 9.02 5.03 1.79
s2k_3 2.416 s 0.31 1.89 5.71 3.53 10.18 4.43 1.66
s2k_4 2.671 s 0.38 2.13 5.91 3.66 10.66 4.16 1.60
p1k 0.343 s 0.09 0.45 3.34 2.53 2.76 7.87 1.26
p2k 0.934 s 0.14 0.90 4.16 2.70 3.05 5.48 1.50
p3k 1.794 s 0.19 1.15 4.78 2.85 13.44 4.81 1.58
p4k 3.145 s 0.23 1.29 4.81 2.87 31.58 4.33 1.57
m4_2ki 0.113 s 0.00 0.18 0.53 3.42 1.06 17.47 0.27
m16_512 0.152 s 0.00 0.36 1.44 2.99 1.68 14.20 0.46
m64_128 0.283 s 0.17 0.49 2.25 2.59 2.07 9.28 0.83
m256_32 0.639 s 0.22 1.03 3.16 2.54 2.56 6.29 1.17
m1ki_8 1.817 s 0.19 1.10 4.34 2.67 9.77 4.85 1.38
m4ki_2 6.252 s 0.55 1.44 5.16 2.91 50.63 4.07 1.62
b17 1.089 s 0.33 0.85 3.20 2.63 2.79 7.33 0.79
b18 2.012 s 0.82 1.03 3.50 2.49 8.85 7.33 0.76
b19 4.071 s 1.96 1.06 3.65 2.37 27.55 7.81 0.73
v17 1.171 s 0.25 0.65 2.97 2.87 2.53 7.83 0.80
v18 2.188 s 0.46 0.69 3.29 2.76 7.91 8.65 0.79
v19 4.416 s 1.05 0.69 3.42 2.68 24.77 11.04 0.77
y500_4k 3.678 s 0.28 1.20 5.10 2.93 40.50 4.46 1.65
y500_8k 13.305 s 1.19 1.37 5.35 2.97 62.86 3.66 1.67
y1k_4k 4.112 s 0.34 1.12 5.43 3.18 47.98 4.52 1.73
y1k_8k 14.084 s 1.41 1.33 5.66 3.12 67.51 3.78 1.75
u1k_50k 5.250 s 0.35 2.09 6.20 4.03 3.92 2.64 0.44
u1k_125k 12.557 s 0.36 2.08 6.12 4.14 3.81 2.25 0.38
u1k_250k 24.594 s 0.37 2.05 6.11 4.41 3.81 2.14 0.47
u2k_200k 40.987 s 0.37 2.36 6.84 4.08 125.27 2.08 0.31
u2k_500k 98.825 s 0.40 2.32 6.67 4.31 143.63 2.02 0.37
u2k_1m 196.720 s 0.42 2.22 6.42 4.58 125.14 1.89 0.48
a1k_50k 1.778 s 0.42 1.07 5.29 3.74 3.87 3.73 0.60
a1k_125k 4.273 s 0.41 1.42 5.45 3.83 4.00 2.82 0.53
a1k_250k 8.409 s 0.41 1.67 5.50 4.11 4.05 2.51 0.56
a2k_200k 13.021 s 0.48 1.25 6.40 4.36 4.15 2.47 0.42
a2k_500k 32.553 s 0.50 1.64 6.25 4.80 4.12 2.27 0.49
a2k_1m 65.792 s 0.51 1.98 6.09 5.15 4.12 2.16 0.55
w1k_1k 2.210 s 0.12 1.17 1.71 1.31 3.20 4.80 0.43
x10k 23.630 s 6.80 0.36 9.70 7.01 243.93 4.30 0.87

Visualisation

To view plots of the execution time of the transitive closure plotted against the cost for different systems, click one of the systems. The specific system pages have the same graphs.

Visualisation is done using Plotly

Problem: Same Generation (sgff)

The SGFF problem is generating the "cousin" relationship of a directed graph. Two nodes are in the same generation if both have direct predecessors that are in the same generation. The resulting relationship is an equivalence relationship. The output is potentially quadratic in the input size.

We use the same input data as for the Same Generation Problem.

sg(Y,Y) :- par(_, Y).
sg(X,Y) :- par(X,A), sg(A,B), par(Y,B).

The instances are generated using our graphgen tool.

SQL Formulation

WITH RECURSIVE sg(a,b) AS (
    SELECT par.b, par.b from par
    UNION
    SELECT p1.a, p2.a from sg JOIN par p1 ON p1.b = sg.a JOIN par p2 ON p2.b = sg.b
) SELECT Count(*) FROM sg;

with the table definition

CREATE TEMP TABLE par (
  a INT NOT NULL, b INT NOT NULL,
  CONSTRAINT par_pk PRIMARY KEY (a, b)
) WITHOUT ROWID;
CREATE INDEX par_fb ON par (b);

Measurements

For XSB we show the absolute time and for all other systems the relative time as a factor compared with XSB (e.g. the value 0.5 means that the system is twice as fast as XSB).

Graph XSB (s) BAM YAP SQLite PostgreSQL MariaDB Soufflé
k50 0.747 s 0.53 0.78 2.58 2.46 4.03 0.43
k100 10.450 s 0.43 0.61 2.58 2.12 4.16 0.25
k500 9277.250 s 0.31 0.41 2.31 1.79 3.51 0.14
t50 0.247 s 0.51 0.65 2.44 2.62 4.04 0.48
t100 2.137 s 0.61 1.18 3.12 2.70 4.97 0.43
t500 1478.280 s 0.50 0.65 3.58 2.78   0.22
c1k 0.075 s 0.03 0.13 0.33 1.87 0.73 0.25
c2k 0.088 s 0.00 0.17 0.41 1.59 0.80 0.20
c3k 0.098 s 0.00 0.20 0.48 1.41 0.82 0.17
c4k 0.108 s 0.00 0.19 0.28 1.46 0.92 0.18
s1k_1 1.233 s 0.32 0.86 3.64 2.83 3.93 1.35
s1k_3 2.518 s 0.52 0.87 3.84 2.68 4.80 1.15
s1k_4 3.363 s 0.44 0.88 4.02 2.74 5.09 1.12
p1k 0.065 s 0.00 0.00 0.54 2.15 0.83 0.31
p2k 0.093 s 0.00 0.11 0.55 1.63 0.76 0.19
p3k 0.103 s 0.00 0.15 0.68 1.50 0.80 0.18
p4k 0.105 s 0.00 0.19 0.86 1.56 0.95 0.18
m4_2ki 0.125 s 0.00 0.32 1.05 1.36 1.04 0.24
m16_512 0.133 s 0.00 0.38 1.16 1.34 1.14 0.23
m64_128 0.133 s 0.00 0.45 1.22 1.35 1.20 0.23
m256_32 0.133 s 0.00 0.49 1.25 1.36 1.20 0.23
m1ki_8 0.133 s 0.00 0.45 1.22 1.35 1.20 0.23
m4ki_2 0.133 s 0.00 0.45 1.23 1.36 1.21 0.23
b17 0.605 s 0.14 1.23 2.20 1.31 4.51 0.34
b18 1.088 s 0.11 1.14 2.17 1.20 4.03 0.28
v10 0.283 s 0.22 0.30 2.95 2.98 3.37 0.72
v11 0.853 s 0.25 0.26 3.06 3.25 2.44 0.61
v12 2.296 s 0.42 0.30 4.24 4.87 21.80 0.69
y500_4k 0.218 s 0.30 0.16 2.82 3.06 1.97 0.56
y500_8k 0.213 s 0.27 0.23 3.11 3.29 2.27 0.61

Visualisation

To view plots of the execution time of the transitive closure plotted against the cost for different systems, click one of the systems. The specific system pages have the same graphs.

Visualisation is done using Plotly

Problem: JOIN1

The JOIN1 problem is a join of five relations where every join is its own auxiliary predicate to facilitate (early) duplicate elimination. The benchmark query is a(X,Y).

a(X, Y)  :- t2(X, Z), t3(Z, Y).
t3(X, Y) :- c4(X, Z), c5(Z, Y).
t2(X, Y) :- t1(X, Z), c3(Z, Y).
t1(X, Y) :- c1(X, Z), c2(Z, Y).

A flat query without explicit duplicate elimination possibilities would be

a(X, Y)  :- c1(X, A), c2(A, B), c3(B, C), c4(C, D), c5(D, Y).

This query is also a part of the OpenRuleBench suite. OpenRuleBench also evaluates other queries. We will do that in the future as well.

SQL Formulation

WITH
    t1 AS
    (
        SELECT DISTINCT c1.a, c2.b
        FROM c1, c2
        WHERE c1.b = c2.a
    ),
    t2 AS
    (
        SELECT DISTINCT t1.a, c3.b
        FROM t1, c3
        WHERE t1.b = c3.a
    ),
    t3 AS
    (
        SELECT DISTINCT c4.a, c5.b
        FROM c4, c5
        WHERE c4.b = c5.a
    ),
    a AS
    (
        SELECT DISTINCT t2.a, t3.b
        FROM t2, t3
        WHERE t2.b = t3.a
    )
SELECT COUNT(*) FROM a;

Measurements

For XSB we show the absolute time and for all other systems the relative time as a factor compared with XSB (e.g. the value 0.5 means that the system is twice as fast as XSB).

Input XSB (s) BAM YAP SQLite PostgreSQL MariaDB Soufflé
j1k10k 6.690 s 0.16 0.53 3.49 4.06 5.13 0.92
j1k50k 96.725 s 0.19 0.58 3.41 3.88 6.37 0.78
j1k250k 154.330 s 0.17 0.54 3.27 3.38 42.03 0.68

Authors

This page was last updated on 2019-09-26.