Neo4J

Neo4J is a popular graph database that supports CYPHER for graph queries.

To measure the resource usage for the Neo4J server systems we took a snapshot of the status information provided by at the start of the client program and when the query was finished. The difference between both snapshots is used to measure CPU time. This approach can not be used to measure memory usage. We made sure that for Neo4J only one user was connected, as we gathered the status information of the server process.

tcff

Using CYPHER's variable length paths we can formulate loading the data and executing the query as such:

//tcff.qry
MATCH (n)
DETACH DELETE n;

LOAD CSV FROM 'file://INSTANCE' AS line
MERGE (a:Node {n: toInteger(line[0])})
MERGE (b:Node {n: toInteger(trim(line[1]))})
CREATE (a)-[:PAR]->(b);

MATCH (a)-[*]->(b)
RETURN COUNT(DISTINCT [a,b]);

Launch: (sed \'s!INSTANCE!{instance}!g\' tcff.qry) | ./cypher-shell -u neo4j

Other Formulations

There are different ways to formulate this query that are faster or slower depending on the specific layout of the graph. The version above is the most general one.

Early Selection

This query uses a cross-product of all nodes and filters out the pairs that are not connected.

MATCH (a), (b)
WHERE (a)-[*]->(b)
RETURN COUNT(*);

Using ShortestPath

One can use the shortestPath method that does a breadth-first-search from both nodes to find a shortest path between both nodes. This does not work if the nodes are the same (and fails with the incorrect settings) and is slow if the nodes are not connected. Then each node has to be checked for a connection to itself.

MATCH shortestPath((a)-[*]->(b))
RETURN COUNT([a,b]) AS ret
UNION
MATCH (a)
WHERE (a)-[*]->(a)
RETURN COUNT(*) AS ret;

This query depends on the following settings: cypher.forbid_shortestpath_common_nodes=false and cypher.forbid_exhaustive_shortestpath=true.

sgff

Using properties of selected paths CYPHER allows us to formulate the same generation problem (two nodes are in the same generation if they are the same distance from a common ancestor).

MATCH (n)
DETACH DELETE n;

LOAD CSV FROM 'file://INSTANCE' AS line
MERGE (a:Node {n: toInteger(line[0])})
MERGE (b:Node {n: toInteger(trim(line[1]))})
CREATE (a)-[:PAR]->(b);

MATCH p1 =(a)-[*0..]->(c)
MATCH p2 =(b)-[*0..]->(c)
WHERE length(p1) = length(p2)
RETURN COUNT(DISTINCT [a,b]);

Since we match pairs of path of unknown (possibly infinite) length and compare them by that property, there is no clear reason why this query should terminate. As far as we know, the only reason this query terminates is because paths are only selected up to length 2^63-1 (max_long), which means that this query does not work for large enough graphs.