Hashing and Canonicalizing Notation 3 Graphs

publication date

  • November 2010

start page

  • 663

end page

  • 685

issue

  • 7

volume

  • 76

international standard serial number (ISSN)

  • 0022-0000

electronic international standard serial number (EISSN)

  • 1090-2724

abstract

  • This paper presents a hash algorithm and a canonicalization algorithm for
    Notation 3 (N3) and Resource Description Framework (RDF) graphs. Given a
    graph, the hash algorithm produces a hash value such that the same value
    would be obtained from any other equivalent graph. Unlike previous
    related work, it is well suited for graphs with blank nodes, variables
    and subgraphs. The canonicalization algorithm outputs a canonical
    serialization of a given graph (i.e. a canonical representative of the
    set of all the graphs that are equivalent to it). Potential applications
    of these algorithms include, among others, checking graphs for
    identity, computing differences between graphs, and graph
    synchronization. The first of these could be especially useful for
    crawlers that gather RDF/N3 data from the Web, to avoid processing
    equivalent graphs several times. Both algorithms have been evaluated on
    a large dataset with more than 29 million triples and several million
    subgraphs and variables.