Data cleansing based on subgraph comparison

Huang Li 1,2

COMPUTER MODELLING & NEW TECHNOLOGIES 2014 18(1) 52-60

1 College of Computer Science and Technology, Wuhan University of Science and Technology
2 Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System


With the rapid development of semantic web technology, the explosive growth of RDF data has become a challenging problem. Because RDF data usually come from different sources that may overlap with each other, they can contain duplicates. These duplicates may cause ambiguity and even errors in reasoning. However, little attention has been paid to this problem. In this paper, we study the problem and propose a solution named K-radius subgraph comparison (KSC). The proposed method is based on the RDF-Hierarchical Graph Model. KSC combines similarity with comparison of ‘context’ to detect duplicates in RDF data. Experiments on publication datasets show that the proposed method is efficient at detecting duplicates in RDF data. KSC is also simpler and less time-consuming than other graph comparison methods.
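To make the idea of comparing the 'context' of two resources concrete, the following is a minimal Python sketch, not the KSC algorithm itself. It assumes that the context of a node is the set of (predicate, value) pairs reachable within k outgoing hops, that label similarity is a plain string ratio, and that context similarity is a Jaccard coefficient. The function names, the thresholds, and the sample triples are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher


def k_radius_context(triples, node, k):
    """Collect (predicate, value) pairs within k outgoing hops of node.

    Assumption: only subject -> object edges are followed.
    """
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    seen = {node}
    context = set()
    queue = deque([(node, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond radius k
        for pred, neighbour in adj[current]:
            context.add((pred, neighbour))
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return context


def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_duplicates(triples, a, b, label_of, k=1,
                      label_min=0.8, context_min=0.5):
    """Flag a and b as likely duplicates when labels and contexts both agree.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    label_sim = SequenceMatcher(
        None, label_of[a].lower(), label_of[b].lower()
    ).ratio()
    if label_sim < label_min:
        return False
    ctx_sim = jaccard(k_radius_context(triples, a, k),
                      k_radius_context(triples, b, k))
    return ctx_sim >= context_min


# Made-up publication records used only to exercise the sketch.
triples = [
    ("p1", "author", "A. Author"), ("p1", "year", "2014"), ("p1", "venue", "V1"),
    ("p2", "author", "A. Author"), ("p2", "year", "2014"), ("p2", "venue", "V1"),
    ("p3", "author", "B. Other"), ("p3", "year", "2010"), ("p3", "venue", "V2"),
]
labels = {
    "p1": "Duplicate detection in RDF graphs",
    "p2": "Duplicate detection in RDF graph",
    "p3": "Query optimisation for sensor streams",
}
```

In this sketch, `likely_duplicates(triples, "p1", "p2", labels)` flags the first two records because their labels are nearly identical and their 1-radius contexts coincide, while `p3` shares no context with `p1` and is not flagged. Checking the cheap label similarity first avoids building subgraphs for pairs that are clearly unrelated.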