Reasoning about Global Clones – Scalable Semantic Clone Detection


  1. Schugerl, P.
  2. Rilling, J.
  3. Charland, P.
Corporate Authors
Defence R&D Canada - Valcartier, Valcartier QUE (CAN); Concordia Univ, Montreal QUE (CAN), Dept of Computer Science and Software Engineering
The Semantic Web is slowly transforming the Web as we know it into a machine-understandable pool of information that can be consumed and reasoned about by various clients. Source code is no exception to this trend, and various communities have proposed standards to share code as linked data. With the availability of large amounts of open source code published in publicly accessible repositories, the introduction of massive horizontal scaling frameworks, and cloud computing infrastructures, a new era of software mining across information silos is reshaping the software engineering landscape. Given these technological advances, analyzing code at a global scale, across systems, projects and organizational boundaries, becomes feasible. In this paper, we introduce a clone detection algorithm and its implementation that can scale to such large global datasets by modeling clones using description logic and applying a horizontally scaling Semantic Web reasoner. We demonstrate how our simple feature vector, which uses only control statements, data types and method calls, can yield results similar to those of other popular clone detection tools. Our approach not only allows us to reliably identify clones in a global context; by using a semantic reasoner, it also allows us to expand clone detection to a new class of semantic clones. We have compared our algorithm to some of the leading clone detection tools (DECKARD, CCFinder, JCD, and Simian) in order to validate our approach and show the diffe
Report Number
DRDC-VALCARTIER-SL-2011-454 — Scientific Literature
Date of publication
01 Jul 2011
Number of Pages
Electronic Document (PDF)
