Shuffling and Randomization for Scalable Source Code Clone Detection

Authors
  1. Keivanloo, I.
  2. Roy, C.K.
  3. Rilling, J.
  4. Charland, P.
Corporate Authors
Defence R&D Canada - Valcartier, Valcartier QUE (CAN); Concordia Univ, Montreal QUE (CAN) Dept of Computer Science; Saskatchewan Univ, Saskatoon SASK (CAN)
Abstract
In this research, we present a novel approach that allows existing state-of-the-art clone detection tools to scale to very large datasets. A key benefit of our approach is that the improved scalability is achieved on standard hardware and without modifying the original tools' implementations. We use a hybrid approach comprising shuffling, repetition, and random subset generation of the subject code. As part of the experimental evaluation, we applied our shuffling and randomization approach to two state-of-the-art clone detection tools. Our approach allowed these tools to scale to a very large dataset, SeCold.org, using standard hardware and without significantly affecting their overall recall.
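The abstract names three ingredients (shuffling, repetition, and random subset generation) without giving the procedure. The following is only a minimal sketch of how such a wrapper around an unmodified detector might look. It is an assumption for illustration, not the authors' implementation. The names `toy_detector`, `detect_in_random_subsets`, `subset_size`, and `rounds` are hypothetical, and the toy detector (exact content match) merely stands in for a real clone detection tool.

```python
import random
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

SourceFile = Tuple[str, str]   # (file name, file content); hypothetical representation
ClonePair = FrozenSet[str]     # unordered pair of file names


def toy_detector(files: Sequence[SourceFile]) -> Set[ClonePair]:
    """Stand-in for a real tool: reports files whose content is identical."""
    by_content = {}
    for name, content in files:
        by_content.setdefault(content, []).append(name)
    pairs: Set[ClonePair] = set()
    for names in by_content.values():
        for a, b in combinations(sorted(names), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def detect_in_random_subsets(
    files: Sequence[SourceFile],
    detector: Callable[[Sequence[SourceFile]], Set[ClonePair]],
    subset_size: int,
    rounds: int,
    seed: int = 0,
) -> Set[ClonePair]:
    """Assumed scheme: shuffle, split into subsets, run the detector, repeat."""
    if subset_size < 2:
        raise ValueError("subset_size must be at least 2 to contain a pair")
    rng = random.Random(seed)
    pool: List[SourceFile] = list(files)
    found: Set[ClonePair] = set()
    for _ in range(rounds):                       # repetition
        rng.shuffle(pool)                         # shuffling
        for start in range(0, len(pool), subset_size):
            subset = pool[start:start + subset_size]   # random subset
            found |= detector(subset)             # detector itself is untouched
    return found
```

In this sketch each detector run only ever sees `subset_size` files, which is what would keep memory use within the limits of standard hardware. A clone pair is found only if both files land in the same subset in at least one round, so adding rounds raises the chance of finding each pair, at the cost of more detector runs.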
Report Number
DRDC-VALCARTIER-SL-2012-261 — Scientific Literature
Date of publication
01 Jun 2012
Number of Pages
2
DSTKIM No
CA036813
CANDIS No
536477
Format(s):
Electronic Document(PDF)
