Internet-scale Real-time Code Clone Search via Multi-level Indexing

Authors
  1. Keivanloo, I.
  2. Rilling, J.
  3. Charland, P.
Corporate Authors
Defence R&D Canada - Valcartier, Valcartier QUE (CAN); Concordia Univ, Montreal Que (CAN) Dept of Computer Science and Software Engineering
Abstract
Real-time code clone search is a new branch of code clone research that aims to find lines of code similar to a given code fragment across large knowledge bases in fractions of a second. Its requirements include scalability, short response time, scalable incremental corpus updates, and support for type-1, type-2, and type-3 clones. We conducted a set of empirical studies on a large open source code corpus to gain insight into its characteristics. We used the results to design and optimize a multi-level indexing approach that combines hash-table lookup and binary search to improve the response time of Internet-scale real-time code clone search. Finally, we evaluated the approach on an Internet-scale corpus (1.5 million Java files, 266 MLOC). It keeps the response time of 99.9% of clone searches in the microsecond range while meeting the aforementioned requirements.
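The abstract names the two lookup mechanisms but this record does not describe the index layout, so the following is only a minimal sketch of how a hash table and binary search could be combined in two levels. It assumes a line-granularity index with whitespace-insensitive matching; the class `TwoLevelIndex`, the helpers `normalize` and `line_hash`, and the `bucket_bits` parameter are hypothetical names introduced for illustration and are not taken from the paper.

```python
import bisect
import hashlib
import re
from collections import defaultdict


def normalize(line: str) -> str:
    # Assumption: dropping all whitespace is enough to make formatting
    # differences irrelevant. A real tool would tokenize the code instead.
    return re.sub(r"\s+", "", line)


def line_hash(line: str) -> int:
    # Stable 64-bit hash of the normalized line (first 8 bytes of SHA-1).
    digest = hashlib.sha1(normalize(line).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TwoLevelIndex:
    """Level 1: hash table keyed by the top bits of the line hash.
    Level 2: a sorted list per bucket, probed with binary search."""

    def __init__(self, bucket_bits: int = 16) -> None:
        self.bucket_bits = bucket_bits
        self.buckets = defaultdict(list)

    def _bucket_key(self, h: int) -> int:
        return h >> (64 - self.bucket_bits)

    def add(self, file_id: str, line: str) -> None:
        if not normalize(line):
            return  # skip blank lines
        h = line_hash(line)
        # insort keeps the bucket sorted, so the index can be updated
        # incrementally without rebuilding it.
        bisect.insort(self.buckets[self._bucket_key(h)], (h, file_id))

    def search(self, line: str) -> list:
        h = line_hash(line)
        bucket = self.buckets.get(self._bucket_key(h), [])
        # (h,) sorts before every (h, file_id), so bisect_left lands on
        # the first entry whose hash equals h, if there is one.
        i = bisect.bisect_left(bucket, (h,))
        hits = []
        while i < len(bucket) and bucket[i][0] == h:
            if bucket[i][1] not in hits:
                hits.append(bucket[i][1])
            i += 1
        return hits


index = TwoLevelIndex()
index.add("A.java", "int x = 0;")
index.add("B.java", "int  x=0 ;")
index.add("C.java", "return y;")
matches = index.search("int x=0;")  # files containing an equivalent line
```

In this sketch the first level narrows a query to one small bucket in constant expected time, and the second level finds the exact hash inside that bucket in logarithmic time. Supporting type-2 and type-3 clones would need further normalization (for example, abstracting identifiers) and approximate matching, which this sketch does not attempt.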
Report Number
DRDC-VALCARTIER-SL-2011-453 — Scientific Literature
Date of publication
01 Oct 2011
Number of Pages
5
DSTKIM No
CA036824
CANDIS No
536494
Format(s):
Electronic Document (PDF)
