Kam1n0 – MapReduce-based Assembly Clone Search for Reverse Engineering


  1. Ding, S.H.
  2. Fung, B.C.M.
  3. Charland, P.
Corporate Authors
Defence Research and Development Canada, Valcartier Research Centre, Quebec QC (CAN); McGill Univ, Montreal QC (CAN) School of Information Studies
Assembly code analysis is one of the critical processes for detecting and proving software plagiarism and software patent infringement when the source code is unavailable. It is also commonly used to discover exploits and vulnerabilities in existing software. However, it is a manually intensive and time-consuming process, even for experienced reverse engineers. An effective and efficient assembly code clone search engine can greatly reduce the effort involved, since it can identify cloned parts that have been previously analyzed. The assembly code clone search problem belongs to the field of software engineering, but it depends strongly on practical nearest neighbor search techniques from data mining and databases. In close collaboration with reverse engineers and Defence Research and Development Canada (DRDC), we study, from a data mining perspective, the concerns and challenges that make existing assembly code clone search approaches impractical. We propose a new variant of the locality-sensitive hashing (LSH) scheme and combine it with graph matching to address these challenges. We implement an integrated assembly clone search engine called Kam1n0. It is the first clone search engine that can efficiently identify subgraph clones of a given query assembly function in a large assembly code repository. Kam1n0 is built upon the Apache Spark computation framework and Cassandra-like key-value distributed storage. A deployed demo system is publicly available. Extensive exp
Software reverse engineering; malware analysis; assembly code; code clone search; MapReduce
Report Number
DRDC-RDDC-2016-P072 — External Literature
Date of publication
12 Oct 2016
Number of Pages
Electronic Document (PDF)
