2019-121

2019-121

Deep Learning to Detect Software Defects

MICHAEL D. NAPLES, WILLIAM H. JACOBS, ALEXANDER V. BOYLE, DEEP DESAI, and VINCENT J. PEDATA

Modern bug detection algorithms rely on hand written rules to ensure source code follows a certain format. As software engineering progresses and source code becomes more complicated the number of different bugs and their complexity increases. Building algorithms by hand to detect more subtle bugs is almost impossible. To build a more robust bug detector this project takes a natural language processing approach to solving this problem. Using deep learning we create word embeddings on source code and use a recurrent neural network to distinguish between buggy and non-buggy code. To frame bug detection as a machine learning task large amounts of source code, both buggy and non-buggy are required. Our program can download a list of GitHub repositories, extract the source code and we use simple code transformations that take clean code and alter it in some way to make it buggy. This allows us to collect an arbitrarily large amount of labeled buggy and non-buggy code. This program is far from being a commercial product but as a proof of concept we achieved an accuracy of 85% predicting swapped parameters in previously unseen code.