Data Mining Behavioral Approach To Reduce The Data Set For Debugging
Software companies spend most of their costs on dealing with software bugs. Software bugs are unavoidable, and fixing
them is an expensive task. Automatic bug triage applies text classification techniques to reduce the time cost of
manual work. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale of
bug report data and improve its quality. To reduce the data on both the bug dimension and the word dimension, we
combine instance selection and feature selection techniques. To decide the order in which to apply instance selection
and feature selection, we extract attributes from historical bug data sets and build a predictive model on this new data
set that determines the data reduction order for bug triage. We examine the performance of data reduction on bug
reports from two large open source projects, Eclipse and Mozilla. The results show that data reduction effectively
reduces the data scale and improves the accuracy of bug triage.
Index Terms— Bug triage, bug data reduction, instance selection, feature selection, software repositories.
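
As a rough illustration of the idea summarized above, the following minimal sketch reduces a toy bug data set on both the word dimension (feature selection) and the bug dimension (instance selection), in either order. It is only a sketch under stated assumptions: the abstract does not name the concrete algorithms, so document-frequency ranking and a condensed nearest neighbour rule are used here as simple stand-ins, and the function names, the parameter k, and the toy reports are all hypothetical.

```python
from collections import Counter

# Toy bug reports (invented for illustration): (words, developer who fixed it).
reports = [
    (["crash", "editor", "open", "file"], "alice"),
    (["crash", "editor", "save", "file"], "alice"),
    (["crash", "editor", "close", "file"], "alice"),
    (["render", "page", "css", "layout"], "bob"),
    (["render", "page", "font", "layout"], "bob"),
    (["render", "page", "image", "layout"], "bob"),
]


def select_features(data, k):
    """Word dimension: keep the k words that occur in the most reports.

    Document frequency is a stand-in for whatever feature selection
    technique the paper actually uses.
    """
    df = Counter(w for words, _ in data for w in set(words))
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    keep = {w for w, _ in ranked[:k]}
    return [([w for w in words if w in keep], dev) for words, dev in data]


def jaccard_distance(a, b):
    """Distance between two reports viewed as sets of words."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def select_instances(data):
    """Bug dimension: condensed nearest neighbour (a stand-in technique).

    A report is kept only if its nearest already-kept report belongs to a
    different developer, i.e. the kept set would misclassify it.
    """
    if not data:
        return []
    kept = [0]
    changed = True
    while changed:
        changed = False
        for i, (words, dev) in enumerate(data):
            if i in kept:
                continue
            nearest = min(kept, key=lambda j: jaccard_distance(words, data[j][0]))
            if data[nearest][1] != dev:
                kept.append(i)
                changed = True
    return [data[i] for i in sorted(kept)]


def reduce_bug_data(data, k, order="FS->IS"):
    """Apply both reductions in the requested order ("FS->IS" or "IS->FS")."""
    if order == "FS->IS":
        return select_instances(select_features(data, k))
    if order == "IS->FS":
        return select_features(select_instances(data), k)
    raise ValueError(f"unknown reduction order: {order}")


fs_then_is = reduce_bug_data(reports, k=6, order="FS->IS")
is_then_fs = reduce_bug_data(reports, k=6, order="IS->FS")
```

On this toy input the two orders keep the same number of reports but retain different words, which hints at why the choice of reduction order is treated as something to be predicted rather than fixed in advance.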