A Study of Data Mining Techniques and Benefits With Respect to Data Science Technology
Abstract - Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis. Data mining techniques and tools enable enterprises to predict future trends and make better-informed business decisions. Enormous volumes of data are available in various databases, and new datasets are continually generated by equipment based on emerging technologies. These datasets contain vital hidden information that can be explored to support decision making, training policies and the management of critical issues, particularly in fields such as medicine and security. Achieving the highest possible level of accuracy requires the study, selection and implementation of existing techniques. The objective of this study is to assess the robustness of data mining techniques, evaluated using a Sampling Method (SM) such as Multi-Fold Cross Validation and Performance Evaluation Parameters (PEP) such as accuracy, sensitivity and specificity. To analyse the results using the PEP, the data mining classification techniques Naive Bayes (NB), k-Nearest Neighbours (KNN) and Support Vector Machine (SVM) are applied to Cardiovascular and Cardiomyopathy datasets, which relate to heart disease, and to a Breast Cancer dataset, which is used to classify the recurrence of cancer after treatment that cannot otherwise be detected. These datasets are treated as three different cases. The techniques are implemented using Python scripts in the open-source software environment Orange Canvas. The results of Mk-NN indicated comparatively more robust accuracy under Multi-Fold Cross Validation. The comparative analysis shows that specific techniques have limitations, which may vary depending on the heterogeneous and voluminous nature of the dataset as well as on the applied Classification Performance Evaluation Sampling Method (CPESM).
Keywords - Data Mining Techniques, Classification Techniques, Performance Evaluation Parameters (PEP), Data Mining Tool, Data Science and Data Visualization.
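The evaluation scheme summarised in the abstract, Multi-Fold Cross Validation scored with accuracy, sensitivity and specificity, can be illustrated with a minimal sketch. It is written in plain Python rather than Orange Canvas, and it is not the implementation used in the study. It uses an ordinary k-NN classifier (not Mk-NN), unshuffled contiguous folds, and a small synthetic one-feature dataset. All function names (`pep`, `k_fold_indices`, `knn_predict`, `cross_validate`) and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def pep(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for a binary problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: share of actual positives that were found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of actual negatives that were found.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def knn_predict(train_X, train_y, x, k=3):
    """Plain k-NN: majority vote among the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, folds=5, k=3):
    """Pool the predictions from every fold, then compute the PEP once."""
    y_true, y_pred = [], []
    for train, test in k_fold_indices(len(X), folds):
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            y_true.append(y[i])
            y_pred.append(knn_predict(train_X, train_y, X[i], k))
    return pep(y_true, y_pred)


# Synthetic toy data (NOT from the study): two well-separated classes,
# interleaved so that every contiguous fold contains both labels.
toy_X = [[0.0], [10.0], [1.0], [11.0], [2.0], [12.0],
         [0.5], [10.5], [1.5], [11.5]]
toy_y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

result = cross_validate(toy_X, toy_y, folds=5, k=3)
print(result)
```

Pooling predictions across folds before computing the parameters is one of several reasonable conventions; averaging per-fold scores is another and can give slightly different values on small or imbalanced datasets.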