Quick Recognition of Confidential Data Leakage on Transformed E-Mail

Data leakage is the unauthorized transfer of classified data from an organization's computers or data center to the outside world. It poses a major risk to the organization's infrastructure and IT services. Organizations need to spot exposure of sensitive data quickly by monitoring content in transit, and they need to check the access privileges of both sender and recipient while preserving privacy. Preventing data extrusion (theft of sensitive data) is challenging because sensitive data patterns can be hidden within the transmitted content. The existing approach, a sequence alignment algorithm paired with trigram (n=3) sequence sampling, concentrates only on inadvertent data leaks and fails to detect planned attacks, so it achieves only 80% detection. When the sensitive pattern appears in shuffled content, both alignment precision and detection accuracy suffer. The approach is therefore inefficient at recognizing sensitive data patterns and partial data leaks in the longest transmitted sequence, and it fails to preserve both privacy and confidentiality. To address these problems, this paper proposes a DLD checker and detector that efficiently recognizes sensitive data patterns in the longest sequence and recognizes inexact or partial data leaks by comparing the modified sequence with the stored sensitive data patterns. The paper also uses the Lucene framework, whose indexing quickly reveals matched sensitive words in the transmitted data. The system detects every category of data theft, which achieves confidentiality, and it ensures privacy by protecting individual-specific information.
The system also achieves 100% detection and accurately detects shuffled data patterns in transformed leaks, with improved server scalability and throughput, using 4-gram (n=4) sequence sampling and the Levenshtein distance algorithm.

Keywords: Data Leak, Detection Accuracy, DLD, Efficient, Privacy, Sampling, Scalability, Sensitive Data, Sequence Alignment.
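
To make the combination of 4-gram sampling and Levenshtein distance concrete, the following is a minimal sketch in Python. It is not the paper's DLD checker or detector. The function names (`levenshtein`, `ngrams`, `leak_score`), the `max_dist` tolerance, and the scoring rule (fraction of pattern 4-grams found in the content within a small edit distance) are all assumptions made for illustration.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def ngrams(text: str, n: int = 4) -> list[str]:
    """All overlapping character n-grams of text (n=4 as in the abstract)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def leak_score(content: str, pattern: str, n: int = 4, max_dist: int = 1) -> float:
    """Hypothetical score: share of the pattern's n-grams that appear in
    the content within max_dist edits. Order is ignored, so reordered
    fragments of the pattern still contribute to the score."""
    content_grams = set(ngrams(content.lower(), n))
    pattern_grams = ngrams(pattern.lower(), n)
    if not pattern_grams:
        return 0.0
    hits = sum(
        1 for pg in pattern_grams
        if any(levenshtein(pg, cg) <= max_dist for cg in content_grams)
    )
    return hits / len(pattern_grams)
```

Under these assumptions, a message that contains the stored pattern verbatim scores 1.0, unrelated text scores 0.0, and a message carrying reordered pieces of the pattern scores somewhere in between. A real detector would also need a decision threshold, which this sketch leaves open.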