Paper Title
Identification of Marathi and Sanskrit Compound and Non-Compound Word using Genetic Algorithm
Abstract
Text-based language identification is the task of automatically recognizing the language of a given text or document.
Languages within the same language family are harder to distinguish than languages from different families. This paper
investigates the performance of statistical measures for text-based language identification, with emphasis on five
languages of India written in the Devanagari script: Marathi, Hindi, Sanskrit, Bhojpuri and Nepali. N-grams are used as
features for classification in the proposed system. Language identification is a key pre-processing step in
several Natural Language Processing (NLP) tasks. In a multilingual society like India there is wide scope for automatic
language identification, since it would be a fundamental step in bridging the digital divide between the Indian masses and
the world.
Index Terms- Devanagari Script, Multilingual Computing, Wiener filter, Curvelet transform, Genetic algorithm
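
As an illustration of n-gram features for language identification, the following is a minimal sketch, not the system evaluated in this paper. The abstract does not name the statistical measure or the n-gram order, so cosine similarity over character trigram counts is assumed here. The function names (char_ngrams, cosine, identify) are hypothetical, and the one-sentence training samples are toy data chosen only to make the example run.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams; a space is padded on both ends to mark word boundaries."""
    padded = " " + " ".join(text.split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (assumed measure)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(text, profiles, n=3):
    """Return the language whose profile is most similar to the text's profile."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Toy one-sentence samples (assumption: real profiles would be built from large corpora).
SAMPLES = {
    "Marathi": "मी शाळेत जातो आहे",
    "Hindi": "मैं स्कूल जा रहा हूँ",
    "Sanskrit": "अहं विद्यालयं गच्छामि",
    "Nepali": "म विद्यालय जाँदै छु",
}

# One n-gram profile per language, built from its training sample.
profiles = {lang: char_ngrams(sample) for lang, sample in SAMPLES.items()}

if __name__ == "__main__":
    print(identify("मी शाळेत जातो", profiles))
```

Character n-grams are counted over Unicode code points, so Devanagari vowel signs and the virama are treated as separate symbols. With profiles this small the result is only indicative; closely related languages need much larger training text before the profiles separate reliably.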