Twitter News Stratification Using Random Forest
With the growing popularity of social networks, most news providers now share their news on various social
networking sites and web blogs. In India, many news groups share their news on the Twitter microblogging service.
These data carry valuable information relevant to social research. The idea, therefore, is to categorize the news into
different groups so that the news groups in India can be identified. News groups were selected by popularity, and their short
messages were extracted from Twitter. The extracted short messages were classified into 12 major groups. Machine
learning techniques were used to train on the data. To create the instances, the words of each short message were considered
and a bag-of-words approach was used to build the feature vector. The model was trained using the Random Forest machine learning
technique. Random forest is a widely used ensemble learning method that consists of multiple decision trees, each built on random
inputs and splitting nodes on a random subset of features. Because of its good classification and generalization ability,
random forest is preferred in various domains. A large number of features will be collected for the current research. The
measured performance will demonstrate the efficacy of the system.
Keywords: Random Forest, Web Mining, Text Classification.
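
The pipeline described in the abstract (bag-of-words feature vectors fed to a random forest) can be illustrated with a minimal sketch. This is not the authors' implementation: the sample messages, the category labels, the whitespace tokenizer, the word-present/word-absent split rule, the Gini impurity criterion, and all function names are assumptions made for illustration. It uses only the Python standard library; a real system would more likely rely on an established machine learning library.

```python
import random
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()

def build_vocabulary(messages):
    # Map every distinct word to a column index (sorted for reproducibility).
    words = sorted({w for m in messages for w in tokenize(m)})
    return {w: i for i, w in enumerate(words)}

def vectorize(message, vocab):
    # Bag-of-words: count how often each vocabulary word occurs in the message.
    vec = [0] * len(vocab)
    for w in tokenize(message):
        if w in vocab:
            vec[vocab[w]] += 1
    return vec

def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, n_features, rng):
    # A leaf is a plain label; an internal node is (feature, left, right).
    if len(set(y)) == 1:
        return y[0]
    best = None
    # Consider only a random subset of features at this node.
    for f in rng.sample(range(len(X[0])), n_features):
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] > 0]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        # No sampled feature separates the data: fall back to the majority label.
        return Counter(y).most_common(1)[0][0]
    _, f, left, right = best
    return (f,
            build_tree([X[i] for i in left], [y[i] for i in left], n_features, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], n_features, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, left, right = node
        node = right if x[f] > 0 else left
    return node

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_features, rng))
    return forest

def predict_forest(forest, x):
    # Majority vote over the individual trees.
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Hypothetical short messages and categories, invented for this sketch.
messages = ["india wins cricket match", "team wins final match",
            "market falls on weak rupee", "rupee falls as market slides"]
labels = ["sports", "sports", "business", "business"]

vocab = build_vocabulary(messages)
X = [vectorize(m, vocab) for m in messages]
forest = train_forest(X, labels)
prediction = predict_forest(forest, vectorize("cricket team wins", vocab))
```

Each tree sees a different bootstrap sample and a different random feature subset at every node, which is what makes the ensemble's majority vote more stable than any single tree.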