Spanish Text Classification with BERT
Abstract - This paper presents a process for training a text classifier on Spanish news content. The research is conducted as part of the Spanish e-learning support project, which aims to promote the study of Spanish linguistics and support the development of Spanish teaching. BERT is one of the most widely used pre-trained models in natural language processing. We start by training a BERT classifier and analyzing its performance on different datasets. We apply the EDA augmentation method to the Spanish text and improve classification accuracy, achieving over 90% accuracy on news topic classification tasks with two datasets. We further study how performance differs across training setups and identify an intra-domain migration problem that arises during training. The results reflect the impact of dataset differences on model performance and show that model migration remains an issue even within a single domain.
Keywords - Natural Language Processing, Data Augmentation, Visualization, Intra-Domain Migration.