August 2020

Computational Intelligence Lab

Impact of Lexical Normalization on Twitter Sentiment Analysis

In this paper, we examine how lexical normalization affects the performance of different models on the task of Twitter sentiment analysis. We evaluated BERT and ALBERT models of various sizes and performed lexical normalization with MoNoise in both its default and its bad-speller mode. Our findings suggest that the effect of lexical normalization depends on the model architecture as well as the model size, and that normalization can also hurt performance. It is therefore not possible to give a general recommendation on whether to perform lexical normalization before further data analysis.
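To make the term concrete: lexical normalization rewrites non-standard tokens (abbreviations, slang, misspellings) into canonical forms before the text reaches the sentiment model. The snippet below is only a toy sketch of that idea in Python. The names `TOY_LEXICON` and `normalize` are hypothetical, and the fixed lookup table is an assumption made for illustration; MoNoise itself is a trained normalization model, not a dictionary lookup, and this is not its interface.

```python
import re

# Hypothetical hand-written lookup table, for illustration only.
# A trained normalizer such as MoNoise does not rely on a fixed table like this.
TOY_LEXICON = {
    "u": "you",
    "r": "are",
    "gr8": "great",
    "2morrow": "tomorrow",
    "luv": "love",
}

def normalize(tweet: str) -> str:
    """Replace each known non-standard token with its canonical form.

    Tokens are word runs or single punctuation marks; unknown tokens
    are passed through unchanged.
    """
    tokens = re.findall(r"\w+|[^\w\s]", tweet)
    return " ".join(TOY_LEXICON.get(tok.lower(), tok) for tok in tokens)

if __name__ == "__main__":
    print(normalize("u r gr8 , luv it"))
```

In an experiment like the one described above, the same classifier would be trained and evaluated once on the raw tweets and once on the normalized tweets, so that any difference in accuracy can be attributed to the normalization step.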

The source code of our implementation can be found here and a written report can be found here.

Authors:

  • Jannik Gut (GitHub)
  • Robin Burkhard (GitHub)
  • Bernhard Walser (GitHub)
  • Manuel Meinen (GitHub)

Supervisors:

  • Prof. Dr. Thomas Hofmann