Sentiment Classification on Mandalika MotoGP Event Using K-Means Clustering and Random Forest

Authors

  • Khairul Fadhli Margolang, Magister of Computer Science, Potensi Utama University
  • Muhammad Zarlis, Information Systems Management Department, Universitas Bina Nusantara
  • Hartono Hartono, Magister of Computer Science, Potensi Utama University

DOI:

https://doi.org/10.35842/icostec.v2i1.35

Keywords:

k-means clustering, mandalika, motogp, sentiment analysis, random forest

Abstract

As one of the most famous world-class motorcycle racing competitions, MotoGP is broadcast live on television to millions of viewers at each race. Indonesia will host this prestigious racing event at the Pertamina Mandalika Circuit in the 19th series of 2022. The event has sparked reactions from Indonesian netizens on social media, especially on Twitter. This research aims to analyze public sentiment and emotional value regarding the event, using data collected from Twitter. With sentiment and emotion values extracted from the tweet contents as features, we use K-means clustering to generate sentiment clusters, which then serve as targets for classification with the Random Forest (RF) algorithm. Evaluating with 5-fold and 10-fold cross-validation, we obtain the highest accuracy of 0.99, the highest precision of 0.990175, and the highest recall of 0.99 from the RF model configured with ten trees. The lowest accuracy, precision, and recall values of 0.96, 0.960934, and 0.96 come from the RF models configured with 15 and 20 trees under the 10-fold evaluation.
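
In both reported results the recall equals the accuracy (0.99 and 0.96), while the precision differs slightly. Support-weighted averaging over classes would produce this pattern, since weighted recall is always equal to accuracy. The abstract does not state which averaging was used, so the sketch below is only an illustration under that assumption. The function name `weighted_scores` and the label lists are hypothetical and are not data from the paper.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted recall).

    Assumption: per-class scores are averaged with weights equal to
    each class's share of the true labels (support weighting).
    """
    n = len(y_true)
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    precision = sum(
        support[c] / n * (correct[c] / predicted[c] if predicted[c] else 0.0)
        for c in support
    )
    # Each term reduces to correct[c] / n, so the sum equals accuracy.
    recall = sum(support[c] / n * (correct[c] / support[c]) for c in support)
    return accuracy, precision, recall

# Hypothetical cluster labels for ten tweets (illustrative only).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
accuracy, precision, recall = weighted_scores(y_true, y_pred)
```

With these made-up labels, one tweet from cluster 0 is misassigned to cluster 1, so the weighted recall matches the accuracy while the weighted precision comes out slightly higher.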

Published

2023-02-28