Analyze IMDb movies by sentiment and topic analysis

Ningjing Ouyang

Article ID: 1958
Vol 8, Issue 3, 2023

VIEWS - 428 (Abstract) 156 (PDF)

Abstract

Film is an important cultural form that carries artistic, entertainment, and social value. Movie review and rating data sets are large, and deep learning and natural language processing methods are now widely used to analyze them. Advances in big data and deep learning offer unprecedented opportunities to understand moviegoer behavior and preferences while providing a cost-effective way to gain insights relevant to the entertainment industry. This project conducts sentiment analysis, topic modeling, and visual statistical analysis on the IMDb movie data set to identify key factors and deeper insights that influence successful decision-making in film production. The project first vectorizes the movie review text with a word embedding method and then uses Bidirectional Long Short-Term Memory (Bi-LSTM) to perform sentiment classification. In addition, statistical methods such as visualization were used to identify trends, patterns, and relationships among the variables of IMDb movies, revealing, for example, that the average number of movies released is highest in November. Finally, a Latent Dirichlet Allocation (LDA) topic model was constructed; it indicates that light entertainment movies are an important topic with increasing demand, highlighting the commercial feasibility of comedy movies as a profitable business model. In summary, this project uses an emotion-topic fusion analysis method based on Bi-LSTM emotion classification and LDA topic modeling. The results show that the Bi-LSTM model can better identify positive and negative emotions in movie reviews, and that the LDA topic model performs well in mining popular topics.
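
As a rough illustration of the LDA step named in the abstract, the following pure-Python sketch fits a tiny topic model with collapsed Gibbs sampling. The abstract does not state the inference method, library, hyperparameters, or preprocessing used in the project, so the toy reviews, the helper names `fit_lda` and `top_words`, and the values of `n_topics`, `alpha`, and `beta` are all assumptions made for illustration only.

```python
import random


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Hypothetical helper for illustration; returns
    (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Toy, made-up "reviews" standing in for preprocessed IMDb text.
docs = [
    "funny comedy laugh funny jokes".split(),
    "comedy jokes laugh light funny".split(),
    "dark thriller murder suspense dark".split(),
    "suspense thriller murder tense dark".split(),
]

doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

On a real corpus one would more likely use an established implementation (for example in gensim or scikit-learn) and choose the number of topics with a coherence measure; this sketch only shows the mechanics of assigning each word to a topic and reading off the most frequent words per topic.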


Keywords

movie; natural language processing; sentiment analysis; topic analysis; Bi-LSTM; LDA




DOI: https://doi.org/10.54517/esp.v8i3.1958


Copyright (c) 2023 Ningjing Ouyang

License URL: https://creativecommons.org/licenses/by/4.0/