Modern companies generate value by digitalizing their services and products. Knowing what customers say about a firm in social media reviews is a key success factor in the big data era. However, social media data analysis is a complex discipline because of the subjectivity of text reviews and the additional features present in raw data. Some frameworks proposed in the existing literature involve many steps, which increases their complexity. A two-stage framework is proposed to tackle this problem: the first stage focuses on data preparation and on finding an optimal machine learning model for the data; the second stage relies on established layers of big data architectures and focuses on extracting results from the data by making the most of the machine learning model from stage one. Thus, the first stage analyzes big and small datasets in a non-big data environment, whereas the second stage analyzes big datasets by applying the machine learning model from the first stage. A case study is then presented for the first stage of the framework, analyzing reviews of hotel-related businesses. Several machine learning algorithms were trained for two, three and five classes, with the best results obtained for binary classification.
big data framework; machine learning model; social media analytics; hospitality; Yelp; business intelligence; sentiment analysis; analytics; classification; future
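The two-, three- and five-class setups mentioned in the abstract can be illustrated with a minimal sketch of how review ratings might be turned into class labels. It assumes each review carries an integer rating from 1 to 5 stars; the function name `stars_to_label` and the thresholds are hypothetical choices for illustration and are not taken from the paper.

```python
def stars_to_label(stars: int, n_classes: int) -> str:
    """Map a 1-5 star rating to a class label.

    The thresholds below are illustrative assumptions, not the paper's.
    """
    if stars not in range(1, 6):
        raise ValueError("stars must be an integer from 1 to 5")
    if n_classes == 5:
        # Five classes: each star rating is its own class.
        return str(stars)
    if n_classes == 3:
        # Three classes: 1-2 negative, 3 neutral, 4-5 positive (assumed split).
        if stars <= 2:
            return "negative"
        return "neutral" if stars == 3 else "positive"
    if n_classes == 2:
        # Binary: 4-5 positive, everything else negative (assumed split).
        return "positive" if stars >= 4 else "negative"
    raise ValueError("n_classes must be 2, 3 or 5")


# Example: label the same ratings under each of the three setups.
ratings = [1, 3, 5]
labels_by_setup = {n: [stars_to_label(s, n) for s in ratings] for n in (2, 3, 5)}
```

Under these assumed thresholds, the binary setup merges neighbouring ratings into two coarse groups, so adjacent ratings such as 4 and 5 stars no longer need to be told apart.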