Predicting emotional states using behavioral markers derived from passively sensed data: Data-driven machine learning approach

publication date

  • March 2021

start page

  • e24465


issue

  • 3


volume

  • 9

International Standard Serial Number (ISSN)

  • 2291-5222


abstract

  • Background: Mental health disorders affect multiple aspects of patients' lives, including mood, cognition, and behavior. eHealth and mobile health (mHealth) technologies enable rich sets of information to be collected noninvasively, representing a promising opportunity to construct behavioral markers of mental health. Combining such data with self-reported information about psychological symptoms may provide a more comprehensive and contextualized view of a patient's mental state than questionnaire data alone. However, mobile sensed data are usually noisy and incomplete, with significant amounts of missing observations. Therefore, recognizing the clinical potential of mHealth tools depends critically on developing methods to cope with such data issues.
    Objective: This study aims to present a machine learning-based approach for emotional state prediction that uses passively collected data from mobile phones and wearable devices together with self-reported emotions. The proposed methods must cope with high-dimensional and heterogeneous time-series data with a large percentage of missing observations.
    Methods: Passively sensed behavior and self-reported emotional state data from a cohort of 943 individuals (outpatients recruited from community clinics) were available for analysis. All patients had at least 30 days' worth of naturally occurring behavior observations, including information about physical activity, geolocation, sleep, and smartphone app use. These regularly sampled but frequently missing and heterogeneous time series were analyzed with the following probabilistic latent variable models for data averaging and feature extraction: mixture model (MM) and hidden Markov model (HMM). The extracted features were then combined with a classifier to predict emotional state. A variety of classical machine learning methods and recurrent neural networks were compared. Finally, a personalized Bayesian model was proposed to improve performance by considering the individual differences in the data and applying a different classifier bias term for each patient.
    Results: Probabilistic generative models proved to be good preprocessing and feature extractor tools for data with large percentages of missing observations. Models that took into account the posterior probabilities of the MM and HMM latent states outperformed those that did not by more than 20%, suggesting that the underlying behavioral patterns identified were meaningful for individuals' overall emotional state. The best performing generalized models achieved an area under the receiver operating characteristic curve of 0.81 and an area under the precision-recall curve of 0.71 when predicting self-reported emotional valence from behavior in held-out test data. Moreover, the proposed personalized models demonstrated that accounting for individual differences through a simple hierarchical model can substantially improve emotional state prediction performance without relying on previous days' data.
    Conclusions: These findings demonstrate the feasibility of designing machine learning models for predicting emotional states from mobile sensing data capable of dealing with heterogeneous data with large numbers of missing observations. Such models may represent valuable tools for clinicians to monitor patients' mood states.
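The abstract describes using the posterior probabilities of mixture-model latent states as features for data with many missing observations. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes a Gaussian mixture with diagonal covariance, for which missing dimensions can be marginalized out simply by skipping them. The component parameters, the two example features (hours of sleep, thousands of steps), and the function names `posterior` and `impute` are all hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def posterior(x, weights, means, variances):
    """Posterior over mixture components for one observation.

    Missing entries of x are None and are skipped. For a
    diagonal-covariance Gaussian this is the same as
    marginalizing the missing dimensions out.
    """
    log_p = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for xd, m, v in zip(x, mu, var):
            if xd is not None:
                lp += log_gauss(xd, m, v)
        log_p.append(lp)
    # Normalize in a numerically stable way (log-sum-exp trick).
    top = max(log_p)
    unnorm = [math.exp(lp - top) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def impute(x, resp, means):
    """Fill missing entries with posterior-weighted component means."""
    return [
        xd if xd is not None else sum(r * mu[d] for r, mu in zip(resp, means))
        for d, xd in enumerate(x)
    ]

# Hypothetical two-component model over (hours of sleep, thousands of steps).
weights = [0.5, 0.5]
means = [[8.0, 10.0], [5.0, 3.0]]
variances = [[1.0, 4.0], [1.0, 4.0]]

day = [7.5, None]  # step count is missing for this day
resp = posterior(day, weights, means, variances)  # features for a classifier
filled = impute(day, resp, means)                 # day with the gap filled in
```

Under these assumptions, `resp` plays the role of the latent-state posterior features that would be passed to a downstream classifier, and `filled` illustrates how the same posterior can fill in a missing value. A day with every entry missing simply returns the prior component weights.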


  • Telecommunications


keywords

  • affect; Bayesian analysis; digital phenotype; machine learning; mental health; mobile health; mobile phone; personalized models; probabilistic models