Anxiety disorders affect a significant part of the population. The Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5), establishes a classification of ten different kinds of anxiety, based on the symptoms that keep people from carrying out their normal activities. Generalized anxiety disorder (GAD) is one of these disorders, characterized by excessive worry and alertness over different issues. People with GAD constantly anticipate some kind of disaster concerning health, family, work, studies, etc., and the condition can affect even children. The affected person cannot control their thoughts and, as a consequence, their quality of life is seriously diminished.
Data collection
We evaluated how to gather a text corpus suitable for training an AI model. We considered several possibilities, such as using social media content tagged as anxiety or searching previous work on the topic. We selected the latter strategy because of the scientific backing offered by the work of the University of Southern California, which developed the Distress Analysis Interview Corpus (DAIC).
This corpus contains clinical interviews designed to support the diagnosis of psychological distress conditions such as anxiety, depression, and post-traumatic stress disorder. The compiled database contains audio recordings, transcriptions, and facial features of the interviews. The corpus is shared on a case-by-case basis, by request to DAIC-WOZ and for research purposes.
Data description
For each participant, the database contains a set of files that includes, among other resources, the transcription of the interview. Each transcription has two speakers: the interviewer and the participant.
The model we propose is a classifier that determines the level of anxiety present in a phrase, so the text has to be preprocessed first. Table 1 shows a data frame generated from the transcripts: we read each file and split the text into two columns according to the speaker. The question column holds the text from the interviewer, and the answer column holds the text from the participant.
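The split into question and answer columns can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the speaker labels `interviewer` and `participant`, the helper name `pair_turns`, and the sample turns are hypothetical, and the real transcript files may be laid out differently.

```python
def pair_turns(turns, interviewer="interviewer", participant="participant"):
    """Group consecutive turns into rows with a question and an answer.

    `turns` is a list of (speaker, text) tuples in interview order.
    The speaker labels are assumptions, not the corpus's actual values.
    """
    rows = []
    question, answer = [], []
    for speaker, text in turns:
        if speaker == interviewer:
            # An interviewer turn that follows an answer starts a new row.
            if answer:
                rows.append({"question": " ".join(question),
                             "answer": " ".join(answer)})
                question, answer = [], []
            question.append(text)
        elif speaker == participant:
            answer.append(text)
    # Flush the last open row, if any.
    if question or answer:
        rows.append({"question": " ".join(question),
                     "answer": " ".join(answer)})
    return rows


# Hypothetical turns, written only for this example.
turns = [
    ("interviewer", "how are you doing today"),
    ("participant", "i'm okay"),
    ("participant", "a little tired"),
    ("interviewer", "where are you from originally"),
    ("participant", "los angeles"),
]
rows = pair_turns(turns)
for row in rows:
    print(row)
```

Each dictionary in `rows` corresponds to one line of a question/answer data frame, so the list could be passed to a data frame constructor if needed.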
Most frames in the transcriptions are shorter than 100 characters, as shown in Image 2 (Length of frames).
Feature generation
We experimented with various strategies to generate the model's input features from the interview transcriptions. In the end, we decided to apply word embeddings over windowed sequences, so that the model is trained with vector representations of the meaning of each word.
This required defining the dimension of the embedding input. After several attempts, the best results were achieved by concatenating all phrases from the participants and then splitting that text into sliding windows with a length of 10 words. Using the Tokenizer from Keras, we obtained a dataset of integer representations for each window of 10 words, as shown in Image 4.
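To make the windowing step concrete, here is a minimal sketch in plain Python. It does not use the Keras Tokenizer: `build_word_index` is a hypothetical stand-in that numbers words in order of first appearance (Keras orders them by frequency), and the sample answers are invented for the example.

```python
def build_word_index(text):
    """Map each distinct word to an integer, starting at 1.

    Stand-in for a real tokenizer: words are numbered in order of
    first appearance, and 0 is left unused.
    """
    index = {}
    for word in text.lower().split():
        if word not in index:
            index[word] = len(index) + 1
    return index


def sliding_windows(ids, size=10, step=1):
    """Return every full window of `size` ids, moving `step` ids at a time."""
    return [ids[i:i + size] for i in range(0, len(ids) - size + 1, step)]


# Hypothetical participant answers, concatenated into one text.
answers = [
    "i feel fine most days",
    "but lately i worry about work and my family a lot",
]
text = " ".join(answers)

word_index = build_word_index(text)
ids = [word_index[word] for word in text.lower().split()]
windows = sliding_windows(ids, size=10, step=1)

print(len(ids), "tokens ->", len(windows), "windows")
print(windows[0])
```

With a step of 1, a text of n tokens yields n - 9 windows of 10 tokens, and none of them needs padding.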
The tokenizer generated from the transcriptions of the answers covers 7373 words out of a total of 7633. The remaining 260 words were excluded because they are misspelled; examples include "diffferent" and "swapmeets".
As Image 3 shows, the mild category has clearly more instances than the others. In our first models, this imbalance caused a tendency to always predict the mild class. To control this, we joined the train, validation, and test datasets and then reduced the number of instances so that each anxiety level has a balanced number of instances.
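One common way to reduce instances is random undersampling, sketched below. Whether we cut every class down to the size of the rarest one is an assumption of this example; the label names other than mild, the helper name `undersample`, and the toy data are hypothetical too.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, seed=0):
    """Keep the same number of samples per label.

    The target size is the size of the rarest label (an assumption of
    this sketch). `samples` is a list of (features, label) tuples.
    """
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append((features, label))

    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the example is repeatable

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy dataset where the mild class dominates.
samples = (
    [([i], "mild") for i in range(8)]
    + [([i], "moderate") for i in range(3)]
    + [([i], "severe") for i in range(2)]
)
balanced = undersample(samples)
counts = Counter(label for _, label in balanced)
print(counts)
```

Undersampling discards data from the larger classes, so it trades dataset size for balance.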
Experimentation
We developed several variants of recurrent neural networks with word embeddings. In our first experiments, we used long padded sequences of around 1000 words as input, and these models did not produce good results. After analyzing the input data, we found that we were feeding the models a large quantity of padding values to fit the sequence length; consequently, the models could not learn anything from the input layer.
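A quick calculation shows how much of such an input is padding. The answer lengths below are hypothetical, chosen only to illustrate the effect, and `padding_fraction` is a helper written for this example.

```python
def padding_fraction(lengths, max_len):
    """Share of input positions that hold padding.

    Each sequence is padded (or truncated) to exactly `max_len` tokens.
    """
    real_tokens = sum(min(n, max_len) for n in lengths)
    total_positions = max_len * len(lengths)
    return 1 - real_tokens / total_positions


# Hypothetical answer lengths, in words.
lengths = [12, 20, 8, 40]
frac_1000 = padding_fraction(lengths, max_len=1000)

# Windows of exactly 10 words need no padding at all.
frac_10 = padding_fraction([10, 10, 10], max_len=10)

print(f"padded to 1000 words: {frac_1000:.0%} padding")
print(f"windows of 10 words: {frac_10:.0%} padding")
```

When most positions are padding, the real words contribute very little of what the network sees.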
We achieved better results once we started to use smaller sequences of tokenized words as input. The best results came from windowed sequences with a length of 10 words and a displacement of 1 word, without any padding values.
Table 3 shows the layers and parameters of the two models.
We trained the models for up to 100 epochs, with an early stopping callback that monitors the validation loss with a patience of 3. Image 6 (Accuracy history) compares the accuracy history of the two models. The model with the GloVe representation shows the best evolution of accuracy on the validation data.
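The early stopping rule can be illustrated without any framework. In Keras this behaviour is usually configured with the `EarlyStopping` callback (`monitor="val_loss"`, `patience=3`); the function below only imitates the rule in plain Python, and the loss values are invented for the example.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on
    its best value for `patience` consecutive epochs. Returns None if
    that never happens within the given history.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, one per epoch.
val_losses = [0.90, 0.80, 0.75, 0.76, 0.78, 0.77, 0.74]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print("training would stop after epoch", stop_epoch)
```

With this rule the limit of 100 epochs acts as an upper bound, since training normally ends earlier.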
Results
We evaluated the trained models against the test dataset. The accuracy of both models is around 73%, which we consider acceptable for this initial experiment. Table 4 shows the loss and accuracy achieved by both models.
Image 7 and Image 8 show the confusion matrices over the test data for the GloVe and Word2Vec models, respectively.
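For readers unfamiliar with confusion matrices, the sketch below shows how one is built and how accuracy follows from it. The labels and predictions are invented for illustration and are not our results; the label names other than mild are assumptions.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions per (true label, predicted label) pair.

    Rows correspond to true labels and columns to predicted labels,
    both in the order given by `labels`.
    """
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[position[true]][position[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Invented labels and predictions, for illustration only.
labels = ["mild", "moderate", "severe"]
y_true = ["mild", "mild", "moderate", "severe", "severe", "moderate"]
y_pred = ["mild", "moderate", "moderate", "severe", "mild", "moderate"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)
for label, row in zip(labels, cm):
    print(f"{label:>8}: {row}")
print(f"accuracy: {acc:.2f}")
```

Off-diagonal cells show which anxiety levels are confused with each other, which a single accuracy figure hides.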
We also submitted some phrases to the model to test it; the following are some of the results obtained.
Conclusions and future work
The results of our models support the hypothesis that an AI model can predict the level of anxiety in a textual phrase.
With the ability to classify a person's expressions, we could implement an application that helps diagnose the presence of generalized anxiety disorder and supports the virtual care of people with this condition.
References
· Kroenke, K., Spitzer, R. L., Williams, J. B., & Löwe, B. (2010). The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: A systematic review. General Hospital Psychiatry, 32(4), 345–359. doi:10.1016/j.genhosppsych.2010.03.006
· British Psychological Society (2013). Social anxiety disorder: recognition, assessment and treatment.
· Seinfeld, S., Bergstrom, I., Pomes, A., Arroyo-Palacios, J., Vico, F., Slater, M., & Sanchez-Vives, M. V. (2016). Influence of Music on Anxiety Induced by Fear of Heights in Virtual Reality. Frontiers in Psychology, 6, 1969. doi:10.3389/fpsyg.2015.01969
· Shen, J. H. (2017). Detecting anxiety on Reddit.
· Newman, M. G., Llera, S. J., Erickson, T. M., Przeworski, A., & Castonguay, L. G. (2013). Worry and generalized anxiety disorder: a review and theoretical synthesis of evidence on nature, etiology, mechanisms, and treatment. Annual Review of Clinical Psychology, 9, 275–297.
· Kroenke, K., Strine, T., Spitzer, R. L., Williams, J., Berry, J. T., & Mokdad, A. (2008). The PHQ-8 as a Measure of Current Depression in the General Population. Journal of Affective Disorders, 114, 163–173. doi:10.1016/j.jad.2008.06.026
· Tyshchenko, Y. (2018). Depression and anxiety detection from blog posts data.
· Podea, D., & Ratoi, F. (2011). Anxiety Disorders. INTECH Open Access Publisher.
· Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.