


Suicidal Ideation Detection: A Review of Machine Learning Methods and Applications

Shaoxiong Ji, Student Member, IEEE, Shirui Pan, Member, IEEE, Xue Li, Erik Cambria, Senior Member, IEEE, Guodong Long, and Zi Huang

Abstract—Suicide is a critical issue in modern society. Early detection and prevention of suicide attempts should be addressed to save people's lives. Current suicidal ideation detection methods include clinical methods, based on the interaction between social workers or experts and the targeted individuals, and machine learning techniques with feature engineering or deep learning for automatic detection based on online social content. This is the first survey that comprehensively introduces and discusses the methods from these categories. Domain-specific applications of suicidal ideation detection are also reviewed according to their data sources, i.e., questionnaires, electronic health records, suicide notes, and online user content. To facilitate further research, several specific tasks and datasets are introduced. Finally, we summarize the limitations of current work and provide an outlook on future research directions.

Index Terms—Suicidal ideation detection, social content, feature engineering, deep learning.

I. INTRODUCTION

MENTAL health issues, such as anxiety and depression, are becoming increasingly concerning in modern society, as they turn out to be especially severe in developed countries and emerging markets. Severe mental disorders without effective treatment can develop into suicidal ideation or even suicide attempts. Some online posts contain a large amount of negative information and give rise to problematic phenomena such as cyberstalking and cyberbullying. The consequences can be severe, since such content often takes the form of social cruelty, leading to rumors or even psychological harm. Research shows that there is a link between cyberbullying and suicide [1]. Victims overexposed to negative messages or events may become depressed and desperate; worse still, some may commit suicide.

The reasons people commit suicide are complicated. People with depression are highly likely to commit suicide, but many without depression can also have suicidal thoughts [2]. Nock et al. [3] reported the prevalence of suicide and its factors across 17 countries, and found that risk factors include being female, younger, less educated,

S. Ji, X. Li, and Z. Huang are with The University of Queensland, Australia. E-mail: [email protected]; {xueli; huang}@itee.uq.edu.au

S. Pan is with Monash University, Australia. E-mail: [email protected]

E. Cambria is with Nanyang Technological University, Singapore. E-mail: [email protected]

G. Long is with University of Technology Sydney, Australia. E-mail: [email protected]

unmarried, and having mental health issues. According to the American Foundation for Suicide Prevention (AFSP), suicide factors fall into three categories: health factors, environmental factors, and historical factors [4]. Ferrari et al. [5] found that mental health issues and substance use disorders are among the factors contributing to suicide. O'Connor and Nock [6] conducted a thorough review of the psychology of suicide, and summarized the psychological risks as personality and individual differences, cognitive factors, social factors, and negative life events.

Due to the advances of social media and online anonymity, an increasing number of individuals turn to interact with others on the Internet. Online communication channels are becoming a new way for people to express their feelings, suffering, and suicidal tendencies. Hence, online channels have naturally started to act as a surveillance tool for suicidal ideation, and mining social content can improve suicide prevention [7]. In addition, strange social phenomena are emerging, e.g., online communities reaching an agreement on self-mutilation and copycat suicide. For example, a social network phenomenon called the "Blue Whale Game"¹ in 2016 used many tasks (such as self-harming) and led game members to commit suicide in the end. Suicide is a critical social issue and takes thousands of lives every year. Thus, it is necessary to detect suicidality and to prevent suicide before victims end their lives. Early detection and treatment are regarded as the most effective ways to prevent potential suicide attempts.

Potential victims with suicidal ideation may express their thoughts of committing suicide in the form of fleeting thoughts, suicide plans, and role playing. Suicidal ideation detection aims to find out these dangerous intentions or behaviors before tragedy strikes. The reasons for suicide are complicated and attributed to a complex interaction of many factors [6]. To detect suicidal ideation, many researchers conducted psychological and clinical studies [8] and classified responses to questionnaires [9]. Based on social media data, artificial intelligence (AI) and machine learning techniques can predict people's likelihood of suicide [10], which paves the way for early intervention. Detection on social content focuses on feature engineering [11], [12], sentiment analysis [13], [14], and deep learning [15], [16], [17].

1https://thesun.co.uk/news/worldnews/3003805

arXiv:1910.12611v1 [cs.CY] 23 Oct 2019


Mobile technologies have been studied and applied to suicide prevention, for example, the mobile suicide intervention application iBobbly [18] developed by the Black Dog Institute². Many other suicide prevention tools integrated with social networking services have also been developed, including Samaritans Radar³ and Woebot⁴. The former was a Twitter plugin for monitoring alarming posts, which was later discontinued because of privacy issues. The latter is a Facebook chatbot based on cognitive behavioral therapy and natural language processing (NLP) techniques for relieving people's depression and anxiety.

It is inevitable that applying cutting-edge AI technologies to suicidal ideation detection comes with privacy issues [19] and ethical concerns [20]. Linthicum et al. [21] put forward three ethical issues: the influence of bias on machine learning algorithms, the prediction of the time of a suicide act, and the ethical and legal questions raised by false positive and false negative predictions. It is not easy to answer ethical questions for AI, as these require algorithms to reach a balance between competing values, issues, and interests [19].

AI has been applied to solve many challenging social problems. Detection of suicidal ideation with AI techniques is one of the potential applications for social good, and should be addressed to meaningfully improve people's wellbeing. This survey reviews suicidal ideation detection methods from the perspective of AI and machine learning, and specific domain applications with social impact. The categorization from these two perspectives is shown in Fig. 1. This paper provides a comprehensive review of the increasingly important field of suicidal ideation detection by summarizing current research progress and providing an outlook on future work. The contributions of our survey are summarized as follows.

• To the best of our knowledge, this is the first survey that conducts a comprehensive review on suicidal ideation detection, its methods, and its applications from a machine learning perspective.

• We introduce and discuss both classical content analysis and recent machine learning techniques, plus their application to questionnaires, EHR data, suicide notes, and online social content.

• We enumerate existing and less explored tasks, and discuss their limitations. We also summarize existing datasets and provide an outlook of future research directions in this field.

The remainder of the paper is organized as follows: methods and applications are introduced and summarized in Section II and Section III, respectively; Section IV enumerates specific tasks and some datasets; finally, we discuss open issues and propose some future directions in Section V.

2https://blackdoginstitute.org.au/research/digital-dog/programs/ibobbly-app

3https://samaritans.org/about-samaritans/research-policy/internet-suicide/samaritans-radar

4https://woebot.io

II. METHODS AND CATEGORIZATION

Suicide detection has drawn the attention of many researchers due to an increasing suicide rate in recent years, and has been studied extensively from many perspectives. The research techniques used to examine suicide also span many fields and methods, for example, clinical methods with patient-clinician interaction [8] and automatic detection from user-generated content (mainly text) [11], [16]. Machine learning techniques are widely applied for automatic detection.

Traditional suicide detection relies on clinical methods, including self-reports and face-to-face interviews. Venek et al. [8] designed a five-item ubiquitous questionnaire for the assessment of suicidal risks, and applied a hierarchical classifier to the patients' responses to determine their suicidal intentions. Through face-to-face interaction, verbal and acoustic information can be utilized. Scherer [22] investigated the prosodic speech characteristics and voice quality in a dyadic interview to identify suicidal and non-suicidal juveniles. Other clinical methods examine resting-state heart rate from converted sensing signals [23], classify functional magnetic resonance imaging based neural representations of death- and life-related words [24], and analyze event-related instigators converted from EEG signals [25]. Another aspect of clinical treatment is the understanding of the psychology behind suicidal behavior [6]. This, however, relies heavily on the clinician's knowledge and face-to-face interaction. Suicide risk assessment scales used in clinical interviews can reveal informative cues for predicting suicide [26]. Tan et al. [27] conducted an interview and survey study on Weibo, a Twitter-like service in China, to explore the engagement of suicide attempters with intervention by direct messages.

A. Content Analysis

Users' posts on social websites reveal rich information and their language preferences. Exploratory data analysis of user-generated content can give insight into the language usage and linguistic clues of suicide attempters. Related analysis includes lexicon-based filtering, statistical linguistic features, and topic modeling within suicide-related posts.

Suicide-related keyword dictionaries and lexicons are manually built to enable keyword filtering [28], [29] and phrase filtering [30]. Suicide-related keywords and phrases include "kill", "suicide", "feel alone", "depressed", and "cutting myself". Vioules et al. [4] built a point-wise mutual information symptom lexicon using an annotated Twitter dataset. Gunn and Lester [31] analyzed posts from Twitter in the 24 hours prior to the death of a suicide attempter. Coppersmith et al. [32] analyzed the language usage of data from the same platform. Suicidal thoughts may involve strong negative feelings, anxiety, and hopelessness, or other social factors like family and friends. Ji et al. [16] performed word cloud visualization and topic modeling over suicide-related content and found that suicide-related discussion covers both personal


[Fig. 1: The categorization of suicidal ideation detection: methods and domains. Methods: clinical methods (clinical interview), content analysis, feature engineering (tabular features, statistical features, linguistic features, affective characteristics), and deep learning (CNNs, RNNs, attention). Domains: questionnaires, electronic health records, suicide notes, and social content (Twitter, Reddit, Weibo, ReachOut).]

and social issues. Colombo et al. [33] analyzed the graphical characteristics of connectivity and communication in the Twitter social network. Coppersmith et al. [34] provided an exploratory analysis of language patterns and emotions on Twitter. Other methods and techniques include Google Trends analysis for monitoring suicide risk [35], detecting social media content and analyzing speech patterns [36], assessing reply bias through linguistic clues [37], and a human-machine hybrid method for analyzing the effect of the language of social support on suicidal ideation risk [38].
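The lexicon-based filtering described above is conceptually simple. The sketch below illustrates the idea in Python; the keyword and phrase lists are illustrative examples drawn from this section, not a validated clinical lexicon, and the function name `flag_post` is our own.

```python
# Minimal sketch of lexicon-based filtering of suicide-related posts.
# The word lists below are illustrative, not a validated clinical lexicon.
SUICIDE_KEYWORDS = {"kill", "suicide", "depressed"}
SUICIDE_PHRASES = ["feel alone", "cutting myself"]

def flag_post(text: str) -> bool:
    """Return True if a post matches any lexicon keyword or phrase."""
    lowered = text.lower()
    tokens = {t.strip(".,!?") for t in lowered.split()}
    if tokens & SUICIDE_KEYWORDS:  # single-word match
        return True
    return any(phrase in lowered for phrase in SUICIDE_PHRASES)

posts = ["I feel alone and hopeless tonight.", "Great day at the beach!"]
flags = [flag_post(p) for p in posts]  # → [True, False]
```

Real systems extend this with statistically induced lexicons (e.g., the PMI-based symptom lexicon of [4]) rather than hand-picked lists.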

B. Feature Engineering

The goal of text-based suicide classification is to determine whether candidates, through their posts, have suicidal ideation. Machine learning methods and NLP have also been applied in this field.

1) Tabular Features: Tabular data for suicidal ideation detection consist of questionnaire responses and structured statistical information extracted from websites. Such structured data can be directly used as features for classification or regression. Masuda et al. [39] applied logistic regression to classify suicide and control groups based on users' characteristics and social behavior variables, and found that variables such as community number, local clustering coefficient, and homophily have a larger influence on suicidal ideation in a Japanese SNS. Chattopadhyay [40] applied the Pierce Suicidal Intent Scale (PSIS) to assess suicide factors and conducted a regression analysis. Questionnaires act as a good source of tabular features. Delgado-Gomez et al. [41] used the international personal disorder examination screening questionnaire and the Holmes-Rahe social readjustment rating scale. Chattopadhyay [42] proposed to apply a multilayer feedforward neural network, as shown in Fig. 2a, to classify suicidal intention indicators according to Beck's suicide intent scale.
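As a concrete illustration of classification on tabular features, the sketch below trains a plain logistic regression by stochastic gradient descent on toy two-dimensional questionnaire-style scores. The feature names and data are invented for illustration and do not correspond to any real scale or study.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=300):
    """Logistic regression via stochastic gradient descent (no regularization)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            grad = p - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b

def predict(w, b, x) -> int:
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Toy tabular features: [impulsiveness score, hopelessness score], scaled to [0, 1].
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
preds = [predict(w, b, x) for x in X]  # → [0, 0, 1, 1]
```

The same fit-predict pattern carries over to the regression analyses and neural classifiers cited above; only the model changes.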

2) General Text Features: Another direction of feature engineering is to extract features from unstructured text. The main features consist of N-gram features, knowledge-based features, syntactic features, context features, and class-specific features [43]. Abboute et al. [44] built a set of keywords for vocabulary feature extraction within nine suicidal topics. Okhapkina et al. [45] built a dictionary of terms pertaining to suicidal content and introduced term frequency-inverse document frequency (TF-IDF) matrices for messages and a singular value decomposition (SVD) of those matrices. Mulholland and Quinn [46] extracted vocabulary and syntactic features to build a classifier to predict the likelihood of a lyricist's suicide. Huang et al. [47] built a psychological lexicon dictionary by extending HowNet (a commonsense word collection), and used a support vector machine (SVM) to detect cybersuicide in Chinese microblogs. Topic models [48] are incorporated with other machine learning techniques for identifying suicide in Sina Weibo. Ji et al. [16] extracted several informative sets of features, including statistical, syntactic, linguistic inquiry and word count (LIWC), word embedding, and topic features, and then fed the extracted features into classifiers, as shown in Fig. 2b, where four traditional supervised classifiers are compared. Shing et al. [12] extracted several features such as bag of words (BoWs), empath, readability, syntactic features, topic model posteriors, word embeddings, linguistic inquiry and word count, emotion features, and a mental disease lexicon.
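The TF-IDF representation used in several of these studies can be sketched in a few lines. Below is a minimal from-scratch version for illustration; real systems would typically use a library implementation, and [45] additionally applies SVD to the resulting matrix.

```python
import math
from collections import Counter

def tfidf(corpus):
    """Build a TF-IDF matrix for a small corpus (minimal sketch).
    TF is the term frequency within a document; IDF is log(N / df) + 1."""
    docs = [doc.lower().split() for doc in corpus]
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    idf = {t: math.log(n / sum(1 for d in docs if t in d)) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, rows

vocab, M = tfidf(["i feel hopeless", "i feel fine", "great weather today"])
# "hopeless" is rarer than "feel", so it gets a higher weight in the first document.
```

The resulting matrix can be fed to any of the classifiers discussed in this section, or reduced with SVD as in [45].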

Models for suicidal ideation detection with feature engineering include SVMs [43], artificial neural networks (ANNs) [49], and conditional random fields (CRFs) [50]. Tai et al. [49] selected several features including history of suicidal ideation and self-harm behavior, religious belief, family status, and the mental disorder history of candidates and their families. Pestian et al. [51] compared the performance of different multivariate techniques with features of word counts, POS, concepts, and readability scores. Similarly, Ji et al. [16] compared four classification methods: logistic regression, random forest, gradient boosting decision trees, and XGBoost. Braithwaite et al. [52] validated that machine learning algorithms can effectively identify high suicidal risk.

[Fig. 2: Illustrations of methods with feature engineering. (a) Neural network with feature engineering: a feedforward network with inputs I1..In and hidden units h1..hn, trained by backpropagation on a mean squared error loss. (b) Classifier with feature engineering: a classifier over title and text-body features (statistics, POS, LIWC, TF-IDF, topics).]

3) Affective Characteristics: Affective characteristics are one of the most distinct differences between those who attempt suicide and other individuals, and they have drawn great attention from both computer scientists and mental health researchers. To detect the emotions in suicide notes, Liakata et al. [50] used manual emotion categories including anger, sorrow, hopefulness, happiness/peacefulness, fear, pride, abuse, and forgiveness. Wang et al. [43] employed combined characteristics of both factual (2 categories) and sentimental aspects (13 categories) for fine-grained sentiment analysis. Similarly, Pestian et al. [51] identified emotions of abuse, anger, blame, fear, guilt, hopelessness, sorrow, forgiveness, happiness, peacefulness, hopefulness, love, pride, thankfulness, instructions, and information. Ren et al. [13] proposed a complex emotion topic model and applied it to analyze accumulated emotional traits in suicide blogs and to detect suicidal intentions from a blog stream. Specifically, the authors studied accumulated emotional traits including emotion accumulation, emotion covariance, and emotion transition among eight basic emotions (joy, love, expectation, surprise, anxiety, sorrow, anger, and hate) with a five-level intensity.
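Affective features of this kind are often realized as counts over emotion-category word lists. The sketch below uses a tiny invented lexicon purely for illustration; the cited studies rely on carefully annotated emotion categories, not these lists.

```python
# Tiny illustrative emotion lexicon; the cited studies use annotated categories.
EMOTION_LEXICON = {
    "sorrow": {"sad", "cry", "grief"},
    "hopelessness": {"hopeless", "pointless", "trapped"},
    "anger": {"angry", "hate", "rage"},
}

def emotion_features(text: str) -> dict:
    """Count occurrences of each emotion category's words in a post."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    return {emo: sum(t in words for t in tokens)
            for emo, words in EMOTION_LEXICON.items()}

feats = emotion_features("I feel hopeless and sad, everything is pointless.")
# → {"sorrow": 1, "hopelessness": 2, "anger": 0}
```

Such per-category counts (or intensities, as in the five-level scheme of [13]) then join the other engineered features in a classifier.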

C. Deep Learning

Deep learning has been a great success in many applications including computer vision, NLP, and medical diagnosis. In the field of suicide research, it is also an important method for automatic suicidal ideation detection and suicide prevention. It can effectively learn text features automatically without complex feature engineering. Popular deep neural networks (DNNs) include convolutional neural networks (CNNs) and recurrent neural networks (RNNs), as shown in Fig. 3a and 3b, respectively. To apply DNNs, natural language text is usually embedded into a distributed vector space with popular word embedding techniques such as word2vec [53] and GloVe [54]. Shing et al. [12] applied a user-level CNN with filter sizes of 3, 4, and 5 to encode users' posts. The long short-term memory (LSTM) network, a popular variant of the RNN, has been applied to encode textual sequences, followed by fully connected layers for classification [16].
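The embed-then-encode pipeline can be caricatured without any deep learning library. In the sketch below, random vectors stand in for pretrained word2vec/GloVe embeddings, and average pooling stands in for the CNN/LSTM encoder; this is a drastic simplification for illustration only, not any cited model.

```python
import random

random.seed(0)
EMB_DIM = 8

def build_embeddings(vocab):
    """Random vectors standing in for pretrained word2vec/GloVe embeddings."""
    return {w: [random.uniform(-1, 1) for _ in range(EMB_DIM)] for w in vocab}

def encode(tokens, table):
    """Average-pool token embeddings into one fixed-size post representation.
    A real model would use a CNN or LSTM encoder here instead."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * EMB_DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

table = build_embeddings(["i", "feel", "hopeless", "fine"])
v = encode("i feel hopeless".split(), table)  # one 8-dimensional post vector
```

The fixed-size vector `v` is what the fully connected classification layers mentioned above would consume.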

Recent methods introduce other advanced learning paradigms to integrate with DNNs for suicidal ideation detection. Ji et al. [55] proposed model aggregation methods for updating neural networks, i.e., CNNs and LSTMs, to detect suicidal ideation in private chatting rooms. However, decentralized training relies on coordinators in chatting rooms to label user posts for supervised training, which can only be applied to very limited scenarios. One possible better way is to use unsupervised or semi-supervised learning methods. Benton et al. [15] predicted suicide attempt and mental health with neural models under the framework of multi-task learning, with predicting the gender of users as an auxiliary task. Gaur et al. [56] incorporated external knowledge bases and suicide-related ontology into text representation, and gained improved performance with a CNN model. Coppersmith et al. [57] developed a deep learning model with GloVe for word embedding, a bidirectional LSTM for sequence encoding, and a self-attention mechanism for capturing the most informative subsequence. Sawhney et al. [58] used LSTMs, CNNs, and RNNs for suicidal ideation detection.
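The model aggregation idea in [55], combining locally trained copies of a network without sharing raw posts, reduces in its simplest uniform form to parameter averaging. The sketch below shows only that averaging step, as a simplified stand-in for the full aggregation scheme rather than a faithful reproduction of it.

```python
def average_models(client_weights):
    """Uniformly average same-shaped parameter vectors from several clients.
    This is the simplest aggregation rule; weighted variants are common."""
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

# Two clients' (flattened) model parameters after local training:
merged = average_models([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])  # → [2.0, 3.0, 4.0]
```

In practice the merged parameters are broadcast back to clients for further local training rounds.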

In the 2019 CLPsych Shared Task [59], many popular DNN architectures were applied. Hevia et al. [60] evaluated the effect of pretraining using different models, including a GRU-based RNN. Morales et al. [61] studied several popular deep learning models such as CNN, LSTM, and Neural Network Synthesis (NeuNetS). Matero et al. [62] proposed a dual-context model using a hierarchically attentive RNN and bidirectional encoder representations from transformers (BERT).

Another sub-direction is the so-called hybrid method, which combines minor feature engineering with representation learning techniques. Chen et al. [63] proposed a hybrid classification model combining a behavioral model and a suicide language model. Zhao et al. [64] proposed a D-CNN model taking word embeddings and external tabular features as inputs for classifying suicide attempters with depression.

D. Summary

The popularization of machine learning has facilitated research on suicidal ideation detection from multi-modal data and provided a promising way for effective early warning. Current research focuses on text-based methods, either extracting features or applying deep learning for automatic feature learning. Many canonical NLP features such as TF-IDF, topics, syntactics, affective characteristics, and


TABLE I: Categorization of methods for suicidal ideation detection

Category | Publications | Methods | Inputs
Feature Engineering | Ji et al. [16] | Word counts, POS, LIWC, TF-IDF + classifiers | Text
Feature Engineering | Masuda et al. [39] | Multivariate/univariate logistic regression | Characteristics variables
Feature Engineering | Delgado-Gomez et al. [41] | International personal disorder examination screening questionnaire | Questionnaire responses
Feature Engineering | Mulholland et al. [46] | Vocabulary features, syntactic features, semantic class features, N-gram | Lyrics
Feature Engineering | Okhapkina et al. [45] | Dictionary, TF-IDF + SVD | Text
Feature Engineering | Huang et al. [47] | Lexicon, syntactic features, POS, tense | Text
Feature Engineering | Pestian et al. [51] | Word counts, POS, concepts and readability score | Text
Feature Engineering | Tai et al. [49] | Self-measurement scale + ANN | Self-measurement forms
Feature Engineering | Shing et al. [12] | BoWs, empath, readability, syntactic, topic, LIWC, emotion, lexicon | Text
Deep Learning | Zhao et al. [64] | Word embedding, tabular features, D-CNN | Text + external information
Deep Learning | Shing et al. [12] | Word embedding, CNN, max pooling | Text
Deep Learning | Ji et al. [16] | Word embedding, LSTM, max pooling | Text
Deep Learning | Benton et al. [15] | Multi-task learning, neural networks | Text
Deep Learning | Hevia et al. [60] | Pretrained GRU, word embedding, document embedding | Text
Deep Learning | Morales et al. [61] | CNN, LSTM, NeuNetS, word embedding | Text
Deep Learning | Matero et al. [62] | Dual-context, BERT, GRU, attention, user-factor adaptation | Text
Deep Learning | Gaur et al. [56] | CNN, knowledge base, ConceptNet embedding | Text
Deep Learning | Coppersmith et al. [57] | GloVe, BiLSTM, self attention | Text

[Fig. 3: Deep neural networks for suicidal ideation detection. (a) A CNN over text embeddings. (b) An RNN unrolled over a token sequence w0..w4, producing hidden states h0..h4.]

readability, and deep learning models like CNNs and LSTMs are widely used by researchers. These methods gained preliminary success in suicidal ideation detection, but some of them may only learn statistical cues and lack commonsense knowledge. Recent work [56] incorporated external knowledge by using knowledge bases and a suicide ontology for knowledge-aware suicide risk assessment, taking a remarkable step towards knowledge-aware detection.

III. APPLICATIONS ON DOMAINS

Many machine learning techniques have been introduced for suicidal ideation detection. The relevant extant research can also be viewed according to the data source. Specific applications cover a wide range of domains including questionnaires, electronic health records (EHRs), suicide notes, and online user content. Fig. 4 shows some examples of data sources for suicidal ideation detection, where Fig. 4a lists selected questions of the "International Personal Disorder Examination Screening Questionnaire" (IPDE-SQ) adapted from [41], Fig. 4b shows selected patient records from [65], Fig. 4c is a suicide note from a website⁵, and Fig. 4d is a tweet and its corresponding comments from Twitter.com. Some researchers have also developed software for suicide prevention. Berrouiguet et al. [66] developed a mobile application for health status self-report. Meyer et al. [67] developed an e-PASS Suicidal Ideation Detector (eSID) tool for medical practitioners.

A. Questionnaires

Mental disorder scale criteria, such as DSM-IV⁶, ICD-10⁷, and the IPDE-SQ, provide good tools for evaluating an individual's mental status and their potential for suicide. These criteria and examination metrics can be used to design questionnaires for self-measurement or face-to-face clinician-patient interviews.

To study the assessment of suicidal behavior, Delgado-Gomez et al. [9] applied and compared the IPDE-SQ and the "Barratt Impulsiveness Scale" (version 11, BIS-11) to identify people likely to attempt suicide. The authors also conducted a study on individual items from those two scales. The BIS-11 scale has 30 items with 4-point ratings, while the IPDE-SQ in DSM-IV has 77 true-false screening questions. Further, Delgado-Gomez et al. [41] introduced the "Holmes-Rahe Social Readjustment Rating Scale" (SRRS) together with the IPDE-SQ to two comparison groups of suicide attempters and non-attempters. The SRRS consists of 43 ranked life events of different levels of severity. Harris et al. [68] conducted a survey on understanding suicidal individuals' online behaviors to assist suicide prevention. Sueki [69] conducted an online panel survey among Internet users to study the association between suicide-related Twitter use and suicidal behavior. Based on the questionnaire results, they applied several supervised learning methods

5https://paranorms.com/suicide-notes
6https://psychiatry.org/psychiatrists/practice/dsm
7https://apps.who.int/classifications/icd10/browse/2016/en


1. I feel often empty inside (Borderline PD) 2. I usually get fun and enjoyment out of life (Schizoid PD) 3. I have tantrums or angry outburst (Borderline PD) 4. I have been the victim of unfair attacks on my character or reputation (Paranoid PD) 5. I can’t decide what kind of person I want to be (Borderline PD)6. I usually feel uncomfortable or helpless when I’m alone(Dependent PD) 7. I think my spouse (or lover) may be unfaithful to me (Paranoid PD)8. My feelings are like the weather: they’re always changing (Histrionic PD) 9. People have a high opinion on me (Narcissistic PD) 10. I take chances and do reckless things (Antisocial PD)

(a) Questionnaire

High-lethality attempts:
Number of postcode changes
Occupation: pensioner
Marital status: single/never married
ICD code: F19 (Mental disorders due to drug abuse)
ICD code: F33 (Recurrent depressive disorder)
ICD code: F60 (Specific personality disorders)
ICD code: T43 (Poisoning by psychotropic drugs)
ICD code: U73 (Other activity)
ICD code: Z29 (Need for other prophylactic measures)
ICD code: T50 (Poisoning)

(b) EHR

I am now convinced that my condition is too chronic, and therefore a cure is doubtful. All of a sudden all will and determination to fight has left me. I did desperately want to get well. But it was not to be – I am defeated and exhausted physically and emotionally. Try not to grieve. Be glad I am at least free from the miseries and loneliness I have endured for so long.— MARRIED WOMAN, 59 YEARS OLD

(c) Suicide notes (d) Tweets

Fig. 4: Examples of content for suicidal ideation detection

including linear regression, stepwise linear regression, decision trees, LARS-EN, and SVMs to classify suicidal behaviors.
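The supervised pipeline above treats questionnaire item scores as a plain feature vector. A minimal sketch, assuming synthetic 4-point item ratings and a hypothetical binary at-risk label (neither taken from the cited studies), using hand-rolled logistic regression so the example needs no external libraries:

```python
import math
import random

def train_logreg(X, y, lr=0.1, epochs=200):
    """Plain logistic regression via gradient descent (no external deps)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0

# Hypothetical data: three 4-point questionnaire items per respondent;
# high ratings stand in for the at-risk group (label 1).
random.seed(0)
X = [[random.randint(3, 4) for _ in range(3)] for _ in range(20)] \
  + [[random.randint(1, 2) for _ in range(3)] for _ in range(20)]
y = [1] * 20 + [0] * 20
w, b = train_logreg(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
```

In practice the cited studies report results from library implementations of these classifiers on real scale responses; the point here is only the shape of the task: fixed-length item vectors in, a binary risk label out.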

B. Electronic Health Records

The increasing volume of electronic health records

(EHRs) has paved the way for machine learning techniques for suicide attempter prediction. Patient records include demographic information and diagnosis-related history such as admissions and emergency visits. However, due to data characteristics such as sparsity, the variable length of clinical series, and the heterogeneity of patient records, many challenges remain in modeling medical data for suicide attempt prediction. In addition, recording procedures may change with changes in healthcare policies and updates of diagnosis codes.

There are several works predicting suicide risk based on EHRs [70], [71]. Tran et al. [65] proposed an integrated suicide risk prediction framework with a feature extraction scheme, risk classifiers, and a risk calibration procedure. Specifically, each patient's clinical history is represented as a temporal image. Iliou et al. [72] proposed a data preprocessing method to boost machine learning techniques for suicide tendency prediction of patients suffering from mental disorders. Nguyen et al. [73] explored real-world administrative data of mental health patients from hospitals for short- and medium-term suicide risk assessment. By introducing random forests, gradient boosting machines, and DNNs, the authors managed to deal with the high dimensionality and redundancy of the data. Although these methods gained preliminary success, Iliou et al. [72] and Nguyen et al. [73] are limited by their data sources, which focus on patients with mental disorders in their historical records. Bhat and Goldman-Mellor [74] used an anonymized general EHR dataset to relax the restriction on patients' diagnosis-related history, and applied neural networks as the classification model to predict suicide attempters.
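Before any of these classifiers can run, a patient's variable-length clinical history must be flattened into a fixed-length vector. A minimal sketch of one common baseline, a bag-of-codes representation over hypothetical ICD-style events (patient IDs and codes below are illustrative, not drawn from the cited datasets):

```python
from collections import Counter

# Hypothetical patient histories as lists of (day, ICD-style code) events.
histories = {
    "p1": [(0, "F33"), (3, "T43"), (7, "F33")],
    "p2": [(1, "Z29"), (5, "U73")],
}

# Shared vocabulary of codes, sorted for a stable feature ordering.
vocab = sorted({code for events in histories.values() for _, code in events})

def to_vector(events):
    """Bag-of-codes representation: count of each code in one patient's history."""
    counts = Counter(code for _, code in events)
    return [counts[c] for c in vocab]

vectors = {pid: to_vector(events) for pid, events in histories.items()}
```

This deliberately discards the temporal structure that Tran et al.'s temporal-image representation is designed to keep; the bag-of-codes vector is only the simplest point of comparison.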

C. Suicide Notes

Suicide notes are the written notes left by people

before committing suicide. They are usually written in letters or online blogs, or recorded in audio or video. Suicide notes provide material for NLP research. Previous approaches have examined suicide notes using content analysis [51], sentiment analysis [75], [43], and emotion

detection [50]. Pestian et al. [51] used transcribed suicide notes from two groups, completers and elicitors, the latter drawn from people who have a personality disorder or potential morbid thoughts. White and Mazlack [76] analyzed word frequencies in suicide notes using a fuzzy cognitive map to discern causality. Liakata et al. [50] applied machine learning classifiers to 600 suicide messages of varied length, differing readability quality, and multi-class annotations.

Emotion in text provides sentimental cues for understanding suicidal ideation. Desmet et al. [77] conducted fine-grained emotion detection on the suicide notes of the 2011 i2b2 task. Wicentowski and Sydes [78] used an ensemble of maximum entropy classifiers. Wang et al. [43] and Kovacevic et al. [79] proposed hybrid machine learning and rule-based methods for the i2b2 sentiment classification task on suicide notes.

In the age of cyberspace, more suicide notes are now written in the form of web blogs and can be identified as carrying the potential risk of suicide. Huang et al. [28] monitored online blogs from MySpace.com to identify at-risk bloggers. Schoene and Dethlefs [80] extracted linguistic and sentiment features to distinguish genuine suicide notes from a comparison corpus.

D. Online User Content

The widespread adoption of the mobile Internet and social networking services allows people to express their life events and feelings freely. As social websites provide an anonymous space for online discussion, an increasing number of people suffering from mental disorders turn to them to seek help. There is a concerning tendency for potential suicide victims to post their suicidal thoughts on social websites like Facebook, Twitter, Reddit, and MySpace. Social media platforms are becoming a promising channel for monitoring suicidal thoughts and preventing suicide attempts [81]. Massive user-generated data provide a good source to study online users' language patterns. Applying data mining and machine learning techniques to social networks provides an avenue to understand the intent within online posts, provide early warnings, and even relieve a person's suicidal intentions.

Twitter provides a good source for research on suicidality. O'Dea et al. [11] collected tweets using the public API and developed automatic suicide detection by


applying logistic regression and SVM on TF-IDF features. Wang et al. [82] further improved the performance with effective feature engineering. Shepherd et al. [83] conducted psychology-based data analysis of content that suggests suicidal tendencies on Twitter. The authors used data from an online conversation called #dearmentalhealthprofessionals.
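A minimal sketch of the TF-IDF step underlying such pipelines, using toy documents and a smoothed inverse document frequency (the example posts are invented for illustration; real studies feed these vectors into logistic regression or an SVM):

```python
import math
from collections import Counter

def tfidf(docs):
    """Toy TF-IDF vectorizer: term frequency times smoothed inverse document frequency."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: words appearing in fewer documents get higher weight.
        vectors.append([tf[w] * math.log((1 + n) / (1 + df[w])) for w in vocab])
    return vocab, vectors

# Illustrative posts only; not drawn from any dataset discussed here.
docs = ["i feel hopeless", "great day today", "feel sad and hopeless"]
vocab, X = tfidf(docs)
```

The key property, visible even at this scale, is that a term confined to one document receives a larger weight than a term shared across documents, which is what lets downstream linear classifiers pick up distinctive wording.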

Reddit, another well-known platform, is an online forum with topic-specific discussions and has also attracted much research interest for studying mental health issues [84] and suicidal ideation [37]. A community on Reddit called SuicideWatch is intensively used for studying suicidal intention [85], [16]. De Choudhury et al. [85] applied a statistical methodology to discover the transition from mental health issues to suicidality. Kumar et al. [86] examined posting activity following celebrity suicides, studied the effect of celebrity suicides on suicide-related content, and proposed a method to help prevent suicides following high-profile cases.

Several studies [47], [48] work on detecting suicidal ideation in Chinese microblogs. Guan et al. [87] studied user profile and linguistic features for estimating suicide probability in Chinese microblogs. Some work uses other platforms for suicidal ideation detection. For example, Cash et al. [88] conducted a study of adolescents' comments and content analysis on MySpace. Streaming data provide a good source for user pattern analysis. Vioules et al. [4] conducted user-centric and post-centric behavior analysis and applied a martingale framework to detect sudden emotional changes in the Twitter data stream for monitoring suicide warning signs. A blog stream collected from public blog articles written by suicide victims was used by Ren et al. [13] to study accumulated emotional information.

E. Summary

Applications of suicidal ideation detection mainly fall into four domains, i.e., questionnaires, electronic health records, suicide notes, and online user content. Table II gives a brief summary of categories, data sources, and methods. Among these four domains, questionnaires and EHRs require self-report measurement or patient-clinician interactions and rely highly on social workers or mental health professionals. Suicide notes have a limitation for immediate prevention, as many suicide attempters commit suicide within a short time after writing them. However, they provide a good source for content analysis and the study of suicide factors. The last domain, online user content, is one of the most promising avenues for early warning and suicide prevention when empowered with machine learning techniques. With the rapid development of digital technology, user-generated content will play a more important role in suicidal ideation detection, and other forms of data, such as health data generated by wearable devices, are very likely to help with suicide risk monitoring in the near future.

TABLE II: Summary of Studies on Suicidal Ideation Detection

Categories: self-report examination; face-to-face suicide prevention; automatic suicidal ideation detection
Data: questionnaires; suicide notes; suicide blogs; electronic health records; online social texts
Methods: clinical methods; content analysis; feature engineering; deep learning

IV. TASKS AND DATASETS

In this section, we summarize specific tasks in suicidal ideation detection and other suicide-related tasks concerning mental disorders. Some tasks, such as reasoning over suicidal messages, generating responses, and suicide attempter detection on a social graph, may currently lack benchmarks for evaluation. However, they are critical for effective detection. We propose these tasks together with current research directions and call for contributions to them from the research community. Meanwhile, an elaborate list of datasets for currently available tasks is provided, and some potential data sources are also described to promote research efforts.

A. Tasks

1) Suicide Text Classification: The first task, suicide

text classification, can be viewed as a domain-specific application of general text classification, which includes binary and multi-class classification. Binary suicidality classification simply determines whether text contains suicidal ideation, while multi-class suicidality classification conducts fine-grained suicide risk assessment. For example, some research divides suicide risk into four levels: no, low, moderate, and severe risk. Alternatively, it can also consider four types of class labels according to mental and behavioral procedures, i.e., non-suicidal, suicidal thoughts/wishes, suicidal intentions, and suicidal act/plan.
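The two labeling schemes are related: the binary task can be derived from a fine-grained one by collapsing every at-risk level into the positive class. A tiny sketch (label strings follow the text above; the collapse rule itself is an assumption for illustration):

```python
# Fine-grained labels from the four-type scheme described above.
LEVELS = ["non-suicidal", "suicidal thoughts/wishes",
          "suicidal intentions", "suicidal act/plan"]

def to_binary(label):
    """Collapse the fine-grained scheme into binary suicidality detection."""
    assert label in LEVELS
    return 0 if label == "non-suicidal" else 1

binary = [to_binary(l) for l in ["non-suicidal", "suicidal intentions"]]
```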

Another subtask is risk assessment by learning from multi-aspect suicidal posts. Adopting the definition of characteristics of suicidal messages, Gilat et al. [89] manually tagged suicidal posts with multi-aspect labels including mental pain, cognitive attribution, and level of suicidal risk. Mental pain includes loss of control, acute loneliness, emptiness, narcissistic wounds, irreversibility, loss of energy, and emotional flooding, scaled into [0, 7]. Cognitive attribution is the frustration of needs associated with interpersonal relationships, or there is no indication of attribution.

2) Reasoning over Suicidal Messages: Data mining and machine learning algorithms have achieved remarkable outcomes using DNNs. However, simple feature


sets and classification models are not predictive enough to detect complicated suicidal intentions. Machine learning techniques require the ability to reason over suicidal messages to gain deeper insight into suicidal factors and the innermost being behind textual posts. This task aims to employ interpretable methods to investigate suicidal factors and to incorporate commonsense reasoning, which may improve the prediction of suicidal factors. Specific tasks include automatic summarization of suicide factors and finding explanations of suicidal risk in the mental pain and cognitive attribution aspects associated with suicide.

3) Suicide Attempter Detection: The two aforementioned tasks focus on a single text. However, the main purpose of suicidal ideation detection is to identify suicide attempters. Thus, it is important to achieve user-level detection, which takes two forms, i.e., user-level multi-instance suicidality detection and suicide attempter detection on a graph. The former takes a bag of posts from an individual as input and conducts multi-instance learning over the bag of messages. The latter aims to identify suicide attempters in a specific social graph built from the interactions between users in social networks. It considers the relationships between social users and can be regarded as a node classification problem on a graph.
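User-level multi-instance detection can be sketched as pooling post-level scores over the bag. A minimal example, assuming a hypothetical per-post risk score in [0, 1]; max-pooling is one common aggregation choice for multi-instance learning, not one prescribed by the text:

```python
def user_level_risk(post_scores, threshold=0.5):
    """Flag a user if any post in the bag exceeds the risk threshold (max-pooling).

    post_scores: hypothetical per-post risk scores from some upstream classifier.
    """
    return max(post_scores) >= threshold

# A user with one high-risk post among mostly neutral ones is flagged;
# a user whose posts all score low is not.
flagged = user_level_risk([0.05, 0.72, 0.10])
not_flagged = user_level_risk([0.05, 0.20, 0.10])
```

Max-pooling encodes the multi-instance assumption that a single genuinely suicidal post suffices to mark the user as at risk; mean-pooling or learned attention over the bag are the usual alternatives when single-post scores are noisy.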

4) Generating Responses: The ultimate goal of suicidal ideation detection is intervention and suicide prevention. Many people with suicidal intentions tend to post about their suffering at midnight. To enable immediate social care and relieve their suicidal intentions, another task is generating considerate responses for counseling potential suicide victims. Gilat et al. [89] introduced eight types of response strategies: emotional support, offering group support, empowerment, interpretation, cognitive change inducement, persuasion, advising, and referring. This task requires machine learning techniques, especially sequence-to-sequence learning, to adopt effective response strategies to generate better responses and relieve people's suicidality. When social workers or volunteers come back online, this response generation technique can also be used to generate hints that help them compose effective responses.

5) Mental Disorders and Self-harm Risk: Suicidal ideation has a strong relation with mental health issues and self-harm risk. Thus, detecting severe mental disorders or self-harm risk is also an important task. Such work includes depression detection [90], self-harm detection [91], detection of stressful periods and stressor events [92], building a knowledge graph for depression [93], and correlation analysis of depression and anxiety [94]. The corresponding subtasks in this field are similar to suicide text classification in Section IV-A1.

B. Datasets

1) Text Data:

a) Reddit: Reddit is a registered online community that aggregates social news and online discussions. It consists of many topic categories, and each area of interest within a topic is called a subreddit. A subreddit called "Suicide Watch" (SW)8 is intensively used as a source of positive samples for annotation. Posts without suicidal content are sourced from other popular subreddits. Ji et al. [16] released a dataset with 3,549 posts with suicidal ideation. Shing et al. [12] published their UMD Reddit Suicidality Dataset with 11,129 users and 1,556,194 posts in total, and sampled 934 users for further annotation. Aladag et al. [95] collected 508,398 posts using Google Cloud BigQuery and manually annotated 785 posts.

b) Twitter: Twitter is a popular social networking service where many users also talk about their suicidal ideation. Twitter differs considerably from Reddit in post length, anonymity, and the manner of communication and interaction. Twitter user data with suicidal ideation and depression were collected by Coppersmith et al. [32]. Ji et al. [16] collected an imbalanced dataset of 594 tweets with suicidal ideation out of 10,288 tweets in total. Vioules et al. collected 5,446 tweets using the Twitter streaming API [4], of which 2,381 and 3,065 tweets were from distressed users and normal users, respectively. However, most Twitter-based datasets are no longer available due to Twitter's data policy.

c) ReachOut: The ReachOut Forum9 is a peer support platform provided by an Australian mental health care organization. The ReachOut dataset [96] was first released in the CLPsych17 shared task. Participants were initially given a training dataset of 65,756 forum posts, of which 1,188 were annotated manually with the expected category, and a test set of 92,207 forum posts, of which 400 were identified as requiring annotation. The four categories are described as follows: 1) crisis: the author or someone else is at risk of harm; 2) red: the post should be responded to as soon as possible; 3) amber: the post should be responded to at some point, if the community does not rally strongly around it; 4) green: the post can safely be ignored or left for the community to address.

2) EHR: EHR data contain demographic information, admissions, diagnostic reports, and physician notes. One collection of electronic health records comes from California emergency department encounters and hospital admissions. It contains 522,056 anonymous EHR records of California-resident adolescents. However, it is not publicly accessible. Bhat and Goldman-Mellor [74] first used these records from 2006-2009 to predict suicide attempts in 2010. Haerian et al. [97] selected 280 cases for evaluation from the Clinical Data Warehouse (CDW) and the WebCIS database at NewYork-Presbyterian Hospital/Columbia University Medical Center. Tran et al. [65] studied emergency attendances with at least one risk assessment from the Barwon Health data warehouse.

8 https://reddit.com/r/SuicideWatch
9 https://au.reachout.com/forums


The selected dataset contains 7,746 patients and 17,771 assessments.

3) Mental Disorders: Mental health issues such as depression without effective treatment may turn into suicidal ideation. For the convenience of research on mental disorders, we also list several resources for monitoring mental disorders. The eRisk dataset on Early Detection of Signs of Depression [98] was released for the first task of the 2018 workshop at the Conference and Labs of the Evaluation Forum (CLEF), which focuses on early risk prediction on the Internet10. This dataset contains sequential text from social media. Another dataset is the Reddit Self-reported Depression Diagnosis (RSDD) dataset [90], which contains 9,000 users diagnosed with depression and approximately 107,000 matched control users.

V. DISCUSSION AND FUTURE WORK

Many preliminary works have been conducted on suicidal ideation detection, especially boosted by manual feature engineering and DNN-based representation learning techniques. However, current research has several limitations, and great challenges remain for future work.

A. Limitations

a) Data Deficiency: The most critical issue in current research is data deficiency. Current methods mainly apply supervised learning techniques, which require manual annotation. However, there are not enough annotated data to support further research. For example, labeled data with fine-grained suicide risk have only limited instances, and there are no multi-aspect data or data with social relationships.

b) Annotation Bias: There is little evidence to confirm suicide actions and so obtain ground truth. Thus, current data are obtained by manual labeling with predefined annotation rules. Crowdsourcing-based annotation may introduce label bias. Shing et al. [12] asked experts for labeling but obtained only a limited number of labeled instances. As for demographic data, the quality of suicide data is a concern, and mortality estimation covers general death rather than suicide specifically11. Some cases are misclassified as accidents or deaths of undetermined intent.

c) Data Imbalance: Posts with suicidal intention account for a very small proportion of massive social posts. However, most work built datasets in an approximately even manner, collecting relatively balanced positive and negative samples, rather than treating the problem as one of imbalanced data distribution.
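One simple mitigation, when a balanced training set is needed from skewed data, is random oversampling of the minority class. A sketch with synthetic labels (the 1:9 skew is illustrative, far milder than the real-world proportion of suicidal posts):

```python
import random

def oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until both classes are the same size."""
    rng = random.Random(seed)
    pos = [(s, y) for s, y in zip(samples, labels) if y == 1]
    neg = [(s, y) for s, y in zip(samples, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = pos + neg + extra
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]

# Synthetic 1:9 imbalance, echoing (mildly) the skew of suicidal posts online.
X = [f"post {i}" for i in range(10)]
y = [1] + [0] * 9
Xb, yb = oversample(X, y)
```

Oversampling should only ever be applied to the training split, never before the train/test split, or the duplicated instances leak into evaluation; class-weighted losses are a common alternative that avoids duplication entirely.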

10 https://early.irlab.org
11 World Health Organization, Preventing suicide: a global imperative, 2014. https://apps.who.int/iris/bitstream/handle/10665/131056/9789241564779_eng.pdf

d) Lack of Intention Understanding: Current statistical learning methods fail to achieve a good understanding of suicidal intention. The psychology behind suicide attempts is complex. However, mainstream methods focus on selecting features or using complex neural architectures to boost predictive performance. From the phenomenology of suicidal posts in social content, machine learning methods manage to learn statistical clues but fail to reason over the risk factors by incorporating the psychology of suicide.

B. Future Work

1) Emerging Learning Techniques: The advance of deep learning techniques has boosted research on suicidal ideation detection. More emerging learning techniques, such as attention mechanisms and graph neural networks, can be introduced for suicide text representation learning. Other learning paradigms such as transfer learning, adversarial training, and reinforcement learning can also be utilized. For example, knowledge from the mental health detection domain can be transferred for suicidal ideation detection, and generative adversarial networks can be used to generate adversarial samples for data augmentation.

In social networking services, posts with suicidal ideation lie in the long tail of the distribution of post categories. In order to achieve effective detection under the heavily imbalanced distributions of real-world scenarios, few-shot learning can be utilized to train on a few labeled posts with suicidal ideation among a large social corpus.

2) Suicidal Intention Understanding and Interpretability: Many factors are correlated with suicide, such as mental health, economic recessions, gun prevalence, daylight patterns, divorce laws, media coverage of suicide, and alcohol use12. A better understanding of suicidal intention can provide a guideline for effective detection and intervention. A new research direction is to equip deep learning models with the ability of commonsense reasoning, for example, by incorporating external suicide-related knowledge bases.

Deep learning techniques can learn an accurate prediction model. However, such a model is typically a black box. In order to better understand people's suicidal intentions and produce trustworthy predictions, new interpretable models should be developed.

3) Temporal Suicidal Ideation Detection: Another direction is to detect suicidal ideation over data streams, taking temporal information into account. There are several stages of suicide attempts, including stress, depression, suicidal thoughts, and suicidal plans. Modeling the temporal trajectory of people's posts can effectively monitor changes in mental status and is important for detecting early signals of suicidal ideation.

12 Report by Lindsay Lee, Max Roser and Esteban Ortiz-Ospina on OurWorldInData.org, retrieved from https://ourworldindata.org/suicide


TABLE III: A summary of public datasets

Type | Publication | Source | Instances | Public Access
Text | Ji et al. [16] | Reddit | 3,549 | Request to the first author
Text | Shing et al. [12] | Reddit | 866/11,129 | https://tinyurl.com/umd-suicidality
Text | Aladag et al. [95] | Reddit | 508,398 | Request to the authors
Text | Coppersmith et al. [32] | Twitter | > 3,200 | N.A.
Text | Coppersmith et al. [99] | Twitter | 1,711 | https://tinyurl.com/clpsych-2015
Text | Vioules et al. [4] | Twitter | 5,446 | N.A.
Text | Milne et al. [96] | ReachOut | 65,756 | N.A.
EHR | Bhat and Goldman-Mellor [74] | Hospital | 522,056 | N.A.
EHR | Tran et al. [65] | Barwon Health | 7,746 | N.A.
EHR | Haerian et al. [97] | CDW & WebCIS | 280 | N.A.
Text | Pestian et al. [75] | Notes | 1,319 | 2011 i2b2 NLP challenge
Text | Gaur et al. [56] | Reddit | 500/2,181 | Request to the authors
Text | eRisk 2018 [98] | Social networks | 892 | https://tec.citius.usc.es/ir/code/eRisk.html
Text | RSDD [90] | Reddit | 9,000 | http://ir.cs.georgetown.edu/resources/rsdd.html

4) Proactive Conversational Intervention: The ultimate aim of suicidal ideation detection is intervention and prevention. Very little work has been undertaken to enable proactive intervention. Proactive Suicide Prevention Online (PSPO) [100] provides a new perspective that combines suicidal identification and crisis management. An effective way is through conversations. To enable timely intervention for suicidal thoughts, automatic response generation becomes a promising technical solution. Natural language generation techniques can be utilized to generate counseling responses to relieve people's depression or suicidal ideation. Reinforcement learning can also be applied to conversational suicide intervention. After suicide attempters post suicide messages (the initial state), online volunteers and lay individuals take actions by commenting on the original posts and persuading attempters to give up their suicidality. The attempter may do nothing, reply to the comments, or have their suicidality relieved. A score defined by observing the reaction of the suicide attempter serves as a reward. Conversational suicide intervention then uses policy gradients for agents to generate responses with maximum rewards that best relieve people's suicidal thoughts.
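The policy-gradient loop described above can be caricatured as a bandit over the eight response strategies. A toy REINFORCE sketch with a deterministic stand-in reward; in reality the reward would come from the observed reaction of the at-risk user, not from the hard-coded rule below, and the action would be a full generated response rather than a strategy index:

```python
import math
import random

STRATEGIES = ["emotional support", "group support", "empowerment", "interpretation",
              "cognitive change inducement", "persuasion", "advising", "referring"]

def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reward(action):
    """Stand-in reward: pretend only strategy 0 ever relieves the user."""
    return 1.0 if action == 0 else 0.0

rng = random.Random(0)
prefs = [0.0] * len(STRATEGIES)  # policy parameters (softmax preferences)
lr = 0.2
for _ in range(2000):
    probs = softmax(prefs)
    action = rng.choices(range(len(STRATEGIES)), weights=probs)[0]
    r = reward(action)
    # REINFORCE update: raise the log-probability of rewarded actions.
    for i in range(len(STRATEGIES)):
        grad = (1.0 if i == action else 0.0) - probs[i]
        prefs[i] += lr * r * grad

best = STRATEGIES[max(range(len(STRATEGIES)), key=lambda i: prefs[i])]
```

After training, the policy concentrates on the strategy that the (here, artificial) reward favors; the conversational setting replaces the strategy index with sequence generation and the rule-based reward with observed user reactions.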

VI. CONCLUSION

Suicide prevention remains an important task in modern society. Early detection of suicidal ideation is an important and effective way to prevent suicide. This survey investigates existing methods for suicidal ideation detection from a broad perspective that covers clinical methods like patient-clinician interaction and medical signal sensing; textual content analysis such as lexicon-based filtering and word cloud visualization; feature engineering including tabular, textual, and affective features; and deep learning-based representation learning like CNN- and LSTM-based text encoders. Four main domain-specific applications on questionnaires, EHRs, suicide notes, and online user content are introduced.

Most work in this field has been conducted by psychological experts with statistical analysis, and by computer scientists with feature engineering-based machine learning and deep learning-based representation learning. Based on current research, we summarize existing tasks and further propose new possible tasks. Last but not least, we discuss some limitations of current research and propose a series of future directions, including utilizing emerging learning techniques, interpretable intention understanding, temporal detection, and proactive conversational intervention.

Online social content is very likely to be the main channel for suicidal ideation detection in the future. It is therefore essential to develop new methods that can heal the schism between clinical mental health detection and automatic machine detection, in order to detect online texts containing suicidal ideation in the hope that suicide can be prevented.

REFERENCES

[1] S. Hinduja and J. W. Patchin, “Bullying, cyberbullying, and suicide,” Archives of Suicide Research, vol. 14, no. 3, pp. 206–221, 2010.

[2] J. Joo, S. Hwang, and J. J. Gallo, “Death ideation and suicidal ideation in a community sample who do not meet criteria for major depression,” Crisis, 2016.

[3] M. K. Nock, G. Borges, E. J. Bromet, J. Alonso, M. Angermeyer, A. Beautrais, R. Bruffaerts, W. T. Chiu, G. De Girolamo, S. Gluzman et al., “Cross-national prevalence and risk factors for suicidal ideation, plans and attempts,” The British Journal of Psychiatry, vol. 192, no. 2, pp. 98–105, 2008.

[4] M. J. Vioules, B. Moulahi, J. Aze, and S. Bringay, “Detection of suicide-related posts in twitter data streams,” IBM Journal of Research and Development, vol. 62, no. 1, pp. 7:1–7:12, 2018.

[5] A. J. Ferrari, R. E. Norman, G. Freedman, A. J. Baxter, J. E. Pirkis, M. G. Harris, A. Page, E. Carnahan, L. Degenhardt, T. Vos et al., “The burden attributable to mental and substance use disorders as risk factors for suicide: findings from the global burden of disease study 2010,” PLoS ONE, vol. 9, no. 4, p. e91936, 2014.

[6] R. C. O’Connor and M. K. Nock, “The psychology of suicidal behaviour,” The Lancet Psychiatry, vol. 1, no. 1, pp. 73–85, 2014.

[7] J. Lopez-Castroman, B. Moulahi, J. Aze, S. Bringay, J. Deninotti, S. Guillaume, and E. Baca-Garcia, “Mining social networks to improve suicide prevention: A scoping review,” Journal of Neuroscience Research, 2019.

[8] V. Venek, S. Scherer, L.-P. Morency, J. Pestian et al., “Adolescent suicidal risk assessment in clinician-patient interaction,” IEEE Transactions on Affective Computing, vol. 8, no. 2, pp. 204–215, 2017.


[9] D. Delgado-Gomez, H. Blasco-Fontecilla, A. A. Alegria, T. Legido-Gil, A. Artes-Rodriguez, and E. Baca-Garcia, “Improving theaccuracy of suicide attempter classification,” Artificial intelligencein medicine, vol. 52, no. 3, pp. 165–168, 2011.

[10] G. Liu, C. Wang, K. Peng, H. Huang, Y. Li, and W. Cheng,“SocInf: Membership inference attacks on social media healthdata with machine learning,” IEEE Transactions on ComputationalSocial Systems, 2019.

[11] B. O’Dea, S. Wan, P. J. Batterham, A. L. Calear, C. Paris,and H. Christensen, “Detecting suicidality on twitter,” InternetInterventions, vol. 2, no. 2, pp. 183–188, 2015.

[12] H.-C. Shing, S. Nair, A. Zirikly, M. Friedenberg, H. Daume III,and P. Resnik, “Expert, crowdsourced, and machine assessmentof suicide risk via online postings,” in Proceedings of the FifthWorkshop on Computational Linguistics and Clinical Psychology: FromKeyboard to Clinic, 2018, pp. 25–36.

[13] F. Ren, X. Kang, and C. Quan, “Examining accumulated emotionaltraits in suicide blogs with an emotion topic model,” IEEE journalof biomedical and health informatics, vol. 20, no. 5, pp. 1384–1396,2016.

[14] L. Yue, W. Chen, X. Li, W. Zuo, and M. Yin, “A survey of sentimentanalysis in social media,” Knowledge and Information Systems, pp.1–47, 2018.

[15] A. Benton, M. Mitchell, and D. Hovy, “Multi-task learning formental health using social media text,” in EACL. Associationfor Computational Linguistics, 2017.

[16] S. Ji, C. P. Yu, S.-f. Fung, S. Pan, and G. Long, “Supervisedlearning for suicidal ideation detection in online user content,”Complexity, 2018.

[17] S. Ji, G. Long, S. Pan, T. Zhu, J. Jiang, and S. Wang, “Detectingsuicidal ideation with data protection in online communities,” inDASFAA. Springer, Cham, 2019, pp. 225–229.

[18] J. Tighe, F. Shand, R. Ridani, A. Mackinnon, N. De La Mata, andH. Christensen, “Ibobbly mobile health intervention for suicideprevention in australian indigenous youth: a pilot randomisedcontrolled trial,” BMJ open, vol. 7, no. 1, p. e013518, 2017.

[19] N. N. G. de Andrade, D. Pawson, D. Muriello, L. Donahue, andJ. Guadagno, “Ethics and artificial intelligence: suicide preventionon facebook,” Philosophy & Technology, vol. 31, no. 4, pp. 669–684,2018.

[20] L. C. McKernan, E. W. Clayton, and C. G. Walsh, “Protecting lifewhile preserving liberty: Ethical recommendations for suicideprevention with artificial intelligence,” Frontiers in psychiatry,vol. 9, p. 650, 2018.

[21] K. P. Linthicum, K. M. Schafer, and J. D. Ribeiro, “Machinelearning in suicide science: Applications and ethics,” Behavioralsciences & the law, vol. 37, no. 3, pp. 214–222, 2019.

[22] S. Scherer, J. Pestian, and L.-P. Morency, “Investigating the speechcharacteristics of suicidal adolescents,” in 2013 IEEE InternationalConference on Acoustics, Speech and Signal Processing. IEEE, 2013,pp. 709–713.

[23] D. Sikander, M. Arvaneh, F. Amico, G. Healy, T. Ward, D. Kearney,E. Mohedano, J. Fagan, J. Yek, A. F. Smeaton et al., “Predicting riskof suicide using resting state heart rate,” in Signal and InformationProcessing Association Annual Summit and Conference (APSIPA),2016 Asia-Pacific. IEEE, 2016, pp. 1–4.

[24] M. A. Just, L. Pan, V. L. Cherkassky, D. L. McMakin, C. Cha, M. K. Nock, and D. Brent, “Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth,” Nature human behaviour, vol. 1, no. 12, p. 911, 2017.

[25] N. Jiang, Y. Wang, L. Sun, Y. Song, and H. Sun, “An ERP study of implicit emotion processing in depressed suicide attempters,” in Information Technology in Medicine and Education (ITME), 2015 7th International Conference on. IEEE, 2015, pp. 37–40.

[26] M. Lotito and E. Cook, “A review of suicide risk assessment instruments and approaches,” Mental Health Clinician, vol. 5, no. 5, pp. 216–223, 2015.

[27] Z. Tan, X. Liu, X. Liu, Q. Cheng, and T. Zhu, “Designing microblog direct messages to engage social media users with suicide ideation: interview and survey study on weibo,” Journal of medical Internet research, vol. 19, no. 12, 2017.

[28] Y.-P. Huang, T. Goh, and C. L. Liew, “Hunting suicide notes in web 2.0 - preliminary findings,” in Multimedia Workshops, 2007. ISMW’07. Ninth IEEE International Symposium on. IEEE, 2007, pp. 517–521.

[29] K. D. Varathan and N. Talib, “Suicide detection system based on twitter,” in Science and Information Conference (SAI), 2014. IEEE, 2014, pp. 785–788.

[30] J. Jashinsky, S. H. Burton, C. L. Hanson, J. West, C. Giraud-Carrier, M. D. Barnes, and T. Argyle, “Tracking suicide risk factors through twitter in the us,” Crisis, 2014.

[31] J. F. Gunn and D. Lester, “Twitter postings and suicide: An analysis of the postings of a fatal suicide in the 24 hours prior to death,” Suicidologi, vol. 17, no. 3, 2015.

[32] G. Coppersmith, R. Leary, E. Whyne, and T. Wood, “Quantifying suicidal ideation via language usage on social media,” in Joint Statistics Meetings Proceedings, Statistical Computing Section, JSM, 2015.

[33] G. B. Colombo, P. Burnap, A. Hodorog, and J. Scourfield, “Analysing the connectivity and communication of suicidal users on twitter,” Computer communications, vol. 73, pp. 291–300, 2016.

[34] G. Coppersmith, K. Ngo, R. Leary, and A. Wood, “Exploratory analysis of social media prior to a suicide attempt,” in Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, 2016, pp. 106–117.

[35] P. Solano, M. Ustulin, E. Pizzorno, M. Vichi, M. Pompili, G. Serafini, and M. Amore, “A google-based approach for monitoring suicide risk,” Psychiatry research, vol. 246, pp. 581–586, 2016.

[36] M. E. Larsen, N. Cummins, T. W. Boonstra, B. O’Dea, J. Tighe, J. Nicholas, F. Shand, J. Epps, and H. Christensen, “The use of technology in suicide prevention,” in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE. IEEE, 2015, pp. 7316–7319.

[37] H. Y. Huang and M. Bashir, “Online community and suicide prevention: Investigating the linguistic cues and reply bias,” in CHI’16, 2016.

[38] M. De Choudhury and E. Kiciman, “The language of social support in social media and its effect on suicidal ideation risk,” in Eleventh International AAAI Conference on Web and Social Media, 2017.

[39] N. Masuda, I. Kurahashi, and H. Onari, “Suicide ideation of individuals in online social networks,” PloS one, vol. 8, no. 4, p. e62262, 2013.

[40] S. Chattopadhyay, “A study on suicidal risk analysis,” in e-Health Networking, Application and Services, 2007 9th International Conference on. IEEE, 2007, pp. 74–78.

[41] D. Delgado-Gomez, H. Blasco-Fontecilla, F. Sukno, M. S. Ramos-Plasencia, and E. Baca-Garcia, “Suicide attempters classification: Toward predictive models of suicidal behavior,” Neurocomputing, vol. 92, pp. 3–8, 2012.

[42] S. Chattopadhyay, “A mathematical model of suicidal-intent-estimation in adults,” American Journal of Biomedical Engineering, vol. 2, no. 6, pp. 251–262, 2012.

[43] W. Wang, L. Chen, M. Tan, S. Wang, and A. P. Sheth, “Discovering fine-grained sentiment in suicide notes,” Biomedical informatics insights, vol. 5, no. Suppl 1, p. 137, 2012.

[44] A. Abboute, Y. Boudjeriou, G. Entringer, J. Aze, S. Bringay, and P. Poncelet, “Mining twitter for suicide prevention,” in International Conference on Applications of Natural Language to Data Bases/Information Systems. Springer, 2014, pp. 250–253.

[45] E. Okhapkina, V. Okhapkin, and O. Kazarin, “Adaptation of information retrieval methods for identifying of destructive informational influence in social networks,” in Advanced Information Networking and Applications Workshops (WAINA), 2017 31st International Conference on. IEEE, 2017, pp. 87–92.

[46] M. Mulholland and J. Quinn, “Suicidal tendencies: The automatic classification of suicidal and non-suicidal lyricists using NLP,” in IJCNLP, 2013, pp. 680–684.

[47] X. Huang, L. Zhang, D. Chiu, T. Liu, X. Li, and T. Zhu, “Detecting suicidal ideation in chinese microblogs with psychological lexicons,” in Ubiquitous Intelligence and Computing, 2014 IEEE 11th Intl Conf on and IEEE 11th Intl Conf on and Autonomic and Trusted Computing, and IEEE 14th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UTC-ATC-ScalCom). IEEE, 2014, pp. 844–849.

[48] X. Huang, X. Li, T. Liu, D. Chiu, T. Zhu, and L. Zhang, “Topic model for identifying suicidal ideation in chinese microblog,” in Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, 2015, pp. 553–562.

[49] Y.-M. Tai and H.-W. Chiu, “Artificial neural network analysis on suicide and self-harm history of taiwanese soldiers,” in Innovative Computing, Information and Control, 2007. ICICIC’07. Second International Conference on. IEEE, 2007, pp. 363–363.

[50] M. Liakata, J. H. Kim, S. Saha, J. Hastings, and D. Rebholz-Schuhmann, “Three hybrid classifiers for the detection of emotions in suicide notes,” Biomedical Informatics Insights, vol. 5, no. Suppl. 1, pp. 175–184, 2012.

[51] J. Pestian, H. Nasrallah, P. Matykiewicz, A. Bennett, and A. Leenaars, “Suicide note classification using natural language processing: A content analysis,” Biomedical informatics insights, vol. 2010, no. 3, p. 19, 2010.

[52] S. R. Braithwaite, C. Giraud-Carrier, J. West, M. D. Barnes, and C. L. Hanson, “Validating machine learning algorithms for twitter data against established measures of suicidality,” JMIR mental health, vol. 3, no. 2, p. e21, 2016.

[53] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.

[54] J. Pennington, R. Socher, and C. Manning, “Glove: Global vectors for word representation,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.

[55] S. Ji, G. Long, S. Pan, T. Zhu, J. Jiang, S. Wang, and X. Li, “Knowledge transferring via model aggregation for online social care,” arXiv preprint arXiv:1905.07665, 2019.

[56] M. Gaur, A. Alambo, J. P. Sain, U. Kursuncu, K. Thirunarayan, R. Kavuluru, A. Sheth, R. Welton, and J. Pathak, “Knowledge-aware assessment of severity of suicide risk for early intervention,” in The World Wide Web Conference. ACM, 2019, pp. 514–525.

[57] G. Coppersmith, R. Leary, P. Crutchley, and A. Fine, “Natural language processing of social media as screening for suicide risk,” Biomedical informatics insights, 2018.

[58] R. Sawhney, P. Manchanda, P. Mathur, R. Shah, and R. Singh, “Exploring and learning suicidal ideation connotations on social media with deep learning,” in Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2018, pp. 167–175.

[59] A. Zirikly, P. Resnik, O. Uzuner, and K. Hollingshead, “CLPsych 2019 shared task: Predicting the degree of suicide risk in reddit posts,” in Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, 2019, pp. 24–33.

[60] A. G. Hevia, R. C. Menendez, and D. Gayo-Avello, “Analyzing the use of existing systems for the CLPsych 2019 shared task,” in Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, 2019, pp. 148–151.

[61] M. Morales, P. Dey, T. Theisen, D. Belitz, and N. Chernova, “An investigation of deep learning systems for suicide risk assessment,” in Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, 2019, pp. 177–181.

[62] M. Matero, A. Idnani, Y. Son, S. Giorgi, H. Vu, M. Zamani, P. Limbachiya, S. C. Guntuku, and H. A. Schwartz, “Suicide risk assessment with multi-level dual-context language and BERT,” in Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, 2019, pp. 39–44.

[63] L. Chen, A. Aldayel, N. Bogoychev, and T. Gong, “Similar minds post alike: Assessment of suicide risk using a hybrid model,” in Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, 2019, pp. 152–157.

[64] X. Zhao, S. Lin, and Z. Huang, “Text classification of micro-blog’s tree hole based on convolutional neural network,” in Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence. ACM, 2018, p. 61.

[65] T. Tran, D. Phung, W. Luo, R. Harvey, M. Berk, and S. Venkatesh, “An integrated framework for suicide risk prediction,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013, pp. 1410–1418.

[66] S. Berrouiguet, R. Billot, P. Lenca, P. Tanguy, E. Baca-Garcia, M. Simonnet, and B. Gourvennec, “Toward e-health applications for suicide prevention,” in 2016 IEEE First International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE). IEEE, 2016, pp. 346–347.

[67] D. Meyer, J.-A. Abbott, I. Rehm, S. Bhar, A. Barak, G. Deng, K. Wallace, E. Ogden, and B. Klein, “Development of a suicidal ideation detection tool for primary healthcare settings: using open access online psychosocial data,” Telemedicine and e-Health, vol. 23, no. 4, pp. 273–281, 2017.

[68] K. M. Harris, J. P. McLean, and J. Sheffield, “Suicidal and online: How do online behaviors inform us of this high-risk population?” Death studies, vol. 38, no. 6, pp. 387–394, 2014.

[69] H. Sueki, “The association of suicide-related twitter use with suicidal behaviour: a cross-sectional study of young internet users in japan,” Journal of affective disorders, vol. 170, pp. 155–160, 2015.

[70] K. W. Hammond, R. J. Laundry, T. M. O’Leary, and W. P. Jones, “Use of text search to effectively identify lifetime prevalence of suicide attempts among veterans,” in 2013 46th Hawaii International Conference on System Sciences. IEEE, 2013, pp. 2676–2683.

[71] C. G. Walsh, J. D. Ribeiro, and J. C. Franklin, “Predicting risk of suicide attempts over time through machine learning,” Clinical Psychological Science, vol. 5, no. 3, pp. 457–469, 2017.

[72] T. Iliou, G. Konstantopoulou, M. Ntekouli, D. Lymberopoulos, K. Assimakopoulos, D. Galiatsatos, and G. Anastassopoulos, “Machine learning preprocessing method for suicide prediction,” in Artificial Intelligence Applications and Innovations, L. Iliadis and I. Maglogiannis, Eds. Cham: Springer International Publishing, 2016, pp. 53–60.

[73] T. Nguyen, T. Tran, S. Gopakumar, D. Phung, and S. Venkatesh, “An evaluation of randomized machine learning methods for redundant data: Predicting short and medium-term suicide risk from administrative records and risk assessments,” arXiv preprint arXiv:1605.01116, 2016.

[74] H. S. Bhat and S. J. Goldman-Mellor, “Predicting adolescent suicide attempts with neural networks,” in NIPS 2017 Workshop on Machine Learning for Health, 2017.

[75] J. P. Pestian, P. Matykiewicz, M. Linn-Gust, B. South, O. Uzuner, J. Wiebe, K. B. Cohen, J. Hurdle, and C. Brew, “Sentiment analysis of suicide notes: A shared task,” Biomedical informatics insights, vol. 5, no. Suppl. 1, p. 3, 2012.

[76] E. White and L. J. Mazlack, “Discerning suicide notes causality using fuzzy cognitive maps,” in Fuzzy Systems (FUZZ), 2011 IEEE International Conference on. IEEE, 2011, pp. 2940–2947.

[77] B. Desmet and V. Hoste, “Emotion detection in suicide notes,” Expert Systems with Applications, vol. 40, no. 16, pp. 6351–6358, 2013.

[78] R. Wicentowski and M. R. Sydes, “Emotion detection in suicide notes using maximum entropy classification,” Biomedical informatics insights, vol. 5, pp. BII–S8972, 2012.

[79] A. Kovacevic, A. Dehghan, J. A. Keane, and G. Nenadic, “Topic categorisation of statements in suicide notes with integrated rules and machine learning,” Biomedical informatics insights, vol. 5, pp. BII–S8978, 2012.

[80] A. M. Schoene and N. Dethlefs, “Automatic identification of suicide notes from linguistic and sentiment features,” in Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 2016, pp. 128–133.

[81] J. Robinson, G. Cox, E. Bailey, S. Hetrick, M. Rodrigues, S. Fisher, and H. Herrman, “Social media and suicide prevention: a systematic review,” Early intervention in psychiatry, vol. 10, no. 2, pp. 103–121, 2016.

[82] Y. Wang, S. Wan, and C. Paris, “The role of features and context on suicide ideation detection,” in Proceedings of the Australasian Language Technology Association Workshop 2016, 2016, pp. 94–102.

[83] A. Shepherd, C. Sanders, M. Doyle, and J. Shaw, “Using social media for support and feedback by mental health service users: thematic analysis of a twitter conversation,” BMC psychiatry, vol. 15, no. 1, p. 29, 2015.

[84] M. De Choudhury and S. De, “Mental health discourse on reddit: Self-disclosure, social support, and anonymity,” in ICWSM, 2014.

[85] M. De Choudhury, E. Kiciman, M. Dredze, G. Coppersmith, and M. Kumar, “Discovering shifts to suicidal ideation from mental health content in social media,” in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2016, pp. 2098–2110.

[86] M. Kumar, M. Dredze, G. Coppersmith, and M. De Choudhury, “Detecting changes in suicide content manifested in social media following celebrity suicides,” in Proceedings of the 26th ACM Conference on Hypertext & Social Media. ACM, 2015, pp. 85–94.

[87] L. Guan, B. Hao, Q. Cheng, P. S. Yip, and T. Zhu, “Identifying chinese microblog users with high suicide probability using internet-based profile and linguistic features: classification model,” JMIR mental health, vol. 2, no. 2, p. e17, 2015.


[88] S. J. Cash, M. Thelwall, S. N. Peck, J. Z. Ferrell, and J. A. Bridge, “Adolescent suicide statements on myspace,” Cyberpsychology, Behavior, and Social Networking, vol. 16, no. 3, pp. 166–174, 2013.

[89] I. Gilat, Y. Tobin, and G. Shahar, “Offering support to suicidal individuals in an online support group,” Archives of Suicide Research, vol. 15, no. 3, pp. 195–206, 2011.

[90] A. Yates, A. Cohan, and N. Goharian, “Depression and self-harm risk assessment in online forums,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2968–2978.

[91] Y. Wang, J. Tang, J. Li, B. Li, Y. Wan, C. Mellina, N. O’Hare, and Y. Chang, “Understanding and discovering deliberate self-harm content in social media,” in Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017, pp. 93–102.

[92] Q. Li, Y. Xue, L. Zhao, J. Jia, and L. Feng, “Analyzing and identifying teens stressful periods and stressor events from a microblog,” IEEE journal of biomedical and health informatics, vol. 21, no. 5, pp. 1434–1448, 2016.

[93] Z. Huang, J. Yang, F. van Harmelen, and Q. Hu, “Constructing knowledge graphs of depression,” in International Conference on Health Information Science. Springer, 2017, pp. 149–161.

[94] F. Hao, G. Pang, Y. Wu, Z. Pi, L. Xia, and G. Min, “Providing appropriate social support to prevention of depression for highly anxious sufferers,” IEEE Transactions on Computational Social Systems, 2019.

[95] A. E. Aladag, S. Muderrisoglu, N. B. Akbas, O. Zahmacioglu, and H. O. Bingol, “Detecting suicidal ideation on forums: proof-of-concept study,” Journal of medical Internet research, vol. 20, no. 6, p. e215, 2018.

[96] D. N. Milne, G. Pink, B. Hachey, and R. A. Calvo, “CLPsych 2016 shared task: Triaging content in online peer-support forums,” in Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, 2016, pp. 118–127.

[97] K. Haerian, H. Salmasian, and C. Friedman, “Methods for identifying suicide or suicidal ideation in EHRs,” in AMIA annual symposium proceedings, vol. 2012. American Medical Informatics Association, 2012, p. 1244.

[98] D. E. Losada and F. Crestani, “A test collection for research on depression and language use,” in International Conference of the Cross-Language Evaluation Forum for European Languages. Springer, 2016, pp. 28–39.

[99] G. Coppersmith, M. Dredze, C. Harman, K. Hollingshead, and M. Mitchell, “CLPsych 2015 shared task: Depression and PTSD on twitter,” in Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2015, pp. 31–39.

[100] X. Liu, X. Liu, J. Sun, N. X. Yu, B. Sun, Q. Li, and T. Zhu, “Proactive suicide prevention online (PSPO): Machine identification and crisis management for chinese social media users with suicidal thoughts and behaviors,” Journal of Medical Internet Research, vol. 21, no. 5, p. e11705, 2019.