Role of Machine Learning Algorithms for Alcohol Addiction Impacts

M.Kokila1, Gayathriselvaraj2, R.Beulah3, S.Karthick4

1Department of Statistics(Unaided), PSG College of Arts and Science, Coimbatore, India,

2PhD Research Scholar, Department of Computer Applications, Bharathiar University, Coimbatore, India

3PhD Research Scholar, Department of Computer Science, Sri Ramakrishna Mission Vidyalaya, Coimbatore, India

Abstract: Alcoholism, also known as alcohol use disorder, is a condition in which a person has a desire or physical need to consume alcohol even though it has a negative impact on their life. This paper presents a study of human alcohol consumption and life expectancy. It also identifies, from an analysis of a set of research papers written on this subject, the relevant topics that are discussed along with alcohol consumption and its effects on the body and health. Machine learning techniques paved the way for this study: the algorithms used are Linear Regression and Latent Dirichlet Allocation (LDA, a generative statistical model). The paper estimates the relationship, and the degree of relationship, between alcohol consumption and life expectancy through regression analysis and tries to draw insights from it. Topic modeling with Latent Dirichlet Allocation is used to discover the abstract topics hidden in the selected research papers.

Keywords: Alcoholism, Latent Dirichlet Allocation, Topic modeling, Machine Learning

1. INTRODUCTION

Almost everything we do, say and think is controlled by our brain, so when the brain is injured, every aspect of life can be affected. The brain is the most complex vital organ in the human body and, together with the spinal cord, forms the central nervous system. Brain damage can affect many functions, including memory, sensation and even personality. Some brain disorders are Alzheimer’s disease and other dementias, brain cancer, epilepsy and other seizure disorders, mental disorders, Parkinson’s disease and other movement disorders, and stroke and transient ischemic attack (TIA). Psychological conditions related to brain disorders include anxiety, depression and schizophrenia. A brain disorder can also result from damage to the brain after birth, caused for example by falls, accidents, assault, concussion and other trauma; stroke and other vascular disease; lack of oxygen (e.g. near drowning); Alzheimer's disease and other dementias; degenerative diseases (e.g. dementia, Parkinson's disease); alcohol and other drugs; brain tumors; epilepsy; and infections and diseases (e.g. meningitis).

Mental health is a major concern worldwide, and India is no exception. Progress in the field of mental health appears to be slow. About one-third of the total disease burden in developing countries comes from brain disorders. In India alone, there are 3.7 million people with dementia, and the number is expected to double by 2030. One of the major causes of brain disorders is alcohol addiction.

Alcohol consumption is the leading risk factor for disease burden in developing countries and the third-largest risk factor in developed countries. The proportion of the disease burden attributable to alcohol use in developing countries is between 2.6% and 9.8% of the total burden for males and between 0.5% and 2.0% for females.

Zeichen Journal

Volume 7, Issue 1, 2021

ISSN No: 0932-4747

Page No:119

Worldwide studies have shown that the behaviour of college students makes them vulnerable to alcohol consumption, because of a sense of being free from family control and, especially, the entrance to adulthood. Statistics also show that a major cause of death among young adults is high-risk accidents due to the influence of alcohol. Besides the direct toxic effects of intoxication and addiction, alcohol consumption causes about 20% to 30% of each of oesophageal cancer, liver disease, homicide, epileptic seizures and motor vehicle accidents worldwide. Heavy alcohol use increases the risk of cardiovascular disease and stroke. Alcohol consumption during pregnancy carries various risks to the fetus, including Fetal Alcohol Spectrum Disorders, and can also lead to spontaneous abortion, low birth weight and premature delivery. A higher volume of alcohol consumption is also associated with depression. Excessive alcohol consumption can severely impair an individual's functioning in social roles such as parent, spouse or partner. As a group, alcoholics share a constellation of behaviours characteristic of frontal lobe dysfunction, including impaired judgement, poor insight, distractibility and short-tempered arguments, which can even lead to group clashes that are a menace to society. Above all, alcohol affects brain function: it damages the white and gray matter of the brain, and heavy drinking is associated with a loss of brain volume. A few researchers have studied the gray matter reduction associated with alcohol addiction using machine learning algorithms. Machine learning, like data mining (also called data surfing), is a technology for sifting data for needed information. It uses artificial intelligence, neural networks and statistics to reveal trends, patterns and relationships that are otherwise hidden in the data.

2. REVIEW OF STATUS OF RESEARCH AND DEVELOPMENT IN THE SUBJECT

The transition to adulthood happens approximately between 18 and 25 years of age. Several experiences and discoveries take place in this period, including the consumption of alcohol. In addition to the individual's own personality, many variables influence drinking behaviour, including genetics, gender, ethnicity, college, religiosity, occupation, marital status, friends, family and community. Worldwide studies have shown that college students are vulnerable to alcohol consumption because of a sense of being free from family control and, especially, the entrance to adulthood. Statistics also show that a major cause of death among young adults is high-risk accidents due to the influence of alcohol.

2.1 International Status

Research on this topic at the global level reveals that alcohol can significantly affect academic performance. This is particularly important considering the high prevalence of alcohol consumption among college students and their vulnerable age. Brazilian studies with college students detected a prevalence of alcohol use between 65% and 92%. The mean prevalence of lifetime alcohol consumption in the research was 93.1%, a result in line with those obtained by researchers in Sao Paulo (SP, Brazil), who detected a lifetime-use prevalence of 89.6% for women and 93.5% for men. A study carried out in Boston (USA) involving college students between 21 and 24 years old found that students presented mood alterations and deficits in attention and reaction time the day after an episode of binge drinking, even though their reading performance remained unaltered. Research carried out at the University of Michigan revealed that most students grow up in a culture that equates the consumption of alcohol with having fun, relaxing, completing social situations and reducing tension. The University reported that, according to the findings of a large-scale research project conducted by the National Institute on Alcohol Abuse and Alcoholism (NIAAA), a significant number of college students die each year from alcohol-related injuries, and more assaults among students take place while they are under the influence of alcohol. Binge drinking, that is, consuming a large number of drinks on a single occasion, is prevalent on campus. The generally agreed-upon thresholds are 5 or more drinks in a single episode for men and either 4 or 5 drinks for women. Research carried out at the University of Northampton (United Kingdom) on "Student Choices and Alcohol Matters" (SCAM), based on a cross-sectional survey and interviews, revealed that the majority of the students surveyed (83%) classified themselves as drinkers, with only 17% classifying themselves as abstainers. About 62.6% of the student drinkers reported that they consume alcohol at least once or twice a week, 19.6% reported drinking more often and 4.2% drank alcohol nearly every day. Among female students, 44.6% drank more than the recommended maximum of 3-4 drinks per day, whilst 30% of male students drank more than the recommended limit of 5-6 drinks per day. A study in South Africa states that 42.2% of men and 18.3% of women were hazardous drinkers. Research in Nigeria revealed that the 12-month prevalence of alcohol abuse in the community was 33.23% and that 57.75% were social drinkers. A study in Venezuela reported a prevalence of 86.5% among men and 7.5% among women.

2.2 National Status

During 1992-2012, per capita alcohol consumption in India increased by a whopping 55%, the third-highest increase in the world after the Russian Federation and Estonia. This rapid rise in alcohol consumption is alarming. The World Health Organization (WHO) released its Global Status Report on Alcohol and Health. According to the report, about 30% of India's population, just under a third of the country's populace, consumes alcohol regularly, and about 11% are moderate drinkers. The average Indian consumes about 4.3 litres of alcohol per annum, and the rural average is much higher, at about 11.4 litres per year. Worldwide, about 3.3 million deaths in 2012 were attributed to alcohol consumption, amounting to 5.9% of global deaths that year. Apart from the health concerns, chronic alcoholism is one of the greatest causes of poverty in India. According to various studies, men who are the primary breadwinners are 10 times more likely to report alcohol abuse. Regular alcohol consumption is also inversely related to family income: consumption increases significantly as income diminishes. One Indian dies every 96 minutes due to alcohol consumption. The university scenario in India reveals that most students consume alcohol on special occasions (54%) and 25% consume it on weekends only. 8% of students, described as binge drinkers, consume alcohol on alternate days. Peer pressure (15%) is one of the biggest reasons students start consuming alcohol: they see it as a trend or a cool thing to do, and sometimes they start drinking to be part of a group. About 46% of the students consume alcohol to have fun, 32% for relaxation, 3% when sad or depressed, 12% as an alternate drink and 7% due to the influence of a group activity. These studies indicate that peer pressure, academic stress, emotional stress and the desire to have fun are the causes of alcohol consumption in educational institutions.

3. DATA ANALYSIS AND MODELING

Data analysis is defined as a process of cleaning, transforming and modeling data to discover useful information for business decision-making. There are mainly four types of data analysis techniques that we make use of in day-to-day life.

Descriptive Analysis: As the name suggests, descriptive analysis is purely descriptive. It answers the question “What happened?” and does not generalize beyond the data at hand. It is the simplest type, and the one with the most practical uses in business today.

Diagnostic Analysis: At this stage, historical data can be measured against other data to answer the question “Why did something happen?”. It is possible to drill down, find dependencies and identify patterns. Companies use diagnostic analytics because it gives in-depth insight into a particular problem.

Predictive Analysis: Predictive analytics tells “What is likely to happen?”. It uses the findings of descriptive and diagnostic analytics to detect tendencies, clusters and exceptions and to predict future trends, which makes it a valuable tool for forecasting.

Prescriptive Analysis: Prescriptive analysis is the area of business analytics dedicated to finding the best course of action for a given situation. Apart from answering “What is likely to happen?”, it also tackles “What should we do to reach the desired outcome?”. Prescriptive analysis is the frontier of data analysis; it takes into account the insights of all the previous analyses to determine the best course of action for a problem or decision.

Data modeling is a set of tools and techniques used for correctly identifying and applying the insights inferred from data analysis. Data modeling goes hand in hand with predictive analysis, where we predict future values from the analyzed data. Data modeling can be done with the help of statistical and machine learning algorithms.

3.1 Big Data and Technologies

Big data refers to larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software cannot manage them, but these massive volumes of data can be used to address business problems and other real-world issues. Big data can be seen as a combination of structured, semi-structured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling and other advanced analytics applications. Big data is often characterized by the 3Vs: the large volume of data in many environments, the wide variety of data types stored in big data systems and the velocity at which the data is generated, collected and processed.

3.2 Machine Learning

Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve automatically from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. The learning process begins with observations or data, such as examples, direct experience or instruction, in order to look for patterns in the data and make better decisions in the future based on the examples that we provide. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly. There are mainly four types of machine learning algorithms: supervised, unsupervised, semi-supervised and reinforcement learning algorithms.

3.3 Text Mining and Natural Language Processing (NLP)

Text mining is the process of exploring and analyzing large amounts of unstructured text data with software that can identify concepts, patterns, topics, keywords and other attributes in the data. It is also known as text analytics, although some people draw a distinction between the two terms; in this view, text analytics is an application enabled by the use of text mining techniques to sort through data sets. Text mining has become more practical for data scientists and other users due to the development of big data platforms and deep learning algorithms that can analyze massive sets of unstructured data. Text mining systems draw on information extraction, natural language processing, data mining and information retrieval. Natural language processing (NLP) is a component of text mining that performs a special kind of linguistic analysis that essentially helps a machine “read” text. NLP uses a variety of methodologies to resolve the ambiguities in human language, including automatic summarization, part-of-speech tagging, disambiguation, entity extraction and relation extraction, and natural language understanding and recognition.

3.4 Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA) is an example of a topic model and is used to assign the text in a document to particular topics. It builds a topics-per-document model and a words-per-topic model, both modeled with Dirichlet distributions. LDA is based on two general assumptions: documents that have similar words usually have the same topic, and documents in which groups of words frequently occur together usually have the same topic. These assumptions make sense because documents on the same topic, for instance business, will contain words like "economy", "profit", "the stock market", "loss", etc. The second assumption states that if these words frequently occur together in multiple documents, those documents may belong to the same category. Mathematically, the two assumptions can be represented as follows: documents are probability distributions over latent topics, and topics are probability distributions over words.

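The two assumptions above correspond to the standard generative process of LDA. In the notation below, which is introduced here and not used in the original, K is the number of topics, D the number of documents, N_d the number of words in document d, and α and β are the Dirichlet hyperparameters:

```latex
\begin{aligned}
\varphi_k &\sim \mathrm{Dirichlet}(\beta), && k = 1,\dots,K && \text{(word distribution of topic } k\text{)}\\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), && d = 1,\dots,D && \text{(topic distribution of document } d\text{)}\\
z_{d,n} &\sim \mathrm{Categorical}(\theta_d), && n = 1,\dots,N_d && \text{(topic of the } n\text{-th word)}\\
w_{d,n} &\sim \mathrm{Categorical}(\varphi_{z_{d,n}}) && && \text{(the observed word)}
\end{aligned}
```

Fitting LDA inverts this process: given only the observed words w, it infers the document-topic proportions θ_d and the topic-word distributions φ_k, typically by variational inference or Gibbs sampling.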
4. RESEARCH METHODOLOGY

Dataset used: Alcohol a risk factor for mortality (csv file)

This dataset gives the alcohol-attributable fraction (AAF), which is the proportion of deaths that are caused or exacerbated by alcohol (i.e. the proportion that would disappear if alcohol consumption were removed).

Fig 1 Alcohol-attributable fraction data

The attributes are Country Name, Country Code, Year and Indicator_Mortality.

(i) Drinks (csv file)

This file contains, for each country, the number of servings of beer, spirits and wine, together with the average number of liters of pure alcohol consumed per person. The servings of the three types of drinks are given as integers and the total alcohol consumption as a decimal.
Source: https://data.world
Dataset shape: (194, 5)

Fig 2 Drinks data

5. PROPOSED METHODOLOGY

5.1 Data Preprocessing

Text Preprocessing

5.1.1. Removing New line characters
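The code for this step is not shown in the paper. A minimal sketch, assuming the extracted text is held in a Python string and that each line break should become a single space (the function name `remove_newlines` is hypothetical):

```python
import re


def remove_newlines(text: str) -> str:
    """Replace line breaks, and any whitespace around them, with one space."""
    # \s*\r?\n\s* matches a newline (Unix or Windows style) together with
    # the surrounding whitespace, so words on adjacent lines stay separated.
    return re.sub(r"\s*\r?\n\s*", " ", text).strip()


sample = "Alcohol consumption\nand its effects\n\non health"
print(remove_newlines(sample))  # Alcohol consumption and its effects on health
```

Replacing a newline with a space rather than deleting it avoids gluing the last word of one line to the first word of the next.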

Fig 3 Output after removing new line characters

5.1.2. Removing Punctuation and Special Characters

Fig 4 Output after removing punctuation and special characters

5.1.3. Removing Numbers

Numbers are not useful for this data modeling and are therefore removed with a regular expression.
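The regular expression itself is not reproduced in the text. The sketch below shows one plausible way to implement the punctuation and special-character removal (Fig 4) and the number removal with Python's `re` module. The patterns and the helper names are assumptions, not the authors' code.

```python
import re


def remove_punctuation(text: str) -> str:
    # [^\w\s] matches any character that is neither a word character nor
    # whitespace; "_" is added because \w treats it as a word character.
    return re.sub(r"[^\w\s]|_", " ", text)


def remove_numbers(text: str) -> str:
    # \d+ matches a run of digits such as "2012" or "55".
    return re.sub(r"\d+", "", text)


def clean(text: str) -> str:
    text = remove_numbers(remove_punctuation(text))
    # Collapse the repeated spaces left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


print(clean("Between 1992-2012, consumption rose by 55%!"))
# Between consumption rose by
```

Punctuation is replaced with a space rather than deleted so that tokens such as "1992-2012" split cleanly before the digits are stripped.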

Fig 5 Output after removing numbers

5.1.4. Converting Text to Lowercase

The entire text document is converted into lowercase words.

Fig 6 Output after converting text into lowercase

5.1.5. Stop Words Removal

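A stop word is a commonly used word (such as "the", "is" or "and") that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query. Because such words carry little information about the topics of a document, they are removed.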
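A minimal sketch of lowercasing (step 5.1.4) followed by stop-word removal, assuming whitespace tokenization. The tiny `STOP_WORDS` set below is hypothetical and for illustration only; in practice a standard list such as NLTK's English stop words or scikit-learn's `ENGLISH_STOP_WORDS` would be used.

```python
# Tiny illustrative stop-word list (hypothetical, not the paper's list).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "in", "on", "to", "with", "its"}


def remove_stop_words(text: str) -> list[str]:
    # Lowercase first so "The" and "the" are treated alike,
    # then split on whitespace and drop the stop words.
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]


print(remove_stop_words("Alcohol use is associated with the risk of Liver disease"))
# ['alcohol', 'use', 'associated', 'risk', 'liver', 'disease']
```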

Fig 7 Output after removing stop words

5.1.6. Lemmatization

Lemmatization is the process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word's lemma, or dictionary form.
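To illustrate the idea, the sketch below implements lookup-based lemmatization with a small hand-made dictionary. The dictionary and the function name are hypothetical. Real pipelines rely on a full lexicon such as WordNet (for example NLTK's `WordNetLemmatizer`) or spaCy, which also use part-of-speech information. Unlike a suffix-stripping stemmer, a lemmatizer can map an irregular form such as "drank" to "drink".

```python
# Hypothetical lemma dictionary for illustration only.
LEMMAS = {
    "students": "student",
    "drank": "drink",
    "drinks": "drink",
    "diseases": "disease",
    "studies": "study",
    "brains": "brain",
    "were": "be",
}


def lemmatize(tokens: list[str]) -> list[str]:
    """Map each token to its dictionary form; unknown tokens are kept as-is."""
    return [LEMMAS.get(token, token) for token in tokens]


print(lemmatize(["students", "drank", "more", "drinks"]))
# ['student', 'drink', 'more', 'drink']
```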

Fig 8 Output after lemmatization

Dataset Preprocessing

A. Removing null values: Null values in the dataset can be viewed with the isnull() function in the pandas package, and the dropna() function is used to remove them.
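A minimal sketch of the two pandas calls named above. The rows are made up, and the column names are assumed to follow the drinks data (`beer_servings`, `total_litres_of_pure_alcohol`).

```python
import numpy as np
import pandas as pd

# Hypothetical rows; column names follow the drinks data but values are made up.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "beer_servings": [89, np.nan, 245, 217],
    "total_litres_of_pure_alcohol": [4.9, 0.7, np.nan, 12.4],
})

print(df.isnull().sum())   # number of missing values per column
clean_df = df.dropna()     # keep only the rows without any missing value
print(clean_df)
```

Note that `dropna()` discards a whole row if any column is missing, which shrinks a small sample such as the 194 countries. Imputation with `fillna()` is a possible alternative when that loss matters.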

B. Scaling the data: All the numerical values are standardized to zero mean and unit variance using the StandardScaler class in scikit-learn.
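A minimal sketch of standardization with scikit-learn's `StandardScaler`, which maps each column to z = (x − mean) / std. The feature matrix is hypothetical (beer, spirit and wine servings for four made-up countries).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: beer, spirit and wine servings per country.
X = np.array([
    [89.0, 132.0, 54.0],
    [245.0, 138.0, 312.0],
    [217.0, 57.0, 45.0],
    [0.0, 0.0, 0.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # z = (x - column mean) / column std

print(X_scaled.mean(axis=0).round(6))  # approximately 0 for every column
print(X_scaled.std(axis=0).round(6))   # 1 for every column
```

When the data is split into training and test sets, the scaler should be fitted on the training data only and then applied to both. For ordinary least-squares regression, standardization changes the coefficient values but not the predictions or R².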

Exploratory Data Analysis

A. Describing the data

It gives the statistical summary of the numerical columns in the data set.

Fig 9 Description of the data

Histogram Analysis

Fig 10 Histogram for “Indicator_Mortality” attribute in the data

The histogram shows that the alcohol-attributable fraction for mortality (AAF), i.e. the “Indicator_Mortality” attribute, follows a right-skewed distribution. The majority of the records have an AAF between 0 and 7.
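The visual impression of right skew can be checked numerically. A minimal sketch with pandas, using hypothetical AAF values (not the paper's data): `describe()` gives the statistical summary mentioned above, and a positive `skew()` together with a mean above the median indicates a long right tail.

```python
import pandas as pd

# Hypothetical AAF values (%), chosen to be right-skewed; not the paper's data.
aaf = pd.Series([0.5, 1.2, 2.0, 2.8, 3.1, 4.0, 5.5, 6.2, 9.0, 18.0, 31.0],
                name="Indicator_Mortality")

print(aaf.describe())                     # count, mean, std, min, quartiles, max
print("skewness:", round(aaf.skew(), 3))  # > 0 means a longer right tail
print("mean > median:", aaf.mean() > aaf.median())
```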

Fig 11 Histogram for “total_litres_of_pure_alcohol” attribute in the data

The histogram is right-skewed, and the largest share of countries lies in the range of 0 to 5 liters of pure alcohol.

Boxplot Analysis

Fig 12 Boxplot for beer, spirit and wine servings

The boxplot (Fig 12) shows that beer servings are much greater than spirit and wine servings. However, wine servings show a greater number of outliers, which may be caused by country-wise differences.
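The points drawn individually in a boxplot are usually those lying more than 1.5 times the interquartile range (IQR) beyond the quartiles, which is matplotlib's default whisker rule. A minimal sketch of that rule with pandas, on hypothetical wine-serving values:

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying beyond k * IQR from the quartiles (boxplot rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


# Hypothetical wine servings for twelve countries; values are made up.
wine = pd.Series([0, 1, 2, 3, 4, 5, 7, 8, 10, 12, 280, 370], name="wine_servings")
print(iqr_outliers(wine).tolist())  # [280, 370]
```

Under this rule, a few countries with very high wine consumption are flagged as outliers even though they are genuine observations, which is consistent with attributing the anomalies to country-wise differences rather than data errors.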

Fig 13 Heatmap for numerical attributes

The mortality indicator shows a positive correlation with beer, spirit and wine servings and with total alcohol consumption: as consumption increases, the mortality indicator tends to increase, in an approximately linear manner.
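A correlation heatmap is normally drawn from a pairwise correlation matrix. A minimal sketch of computing that matrix with pandas, on hypothetical values. Pearson's coefficient measures linear association only and does not by itself establish causation. The heatmap itself could then be drawn from `corr`, for example with seaborn's `heatmap` function.

```python
import pandas as pd

# Hypothetical per-country values; not the paper's data.
df = pd.DataFrame({
    "beer_servings": [10, 50, 90, 150, 200, 250],
    "wine_servings": [5, 20, 30, 60, 100, 150],
    "Indicator_Mortality": [1.0, 2.5, 3.8, 6.0, 7.9, 10.2],
})

corr = df.corr(method="pearson")   # symmetric matrix, 1.0 on the diagonal
print(corr.round(2))
```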

Word Cloud

Result and Discussion

Regression Modeling

Step 1: Choosing modeling features

Fig 14 Output for choosing dependent and independent variables

Step 2: Fitting a multiple linear regression model

Fig 15 Output for multiple linear regression
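A minimal sketch of fitting a multiple linear regression with scikit-learn's `LinearRegression`. The data is synthetic, generated from known (invented) coefficients so the fit can be checked. It stands in for the beer, spirit and wine servings and is not the authors' data or model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-ins for beer, spirit and wine servings (hypothetical).
X = rng.uniform(0, 300, size=(n, 3))
true_coef = np.array([0.02, 0.03, 0.01])           # invented for the example
y = 1.5 + X @ true_coef + rng.normal(0, 0.5, size=n)

model = LinearRegression().fit(X, y)
print("intercept:", round(model.intercept_, 2))
print("coefficients:", model.coef_.round(3))
```

Each coefficient estimates the change in the response per additional serving of that drink type, holding the other two fixed.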

Step 3: Fitting polynomial regression

Fig 16 Output for fitting Polynomial regression
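Polynomial regression is still linear regression, fitted on polynomial features of the inputs. A minimal sketch with a scikit-learn pipeline on synthetic data. The degree (2) and the data-generating curve are assumptions, since the paper does not state the degree used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
# Synthetic stand-in for litres of pure alcohol (hypothetical).
x = rng.uniform(0, 15, size=(150, 1))
y = 0.5 + 0.2 * x[:, 0] + 0.04 * x[:, 0] ** 2 + rng.normal(0, 0.3, size=150)

linear = LinearRegression().fit(x, y)
# PolynomialFeatures adds x^2; LinearRegression then fits its own intercept.
poly = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                     LinearRegression()).fit(x, y)

print("linear R^2:", round(linear.score(x, y), 3))
print("degree-2 R^2:", round(poly.score(x, y), 3))
```

On the training data, a higher degree can never lower R², so the degree is better chosen by comparing scores on held-out data.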

Step 4: Evaluating the model

Fig 17 Output showing error and model accuracy
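The paper reports an error value and R² without naming the exact metrics or the evaluation split. A minimal sketch, assuming a held-out test set and mean squared error (MSE), root mean squared error (RMSE) and R² from scikit-learn, on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(0, 15, size=(300, 1))                  # synthetic feature
y = 2.0 + 0.6 * X[:, 0] + rng.normal(0, 1.0, size=300)  # noise std = 1.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)          # same units as the target
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```

Because the synthetic noise has standard deviation 1.0, the test RMSE should come out close to 1; computing the metrics on data the model has not seen avoids an optimistic estimate.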

Fig 18 Graph plotting actual and predicted values

The final fitted model gives an R² (coefficient of determination) of 0.73; that is, the independent variables “beer servings”, “wine servings”, “spirit servings” and “total liters of pure alcohol” explain about 73% of the variance in the dependent variable “mortality rate”. This indicates a strong association between alcohol consumption and the mortality rate.
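R² is sometimes loosely described as an accuracy value, so its definition is worth stating. Writing y_i for the observed values, ŷ_i for the model's predictions and ȳ for the mean of the observations (notation introduced here), R² is the fraction of the total variation that the model accounts for:

```latex
R^2 \;=\; 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    \;=\; 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2 = 0.73 \;\Longrightarrow\; \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - 0.73 = 0.27
```

In other words, 27% of the variation in the dependent variable remains unexplained. When R² is computed on the same data used for fitting, particularly for a polynomial model, it tends to be optimistic; reporting it on held-out data or as adjusted R² gives a more cautious estimate.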

LDA (Latent Dirichlet Allocation) for Topic Modeling

Fig 19 Output showing 1st prominent topic

The most dominant topic in the text data represents information about alcohol consumption and its effects on health. It points to the studies conducted on drinking and its associated harmful effects. The prominent words in this topic are alcohol, consumption, use, disease, manuscript, author, drinking, associated, impact, research, etc. This topic accounts for about 31.7% of all tokens in the corpus.

The second most dominant topic is about the effect of alcohol on the brain, liver and blood. It contains a number of relevant words such as brain, disease, risk, chronic, alcoholism, cirrhosis, stroke, emotional, family, shrinkage, etc. Together, these words reflect the overall effect of alcohol on the body and health.

Fig 21 Output showing 3rd prominent topic

The third dominant topic includes words such as brain, drinking, damage, alcoholic, etc. It is primarily about the effect of alcohol on the brain.

CONCLUSION

Alcoholism is a villain in our society, and it has a wide variety of side effects, some of them fatal. In India, 5.4% of deaths are caused by alcohol, and its impacts extend to both individual and social realms. The study showed that alcohol consumption is strongly associated with mortality: the fitted regression model explains about 73% of the variation in the mortality indicator (R² = 0.73). A linear relationship was fitted using a multiple linear regression model and analyzed. Similarly, the topic modeling brings out the various harmful sides of alcohol, showing that alcohol consumption affects the brain, the liver, mental health and other bodily functions. However, the study was limited to identifying the relationship between mortality and alcoholism. Future work could extend it to a wider range, for example by specifically identifying the effects of alcohol on the social economy and on relationships. Analysis could also be conducted on alcohol and its relation to road accidents, crimes, etc. Topic modeling could also be extended in future work to better identify alcohol-related diseases.
