DES and MD5 based Healthcare Data Protection with Big Data Analytics

Geetabai S Hukkeri 1, R H Goudar 2
1,2 Dept. of Computer Network Engineering

1,2 Visvesvaraya Technological University, Belagavi, India.

August 14, 2018

Abstract

Big data has become a widely adopted technology: industries such as banking, education, social media and transportation have started to use it to store and process the huge amounts of data generated in their fields. Big data has also started to play an important role in healthcare. Healthcare data is generated in many formats, for example text, video, images and other digital records, and its volume has grown beyond what traditional database management tools can handle. Because this information is confidential, hospitals need to maintain patient records in such a way that no one except an authorized user can access the data. To achieve this, every hospital must secure patient information by storing it in ciphertext form.

Key Words: Big data; Hadoop; Apache Pig; Healthcare; Security.

International Journal of Pure and Applied Mathematics, Volume 120, No. 6, 2018, 12087-12097, ISSN: 1314-3395 (on-line version), url: http://www.acadpubl.eu/hub/, Special Issue

1 Introduction

Nowadays huge amounts of data are generated from many sources, including social interactions, business transactions, broadcasting and electronic files; likewise, hospitals, sensors, mobile phones and other organizations generate large amounts of healthcare data. Such a large collection of data is called big data, which can also be defined as data that is practically impossible to store and process with conventional tools. Much of the data generated by the healthcare industry is still stored in hard-copy form, which anybody can easily access and use for their own purposes, so it is essential to protect patients' information from misuse by unauthorized people. By using big data analytics, hospitals can also improve their healthcare delivery system and provide good quality care to patients at lower cost. Because such very large amounts of data are involved, the questions arise of where the data should be stored and how it should be managed and processed. Big data analytics provides a solution to these problems: big data platforms can store and analyze terabytes and petabytes of data.

2 BIG DATA AND HEALTHCARE

The term Big Data was first introduced by Roger Mougalas in 2005. Big data is data that is practically impossible to store and process with conventional tools. Two fundamental components are involved in big data processing, namely Hadoop and MapReduce. Hadoop also dates from 2005 and was subsequently developed at Yahoo. In the past, data was kept in databases so that it could be processed later, but now it is possible to analyze data while it is being generated by various sources, without first storing it in a database.

The 5 Vs of Big Data:

Velocity: velocity is speed; in big data it refers to the speed at which data is produced, processed and inspected.

Volume: volume refers to the enormous amount of information produced by mobile phones, sensors, video, social media, photos, etc. This amount of data is too large to store and process with conventional data management technology.

Variety: in daily life we use different types of data, such as structured data (name, address, etc.), unstructured data (images, video, audio, text files, etc.) and semi-structured data (XML files, NoSQL databases). Big data technology can store and analyze all these kinds of data.

Veracity: veracity is the quality or trustworthiness of the data, for example the reliability and accuracy of information posted on Facebook or Twitter; another good example is the accuracy of GPS data.

Value: value refers to the amount of useful information that can be extracted by analyzing huge amounts of data.

Anyone who starts studying big data will come across the term Hadoop, so let us understand what Hadoop is. Hadoop was inspired by the design of the Google File System. It is an ecosystem of open-source components that fundamentally changes the way enterprises store, process and analyze data. Doug Cutting created Hadoop in 2005. Hadoop runs big data applications using MapReduce algorithms, which process the data in parallel on different nodes. Its flexibility, scalability, low cost and reliability have attracted many organizations. The Hadoop framework includes four modules:

Common services: the libraries and utilities required by the other Hadoop modules; they provide the Java files and scripts needed to start Hadoop.

Hadoop YARN: YARN stands for Yet Another Resource Negotiator. This module is used for cluster resource management and job scheduling.

HDFS: HDFS stands for Hadoop Distributed File System. This module stores the large amounts of data generated by various sources.

MapReduce: this module performs two tasks, a map task and a reduce task, and processes datasets in parallel. The map task takes a dataset as input and converts each element into tuples (key/value pairs). The reduce task takes the output of the map task as its input and combines those tuples into a smaller set of tuples.

Apache Pig: to process data directly with MapReduce, a programmer must know a programming language such as Java, Python, Scala or R. What if users do not have strong programming skills? This problem can be solved with Apache Pig Latin, a high-level language for writing data analysis programs. The Pig Engine in Apache Pig takes Pig Latin scripts and internally converts them into MapReduce jobs. Apache Pig provides many operators, such as LOAD, STORE, FILTER, JOIN, GROUP, COGROUP and SPLIT, which are used to perform the required tasks. The components Apache Pig uses to process data are the Parser, Optimizer, Compiler and Execution Engine.

Healthcare: doctors now check their patients' health conditions using smartphones, so confidential information may leak to the internet, where it can attract the attention of criminals. Hospitals therefore need to maintain patient records in such a way that no one except an authorized user can access the data. To achieve this, every hospital must secure patient information by keeping the data in ciphertext form; whenever the healthcare data needs to be shown to an authorized person, the encrypted data can be decrypted. Big data now plays an important role in healthcare because healthcare data is generated in many formats, for example text, video, images and other digital records, and has become too large to be managed with traditional database management tools; hence many hospitals now use big data analytics to store and analyze large amounts of healthcare data effectively.
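
As an illustration of the MapReduce and Apache Pig descriptions above, the following Python sketch simulates, on a single machine, the map, shuffle and reduce phases that a small Pig Latin script would roughly compile into. The CSV layout (id, age, disease), the sample records and the function names `map_task`, `shuffle`, `reduce_task` and `run_job` are hypothetical; they are not part of the paper's system or of the Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: one CSV line per patient -> id,age,disease
RECORDS = [
    "P001,54,diabetes",
    "P002,61,hypertension",
    "P003,47,diabetes",
    "P004,12,asthma",
    "P005,70,hypertension",
    "P006,39,diabetes",
]

# Roughly equivalent Pig Latin (the Pig Engine compiles such a script to MapReduce):
#   records = LOAD 'patients.csv' USING PigStorage(',')
#             AS (id:chararray, age:int, disease:chararray);
#   adults  = FILTER records BY age >= 18;
#   grouped = GROUP adults BY disease;
#   counts  = FOREACH grouped GENERATE group, COUNT(adults);


def map_task(line: str) -> Iterator[Tuple[str, int]]:
    """Map: parse one record, apply the filter, emit a (key, value) pair."""
    _patient_id, age, disease = line.split(",")
    if int(age) >= 18:
        yield disease, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: gather all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the tuples for one key into a single, smaller result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over all input lines."""
    pairs = (pair for line in lines for pair in map_task(line))
    return dict(reduce_task(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map tasks run in parallel on different nodes, each over its own portion of the input, and Hadoop performs the shuffle between the map and reduce phases; the Pig Latin statements in the comments only relate the operators to the two phases.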

3 RELATED WORK

In the design and construction of a big data analytics (BDA) framework for health applications [1], the MapReduce, HDFS and HBase configurations had to be changed; the proposed BDA process and configuration took patient data and performance requirements into account when analyzing healthcare BDA. In [2], the authors performed a case study of sepsis using the Analogize process mining technique in healthcare. Sepsis is a serious condition caused by an infection. The sepsis data were taken from hospitals using an ERP (Enterprise Resource Planning) system, and the two main objectives of the paper are control flow and conformance testing. With the present generation of mobile phones and sensor devices, huge amounts of patient healthcare data, forming big data, need to be placed in large databases so that multiple users, including doctors, caregivers and patients, can access the data whenever they need it; the authors of [3] therefore presented an application of analytics to big data in healthcare. In [4], the authors described the challenges and opportunities of big data in healthcare. Vast amounts of healthcare data are generated by many sources, including smartphones, sensors, hospitals and researchers, and the big challenge for the healthcare system is how to collect and maintain such data and how to use it to improve treatment. Big data analytics was introduced to solve this problem. In [5], big data analytics is used to store and process the huge amount of healthcare data generated by various sources; such analytics plays an important role in improving patient treatment, and the authors concentrated on applying big data analytics in healthcare by analyzing multiple diseases.

4 FRAMEWORK FOR HEALTHCARE DATA ANALYSIS

Figure 1: Framework of secure healthcare data analysis using Hadoop-based Apache Pig

The above figure shows the framework of secure healthcare data analysis. In the user interface, the hospital administrator logs in to the system with a user name and password. Once the administrator has logged in, a new frame opens with Load, Encrypt and Decrypt buttons, as shown in Figure 2. Clicking the Load button loads the healthcare data (input file) into the Hadoop environment, from which the data is split and distributed across the Hadoop cluster, as shown in Figure 1. Hadoop has two main components: HDFS to store the data and MapReduce to process it. The proposed system uses Apache Pig, which runs on top of MapReduce, to write the data analysis programs. All machines in the Hadoop cluster process the data in parallel. In this paper, the DES and MD5 algorithms are used to encrypt the input file. Once the healthcare data is encrypted, the result is sent back to HDFS, because it is needed as the input for decryption. DES is a symmetric-key algorithm, so the same key that is used for encryption is also used for decryption.
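
To make the symmetric-key property concrete, here is a minimal pure-Python DES sketch using the standard FIPS 46-3 tables, in which decryption reuses the encryption key and simply applies the sixteen round keys in reverse order. The paper does not state which block mode or padding it uses; ECB with PKCS#5 padding and the names `subkeys`, `des_block`, `encrypt` and `decrypt` are assumptions made for illustration. DES's 56-bit effective key is short by modern standards and ECB reveals repeated plaintext blocks, so this sketch only illustrates the algorithm.

```python
# Minimal DES (FIPS 46-3 tables); unoptimised, for illustration only.
IP = [58, 50, 42, 34, 26, 18, 10, 2, 60, 52, 44, 36, 28, 20, 12, 4,
      62, 54, 46, 38, 30, 22, 14, 6, 64, 56, 48, 40, 32, 24, 16, 8,
      57, 49, 41, 33, 25, 17, 9, 1, 59, 51, 43, 35, 27, 19, 11, 3,
      61, 53, 45, 37, 29, 21, 13, 5, 63, 55, 47, 39, 31, 23, 15, 7]
FP = [IP.index(i) + 1 for i in range(1, 65)]  # final permutation = inverse of IP
E = [(4 * i + j - 1) % 32 + 1 for i in range(8) for j in range(6)]  # 32 -> 48 bits
P = [16, 7, 20, 21, 29, 12, 28, 17, 1, 15, 23, 26, 5, 18, 31, 10,
     2, 8, 24, 14, 32, 27, 3, 9, 19, 13, 30, 6, 22, 11, 4, 25]
PC1 = [57, 49, 41, 33, 25, 17, 9, 1, 58, 50, 42, 34, 26, 18,
       10, 2, 59, 51, 43, 35, 27, 19, 11, 3, 60, 52, 44, 36,
       63, 55, 47, 39, 31, 23, 15, 7, 62, 54, 46, 38, 30, 22,
       14, 6, 61, 53, 45, 37, 29, 21, 13, 5, 28, 20, 12, 4]
PC2 = [14, 17, 11, 24, 1, 5, 3, 28, 15, 6, 21, 10, 23, 19, 12, 4,
       26, 8, 16, 7, 27, 20, 13, 2, 41, 52, 31, 37, 47, 55, 30, 40,
       51, 45, 33, 48, 44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32]
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]
# S1..S8: four rows of sixteen 4-bit outputs each, written as hex digits
SBOX = [[int(ch, 16) for ch in box] for box in (
    "E4D12FB83A6C5907" "0F74E2D1A6CB9538" "41E8D62BFC973A50" "FC8249175B3EA06D",
    "F18E6B34972DC05A" "3D47F28EC01A69B5" "0E7BA4D158C6932F" "D8A13F42B67C05E9",
    "A09E63F51DC7B428" "D709346A285ECBF1" "D6498F30B12C5AE7" "1AD069874FE3B52C",
    "7DE3069A1285BC4F" "D8B56F03472C1AE9" "A690CB7DF13E5284" "3F06A1D8945BC72E",
    "2C417AB6853FD0E9" "EB2C47D150FA3986" "421BAD78F9C5630E" "B8C71E2D6F09A453",
    "C1AF92680D34E75B" "AF427C9561DE0B38" "9EF528C3704A1DB6" "432C95FABE17608D",
    "4B2EF08D3C975A61" "D0B7491AE35C2F86" "14BDC37EAF680592" "6BD814A7950FE23C",
    "D2846FB1A93E50C7" "1FD8A374C56B0E92" "7B419CE206ADF358" "21E74A8DFC90356B",
)]


def permute(value: int, table: list, width: int) -> int:
    """Output bit k is input bit table[k] (bits numbered from 1 at the MSB)."""
    out = 0
    for pos in table:
        out = (out << 1) | ((value >> (width - pos)) & 1)
    return out


def subkeys(key: bytes) -> list:
    """Derive the sixteen 48-bit round keys from an 8-byte key (parity bits ignored)."""
    cd = permute(int.from_bytes(key, "big"), PC1, 64)
    c, d = cd >> 28, cd & 0xFFFFFFF
    keys = []
    for s in SHIFTS:  # rotate both 28-bit halves left, then compress with PC2
        c = ((c << s) | (c >> (28 - s))) & 0xFFFFFFF
        d = ((d << s) | (d >> (28 - s))) & 0xFFFFFFF
        keys.append(permute((c << 28) | d, PC2, 56))
    return keys


def feistel(right: int, k: int) -> int:
    """Round function: expand, XOR the round key, S-box substitute, permute."""
    x = permute(right, E, 32) ^ k
    out = 0
    for i in range(8):
        six = (x >> (42 - 6 * i)) & 0x3F
        row = ((six >> 4) & 2) | (six & 1)  # outer two bits select the row
        col = (six >> 1) & 0xF              # middle four bits select the column
        out = (out << 4) | SBOX[i][row * 16 + col]
    return permute(out, P, 32)


def des_block(block: bytes, keys: list) -> bytes:
    """Encrypt one 8-byte block; pass the round keys reversed to decrypt."""
    x = permute(int.from_bytes(block, "big"), IP, 64)
    left, right = x >> 32, x & 0xFFFFFFFF
    for k in keys:
        left, right = right, left ^ feistel(right, k)
    return permute((right << 32) | left, FP, 64).to_bytes(8, "big")


def encrypt(data: bytes, key: bytes) -> bytes:
    """ECB mode with PKCS#5 padding (assumed; the paper does not specify)."""
    pad = 8 - len(data) % 8
    data += bytes([pad]) * pad
    ks = subkeys(key)
    return b"".join(des_block(data[i:i + 8], ks) for i in range(0, len(data), 8))


def decrypt(data: bytes, key: bytes) -> bytes:
    """Same key as encryption, round keys in reverse order, then strip padding."""
    ks = subkeys(key)[::-1]
    plain = b"".join(des_block(data[i:i + 8], ks) for i in range(0, len(data), 8))
    return plain[:-plain[-1]]
```

With the well-known test key `133457799BBCDFF1`, the single block `0123456789ABCDEF` encrypts to `85E813540F0AB405`, and running `des_block` on that ciphertext with the reversed round keys recovers the plaintext.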

Figure 2: Security system
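
The framework step in which the loaded healthcare file is split and distributed across the Hadoop cluster can be pictured with the simplified sketch below. The 128 MB block size and the replication factor of 3 are the HDFS defaults (`dfs.blocksize` and `dfs.replication`) in Hadoop 2.x and later; the round-robin placement, the node names and the function names `split_into_blocks` and `place_blocks` are hypothetical simplifications, since real HDFS block placement is decided by the NameNode and is rack-aware.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size (dfs.blocksize), Hadoop 2.x+
REPLICATION = 3                  # HDFS default replication factor (dfs.replication)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut the file into fixed-size blocks; only the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(n_blocks: int, nodes: List[str],
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Give every block `replication` distinct nodes, round-robin.

    This is a hypothetical simplification, not the NameNode's real policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in range(n_blocks)}
```

Because every block is stored on several nodes, a map task can be scheduled on any node that holds a replica, which is what lets all machines in the cluster process the data in parallel.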

5 RESULTS

The figures below show the results of the proposed system.

Figure 3: Healthcare data

As Figure 3 shows, when the Load button is clicked, the input file is loaded from the local file system into HDFS, and the healthcare data is displayed when the Display button is clicked. Clicking the Back button returns to the main frame shown in Figure 2, and clicking the Encrypt button then encrypts the healthcare data stored in HDFS. Figure 4 below shows the encrypted healthcare data.

Figure 4: Encrypted Healthcare data

The patients' information is now secure: even if the data is stolen by an unauthorized user, he or she cannot read it without the key. Whenever the original data needs to be displayed, the administrator can decrypt the healthcare data and display it. Figure 5 below shows the decrypted healthcare data.

Figure 5: Decrypted Healthcare data
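
The paper states that DES and MD5 are used together but does not say how MD5 participates. One common pattern, assumed here rather than taken from the paper, is to derive the 8-byte DES key from the administrator's password with MD5 and to record an MD5 digest of the input file at load time, so that the decrypted output (as in Figure 5) can be checked against the original data (Figure 3). The function names `derive_des_key`, `fingerprint` and `matches_original` are hypothetical. Because MD5 is no longer collision-resistant, such a digest detects accidental corruption but is not proof against deliberate tampering.

```python
import hashlib


def derive_des_key(password: str) -> bytes:
    """Hypothetical scheme: use the first 8 bytes of MD5(password) as the DES key."""
    # MD5 yields a 16-byte digest; DES takes an 8-byte key (parity bits ignored).
    return hashlib.md5(password.encode("utf-8")).digest()[:8]


def fingerprint(data: bytes) -> str:
    """Hex MD5 digest of a file's contents, recorded when the file is loaded."""
    return hashlib.md5(data).hexdigest()


def matches_original(decrypted: bytes, digest_at_load: str) -> bool:
    """True if the decrypted bytes have the same MD5 digest as the original file."""
    return fingerprint(decrypted) == digest_at_load
```

In this unsalted form identical passwords always yield identical keys; new code would normally use a salted, iterated key-derivation function such as PBKDF2 (`hashlib.pbkdf2_hmac`) instead.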

6 CONCLUSION

Big data now plays an important role in healthcare because healthcare data is generated in many formats, for example text, video, images and other digital records. Healthcare data has become too large to be managed with traditional database management tools; hence many hospitals now use big data analytics to store and analyze large amounts of healthcare data effectively. In modern life almost everyone uses a smartphone, and doctors, too, view their patients' health conditions on smartphones, so confidential information may leak to the internet, where it can attract the attention of criminals. Hospitals therefore need to maintain patient records in such a way that no one except an authorized user can access the data. To achieve this, every hospital must secure patient information by keeping the data in ciphertext form; whenever the healthcare data needs to be shown to an authorized person, the encrypted data can be decrypted.

References

[1] Mu-Hsing Kuo, Dillon Chrimes, Belaid Moa, Wei Hu, Design and Construction of a Big Data Analytics Framework for Health Applications, IEEE International Conference on Smart City/SocialCom/SustainCom together with DataCom, pp. 631-636, 2015.

[2] Guneet Kukreja, Shalini Batra, Analogize Process Mining Techniques in Healthcare, IEEE International Congress on Signal Processing, Computing and Control, pp. 482-487, 2017.

[3] Shankar Krishnan, PhD, Application of Analytics to Big Data in Healthcare, 32nd Southern Biomedical Engineering Conference, pp. 156-157, 2016.

[4] Hiba Asri, Hajar Mousannif, Hassan Al Moatassime, Thomas Noel, Big Data in Healthcare: Challenges and Opportunities, IEEE (ISBN 978-1-4673-8149-9), 2015.

[5] Asif Adil, Hushmat Amin Kar, Rajendra Jangir, Shabir Ahmad Sofi, Analysis of Multi-diseases using Big Data for Improvement in Healthcare, IEEE UP Section Conference on Electrical, Computer and Electronics (UPCON), 2015.

[6] Jingquan Li, Xueying Li, Privacy Preserving Data Analysis in Mental

[7] Youngho Song, Young-Sung Shin, Miyoung Jang, Jae-Woo Chang, Design and Implementation of HDFS Data Encryption Scheme using ARIA Algorithm on Hadoop, IEEE (ISBN 978-1-5090-3015-6), pp. 84-90, 2017.

[8] Weili Kou, Xuejing Yang, Changxian Liang, Changbo Xie, Shu Gan, HDFS Enabled Storage and Management of Remote Sensing Data, 2016 2nd IEEE International Conference on Computer and Communications, pp. 80-84, 2016.

[9] Wenzhi Liu, Qi Li, Yunpeng Cai, Ye Li, Xiaoyan Li, A Prototype of Healthcare Big Data Processing System Based on Spark, 2015 8th International Conference on BioMedical Engineering and Informatics, pp. 516-520, 2015.

[10] Rui Cao, Jing Gao, Research on Reliability Evaluation of Big Data System, 2018 the 3rd IEEE International Conference on Cloud Computing and Big Data Analysis, pp. 261-265, 2018.

[11] Junyan Tan, Tianyu Xiong, Hongxia Miao, Rurong Sun, Min Wu, A Case Study of Medical Big Data Processing: Data Mining for the Hyperuricemia, 2018 the 3rd IEEE International Conference on Cloud Computing and Big Data Analysis, pp. 196-201, 2018.
