Recognizing The Electronic Medical Record Data
From Unstructured Medical Data Using Visual
Text Mining Techniques
Abstract: Computer systems and communication technologies have established a strong and influential presence in the different fields of medicine. The cornerstone of a functional medical information system is the Electronic Health Record (EHR) management system. EHR implementation and adoption face different barriers that slow down deployment in different organizations. This research focuses on resolving the most common barriers, which are data entry, unstructured clinical data, and modification of the physician workflow. This research proposes a solution that uses text mining and natural language processing techniques. The solution was tested and verified in four real-world clinical organizations. The suggested solution proved correct and precise, with a precision of 91.88%.
Keywords: Electronic Health Record, Text Mining, Unstructured Medical Data, Medical Data Entry, Health Information Technology.
I. INTRODUCTION
The paper-based medical record is woefully inadequate for meeting the needs of modern medicine. It arose in the 19th century as a highly personalized "lab notebook" that clinicians could use to record their observations and plans so that they could be reminded of pertinent details when they next saw that same patient. There were no bureaucratic requirements, no assumptions that the record would be used to support communication among varied providers of care, and remarkably few data or test results to fill up the record's pages. The record that met the needs of clinicians a century ago has struggled mightily to adjust over the decades so as to accommodate new requirements as health care and medicine have changed, which led to the emergence of Health Information Technology (HIT) [1].
HIT allows comprehensive management of medical knowledge and its secure exchange among health care consumers and providers. Broad use of HIT will:
1. Help to eliminate the manual tasks of extracting data from charts or filling out specialized datasheets.
2. Help to derive data directly from the electronic record, making research-data collection a by-product of routine clinical record keeping.
3. Help to move from a paper-based health care system to secure electronic medical records, which will save lives and reduce health care costs.
4. Help in the early detection of infectious disease through advanced data collection, fusion and processing techniques, which would be at the forefront in spotting the emergence of new diseases and crucial to tracking the spread of known diseases [2].
II. ELECTRONIC HEALTH RECORD: DEFINITION AND MODELS
The EHR is defined as a longitudinal electronic record of patients' health information generated by one or more encounters in any care delivery setting. This information includes, but is not limited to, patient demographics, progress notes, examination details such as symptoms and findings, medications, vital signs, past medical history, immunizations, laboratory data, and radiology reports. The EHR automates and streamlines the clinician's workflow. The EHR can generate a complete record of a clinical patient encounter and support other care-related activities, directly or indirectly, via interfaces, including evidence-based decision support, quality management, and outcomes reporting. The EHR is thus a repository of patient data in digital form, stored and exchanged securely and accessible by multiple authorized users. [2][3][4]
Many EHR architectural models are in use around the world. The two most popular EHR models are:
1. Central Repository Model
The center of this EHR model is the repository, which is fed by the existing applications in different care locations such as hospitals, clinics, and family physician practices. The feed from these applications is message-based, following pre-agreed standards. The messaging needs to be based on well-defined standards, for example the HL7 Reference Information Model (RIM), for which XML could be used as the recommended Implementation Technology Specification (ITS). [5]
Prof. Hussain Bushinak
Faculty of Medicine
Ain Shams University
Cairo, Egypt
Dr. Sayed AbdelGaber
Faculty of Computers and Information
Helwan University
Cairo, Egypt
Mr. Fahad Kamal AlSharif
College of Computer Science
Modern Academy
Cairo, Egypt
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 6, June 2011
http://sites.google.com/site/ijcsis/ ISSN 1947-5500
Figure 1. EHR Central Repository Model
The event-driven messages that need to be sent and stored in the repository will essentially be event-based summaries, as shown in figure (2). The event-based summaries stored in the repository can be queried and retrieved by different clinicians who are treating the patients in different scenarios and in different clinical settings. The retrieval and access of data from the repository is subject to establishing that the clinicians legitimately access the data for treating only those patients who are in their care. The retrieval is done through messaging, using either synchronous or asynchronous messages depending on the urgency, complexity, and importance of the data being retrieved. [5]
Figure 2. EHR Message Events
2. Managed Services Model
The managed services model is based on hosting applications for different care providers and care settings in a data center run by a consortium, which may consist of a group of infrastructure providers, system integrators, and application providers. The hosted applications can be used to provide an effective EHR either by building a common repository using a shared database, or by providing a common user interface to all hosted applications and extracting data from these systems using a portal whose authentication and authorization mechanism can also be controlled at the data center level, as shown in figure 3. [5]
Figure 3. Shared Services Model
III. BARRIERS TO ELECTRONIC HEALTH RECORD IMPLEMENTATION
EHR implementation faces different barriers, and these barriers vary from one environment to another. Hereafter, the main focus is on the general barriers that exist in most EHR implementation attempts. These barriers are:
1. Financial Barriers
Financial barriers are divided into the following points:
High costs: These costs are divided into two main parts, initial cost and ongoing cost. [6]
Under-developed business case: This barrier arises from the uncertainty of EHR return on investment, the fact that financial benefits are achieved only in the long run, and the fact that the main objective and benefit of EHR is to provide a high-quality medical service for citizens. [6]
2. Technological Barriers
Technological barriers are divided into four points: [7]
Inadequate technical support
Inadequate data exchange
Security and privacy
Lack of standards
3. Physicians' Attitudinal and Behavioral Barriers in Data Entry:
Many health information system projects fail due to attitudes, behaviors, barriers in data entry and a lack of systematic consideration of human-centered computing issues such as usability, workflow, organizational change, and process reengineering. Two major factors lead to sluggish performance of an EHR system: the complexity of the Graphical User Interface (GUI) and the system response time. These force clinicians to see fewer patients and have longer workdays, largely because of the extra time needed to use the system. [8]
In 2004, Lisa Pizziferri and others concluded that the benefits of using an EHR system can be achieved and accepted by physicians only if the physicians do not need to sacrifice their time with patients or other activities during clinic sessions. Physicians recognize the quality improvements achieved by EHRs, but their time should be saved by decreasing the time required for data entry in EHR systems. [9]
4. Organizational Change Barriers
This category contains the following points:
Design of and alignment with workflow and office integration:
According to the American Academy of Family Physicians survey results (American Academy of Family Physicians 2004), 54.2 percent of the 5000 respondents reported that they were worried about slower workflow and lower productivity. [10]
Migration from paper-based systems:
Staff training:
5. The Format of Clinical Data Stored in EHR Systems
Generally speaking, there are two main forms of stored data: structured data and unstructured data.
Structured data: Structured data is data that has a relational data model and enforces composition from atomic data types. Structured data is managed by technology that allows for querying and reporting against predetermined data types and understood relationships, such as patient demographics, laboratory tests, etc. [11]
Unstructured data: Unstructured data consists of any data stored in an unstructured format at an atomic level. That is, in unstructured content there is no conceptual definition and no data type definition; in textual documents, a word is simply a word. [11]
Unstructured data consists of two basic categories:
Bitmap objects: Inherently non-language based, such as X-rays, radiology, video or audio files.
Textual objects: Based on a written or printed language, such as clinical reports, nursing notes and examination sheets. [11]
Using unstructured data for storing clinical data has the following limitations:
The data is not consumable at a semantic level without a compatible interface or application.
No technology can gain insight into the context of the information unless it can actually read it.
6. Barriers to Using Unstructured Data in the Electronic Health Record:
Aggregation of information across all the records in a large repository could bring benefits for clinical research. When physicians work with structured data, they can receive alerts about drugs that interact badly with each other, which enables them to enhance the treatment process and avoid medication errors; this cannot be done with unstructured data [12].
IV. SURVEYING THE SOLUTIONS TO EHR DATA ENTRY BARRIERS
In October 2010, Ergin Soysal, Ilyas Cicekli, and Nazife Baykal designed and developed an ontology-based information extraction system for radiological reports. [15]
The main goal of this technique is to extract the information available in free-text Turkish radiology reports and convert it into a structured information model using manually created extraction rules and a domain ontology. This technique extracts data from radiological reports, which are free text written by physicians, and inserts it as structured data into the EHR. [13]
However, this technique has the following drawbacks:
It concentrates mainly on abdominal radiology reports.
It does not use a large, trusted repository of medical expressions, which may reduce the quality of the information extraction process. Consequently, wrong clinical information may be recorded.
In September 2010, Adam Wright, Elizabeth S. Chen, and Francine L. Maloney developed a technique for identifying associations between medications, laboratory results and problems. They built a knowledge base of medication-problem and laboratory-result-problem associations in an automated fashion. It was based on two data mining techniques: frequent item set mining and association rule mining. This technique was able to identify a large number of clinically accurate associations. A high proportion of high-scoring associations were adjudged clinically accurate when evaluated against the gold standard (89.2% for medications with the best-performing statistic, chi square, and 55.6% for laboratory results using interest) [14]. However, this technique has the following drawbacks:
The researchers assumed that patients' data was structured.
Building the knowledge base concentrated only on patients' problems, medications and laboratory results, which means that other data, such as the patient's history, diagnosis, and procedures, are not taken into account.
Data entry is done through a traditional GUI, so this solution did not enhance the physician workflow.
In September 2010, a system for handling misspellings in drug information system queries was developed by Christian Senger, Jens Kaltschmidt, Simon P.W. Schmitt, Markus G. Pruszydlo and Walter E. Haefeli. This system attempted to solve the problem of drug data entry in a Drug Information System (DIS). The researchers evaluated correctly spelled and misspelled drug names from all queries of the University Hospital of Heidelberg. The results indicated that DIS search engines should be equipped with error-tolerant search capabilities. Auto-completion lists might expedite searches but might fail regularly due to the high frequency of typographic errors already in the initial letters. The system improved DIS data entry by using spelling-correction tools to make the drug information understandable and available, but it concentrated only on the DIS and did not cover examination, history, and procedure data [16].
In August 2010, Yong-gang Cao, James J. Cimino, John Ely and Hong Yu developed a technique for the automated identification of diseases and diagnoses in clinical records. This technique presents an approach for prototyping a diagnosis classifier based on a popular computational linguistics platform [18]. This technique has the following limitations:
It focuses only on extracting disease keywords and ignores other important parts such as operations, symptoms, findings, etc.
It does not use spelling correction.
There is no clear structured data model to store the data extracted from the clinical report.
It does not use a large, trusted data source for medical expressions such as the Unified Medical Language System (UMLS).
In July 2010, another technique, for automatically extracting the information needed from complex clinical questions, was developed by Yong-gang Cao, James J. Cimino, John Ely and Hong Yu. They built a fully automated system, AskHERMES (Help clinicians Extract and aRticulate Multimedia information from literature to answer their ad hoc clinical quEstionS). This system automatically retrieves, extracts, and integrates information from the literature and other information resources and attempts to formulate this information as answers to ad hoc medical questions posed by clinicians, all within a time frame that meets their demands [17]. This technique succeeds in clinical question answering and in identifying the category of the question, but with respect to EHR system adoption it has the following limitations:
This technique extracts the clinical information to identify the question category, not to store this information in the EHR repository.
It works only on question answering, not on the data entry process.
It does not enhance the physician workflow during the examination process.
Although the previous techniques attempted to solve the EHR data entry barrier, they have the following limitations:
These techniques concentrate on specific parts of the data, such as diseases, and leave out the rest.
The medical expression repository used does not contain all the expressions or the semantic relations between them.
Some of these techniques store the EHR data as free text (unstructured data form).
The physician workflow is modified, which in turn leads to more physical and mental effort and reduces the physician's productivity.
V. BRIDGING THE UNSTRUCTURED DATA TO STRUCTURED EHR
The suggested idea is to convert the unstructured
free text clinical data to structured EHR data without
modifying the workflow of physicians or adding any
additional physical or mental effort to them. Figure (4)
shows the algorithm of the suggested technique.
Figure 4 Objective Technique Steps
Step 1: Optical Character Recognition (OCR)
The physician writes the diagnosis as usual on a pen pad, paper, or tablet PC. If the clinical report is written on paper, it must be scanned. The clinical report data is stored as an image of freehand text that can be processed. This image is passed through an OCR tool to convert it to machine-encoded text. The details of this step are shown in figure (5).
Step 2: Spelling Corrector
Machine-encoded text may include spelling errors, which may yield wrong information during the extraction process. Therefore, all misspelled words are corrected before moving to the next step. This step requires a medical dictionary that contains most medical expressions in their different forms, such as verbs, adjectives, nouns, etc. Figure (6) shows the details of this step.
Figure 6 Spell Check input and output
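As an illustration of the dictionary lookup this step relies on, the following minimal Python sketch corrects each OCR token against a word list using the standard library's `difflib.get_close_matches`. The `MEDICAL_TERMS` list, the helper names and the 0.8 similarity cutoff are illustrative assumptions; the paper does not specify the correction algorithm or the dictionary it used, and its experiment confirms corrections interactively with the physician rather than automatically.

```python
import difflib

# Hypothetical stand-in for the medical dictionary; a real one would be far larger.
MEDICAL_TERMS = ["enuresis", "nocturnal", "prostate", "bladder", "abdomen", "tablet"]

def correct_token(token, vocabulary=MEDICAL_TERMS, cutoff=0.8):
    """Return the closest dictionary word, or the token unchanged if none is close."""
    lowered = token.lower()
    if lowered in vocabulary:
        return lowered
    # get_close_matches ranks candidates by similarity ratio and drops those below cutoff.
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token

def correct_text(text):
    """Correct every whitespace-separated token of an OCR result."""
    return " ".join(correct_token(tok) for tok in text.split())
```

A token with no sufficiently close dictionary entry is left untouched, which is the point where an interactive system could ask the physician to confirm.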
Step 3: Text Mining with Natural Language Processing Techniques
In this step, the resulting data is cleaned and partitioned into statements to be classified and coded. Using text mining and NLP, all medical data is classified and coded in the form of multiple statements, and unwanted words are removed. This step consists of: [19]
Text preprocessing,
Part of speech tagging,
Statements segmentation,
Noun phrase extraction.
Each of these components is described below.
1. Text preprocessing: Also called tokenization or text normalization; it includes the following steps: [19]
Throw away unwanted stuff (e.g.,
unwanted brackets and tags).
Word boundaries: white space and
punctuations.
Stemming (Lemmatization): This is
optional. English words like ‘look’ can be
inflected with morphological suffixes to
produce ‘looks, looking, looked’. They
share the same stem ‘look’. Often (but not
always) it is beneficial to map all inflected
forms into the stem. This is a complex
process since there can be many
exceptional cases (e.g., department vs.
depart, be vs. were). The most commonly
used stemmer is the Porter Stemmer.
However, there are many others.
Stop word removal: the most frequent
words often do not carry much
meaning.
Capitalization, case folding: often it is convenient to lower case every
character.
2. Part of speech tagging: A Part-Of-Speech Tagger
(POS Tagger) is a piece of software that reads text
in some language and assigns parts of speech to
each word (and other token), such as nouns, verbs,
adjectives, etc. [19]
3. Statements segmentation: The output of this part
divides the clinical text into several statements.
[19]
Figure 5 OCR and Handwriting input and output
4. Noun phrase extraction: In this part, all noun
phrases are extracted and the complex noun
phrase is decomposed into smaller noun phrases.
Figure 7 Text mining and NLP tasks
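The preprocessing and statement segmentation tasks above can be sketched with the Python standard library alone. This is a simplified illustration, not the paper's implementation: the `STOP_WORDS` set, the punctuation-based splitting rule and the function names are assumptions, stemming is omitted because it is described as optional, and part-of-speech tagging and noun phrase extraction are left out because they need a trained tagger or parser.

```python
import re

# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "is", "was", "there", "with", "from", "since"}

def split_statements(text):
    """Split clinical text into statements at sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def preprocess(statement):
    """Drop tags and brackets, lower-case, tokenize, and remove stop words."""
    cleaned = re.sub(r"<[^>]*>|[\[\](){}]", " ", statement)
    # Tokens are runs of letters/digits, optionally joined by hyphens (e.g. "x-ray").
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", cleaned.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("Plain X-ray of the abdomen was free.")` keeps only the content-bearing tokens `plain`, `x-ray`, `abdomen` and `free`.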
Step 4: Unified Medical Language System (UMLS)
Coding
To identify the clinical information, a large repository of clinical expressions is needed from which matching expressions can be extracted. UMLS is used for this purpose. The UMLS, created in 1986, is a compendium of many controlled vocabularies in the biomedical sciences. It provides a mapping structure among these vocabularies and allows translation among the various terminology systems. It may be viewed as a comprehensive thesaurus and ontology of biomedical concepts. [20]
UMLS consists of the following components: [20]
Metathesaurus, the core database of the UMLS, a collection of concepts and terms from the various controlled vocabularies and their relationships.
Semantic Network, a set of categories and relationships that are used to classify and relate the entries in the Metathesaurus.
Specialist Lexicon, a database of lexicographic information to be used in natural language processing.
A number of supporting software tools.
Morphologically analyzed words are compared to the UMLS entries to find the best-matching expression according to its morphological position. Each noun phrase that matches a clinical expression entry in the UMLS is stored as a pair containing the noun phrase and its UMLS clinical code.
Figure 8 UMLS expressions coding
The pseudo code of the UMLS coding algorithm can be:
For each Statement S in Statements // in physician sheet
Begin
    For each noun-phrase N in S
    Begin
        If N exists in UMLS then
            Extract N and C // where C is the UMLS code of N
            Put N with C as pair <N, C>
        End if
    End
End
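The pseudo code above can be written as a short runnable Python sketch. The `UMLS_LOOKUP` table is a hypothetical in-memory stand-in for the UMLS Metathesaurus, and its codes are placeholders rather than real UMLS identifiers; an actual system would query the UMLS database instead.

```python
# Hypothetical lookup table standing in for the UMLS Metathesaurus;
# the codes below are placeholders, not real UMLS identifiers.
UMLS_LOOKUP = {
    "enlarged prostate": "CODE-0001",
    "nocturnal enuresis": "CODE-0002",
}

def code_statements(statements):
    """Map each statement's noun phrases to (phrase, code) pairs.

    `statements` is a list of statements, each given as a list of noun phrases.
    Phrases without a lookup entry are skipped, as in the pseudo code.
    """
    coded = []
    for noun_phrases in statements:
        pairs = []
        for phrase in noun_phrases:
            code = UMLS_LOOKUP.get(phrase.lower())
            if code is not None:
                pairs.append((phrase, code))
        coded.append(pairs)
    return coded
```

Returning one list of pairs per statement keeps the statement boundaries, which the later classification step needs.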
Step 5: Classify EHR Components
The suggested technique is applied to the physician's examination sheet. The examination sheet contains the following classes:
History
Examination
Diagnosis
Procedure
Each part is treated as a class, and all coded clinical data produced by the previous steps are classified into one of these classes.
The first step in the classification process is building a collective set of features, typically called a dictionary. The UMLS clinical expressions in dictionary form are the basis for creating a spreadsheet of numeric data corresponding to the previously defined classes.
TABLE (1): CLASSES DICTIONARY
Each row defines a class and each column represents a UMLS code. Each cell in the spreadsheet is a measurement of the feature corresponding to its column for the class corresponding to its row. The dictionary of words covers all the possibilities, and its size corresponds to the number of columns. All cell values range between zero and one, depending on whether the words were encountered in the class or not. The form of the classes dictionary is shown in table (1).
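One possible way to build such a class-by-code spreadsheet is sketched below. It assumes binary cell values (1 if a code was seen in the class, 0 otherwise), which is one reading of the description above; the function name and the `training_codes` input format are illustrative assumptions.

```python
def build_class_dictionary(training_codes):
    """Build a binary class-by-code matrix.

    `training_codes` maps each class name to the UMLS codes observed in it.
    Returns the sorted column labels and one row of 0/1 values per class.
    """
    columns = sorted({code for codes in training_codes.values() for code in codes})
    matrix = {}
    for name, codes in training_codes.items():
        seen = set(codes)
        matrix[name] = [1 if col in seen else 0 for col in columns]
    return columns, matrix
```

Each row can then be treated as the feature vector of its class when similarities are computed.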
The second step is measuring the similarity between the extracted expressions and the defined classes, then classifying each expression into the most similar class. The cosine algorithm was selected to calculate the similarity between the extracted clinical phrases and the predefined classes. The steps of the cosine similarity algorithm are:
Compute the similarity of the new clinical phrase to all classes in the dictionary.
Select the class that is most similar to the new clinical phrase.
The class which occurs most frequently is the similar one.
For cosine similarity, only positive words shared by
the compared phrases are considered. Frequency of
word occurrence is also valued. The clinical phrase is
compared with each class by the following equation:
[21]
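$$\mathrm{Norm}(P) = \sqrt{\sum_{j} W(j)^{2}}$$
where $W(j)$ is the weight of word $j$ in the phrase or class $P$, and
$$\mathrm{Cosine}(P_1, P_2) = \frac{\sum_{j} w_{P_1}(j)\, w_{P_2}(j)}{\mathrm{Norm}(P_1)\,\mathrm{Norm}(P_2)}$$
where $w_{P_i}(j)$ is the weight of word $j$ in the phrase or class $P_i$.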
The cosine similarity of two classes ranges from 0 to 1, because the angle between two term-frequency vectors cannot be greater than 90°. Consequently, a cosine value close to 1 means that the clinical phrase is more similar to the compared class.
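The cosine comparison described above can be sketched in a few lines of Python. Vectors are represented as dictionaries from a term (or UMLS code) to its weight; this sparse representation, the function names and the zero-vector handling are choices made for the illustration, not details given in the paper.

```python
import math

def norm(weights):
    """Euclidean norm of a sparse weight vector given as {term: weight}."""
    return math.sqrt(sum(w * w for w in weights.values()))

def cosine(p1, p2):
    """Cosine similarity between two sparse weight vectors."""
    denom = norm(p1) * norm(p2)
    if denom == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    shared = set(p1) & set(p2)  # only shared terms contribute to the dot product
    return sum(p1[t] * p2[t] for t in shared) / denom

def classify(phrase_vector, class_dictionary):
    """Return (best_class, score) for the class most similar to the phrase."""
    scores = {name: cosine(phrase_vector, vec) for name, vec in class_dictionary.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

With non-negative weights every score falls between 0 and 1, matching the range stated above.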
Step 6: Storing Data in the EHR Repository
The classified clinical phrase is stored in its class inside the EHR database with its matched UMLS code. For example, suppose a physician wrote the following:
There is enlarged prostate with tender base of the bladder.
This statement contains two findings. The statement is compared with each class: the cosine scores of the statement against each defined class are calculated according to the previous equations, and the winning class is the one with the highest score. The data are stored in the winning class with their UMLS codes as pairs inside the EHR repository:
<enlarged prostate, Finding>
<tender base of the bladder, Finding>
The EHR is thus put in a structured form suitable for analysis and data mining operations, or as a resource for a decision support system.
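To make the idea of storing pairs under a winning class concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `ehr_entry`, its columns and both function names are hypothetical; the paper's experiment uses SQL Server and does not describe its schema.

```python
import sqlite3

def open_repository():
    """Create an in-memory table holding one row per classified clinical phrase."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ehr_entry ("
        " patient_id TEXT NOT NULL,"
        " ehr_class TEXT NOT NULL,"
        " phrase TEXT NOT NULL,"
        " umls_code TEXT NOT NULL)"
    )
    return conn

def store_pairs(conn, patient_id, ehr_class, pairs):
    """Insert (phrase, code) pairs under the winning class for one patient."""
    conn.executemany(
        "INSERT INTO ehr_entry VALUES (?, ?, ?, ?)",
        [(patient_id, ehr_class, phrase, code) for phrase, code in pairs],
    )
    conn.commit()
```

Once the phrases are rows rather than free text, queries such as "all findings recorded under Examination for a given patient" become ordinary SQL selections.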
VI. THE EXPERIMENTAL STUDY
The aim of the experiment is to demonstrate the success of the suggested technique in real-world cases. Every experiment rests on some hypotheses; the hypotheses of this experiment are:
The physician has little experience in using computers.
The physician's handwriting is readable.
The medical abbreviations used are standard.
The experiment is applied during the examination session.
The equipment required to implement the experiment is:
An electronic pen pad.
A laptop or personal computer.
Windows Vista or later
SQL Server 2008
Microsoft Office 2007 or later (for applying OCR on the pen pad)
.NET Framework 4
UMLS database system
Medical dictionary (for spelling correction)
The implementation of the experimental study goes through the following steps:
Step 1: At the nurse's office, the patient's demographic data are recorded using the EHR demographics form (Figure 10).
Figure 9: Computing similarity scores for New Clinical Phrase
Step 2: The physician uses the pen pad to write the diagnosis.
The physician is free to erase, add or modify any part of the diagnosis. This step lets the physician work as usual without any additional effort. The data is recorded directly on the computer, which helps the physician retrieve it easily, either in its original form or as structured data.
Step 3: After finishing the handwriting, the physician presses the OCR button to convert the diagnosis from image form to machine-encoded text, as shown in Figure 12.
Step 4: After OCR is done, the system checks and corrects the spelling errors in the examination data according to the installed medical dictionary, through an interactive session with the physician.
Step 5: After spelling correction is done, the physician presses the “insert into EHR” button to convert the diagnosis data from unstructured to structured form. Conversion is done through the following steps:
Text preprocessing: All brackets, unwanted stuff, and word boundaries are removed.
Figure 10: EHR demographics form
Figure 11: Pen pad to Computer Form
Figure 12: Applying OCR on the diagnosis sheet
Figure 13: Applying spell check on the examination text
Parts of speech tagging: Parts of speech are assigned to each word.
Statements segmentation: The examination text is split into multiple statements.
Phrase tagging: Each phrase is tagged with the suitable code to identify all phrases contained in the diagnosis sheet.
The output of this step is the examination words with their parts of speech, in the following format:
(TOP (S (NP (DT A) (ADJP (NP (CD 15) (NNS years)) (JJ old)) (JJ female) (NN patient)) (VP (VBZ complains) (PP (IN from) (NP (JJ nocturnal) (NN enuresis))) (PP (IN since) (NP (NN birth)))) (. . .)))
(TOP (S (NP (NP (JJ Plain) (NN X-ray)) (PP (IN of) (NP (DT the) (NN abdomen)))) (VP (VBD was) (ADJP (JJ free))) (. .)))
(TOP (S (NP (JJ Abdominal) (NN ultra) (NN sonography)) (VP (VBD was) (ADJP (JJ free))) (. .)))
(TOP (S (NP (PRP he)) (VP (VBZ has) (NP (NP (NNP Enuresis)) (SBAR (S (NP (DT The) (NN patient)) (VP (MD should) (VP (VB receive))))) (: :) (NP (NP (NNP R1) (NNP Uipam) (NN tablet)) (NP (NP (CD one) (NN tablet)) (NP (RB twice) (RB daily)) (PP (IN for) (NP (CD three) (NNS months))))))) (. .)))
(TOP (S (PP (IN R2) (NP (NNP Dipripam) (CD 20) (NN mg) (NN capsule))) (NP (NP (CD one) (NN tablet)) (NP (RB twice) (RB daily)) (PP (IN for) (NP (CD three) (NNS months)))) (. .)))
(TOP (S (NP (DT R3) (NNP Depavit) (NNP B12) (NN ampule)) (. .)))
Noun Phrase Extraction:
All noun phrases are extracted, and compound noun phrases are divided into smaller noun phrases, such as the following:
o A 15 years old female patient
o 15 years
o Nocturnal enuresis since birth
o Birth
o Plain X-ray of the abdomen
o Plain X-ray
o The abdomen
o Abdominal ultra sonography
o Enuresis
o The patient
o R1 Uipam tablet
o One tablet twice daily for three months
o One tablet
o Twice daily
o Three months
o Dipripam 20 mg capsule
o One tablet twice daily for three months
o One tablet
o Twice daily
o Three months
o R3 Depavit B12 ampule
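The decomposition of a bracketed parse tree into nested noun phrases can be sketched as follows. This small parser is an illustrative assumption about how the NP subtrees could be read out of the tagged output shown earlier; it is not the tool used in the experiment, and the function names are hypothetical.

```python
import re

def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # consume ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()

def words(node):
    """Collect the leaf words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]

def noun_phrases(node):
    """Return the text of every NP subtree, outermost first."""
    if isinstance(node, str):
        return []
    found = [" ".join(words(node))] if node[0] == "NP" else []
    for child in node[1]:
        found.extend(noun_phrases(child))
    return found
```

Applied to the tree for "Plain X-ray of the abdomen was free.", `noun_phrases(parse_tree(...))` yields the full phrase followed by its two nested phrases, in the same order as the list above.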
Step 7: All noun phrases are coded with UMLS codes. The output of this step is shown in table (2).
TABLE (2): NOUN PHRASES WITH THEIR UMLS CODES.
Each statement gets a score according to its UMLS codes and the classes dictionary defined in table (1). Table (3) shows the statements and their scores.
TABLE (3): STATEMENTS’ SCORE.
Step 8: According to the scores shown in table (3), the statements are classified into their classes. The predefined classes are:
History
Examination
Diagnosis
Procedure
Figure 14: Output of Text mining technique
The classifier uses the COS similarity algorithm
to classify each statement according to the class
dictionary. Table (4) shows the score of each
statement relative to nearst class.
TABLE (4): COS SIMILARITY SCORES FOR EACH CLASS.
Step 9: After determining the winning class for
each statement, each noun phrase with its UMLS
code saved inside the EHR in the winning class as
a paired tag. Table (5) shows this format.
Step 10: The extracted information is compared
with the physician's manual results to determine the
precision of the suggested technique.
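The comparison in this step can be sketched as a precision calculation. The paper does not give its exact formula, so the definition below (the share of extracted items that also appear in the physician's manual results) is an assumption, and the sample data are hypothetical.

```python
def precision(extracted, manual):
    """Fraction of extracted (class, phrase) pairs the physician also recorded."""
    if not extracted:
        return 0.0
    correct = sum(1 for item in extracted if item in manual)
    return correct / len(extracted)


# Hypothetical data for one diagnosis sheet.
extracted = [("Diagnosis", "Enuresis"), ("Examination", "Plain X-ray"),
             ("History", "Twice daily")]
manual = {("Diagnosis", "Enuresis"), ("Examination", "Plain X-ray"),
          ("Procedure", "Twice daily")}

print(f"{precision(extracted, manual):.2%}")
```

In this made-up sheet, two of the three extracted items match the manual results, so one misclassified phrase lowers the precision to two thirds.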
VII. RESULTS AND DISCUSSION
The experimental study was conducted in four
medical departments. In each department, 10
diagnosis sheets were tested. The tested departments are:
Surgical Oncology
Surgery Urology
Cardiology
General Surgery
Table (6) shows the overall precision
percentage in each of the tested departments.
TABLE (6): RESULTS OF THE EXPERIMENTAL STUDY.
Department            Overall Precision
Surgical Oncology     92.96%
Surgery Urology       91.55%
Cardiology            92.33%
General Surgery       88.61%
Overall precision     91.36%
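The overall figure in Table (6) appears to be the unweighted mean of the four department values, which is a reasonable reading since each department contributed the same number of sheets. The short check below reproduces it from the table; the variable names are illustrative only.

```python
# Department precision percentages as reported in Table (6).
department_precision = {
    "Surgical Oncology": 92.96,
    "Surgery Urology": 91.55,
    "Cardiology": 92.33,
    "General Surgery": 88.61,
}

# Unweighted mean across departments (assumed to be how the overall value was obtained).
overall = sum(department_precision.values()) / len(department_precision)
lowest = min(department_precision, key=department_precision.get)

print(round(overall, 2))
print(lowest)
```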
Some factors affect the results, such as the quality of
the physician's handwriting. The effect of this factor is clear
in the result of experiment four (General Surgery), which has the lowest
precision percentage (88.61%). A high-precision OCR
tool can minimize the effect of this factor, but it may
be expensive. The results indicate that the suggested
technique succeeded with a high percentage in a real-world
experiment, which means that this technique can be
applied in real life in the future.
VIII. CONCLUSION
The suggested technique succeeded in working as a
bridge between unstructured and structured medical
data. The medical data are stored inside the EHR system
in their right position without any additional physical or
mental effort by the physician, which in turn satisfies the
main objective of this research.
TABLE (5): DATA THAT INSERTED INSIDE THE EHR