
Expert Systems with Applications 36 (2009) 12272–12280


Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

Online assessment of content skill levels for medical texts

Rey-Long Liu a,*, Yun-Ling Lu b

a Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan, ROC

b Computer Center, Chung Hua University, Hsinchu, Taiwan, ROC

Keywords: Medical texts; Content skill level assessment; Medical text recommendation; Medical text writing

Abstract

Content skill levels of medical texts are essential for the comprehension (and hence utility) of medical information. A text that is too professional (i.e., of a high skill level) for a reader may be incomprehensible to that reader, and hence of no value. Readers of different professional backgrounds therefore require medical texts of different content skill levels. In this paper, we explore how content skill levels of medical texts may be assessed in an online manner without relying on any domain-dependent knowledge. We find that several assessment strategies have weaknesses, and propose an intelligent online assessment strategy, OCSLA. Empirical evaluation on a medical text corpus from MedlinePlus shows that OCSLA may achieve both better and more fault-tolerant performance. The contributions are of practical significance to online medical text writing and recommendation, which are essential for health education and promotion.

© 2009 Elsevier Ltd. All rights reserved.

0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2009.04.060

* Corresponding author. Tel.: +886 3 8565301x2370; fax: +886 3 8579409. E-mail address: [email protected] (R.-L. Liu).

1. Introduction

For health education and promotion, medical information is often written in text form and disseminated through the Internet, newspapers, brochures, and research papers. Medical texts are helpful only when their content skill levels fit the professional backgrounds of the readers. A professional (i.e., high skill level) text may be incomprehensible to a reader, reducing the value of the medical information in the text. Comprehensibility of medical information is thus routinely identified as a key issue for health education and promotion through the Internet (e.g., Eysenbach & Köhler, 2002; Zeng et al., 2004).

1.1. Problem definition and motivation

In this paper, we explore how content skill levels of medical texts may be assessed in an online and domain-independent manner. Our goal is to help individual readers access those medical texts whose contents they can comprehend. Fig. 1 outlines two applications: medical text recommendation and medical text writing. In medical text recommendation, the assessor helps a reader to identify those texts whose content skill levels are similar to that of the text being read (or designated) by the reader, so that the reader may focus on those medical texts that are comprehensible to him/her. In medical text writing, the assessor helps writers to estimate the content skill levels of the materials being written, so that the materials may be refined to fit the skill levels of the targeted learners, making both teaching and learning more efficient.

1.2. Organization and contribution of the paper

In the next section, we identify the main challenges and discuss the related studies. Accordingly, we present a technique, OCSLA (Online Content Skill Level Assessor), which is efficient in assessing content skill levels of medical texts and may work without relying on any domain-dependent knowledge (Section 3). Empirical evaluation on a medical text corpus from MedlinePlus¹ shows that OCSLA achieves both better and more fault-tolerant performance (Section 4). The contributions are of practical significance to online medical text writing and recommendation, which are essential for health education and promotion.

2. Related studies and main challenges

Content skill level assessment aims at helping individual readers to get those medical texts that are suitable for them to comprehend, so that the individuals may make appropriate health decisions. Comprehensibility of medical information is therefore a main issue, one that was also noted in previous studies on health literacy. Health literacy is a key factor determining whether medical information may be obtained, understood, and used by individuals in making health decisions (Ad Hoc Committee on Health Literacy for the Council on Scientific Affairs of American Medical Association, 1999; Rootman, 2006). When health literacy is considered in the context of electronic environments (e.g., information systems and the Internet), it is referred to as e-health literacy (Norman & Skinner, 2006). These literacy studies focused more on the health literacy levels of the readers than on the content skill levels of medical texts. From a practical standpoint, given a group of readers with a certain health literacy level, content skill level assessment helps to identify comprehensible medical information for the readers.

1 http://medlineplus.gov.

[Figure 1: diagram centered on a Content Skill Level Assessor that serves two applications. Medical text recommendation: inputs are the texts to be recommended and the professional background of the reader (or the content skill level of the text being read); the process is information ranking; the output is texts ranked with suitable content skill levels. Medical text writing: inputs are the texts under revision and the professional backgrounds of targeted readers; the process is text writing & revision; the output is texts written with suitable content skill levels.]

Fig. 1. Usage scenario of online content skill level assessment.

Therefore, several studies explored how existing medical texts caused reading difficulties for healthcare consumers. Medical terminology in the texts (Eysenbach & Köhler, 2002; Zeng et al., 2004) and the readability requirements of the texts (Thomson & Hoffman-Goetz, 2007) were the main sources of difficulty. The former is inevitable in medical texts, while the latter is often measured by sentence lengths and numbers of polysyllabic words in the texts (Thomson & Hoffman-Goetz, 2007); similar approaches are implemented in some software packages, e.g., Microsoft Office Word (Microsoft Corporation, 2008). Such readability measurement cannot be an accurate indicator of user comprehension (Thomson & Hoffman-Goetz, 2007), which heavily depends on how medical terminology occurs in the texts. Therefore, automatic content skill level assessment for medical texts is still an important research problem.

Technically speaking, assessing the content skill level of a text is a task of assigning a score to the text so that multiple texts may be ranked to fit the skill level of a specific reader. Score assignment for a text is a common task in information retrieval (IR) studies as well. However, those studies often focused on other kinds of scores, such as relevancy, novelty (e.g., Gabrilovich, Dumais, & Horvitz, 2004), probability of being accessed (e.g., Page, Brin, Motwani, & Winograd, 1998), and topic coverage (e.g., Zhang et al., 2005) of the text. Content skill level assessment may complement these studies by providing an additional ranking, namely content skill level ranking, so that the reader may access comprehensible information more easily.

Table 1. Main challenges of online content skill level assessment.

Challenge I. Word skill level assessment without domain knowledge
    (1) Words are fundamental components of medical texts, and hence essential for content skill level assessment for the texts
    (2) Words are evolving, and hence domain knowledge (e.g. dictionaries) cannot always cover all words
Challenge II. Dealing with contexts (neighbors) of words in a medical text
    (1) A term may be formed by several neighboring words. Multi-word terms are evolving and may occur in different forms, and hence no domain knowledge is perfectly reliable
    (2) Even a rare word (or concept) may be comprehensible if its neighboring words are comprehensible
Challenge III. Dealing with the effects of word reoccurrence in a medical text
    A word (or concept) with reoccurrences in a text may become comprehensible, even if it is quite professional
Challenge IV. Efficient online content skill level assessment
    To support healthcare information navigation, content skill level assessment should be efficient enough to support online navigation

More specifically, Table 1 summarizes the technical challenges of online content skill level assessment. Since medical texts are composed of words, content skill levels of the texts heavily depend on skill levels of the words. Therefore, to assess a text's content skill level, we need to assess the skill levels of the words in the text. This is a technical challenge, since biomedical terminology is evolving and no medical resources provide skill levels of the medical terms (Challenge I). The evolution of biomedical terminology has motivated much research, such as dynamic terminology recognition (e.g., Nenadic, Spasic, & Ananiadou, 2003) and named entity recognition (NER, e.g., Fukuda, Tsunoda, Tamura, & Takagi, 1998; Zhou, Zhang, Su, Shen, & Tan, 2004). However, these approaches often required domain-specific training corpora and/or knowledge, and even when a biomedical term is recognized, its skill level is still unknown.

Moreover, a word t may have different skill levels in different texts, depending on the context (neighboring) words of t in the texts (Challenge II). A professional word in a text may become easy to comprehend if its context words are naive, since they may help readers to comprehend t using concepts already known. The challenge thus lies in recognizing word contexts and estimating their effects in the word skill level assessment, which was not considered in previous studies.

On the other hand, the skill level of a word t also depends on the number of times t occurs in the text (Challenge III), since introducing t many times in the text helps readers to comprehend t. The challenge thus lies in adjusting word skill levels for reoccurring terms, which was not tackled by previous studies either.


The last challenge of the study lies in the efficiency of the content skill level assessment (Challenge IV). It is motivated by online healthcare information navigation, which has been a common need for both healthcare professionals and consumers. The assessor should be efficient enough to support this online behavior, so no time-consuming process may be conducted.

3. An online content skill level assessor

We develop an online content skill level assessor, OCSLA, whose overview is illustrated in Fig. 2. Given a document d, a word sequence is produced by removing stop words, punctuation, and numbers (Component 1). The content skill level of d is then estimated based on the word sequence. OCSLA first estimates the skill level of each word by examining the uniqueness (Component 2), contexts (Component 3), and reoccurrences (Component 4) of the word.

Based on the word skill levels, the content skill level of d may be assessed (Component 5). The basic idea is that a document d is expected to have a higher content skill level if it contains more words with higher skill levels.

More specifically, Table 2 presents the algorithm of OCSLA, which includes three main components: preliminary estimation of word skill levels (Step 3), refinement of word skill levels by context reconciliation (Step 4), and further refinement of word skill levels by reoccurrence effect reduction (Steps 5–6). The three components are presented in the following subsections, respectively.

[Figure 2: flow diagram of OCSLA. A document passes through (1) removal of punctuations, stop words, and numbers (using a stop word list), producing a word sequence; (2) preliminary estimation of word skill levels (using word uniqueness obtained from a search engine), producing words with estimated skill levels; (3) refinement of word skill levels by context reconciliation, producing words with reconciled skill levels; (4) refinement of word skill levels by reoccurrence effect reduction, producing the skill level of each distinct word; and (5) assessment of skill level of the document.]

Fig. 2. Overview of OCSLA.

Table 2. Algorithm for assessing a text's content skill level.

Algorithm SkillLevelAssessment(d)
Input: d: a text document whose content skill level is to be assessed;
Return: Content skill level of d.
Begin
(1) W ← Sequence of words resulting from removing punctuations and stop words in d;
(2) MaxEO ← Max{for each stop word p, expected number of documents containing p (retrieved from a database or an Internet search engine)};
(3) For each distinct word t in W, do
    (3.1) EO_t ← Expected number of documents containing t (retrieved from a database or an Internet search engine);
    (3.2) Uniqueness_t ← (1 + MaxEO)/(1 + EO_t);
    (3.3) SL_t ← Log2(Uniqueness_t);
// Context Reconciliation
(4) For each occurrence t_i of each distinct word t in W, do
    (4.1) ContextSL ← Average SL values of left and right neighboring words of t_i (at most 2 left neighbors and 2 right neighbors);
    (4.2) TSL_{t,i} ← (2 × ContextSL × SL_t)/(ContextSL + SL_t);
// Reoccurrence Effect Reduction
(5) DocTSL ← 0;
(6) For each distinct word t in W, do
    (6.1) TotalDecayedTSL_t ← 0;
    (6.2) For each occurrence t_i of t in W, do
        (6.2.1) DecayedTSL_{t,i} ← TSL_{t,i} × (a²/(a² + i²)), where a is set to 3;
        (6.2.2) TotalDecayedTSL_t ← TotalDecayedTSL_t + DecayedTSL_{t,i};
    (6.3) DocTSL ← DocTSL + TotalDecayedTSL_t/Number of occurrences of t in W;
(7) Return DocTSL/Number of distinct words in W;
End.

[Figure 3: a document in which a word t occurs four times. Occurrence t1 has left neighbor w1 and right neighbors w2, w3; t2 has left neighbors x1, x2 and right neighbors x3, x4; t3 has left neighbors y1, y2 and right neighbors y3, y4; t4 has left neighbors z1, z2 and right neighbor z3. Annotations in the figure:
TSL_{t,1} = (2 × ContextSL × SL_t)/(ContextSL + SL_t), where ContextSL is the average skill level of w1 to w3, since t1 has only 1 left neighbor.
TSL_{t,2} = (2 × ContextSL × SL_t)/(ContextSL + SL_t), where ContextSL is the average skill level of x1 to x4.
DecayedTSL_{t,3} = TSL_{t,3} × (3²/(3² + i²)), where i = 3, since t3 is the 3rd occurrence of t.
DecayedTSL_{t,4} = TSL_{t,4} × (3²/(3² + i²)), where i = 4, since t4 is the 4th occurrence of t.]

Fig. 3. Refining skill level of a word t that has multiple occurrences in the document (t1–t4).

2 http://medlineplus.gov.

3 http://www.nlm.nih.gov/medlineplus/healthtopics.html.

3.1. Preliminary estimation of word skill levels

To get a preliminary skill level estimation for a word t, OCSLA first estimates EO_t, which is the number of documents containing t (refer to Step 3.1). The uniqueness of t may thus be estimated by (1 + MaxEO)/(1 + EO_t) (refer to Step 3.2), where MaxEO is the number of documents containing the most common stop word (refer to Step 2). The preliminary estimation for the skill level of t (i.e., SL_t) is then derived by taking the base-2 logarithm of the uniqueness of t (refer to Step 3.3).

Estimation of EO_t (and MaxEO as well) is achieved by sending t as a query to an Internet search engine, which provides a global view of how unique a word is without calling for statistical analysis on any predefined text corpus, making OCSLA more domain independent. The word skill levels may be cached so that the search engine is invoked only when OCSLA encounters a word not seen before, dramatically speeding up the preliminary estimation of word skill levels.
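The estimation and caching described above can be illustrated with a minimal sketch. It assumes a hypothetical `doc_count` callable standing in for the search engine query, and toy document counts chosen only so that the arithmetic is easy to follow; neither comes from the paper.

```python
import math
from typing import Callable, Dict, List


def make_skill_level_estimator(doc_count: Callable[[str], int], max_eo: int):
    """Return a function computing SL_t = log2((1 + MaxEO) / (1 + EO_t)).

    doc_count is a hypothetical stand-in for the search engine lookup.
    Results are cached so each distinct word is looked up only once.
    """
    cache: Dict[str, float] = {}

    def skill_level(word: str) -> float:
        if word not in cache:
            eo_t = doc_count(word)                  # Step 3.1
            uniqueness = (1 + max_eo) / (1 + eo_t)  # Step 3.2
            cache[word] = math.log2(uniqueness)     # Step 3.3
        return cache[word]

    return skill_level


# Toy document counts (assumed values, for illustration only).
TOY_COUNTS = {"the": 1023, "heart": 127, "stent": 7}
lookups: List[str] = []  # records every simulated search engine call


def toy_doc_count(word: str) -> int:
    lookups.append(word)
    return TOY_COUNTS.get(word, 0)


skill_level = make_skill_level_estimator(toy_doc_count, max_eo=TOY_COUNTS["the"])
print(skill_level("heart"))  # log2(1024 / 128)
print(skill_level("stent"))  # log2(1024 / 8)
print(skill_level("heart"))  # served from the cache, no new lookup
print(lookups)
```

With these toy counts, the rarer word "stent" receives a higher preliminary skill level than "heart", and the second request for "heart" does not trigger another lookup.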

3.2. Refinement by context reconciliation

The preliminary word skill levels (i.e., SL_t) are refined by context reconciliation, which is motivated by the observation that the skill level of a word t may be increased (decreased) if the neighboring words of t have higher (lower) skill levels. Even a simple word may become difficult to comprehend if its neighboring words are quite professional.

A word t may thus have multiple occurrences in a document d, and the occurrences may lead to different skill levels, depending on the neighbors of each occurrence. As illustrated in the upper part of Fig. 3, for the ith occurrence t_i of t in d, OCSLA averages the preliminary skill levels of its neighboring words (refer to Step 4.1), and harmonically integrates this average with the skill level of t (refer to Step 4.2), producing a refined skill level for t_i (i.e., TSL_{t,i}).

Note that the harmonic integration places the same importance on the skill level of t and the average skill level of the neighbors of t_i. Moreover, the skill level of t_i will be lower if t and its neighbors have more inconsistent skill levels (i.e., one is high but the other is low), since in that case t_i may be easier for the reader to comprehend.
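The effect of the harmonic integration can be seen in a small sketch of Steps 4.1 and 4.2. The function name, the handling of a word with no neighbors, and the toy skill levels are assumptions made for illustration; skill levels are assumed to be non-negative.

```python
from typing import List


def context_reconciled_levels(sl: List[float], window: int = 2) -> List[float]:
    """Compute TSL for every position of a word sequence.

    sl holds the preliminary skill level of each word in sequence order.
    Each word is reconciled with the average level of up to `window`
    neighbors on each side, using the harmonic form of Step 4.2.
    """
    refined = []
    for i, own in enumerate(sl):
        neighbors = sl[max(0, i - window):i] + sl[i + 1:i + 1 + window]
        if not neighbors:
            # Assumption: a word without neighbors keeps its own level.
            refined.append(own)
            continue
        context = sum(neighbors) / len(neighbors)  # Step 4.1
        denom = context + own
        # Step 4.2; guard against 0/0 when both levels are zero.
        refined.append(2 * context * own / denom if denom > 0 else 0.0)
    return refined


consistent = context_reconciled_levels([8.0, 8.0, 8.0])
inconsistent = context_reconciled_levels([2.0, 14.0, 2.0])
print(consistent[1])    # word of level 8 among neighbors of level 8
print(inconsistent[1])  # word of level 14 among neighbors of level 2
```

In both toy sequences the middle word and its context have the same arithmetic mean (8), yet the inconsistent case yields a much lower refined level, which is the behavior the paragraph above describes.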

3.3. Refinement by reoccurrence effect reduction

After context reconciliation, the resulting word skill levels (i.e., TSL_{t,i}, for all occurrences of each word t) are further refined by reoccurrence effect reduction, which is motivated by the observation that the skill level of a word t may gradually decrease as t is mentioned several times in the document d. Even a professional word may become easy to comprehend if it has been described several times in the document.

Therefore, based on the skill levels refined by context reconciliation, OCSLA further refines the skill levels by examining the number of reoccurrences of each word. For each word t, the basic idea is to assign a weight to the skill level of each occurrence of t. As illustrated in the lower part of Fig. 3, by a reverse sigmoid function, later occurrences of t receive lower weights (refer to Step 6.2.1). The final skill level of t in d is simply the average of the weighted skill levels (refer to Step 6.3).
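A short sketch of Steps 6.2.1 and 6.3 shows how quickly the weights fall with a = 3. The function names and the toy TSL values are illustrative assumptions.

```python
from typing import List


def decay_weight(i: int, a: float = 3.0) -> float:
    """Weight of the i-th occurrence (1-based): a^2 / (a^2 + i^2)."""
    return a * a / (a * a + i * i)


def word_level_in_document(tsl_occurrences: List[float], a: float = 3.0) -> float:
    """Average of the decayed TSL values over all occurrences of one word."""
    total = sum(
        tsl * decay_weight(i, a)
        for i, tsl in enumerate(tsl_occurrences, start=1)
    )
    return total / len(tsl_occurrences)


# Weights of the first four occurrences: 9/10, 9/13, 9/18, 9/25.
weights = [decay_weight(i) for i in range(1, 5)]
print([round(w, 3) for w in weights])

# A word with a constant TSL of 10.0, mentioned once versus four times.
once = word_level_in_document([10.0])
four_times = word_level_in_document([10.0] * 4)
print(round(once, 3), round(four_times, 3))
```

Under these toy values, a word mentioned four times ends up with a lower final skill level than the same word mentioned once, reflecting the reoccurrence effect.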

3.4. Assessment of document content skill level

Based on the refined skill levels of all distinct words, the content skill level of the document d is simply their average (refer to Step 7). Therefore, d will get a higher (lower) content skill level if a higher (lower) percentage of distinct words in d get high skill levels. Other factors (e.g., length of d) are not keys to assessing the content skill level of d.
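Putting Steps 1 to 7 of Table 2 together, the whole assessment can be sketched as follows. This is a sketch under stated assumptions rather than the authors' implementation: the tokenizer, the tiny stop word list, the `doc_count` callable standing in for the search engine, and the toy counts are all hypothetical.

```python
import math
import re
from collections import defaultdict
from typing import Callable, Dict, List

# Toy stop word list (assumption; the paper does not list its stop words).
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "and", "to"}


def assess_document(text: str, doc_count: Callable[[str], int], max_eo: int,
                    a: float = 3.0, window: int = 2) -> float:
    # Step 1: drop punctuation, numbers, and stop words.
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS]
    if not words:
        return 0.0  # assumption: an empty word sequence scores zero

    # Step 3: preliminary skill level of each distinct word.
    sl: Dict[str, float] = {
        t: math.log2((1 + max_eo) / (1 + doc_count(t))) for t in set(words)
    }

    # Step 4: context reconciliation for every occurrence.
    tsl: Dict[str, List[float]] = defaultdict(list)
    for pos, t in enumerate(words):
        neighbors = words[max(0, pos - window):pos] + words[pos + 1:pos + 1 + window]
        if neighbors:
            context = sum(sl[n] for n in neighbors) / len(neighbors)
            denom = context + sl[t]
            value = 2 * context * sl[t] / denom if denom > 0 else 0.0
        else:
            value = sl[t]  # assumption: no neighbors, keep own level
        tsl[t].append(value)

    # Steps 5-6: reoccurrence effect reduction, averaged per distinct word.
    doc_tsl = 0.0
    for values in tsl.values():
        decayed = sum(v * a * a / (a * a + i * i)
                      for i, v in enumerate(values, start=1))
        doc_tsl += decayed / len(values)

    # Step 7: average over distinct words.
    return doc_tsl / len(tsl)


# Toy document counts (assumed values, for illustration only).
TOY_COUNTS = {"heart": 127, "attack": 255, "myocardial": 3, "infarction": 1}


def toy_count(word: str) -> int:
    return TOY_COUNTS.get(word, 0)


simple_score = assess_document("The heart attack.", toy_count, max_eo=1023)
professional_score = assess_document("Myocardial infarction.", toy_count, max_eo=1023)
print(round(simple_score, 3), round(professional_score, 3))
```

With the toy counts, the lay phrasing scores lower than the professional phrasing of the same concept. Because the final score is an average over distinct words, adding more text of similar vocabulary does not by itself raise the score.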

OCSLA may tackle the challenges raised in Table 1. In response to Challenge I (i.e., word skill level assessment without domain knowledge), OCSLA invokes an Internet search engine to get a preliminary skill level for each word (Section 3.1). Neither domain knowledge nor a learning process is needed, making OCSLA domain independent and easy to implement. In response to Challenge II (i.e., dealing with word contexts), OCSLA employs context reconciliation (Section 3.2). It focuses on context-based skill level assessment, without needing to recognize multi-word terms. In response to Challenge III (i.e., effect of word reoccurrences), reoccurrence effect reduction is employed (Section 3.3). Finally, in response to Challenge IV (i.e., efficiency of content skill level assessment), OCSLA does not conduct time-consuming computations, making it efficient enough to support online content skill level assessment.

4. Empirical evaluation

OCSLA is empirically evaluated. Table 3 summarizes the main designs of the experiment, including experimental data, baselines, and evaluation criteria, which are described in the following subsections.

4.1. Experimental data

Experimental data come from MedlinePlus, which is a popular source of healthcare information on the Internet.² MedlinePlus organizes healthcare information into health topics,³ and for each health topic, it provides hyperlinks to the related documents. Following the hyperlinks, we extract 8030 documents, which fall into three types depending on their content skill levels: simple documents, medium documents, and professional documents. The documents help to measure the performance of various kinds of content skill level assessment strategies. A better assessment strategy should distinguish the three types of documents more successfully.

Table 3. Experiment design.

Aspect: Experimental data
Setting:
(1) 8030 documents from health topics of MedlinePlus:
    (1A) 1635 Simple documents: For each topic G, randomly extract 3 documents (if available) from "News" part of G
    (1B) 2981 Professional documents: For each topic G, randomly extract 3 documents (if available) from "Journal Articles" part of G
    (1C) 3414 Medium documents: For each topic G, randomly extract documents from all parts other than "News" and "Journal Articles" of G

Aspect: Baseline content skill level assessment strategies
Setting: Given a document d, the following baselines share the same routines and databases with OCSLA, except for
(1) AvgUniq: content skill level of d = average Uniqueness_t, for each word t in d
(2) AvgLogUniq: content skill level of d = average Log2(Uniqueness_t), for each word t in d
(3) AvgUniq_Bound_c: content skill level of d = average Uniqueness_t, for each word t in d, with Uniqueness_t set to c if Uniqueness_t > c (settings for c: 200, 500, 1000, 5000)

Aspect: Evaluation criteria
Setting:
(1) Hypothesis Test: Estimation of how average SL values of simple, medium, and professional documents differ from each other
(2) Skill level ranking precision: Estimation of the precision rate in distinguishing simple, medium, and professional documents

There are 1635 simple documents, which are collected by following the hyperlinks to news documents (if available) concerning the health topics. They are simple (i.e., having low content skill levels) since news documents are written for the general public. There are 3414 professional documents, which are collected by following the hyperlinks to journal articles concerning the health topics (i.e., abstracts in PubMed⁴). They are professional (i.e., having high content skill levels) since they contain professional contents. There are 2981 medium documents, which are collected by following other kinds of hyperlinks, including treatment, specific condition, prevention/screening, and diagnosis/symptom concerning the health topics. Content skill levels of medium documents should thus lie between those of simple documents and professional documents.

4.2. Baseline strategies for content skill level assessment

In addition to OCSLA, three assessment strategies are implemented as the baselines for performance comparison: (1) AvgUniq, (2) AvgLogUniq, and (3) AvgUniq_Bound_c. All the baselines share the underlying databases and routines of OCSLA, including Components 1 and 2 illustrated in Fig. 2. The main difference is that the baselines do not have Components 3 and 4, by which OCSLA refines content skill level assessment through context reconciliation and reoccurrence effect reduction. Given that no previous studies aimed at online content skill level assessment (refer to Section 2), the baselines serve as possible ways to approach the assessment, and help to measure the contributions of context reconciliation and reoccurrence effect reduction.

Given the uniqueness values of words in a document d (i.e., Uniqueness_t in Step 3.2 of Table 2), AvgUniq estimates the content skill level of d by averaging the uniqueness values. Therefore, it tends to assign a higher content skill level to d if more words in d have higher uniqueness values. It may suffer from the problem incurred by rare words, which have very high uniqueness values and hence mislead AvgUniq into assigning too high a content skill level to d.

To tackle the problem, the other two baselines (AvgLogUniq and AvgUniq_Bound_c) are implemented. AvgLogUniq tackles the problem by taking the logarithm of the uniqueness values (i.e., Log2(Uniqueness_t)), a common approach employed in many information retrieval systems. On the other hand, AvgUniq_Bound_c tackles the problem by setting an upper bound (i.e., c) on uniqueness values. When a word's uniqueness value exceeds the upper bound, it is set to the upper bound. The upper bound is thus a parameter, and we try several settings for it: 200, 500, 1000, and 5000, which are referred to as AvgUniq_Bound_200, AvgUniq_Bound_500, AvgUniq_Bound_1000, and AvgUniq_Bound_5000, respectively.

4 http://www.pubmed.gov.
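The three baselines differ only in how they aggregate uniqueness values, which the following sketch makes concrete. The function names and the toy uniqueness values are illustrative assumptions; the input is simply the list of uniqueness values of the words in a document.

```python
import math
from typing import List


def avg_uniq(uniq: List[float]) -> float:
    """AvgUniq: plain average of uniqueness values."""
    return sum(uniq) / len(uniq)


def avg_log_uniq(uniq: List[float]) -> float:
    """AvgLogUniq: average of log2 of the uniqueness values."""
    return sum(math.log2(u) for u in uniq) / len(uniq)


def avg_uniq_bound(uniq: List[float], c: float) -> float:
    """AvgUniq_Bound_c: average after capping each value at c."""
    return sum(min(u, c) for u in uniq) / len(uniq)


# Toy documents (assumed values): a simple text containing one very rare
# word, and a professional text whose words are all fairly unique.
simple_doc = [2.0, 4.0, 8.0, 65536.0]
professional_doc = [1024.0, 2048.0, 4096.0, 512.0]

for name, score in [
    ("AvgUniq", avg_uniq),
    ("AvgLogUniq", avg_log_uniq),
    ("AvgUniq_Bound_200", lambda u: avg_uniq_bound(u, 200)),
]:
    print(name, score(simple_doc), score(professional_doc))
```

In this toy case the single rare word makes AvgUniq score the simple document above the professional one, while the logarithm and the bound both restore the expected order.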

4.3. Evaluation criteria

Since the test data are composed of three types of documents (simple, medium, and professional documents), we may measure the performance of OCSLA and the baselines in distinguishing the different types of documents. We thus develop two evaluation criteria that focus on differences of content skill levels and ranking of different types of documents, which are defined in the following two subsections.

4.3.1. Differences of mean content skill levels

To measure the differences among means of estimated content skill levels of simple, medium, and professional documents, we employ the statistical technique for testing the differences between means of two samples, under the cases where variances of the two samples are unknown and unequal:

\[
t\text{-test} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\]

where \bar{X}_1 and \bar{X}_2 are mean content skill levels of two document samples, respectively; s_1 and s_2 are standard deviations of content skill levels of the two samples, respectively; and n_1 and n_2 are the numbers of documents of the two samples, respectively. Therefore, a larger t-test value indicates that the assessment strategy is more capable of distinguishing the two samples of documents. Since the test data contain three types of documents, we compute three sample pairs: medium vs. simple, professional vs. medium, and professional vs. simple. The resulting t-test values may thus indicate the capability of the strategies in distinguishing the documents from different pairs of content skill level types.
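The t-test value defined above can be computed with the standard library, as in this minimal sketch. It assumes the sample (n - 1) form of the variance, which the text does not specify, and the scores are toy values.

```python
import math
from statistics import mean, variance
from typing import Sequence


def t_value(sample1: Sequence[float], sample2: Sequence[float]) -> float:
    """t value for two samples with unknown, unequal variances.

    Assumption: s^2 is the sample variance (n - 1 denominator), as
    returned by statistics.variance.
    """
    n1, n2 = len(sample1), len(sample2)
    standard_error = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2)) / standard_error


# Toy content skill levels for two document types.
professional_scores = [10.0, 12.0, 14.0]
simple_scores = [4.0, 5.0, 6.0]
t = t_value(professional_scores, simple_scores)
print(round(t, 3))
```

Here the means are 12 and 5 and the variances are 4 and 1, so the value is 7 / sqrt(4/3 + 1/3); a strategy that separates the two samples more cleanly would yield a larger value.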

4.3.2. Precision of ranking documents by estimated content skill levels

We are also concerned with the precision of ranking documents by their content skill levels. Ideally, if test documents are sorted in increasing order of estimated content skill levels, simple documents should be followed by medium documents, which are then followed by professional documents. In this case, ranking is perfect, and hence precision is 1.0. When more documents do not follow the perfect ranking, precision degrades. The ranking precision is essential in practice, since it reflects how easily the reader may access those documents with suitable content skill levels.


[Figure 4: plot of t-test values (vertical axis, 0 to 70) for each pair of document types (Medium-Simple, Professional-Medium, Professional-Simple), with one series per strategy: OCSLA, AvgLogUniq, AvgUniq_Bound_200, AvgUniq_Bound_500, AvgUniq_Bound_1000, AvgUniq_Bound_5000, and AvgUniq.]

Fig. 4. Performance in t-test values: no unknown uniqueness.

Table 4. Relative strengths ('o') and weaknesses ('x') of skill level assessment strategies in recognizing simple, medium, and professional types of documents.

Strategy           Simple documents   Medium documents   Professional documents
AvgUniq            x                  x                  x
AvgUniq_Bound_c    o                  x                  x
AvgLogUniq         x                  o                  o
OCSLA              o                  o                  o


Therefore, for simple documents, we define SLRP_Simple (Skill Level Ranking Precision) to be

\[
\mathrm{SLRP}_{\mathrm{Simple}} = \frac{\sum_{i=1}^{S} \dfrac{x_i}{M+P}}{S},
\]

where S, M, and P are the numbers of simple, medium, and professional documents, respectively, and x_i is the number of medium and professional documents whose estimated content skill levels are higher than that of the ith simple document. Note that SLRP has only one version, since micro-averaged SLRP and macro-averaged SLRP are the same. Similarly, for medium documents, we define SLRP_Medium to be

\[
\mathrm{SLRP}_{\mathrm{Medium}} = \frac{\sum_{j=1}^{M} \dfrac{y_j}{S+P}}{M},
\]

where y_j is the number of simple documents whose estimated content skill levels are lower than that of the jth medium document, plus the number of professional documents whose estimated content skill levels are higher than that of the jth medium document. Finally, for professional documents, we define SLRP_Professional to be

\[
\mathrm{SLRP}_{\mathrm{Professional}} = \frac{\sum_{k=1}^{P} \dfrac{z_k}{S+M}}{P},
\]

where z_k is the number of simple and medium documents whose estimated content skill levels are lower than that of the kth professional document.
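The three SLRP definitions can be computed directly from lists of estimated content skill levels, as in the sketch below. The function name, the returned dictionary keys, and the toy scores are illustrative assumptions; the pairwise counting is written for clarity rather than speed.

```python
from typing import Dict, List


def slrp(simple: List[float], medium: List[float],
         professional: List[float]) -> Dict[str, float]:
    """Skill level ranking precision for each document type."""
    S, M, P = len(simple), len(medium), len(professional)

    # x_i: medium and professional documents scored above simple document i.
    x_total = sum(sum(o > s for o in medium + professional) for s in simple)

    # y_j: simple documents scored below, plus professional documents
    # scored above, medium document j.
    y_total = sum(
        sum(s < m for s in simple) + sum(p > m for p in professional)
        for m in medium
    )

    # z_k: simple and medium documents scored below professional document k.
    z_total = sum(sum(o < p for o in simple + medium) for p in professional)

    return {
        "simple": x_total / ((M + P) * S),
        "medium": y_total / ((S + P) * M),
        "professional": z_total / ((S + M) * P),
    }


# Toy scores: one medium document (1.5) is wrongly ranked below a simple one.
result = slrp(simple=[1.0, 2.0], medium=[3.0, 1.5], professional=[5.0, 6.0])
print(result)

# A perfect ranking yields 1.0 for every type.
perfect = slrp(simple=[1.0, 2.0], medium=[3.0, 4.0], professional=[5.0, 6.0])
print(perfect)
```

In the first toy case the single misplaced medium document lowers the precision for both simple and medium documents, while the professional documents are still ranked perfectly.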

4.4. Results and discussion

We separate the discussion of the results by the percentage of terms whose uniqueness values are unknown. The uniqueness value of a term t is unknown when t is not in the word uniqueness database (refer to Fig. 2) and the search engine is not available to return a uniqueness value for t. In that case, the input to the systems is incomplete, and hence the systems need to rely on the existing information to make their decisions. The results show that OCSLA outperforms all the baselines, and the improvement becomes larger when the uniqueness values of more terms are unknown, indicating that OCSLA is more fault tolerant in dealing with incomplete input, which is common in practice.

4.4.1. Result when uniqueness values of all terms are known

When uniqueness values of all terms are known, Figs. 4 and 5 show the performance of all skill level assessment strategies under the criteria of t-test values and skill level ranking precision (SLRP), respectively. Strengths and weaknesses of the strategies are summarized in Table 4, which indicates that all the baselines have weaknesses in recognizing some type(s) of documents, and OCSLA achieves the best performance in the experiment. We discuss the results and analyze the reasons for them.

AvgUniq is straightforward but weak in dealing with rare terms that happen to appear in documents. The rare terms may get high uniqueness values, which are actually noise for AvgUniq when it estimates content skill levels. A rare term in a simple document may thus mislead AvgUniq into assigning a high content skill level to the document, making AvgUniq unable to distinguish the documents very well.

Although AvgUniq_Bound_c may tackle the weaknesses of AvgUniq by setting bounds on uniqueness values (i.e., c), it also incurs weaknesses in recognizing professional documents. AvgUniq_Bound_c is successful in distinguishing simple documents from medium documents (refer to Fig. 4), but is poor in distinguishing professional documents from the others (refer to Figs. 4 and 5). This is because, by setting upper bounds on uniqueness values, AvgUniq_Bound_c may deal with rare terms that happen to appear in documents (and hence improve on AvgUniq), but it also reduces the estimated content skill levels of professional documents. Therefore, AvgUniq_Bound_c has the best performance in distinguishing simple documents from medium documents (in terms of t-test values in Fig. 4), but its overall performance for medium and professional documents is not good (in terms of both t-test values and SLRP in Figs. 4 and 5, respectively).

Although AvgLogUniq may tackle the weaknesses of AvgUniq_Bound_c by taking the logarithm of uniqueness values, it also incurs weaknesses in recognizing simple documents. AvgLogUniq improves the performance in recognizing medium and professional documents (in terms of both t-test values and SLRP), but the improvement comes at the expense of the performance in distinguishing simple documents from medium documents. This is because taking the logarithm reduces the difference between the estimated content skill levels of simple and medium documents. Therefore, AvgLogUniq performs worse than AvgUniq_Bound_c in recognizing simple documents (in terms of both t-test values and SLRP in Figs. 4 and 5, respectively).

Therefore, the best two baselines (AvgUniq_Bound_c and AvgLogUniq) tend to focus on either simple documents or professional documents, but not on both. On the other hand, OCSLA successfully achieves the best overall performance. It achieves the best SLRP for all types of documents (refer to Fig. 5) by enlarging the skill level differences between professional documents and the other two types of documents (i.e., simple and medium documents, refer to Fig. 4). The results justify the contributions of context reconciliation and reoccurrence effect reduction (refer to Sections 3.2 and 3.3), which are the main technical differences between OCSLA and the baselines.

Fig. 5. Performance in skill level ranking precision (SLRP): no unknown uniqueness. [Plot: SLRP by document type (Simple, Medium, Professional) for OCSLA, AvgLogUniq, AvgUniq_Bound_200, AvgUniq_Bound_500, AvgUniq_Bound_1000, AvgUniq_Bound_5000, and AvgUniq.]

Fig. 6. Performance in t-test values: 5% unknown uniqueness. [Plot: t-test value by pair of document types (Medium-Simple, Professional-Medium, Professional-Simple).]

Fig. 7. Performance in skill level ranking precision (SLRP): 5% unknown uniqueness. [Plot: SLRP by document type (Simple, Medium, Professional) for OCSLA, AvgLogUniq, AvgUniq_Bound_200, AvgUniq_Bound_500, AvgUniq_Bound_1000, AvgUniq_Bound_5000, and AvgUniq.]

Fig. 8. Performance in t-test values: 10% unknown uniqueness. [Plot: t-test value by pair of document types (Medium-Simple, Professional-Medium, Professional-Simple).]

Fig. 9. Performance in skill level ranking precision (SLRP): 10% unknown uniqueness. [Plot: SLRP by document type (Simple, Medium, Professional) for OCSLA, AvgLogUniq, AvgUniq_Bound_200, AvgUniq_Bound_500, AvgUniq_Bound_1000, AvgUniq_Bound_5000, and AvgUniq.]

Fig. 10. Improvements under various degrees of incomplete input: average t-test values. [Plot: t-test value improvement (%) provided by OCSLA by percentage of terms with unknown uniqueness (0%, 5%, 10%); series include OCSLA vs. AvgLogUniq and OCSLA vs. AvgUniq_Bound_200.]

Fig. 11. Improvements under various degrees of incomplete input: average SLRP. [Plot: SLRP improvement (%) provided by OCSLA by percentage of terms with unknown uniqueness (0%, 5%, 10%); series include OCSLA vs. AvgLogUniq.]

4.4.2. Result when uniqueness values of some terms are unknown

When uniqueness values of some terms are unknown, all the systems need to make their decisions using incomplete information. This case commonly happens if some terms in the document are not included in the word uniqueness database, and no search engines may be invoked online.

To measure the contributions of OCSLA under the case of incomplete information, we remove the terms that have the highest uniqueness values from the word uniqueness database, and assume that no search engine may be invoked (and hence the uniqueness values of these terms become unknown). The reason for removing the terms with higher uniqueness values is that these terms are rarer and hence more likely to be absent from the word uniqueness database (i.e., not seen before). We try two settings for the amount of terms being removed: 5% and 10% of all terms. When processing a document, terms with unknown uniqueness values are removed, and content skill level assessment is conducted on the resulting document.
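The removal procedure can be sketched as follows. This is only an illustration under stated assumptions: the word uniqueness database is modelled as a plain dictionary from term to uniqueness value, the number of removed terms is truncated to an integer, and the helper names `drop_rarest` and `known_terms` are hypothetical.

```python
def drop_rarest(uniqueness_db, fraction):
    """Return a copy of the database without the top `fraction` of terms,
    ranked by uniqueness value (highest first)."""
    ranked = sorted(uniqueness_db, key=uniqueness_db.get, reverse=True)
    n_removed = int(len(ranked) * fraction)  # assumption: truncate to an integer
    removed = set(ranked[:n_removed])
    return {t: u for t, u in uniqueness_db.items() if t not in removed}


def known_terms(document_terms, uniqueness_db):
    """Keep only the document terms whose uniqueness value is still known."""
    return [t for t in document_terms if t in uniqueness_db]


# Hypothetical database of 20 terms; term20 has the highest uniqueness value.
db = {f"term{i}": i for i in range(1, 21)}

db_5 = drop_rarest(db, 0.05)    # 1 of 20 terms becomes unknown
db_10 = drop_rarest(db, 0.10)   # 2 of 20 terms become unknown

document = ["term1", "term20", "term5", "term19"]
print(known_terms(document, db_5))
print(known_terms(document, db_10))
```

Assessment would then run on the filtered term list, so each system sees the same incomplete input.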


Figs. 6 and 7 show the performance of all systems when there are 5% unknown uniqueness values, while Figs. 8 and 9 show the performance of all systems when there are 10% unknown uniqueness values. As in the previous case where uniqueness values of all terms are known (refer to Figs. 4 and 5), OCSLA outperforms all the baselines.

It is interesting to measure how the improvements provided by OCSLA change when more uniqueness values are unknown. Therefore, we identify the best three baselines (see footnote 5) and compare them with OCSLA. The comparison is illustrated in Figs. 10 and 11, which show the improvements provided by OCSLA in average t-test values and average SLRP, respectively. The results show that, when more uniqueness values are unknown, the improvements become larger. OCSLA is thus more fault-tolerant in dealing with incomplete input. Since incomplete input is common in practice, this contribution is of particular practical significance.
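As a rough illustration of how such an improvement percentage might be computed, the sketch below assumes a relative improvement of OCSLA over a baseline, averaged over document types. The formula is an assumption and is not stated in the text; the scores are hypothetical and are not the values plotted in Figs. 10 and 11.

```python
def improvement_pct(ocsla_score, baseline_score):
    """Relative improvement of OCSLA over a baseline, in percent (assumed formula)."""
    return (ocsla_score - baseline_score) / baseline_score * 100.0


def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical SLRP scores for simple, medium, and professional documents.
ocsla_slrp = [0.90, 0.85, 0.88]
baseline_slrp = [0.75, 0.80, 0.70]

gain = improvement_pct(average(ocsla_slrp), average(baseline_slrp))
print(round(gain, 2))
```

The same function applies to average t-test values by swapping in those scores.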

5. Conclusion

Assessment of content skill levels of medical texts is essential, since readers of different professional backgrounds require medical texts of different content skill levels. The assessment should be done efficiently in order to support online recommendation and writing of medical information. In this paper, our goal lies in exploring how content skill levels of medical texts may be assessed without relying on domain-dependent knowledge and time-consuming processing. Under this goal, we test several qualified assessment strategies, and identify their weaknesses.

We also find that contexts and reoccurrences of each term in a medical text are helpful factors in assessing the text's content skill level. A novel assessment strategy OCSLA is thus developed based on the two factors. Empirical evaluation on a text corpus from MedlinePlus shows that OCSLA successfully achieves both better and more fault-tolerant performance.

The contributions are significant for online recommendation of medical information for the readers, based on content skill levels of the information and professional backgrounds of the readers. The contributions are also significant for health education in which medical text writers should control content skill levels of the texts so that targeted readers may easily comprehend the contents. OCSLA complements the recommendation systems and writing support systems online, promoting the utility of medical information to individual readers.

5 From Table 4 and Figs. 4 and 5, the best three baselines are AvgLogUniq, AvgUniq_Bound_200, and AvgUniq_Bound_500.

Acknowledgement

This research was supported by the National Science Council of the Republic of China under the grant NSC 95-2218-E-320-006.

References

Ad Hoc Committee on Health Literacy for the Council on Scientific Affairs of American Medical Association. (1999). Health literacy: Report of the Council on Scientific Affairs. Journal of the American Medical Association, 281, 552–557.

Eysenbach, G., & Köhler, C. (2002). How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews. British Medical Journal, 324, 573–577.

Fukuda, K., Tsunoda, T., Tamura, A., & Takagi, T. (1998). Toward information extraction: Identifying protein names from biological papers. In Proceedings of the Pacific symposium on biocomputing'98 (PSB'98) (pp. 707–718).

Gabrilovich, E., Dumais, S., & Horvitz, E. (2004). Newsjunkie: Providing personalized newsfeeds via analysis of information novelty. In WWW2004.

Microsoft Corporation. (2008). Readability scores: Applies to Microsoft Office Word 2003. <http://office.microsoft.com/en-us/word/HP051863181033.aspx?pid=CH060830131033>.

Nenadic, G., Spasic, I., & Ananiadou, S. (2003). Terminology-driven mining of biomedical literature. Bioinformatics, 19, 938–943.

Norman, C. D., & Skinner, H. A. (2006). eHEALS: The eHealth literacy scale. Journal of Medical Internet Research, 8(4), e27.

Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford University Database Group.

Rootman, I. (2006). Health literacy: Where are the Canadian doctors? Canadian Medical Association Journal, 175(6), 606–607.

Thomson, M. D., & Hoffman-Goetz, L. (2007). Readability and cultural sensitivity of web-based patient decision aids for cancer screening and treatment: A systematic review. Medical Informatics and the Internet in Medicine, 32(4), 263–286.

Zeng, Q. T., Kogan, S., Plovnick, R. M., Crowell, J., Lacroix, E.-M., & Greenes, R. A. (2004). Positive attitudes and failed queries: An exploration of the conundrums of consumer health information retrieval. International Journal of Medical Informatics, 73, 45–55.

Zhang, B., Li, H., Liu, Y., Ji, L., Xi, W., Fan, W., et al. (2005). Improving web search results using affinity graph. In SIGIR'05.

Zhou, G., Zhang, J., Su, J., Shen, D., & Tan, C. (2004). Recognizing names in biomedical texts: A machine learning approach. Bioinformatics, 20, 1178–1190.
