
    Short Text Coherence Hypothesis

Article in Journal of Quantitative Linguistics, August 2016. Impact Factor: 0.33. DOI: 10.1080/09296174.2016.1142328


4 authors: Sylvia Poulimenou, Sofia Stamou, S. Papavlasopoulos and Marios Poulos (all Ionian University)


    Available from: Marios Poulos

    Retrieved on: 13 June 2016


    Short Text Coherence Hypothesis

Sylvia Poulimenou, Sofia Stamou, Sozon Papavlasopoulos, Marios Poulos

Laboratory of Information Technologies, Faculty of Information Science and Informatics, Ionian University

Ioannou Theotoki 72, Corfu. Corresponding Author: [email protected]

Abstract: In this paper, we experimentally study the degree to which the length of a short text affects its comprehensibility and readability, within quantitative linguistics. Quantitative linguistics focuses mainly on the analysis of large text collections, and one of the major scientific theories in use is the Menzerath-Altmann law. Here we attempt to define a quantitative analysis framework for short texts consisting of approximately one or two sentences, since such texts are considered very important in many scientific fields. To achieve the aim of this paper, a coherence statistical testing process based on three variables was created for short texts and implemented through experimental and statistical evaluation. Upon completion of that evaluation, the statistical results showed that short text coherence, comprehensibility and readability are fully achieved in short texts consisting of 14 words, when the three predetermined variables are associated, and vice versa. To prove this hypothesis, the theory of the Vector Space Model and Kendall's Coefficient of Concordance were used. The assessment of the statistical results concluded that the hypothesis can be fully met for a number of cases with a probability p > 99%. Moreover, although the experiment used short texts in the English language, language itself proved to be irrelevant: to corroborate this, a smaller-scale experiment with short texts in the German language was conducted, and the hypothesis was confirmed, suggesting that the proposed model can be applied to all short texts regardless of their linguistic origin.

    Keywords: Short Text Processing, Vector Space Model, Lexical Coherence

1. Introduction

In quantitative linguistics theory, the lexical coherence of texts with regard to word distribution is considered a very important scientific field. According to Carstens (2001), coherence in text linguistics is defined as the ways in which components of the sentences of a text, i.e. the words we actually hear and use, are mutually connected (grammatically and lexically). Halliday and Hasan (1976) describe cohesion as a semantic relation between one element in the text and some other element that is crucial to its interpretation. In addition, as mentioned by Fahnestock (1983), coherence derived from correct text composition is a crucial element, so that ideas can flow smoothly throughout the text and readability can remain high when comprehending the text's meaning. Based on Richards et al. (1992), readability means how easily written materials can be read and understood; this depends on several factors, including the average length of sentences, the number of new words contained, and the grammatical complexity of the language used in a passage. Moreover, the concept of readability is associated with the concept of comprehensibility. Sparks (2012) mentions that discourse comprehension involves building meaning from extended segments of language, and that successfully comprehending larger units of text and discourse requires making inferences to connect ideas both within and across local and global discourse contexts.

As mentioned in Eroglu (2013), linguistic organization in texts can be accounted for by the Menzerath-Altmann (MA) law, formulated in Altmann (1980). The MA law is considered a fundamental law of quantitative linguistics, through which one can observe the relationship between the size of the whole and the size of the parts in language, according to Baixeries et al. (2013). In short, according to the MA law as stated in Eroglu (2013), the longer a linguistic construct, the shorter its constituents, where a construct is considered to be the whole and a constituent a part of the whole. The MA law is a basic and important law in quantitative linguistics, whose main focus is the statistical analysis of large texts. In particular, as mentioned by Hřebíček (2002), the MA law does not apply to extremely short texts, where a short text can be considered a sentence or a complex sentence. However, short texts are very important in many scientific fields, mainly in online communication and e-commerce but also in quick internet searching. According to Ge Song et al. (2014), classifying short texts is considered a big challenge, because their limited number of words can represent neither the feature space nor the actual relationships between words and documents. Because short texts have small word length, as Xiaojun Quan (2009) explains, similarity measures cannot be applied successfully due to the lack of word co-occurrence or shared context.

Journal of Quantitative Linguistics, in press - Volume 22, Issue 3. Acceptance Date: 5 Dec. 2014. Taylor & Francis Group.


It is a known fact that people use sentences in order to communicate with each other successfully, and certain parameters need to be taken into consideration, since correct sentencing is quite an important issue. For successful communication a sentence must have a moderate length, so that it is neither confusing nor complicated. The lexical coherence of the sentence was thus introduced empirically by Kornai (2008), who observes that in journalistic prose the median sentence length is above 15 words. Moreover, Těšitelová (1992) mentions that during the 1980s there was interest in scientific research on the sentence in the context of syntactic phenomena. Furthermore, the complexity of sentence meaning depends on the length of the sentence. According to Cutts (2009), the average sentence length should lie between 15 and 20 words in order to maintain readability, although this average cannot always be achieved. Taskar et al. (2004), in their work on descriptive parsing and dynamic approach techniques, run their experiments under a restriction of sentence length equal to or less than 15 words. Other scientific fields that express interest in sentence length, in research concerning memory, are cognitive psychology and neuroscience. Baddeley (2003), in his three-part model of working memory, encountered problems in the interaction with long-term memory, where the limitation of 15 words per sentence is mentioned again. In their experiment, Daneman and Carpenter (1980) report that it takes about 5 seconds for a person to read a sentence. In addition, Anderson et al. (2001), in their study of sentence memory describing various models of memory, observed that the brain takes a few hundred milliseconds to process a word. The same research refers to an experiment conducted by Zimny (1987), in which a word-by-word presentation procedure was carried out at 300 ms per word. From the above-mentioned research, one can draw the conclusion that a typical sentence should contain approximately 16 to 17 words. Taking everything into account, coherence analysis in short texts, and especially in sentences, can be considered a very important scientific field to be explored. Lexical coherence in sentences has been empirically observed and placed between 15 and 20 words per sentence.

The aim of this paper is to establish a statistical hypothesis that corroborates the empirical observations regarding the coherence of short texts or of a sentence. In that way the gap left by the MA law with respect to short texts can be filled, and this study can lay the cornerstone of short text analysis.

For the implementation of the above, we tracked variables which are considered crucial to text coherence. Text coherence was examined in terms of the impact of each constituent on the construct. That correlation was made feasible through the use of three variables, and the hypothesis was tested through the use of Kendall's coefficient of concordance. The conclusions that emerged from this methodology introduce an innovation in the field of computational linguistics, because the experimental linguistic observations about short text stability around 15 constituents are verified, which demonstrates statistically the correlation of the short text through those 3 variables.

This paper is divided into the following parts:

a) Methodology, where the algorithm is presented in full detail, together with its statistical evaluation.

b) Experimental part, which is implemented via the application of the algorithm on a wide sample of short texts.

    2. Method

    2.1 Vector Space Model

According to Salton, Wong and Yang (1975), documents can be represented as vectors in order to index them and find their degree of similarity. As noted by Raghavan and Wong (1986), vectors are quite useful since they obey basic axioms and algebraic rules. The Vector Space Model (VSM) is used in several scientific fields, such as information filtering and relevancy ranking. According to Turney and Pantel (2010), with the expansion of VSM usage to semantic tasks in language processing, excellent results can be obtained. VSM is an algebraic model which makes possible the representation of any text object (term), such as a document, sentence, clause, phrase, word or morpheme. The VSM representation can be analyzed in three steps.

In the first step, the content-bearing terms (which are typically words or short phrases) are extracted, creating the document indexing. This indexing is executed via two alternative kinds of method: linguistic and non-linguistic. Linguistic methods are based on gathering function words of high and low frequency which reflect the document semantically. Non-linguistic methods, on the other hand, are based on different indexing procedures, such as probabilistic indexing and automatic indexing.

In the second step, the indexed terms are weighted according to their relevance to the user in a possible retrieval procedure. Term weighting is applied by testing the sensitivity and specificity of the search, where specificity is related to precision and sensitivity to recall. There are three dominant term-weighting variables,


which relate to the term frequency, the collection frequency and the length normalization. These three variables are combined by propagation in order to produce the resulting term weight.

As an example, a text can be defined algebraically with VSM by using the following equation:

$V_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$  (1)

where $j$ stands for the number of the constituents of a text and $t$ represents the number of weights (variables) which are defined by the model.

Finally, in the third step, the text is ranked according to a similarity measure with respect to the query. Similarity in VSM is estimated by using connective variables based on the normalized inner product between the text vector and the query vector, where constituent overlap indicates similarity. The most common similarity measure is the cosine, which measures the angle between the text vector and the query vector.
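The three VSM steps above can be sketched with a minimal bag-of-words example. This is only an illustration of the general model, not the paper's own algorithm (which follows in section 2.2); the vocabulary, documents and query are invented, and raw term frequency stands in for the more elaborate weighting schemes discussed above.

```python
import math

def vectorize(text, vocab):
    """Steps 1-2: index the content terms and weight them by raw term frequency."""
    words = text.lower().split()
    return [words.count(t) for t in vocab]

def cosine(u, v):
    """Step 3: cosine of the angle between the text vector and the query vector."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

vocab = ["short", "text", "coherence", "law"]
doc = vectorize("short text coherence in short text", vocab)
query = vectorize("text coherence", vocab)
print(round(cosine(doc, query), 4))  # -> 0.7071
```

A higher cosine means a smaller angle between the two vectors, i.e. greater constituent overlap between text and query.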

    2.2 The Basis of the Algorithm

The variable extraction is based on VSM theory. In more detail, in the first step (section 2.1) we select a short text as the construct, which is a sentence or a complex sentence, and we consider the word as the constituent of the construct. Subsequently, according to the second step (see section 2.1), a non-linguistic approach is adopted, because all the constituents of the construct are indexed by a procedure which depends on their order position. In detail, each constituent obtains a particular weighting independently of its number of appearances in the construct, so $t = j$.

In our case, the three (3) dominant term-weighting variables are replaced by the variables of the following vector:

$w_{jj} = (i_{jj}, s_{jj}, k_{jj})$  (2)

where $i$ is the order of the constituent in the short text, $s$ is its number of characters and $k$ is an encoding measure, defined for simplification by the ASCII encoding procedure (Poulos, Papavlasopoulos, Chrissikopoulos 2006).

Equation (1) is then transformed into equation (3):

$V_j = (w_{1,j}, w_{2,j}, \ldots, w_{j,j})$  (3)

For normalization reasons, each vector $V_j$ is replaced by the equivalent unit vector

$F_j = \frac{V_j}{\|V_j\|}$  (4)

Furthermore, the vector $V_s$ is the resultant of the vectors $F_j$ and from now on will be addressed as the short text vector (see figure 1):

$V_s = \sum_{j} F_j$  (5)

2.3 Similarity Criterion

The degree of correlation between $F_j$ and $V_s$ can be extracted by equation (6), and specifically by the factor $r_j$, which represents the normalized inner product between the text vector $V_s$ and the query vector $F_j$, according to step 3 (see section 2.1). This procedure also follows the general treatment of document similarity theory (Harispe et al. 2013):

$r_j = \cos\theta_j = \frac{F_j \cdot V_s}{\|F_j\| \, \|V_s\|}$  (6)
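Equations (2)-(6) can be sketched as follows. This is a minimal Python reading of the construction (the authors used MATLAB), under two labelled assumptions: that $k$ is the sum of the ASCII codes of the word's characters (consistent with the later worked example, where "Many" gives k = 405), and that $r_j$ is reported as the angle arccos of equation (6). The $r$ values in Table 1 exceed pi/2, so the authors' exact computation of $r$ evidently differs in some detail; treat this strictly as an illustrative sketch.

```python
import math

def word_vector(i, word):
    """Equation (2): w = (i, s, k) -- order position, character count, ASCII-code sum."""
    return (float(i), float(len(word)), float(sum(ord(c) for c in word)))

def normalize(v):
    """Equation (4): F = V / |V|."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def constituent_angles(text):
    """Equations (3), (5), (6): angle of each unit vector F_j to the resultant V_s."""
    F = [normalize(word_vector(i, w)) for i, w in enumerate(text.split(), start=1)]
    Vs = tuple(sum(col) for col in zip(*F))                 # equation (5)
    norm_Vs = math.sqrt(sum(x * x for x in Vs))
    # equation (6): each F_j has unit length, so cos(theta_j) = F_j . V_s / |V_s|
    return [math.acos(sum(f * v for f, v in zip(fj, Vs)) / norm_Vs) for fj in F]

angles = constituent_angles("Many theorists have suggested that working memory matters")
print([round(a, 3) for a in angles])
```

Because every component of every $w_{jj}$ is positive, all the angles fall strictly between 0 and pi/2 in this reading.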


2.4 Transformation of Variables

A vector $U_{jj}$ is adopted instead of the vector $w_{jj}$, where the $i$ variable of $w_{jj}$ is replaced by the variable $r$ of $U_{jj}$ (see equation 7). This replacement was made deliberately, because $r$ is considered a very important factor in semantic theory according to Harispe et al. (2013): the vector $U_j$ represents each constituent's degree of influence through the variable $r$, and this leads to a new ranking procedure according to that influence.

$U_{jj} = (r_{jj}, s_{jj}, k_{jj})$  (7)

[Fig. 1 The depiction of the constituent vectors (F1, F2) and the resultant vector $V_s$ in the Euclidean plane.]

2.5 Statistic Foundation

The statistical corroboration of this transformation is achieved by testing whether the three (3) variables $(r, s, k)$ are associated. This is possible by applying Kendall's Coefficient of Concordance (Zar 1999), as demonstrated in the formulas below. The data of each variable are ranked, and for each constituent $j$ the rank sum is computed:

$R_j = \sum_{m=1}^{M} \operatorname{rank}_m(j)$  (8)

$\bar{R} = \frac{1}{n} \sum_{j=1}^{n} R_j$  (9)

$W = \frac{12 \sum_{j=1}^{n} (R_j - \bar{R})^2}{M^2 (n^3 - n)}$  (10)

where $M$ represents the number of variables being correlated (in this case $M = 3$) and $n$ stands for the number of constituents in the short text.

The statistical control is based on the null hypothesis $H_0$, that these three variables are not associated, against the alternative hypothesis $H_A$, which indicates the exact opposite, in order to define the number of constituents that influence the short text. The chi-square test is defined by:

$\chi_r^2 = M (n - 1) W$  (11)
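Equations (8)-(11) can be sketched as follows, assuming the standard Kendall's W as given in Zar (1999) without the tie-correction term; mid-ranks for ties are implemented by hand so that the sketch stays within the standard library.

```python
def midranks(values):
    """Rank data, averaging ranks over ties (mid-ranks), as Kendall's W requires."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average of 1-based positions i..j
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def kendall_w(variables):
    """Equations (8)-(11): rank sums R_j, Kendall's W and the chi-square statistic."""
    M, n = len(variables), len(variables[0])
    ranked = [midranks(v) for v in variables]
    R = [sum(ranked[m][j] for m in range(M)) for j in range(n)]   # equation (8)
    Rbar = sum(R) / n                                              # equation (9)
    S = sum((rj - Rbar) ** 2 for rj in R)
    W = 12.0 * S / (M ** 2 * (n ** 3 - n))                         # equation (10)
    chi2 = M * (n - 1) * W                                         # equation (11)
    return R, W, chi2

# Three perfectly concordant "variables" over n = 4 constituents give W = 1.
R, W, chi2 = kendall_w([[1, 2, 3, 4], [10, 20, 30, 40], [0.1, 0.2, 0.3, 0.4]])
print(W, chi2)  # -> 1.0 9.0
```

With M = 3 perfectly agreeing rankings, W reaches its maximum of 1 and the chi-square statistic equals M(n-1); disagreement among the rankings pushes W toward 0.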


The value of the chi-square cumulative distribution is extracted by using $n - 1$ degrees of freedom and is calculated by equation (12):

$P = \frac{1}{2^{\nu/2} \, \Gamma(\nu/2)} \int_0^{\chi_r^2} t^{\nu/2 - 1} e^{-t/2} \, dt, \qquad \nu = n - 1$  (12)

This test is a one-tailed test, because we search for the appropriate constituent number for a short text. The null hypothesis $H_0$ is then accepted when $P \le 0.001$, and consequently $H_A$ is satisfied when $P > 0.001$.
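The cumulative probability of equation (12) is the regularized lower incomplete gamma function P(nu/2, x/2). A standard-library sketch using its power series (the paper's computation was done in MATLAB; in practice one would call a library routine such as scipy.stats.chi2.cdf):

```python
import math

def chi2_cdf(x, df):
    """Equation (12): P(chi-square <= x) with df degrees of freedom, via the
    power series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:        # accumulate series terms until negligible
        n += 1
        term *= half_x / (a + n)
        total += term
    # scale by half_x**a * exp(-half_x) / Gamma(a), computed in log space
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))

# df = 2 has the closed form 1 - exp(-x/2), which the series must reproduce
print(abs(chi2_cdf(2.0, 2) - (1.0 - math.exp(-1.0))) < 1e-9)
```

The decision rule above then compares this P against the 0.001 threshold.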

3. Experimental Part

This experiment consists of the following steps:

1. First, a sample was gathered in order to apply the algorithm. The data set of this research consists of 100 short texts extracted from abstracts of scientific articles. The scientific articles were acquired in PDF form from the Directory of Open Access Journals (DOAJ), all coming from the biomedicine field with regard to their subject. Using a common browser, it was possible to visit the DOAJ website and download all 100 articles into a local folder.

2. Next, 100 short texts were selected, each with an average length of more than 25 constituents. Each short text was processed up to the length of 25 constituents and no further, for the reasons below:

Since this algorithm is based on VSM, there is a limitation concerning long texts. Their representation cannot be successful, because at such lengths few similarity variables can be found, according to Salton (1975).

Another reason for choosing the limit of 25 constituents for the experiment resulted from the experimental procedure itself, which showed that beyond the 25th constituent the results did not change substantially.

Finally, that limitation applies to the whole sample, since the selected sample must be homogeneous in order to derive valid and objective conclusions at a general level.

Finally, the compilation of the algorithm and the implementation process were carried out using the MATLAB software.

    3.1 Implementation Algorithm and Statistical Process

    In this section the implementation of the proposed algorithm will be presented by using as an example the

    following short text (see Table 1):

    Many theorists have suggested that working memory capacity plays a crucial role in reading comprehension

    however, traditional measures of short-term memory, like digit span and word span, are either not correlated or

    only weakly correlated with reading ability

    In the implementation part by equations 2-6, the variables data (r,s,k) are extracted (see figure 7). In Kendalls

    Coefficient of Concordance procedure the variables data (r, s, k) are ranked and Rank (Rj) sums are estimated for

    each constituent by equations 8-10.

    For example the constituent many, which is the first word on the above mentioned example, corresponds to a

    vector with deviation angle equal to 1.6435 in relevance with the resultant vector of the short text. S equals to

    value 4 (number of characters) and k according to ASCII encoding corresponds to 405. In the same way, the values

    for all 37 constituents of the short text are extracted, as one can see in Table 1.

After the process analyzed above, the chi-square value with 38 - 1 = 37 degrees of freedom is calculated by equation 2, and from this value the cumulative probability P = 3.5055e-06 is obtained through equation 12. Finally, by using the one-tailed probability test at the 0.001 level, the null hypothesis H0 is accepted, indicating that there is no association among the three variables (r, s, k).
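The rank-and-test machinery of this subsection can be sketched as a generic implementation of Kendall's coefficient of concordance W with average ranks for ties (no tie correction), together with the associated chi-square statistic χ² = m(n-1)W; this is an illustrative stand-in, not the paper's actual MATLAB code.

```python
# Generic sketch of Kendall's coefficient of concordance W and the
# associated chi-square statistic (average ranks, no tie correction);
# not the paper's own MATLAB implementation.

def average_ranks(values):
    """Rank values 1..n, assigning tied values their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2.0          # average of positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def kendalls_w(variables):
    """variables: m lists, each scoring the same n objects."""
    m, n = len(variables), len(variables[0])
    rank_sums = [sum(col) for col in zip(*(average_ranks(v) for v in variables))]
    mean_r = m * (n + 1) / 2.0
    s = sum((rj - mean_r) ** 2 for rj in rank_sums)
    w = 12.0 * s / (m ** 2 * (n ** 3 - n))
    chi2 = m * (n - 1) * w               # chi-square with n-1 df
    return w, chi2

# Three perfectly concordant variables over 5 objects give W = 1:
w, chi2 = kendalls_w([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [2, 4, 6, 8, 10]])
print(w, chi2)  # 1.0 12.0
```

With the paper's values (m = 3 variables, n = 38 constituents, W = 0.8023) the same formula gives χ² = 3 × 37 × 0.8023 ≈ 89.06, matching the reported statistic.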

    Journal of Quantitative Linguistics

    Acceptance Date 5 Dec. 2014

    in press -Volume 22 Issue 3

    Taylor & Francis Group


Table 1. Algorithm implementation - Kendall's Coefficient of Concordance - decision on H0 (an example short text)

Word    r (data, rank)    s (data, rank)    k (data, rank)    Sum of Rj
1       1.6435, 28        4, 12.0           405, 8.0          48.0
2       1.6699, 29        9, 32.5           997, 33.0         94.5
3       1.3755, 23        4, 12.0           420, 9.0          44.0
4       1.5487, 26        9, 32.5           971, 32.0         90.5
5       1.1232, 16        4, 12.0           433, 11.0         39.0
6       1.3378, 22        7, 25.5           769, 28.0         75.5
7       1.1818, 18        6, 21.0           665, 22.0         61.0
8       1.2430, 21        8, 30.0           846, 30.0         81.0
9       0.8526, 11        5, 18.0           553, 19.0         48.0
10      4.1013, 37        1, 1.0            97, 1.0           39.0
11      0.9320, 12        7, 25.5           739, 26.0         63.5
12      0.2012, 4         4, 12.0           434, 12.5         28.5
13      1.6753, 30        2, 3.0            215, 3.0          36.0
14      0.6862, 7         7, 25.5           730, 24.5         57.0
15      1.1718, 17        13, 38.0          1402, 38.0        93.0
16      0.6565, 6         8, 30.0           812, 29.0         65.0
17      0.9587, 13        11, 37.0          1179, 37.0        87.0
18      0.5982, 5         8, 30.0           869, 31.0         66.0
19      3.3125, 35        2, 3.0            213, 2.0          40.0
20      0.6884, 8         10, 35.0          1045, 34.0        77.0
21      0.0927, 1         7, 25.5           709, 23.0         49.5
22      1.2065, 20        4, 12.0           421, 10.0         42.0
23      0.7047, 9         5, 18.0           529, 18.0         45.0
24      1.3804, 24        4, 12.0           434, 12.5         48.5
25      2.8707, 33        3, 6.0            307, 5.0          44.0
26      1.5666, 27        4, 12.0           444, 14.5         53.5
27      1.4494, 25        5, 18.0           478, 17.0         60.0
28      3.3433, 36        3, 6.0            312, 6.0          48.0
29      0.8056, 10        6, 21.0           641, 20.0         51.0
31      3.3023, 34        3, 6.0            337, 7.0          47.0
32      0.1112, 3         10, 35.0          1061, 35.5        73.5
33      6.3095, 38        2, 3.0            225, 4.0          45.0
34      2.4095, 31        4, 12.0           450, 16.0         59.0
35      1.1958, 19        6, 21.0           653, 21.0         61.0
36      0.1046, 2         10, 35.0          1061, 35.5        72.5
37      2.8506, 32        4, 12.0           444, 14.5         58.5
38      1.1168, 15        7, 25.5           730, 24.5         65.0

W = 0.8023, chi-square = 89.06, P = 3.5055e-06: H0 accepted for P < 0.001

    3.2 Iteration Procedure

By using the same example of a short text, the algorithm is applied. The iteration procedure is carried out with j = 3:1:38 words. In other words, the original short text is segmented into short texts of various lengths, beginning from 3 constituents up to 38, and the procedure of Section 3.1 is executed for each of these short texts. Then the chi-square test is applied thirty-five (35) times (see figure 2) and the cumulative probabilities are estimated, as well as the control of hypothesis testing (see figure 3).
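The segmentation step of the iteration can be sketched as a simple prefix loop; `analyse` below is a hypothetical placeholder for the full per-prefix analysis of Section 3.1 (variable extraction, ranking, Kendall's W and the chi-square test), which is not reproduced here.

```python
# Sketch of the iteration procedure: the text is split into word
# prefixes of length j = 3, 4, ..., n, and each prefix is analysed as in
# Section 3.1. `analyse` is a hypothetical stand-in for that analysis.

def prefixes(text, start=3):
    words = text.split()
    return [words[:j] for j in range(start, len(words) + 1)]

def analyse(words):
    # placeholder: the real procedure extracts (r, s, k) per word,
    # computes Kendall's W, the chi-square value and its probability
    return len(words)

text = "Many theorists have suggested that working memory capacity plays a crucial role"
for p in prefixes(text):
    analyse(p)
print([len(p) for p in prefixes(text)])  # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

For the full 38-word example this loop corresponds to MATLAB's `j = 3:1:38`.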



Fig. 2 The chi-square distribution function according to the iterative procedure of the segmented short text

    Fig. 3 The cumulative probabilities and the control of hypothesis testing

    3.3 Iteration Procedure in the Data Set

The iteration procedure for each short text, in a range of 3 up to 25 constituents, is executed twenty-two (22) times in total. The procedure is executed for a sample of 100 data sets, giving 2200 calculations in total. Then the chi-square test of the data set is represented (see figure 4), and the cumulative probabilities as well as the control of hypothesis testing are presented in figures 5 and 6 respectively.



    Fig. 4 The cumulative probabilities and the control of hypothesis testing

    Fig. 5 Data Set of Probability Cumulative Distribution

    Fig. 6 The Cumulative Probabilities and the Control of Hypothesis Testing



Assessing the above statistical results, we divide them into three parts and make the following observations:

1. As one can see from the distribution in figure 4 (the cumulative probabilities and the control of hypothesis testing via the experimental function), coherence is observed for the whole sample of 100 short texts up to the sentence length of 14 constituents. From 14 constituents onwards, a lack of coherence in the experimental function can be observed.

2. According to the experiment as represented in figure 5 (Data Set of Probability Cumulative Distribution), all probabilities estimated using equation 11 reject the null hypothesis for a short text length equal to 14 constituents, with a probability, p


    7 21 0 7

    7 22 0 7

    6 23 0 6

    6 24 0 6

3 25 0 3

3 26 0 3

    2 27 0 2

We present an example derived from Table 2 in the following sentence:

"Die Computerindustrie hätte nach Michael Levitt einen Teil des Nobelpreises für Chemie 2013 verdient, denn ihre Forschungs- und Entwicklungsleistung hatte zu drastisch höheren Rechengeschwindigkeiten geführt (siehe Tabelle)."

(In English: according to Michael Levitt, the computer industry would have deserved a share of the 2013 Nobel Prize in Chemistry, because its research and development achievements had led to drastically higher computing speeds.)

    Fig. 7 The German Case of the Cumulative Probabilities and the Control of Hypothesis Testing

Fig. 8 Data Set of Probability Cumulative Distribution from ten (10) German short texts

[Figure 7: x-axis Number of Words (3-30); y-axis Cumulative Probabilities (-0.1 to 0.6)]

[Figure 8: x-axis Number of Words (3-30); y-axis P Values (0 to 1)]



As can be seen in Table 2 and in figures 7 and 8, the results of the experiments conducted with German short texts are in agreement with the main large-scale experiment carried out at the beginning of the experimental part. This suggests that the constituents of a short text in the proposed model can be considered as morphs, where the linguistic origin of the short text is disregarded and therefore the language of the text is not significant. More specifically, according to figures 7 and 8, the cumulative probability begins to decline drastically between the 15th and 16th word, where p


the possible relation between the number of variables and the number of constituents in extended fields (such as biology) should be considered as an axis of the system's coherence and as the upcoming scientific priority.

    References

Altmann, G. (1980). Prolegomena to Menzerath's law. Glottometrika, 2, 1-10. Bochum: Brockmeyer.

Anderson, J. R., Budiu, R., & Reder, L. M. (2001). A theory of sentence memory as part of a general theory of memory. Journal of Memory and Language, 45, 337-367.

Baddeley, A. (2003). Working memory: looking back and looking forward. Nature Reviews Neuroscience, 4, 829-839.

Baixeries, J., et al. (2013). The parameters of the Menzerath-Altmann law in genomes. Journal of Quantitative Linguistics, 20(2), 94-104.

Baixeries, J., Hernández-Fernández, A., & Ferrer-i-Cancho, R. (2012). Random models of Menzerath-Altmann law in genomes. Biosystems, 107(3), 167-173.

Carstens, W. (2001). Text linguistics: relevant linguistics? Poetics and Linguistics: Discourses of War and Conflict Conference, 588-595.

Cutts, M. (2009). Oxford Guide to Plain English. 3rd ed. Oxford: Oxford University Press.

Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450-466.

Eroglu, S. (2013). Menzerath-Altmann law for distinct word distribution analysis in a large text. Physica A, 392(12), 2775-2780.

Fahnestock, J. (1983). Semantic and lexical coherence. College Composition and Communication, 34(4), 400-416. National Council of Teachers of English.

Forns, N., et al. (2013). The challenges of statistical patterns of language: the case of Menzerath's law in genomes. Complexity, 18(3), 11-17.

Ge Song, et al. (2014). Short text classification: a survey. Journal of Multimedia, 9(5), 635-643. Academy Publisher.

Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.

Harispe, S., et al. (2013). Semantic measures for the comparison of units of language, concepts or entities from text and knowledge base analysis. arXiv:1310.1285, 1-159.

Harispe, S., et al. (2013). The semantic measures library and toolkit: fast computation of semantic similarity and relatedness using biomedical ontologies. Bioinformatics, 30(5). Oxford: Oxford University Press.

Kornai, A. (2008). Mathematical Linguistics. Advanced Information and Knowledge Processing. London: Springer.

Hřebíček, L. (2002). Zipf's law and text. Glottometrics, 3, 27-38. Ram-Verlag.

Poeppel, D., & Embick, D. (2005). Defining the relation between linguistics and neuroscience. Twenty-first Century Psycholinguistics: Four Cornerstones, 103-118.

Poulos, M., Papavlasopoulos, S., & Chrissikopoulos, V. (2006). A text categorization technique based on a numerical conversion of a symbolic expression and an onion layers algorithm. Journal of Digital Information, 6(1).

Raghavan, V. V., & Wong, S. K. M. (1986). A critical analysis of vector space model for information retrieval. Journal of the American Society for Information Science, 37(5), 279-287.

Richards, J. C., Platt, J., & Platt, H. (1992). Longman Dictionary of Language Teaching and Applied Linguistics. London: Longman.

Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620.



Sparks, J. R. (2012). Language/discourse comprehension and understanding. In Encyclopedia of the Sciences of Learning, 1713-1717. Springer.

Taskar, B., et al. (2004). Max-margin parsing. Proceedings of EMNLP 2004, 1-8.

Těšitelová, M. (1992). Quantitative Linguistics. Linguistics and Literary Studies in Eastern Europe (LLSEE), 37. Philadelphia: John Benjamins.

Turney, P. D., & Pantel, P. (2010). From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research, 37(1), 141-188.

Quan, X., Liu, G., et al. (2009). Short text similarity based on probabilistic topics. Knowledge and Information Systems, 473-491.

Zar, J. H. (1999). Biostatistical Analysis. 4th ed. Upper Saddle River, NJ: Prentice Hall.

Zimny, S. T. (1987). Recognition memory for sentences from a discourse. Unpublished doctoral dissertation. Boulder: University of Colorado.

