transformational grammars

Post on 09-Jan-2016

DESCRIPTION

Transformational grammars. Anastasia Berdnikova & Denis Miretskiy. Overview: transformational grammars (definition); regular grammars; context-free grammars; context-sensitive grammars; break; stochastic grammars; stochastic context-free grammars for sequence modelling.

TRANSCRIPT

  • Transformational grammars
    Anastasia Berdnikova & Denis Miretskiy

    Transformational grammars

  • Overview
    - Transformational grammars: definition
    - Regular grammars
    - Context-free grammars
    - Context-sensitive grammars
    - Break
    - Stochastic grammars
    - Stochastic context-free grammars for sequence modelling

  • Why transformational grammars?
    - The 3-dimensional folding of proteins and nucleic acids
    - Extensive physical interactions between residues

    Chomsky hierarchy of transformational grammars [Chomsky 1956; 1959]
    Application to molecular biology [Searls 1992; Dong & Searls 1994; Rosenblueth et al. 1996]

  • Introduction
    "Colourless green ideas sleep furiously."
    Chomsky constructed finite formal machines: grammars.
    The question "Does the language contain this sentence?" (intractable) is replaced by "Can the grammar create this sentence?" (can be answered).
    TGs are sometimes called generative grammars.

  • Definition
    TG = ( {symbols}, {rewriting rules α → β, called productions} )
    {symbols} = {nonterminals} ∪ {terminals}
    α contains at least one nonterminal; β consists of terminals and/or nonterminals.
    Example: S → aS, S → bS, S → ε (abbreviated S → aS | bS | ε)
    Derivation: S ⇒ aS ⇒ abS ⇒ abbS ⇒ abb.
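
The derivation on the slide above can be replayed mechanically. The following is a minimal sketch, assuming the grammar S → aS | bS | ε with ε written as the empty string; the names `PRODUCTIONS` and `derive` are illustrative and not part of the slides.

```python
# Productions of the example grammar S -> aS | bS | ε (ε is the empty string).
PRODUCTIONS = {"S": ["aS", "bS", ""]}


def derive(choices, start="S"):
    """Apply one production per step to the leftmost nonterminal.

    `choices` gives, for each step, the index of the production to use.
    Returns every sentential form visited, starting with `start`.
    """
    forms = [start]
    current = start
    for choice in choices:
        # Locate the leftmost nonterminal (any symbol that has productions).
        pos = next(i for i, sym in enumerate(current) if sym in PRODUCTIONS)
        rhs = PRODUCTIONS[current[pos]][choice]
        current = current[:pos] + rhs + current[pos + 1:]
        forms.append(current)
    return forms


# Choose S -> aS, then S -> bS twice, then S -> ε.
print(" => ".join(derive([0, 1, 1, 2])))
```

Each entry in `choices` corresponds to one rewriting step, so the list `[0, 1, 1, 2]` reproduces the four steps of the derivation of abb.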

  • The Chomsky hierarchy
    W: any nonterminal; a: any terminal; α and γ: strings of nonterminals and/or terminals, including the null string; β: the same, not including the null string.
    - regular grammars: W → aW or W → a
    - context-free grammars: W → β
    - context-sensitive grammars: α1Wα2 → α1βα2; productions of the form AB → BA are also allowed
    - unrestricted (phrase structure) grammars: α1Wα2 → γ

  • The Chomsky hierarchy


  • Automata
    Each grammar has a corresponding abstract computational device: an automaton.
    Grammars are generative models; automata are parsers that accept or reject a given sequence.
    - Automata are often easier to describe and understand than their equivalent grammars.
    - Automata give a more concrete idea of how we might recognise a sequence using a formal grammar.

  • Parser abstractions associated with the hierarchy of grammars
    ----------------------------------------------------------------------
    Grammar                       Parsing automaton
    ----------------------------------------------------------------------
    regular grammars              finite state automaton
    context-free grammars         push-down automaton
    context-sensitive grammars    linear bounded automaton
    unrestricted grammars         Turing machine
    ----------------------------------------------------------------------

  • Regular grammars
    W → aW or W → a
    Sometimes allowed: W → ε
    RGs generate a sequence from left to right (or from right to left: W → Wa or W → a).
    RGs cannot describe long-range correlations between the terminal symbols (they model primary sequence only).

  • An odd regular grammar
    An example of a regular grammar that generates only strings of a's and b's that have an odd number of a's:
    start from S,
    S → aT | bS,
    T → aS | bT | ε.

  • Finite state automata
    - The automaton reads one symbol at a time from an input string.
    - If the symbol is accepted, the automaton enters a new state.
    - If the symbol is not accepted, the automaton halts and rejects the string.
    - If the automaton reaches a final accepting state, the input string has been successfully recognised and parsed by the automaton.
    {states, state transitions of FSA} ↔ {nonterminals, productions of corresponding grammar}
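
The correspondence between nonterminals and states can be illustrated with the odd-number-of-a's grammar S → aT | bS, T → aS | bT | ε. The sketch below is one possible encoding, not taken from the slides: the table `TRANSITIONS`, the set `ACCEPTING`, and the function `accepts` are assumed names, and T is treated as the accepting state because T → ε lets a derivation stop there.

```python
# Transitions read off the grammar S -> aT | bS, T -> aS | bT | ε.
# Each nonterminal becomes a state; each production W -> xW' becomes
# a transition from state W to state W' on input symbol x.
TRANSITIONS = {
    ("S", "a"): "T",
    ("S", "b"): "S",
    ("T", "a"): "S",
    ("T", "b"): "T",
}
ACCEPTING = {"T"}  # T -> ε allows the derivation to end in T


def accepts(string, start="S"):
    """Return True if the automaton ends in an accepting state."""
    state = start
    for symbol in string:
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:
            return False  # symbol not accepted: halt and reject
        state = next_state
    return state in ACCEPTING


for candidate in ["a", "bab", "aa", "abab", "aaa"]:
    print(candidate, accepts(candidate))
```

Because every (state, symbol) pair has at most one successor, this particular automaton is deterministic and reads each input symbol exactly once.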

  • FMR-1 triplet repeat region
    Human FMR-1 mRNA sequence, fragment:
    . . . GCG CGG CGG CGG CGG CGG CGG CGG CGG
    CGG CGG AGG CGG CGG CGG CGG CGG CGG CGG
    CGG CGG AGG CGG CGG CGG CGG CGG CGG CGG
    CGG CGG CTG . . .

    [Figure: finite state automaton for the repeat region, with states S and 1-8 and transitions labelled a, c, g, t.]

  • Moore vs. Mealy machines
    Finite automata that accept on transitions are called Mealy machines.
    Finite automata that accept on states are called Moore machines (e.g. HMMs).
    The two types of machines are interconvertible:
    S → gW1 in the Mealy machine corresponds to S → g1, 1 → gW1 in the Moore machine.

  • Deterministic vs. nondeterministic automata
    In a deterministic finite automaton, no more than one accepting transition is possible for any state and any input symbol.
    An example of a nondeterministic finite automaton: the FMR-1 automaton.
    Parsing with a deterministic finite state automaton is extremely efficient [BLAST].

  • PROSITE patterns
    RU1A_HUMAN  S R S L K M R G Q A F V I F K E V S S A T
    SXLF_DROME  K L T G R P R G V A F V R Y N K R E E A Q
    ROC_HUMAN   V G C S V H K G F A F V Q Y V N E R N A R
    ELAV_DROME  G N D T Q T K G V G F I R F D K R E E A T

    RNP-1 motif: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM].

    A PROSITE pattern = pattern element - pattern element - ... - pattern element.
    In a pattern element:
    - a letter is the single-letter code for an amino acid;
    - [] means any one of the enclosed residues can occur;
    - {} means any residue except the enclosed ones can occur;
    - x means any residue can occur at this position.
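
The pattern syntax described above maps directly onto regular expressions. The sketch below assumes hyphen-separated pattern elements and handles only the four element types named on the slide (letter, [], {}, x); the function name `prosite_to_regex` and the dictionary `SEQS` are illustrative, and repeat counts such as x(2,4) are deliberately not covered.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern with '-'-separated elements into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.strip()
        if element == "x":
            parts.append(".")                         # any residue
        elif element.startswith("{"):
            parts.append("[^" + element[1:-1] + "]")  # anything but these
        else:
            parts.append(element)                     # a letter or a [..] set
    return "".join(parts)


RNP1 = "[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYM]"
RNP1_REGEX = re.compile(prosite_to_regex(RNP1))

# The four aligned fragments from the slide, with spaces removed.
SEQS = {
    "RU1A_HUMAN": "SRSLKMRGQAFVIFKEVSSAT",
    "SXLF_DROME": "KLTGRPRGVAFVRYNKREEAQ",
    "ROC_HUMAN": "VGCSVHKGFAFVQYVNERNAR",
    "ELAV_DROME": "GNDTQTKGVGFIRFDKREEAT",
}

for name, seq in SEQS.items():
    match = RNP1_REGEX.search(seq)
    print(name, match.group(0) if match else None)
```

Since a regular expression of this kind is itself a regular language, the translation is another view of the statement that PROSITE patterns can be written as regular grammars.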

  • A regular grammar for PROSITE patterns
    S → rW1 | kW1
    W1 → gW2
    W2 → [afilmnqstvwy]W3
    W3 → [agsci]W4
    W4 → fW5 | yW5
    W5 → lW6 | iW6 | vW6 | aW6
    W6 → [acdefghiklmnpqrstvwy]W7
    W7 → f | y | m

    [ac]W means aW | cW

  • What a regular grammar can't do
    An RG cannot describe a language L when:
    - L contains all the strings of the form aa, bb, abba, baab, abaaba, etc. (a palindrome language);
    - L contains all the strings of the form aa, abab, aabaab, etc. (a copy language).

  • Regular language:    a b a a a b
    Palindrome language: a a b b a a
    Copy language:       a a b a a b

    Palindrome and copy languages have correlations between distant positions.

  • Context-free grammars
    The reason: RNA secondary structure is a kind of palindrome language.
    Context-free grammars (CFGs) permit additional rules that allow the grammar to create nested, long-distance pairwise correlations between terminal symbols.
    S → aSa | bSb | aa | bb
    S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabaabaa
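
The nesting in the palindrome grammar S → aSa | bSb | aa | bb can be checked by peeling matching symbols off both ends, which mirrors the derivation in reverse. This is a minimal sketch; the function name `derives` is an assumption for illustration.

```python
def derives(s):
    """Return True if s can be derived from S -> aSa | bSb | aa | bb."""
    # Base productions: S -> aa | bb.
    if s in ("aa", "bb"):
        return True
    # Recursive productions: S -> aSa | bSb. The outermost symbols must
    # match each other, and the inside must again be derivable from S.
    if len(s) > 2 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])
    return False


for candidate in ["aabaabaa", "abba", "abab", "aba"]:
    print(candidate, derives(candidate))
```

The first and last symbols are compared at every level of recursion, which is exactly the long-distance pairwise correlation that a left-to-right regular grammar cannot express.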

  • A context-free grammar for an RNA stem loop
    [Figure: three sequences drawn as candidate stem loops; mismatched pairs are marked with x.]
    seq 1: C A G G A A A C U G
    seq 2: G C U G C A A A G C
    seq 3: G C U G C A A C U G

    S → aW1u | cW1g | gW1c | uW1a
    W1 → aW2u | cW2g | gW2c | uW2a
    W2 → aW3u | cW3g | gW3c | uW3a
    W3 → gaaa | gcaa
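
Because the stem loop grammar S → aW1u | cW1g | gW1c | uW1a (and likewise for W1, W2), W3 → gaaa | gcaa has no recursion, its whole language can be enumerated. The sketch below does so and tests the three sequences as read from the slide's figure; `PAIRS`, `LOOPS`, and `stem_loops` are illustrative names, not part of the slides.

```python
from itertools import product

PAIRS = [("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")]  # base pairs in S, W1, W2
LOOPS = ["gaaa", "gcaa"]                                   # productions of W3


def stem_loops(stem_length=3):
    """Yield every string the grammar derives: nested pairs around a loop."""
    for pairs in product(PAIRS, repeat=stem_length):
        left = "".join(p[0] for p in pairs)
        # The 3' side pairs with the 5' side in reverse order (nesting).
        right = "".join(p[1] for p in reversed(pairs))
        for loop in LOOPS:
            yield left + loop + right


LANGUAGE = set(stem_loops())

for name, seq in [("seq 1", "CAGGAAACUG"),
                  ("seq 2", "GCUGCAAAGC"),
                  ("seq 3", "GCUGCAACUG")]:
    print(name, seq.lower() in LANGUAGE)
```

Three nested pair choices times two loops give 4 × 4 × 4 × 2 = 128 strings in total. Enumeration only works here because the grammar is finite; a recursive grammar needs a parser instead.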

  • Parse trees
    Root: the start nonterminal S; leaves: the terminal symbols in the sequence; internal nodes: nonterminals.
    The children of an internal node are the symbols on the right-hand side of the production applied to it.
    Any subtree derives a contiguous segment of the sequence.
    [Figure: parse trees for RNA stem loops, drawn from the 5' end to the 3' end, with S, W1, W2, W3 as internal nodes.]

  • Parse tree for a PROSITE pattern
    Parse tree for the RNP-1 motif RGQAFVIF.
    Regular grammars are linear special cases of context-free grammars.
    The parse tree for a regular grammar is a standard linear alignment of the grammar nonterminals to the sequence terminals:
    S   W1  W2  W3  W4  W5  W6  W7
    r   g   q   a   f   v   i   f

  • Push-down automata
    The parsing automaton for CFGs is called a push-down automaton.
    A limited number of symbols are kept in a push-down stack.
    A push-down automaton parses a sequence from left to right according to the algorithm.
    - The stack is initialised by pushing the start nonterminal onto it.
    - The steps are iterated until no input symbols remain.
    - If the stack is empty at the end, then the sequence has been successfully parsed.

  • Algorithm: Parsing with a push-down automaton
    Pop a symbol off the stack.
    If the popped symbol is a nonterminal:
    - Peek ahead in the input from the current position and choose a valid production for the nonterminal. If there is no valid production, terminate and reject the sequence.
    - Push the right side of the chosen production rule onto the stack, rightmost symbols first.
    If the popped symbol is a terminal:
    - Compare it to the current symbol of the input. If it matches, move the automaton to the right on the input (the input symbol is accepted). If it does not match, terminate and reject the sequence.
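
The algorithm above can be sketched for the RNA stem loop grammar, writing the nonterminals W1, W2, W3 as 1, 2, 3. This is one possible reading, under two assumptions that the slides do not spell out: "peek ahead" is implemented by comparing the leading terminals of each production with the upcoming input, and the loop runs until the stack is empty and then checks that all input was consumed. The names `GRAMMAR`, `leading_terminals`, and `parse` are illustrative.

```python
GRAMMAR = {
    "S": ["a1u", "c1g", "g1c", "u1a"],
    "1": ["a2u", "c2g", "g2c", "u2a"],
    "2": ["a3u", "c3g", "g3c", "u3a"],
    "3": ["gaaa", "gcaa"],
}


def leading_terminals(rhs):
    """Terminals at the start of a right-hand side, up to the first nonterminal."""
    prefix = ""
    for symbol in rhs:
        if symbol in GRAMMAR:
            break
        prefix += symbol
    return prefix


def parse(sequence, start="S"):
    """Return True if the push-down automaton accepts `sequence`."""
    stack = [start]
    pos = 0  # current position in the input
    while stack:
        symbol = stack.pop()
        if symbol in GRAMMAR:
            # Peek ahead: keep productions whose leading terminals match the input.
            valid = [rhs for rhs in GRAMMAR[symbol]
                     if sequence.startswith(leading_terminals(rhs), pos)]
            if not valid:
                return False  # no valid production: reject
            # Push the right-hand side, rightmost symbol first.
            stack.extend(reversed(valid[0]))
        elif pos < len(sequence) and sequence[pos] == symbol:
            pos += 1  # terminal matches: accept it and move right
        else:
            return False  # terminal mismatch: reject
    # Accept only if the stack emptied exactly when the input ran out.
    return pos == len(sequence)


print(parse("gccgcaaggc"))
print(parse("gcugcaacug"))
```

For this grammar at most one production is ever valid at a given position, so taking `valid[0]` involves no guessing; an ambiguous grammar would need backtracking or a different parsing strategy.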

  • Parsing an RNA stem loop with a push-down automaton
    Input string    Stack    Automaton operation on stack and input
    GCCGCAAGGC      S        Pop S. Peek at input; produce S → g1c.
    GCCGCAAGGC      g1c      Pop g. Accept g; move right on input.
    GCCGCAAGGC      1c       Pop 1. Peek at input; produce 1 → c2g.
    GCCGCAAGGC      c2gc     Pop c. Accept c; move right on input.
    GCCGCAAGGC      2gc      Pop 2. Peek at input; produce 2 → c3g.
    GCCGCAAGGC      c3ggc    Pop c. Accept c; move right on input.
    (several acceptances)
    GCCGCAAGGC      c        Pop c. Accept c; move right on input.
    GCCGCAAGGC      -        Stack empty. Input string empty. Accept.

  • Context-sensitive grammars
    Copy language: cc, acca, agaccaga, etc.
    initialisation: S → CW
    nonterminal generation: W → AÂW | GĜW | C
    nonterminal reordering: ÂG → GÂ, ÂA → AÂ, ĜA → AĜ, ĜG → GĜ, ÂC → Ca, ĜC → Cg
    terminal generation: CA → aC, CG → gC
    termination: CC → cc

  • Linear bounded automaton
    A mechanism for working backwards through all possible derivations: either the start is reached, or no valid derivation is found.
    There is a finite number of possible derivations to examine.
    Abstractly: a tape of linear memory and a read/write head.
    The number of possible derivations is exponentially large.

  • NP problems and intractabilityNondete