Com2010 - Functional Programming
Regular Expressions and Abstract Data Types
Marian Gheorghe
Lecture 14
Module homepage Mole & http://www.dcs.shef.ac.uk/~marian
©University of Sheffieldcom2010
Functions over the type of REs are defined by recursion over the structure of the expression. Example
literals :: RegExp -> [Char]
literals Epsilon = []
literals (Literal ch) = [ch]
literals (Or r1 r2) = literals r1 ++ literals r2
literals (Then r1 r2) = literals r1 ++ literals r2
literals (Star r) = literals r
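which returns the list of the literals (characters) occurring in an RE.
Take re1 denoting ('a'|('b''c')), i.e. re1 = Or a (Then b c), where a, b and c stand for the single-character REs for 'a', 'b' and 'c' (presumably Literal 'a', Literal 'b' and Literal 'c').
This leads to literals re1 ⇒ "abc"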
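The example can be tried out with the sketch below. The RegExp type is not shown in this section, so the data declaration and the shorthands a, b, c are assumptions based on the constructors used in literals.

```haskell
-- Assumed definition of the RE type (not shown in this section).
data RegExp
  = Epsilon
  | Literal Char
  | Or RegExp RegExp
  | Then RegExp RegExp
  | Star RegExp

-- Collect the literal characters of an RE, left to right.
literals :: RegExp -> [Char]
literals Epsilon      = []
literals (Literal ch) = [ch]
literals (Or r1 r2)   = literals r1 ++ literals r2
literals (Then r1 r2) = literals r1 ++ literals r2
literals (Star r)     = literals r

-- Assumed shorthands for single-character REs.
a, b, c :: RegExp
a = Literal 'a'
b = Literal 'b'
c = Literal 'c'

re1 :: RegExp
re1 = Or a (Then b c)
```

Note that literals keeps duplicates: for Or a a it returns "aa", since the two sub-results are simply appended.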
RE - Examples
REs are patterns, and we may match a word w against each RE:
ε: w matches ε if w is the empty word
x: w matches x if w is the single character x (x an arbitrary ASCII character)
(r1|r2): w matches (r1|r2) if w matches either r1 or r2 (or both)
(r1r2): w matches (r1r2) if w can be split into two subwords w1 and w2, w = w1++w2, so that w1 matches r1 and w2 matches r2
(r)*: w matches (r)* if w can be split into zero or more subwords, w = w1++w2++…++wn, each of which matches r. The zero case implies that the empty word matches (r)* for any RE r
Matching REs
The first three cases are a simple transliteration of the definitions:
matches :: RegExp -> String -> Bool
matches Epsilon st = (st == "")
matches (Literal ch) st = (st == [ch])
matches (Or x y) st = matches x st || matches y st
Catenation needs an auxiliary function:
splits :: String -> [(String,String)]
splits st = [(take n st, drop n st) | n <- [0..length st]]
splits "123" ⇒ [("","123"),("1","23"),("12","3"),("123","")]
Catenation:
matches (Then x y) st = foldr (||) False
    [matches x st1 && matches y st2 | (st1,st2) <- splits st]
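The split-and-test idea behind the catenation case can be looked at on its own. The sketch below repeats splits and adds a helper, existsSplit, which is a hypothetical name introduced only for illustration; it uses the Prelude function or, which behaves like foldr (||) False on a list of Booleans.

```haskell
-- All ways of cutting a string into a prefix and a suffix.
splits :: String -> [(String, String)]
splits st = [ (take n st, drop n st) | n <- [0 .. length st] ]

-- Hypothetical helper: is there a split whose first half satisfies p
-- and whose second half satisfies q?
-- `or` over the list plays the role of foldr (||) False.
existsSplit :: (String -> Bool) -> (String -> Bool) -> String -> Bool
existsSplit p q st = or [ p st1 && q st2 | (st1, st2) <- splits st ]
```

For instance, existsSplit (== "a") (== "bc") "abc" holds because the split ("a","bc") is among the four splits of "abc". A word of length n always has n+1 splits, so the empty word has exactly one, ("","").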
Matching against REs
matches (Star r) st = matches Epsilon st ||
    foldr (||) False
        [matches r st1 && matches (Star r) st2 |
            (st1,st2) <- splits st]
This uses the fact that Star r means Epsilon or (r followed by Star r).
Examples:
matches (Or Epsilon (Then a (Then b c))) "abc" ⇒ True
matches (Star (Or Epsilon b)) "b" ⇒ ERROR - Control stack overflow -- OOPS!!
The problem is that splits st includes the pair ([],st). When r matches the empty word, this split makes matches (Star r) st call itself on the same string st, so the recursion never terminates. The empty word is already handled by the first disjunct, so the splits used for Star should leave out the pair ([],st).
Matches against Star r
matches (Star r) st = matches Epsilon st ||
    foldr (||) False
        [matches r st1 && matches (Star r) st2 |
            (st1,st2) <- frontSplits st]
where
frontSplits :: String -> [(String,String)]
frontSplits st = [(take n st, drop n st) |
    n <- [1..length st]]
matches (Star (Or Epsilon b)) "b" ⇒ True
"b" has been successfully matched against the RE (ε|'b')*.
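Putting the five cases together gives a complete matcher. The sketch below assumes the same RegExp declaration as the constructors suggest (it is not shown in this section) and writes or in place of foldr (||) False; otherwise it follows the equations above.

```haskell
-- Assumed definition of the RE type (not shown in this section).
data RegExp
  = Epsilon
  | Literal Char
  | Or RegExp RegExp
  | Then RegExp RegExp
  | Star RegExp

-- Every split of a word, including the ones with an empty half.
splits :: String -> [(String, String)]
splits st = [ (take n st, drop n st) | n <- [0 .. length st] ]

-- Splits whose first half is non-empty.
frontSplits :: String -> [(String, String)]
frontSplits st = [ (take n st, drop n st) | n <- [1 .. length st] ]

matches :: RegExp -> String -> Bool
matches Epsilon      st = st == ""
matches (Literal ch) st = st == [ch]
matches (Or x y)     st = matches x st || matches y st
matches (Then x y)   st =
  or [ matches x st1 && matches y st2 | (st1, st2) <- splits st ]
-- The recursive call on Star r always gets a strictly shorter word,
-- because frontSplits never returns an empty first half.
matches (Star r)     st =
  matches Epsilon st ||
  or [ matches r st1 && matches (Star r) st2 | (st1, st2) <- frontSplits st ]
```

Termination of the Star case rests on st2 being strictly shorter than st in every element of frontSplits st, which is exactly what the pair ([],st) violated.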
Matches against Star r - again
21.1 Representing Rationals
21.2 Haskell modules
Abstract Data Types
Data are abstract in the sense that the programmer does not need to care about how they are implemented.
Their implementation is hidden from the user. All that the programmer needs to know are the generic operations for constructing and manipulating elements of the data type at hand.
Data abstraction is a very important design principle: it separates the definition or representation of a data type from its use.
Abstract data types - Introduction
Goal: designing a system to perform simple arithmetic operations with rational numbers.
We’ll start by representing rationals …
Every rational number r is of the form r = n/d, with n the numerator and d the denominator (d ≠ 0). It will be represented as a pair (n,d).
Thus the type of rational numbers:
type Rat = (Int,Int)
What’s next?
Representing Rationals
Implement operations
In order to implement addition and multiplication of rational numbers so that they respect the usual priority rules of these operations, we write:
infixl 7 `rmult`
infixl 6 `radd`
rmult :: Rat -> Rat -> Rat
rmult (n_1,d_1) (n_2,d_2) = (n_1*n_2,d_1*d_2)
radd :: Rat -> Rat -> Rat
radd (n_1,d_1) (n_2,d_2) = (n_1*d_2+n_2*d_1,d_1*d_2)
Now try (2,1) `radd` (2,1) `rmult` (3,2)
Remove the priority rules (the infixl declarations) and try it again. What happens?
With the priority rules in place, (2,1) `radd` (2,1) `rmult` (3,2) means
(2,1) `radd` ((2,1) `rmult` (3,2))
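The effect of the fixity declarations can be checked with the small sketch below. The names withPriorities and leftToRight are introduced here only for illustration. Operators without a fixity declaration default to infixl 9 in Haskell, so removing the two declarations makes the expression group from the left; the second definition spells that grouping out with parentheses.

```haskell
type Rat = (Int, Int)

infixl 7 `rmult`
infixl 6 `radd`

rmult :: Rat -> Rat -> Rat
rmult (n1, d1) (n2, d2) = (n1 * n2, d1 * d2)

radd :: Rat -> Rat -> Rat
radd (n1, d1) (n2, d2) = (n1 * d2 + n2 * d1, d1 * d2)

-- Parsed with the declared priorities: multiplication binds tighter.
withPriorities :: Rat
withPriorities = (2,1) `radd` (2,1) `rmult` (3,2)      -- (10,2), i.e. 5

-- The grouping obtained without the declarations (both infixl 9),
-- written here with explicit parentheses.
leftToRight :: Rat
leftToRight = ((2,1) `radd` (2,1)) `rmult` (3,2)        -- (12,2), i.e. 6
```

The two groupings denote different rationals, 2 + 2*(3/2) = 5 and (2 + 2)*(3/2) = 6, which is why the priorities matter.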
Addition and multiplication
Converting integers into rationals:
mkrat :: Int -> Int -> Rat
mkrat _ 0 = error "denominator 0"
mkrat n d = (n,d)
Checking that two rationals are equal:
infix 4 `requ`
requ :: Rat -> Rat -> Bool
requ (n_1,d_1) (n_2,d_2) = (n_1*d_2 == n_2*d_1)
Inverse of a rational:
rinv :: Rat -> Rat
rinv (0,_) = error "no inverse"
rinv (n,d) = (d,n)
More functions
A module to define a data type Rat and the operations mkrat, radd, rmult, requ, and rinv will have the following layout:
module Rationals where
type Rat …
mkrat …
This module may also contain functions to subtract and divide rational numbers.
infixl 7 `rdiv`
infixl 6 `rdiff`
rdiv :: Rat -> Rat -> Rat
rdiv x y = x `rmult` (rinv y)
rdiff :: Rat -> Rat -> Rat
rdiff x y = x `radd` (mkrat (-1) 1) `rmult` y
Module Rationals
1. rmult and rdiv on the one hand and radd and rdiff on the other hand have the same priority level
2. all the operations radd, rdiff, rmult, rdiv are left associative
3. rdiv and rdiff are defined without referring to the specific representation of the type Rat
Our encoding of rational numbers is not an exact representation:
1. it contains improper elements: the pairs (n,0) do not correspond to any rational number, yet some operations (radd, rmult, requ) accept them without complaint!
2. the representation is redundant: 1/3 has infinitely many representations, namely all the pairs (n, 3*n) with n ≠ 0!
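Both defects are easy to observe. The sketch below repeats radd and requ on the pair representation; the names sameValue, samePair, improperSum and zeroZero are hypothetical and only label the four experiments.

```haskell
type Rat = (Int, Int)

radd :: Rat -> Rat -> Rat
radd (n1, d1) (n2, d2) = (n1 * d2 + n2 * d1, d1 * d2)

requ :: Rat -> Rat -> Bool
requ (n1, d1) (n2, d2) = n1 * d2 == n2 * d1

-- Redundancy: two different pairs stand for the same rational 1/3.
sameValue :: Bool
sameValue = (1,3) `requ` (2,6)          -- True

samePair :: Bool
samePair = ((1,3) :: Rat) == (2,6)      -- False

-- Improper elements: a zero denominator passes straight through radd.
improperSum :: Rat
improperSum = (1,0) `radd` (1,2)        -- (2,0)

-- The pair (0,0) is requ-equal to every other pair.
zeroZero :: Bool
zeroZero = (0,0) `requ` (5,7)           -- True
```

The last experiment shows that an improper pair can even make requ relate values that have nothing to do with each other.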
Observations
To remove redundant representatives a function reduce may be used:
reduce :: Rat -> Rat
reduce (_,0) = error "denominator 0"
reduce (x,y) = (x `div` d, y `div` d)
    where d = gcd x y
gcd is a built-in function that computes the greatest common divisor of two integers.
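One detail worth checking is the sign: the Prelude gcd always returns a non-negative result, so reduce leaves a negative denominator negative, and (1,-3) and (-1,3) stay two different pairs for the same rational. The sketch below repeats reduce and adds normalise, a hypothetical variant (not part of the lecture's module) that also moves the sign into the numerator.

```haskell
type Rat = (Int, Int)

reduce :: Rat -> Rat
reduce (_, 0) = error "denominator 0"
reduce (x, y) = (x `div` d, y `div` d)
  where d = gcd x y

-- Hypothetical variant: divides by the gcd carrying the sign of the
-- denominator, so the resulting denominator is always positive.
normalise :: Rat -> Rat
normalise (_, 0) = error "denominator 0"
normalise (x, y) = (x `div` d, y `div` d)
  where d = signum y * gcd x y
```

With normalise, every rational has exactly one representative, so equality of rationals could in principle be decided by comparing pairs directly.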
The module Rationals contains
the definition of a data type Rat and
the operations mkrat, radd, rmult, requ, rinv, reduce
Remove redundancy
Problem: build an application that uses Rationals to compute linear combinations of rationals
Given k integers n_1, …, n_k and k rational numbers r_1, …, r_k, the sum n_1*r_1 + … + n_k*r_k
is called a linear combination.
Ex: 2*(1/2) + 3*(2/5)
Solution:
module Application where
import Rationals
-- Haskell definition for functions
-- providing linear combinations of rationals
Appl: Linear combination of rationals
linComb :: [(Int,Rat)] -> Rat
linComb = foldl raddIntRat (mkrat 0 1)
where raddIntRat, given below, adds a rational and the product of an integer with a rational number:
raddIntRat :: Rat -> (Int,Rat) -> Rat
raddIntRat x (n,y) = radd x (rmult (mkrat n 1) y)
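For example, the linear combination 1*(1/2) + 1*(1/2) = 1 may be computed as
linComb [(1,(1,2)),(1,(1,2))]
Is the result (1,1)? With radd as defined earlier, the result is the unreduced pair (4,4), which is equal to (1,1) only in the sense of requ.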
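The fold can be followed step by step with a self-contained version of the definitions. The sketch below only repeats functions given in the lecture, on the pair representation of Rat.

```haskell
type Rat = (Int, Int)

mkrat :: Int -> Int -> Rat
mkrat _ 0 = error "denominator 0"
mkrat n d = (n, d)

rmult :: Rat -> Rat -> Rat
rmult (n1, d1) (n2, d2) = (n1 * n2, d1 * d2)

radd :: Rat -> Rat -> Rat
radd (n1, d1) (n2, d2) = (n1 * d2 + n2 * d1, d1 * d2)

-- Add n*y to the running total x.
raddIntRat :: Rat -> (Int, Rat) -> Rat
raddIntRat x (n, y) = radd x (rmult (mkrat n 1) y)

-- Fold from the left, starting with the rational 0/1.
linComb :: [(Int, Rat)] -> Rat
linComb = foldl raddIntRat (mkrat 0 1)
```

For [(1,(1,2)),(1,(1,2))] the accumulator goes (0,1), then (1,2), then (4,4). For the earlier example 2*(1/2) + 3*(2/5) it goes (0,1), (2,2), (22,10), i.e. 11/5 in unreduced form.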
Module Application imports Rationals and with it all the definitions made in that module. The details of the representation of Rat may therefore be used in Application: pairs of type (Int,Int), such as (1,2), may be written directly wherever a Rat is expected. The solution is to treat Rat as an abstract data type.
Linear combination functions
The Haskell module system allows definitions of data types and functions to be visible or hidden when a module is imported.
A module is split into two parts:
• a visible part that is exported and which gives all the definitions that may be used outside of the module
• a hidden part that implements the types and the functions exported plus some other objects which are not visible
For example in the case of Rationals we may decide to export from it the data type Rat and the operations radd, rdiff, rmult, rdiv, requ, and mkrat.
Haskell modules
module Rationals
(Rat, -- data type
radd, -- Rat -> Rat -> Rat
rdiff, -- Rat -> Rat -> Rat
rmult, -- Rat -> Rat -> Rat
rdiv, -- Rat -> Rat -> Rat
requ, -- Rat -> Rat -> Bool
mkrat -- Int -> Int -> Rat
) where
The data type Rat is called an Abstract Data Type. The functions rinv and reduce are not in the export list and consequently cannot be used outside of Rationals.
If we now try to use rinv in the module Application, we get the error message
ERROR - Undefined variable "rinv"
Rationals as an ADT
Using abstract data types, any application may be split into a visible part (signature or interface) and a hidden part (implementation).
The implementation can then be changed without affecting the user.
For example, Rat may be represented as an algebraic type
data Rat = ConR Int Int
or as a real type
type Rat = Float
If we use the implementation of Rat based on algebraic data types and, in the module Application, write the function linComb as
linComb :: [(Int,Rat)] -> Rat
linComb = foldl raddIntRat (ConR 0 1)
we get the error: Undefined constructor function "ConR"!! (Look back: the export list names Rat but not its constructor, so Application has to keep using mkrat 0 1.)
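A minimal sketch of Rationals with the algebraic representation is given below, reduced to four operations for brevity. Listing Rat in the export list without its constructors exports the type name only, so ConR stays private to the module. In Haskell a type synonym such as type Rat = (Int,Int) remains interchangeable with (Int,Int) for importers, so hiding the representation in this way relies on a data (or newtype) declaration.

```haskell
module Rationals
  ( Rat      -- type name only: the constructor ConR is not exported
  , mkrat    -- Int -> Int -> Rat
  , radd     -- Rat -> Rat -> Rat
  , rmult    -- Rat -> Rat -> Rat
  , requ     -- Rat -> Rat -> Bool
  ) where

infixl 7 `rmult`
infixl 6 `radd`
infix  4 `requ`

-- Hidden representation: numerator and denominator.
data Rat = ConR Int Int

-- The only way for importers to build a Rat.
mkrat :: Int -> Int -> Rat
mkrat _ 0 = error "denominator 0"
mkrat n d = ConR n d

radd :: Rat -> Rat -> Rat
radd (ConR n1 d1) (ConR n2 d2) = ConR (n1 * d2 + n2 * d1) (d1 * d2)

rmult :: Rat -> Rat -> Rat
rmult (ConR n1 d1) (ConR n2 d2) = ConR (n1 * n2) (d1 * d2)

requ :: Rat -> Rat -> Bool
requ (ConR n1 d1) (ConR n2 d2) = n1 * d2 == n2 * d1
```

A client such as Application can still write foldl raddIntRat (mkrat 0 1), and that definition keeps compiling unchanged whichever representation the module chooses internally.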
Changing the implementation
Lazy evaluation
• Infinite computation
• List comprehension
• Regular expressions
Abstract data types
• Divide the system into visible and hidden parts
• Use Haskell modules
Conclusions
Available on Monday from
Mole
Lexical analyser, parser, evaluation functions at
http://www.dcs.shef.ac.uk/~marian/
Project