hawk presentation

Haskell-awk: Haskell text processor for the command line, by Mario Pastorelli

Upload: mario-pastorelli

Post on 18-Nov-2014


Category:

Technology




TRANSCRIPT

Page 1: Hawk presentation

Haskell-awk

Haskell text processor for the command line

Mario Pastorelli

Page 2: Hawk presentation

Introduction

Page 3: Hawk presentation

awk

a generic text processor where

“A file is treated as a sequence of records, and by default each line is a record.” - Alfred V. Aho

developed in 1977 by Alfred Aho, Peter Weinberger, and Brian Kernighan @ Bell Labs

uses AWK as programming language

procedural

interpreted

a program is a series of pattern-action pairs

awk 'BEGIN { print "Hello World!" }'

Page 4: Hawk presentation

Why another awk?

avoid the AWK programming language

use a generic language, not a DSL

procedural (imperative) vs functional programming for stream processing

“Whenever faced with a problem, some people say `Lets use AWK.' Now, they have two problems.” - D. Tilbrook

BEGIN{split("a b c c a",a);for(i in a)b[a[i]]=1;r="";for(i in b)r=r" "i;print r}

nub $ words "a b c c a"
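For comparison, the Haskell one-liner above can be run as a small standalone program. This is a minimal sketch that assumes only the `base` library; the name `uniqueWords` is introduced here for illustration and does not come from the slides. Note that `nub` keeps the first occurrence of each word in input order, whereas the iteration order of awk's `for (i in b)` is implementation-dependent.

```haskell
import Data.List (nub)

-- Hypothetical helper name (not from the slides): split on whitespace,
-- then drop repeated words, keeping the first occurrence of each.
uniqueWords :: String -> [String]
uniqueWords = nub . words

main :: IO ()
main = putStrLn (unwords (uniqueWords "a b c c a"))
```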

Page 5: Hawk presentation

Haskell-awk (Hawk)

a generic text processor where

“A stream is treated as a sequence of records, and by default each line is a record.”

the same philosophy of awk!

developed in 2013 by me and Samuel Gélineau; the name is a tribute to awk

uses Haskell as programming language

functional

(incrementally) compiled

a program is a Haskell expression

hawk '"Hello World!"'

Page 6: Hawk presentation

Why Haskell

expressive, clean and concise

functions as composable building blocks

partial application

point-free style, laziness ...

> filter odd [1,2,3,4]
[1,3]

> let wordCount = sum . map (length . words) . lines

> :type wordCount
wordCount :: String -> Int

> wordCount "1 2 3\n4 5 6\n7 8 9"
9

> :type map
map :: (a -> b) -> [a] -> [b]

> :type not
not :: Bool -> Bool

> :type map not
map not :: [Bool] -> [Bool]

> map not [True,False]
[False,True]

Page 7: Hawk presentation

Hawk

Page 8: Hawk presentation

Modes

evaluate an expression

apply an expression to the input

map an expression to each record of the input

$ hawk '1'
1

$ hawk '[1,2]'
1
2

$ hawk '[[1,2],[3,4]]'
1 2
3 4

$ echo '1\n2\n3' | hawk -a 'L.reverse'
3
2
1

$ echo '1 2\n3 4' | hawk -m 'L.reverse'
2 1
4 3
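The difference between the apply (-a) and map (-m) modes can be modelled in plain Haskell. The sketch below is not Hawk's implementation: the names `parseInput`, `render`, `applyMode` and `mapMode` are assumptions made for illustration, and the model uses `String` and the default delimiters only.

```haskell
-- A pure model of the two modes; all names here are hypothetical.
type Record = [String]

-- Default input format: one record per line, fields separated by spaces.
parseInput :: String -> [Record]
parseInput = map words . lines

-- Print one record per line, fields joined by a single space.
render :: [Record] -> String
render = unlines . map unwords

-- Like `hawk -a f`: the expression receives the whole input at once.
applyMode :: ([Record] -> [Record]) -> String -> String
applyMode f = render . f . parseInput

-- Like `hawk -m f`: the expression receives one record at a time.
mapMode :: (Record -> Record) -> String -> String
mapMode f = render . map f . parseInput

main :: IO ()
main = do
  putStr (applyMode reverse "1\n2\n3\n")  -- reverses the order of the lines
  putStr (mapMode reverse "1 2\n3 4\n")   -- reverses the fields of each line
```

In this model the same function, `reverse`, acts on the list of records in apply mode and on the list of fields in map mode, which mirrors the two `L.reverse` examples above.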

Page 9: Hawk presentation

IO format

The input is, by default, a list of lists of strings, where lines are separated by \n and words by spaces

Options -d/-D are provided to change the delimiters or set them to empty

The output can be any type that is an instance of the type class Rows

$ echo '1 2\n3 4' | hawk -a 'show'
[["1","2"],["3","4"]]

$ echo '1,2;3,4' | hawk -a -d',' -D';' 'show'
[["1","2"],["3","4"]]

$ echo '1 2\n3 4' | hawk -a -d'' 'show'
["1 2","3 4"]

$ echo '1 2\n3 4' | hawk -a -d'' -D'' 'show'
"1 2\n3 4\n"

class (Show a) => Rows a where
  repr :: ByteString -> a -> [ByteString]
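How the two delimiters turn the input into a list of lists can be sketched in a few lines. This is a simplified model, not Hawk's parser: `splitOnChar` and `parseWith` are hypothetical names, the delimiters are single characters, and `String` is used instead of `ByteString`.

```haskell
-- Split a string on a single-character delimiter (hypothetical helper).
splitOnChar :: Char -> String -> [String]
splitOnChar d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar d rest

-- Model of `hawk -D<recordDelim> -d<fieldDelim>`: split into records
-- first, then split each record into fields.
parseWith :: Char -> Char -> String -> [[String]]
parseWith recordDelim fieldDelim =
  map (splitOnChar fieldDelim) . splitOnChar recordDelim

main :: IO ()
main = print (parseWith ';' ',' "1,2;3,4")
```

Unlike the `echo` example above, the sample string here has no trailing newline, so the model does not have to decide what to do with it.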

Page 10: Hawk presentation

Examples

get all users of a UNIX system

select username and userid

sort by username (instead of user id)

get the number of users using each shell

$ cat /etc/passwd | hawk -d: -m 'L.head'
root
daemon
...

$ cat /etc/passwd | hawk -d: -o'\t' -m '\l -> (l !! 0,l !! 2)'
root 0
daemon 1
...

$ cat /etc/passwd | hawk -d: -a 'L.sortBy (compare `on` L.head)'
bin:x:2:2:bin:/bin:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
...

$ cat /etc/passwd | hawk -ad: 'L.map (L.head &&& L.length) . L.group . L.sort . L.map L.last'
/bin/bash:1
...
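The last pipeline (users per shell) can be tried outside Hawk as an ordinary Haskell program. The sketch below reuses the same composition; the sample data in `samplePasswd` is made up for illustration, and `splitOnChar` and `countShells` are hypothetical names standing in for Hawk's `-d:` splitting.

```haskell
import Control.Arrow ((&&&))
import qualified Data.List as L

-- Split a string on a single-character delimiter (hypothetical helper).
splitOnChar :: Char -> String -> [String]
splitOnChar d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar d rest

-- Same composition as the hawk expression: take the last field (the shell),
-- sort, group equal shells, and pair each shell with its group size.
countShells :: String -> [(String, Int)]
countShells =
  L.map (L.head &&& L.length) . L.group . L.sort . L.map L.last
    . L.map (splitOnChar ':') . lines

-- Made-up sample input in /etc/passwd format.
samplePasswd :: String
samplePasswd = unlines
  [ "root:x:0:0:root:/root:/bin/bash"
  , "daemon:x:1:1:daemon:/usr/sbin:/bin/sh"
  , "bin:x:2:2:bin:/bin:/bin/sh"
  ]

main :: IO ()
main = mapM_ print (countShells samplePasswd)
```

`L.head` is safe here because `L.group` never produces an empty group.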

Page 11: Hawk presentation

Context

Hawk can be customized using files inside the context directory (by default ~/.hawk)

The most important file is prelude.hs that contains the "runtime context"

for instance, we can add a function for taking elements in an interval

$ cat ~/.hawk/prelude.hs
{-# LANGUAGE ExtendedDefaultRules, OverloadedStrings #-}
import Prelude
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.List as L

$ echo 'takeBetween s e = L.take (e - s) . L.drop s' >> ~/.hawk/prelude.hs
$ seq 0 100 | hawk -a 'takeBetween 2 4'
2
3
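The same `takeBetween` definition can be checked on its own, without a Hawk installation. In this sketch the type signature is an addition (the prelude line above leaves it to type inference); the interval is half-open, so the element at index `e` is not included.

```haskell
import qualified Data.List as L

-- Elements at indices s, s+1, ..., e-1 (half-open interval).
-- The signature is added here for clarity and is not in the slides.
takeBetween :: Int -> Int -> [a] -> [a]
takeBetween s e = L.take (e - s) . L.drop s

main :: IO ()
main = print (takeBetween 2 4 [0 .. 100 :: Int])
```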

Page 12: Hawk presentation

Implementation

Page 13: Hawk presentation

Hawk must be fast

cache the context

use the timestamp to check if the context has changed since the last run

compile it with ghc

use locks to compile only once when multiple Hawk instances are running

use ByteString instead of String...

hawk '[1..]' | hawk -a 'L.take 3'
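The timestamp check can be sketched with the `directory` package that ships with GHC. The slides do not describe how Hawk stores its cache, so the file names and the function `needsRecompile` below are assumptions, and the locking step is left out.

```haskell
import System.Directory (doesFileExist, getModificationTime)

-- True when the cached artifact is missing or older than the source.
-- Hypothetical helper; Hawk's real cache layout is not shown in the slides.
needsRecompile :: FilePath -> FilePath -> IO Bool
needsRecompile source cache = do
  cached <- doesFileExist cache
  if not cached
    then return True
    else do
      sourceTime <- getModificationTime source
      cacheTime  <- getModificationTime cache
      return (sourceTime > cacheTime)

main :: IO ()
main = do
  -- Both paths are placeholders chosen for this example.
  stale <- needsRecompile "prelude.hs" "prelude.o"
  putStrLn (if stale then "recompile" else "use cache")
```

Comparing modification times is cheap, so the check itself adds little to start-up time; the expensive step (compiling with ghc) only runs when the context file is newer than the cached result.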

Page 14: Hawk presentation

Parse and interpret Haskell

Hawk combines two Haskell libraries

haskell-src-exts to deal with Haskell source code

hint to interpret the user expression

> import Language.Haskell.Exts.Parser
> getTopPragmas "{-# LANGUAGE NoImplicitPrelude,OverloadedStrings #-}\n"
ParseOk [LanguagePragma (SrcLoc {srcFilename = "unknown.hs", srcLine = 1, srcColumn = 1}) [Ident "NoImplicitPrelude",Ident "OverloadedStrings"]]

> import Language.Haskell.Interpreter
> runInterpreter $ setImports ["Data.Int"] >> interpret "1" (as :: Int)
Right 1
> runInterpreter $ setImports ["Data.Int"] >> interpret "foo" (as :: Int)
Left (WontCompile [GhcError {errMsg = "Not in scope: foo'"}])

Page 15: Hawk presentation

Thank you!

https://github.com/gelisam/hawk