hawk presentation
Introduction
awk
a generic text processor
developed in 1977 by Alfred Aho, Peter Weinberger, and Brian Kernighan at Bell Labs
uses AWK as its programming language
procedural
interpreted
a program is a series of pattern-action pairs
“A file is treated as a sequence of records, and by default each line is a record.” - Alfred V. Aho
awk 'BEGIN { print "Hello World!" }'
Why another awk?
avoid the AWK programming language
use a generic language, not a DSL
procedural (imperative) vs functional programming for stream processing
“Whenever faced with a problem, some people say `Lets use AWK.' Now, they have two problems.” - D. Tilbrook
BEGIN{split("a b c c a",a);for(i in a)b[a[i]]=1;r="";for(i in b)r=r" "i;print r}
nub $ words "a b c c a"
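The Haskell one-liner above relies on `nub` from `Data.List`. A minimal, self-contained sketch of the same de-duplication might look as follows; the name `uniqueWords` is only an illustrative choice and is not part of Hawk.

```haskell
import Data.List (nub)

-- | Split a line into words and drop repeated ones,
-- keeping the first occurrence of each word, in order.
-- 'uniqueWords' is a hypothetical name used for illustration.
uniqueWords :: String -> [String]
uniqueWords = nub . words
```

In GHCi, `uniqueWords "a b c c a"` evaluates to `["a","b","c"]`. Unlike the awk version, whose `for (i in b)` loop visits keys in an unspecified order, `nub` preserves the order of first appearance.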
Haskell-awk (Hawk)
a generic text processor
the same philosophy as awk!
developed in 2013 by me and Samuel Gélineau; the name is a tribute to awk
uses Haskell as its programming language
functional
(incrementally) compiled
a program is a Haskell expression
“A stream is treated as a sequence of records, and by default each line is a record.”
hawk '"Hello World!"'
Why Haskell
expressive, clean and concise
functions as composable building blocks
partial application
point-free style, laziness ...
> filter odd [1,2,3,4]
[1,3]
> let wordCount = sum . map (length . words) . lines
> :type wordCount
wordCount :: String -> Int
> wordCount "1 2 3\n4 5 6\n7 8 9"
9
> :type map
map :: (a -> b) -> [a] -> [b]
> :type not
not :: Bool -> Bool
> :type map not
map not :: [Bool] -> [Bool]
> map not [True,False]
[False,True]
Hawk
Modes
evaluate an expression
apply an expression to the input
map an expression to each record of the input
$ hawk '1'
1
$ hawk '[1,2]'
1
2
$ hawk '[[1,2],[3,4]]'
1 2
3 4
$ echo '1\n2\n3' | hawk -a 'L.reverse'
3
2
1
$ echo '1 2\n3 4' | hawk -m 'L.reverse'
2 1
4 3
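The difference between the apply (-a) and map (-m) modes can be modelled with plain list functions. The sketch below is a simplified model, not Hawk's implementation: it assumes records are lists of `String` fields, and the names `Record`, `applyMode`, `mapMode` and `render` are hypothetical.

```haskell
-- A record is a list of fields; the input is a list of records.
-- (Assumption: String fields, to keep the model dependency-free.)
type Record = [String]

-- | Model of -a: the expression sees the whole input at once.
applyMode :: ([Record] -> [Record]) -> [Record] -> [Record]
applyMode f = f

-- | Model of -m: the expression is run on each record separately.
mapMode :: (Record -> Record) -> [Record] -> [Record]
mapMode f = map f

-- | Render records as in the examples above:
-- fields joined by spaces, one record per line.
render :: [Record] -> String
render = unlines . map unwords
```

Under this model, `render (applyMode reverse [["1"],["2"],["3"]])` gives `"3\n2\n1\n"` (the lines are reversed), while `render (mapMode reverse [["1","2"],["3","4"]])` gives `"2 1\n4 3\n"` (the words inside each line are reversed).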
IO format
The input is, by default, a list of lists of strings, where lines are separated by \n and words by spaces
Options -d/-D are provided to change the delimiters or set them to empty
The output can be any type that is an instance of the Rows typeclass
$ echo '1 2\n3 4' | hawk -a 'show'
[["1","2"],["3","4"]]
$ echo '1,2;3,4' | hawk -a -d',' -D';' 'show'
[["1","2"],["3","4"]]
$ echo '1 2\n3 4' | hawk -a -d'' 'show'
["1 2","3 4"]
$ echo '1 2\n3 4' | hawk -a -d'' -D'' 'show'
"1 2\n3 4\n"
class (Show a) => Rows a where
  repr :: ByteString -> a -> [ByteString]
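To make the default input shape concrete, here is a rough sketch of how text could be split into a list of lists of strings using a record delimiter and a field delimiter. It is a model under stated assumptions, not Hawk's parser: it works on `String` rather than `ByteString`, supports single-character delimiters only, and the names `splitOn` and `parseRows` are hypothetical.

```haskell
-- | Split a list on a single delimiter element.
-- (Assumption: one-element delimiters only.)
splitOn :: Eq a => a -> [a] -> [[a]]
splitOn d xs = case break (== d) xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- | Parse input into records, then each record into fields.
-- First argument: record delimiter; second: field delimiter.
parseRows :: Char -> Char -> String -> [[String]]
parseRows recordDelim fieldDelim =
  map (splitOn fieldDelim) . dropTrailingEmpty . splitOn recordDelim
  where
    -- A final record delimiter should not create an empty record.
    dropTrailingEmpty rs
      | not (null rs) && null (last rs) = init rs
      | otherwise                       = rs
```

With these definitions, `parseRows '\n' ' ' "1 2\n3 4\n"` and `parseRows ';' ',' "1,2;3,4"` both evaluate to `[["1","2"],["3","4"]]`, mirroring the first two `show` examples above.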
Examples
get all users of a UNIX system
select username and userid
sort by username (instead of user ID)
get the number of users using each shell
$ cat /etc/passwd | hawk -d: -m 'L.head'
root
daemon
...
$ cat /etc/passwd | hawk -d: -o'\t' -m '\l -> (l !! 0,l !! 2)'
root 0
daemon 1
...
$ cat /etc/passwd | hawk -d: -a 'L.sortBy (compare `on` L.head)'
bin:x:2:2:bin:/bin:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
...
$ cat /etc/passwd | hawk -ad: 'L.map (L.head &&& L.length) . L.group . L.sort . L.map L.last'
/bin/bash:1
...
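The last pipeline above is ordinary Haskell once the Hawk wrapper is removed. As a sketch, the same expression can be written as a standalone function over already-split records; the name `countByLastField` is hypothetical, and records are assumed to be non-empty lists of `String` fields.

```haskell
import Control.Arrow ((&&&))
import Data.List (group, sort)

-- | Count how many records end with each distinct last field.
-- For /etc/passwd records, the last field is the login shell.
-- Assumes every record has at least one field ('last' is partial).
countByLastField :: [[String]] -> [(String, Int)]
countByLastField = map (head &&& length) . group . sort . map last
```

The function sorts the last fields so that equal values become adjacent, groups them, and pairs each group's representative with its size. For three sample records ending in `/bin/bash`, `/bin/sh` and `/bin/sh`, it returns `[("/bin/bash",1),("/bin/sh",2)]`.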
Context
Hawk can be customized using files inside the context directory (by default ~/.hawk)
The most important file is prelude.hs, which contains the "runtime context"
For instance, we can add a function for taking elements in an interval
$ cat ~/.hawk/prelude.hs
{-# LANGUAGE ExtendedDefaultRules, OverloadedStrings #-}
import Prelude
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.List as L
$ echo 'takeBetween s e = L.take (e - s) . L.drop s' >> ~/.hawk/prelude.hs
$ seq 0 100 | hawk -a 'takeBetween 2 4'
2
3
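For reference, the same helper can be checked outside Hawk. This sketch only adds a type signature and uses the unqualified Prelude functions; the behaviour is intended to match the one-liner appended to prelude.hs above.

```haskell
-- | Keep the elements at zero-based indices s, s+1, ..., e-1.
-- The interval is half-open: index e itself is excluded.
takeBetween :: Int -> Int -> [a] -> [a]
takeBetween s e = take (e - s) . drop s
```

`takeBetween 2 4 [0..100]` evaluates to `[2,3]`, matching the output of the `seq 0 100` example. Because `take` and `drop` are lazy, the function also terminates on an infinite list such as `[0..]`.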
Implementation
Hawk must be fast
cache the context
use the timestamp to check whether the context has changed since the last run
compile it with ghc
use locks to compile only once when multiple Hawk instances are running
use ByteString instead of String ...
hawk '[1..]' | hawk -a 'L.take 3'
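The timestamp check used for caching the context can be reduced to a small pure decision. The sketch below is an assumption about the general technique, not Hawk's actual code: `needsRecompile` is a hypothetical name, and timestamps are kept abstract (any ordered type) so the example needs no packages beyond base. In a real program the timestamps would typically come from something like `getModificationTime` in `System.Directory`.

```haskell
-- | Decide whether the cached compilation must be rebuilt.
-- 'sourceTime' is the modification time of the context source;
-- the second argument is the time of the last compile, if any.
needsRecompile :: Ord t => t -> Maybe t -> Bool
needsRecompile _          Nothing          = True   -- never compiled
needsRecompile sourceTime (Just cacheTime) = sourceTime > cacheTime
```

A context that has never been compiled is always rebuilt; otherwise it is rebuilt only when the source is newer than the cached result, which is what keeps repeated runs fast.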
Parse and interpret Haskell
Hawk combines two Haskell libraries
haskell-src-exts to deal with Haskell source code
hint to interpret the user expression
> import Language.Haskell.Exts.Parser
> getTopPragmas "{-# LANGUAGE NoImplicitPrelude,OverloadedStrings #-}\n"
ParseOk [LanguagePragma (SrcLoc {srcFilename = "unknown.hs", srcLine = 1, srcColumn = 1}) [Ident "NoImplicitPrelude",Ident "OverloadedStrings"]]
> import Language.Haskell.Interpreter
> runInterpreter $ setImports ["Data.Int"] >> interpret "1" (as :: Int)
Right 1
> runInterpreter $ setImports ["Data.Int"] >> interpret "foo" (as :: Int)
Left (WontCompile [GhcError {errMsg = "Not in scope: `foo'"}])
Thank you!
https://github.com/gelisam/hawk