The Guile 100 Programs Project
0.6 - Apr 22, 2013
Edited by Michael Gran
© 2013 by Michael Gran
100 Guile Programs
This work is licensed under GFDL 1.3+.
A Lonely Cactus Production
Los Angeles, California
Short Contents
1 Preface
2 Acknowledgements
3 Theme 1: “/bin”
4 Theme 2: Web 1.0
A Other Examples
5 References
Index
Table of Contents
1 Preface
2 Acknowledgements
3 Theme 1: “/bin”
  3.1 Problem 1: Echo and Cat
    3.1.1 Echo
    3.1.2 Cat
  3.2 Problem 2: ‘ls’
    3.2.1 An Implementation of ‘ls’
  3.3 Problem 3: LZW Compression
  3.4 Problem 4: tar file archives
    3.4.1 ustar Script
    3.4.2 The rustar File Format
4 Theme 2: Web 1.0
  4.1 Problem 5: PHP-Style GUILE
  4.2 Problem 6: MySQL
  4.3 Problem 7: Animated GIF Badges
Appendix A Other Examples
  A.1 ustar Archives
5 References
Index
1 Preface
This book aspires to be a useful set of examples about how one might use GNU Guile.
One of the interesting things about the Scheme community is that they are perhaps too clever. The depth and complexity of their thinking about computer languages is intense and wonderful.
And yet, sometimes you just want to do something mundane. Where are the resources for how to use Scheme – and specifically Guile – for quotidian tasks?
Well, this document will be it, if all goes according to plan.
2 Acknowledgements
Thanks to the many people who have helped us develop this book.
• Chris K Jester-Young contributed the original version of the echo and cat scripts for Problem 1.
• Jez Ng contributed the original version of ls for Problem 2. He also contributed an example ustar generation script for Problem 4.
• Daniel Hartwig contributed the LZW compression routines for Problem 3.
• Mark Weaver contributed a feature complete ustar generation script for Problem 4.
3 Theme 1: “/bin”
Every project has to start somewhere, so we may as well begin at the beginning.
Guile can be used as a scripting language. Programs can be written as plain text files, and then run from the command line by using the Guile interpreter. As such, most scripts run on Unix-like shells will begin with a sha-bang #! invocation. And most scripts must start off doing the same chores: parsing the command line, acting on the options, and finding the files whose names appeared in the command-line arguments.
To introduce these mundane concepts, our first theme is /bin, that is, re-implementing some common Unix tools. This will get us warmed up.
These examples should demonstrate
• How to set up the sha-bang invocation for Guile scripts run from Unix shells
• How to handle command line arguments
• How to map file names given as command line arguments to their files
• How to search for files and directories
• How to open files, both as binary data and as encoded text data
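As a rough illustration of the first few points, here is a minimal script skeleton. It is only a sketch: the interpreter path /usr/bin/guile is an assumption borrowed from the scripts later in this chapter, and the helper name report-operand is hypothetical. The procedure main receives the whole command line, with the script name as its first element.

```scheme
#!/usr/bin/guile \
-e main -s
!#
;; The lines above ask the shell to run Guile on this file and then
;; call `main' with the command line.  The interpreter path is an
;; assumption; adjust it to wherever Guile is installed.

;; Hypothetical helper: report whether an operand names an existing file.
(define (report-operand name)
  (display name)
  (display (if (file-exists? name) ": found" ": missing"))
  (newline))

;; `args' is the whole command line; its first element is the script name.
(define (main args)
  (for-each report-operand (cdr args)))
```

Keeping the per-operand work in its own procedure makes it possible to try it from the REPL without going through the shell.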
To demonstrate some of these concepts, in the following sections you will find echo, a script that prints out its own arguments; cat, which concatenates files or standard input to the standard output; and ls, which lists the files in a directory. There are also compress and uncompress, which perform LZW compression on a file. And lastly there are scripts to generate tar-conformant archives.
And so, without further ado, here are the examples.
3.1 Problem 1: Echo and Cat
In this problem, two venerable Unix commands are re-implemented in Scheme: echo and cat. echo prints out the command-line arguments, and cat prints a file to the terminal.
In this problem, like in many of the problems, we’ll lay out the requirements for a program, and then see how our volunteer implemented the requirements. For the purpose of this exercise, the requirements for echo and cat will be drawn from the Posix standard [1], with a couple of minor modifications. Since these commands are implemented in different ways on different systems, a specification is given for the versions implemented here.
3.1.1 Echo
The echo script writes its arguments to the standard output, followed by a <newline>. If there are no arguments, it just prints a <newline>.
echo has no command-line options. Even ‘--help’ and ‘--version’ are not treated as command-line options.
If any of the arguments contain the backslash character (\), the argument is modified. Backslash introduces an escape. These escapes are parsed from logical left to right.
1 [IEEE 2004], page 56
\a      Write an <alert> in place of \a.
\b      Write a <backspace> in place of \b.
\c      Suppress the <newline> that would otherwise be written after the command-line arguments. The \c is not written, any remaining characters in this argument are not written, and any remaining arguments are not written.
\f      Write a <form-feed> in place of \f.
\n      Write a <newline> in place of \n.
\r      Write a <carriage-return> in place of \r.
\t      Write a <tab> in place of \t.
\v      Write a <vertical-tab> in place of \v.
\\      Write a single backslash character in place of the pair of backslash characters.
\0num   Write an 8-bit character corresponding to num, an octal number between octal 0 and octal 377 (decimal 255) inclusive.
A backslash at the end of a command line argument will not be escaped. The backslash will be written. However, the exit value will be 1 in this case.
A backslash followed by any other character not listed in the table will not be escaped. The backslash will be written, and the character that follows it will be written. However, in this case, the exit value will be 1.
For the octal escape \0, it is important to note that this value is not an ISO-8859-x position or a Unicode code point, but rather a raw 8-bit byte to be sent unencoded to the standard output. It is up to the operator, not echo, to ensure that a character sequence that is valid for the environment’s locale is being sent.
If a \0 escape is present, but is not followed by a number, the raw byte zero is written.
If a \0 escape is present and is followed by an octal number of greater than 3 digits, only the first 3 digits will be interpreted as being part of the escape.
If a \0 escape is present and its octal value is greater than 377, print nothing. In this case, the exit value will be 1.
An octal escape may not have unnecessary initial zeros. For example
• \01 should output raw byte 1
• \001 should output raw byte zero followed by the string “01”
• \0001 should output raw byte zero followed by the string “001”
The digits 8 and 9 are not part of an octal escape. For example, the string \018 shall be output as the raw byte 1 followed by the character for the numeral 8.
Remember that command-line arguments and file names may contain any character allowed by the current locale.
In all other cases, the exit value will be zero.
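The octal rules above are easy to get wrong, so here is a small sketch of them in isolation. It is not part of the contributed solution: the name parse-octal-escape is hypothetical, and it works on a string holding the characters that follow \0 rather than on a port. It returns two values: the byte to write (or #f when the value exceeds octal 377, the case where nothing is printed and the exit value becomes 1) and the text left over.

```scheme
;; True if CH is one of the characters 0 through 7.
(define (octal-digit? ch)
  (and (char? ch) (char<=? #\0 ch #\7)))

;; Hypothetical helper.  STR holds the characters following "\0".
;; Returns two values: the byte to emit (or #f if the value is too
;; large) and the remainder of STR that is not part of the escape.
(define (parse-octal-escape str)
  (if (or (string-null? str)
          (not (octal-digit? (string-ref str 0)))
          ;; A leading zero is not part of the escape.
          (char=? (string-ref str 0) #\0))
      (values 0 str)
      ;; Consume at most three octal digits.
      (let loop ((i 0) (value 0))
        (if (and (< i 3)
                 (< i (string-length str))
                 (octal-digit? (string-ref str i)))
            (loop (+ i 1)
                  (+ (* value 8)
                     (- (char->integer (string-ref str i))
                        (char->integer #\0))))
            (values (and (< value 256) value)
                    (substring str i))))))
```

For instance, the argument \018 corresponds to calling the helper on "18", which yields the byte 1 and the leftover text "8".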
An implementation of ‘echo’
Chris K Jester-Young wrote the original solution to this problem.
#!/usr/bin/guile \
-e main -s
!#

(use-modules (ice-9 binary-ports))

;; The exit code for the program: #t == exit code 0, #f == exit code 1
(define status #t)

(define (main args)
  (setlocale LC_ALL "")
  ;; Recursively loop over the list of command-line arguments
  (let loop ((args (cdr args))
             (first-arg #t))
    (cond ((null? args)
           (newline)
           (quit status))
          (else
           (unless first-arg
             (write-char #\space))
           (let ((arg (car args)))
             ;; Take the current command-line argument and create a
             ;; port from that argument.  Pass that port as input to
             ;; the procedure `initial'.
             (call-with-input-string arg initial)
             (loop (cdr args) #f))))))

;; `initial' and `echo' jointly form a recursive loop that reads
;; characters one-by-one from the port and writes them to stdout.
;; Backslash may introduce a string escape that needs special
;; processing.
(define (echo ch port)
  (write-char ch)
  (initial port))

(define (initial port)
  (define ch (read-char port))
  (cond ((eqv? ch #\\)
         (backslash port))
        ((not (eof-object? ch))
         (echo ch port))))

;; Special handling of backslash escape sequences
(define (backslash port)
  (define ch (read-char port))
  (case ch
    ((#\a) (echo #\alarm port))
    ((#\b) (echo #\backspace port))
    ((#\c) (quit status))
    ((#\f) (echo #\page port))
    ((#\n) (echo #\newline port))
    ((#\r) (echo #\return port))
    ((#\t) (echo #\tab port))
    ((#\v) (echo #\vtab port))
    ((#\\) (echo #\\ port))
    ((#\0) (let ((next (peek-char port)))
             (if (and (assv next octal-digits)
                      (not (char=? next #\0)))
                 (octal port)
                 (echo #\nul port))))
    (else (set! status #f)
          (write-char #\\)
          (unless (eof-object? ch)
            (unread-char ch port)
            (initial port)))))

;; Backslash 0 introduces the octal escape.  Zero to three octal
;; numbers are read and output as a raw (not locale encoded) byte.
(define (octal port)
  (let loop ((value 0)
             (waiting 3))
    (cond ((zero? waiting)
           (if (< value 256)
               (put-u8 (current-output-port) value)
               (set! status #f))
           (initial port))
          (else (let ((ch (read-char port)))
                  (cond ((eof-object? ch)
                         (loop value 0))
                        ((assv ch octal-digits)
                         => (lambda (ass)
                              (loop (+ (* value 8) (cdr ass))
                                    (1- waiting))))
                        (else
                         (unread-char ch port)
                         (loop value 0))))))))

(define octal-digits
  '((#\0 . 0) (#\1 . 1) (#\2 . 2) (#\3 . 3)
    (#\4 . 4) (#\5 . 5) (#\6 . 6) (#\7 . 7)))
3.1.2 Cat
Again, since cat is implemented differently on different systems, a specification of what we were trying to accomplish is given here.
cat [OPTION]... [FILE]...
cat concatenates files or standard input and prints them to the standard output.
This version of cat supports three command-line options, each with a short and a long form.
‘-u --unbuffered’
     Do no buffering. Write bytes from the input to the standard output without delay as each character is read.
‘-h --help’
     Print out command help.
‘-v --version’
     Print out the program name and version number.
After the command-line options, a list of file names is expected. The contents of the files are printed to standard output. No character encoding or decoding of the contents of the files should be performed: they should be transmitted unmodified.
If the special file name ‘-’ (hyphen) is given, at that point the contents of the standard input will be transmitted to the standard output.
If one of the files does not exist, or if it cannot be opened, the program will print a descriptive error message to the standard error and will return the exit code 1.
Otherwise, the exit code is zero.
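To make the option table concrete, here is a minimal sketch of how the three options could be declared with the (ice-9 getopt-long) module. The names option-spec and parse-cat-arguments are hypothetical, and the sketch only separates the flag from the file operands; it does not print help or version text.

```scheme
(use-modules (ice-9 getopt-long))

;; Each option has a long name, a short single-character form, and
;; takes no value.
(define option-spec
  '((unbuffered (single-char #\u) (value #f))
    (help       (single-char #\h) (value #f))
    (version    (single-char #\v) (value #f))))

;; Hypothetical helper.  ARGS is the full command line, program name
;; first.  Returns a list of two elements: whether -u/--unbuffered was
;; given, and the list of file operands.
(define (parse-cat-arguments args)
  (let ((opts (getopt-long args option-spec)))
    (list (option-ref opts 'unbuffered #f)
          ;; The key '() collects the non-option arguments.
          (option-ref opts '() '()))))
```

Calling (parse-cat-arguments '("cat" "-u" "a.txt" "b.txt")) gives a true flag together with the two file names, in order.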
An implementation of cat
Chris K Jester-Young wrote the original solution for cat as well. One interesting thing to note in this example is the use of catch to catch system errors that may arise if files do not exist or cannot be opened.
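Before reading the full script, it may help to see the catch pattern on its own. The following is a small sketch with hypothetical helper names (open-or-errno and describe-open); it shows how a system-error handler can recover the errno value and turn it into a message with strerror.

```scheme
;; Hypothetical helper: try to open NAME for reading.  Returns the
;; port on success, or the errno value if the system call fails.
(define (open-or-errno name)
  (catch 'system-error
    (lambda () (open-input-file name))
    ;; The handler receives the exception key and arguments as a list;
    ;; `system-error-errno' extracts the errno from that list.
    (lambda args (system-error-errno args))))

;; Hypothetical helper: a human-readable result for NAME.
(define (describe-open name)
  (let ((result (open-or-errno name)))
    (if (port? result)
        (begin (close-port result) "ok")
        (strerror result))))
```

The cat script uses the same idea, but its handlers also record the failure in the exit status.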
#!/usr/bin/guile \
-e main -s
!#

(use-modules (srfi srfi-1)
             (ice-9 binary-ports)
             (ice-9 format)
             (ice-9 getopt-long))

;; The exit code of the script: #t == exit code 0, #f == 1
(define status #t)

(define (main args)
  (define opts (getopt-long args (get-getopt-options)))
  ;; Handle the unbuffered flag
  (when (assq 'unbuffered opts)
    (setvbuf (current-output-port) _IONBF))
  (let ((files (assq-ref opts '())))
    (if (null? files)
        (cat)
        (for-each (lambda (file)
                    ;; If a filename is "-" get text from stdin
                    (if (string=? file "-")
                        (cat)
                        (cat file)))
                  files))
    (catch 'system-error force-output write-error-handler)
    (quit status)))

(define cat
  (case-lambda
    ;; When called with no arguments, get data from stdin
    (()
     (catch 'system-error cat-port (read-error-handler "stdin")))
    ;; When called with one argument, read data from a file
    ((file)
     (catch 'system-error
       ;; The thunk must actually call call-with-input-file.
       (lambda () (call-with-input-file file cat-port))
       (read-error-handler file)))))

(define* (cat-port #:optional (in (current-input-port))
                   (out (current-output-port)))
  (define bv (get-bytevector-some in))
  (unless (eof-object? bv)
    (catch 'system-error
      ;; Likewise, the thunk must call put-bytevector.
      (lambda () (put-bytevector out bv))
      write-error-handler)
    (cat-port in out)))

;; An error handler that catches system errors receives a list
;; containing the errno.
(define (read-error-handler label)
  (lambda args
    (perror label (system-error-errno args))
    (set! status #f)))

(define (write-error-handler . args)
  (perror "write error" (system-error-errno args))
  ;; Don't try to flush buffers at exit, since it'd obviously fail.
  (primitive-_exit 1))

(define (perror label errno)
  (format (current-error-port) "cat: ~a: ~a~%" label (strerror errno)))

(define (help _)
  (display "Usage: cat [OPTION]... [FILE]...\n")
  (display "Concatenate FILE(s), or standard input, to standard output.\n")
  (newline)
  (for-each (lambda (option)
              (format #t " -~a, --~16a ~a~%"
                      (cadr (assq 'single-char (cdr option)))
                      (car option)
                      (cadr (assq 'description (cdr option)))))
            getopt-options)
  (quit))

(define (version _)
  (display "cat 0.1, for Guile100\n")
  (quit))

(define (get-getopt-options)
  ;; getopt-long doesn't like extraneous option properties, so filter out
  (map (lambda (option)
         (remove (lambda (prop)
                   (and (pair? prop) (eq? (car prop) 'description)))
                 option))
       getopt-options))

;; Here is a list of all the command-line options
(define getopt-options
  `((unbuffered (single-char #\u) (value #f)
                (description "do not buffer standard output"))
    (help (single-char #\h) (value #f) (predicate ,help)
          (description "display this help and exit"))
    (version (single-char #\v) (value #f) (predicate ,version)
             (description "output version information and exit"))))
3.2 Problem 2: ‘ls’
In this section, we investigate the most famous Unix command of all time: ls. ls lists files or directories, and displays their properties.
However, ls has accumulated dozens of options over the past decades. A feature-complete ls would be too long to make a usable example. So, this script is constrained to the most important command-line options.
The command ls lists information about files, directories, and the contents of directories. Basically, for this challenge, the script should operate like a limited functionality version of Posix ls [1].
The Requirements for a Limited ls
This script only recognizes a limited set of command-line options:
• ‘-a’ - display all matching files, including those whose name begins with a period
• ‘-l’ - use the long output format
• ‘-R’ - recursively descend into subdirectories
Any other command-line arguments that begin with a hyphen should cause an “invalid option” error, and the program will be terminated with a non-zero exit code.
The command-line option ‘-R’ will recursively print the contents of any subdirectory encountered.
The command-line option ‘-l’ has two effects. One, information about the files will be printed in the long format. Two, when given a symbolic link to a directory, the command will print information about the symbolic link itself and not the file or directory to which it points.
Operands
If a command-line argument does not begin with a hyphen, it is treated as an operand.
When called without operands, the contents of the current directory are printed.
Operands must be either the names of files, directories, or symbolic links. When an operand that is not one of the above is encountered, the script should print a descriptive error and exit with a non-zero return code.
If an operand is a file, ls will print the name of the file. If an operand is a symbolic link to a file, the command will print the name of the link. If an operand is a directory, ls will print out the contents of that directory. If an operand is a symbolic link to a directory, ls will print the contents of that directory, unless ‘-l’ is given.
When printing the contents of a directory, files and directories that begin with <period> are usually not printed. If the command-line option ‘-a’ is given, files and directories that begin with <period> are printed.
1 The Posix spec for ls
Output
There are two output formats: the default format and the long format.
Within each directory, the files are sorted in case-insensitive alphabetical order according to the current locale.
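The hidden-file rule and the sort order can be sketched together in a few lines. This is an illustration only: the procedure name visible-sorted is hypothetical, and it relies on string-locale-ci<? from the (ice-9 i18n) module for the locale-aware, case-insensitive comparison.

```scheme
(use-modules (ice-9 i18n))

;; Hypothetical helper.  NAMES is a list of file names; ALL? mirrors
;; the `-a' option.  Names starting with a period are dropped unless
;; ALL? is true, and the rest are sorted case-insensitively.
(define (visible-sorted names all?)
  (sort (filter (lambda (name)
                  (or all? (not (string-prefix? "." name))))
                names)
        string-locale-ci<?))
```

Where names beginning with a period land in the sorted output depends on the current locale, so the sketch makes no promise about that.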
In the default format, the filenames are output one per line. You can print them out in a columnar format if you like, though.
In the long format, the file information will be printed as follows
Field           Length     Description
Type            1          ‘d’ for directory
                           ‘-’ for regular file
                           ‘b’ for block special file
                           ‘l’ for symbolic link
                           ‘c’ for character special file
                           ‘p’ for fifo
User Read       1          ‘r’ if readable by the owner
                           ‘-’ otherwise
User Write      1          ‘w’ if writable by the owner
                           ‘-’ otherwise
User Execute    1          ‘S’ if the file is not executable and the set-user-ID mode is set
                           ‘s’ if the file is executable and the set-user-ID mode is set
                           ‘x’ if the file is executable or the directory is searchable by the owner
                           ‘-’ otherwise
Group Read      1          ‘r’ if readable by the group
                           ‘-’ otherwise
Group Write     1          ‘w’ if writable by the group
                           ‘-’ otherwise
Group Execute   1          ‘S’ if the file is not executable and the set-group-ID mode is set
                           ‘s’ if the file is executable and the set-group-ID mode is set
                           ‘x’ if the file is executable or the directory is searchable by members of this group
                           ‘-’ otherwise
Other Read      1          ‘r’ if readable by others
                           ‘-’ otherwise
Other Write     1          ‘w’ if writable by others
                           ‘-’ otherwise
Other Execute   1 + space  ‘T’ if the file is a directory and the search permission is not granted to others and the restricted deletion flag is set
                           ‘t’ if the file is a directory and the search permission is granted to others and the restricted deletion flag is set
                           ‘x’ if the file is executable or the directory is searchable by others
                           ‘-’ otherwise
Link Count                 For a directory, number of immediate subdirectories it has plus one for itself plus one for its parent. The link count for a file is one.
Owner Name
Group Name
File Size                  in bytes
Date & Time                “month day hour:minute” format if the file has been modified in the last six months, or “month day year” format otherwise
Pathname                   For non-links, the path
                           For links, “<link name> -> <path to linked file or directory>”
The exit code should be zero except in those error cases described above.
For more information about ls, you can consult The Open Group Base Specifications Issue 6, or the documentation of any BSD or GNU version of ls.
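As a warm-up for the long format, the nine read, write, and execute columns can be derived from the mode bits with a short loop. The sketch below is deliberately simplified: the name simple-permission-string is hypothetical, and it ignores the set-user-ID, set-group-ID, and restricted deletion bits, so it never produces ‘s’, ‘S’, ‘t’, or ‘T’.

```scheme
;; Hypothetical helper.  MODE is the numeric permission mode, for
;; example #o755.  Each bit is paired with the letter shown when the
;; bit is set; a hyphen is shown otherwise.
(define (simple-permission-string mode)
  (list->string
   (map (lambda (bit letter)
          (if (logtest bit mode) letter #\-))
        '(#o400 #o200 #o100 #o40 #o20 #o10 #o4 #o2 #o1)
        (string->list "rwxrwxrwx"))))
```

With this helper, a mode of #o755 gives the familiar string rwxr-xr-x.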
3.2.1 An Implementation of ‘ls’
Jez Ng contributed a script to these specifications. It is an interesting solution.
One thing to note is how he has decided to truly minimize the scope of the procedures by declaring procedures within procedures.
Unsurprisingly, the majority of the script involves getting the format right for long output.
#! /usr/local/bin/guile -s
!#

;; A solution to Guile 100 Problem #2 `ls'
;; Contributed by Jez Ng.

(use-modules (srfi srfi-1)   ; fold, map etc
             (srfi srfi-26)  ; cut (partial application)
             (srfi srfi-37)  ; args-fold
             (ice-9 ftw)
             (ice-9 format)
             (ice-9 i18n))

(define perror (cut format (current-error-port) <...>))

(define (default-printer path st . rest)
  (format #t "~a~%" (basename path)))

(define* (long-printer path st #:optional
                       (max-nlinks 0) (max-size 0)
                       (max-uname-length 0) (max-groupname-length 0))
  (let*
      ((bits-set?
        (lambda (bits . masks)
          (let ((mask (apply logior masks)))
            (= mask (logand bits mask)))))
       (permission-string
        (lambda (perms)
          (let* ((setuid-bit #o4000)
                 (setgid-bit #o2000)
                 (sticky-bit #o1000)
                 (owner-read-bit #o400)
                 (owner-write-bit #o200)
                 (owner-exec-bit #o100)
                 (group-read-bit #o40)
                 (group-write-bit #o20)
                 (group-exec-bit #o10)
                 (other-read-bit #o4)
                 (other-write-bit #o2)
                 (other-exec-bit #o1)
                 (rwx-letter (lambda (bit letter)
                               (if (bits-set? perms bit) letter #\-)))
                 ;; Per the requirements, the letter is lowercase when
                 ;; the execute bit is also set (`s', `t') and uppercase
                 ;; when only the special bit is set (`S', `T').
                 (setid-letter (lambda (exec-bit setid-bit letter)
                                 (cond ((bits-set? perms exec-bit setid-bit)
                                        (char-downcase letter))
                                       ((bits-set? perms setid-bit) letter)
                                       (else (rwx-letter exec-bit #\x))))))
            (string (rwx-letter owner-read-bit #\r)
                    (rwx-letter owner-write-bit #\w)
                    (setid-letter owner-exec-bit setuid-bit #\S)
                    (rwx-letter group-read-bit #\r)
                    (rwx-letter group-write-bit #\w)
                    (setid-letter group-exec-bit setgid-bit #\S)
                    (rwx-letter other-read-bit #\r)
                    (rwx-letter other-write-bit #\w)
                    (setid-letter other-exec-bit sticky-bit #\T)))))
       (format-time
        (lambda (time)
          (if (and (<= time (current-time))
                   (< (- (current-time) time) (* 3600 24 30 6)))
              (strftime "%b %e %H:%M" (localtime time))
              (strftime "%b %e %_5Y" (localtime time)))))
       (type (case (stat:type st)
               ((directory) #\d)
               ((regular) #\-)
               ((symlink) #\l)
               ((block-special) #\b)
               ((char-special) #\c)
               ((fifo) #\p)
               (else #\?)))
       (digits (lambda (n) (if (= n 0) 1 (1+ (inexact->exact (ceiling (log10 n))))))))
    (format #t "~a~a ~vd ~va ~va ~vd ~a ~a\n"
            type
            (permission-string (stat:perms st))
            (digits max-nlinks) (stat:nlink st)
            max-uname-length (passwd:name (getpwuid (stat:uid st)))
            max-groupname-length (group:name (getgrgid (stat:gid st)))
            (digits max-size) (stat:size st)
            (format-time (stat:mtime st))
            (if (char=? type #\l)
                (format #f "~a -> ~a" path (readlink path))
                (basename path)))))

(define (ls-dir dir-name dir-stat recursive? all? print-header? printer)
  (let* ((not-hidden? (lambda (name) (not (string-prefix? "." name))))
         (enter? (lambda (path st)
                   (or (and (or all? (not-hidden? (basename path))) recursive?)
                       (= (stat:ino st) (stat:ino dir-stat))))))
    (let recurse ((tree (file-system-tree dir-name enter?))
                  (parent-path `(,(dirname dir-name)))
                  (top-level? #t))
      ;; `file-system-tree' returns a structure of the form
      ;; (string basename, object stat, tree children)
      (let* ((path (cons (car tree) parent-path))
             (path-string (string-join (reverse path) file-name-separator-string))
             (children
              (filter
               (lambda (tree) (or all? (not-hidden? (car tree))))
               (sort (let ((current-dir-path (in-vicinity path-string "."))
                           (parent-dir-path (in-vicinity path-string "..")))
                       (cons (list current-dir-path (lstat current-dir-path))
                             (cons (list parent-dir-path (lstat parent-dir-path))
                                   (cddr tree))))
                     (lambda (a b) (string-locale-ci<? (car a) (car b))))))
             ;; `max' throws an error if called without arguments;
             ;; `max-above-0' just returns 0
             (max-above-0 (lambda args (apply max (cons 0 args))))
             (stats (map cadr children))
             (max-nlinks (apply max-above-0 (map stat:nlink stats)))
             (max-size (apply max-above-0 (map stat:size stats)))
             (max-uname-length
              (apply max-above-0 (map (compose string-length passwd:name
                                               getpwuid stat:uid) stats)))
             (max-groupname-length
              (apply max-above-0 (map (compose string-length group:name
                                               getgrgid stat:gid) stats))))
        (if (or (not top-level?) print-header?) (format #t "~a:~%" path-string))
        (for-each (lambda (child)
                    (printer
                     (in-vicinity path-string (car child))
                     (cadr child)
                     max-nlinks max-size max-uname-length max-groupname-length))
                  children)
        (if recursive?
            (for-each (lambda (child)
                        (if (and (eq? (stat:type (cadr child)) 'directory)
                                 (not (or (equal? (basename (car child)) ".")
                                          (equal? (basename (car child)) ".."))))
                            (recurse child path #f)))
                      children))))))

(let* ((program-name (car (program-arguments)))
       (make-bool-option
        (lambda (opt-name flag)
          (option `(,flag) #f #f (lambda (opt name arg result)
                                   (acons opt-name #t result)))))
       ;; `getopt-long' requires the long option name to be provided,
       ;; but the real `ls' does not use long names.  srfi-37 does not
       ;; have this restriction, so we use it instead.
       (args (args-fold
              (cdr (program-arguments))
              (map make-bool-option '(all? recursive? long?) '(#\a #\R #\l))
              (lambda (opt name arg result)
                (perror "~a: illegal option -- ~a~%" program-name name)
                (perror "usage: ~a [-alR] [file ...]~%" program-name)
                (exit 1))
              (lambda (opt result) (assq-set! result
                                              'paths
                                              (cons opt (assq-ref result 'paths))))
              '((paths))))
       (paths (if (null? (assq-ref args 'paths)) '(".") (assq-ref args 'paths)))
       (printer (if (assq-ref args 'long?) long-printer default-printer))
       (ls-dir-cut (cut ls-dir <> <>
                        (assq-ref args 'recursive?) (assq-ref args 'all?)
                        (> (length paths) 1)
                        printer))
       (exit-code 0))
  (for-each
   (lambda (path)
     (catch 'system-error
       (lambda ()
         (let ((st (lstat path)))
           (case (stat:type st)
             ((directory) (ls-dir-cut path st))
             ((symlink) (if (assq-ref args 'long?)
                            (printer path st)
                            (ls-dir-cut
                             (let ((linked-path (readlink path)))
                               (if (absolute-file-name? linked-path)
                                   linked-path
                                   (in-vicinity (dirname path)
                                                linked-path)))
                             (stat path))))
             (else (printer path st)))))
       (lambda args
         (perror "~a: ~a: ~a~%"
                 program-name path (strerror (system-error-errno args)))
         (set! exit-code 1))))
   paths)
  (exit exit-code))
3.3 Problem 3: LZW Compression
Good old LZW compression: a nice problem from every computer science undergraduate’s classes. Lempel-Ziv-Welch compression is the basis of both the Unix compress program and of GIF encoding.
The only problem with LZW is that it doesn’t actually do a very good job at compression, but it has an interesting logic and is familiar enough that it makes a good example.
This task has two parts.
• Write ‘compress’ and ‘uncompress’ procedures for LZW compression.
• Use them to make ‘compress’ and ‘uncompress’ scripts.
First up are the compression procedures.
lzw-compress and lzw-uncompress
[Guile Procedure] lzw-compress input-bv #:key table-size dictionary
     This procedure should take a bytevector presumed to contain 8-bit unsigned integers, and it should return a bytevector containing 16-bit unsigned integers in little-endian format.
     input-bv is the input bytevector.
     table-size is an optional parameter that indicates the maximum number of entries in the dictionary. This parameter is limited to the range 258 - 65536. The default value of table-size is 65536.
     dictionary is an optional parameter that modifies the output. When true, the procedure shall return both the output 16-bit bytevector as well as the hash table created by the compression routine that maps indices to codes.
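The output format is simply a sequence of 16-bit little-endian integers, one per code. As a rough sketch of that packing step on its own, the R6RS bytevector procedures can convert between a list of codes and a bytevector. The names codes->bytevector and bytevector->codes are hypothetical and are not part of the contributed module.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: pack a list of codes, each in the range
;; 0..65535, into a bytevector of 16-bit little-endian integers.
(define (codes->bytevector codes)
  (uint-list->bytevector codes (endianness little) 2))

;; Hypothetical helper: the inverse of the procedure above.
(define (bytevector->codes bv)
  (bytevector->uint-list bv (endianness little) 2))
```

For example, the code 257 is hexadecimal 0101, so it is stored as the two bytes 1 and 1, while the code 65 is stored as 65 followed by 0.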
Probably the best writeup on LZW compression is the one by Mark Nelson over at http://marknelson.us/2011/11/08/lzw-revisited/. Refer to that article for details on LZW compression.
It is possible to fill up the dictionary. In that case, one continues to use the dictionary as it is, without adding new entries.
As I’ve noted, we’re focussing on the problem of encoding 8-bit binary data. Thus, the first 256 entries in the dictionary – entries #0 to #255 – are initialized to 0 to 255. Entry #256 is not used in this example, but it is usually reserved for a special code that empties the dictionary. Entries #257 to #(table-size - 1) contain the multi-byte entries in the dictionary.
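To make the dictionary layout concrete, here is a compact sketch of the compression loop working on plain lists instead of bytevectors. It is an illustration under stated assumptions, not the contributed implementation: the name lzw-codes is hypothetical, strings are stored as reversed lists of bytes, code 256 is left unused as described above, and new entries start at 257 and stop once table-size entries exist.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper.  BYTES is a list of integers in 0..255.
;; Returns the list of LZW codes for that input.
(define* (lzw-codes bytes #:key (table-size 65536))
  (let ((dict (make-hash-table))
        (next 257))                     ; entry 256 is skipped
    ;; Entries 0..255 are the single-byte strings.
    (for-each (lambda (b) (hash-set! dict (list b) b)) (iota 256))
    (let loop ((rest bytes) (cur '()) (out '()))
      (if (null? rest)
          ;; Flush the pending string, if any.
          (reverse (if (null? cur) out (cons (hash-ref dict cur) out)))
          (let ((candidate (cons (car rest) cur)))
            (if (hash-ref dict candidate)
                ;; Still a known string: keep extending it.
                (loop (cdr rest) candidate out)
                (begin
                  ;; Add the new string only while the table has room.
                  (when (< next table-size)
                    (hash-set! dict candidate next)
                    (set! next (+ next 1)))
                  (loop (cdr rest)
                        (list (car rest))
                        (cons (hash-ref dict cur) out)))))))))
```

With the input bytes for "ABABABA", that is (65 66 65 66 65 66 65), the sketch emits the codes (65 66 257 259): the entries 257, 258, and 259 are created for "AB", "BA", and "ABA" along the way.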
[Guile Procedure] lzw-uncompress input-bv #:key table-size dictionary
     Similarly, this procedure takes input-bv, the bytevector created by compress, and an optional table size, and returns the 8-bit unsigned bytevector of uncompressed data.
     dictionary, when true, causes the procedure to also return its dictionary or hash table.
Daniel Hartwig contributed an implementation of these compression routines.
There are a couple of interesting techniques of which to take note. First, if you C programmers have ever wondered how to create a static variable in a function, make-serial-number-generator shows the Scheme analog of that technique.
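The idea can be seen in miniature with a counter. In this sketch the name make-counter is hypothetical: the variable n lives in the closure created by let, so it persists between calls the way a C static variable would, yet each call to make-counter gets its own private copy.

```scheme
;; Hypothetical helper: returns a procedure that yields START,
;; START + 1, START + 2, and so on, one value per call.
(define (make-counter start)
  (let ((n (- start 1)))
    (lambda ()
      ;; `n' is captured by this lambda and survives between calls.
      (set! n (+ n 1))
      n)))

;; Two counters do not share state.
(define next-id (make-counter 0))
(define next-ticket (make-counter 100))
```

Unlike a C static variable, the state here is per generator rather than per function, which is what lets the compressor and the uncompressor each keep their own code sequence.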
;; Copyright (C) 2013 Daniel Hartwig <[email protected]>
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program.  If not, see <http://www.gnu.org/licenses/>.

(define-module (lzw)
  #:use-module (rnrs bytevectors)
  #:use-module (rnrs io ports)
  #:use-module (srfi srfi-1)
  #:use-module (srfi srfi-26)
  #:use-module (ice-9 receive)
  #:export (lzw-compress
            lzw-uncompress
            %lzw-compress
            %lzw-uncompress))

;; This procedure adapted from an example in the Guile Reference
;; Manual.
(define (make-serial-number-generator start end)
  (let ((current-serial-number (- start 1)))
    (lambda ()
      (and (< current-serial-number end)
           (set! current-serial-number (+ current-serial-number 1))
           current-serial-number))))

(define (put-u16 port k)
  ;; Little endian.
  (put-u8 port (logand k #xFF))
  (put-u8 port (logand (ash k -8) #xFF)))

(define (get-u16 port)
  ;; Little endian.  Order of evaluation is important, use `let*'.
  (let* ((a (get-u8 port))
         (b (get-u8 port)))
    (if (any eof-object? (list a b))
        (eof-object)
        (logior a (ash b 8)))))

(define (%lzw-compress in out done? table-size)
  (let ((codes (make-hash-table table-size))
        (next-code (make-serial-number-generator 0 table-size))
        (universe (iota 256))
        (eof-code #f))
    ;; Populate the initial dictionary with all one-element strings
    ;; from the universe.
    (for-each (lambda (obj)
                (hash-set! codes (list obj) (next-code)))
              universe)
    (set! eof-code (next-code))
    (let loop ((cs '()))
      (let ((c (in)))
        (cond ((done? c)
               (unless (null? cs)
                 (out (hash-ref codes cs)))
               (out eof-code)
               (values codes))
              ((hash-ref codes (cons c cs))
               (loop (cons c cs)))
              (else
               (and=> (next-code)
                      (cut hash-set! codes (cons c cs) <>))
               (out (hash-ref codes cs))
               (loop (cons c '()))))))))

(define (ensure-bv-input-port bv-or-port)
  (cond ((port? bv-or-port)
         bv-or-port)
        ((bytevector? bv-or-port)
         (open-bytevector-input-port bv-or-port))
        (else
         (scm-error 'wrong-type-arg "ensure-bv-input-port"
                    "Wrong type argument in position ~a: ~s"
                    (list 1 bv-or-port) (list bv-or-port)))))

(define (for-each-right proc lst)
  (let loop ((lst lst))
    (unless (null? lst)
      (loop (cdr lst))
      (proc (car lst)))))

(define (open-bit-output-port bits-per-entry)
  (let ((current 0)
        (location 0))
    (call-with-values
        (lambda ()
          (open-bytevector-output-port))
      (lambda (port get-bytevector)
        (let ((write-to-bv (lambda (val)
                             ;; (format #t "Entering write-to-bv: current ~a location ~a val ~a bpe ~a~%" current location val bits-per-entry)
                             (set! current (logior current (ash val location)))
                             (set! location (+ location bits-per-entry))
                             (while (> location 8)
                               ;; (format #t "Writing ~a~%" (logand current #xff))
                               (put-u8 port (logand current #xff))
                               (set! current (ash current -8))
                               (set! location (- location 8)))
                             ;; (format #t "Leaving write-to-bv: current ~a location ~a~%" current location)
                             ))
              (get-bv (lambda ()
                        (put-u8 port current)
                        (get-bytevector))))
          (values write-to-bv get-bv))))))

(define (open-bit-input-port bv bits-per-entry)
  (let ((current 0)
        (location 0)
        (eof #f))
    (call-with-values
        (lambda ()
          (open-bytevector-input-port bv))
      (lambda (port)
        ;; Return the read procedure, which begins here
        (lambda ()
          ;; (format #t "Entering read-from-bv: current ~x location ~a~%" current location)
          (let loop ((u8 (get-u8 port)))
            ;; (format #t "Read ~a~%" u8)
            (if (eof-object? u8)
                (if (> location 0)
                    (begin
                      (let ((output (bit-extract current 0 bits-per-entry)))
                        (set! current (ash current (- bits-per-entry)))
                        (set! location (- location bits-per-entry))
                        ;; (format #t "EOF Leaving read-from-bv: current ~x location ~a output ~x~%" current location output)
                        output))
                    (begin
                      ;; (format #t "EOF Leaving read-from-bv: <eof>~%")
                      (eof-object)))
                ;; else
                (begin
                  (set! current (logior current (ash u8 location)))
                  (set! location (+ location 8))
                  (if (< location bits-per-entry)
                      (begin
                        ;; (format #t "Looping in read-from-bv: current ~x location ~a~%" current location)
                        (loop (get-u8 port)))
                      ;; else
                      (let ((output (bit-extract current 0 bits-per-entry)))
                        (set! current (ash current (- bits-per-entry)))
                        (set! location (- location bits-per-entry))
                        ;; (format #t "Leaving read-from-bv: current ~x location ~a output ~x~%" current location output)
                        output))))))))))
#!(lambda ()(format #t "Entering read-from-bv: current ~x location ~a~%" current location)(if eof
(eof-object);;else(begin(while (< location bits-per-entry)
(format #t "Looping in read-from-bv: current ~x location ~a~%" current location)(let ((u8 (get-u8 port)))(format #t "Read ~a~%" u8)(if (eof-object? u8)
(begin(set! eof #t)(break))
;; else(begin(set! current (logior current (ash u8 location)))(set! location (+ location 8))))))
(format #t "After loop in read-from-bv: current ~x location ~a~%" current location)(let ((output (bit-extract current 0 bits-per-entry)))
(set! current (ash current (- bits-per-entry)))(set! location (- location bits-per-entry))(format #t "Leaving read-from-bv: current ~x location ~a output ~x~%" current location output)output))))))))
!#
(define (%lzw-uncompress in out done? table-size)
  (let ((strings (make-hash-table table-size))
        (next-code (make-serial-number-generator 0 table-size))
        (universe (iota 256))
        (eof-code #f))
    (for-each (lambda (obj)
                (hash-set! strings (next-code) (list obj)))
              universe)
    (set! eof-code (next-code))
    (let loop ((previous-string '()))
      (let ((code (in)))
        (unless (or (done? code)
                    (= code eof-code))
          (unless (hash-ref strings code)
            (hash-set! strings
                       code
                       (cons (last previous-string) previous-string)))
          (for-each-right out
                          (hash-ref strings code))
          (let ((cs (hash-ref strings code)))
            (and=> (and (not (null? previous-string))
                        (next-code))
                   (cut hash-set! strings <> (cons (last cs)
                                                   previous-string)))
            (loop cs)))))))
(define (lzw-compress-inner bv table-size dictionary)
  (call-with-values
      (lambda ()
        (open-bytevector-output-port))
    (lambda (output-port get-result)
      (let ((dict (%lzw-compress (cute get-u8 (ensure-bv-input-port bv))
                                 (cute put-u16 output-port <>)
                                 eof-object?
                                 table-size)))
        (if dictionary
            (values (get-result) dict)
            (get-result))))))

(define* (lzw-compress bv #:key (table-size 65536) dictionary)
  (let ((bv (lzw-compress-inner bv table-size dictionary)))
    (receive (write-to-bv get-bv)
        (open-bit-output-port (integer-length (1- table-size)))
      ;; (write (bytevector->uint-list bv (endianness little) 2)) (newline)
      (for-each write-to-bv (bytevector->uint-list bv (endianness little) 2))
      (get-bv))))
(define* (lzw-uncompress-inner bv table-size dictionary)
  (format #t "lzw-uncompress: table-size ~a~%" table-size)
  (call-with-values
      (lambda ()
        (open-bytevector-output-port))
    (lambda (output-port get-result)
      (let ((dict (%lzw-uncompress (cute get-u16 (open-bytevector-input-port bv))
                                   (cute put-u8 output-port <>)
                                   eof-object?
                                   table-size)))
        (if dictionary
            (values (get-result) dict)
            (get-result))))))

(define* (lzw-uncompress bv #:key (table-size 65536) dictionary)
  (let* ((get-val (open-bit-input-port bv (integer-length (1- table-size))))
         (u16lst (let loop ((x (get-val))
                            (lst '()))
                   (if (eof-object? x)
                       lst
                       (loop (get-val) (append lst (list x)))))))
    (lzw-uncompress-inner (uint-list->bytevector u16lst (endianness little) 2) table-size dictionary)))
The ‘compress’ and ‘uncompress’ scripts
Once the procedures are working, it is a simple task to write scripts that use them. So we’ll write scripts that are simplified versions of the Unix commands ‘compress’ and ‘uncompress’. These scripts will manipulate files with the following format.
Each file will begin with a 3-byte header.
• Byte 1: #x1F
• Byte 2: #x9D
• Byte 3: Dictionary size, given as an 8-bit unsigned number between 9 and 16 inclusive. The number indicates a dictionary size between 2^9 and 2^16.
The rest of the file is the LZW-compressed 16-bit binary data stored in little-endian format.
Note that this will not be compatible with your operating system’s version of compress. The compress file format is not consistent across platforms. Every current implementation of compress adds more functionality to squeeze more compression out of the vanilla LZW algorithm.
compress [-v] [-b bits] [name ...]
For each filename, compress will create an LZW-compressed version of the input file. The compressed file will have the same filename as the input file with the ".Z" extension appended to it. If the compression is successful and the output file is successfully written, the input file will be deleted.
If no filenames are given, compress will take the contents of stdin and send the compressed data to stdout.
The optional ‘-b’ bits parameter will indicate the maximum size of the dictionary. If bits is given, it must be between 9 and 16, indicating a maximum dictionary size of 2^bits.
If the optional ‘-v’ parameter is given, the script should print to stdout the compression ratio for each file processed. If no file was specified and this program is thus compressing stdin to stdout, this flag is ignored.
Compress should fail with appropriate error messages if any of the following problems occur
• The command line has unknown options or is otherwise incorrect
• The command line argument after a ‘-b’ is out of range, non-numeric, or missing
• The file associated with an input filename does not exist or is unreadable
• An input filename has a ".Z" suffix
• Writing the output file would overwrite a file that already exists
• Writing to disk fails for any reason
• Erasing the input file on completion fails for any reason
If an error occurs, the script should return the error code 1. Otherwise it returns the error code 0.
uncompress [-v] [name ...]
uncompress will create an uncompressed version of a file generated by compress. The uncompressed file will have the same filename as the input file with the ".Z" extension removed. If the uncompression is successful and the output file is successfully written, the input file will be deleted.
Also, like compress, if no filenames are given, uncompress takes the contents of stdin and uncompresses them to stdout.
If the optional ‘-v’ parameter is given, the script should print to stdout the compression ratio for each file processed. If no file was specified and thus this program is uncompressing stdin to stdout, this flag is ignored.
Uncompress should fail with appropriate error messages if any of the following problems occur
• The command line has unknown options or is otherwise incorrect
• The file header is incorrect
• The bits parameter in the file header is out of range
• The file associated with the input filename does not exist or is unreadable
• The input compressed data is incorrect or corrupt, which can be detected by receiving an index that is not yet in the dictionary, or if an index value exceeds the number of entries in the dictionary as specified in the header, or if the last entry in the file is not a complete 16-bit integer
• The input file does not end in ".Z"
• The output file would overwrite a file that already exists
• Writing to disk fails for any reason
• Erasing the input file on completion fails for any reason
If an error occurs, the script should return the error code 1. Otherwise it returns the error code 0.
compress and uncompress
Daniel Hartwig contributed compress and uncompress scripts. As you can imagine, the majority of the scripts do unglamorous tasks such as checking options, filenames and the like.
Here’s compress
#!/usr/bin/guile \
-L . -e main -s
!#

;; Copyright (C) 2013 Daniel Hartwig <[email protected]>
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see <http://www.gnu.org/licenses/>.

(use-modules (lzw)
             (ice-9 control)
             (ice-9 format)
             (ice-9 i18n)
             (rnrs bytevectors)
             (rnrs io ports)
             (srfi srfi-37))
(define *program-name* #f)

;; This form of 'gettext' is helpful for longer messages. A single
;; message id can be split and aligned across many lines, similar to
;; the common usage in C.
(define (_ msg . rest)
  (gettext (string-concatenate (cons msg rest)) "guile100-compress"))

(define (error* status msg . args)
  (force-output)
  (let ((port (current-error-port)))
    (when *program-name*
      (display *program-name* port)
      (display ": " port))
    (apply format port msg args)
    (newline port)
    (unless (zero? status)
      ;; This call to 'abort' causes 'main' to immediately return the
      ;; specified status value. Similar to 'exit' but more
      ;; controlled, for example, when using the REPL to debug,
      ;; 'abort' will not cause the entire process to terminate.
      ;;
      ;; This is also handy to attempt processing every file, even
      ;; after an error has occurred. To do this, establish another
      ;; prompt at an interesting place inside 'main'.
      (abort (lambda (k)
               status)))))
(define (make-file-error-handler filename)
  (lambda args
    (error* 1 (_ "~a: ~a")
            filename
            (strerror (system-error-errno args)))))

(define (system-error-handler key subr msg args rest)
  (apply error* 1 msg args))

(define (compression-ratio nbytes-in nbytes-out)
  (exact->inexact (/ (- nbytes-in nbytes-out) nbytes-in)))

(define (write-lzw-header port bits)
  (put-bytevector port (u8-list->bytevector (list #x1F #x9D bits))))

(define (compress-port in out bits verbose?)
  #;
  (begin
    (write-lzw-header out bits)
    (%lzw-compress (cute get-u8 in)
                   (cute put-u16 out <>)
                   eof-object?
                   (expt 2 bits)))
  (let* ((in-bv (get-bytevector-all in))
         (out-bv (lzw-compress in-bv #:table-size (expt 2 bits))))
    (write-lzw-header out bits)
    (put-bytevector out out-bv)))
(define (compress-file infile bits verbose?)
  (catch 'system-error
    (lambda ()
      (let ((outfile (string-append infile ".Z")))
        (when (string-suffix? ".Z" infile)
          (error* 1 (_ "~a: already has .Z suffix") infile))
        (when (file-exists? outfile)
          (error* 1 (_ "~a: already exists") outfile))
        (let ((in (open-file infile "rb"))
              (out (open-file outfile "wb")))
          ;; TODO: Keep original files ownership, modes, and access
          ;; and modification times.
          (compress-port in out bits verbose?)
          (when verbose?
            (format #; (current-error-port)
                    (current-output-port)
                    (_ "~a: compression: ~1,2h%\n") ; '~h' is localized '~f'.
                    infile
                    (* 100 (compression-ratio (port-position in)
                                              (port-position out)))))
          (for-each close-port (list in out))
          (delete-file infile))))
    system-error-handler))
(define (ensure-bits obj)
  (let ((n (or (and (integer? obj) obj)
               (and (string? obj)
                    (locale-string->integer obj))
               (error* 1 (_ "bits must be an integer -- ~a") obj))))
    (unless (<= 9 n 16)
      (error* 1 (_ "bits must be between 9 and 16 -- ~a") n))
    n))

(define (make-boolean-processor key)
  (lambda (opt name arg config . rest)
    (apply values (assq-set! config key #t)
           rest)))

(define (make-option-processor key parse)
  (lambda (opt name arg config . rest)
    (apply values (assq-set! config key (parse arg))
           rest)))
(define (usage status)
  (format (current-error-port)
          (_ "Usage: ~a [-v] [-b bits] [FILE]...\n"
             " -v, --verbose show compression ratio\n"
             " -b, --bits bits maximum number of BITS per code [16]\n")
          *program-name*)
  (abort (lambda (k)
           status)))

(define options
  (list (option '(#\h "help") #f #f
                (lambda args
                  (usage 0)))
        (option '(#\v "verbose") #f #f
                (make-boolean-processor 'verbose?))
        (option '(#\b "bits") #t #f
                (make-option-processor 'bits ensure-bits))))
(define (main args)
  ;; Establishing this prompt ensures that any call to 'abort' will at
  ;; most escape to the continuation of '%' here. In effect, calling
  ;; 'abort' causes 'main' to stop what it was doing and continue with
  ;; the procedure passed to 'abort' instead.
  (% (call-with-values
         (lambda ()
           (args-fold (cdr args)
                      options
                      (lambda (opt name arg . rest)
                        (error* 0 (_ "invalid option -- '~a'") name)
                        (usage 1))
                      (lambda (arg config infiles)
                        (values config
                                (cons arg infiles)))
                      ;; First seed: config (with default values).
                      '((bits . 16)
                        (verbose? . #f))
                      ;; Second seed: infiles (initially empty list).
                      '()))
       (lambda (config infiles)
         (let ((bits (assq-ref config 'bits))
               (verbose? (assq-ref config 'verbose?)))
           (for-each (lambda (infile)
                       (cond ((string=? infile "-")
                              (compress-port (current-input-port)
                                             (current-output-port)
                                             bits
                                             verbose?))
                             (else
                              (compress-file infile
                                             bits
                                             verbose?))))
                     (if (null? infiles)
                         ;; No arguments, use stdin.
                         '("-")
                         ;; Process the files in the order given on
                         ;; the command line.
                         (reverse infiles)))
           ;; Exit indicating success. If an error occurred anywhere,
           ;; the call to 'abort' will produce a different status.
           0)))))

(when (batch-mode?)
  (setlocale LC_ALL "")
  (set! *program-name* (basename (car (program-arguments))))
  (exit (main (program-arguments))))
Here’s uncompress
#!/usr/bin/guile \
-L . -e main -s
!#

;; Copyright (C) 2013 Daniel Hartwig <[email protected]>
;;
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see <http://www.gnu.org/licenses/>.

(use-modules (lzw)
             (ice-9 control)
             (ice-9 format)
             (ice-9 i18n)
             (ice-9 match)
             (rnrs bytevectors)
             (rnrs io ports)
             (srfi srfi-37))
(define *program-name* #f)

(define (_ msg . rest)
  (gettext (string-concatenate (cons msg rest)) "guile100-compress"))

(define (error* status msg . args)
  (force-output)
  (let ((port (current-error-port)))
    (when *program-name*
      (display *program-name* port)
      (display ": " port))
    (apply format port msg args)
    (newline port)
    (unless (zero? status)
      (abort (lambda (k)
               status)))))

(define (make-file-error-handler filename)
  (lambda args
    (error* 1 (_ "~a: ~a")
            filename
            (strerror (system-error-errno args)))))

(define (system-error-handler key subr msg args rest)
  (apply error* 1 msg args))
(define (compression-ratio nbytes-in nbytes-out)
  (exact->inexact (/ (- nbytes-in nbytes-out) nbytes-in)))

(define (read-lzw-header port)
  (match (bytevector->u8-list (get-bytevector-n port 3))
    ((#x1F #x9D bits)
     (and (<= 9 bits 16)
          (values bits)))
    (x #f)))

(define (uncompress-port in out verbose?)
  (let ((bits (read-lzw-header in)))
    (unless bits
      (error* 1 (_ "incorrect header")))
    #;
    (%lzw-uncompress (cute get-u16 in)
                     (cute put-u8 out <>)
                     eof-object?
                     (expt 2 bits))
    (let* ((in-bv (get-bytevector-all in))
           (out-bv (lzw-uncompress in-bv #:table-size (expt 2 bits))))
      (put-bytevector out out-bv))))
(define (uncompress-file infile verbose?)
  (catch 'system-error
    (lambda ()
      (let ((outfile (string-drop-right infile 2)))
        (when (not (string-suffix? ".Z" infile))
          (error* 1 (_ "~a: does not have .Z suffix") infile))
        (when (file-exists? outfile)
          (error* 1 (_ "~a: already exists") outfile))
        (let ((in (open-file infile "rb"))
              (out (open-file outfile "wb")))
          (uncompress-port in out verbose?)
          (when verbose?
            (format #; (current-error-port)
                    (current-output-port)
                    (_ "~a: compression: ~1,2h%\n") ; '~h' is localized '~f'.
                    infile
                    (* 100 (compression-ratio (port-position out)
                                              (port-position in)))))
          (for-each close-port (list in out))
          (delete-file infile))))
    system-error-handler))
(define (usage status)
  (format (current-error-port)
          (_ "Usage: ~a [-v] [FILE]...\n"
             " -v, --verbose show compression ratio\n")
          *program-name*)
  (abort (lambda (k)
           status)))

(define (make-boolean-processor key)
  (lambda (opt name arg config . rest)
    (apply values (assq-set! config key #t)
           rest)))
(define (main args)
  (% (call-with-values
         (lambda ()
           (args-fold (cdr args)
                      (list (option '(#\h "help") #f #f
                                    (lambda args
                                      (usage 0)))
                            (option '(#\v "verbose") #f #f
                                    (make-boolean-processor 'verbose?)))
                      (lambda (opt name arg . rest)
                        (error* 0 (_ "invalid option -- '~a'") name)
                        (usage 1))
                      (lambda (arg config infiles)
                        (values config
                                (cons arg infiles)))
                      ;; First seed: config (with default values).
                      '((verbose? . #f))
                      ;; Second seed: infiles (initially empty list).
                      '()))
       (lambda (config infiles)
         (let ((verbose? (assq-ref config 'verbose?)))
           (for-each (lambda (infile)
                       (cond ((string=? infile "-")
                              (uncompress-port (current-input-port)
                                               (current-output-port)
                                               verbose?))
                             (else
                              (uncompress-file infile
                                               verbose?))))
                     (if (null? infiles)
                         ;; No arguments, use stdin.
                         '("-")
                         ;; Process the files in the order given on
                         ;; the command line.
                         (reverse infiles)))
           ;; Exit indicating success.
           0)))))

(when (batch-mode?)
  (setlocale LC_ALL "")
  (set! *program-name* (basename (car (program-arguments))))
  (exit (main (program-arguments))))
3.4 Problem 4: tar file archives
This challenge is to create a script that takes a list of filenames and that generates a ustar-format archive file. This archive file format is compatible with common POSIX tools.
The ustar interchange format is one of the simpler formats used for archive files that contain multiple files along with their metadata.
To begin, we are going to create a script that creates ustar-format files. But, to keep things simple, we are only going to use a small subset of the functionality that ustar files can provide. The result should be readable by common tar and pax tools.
3.4.1 ustar Script
The ustar script will have a simple calling structure.
ustar archive file1 .. filen
It will create a new archive containing the files indicated on the command line.
The script will have to handle many error conditions, including but not limited to
• filename contains characters not in the ustar-string’s character set
• file part of filename is longer than 100 characters
• path part of filename is longer than 155 characters
• file is a symbolic link, fifo, directory or any other type of non-normal file
• file’s uname and gname contain characters not in ustar-string’s character set
• file’s uname or gname are longer than 31 characters
• file length is greater than 8,589,934,591 bytes (octal 77777777777)
• file’s UID or GID is greater than 2,097,151 (octal 7777777)
• system errors about inability to open, write, or close files.
3.4.2 The rustar File Format
First, I will describe our restricted ustar file format, which I’m going to dub rustar for restricted ustar, just so that we’re clear that I’m talking about something more specific than the ustar format.
File Structure
A rustar file contains a set of logical records. Each logical record represents the contents of a file plus its metadata. The logical records appear sequentially in the file, one after another, and there is no global header in the file. At the end of the file is a footer.
Logical Records
Each logical record consists of two parts, a header segment, and the contents of the file, a.k.a. the data segment. Of these, only the header requires a detailed explanation.
Header
The header segment is a 512-byte block that contains metadata for a file. The block is broken up into 17 fields of fixed length. Each field contains data in one of three types.
Header Types
Here we describe the three types that can appear in a header. Each type has the annotation [N]. The N indicates that this field has a fixed size of N bytes.
1. rustar-string[N] is a fixed-width string that contains only the codepoints listed below. It is stored in the ASCII encoding, and, if necessary, is right-padded with NULL bytes to ensure it occupies the whole of its N bytes. NULL bytes can only appear at the end of the string. The string need not end with NULL bytes if it fills the whole of its fixed width.
The list of allowed codepoints is
• U+20 to U+22
• U+25 to U+3F
• U+41 to U+5A
• U+5F
• U+61 to U+7A
• and U+00, but U+00 can only be followed by more U+00.
2. rustar-0string[N] (note the ‘0’) is a fixed-width string with the same format and restrictions as a rustar-string[N] but with an additional restriction. It must end with at least one NULL byte.
3. rustar-number[N] is an unsigned integer stored as a fixed-width string. The string contains the text representation of the integer in octal format. The last byte (and only the last byte) of the string must be NULL. The string is left-padded with the ‘0’ character to ensure the number occupies the whole of its fixed-width buffer.
For example, a rustar-number[8] field for the integer 10 will be the string “0000012” followed by one byte of NULL. 12 octal equals 10 decimal.
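The rustar-number[N] rule above can be checked with a few lines of Guile. This is only a sketch: the helper name rustar-number->string is hypothetical and is not part of the contributed script shown later.

```scheme
;; Hypothetical helper: format NUMBER as a rustar-number[WIDTH] string,
;; that is, WIDTH - 1 zero-padded octal digits followed by one NULL.
(define (rustar-number->string number width)
  (let* ((digits (number->string number 8))
         (padding (- width 1 (string-length digits))))
    (when (negative? padding)
      (error "number does not fit in field:" number width))
    (string-append (make-string padding #\0)
                   digits
                   (string (integer->char 0)))))

(define field (rustar-number->string 10 8))
(display (string-length field)) (newline)   ; prints 8
(display (substring field 0 7)) (newline)   ; prints 0000012

;; The largest value that fits in a number[8] field is seven octal 7s.
(display (- (expt 8 7) 1)) (newline)        ; prints 2097151
```

The last line explains where the UID and GID limit in the error list comes from: a number[8] field has room for seven octal digits.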
Header Fields
The 17 fields in the 512 byte header block of a logical record are
Field       Format        Description
Name        string[100]   The filename by itself, with no directory
                          information. The path separator character
                          (U+2F) is not allowed.
Mode        number[8]     A bitfield of the permissions. See below.
UID         number[8]     The User ID of the file
GID         number[8]     The Group ID of the file
Size        number[12]    The length of the file in bytes
mtime       number[12]    The 32-bit integer modification time of the
                          file.
Checksum    number[8]     256 + the sum of all the bytes in this header
                          except the checksum field.
Typeflag    string[1]     Always “0”.
Link name   string[100]   Always 100 bytes of NULL.
Magic       0string[6]    The string “ustar” plus a NULL.
Version     string[2]     The string “00”.
uname       0string[32]   The uname of the file.
gname       0string[32]   The gname of the file
Dev-Major   number[8]     Always zero.
Dev-Minor   number[8]     Always zero.
Prefix      string[155]   Path information for this file. If this file has
                          no additional path information, this is all
                          NULL. Directory separation is represented
                          by ‘/’ forward slash. The slash at the end
                          is assumed, and should not be included
                          explicitly. (1)
Padding     0string[12]   12 bytes of NULL.
The mode bitfield is a standard permissions bitfield:
• 0x001 execute permission for ’other’
• 0x002 write permission for ’other’
• 0x004 read permission for ’other’
• 0x008 execute permission for ’group’
• 0x010 write permission for ’group’
• 0x020 read permission for ’group’
• 0x040 execute permission for ’owner’
• 0x080 write permission for ’owner’
• 0x100 read permission for ’owner’
• 0x200 (unused)
• 0x400 if is setgid
• 0x800 if is setuid
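The Checksum rule in the field table can be sketched as follows. The name header-checksum is hypothetical, and the byte range 148 to 155 for the checksum field is an assumption derived by adding up the widths of the fields that precede it (100 + 8 + 8 + 8 + 12 + 12 = 148).

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: HEADER is a 512-byte bytevector. The checksum is
;; 256 plus the sum of every byte outside the 8-byte checksum field,
;; which is assumed here to occupy bytes 148 through 155.
(define (header-checksum header)
  (let loop ((i 0) (sum 256))
    (cond ((= i (bytevector-length header)) sum)
          ((<= 148 i 155) (loop (+ i 1) sum))
          (else (loop (+ i 1) (+ sum (bytevector-u8-ref header i)))))))

;; A header that is all NULL except for the one-letter name "a".
(define header (make-bytevector 512 0))
(bytevector-u8-set! header 0 (char->integer #\a))
(display (header-checksum header)) (newline)   ; prints 353
```

The constant 256 is what eight space characters (8 × 32) would add, so an equivalent approach is to fill the checksum field with spaces and sum all 512 bytes.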
Data
After the 512-byte header block, the binary contents of the file are stored. The data segment is NULL-padded so that it ends on a 512-byte block boundary.
Footer
The footer is 1024 bytes of NULL that appears at the end of the file.
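The block arithmetic for the data segment and footer is easy to get off by one, so here is a small sketch of it. The names padding-length and archive-size are hypothetical helpers used only for illustration.

```scheme
(define block-size 512)

;; Number of NULL bytes needed after SIZE bytes of data so that the
;; data segment ends on a block boundary.
(define (padding-length size)
  (modulo (- size) block-size))

;; Total size of an archive holding files of the given sizes: one header
;; block plus the padded data for each file, then the 1024-byte footer.
(define (archive-size sizes)
  (+ (apply + (map (lambda (size)
                     (+ block-size size (padding-length size)))
                   sizes))
     (* 2 block-size)))

(display (padding-length 700)) (newline)    ; prints 324
(display (padding-length 512)) (newline)    ; prints 0
(display (archive-size '(700))) (newline)   ; prints 2560
```

A file whose length is already a multiple of 512 needs no padding, and an empty archive under these rules is just the 1024-byte footer.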
(1) For example: prefix “foo” + name “bar” forms “foo/bar”. Prefix “foo/” + name “bar” forms “foo//bar”. Don’t do that.
The Archive Script
Jez Ng contributed a script that meets the above requirements quite nicely. One thing to note here is the use of the SRFI-26 forms cut and cute. These let you, in effect, pass a subset of the required parameters to a procedure. In a later call, you can add the remaining parameters to the procedure and then truly call it.
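Before reading the script, it may help to see how cut and cute differ. The following is a minimal sketch; the names add-ten, counter, next!, with-cut and with-cute are made up for illustration. With cut, the non-slot expressions are evaluated again on every call, whereas cute evaluates them once, when the specialized procedure is created.

```scheme
(use-modules (srfi srfi-26))

;; <> marks the slot that is filled in by the later call.
(define add-ten (cut + 10 <>))
(display (add-ten 5)) (newline)        ; prints 15

(define counter 0)
(define (next!)
  (set! counter (+ counter 1))
  counter)

(define with-cut  (cut list (next!) <>))    ; (next!) runs on every call
(define with-cute (cute list (next!) <>))   ; (next!) runs once, right here

(display (with-cut 'a)) (newline)      ; prints (2 a)
(display (with-cut 'b)) (newline)      ; prints (3 b)
(display (with-cute 'c)) (newline)     ; prints (1 c)
```

This is why the script wraps (current-output-port) with cut rather than cute: the port has to be looked up at call time, after with-output-to-file has redirected it.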
#! /usr/bin/env guile \
-e main -s
!#

(use-modules (rnrs bytevectors)
             (rnrs io ports)
             (srfi srfi-1)  ; map, reduce
             (srfi srfi-26) ; cut, cute
             (ice-9 format))

(define write-bytevector (cut put-bytevector (current-output-port) <...>))

(define block-size 512)

(define (cat)
  (define bv (make-bytevector block-size 0))
  (let ((read-count (get-bytevector-n! (current-input-port) bv 0 block-size)))
    (unless (eof-object? read-count)
      (write-bytevector bv)
      (unless (< read-count block-size) (cat)))))

(define rustar-char-set
  (char-set-union
   (ucs-range->char-set #x20 #x23)
   (ucs-range->char-set #x25 #x40)
   (ucs-range->char-set #x41 #x5B)
   (char-set #\x5F)
   (ucs-range->char-set #x61 #x7B)))

(define (valid-rustar-char? c)
  (char-set-contains? rustar-char-set c))
(define (make-fixed-string length string)
  (let ((bv (make-bytevector length 0)))
    (string-for-each-index
     (lambda (i)
       (let ((c (string-ref string i)))
         (unless (valid-rustar-char? c)
           (throw 'ustar-error "encountered invalid character"))
         (bytevector-u8-set! bv i (char->integer c))))
     string)
    bv))

(define (make-rustar-string length string)
  (if (<= (string-length string) length)
      (make-fixed-string length string)
      (throw 'ustar-error "'~a' is too long for tar header" string)))

(define (make-rustar-0string length string)
  (if (< (string-length string) length)
      (make-fixed-string length string)
      (throw 'ustar-error "'~a' is too long for tar header" string)))

(define (make-rustar-number length number)
  (let* ((num (number->string number 8))
         (padding (- length (string-length num) 1)))
    (if (>= padding 0)
        (make-fixed-string length (string-append (make-string padding #\0) num))
        (throw 'ustar-error "~a is too large for tar header" num))))
;; Unlike dirname, this doesn't return "." for files in the cwd.
(define (raw-dirname path)
  (let ((last-separator-pos (string-rindex
                             path
                             (string-ref file-name-separator-string 0))))
    (if last-separator-pos
        (string-take path last-separator-pos)
        "")))

(define (write-file-header filename)
  (define st (lstat filename))
  (unless (eq? (stat:type st) 'regular)
    (throw 'ustar-error "Only regular files are supported"))
  (let* ((uid (stat:uid st))
         (gid (stat:gid st))
         ; We only really need an a-list for the purposes of modifying
         ; checksum in-place. The other keys are not used. However, they do
         ; serve as documentation.
         (header
          `((filename . ,(make-rustar-string 100 (basename filename)))
            (mode . ,(make-rustar-number 8 (stat:perms st)))
            (uid . ,(make-rustar-number 8 uid))
            (gid . ,(make-rustar-number 8 gid))
            (size . ,(make-rustar-number 12 (stat:size st)))
            (mtime . ,(make-rustar-number 12 (stat:mtime st)))
            (checksum . ,(make-bytevector 8 (char->integer #\space)))
            (typeflag . ,(make-rustar-string 1 "0"))
            (link-name . ,(make-rustar-string 100 ""))
            (magic . ,(make-rustar-0string 6 "ustar"))
            (version . ,(make-rustar-string 2 "00"))
            (uname . ,(make-rustar-0string 32 (passwd:name (getpwuid uid))))
            (gname . ,(make-rustar-0string 32 (group:name (getgrgid gid))))
            (dev-major . ,(make-rustar-number 8 0))
            (dev-minor . ,(make-rustar-number 8 0))
            (path . ,(make-rustar-string 155 (raw-dirname filename)))
            (padding . ,(make-rustar-0string 12 ""))))
         (sum (cut reduce + 0 <>))
         (checksum (sum (map (compose sum bytevector->u8-list cdr) header))))
    (set! header (assq-set! header 'checksum (make-rustar-number 8 checksum)))
    (for-each (compose write-bytevector cdr) header)))
(define (tar archive filenames)
  (with-output-to-file archive
    (lambda ()
      (for-each (lambda (filename)
                  (write-file-header filename)
                  (with-input-from-file filename cat #:binary #t))
                filenames)
      (write-bytevector (make-bytevector (* block-size 2) 0)))
    #:binary #t))

(define (main args)
  (define perror (cut format (current-error-port) <...>))
  (define (system-error-handler . args)
    (perror "error: ~a~%" (strerror (system-error-errno args)))
    (exit 1))
  (define (ustar-error-handler . args)
    (perror "error: ")
    (apply perror (cdr args))
    (perror "~%")
    (exit 1))
  (catch 'ustar-error
    (lambda ()
      (catch 'system-error
        (cute tar (cadr args) (cddr args))
        system-error-handler))
    ustar-error-handler))
Later, Mark Weaver contributed a more featureful script that handles almost all of the capabilities of the ustar archive format. It does directories and links as well as files. Also, he uses a very common hack to allow longer path names. He puts whatever part of the path will fit within the 100-character field for the filename. You can find his script in the appendix, See Section A.1 [ustar Archives], page 48.
4 Theme 2: Web 1.0
The second theme in this project is “Web 1.0”, where we’ll talk about interacting with the Internet as it existed in the 1990s.
The 1990s began with the emergence of Gopher clients and servers. The Internet Gopher protocol visualized the world as a series of folders. The folders usually contained plain-text documents or media files like GIFs or AU audio. This was before both HTML and PDF, so mixing text and graphics in a single file wasn’t as common, and, if it did occur, it was in formats such as PostScript.
The HTTP-and-HTML-based internet is linked to the appearance of the NCSA Mosaic browser and the NCSA httpd server. There were precursors, but, as a practical matter, 1993 was the beginning of the HTTP/HTML web.
But, in those days, before AJAX or Flash, most of the content was static HTML content or dynamic content created by CGI scripts. In this context, before the concept of cookies was developed in 1994, personalization of content for different users was not practical.
JavaScript appeared in Netscape Navigator 2.0 in 1995 and in Internet Explorer 3.0 in late 1996, but with incompatibilities between the two implementations. Before 1996, almost all content was static and generated on the server side. This early Web had a more strongly defined separation between client and server.
The early Web pages had stylistic quirks that are less common today. Before CSS2, Web page layouts were often created by using tables. Blinking text, animated GIFs, and embedded MIDI tunes were common.
By the end of the decade, Linux, Apache, MySQL, and PHP were all quite functional. Those programs, in conjunction with Perl, which first appeared in the 80s, became the building blocks of the famous LAMP stack. This free, open software stack allowed for some of the common types of interactivity to which we have become accustomed.
PHP used a model that allowed for rapid generation of Web pages, where code was embedded within otherwise static HTML web pages. When those pages were requested, the embedded PHP code was run, and its output became HTML content.
So, in our second theme we’ll imagine what the world would have been like if GUILE were part of the ecosystem that made up the 1990s Internet experience. Specifically, we’ll take a look at using Guile for
• on-the-fly evaluation of code embedded within HTML documents
• the Internet Gopher protocol
• CGI scripting
• the Linux Apache GUILE MySQL stack
• and the animated GIF format.
And away we go.
4.1 Problem 5: PHP-Style GUILE
This challenge is to write a CGI script that
1. receives a filename as a parameter
2. passes a file by that filename through a preprocessor called eguile
3. and returns the output to the CGI client.
But why eguile? That script helps us mix HTML and Guile.
One of the programming paradigms of Web 1.0 was the PHP programming model, where code was embedded within HTML. The code was run when a client requested the file from the server, and any output printed by the execution of the code became embedded in the HTML when it was sent to the client. The code enclosed between the <?php and ?> tags is evaluated when the file is requested. Anything printed to stdout appears in the HTML document.
<!DOCTYPE html>
<html>
<body>

<?php
echo "My first PHP script!";
?>

</body>
</html>
When it first arrived on the scene, PHP was a CGI executable.
The side effect of today’s challenge is to re-create the PHP programming model in Guile, making something like the following possible.
<!DOCTYPE html>
<html>
<body>
<p>
<?scm
(display "A Guile Script!")
?>
</p>
<p>
<?scm:d "A string" ?>
</p>
</body>
</html>
Mixing HTML and Scheme
So I mentioned eguile. It is an abandonware project that, when given a file that is a mix of HTML and of Guile code, can run it PHP style. The HTML text will be passed through unmodified to the output, and Scheme code will be executed, and anything that the Scheme code displays to the current output port will be passed through as well.
eguile does this by recognizing two new tags.
• ‘<?scm’ and ‘?>’ enclose Scheme code, which eguile will pass to Guile for evaluation.
• ‘<?scm:d’ and ‘?>’ also enclose Scheme code to be evaluated, just like the ‘<?scm’ tags. Additionally, eguile will display the value of the last expression using the display procedure.
Making a CGI Script
eguile by itself is not a complete solution. It can run mixed HTML and Guile code throughthe Guile interpreter, but, it doesn’t have any hooks to connect it to the webserver.
To make this happen, we can add some framework code to have eguile run as part ofa CGI script.
The quickest way to make a CGI script is to use the functions provided by the Guile-WWW project. Guile-WWW has routines that provide CGI functionality.
Thus, we’ll be creating a Scheme script that uses Guile-WWW for CGI processing and that includes an updated version of eguile.
URL Parsing
We’ll call this script ‘ghp.cgi’. ghp is short for Guile HTML Processor. For any basic webserver, you can put ‘ghp.cgi’ in the ‘cgi-bin’ directory, and run it by pointing your browser to something like http://localhost/cgi-bin/ghp.cgi.
But wait! We have to tell ‘ghp.cgi’ what HTML-and-Scheme file it needs to process and output. One way is to have ‘ghp.cgi’ parse any extra path information at the end of its URL.
The extra path information is given after the script name, like so: http://localhost/cgi-bin/ghp.cgi/FILENAME
Any normal webserver should put the extra path information for the CGI script in the PATH_INFO environment variable.
The ‘ghp.cgi’ script should load an HTML-and-Guile file named FILENAME from some sensible default path, process it through Eguile, and then serve it back to the client.
Like any sane CGI script that processes a URL, ‘ghp.cgi’ should strip out any ‘/../’ in the path, or maybe just fail if there are ‘/../’ in the path.
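As a rough illustration of the “just fail” option, the sketch below turns a PATH_INFO value into a relative file name and gives up if any component is ‘..’. The procedure name safe-relative-path is hypothetical, and the sketch uses only core Guile string procedures rather than anything from Guile-WWW.

```scheme
;; Hypothetical helper: return a relative file name built from PATH-INFO,
;; or #f if the path is empty or contains a ".." component.
(define (safe-relative-path path-info)
  (let ((parts (filter (lambda (s) (not (string-null? s)))
                       (string-split path-info #\/))))
    (and (pair? parts)
         (not (member ".." parts))
         (string-join parts "/"))))
```

In a CGI script this could be called as (safe-relative-path (or (getenv "PATH_INFO") "")), with the result appended to whatever default directory the script serves files from.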
The Task at Hand
The task is to write a CGI script that
1. inspects its PATH_INFO to see if an extra filename appears at the end of the URL used to call the script
2. passes a file by that filename through the Eguile processing procedure
3. and sends it back to the client
If a file by that filename doesn’t exist, the script should return an HTTP 404 “Not Found” error.
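A CGI script reports its HTTP status by printing a ‘Status:’ header ahead of the body. The following sketch shows one way the found and not-found cases might be told apart. The names cgi-response and respond-for-file are invented for this example, and the render argument stands in for whatever procedure wraps the Eguile processing step; none of this is the Guile-WWW API.

```scheme
;; Hypothetical helpers; these names are not from Guile-WWW or eguile.

;; Build a complete CGI response: headers, a blank line, then the body.
(define (cgi-response status content-type body)
  (string-append "Status: " status "\r\n"
                 "Content-Type: " content-type "\r\n"
                 "\r\n"
                 body))

;; RENDER is assumed to take a file name and return the processed page
;; as a string.  FILE-NAME may be #f if no usable name was given.
(define (respond-for-file file-name render)
  (if (and file-name (file-exists? file-name))
      (cgi-response "200 OK" "text/html" (render file-name))
      (cgi-response "404 Not Found" "text/html"
                    "<html><body><h1>404 Not Found</h1></body></html>\n")))
```

The script would then finish with something like (display (respond-for-file name render)), so that the whole response goes to standard output.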
Truly, this statement of the problem will probably be much longer than the ‘ghp.cgi’ file I’m asking you to create. But, once the CGI script is in place, we can serve up mixed HTML and Scheme content just like PHP 3 did way back in 1997.
You can find Guile-WWW at http://nongnu.org/guile-www.
For the moment, you can find Eguile at https://github.com/spk121/guile100/blob/master/code/eguile.scm
The original source of Eguile is at http://woozle.org/~neale/src/eguile/. Remember that it is abandonware, so don’t bug the owner with questions. We’re going to find a new home and maintainer for Eguile in the near future.
Eguile itself was based on other predecessors, like Shiro Kawai’s ESCM. You can find ESCM at http://practical-scheme.net/vault/escm.html.
4.2 Problem 6: MySQL
This challenge is to write one static HTML form page and one CGI script that will add data to a MySQL database table.
1. Create a static HTML page that has a form with a name text field and a male/female/other gender radio button set. The form, when posted, will call a Guile CGI script as its action, posting the name and gender fields.
2. Create a CGI script that receives the form’s name and gender post data and adds it to a MySQL / MariaDB database. The script will then display the entire contents of the database as a table in HTML.
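The last step, displaying the table, can be sketched without touching the database at all. The example below assumes the rows have already been fetched and converted to lists of strings; how that is done with a database library is not shown. The names html-escape and rows->html-table are made up for this example. Escaping matters here because the name field is user input that ends up inside the page.

```scheme
;; Hypothetical helpers for rendering query results as HTML.

;; Replace the characters that are special in HTML with entities.
(define (html-escape str)
  (string-concatenate
   (map (lambda (c)
          (case c
            ((#\<) "&lt;")
            ((#\>) "&gt;")
            ((#\&) "&amp;")
            ((#\") "&quot;")
            (else (string c))))
        (string->list str))))

;; HEADERS is a list of column names; ROWS is a list of rows, each row
;; being a list of strings.  Returns the table as one string.
(define (rows->html-table headers rows)
  (define (cells tag items)
    (string-concatenate
     (map (lambda (item)
            (string-append "<" tag ">" (html-escape item) "</" tag ">"))
          items)))
  (string-append
   "<table>\n"
   "<tr>" (cells "th" headers) "</tr>\n"
   (string-concatenate
    (map (lambda (row)
           (string-append "<tr>" (cells "td" row) "</tr>\n"))
         rows))
   "</table>\n"))
```

For instance, (display (rows->html-table '("name" "gender") '(("Alice" "female")))) prints a two-row table with one header row and one data row.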
You may find Guile-WWW useful when creating CGI scripts. You can find Guile-WWW at http://nongnu.org/guile-www.
Guile-DBI is probably the best way to access MySQL databases in Guile. You can find it at http://home.gna.org/guile-dbi/.
4.3 Problem 7: Animated GIF Badges
A very important part of the Web 1.0 experience was the GIF badge. These 88 by 31 pixel images were typically bright colors on a grey background with a border to give them a raised button effect. They had text announcing one’s loyalty to a brand of web browser, computer, or political philosophy, or were used as download buttons. They were usually animated.
To create our Web 1.0 experience, we need animated GIF badges. So this week’s task is to write a procedure that will create a GIF. The procedure will have to come in two versions: one for animated GIF and one for static GIF.
For the static GIF case, you should assume that your input data is the following:
• a filename for the output
• a palette of 256 24-bit RGB colors, perhaps stored as a vector of unsigned integers
• a two-dimensional array of unsigned 8-bit indices to the palette colors
For the animated GIF case, you should assume that your input data is
• a filename for the output
• a palette of 256 24-bit RGB colors, as above
• a three-dimensional array of unsigned 8-bit indices
• and a variable containing the desired milliseconds per frame
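As a small starting point, here is a sketch of the first 13 bytes of a GIF89a file for these inputs: the signature followed by the Logical Screen Descriptor, with the flags set for a 256-entry global palette. It also shows the unit conversion the animated case needs, since GIF stores frame delays in hundredths of a second rather than milliseconds. The names gif-header and ms->gif-delay are invented for this example; the palette, image data, and LZW stages are not covered.

```scheme
(use-modules (rnrs bytevectors))

;; Build the 13 bytes that open a GIF89a file: the 6-byte signature plus
;; the 7-byte Logical Screen Descriptor, for a 256-entry global palette.
(define (gif-header width height)
  (let ((bv (make-bytevector 13 0)))
    (bytevector-copy! (string->utf8 "GIF89a") 0 bv 0 6)
    ;; Width and height are 16-bit little-endian integers.
    (bytevector-u16-set! bv 6 width (endianness little))
    (bytevector-u16-set! bv 8 height (endianness little))
    ;; Packed field: global color table present (#x80), 8 bits of color
    ;; resolution (#x70), table size 2^(7+1) = 256 entries (#x07).
    (bytevector-u8-set! bv 10 #xF7)
    ;; Bytes 11 and 12 (background color index, pixel aspect ratio)
    ;; are left at 0.
    bv))

;; Convert a delay in milliseconds to GIF's unit of 1/100 second.
(define (ms->gif-delay ms)
  (inexact->exact (round (/ ms 10))))
```

For an 88 by 31 badge, (gif-header 88 31) yields the bytes for "GIF89a" followed by 88 0 31 0 247 0 0.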
The actual specification for GIF, GIF89a, can be found at http://www.w3.org/Graphics/GIF/spec-gif89a.txt. This specification, however, contains a lot of fields and features that won’t be needed for this specific case. On the other end of the spectrum is the current Wikipedia page for GIF, https://en.wikipedia.org/wiki/Gif, which, at the time of this writeup, contains a very condensed and cryptic description of the file format and the fields contained therein. By merging information from the official specification and the condensed one, it should be possible to write a legible function that creates GIFs for the two cases described above.
One of the trickier parts of the implementation is the LZW compression required. Fortunately, an implementation of LZW compression is handy: see Section 3.3 [Problem 3: LZW Compression], page 18.
These days, the giflib project is as close as we have to a canonical library for GIF reading and writing. It can be referenced to help understand the places in the specification that are obscure. It is at http://sourceforge.net/projects/giflib.
An alternate strategy would be to wrap up giflib as a Guile extension using either Guile’s FFI or its C interface.
Appendix A Other Examples
Here are some other examples for you.
A.1 ustar Archives
Back in Section 3.4 [Problem 4], page 36, I defined a limited, reduced-functionality version of the ustar archive format. The limited version had just enough functionality to create a valid TAR file. After I received Jez’s solution, Mark Weaver sent an alternate script that handles almost all of the capabilities of the ustar file format, including links and longer path names. That script is below.

#!/usr/bin/guile \
-e main -s
!#
;;; Copyright (C) 2013 Mark H Weaver <[email protected]>
;;;
;;; This program is free software: you can redistribute it and/or modify
;;; it under the terms of the GNU General Public License as published by
;;; the Free Software Foundation, either version 3 of the License, or
;;; (at your option) any later version.
;;;
;;; This program is distributed in the hope that it will be useful,
;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with this program. If not, see <http://www.gnu.org/licenses/>.

(use-modules (srfi srfi-1)
             (ice-9 match)
             (ice-9 receive)
             (rnrs bytevectors)
             (rnrs io ports))

;; 'file-name-separator-string' and 'file-name-separator?' are
;; included in Guile 2.0.9 and later.
(define file-name-separator-string "/")
(define (file-name-separator? c) (char=? c #\/))

(define (fmt-error fmt . args)
  (error (apply format #f fmt args)))

;; Like 'string-pad-right', but for bytevectors. However, unlike
;; 'string-pad-right', truncation is not allowed here.
(define* (bytevector-pad
          bv len #:optional (byte 0) (start 0) (end (bytevector-length bv)))
  (when (< len (- end start))
    (fmt-error
     "bytevector-pad: truncation would occur: len ~a, start ~a, end ~a, bv ~s"
     len start end bv))
  (let ((result (make-bytevector len byte)))
    (bytevector-copy! bv start result 0 (- end start))
    result))

(define (bytevector-append . bvs)
  (let* ((lengths (map bytevector-length bvs))
         (total (fold + 0 lengths))
         (result (make-bytevector total)))
    (fold (lambda (bv len pos)
            (bytevector-copy! bv 0 result pos len)
            (+ pos len))
          0 bvs lengths)
    result))

(define ustar-charset
  #;(char-set-union (ucs-range->char-set #x20 #x23)
                    (ucs-range->char-set #x25 #x40)
                    (ucs-range->char-set #x41 #x5B)
                    (ucs-range->char-set #x5F #x60)
                    (ucs-range->char-set #x61 #x7B))
  char-set:ascii)

(define (valid-ustar-char? c)
  (char-set-contains? ustar-charset c))

(define (ustar-string n str name)
  (unless (>= n (string-length str))
    (fmt-error "~a is too long (max ~a): ~a" name n str))
  (unless (string-every valid-ustar-char? str)
    (fmt-error "~a contains unsupported character(s): ~s in ~s"
               name
               (string-filter (negate valid-ustar-char?) str)
               str))
  (bytevector-pad (string->utf8 str) n))

(define (ustar-0string n str name)
  (bytevector-pad (ustar-string (- n 1) str name)
                  n))

(define (ustar-number n num name)
  (unless (and (integer? num)
               (exact? num)
               (not (negative? num)))
    (fmt-error "~a is not a non-negative exact integer: ~a" name num))
  (unless (< num (expt 8 (- n 1)))
    (fmt-error "~a is too large (max ~a): ~a" name (expt 8 (- n 1)) num))
  (bytevector-pad (string->utf8 (string-pad (number->string num 8)
                                            (- n 1)
                                            #\0))
                  n))

(define (checksum-bv bv)
  (let ((len (bytevector-length bv)))
    (let loop ((i 0) (sum 0))
      (if (< i len)
          (loop (+ i 1) (+ sum (bytevector-u8-ref bv i)))
          sum))))

(define (checksum . bvs)
  (fold + 0 (map checksum-bv bvs)))

(define nuls (make-bytevector 512 0))

;; write a ustar record of exactly 512 bytes, starting with the
;; segment of BV between START (inclusive) and END (exclusive), and
;; padded at the end with nuls as needed.
(define* (write-ustar-record
          port bv #:optional (start 0) (end (bytevector-length bv)))
  (when (< 512 (- end start))
    (fmt-error "write-ustar-record: record too long: start ~s, end ~s, bv ~s"
               start end bv))
  ;; We could have used 'bytevector-pad' here,
  ;; but instead use a method that avoids allocation.
  ;; The fourth argument of 'put-bytevector' is a byte count,
  ;; not an end index.
  (put-bytevector port bv start (- end start))
  (put-bytevector port nuls 0 (- 512 (- end start))))

;; write 1024 zero bytes, which indicates the end of a ustar archive.
(define (write-ustar-footer port)
  (put-bytevector port nuls)
  (put-bytevector port nuls))

(define (compose-path-name dir name)
  (if (or (string-null? dir)
          (file-name-separator? (string-ref dir (- (string-length dir) 1))))
      (string-append dir name)
      (string-append dir "/" name)))

;; Like 'call-with-port', but also closes PORT if an error occurs.
(define (call-with-port* port proc)
  (dynamic-wind
    (lambda () #f)
    (lambda () (proc port))
    (lambda () (close port))))

(define (call-with-dirstream* dirstream proc)
  (dynamic-wind
    (lambda () #f)
    (lambda () (proc dirstream))
    (lambda () (closedir dirstream))))

(define (files-in-directory dir)
  (call-with-dirstream* (opendir dir)
    (lambda (dirstream)
      (let loop ((files '()))
        (let ((name (readdir dirstream)))
          (cond ((eof-object? name)
                 (reverse files))
                ((member name '("." ".."))
                 (loop files))
                (else
                 (loop (cons (compose-path-name dir name) files)))))))))

;; split the path into prefix and name fields for purposes of the
;; ustar header. If the entire path fits in the name field (100 chars
;; max), then leave the prefix empty. Otherwise, try to put the last
;; component into the name field and everything else into the prefix
;; field (155 chars max). If that fails, put as much as possible into
;; the prefix and the rest into the name field. This follows the
;; behavior of GNU tar when creating a ustar archive.
(define (ustar-path-name-split path orig-path)
  (define (too-long)
    (fmt-error "~a: file name too long" orig-path))
  (let ((len (string-length path)))
    (cond ((<= len 100) (values "" path))
          ((> len 256) (too-long))
          ((string-rindex path
                          file-name-separator?
                          (- len 101)
                          (min (- len 1) 156))
           => (lambda (i)
                (values (substring path 0 i)
                        (substring path (+ i 1) len))))
          (else (too-long)))))

(define (write-ustar-header port path st)
  (let* ((type (stat:type st))
         (perms (stat:perms st))
         (mtime (stat:mtime st))
         (uid (stat:uid st))
         (gid (stat:gid st))
         (uname (or (false-if-exception (passwd:name (getpwuid uid)))
                    ""))
         (gname (or (false-if-exception (group:name (getgrgid gid)))
                    ""))

         (size (case type
                 ((regular) (stat:size st))
                 (else 0)))

         (type-flag (case type
                      ((regular) "0")
                      ((symlink) "2")
                      ((char-special) "3")
                      ((block-special) "4")
                      ((directory) "5")
                      ((fifo) "6")
                      (else (fmt-error "~a: unsupported file type ~a"
                                       path type))))

         (link-name (case type
                      ((symlink) (readlink path))
                      (else "")))

         (dev-major (case type
                      ((char-special block-special)
                       (quotient (stat:rdev st) 256))
                      (else 0)))

         (dev-minor (case type
                      ((char-special block-special)
                       (remainder (stat:rdev st) 256))
                      (else 0)))

         ;; Convert file name separators to slashes.
         (slash-path (string-map (lambda (c)
                                   (if (file-name-separator? c) #\/ c))
                                 path))

         ;; Make the path name relative.
         ;; TODO: handle drive letters on windows.
         (relative-path (if (string-every #\/ slash-path)
                            "."
                            (string-trim slash-path #\/)))

         ;; If it's a directory, add a trailing slash,
         ;; otherwise remove trailing slashes.
         (full-path (case type
                      ((directory) (string-append relative-path "/"))
                      (else (string-trim-right relative-path #\/)))))

    (receive (prefix name) (ustar-path-name-split full-path path)

      (let* ((%name (ustar-string 100 name "file name"))
             (%mode (ustar-number 8 perms "file mode"))
             (%uid (ustar-number 8 uid "user id"))
             (%gid (ustar-number 8 gid "group id"))
             (%size (ustar-number 12 size "file size"))
             (%mtime (ustar-number 12 mtime "modification time"))
             (%type-flag (ustar-string 1 type-flag "type flag"))
             (%link-name (ustar-string 100 link-name "link name"))
             (%magic (ustar-0string 6 "ustar" "magic field"))
             (%version (ustar-string 2 "00" "version number"))
             (%uname (ustar-0string 32 uname "user name"))
             (%gname (ustar-0string 32 gname "group name"))
             (%dev-major (ustar-number 8 dev-major "dev major"))
             (%dev-minor (ustar-number 8 dev-minor "dev minor"))
             (%prefix (ustar-string 155 prefix "directory name"))

             ;; The checksum is computed with the 8-byte checksum field
             ;; itself treated as eight ASCII spaces.
             (%dummy-checksum (string->utf8 "        "))

             (%checksum
              (bytevector-append
               (ustar-number
                7
                (checksum %name %mode %uid %gid %size %mtime
                          %dummy-checksum
                          %type-flag %link-name %magic %version
                          %uname %gname %dev-major %dev-minor
                          %prefix)
                "checksum")
               (string->utf8 " "))))

        (write-ustar-record port
                            (bytevector-append
                             %name %mode %uid %gid %size %mtime
                             %checksum
                             %type-flag %link-name %magic %version
                             %uname %gname %dev-major %dev-minor
                             %prefix))))))

(define (write-ustar-path port path)
  (let* ((path (if (string-every file-name-separator? path)
                   file-name-separator-string
                   (string-trim-right path file-name-separator?)))
         (st (lstat path))
         (type (stat:type st))
         (size (stat:size st)))
    (write-ustar-header port path st)
    (case type
      ((regular)
       (call-with-port* (open-file path "rb")
         (lambda (in)
           (let ((buf (make-bytevector 512)))
             (let loop ((left size))
               (when (positive? left)
                 (let* ((asked (min left 512))
                        (obtained (get-bytevector-n! in buf 0 asked)))
                   (when (or (eof-object? obtained)
                             (< obtained asked))
                     (fmt-error "~a: file appears to have shrunk" path))
                   (write-ustar-record port buf 0 obtained)
                   (loop (- left obtained)))))))))
      ((directory)
       (for-each (lambda (path) (write-ustar-path port path))
                 (files-in-directory path))))))

(define (write-ustar-archive output-path paths)
  (catch #t
    (lambda ()
      (call-with-port* (open-file output-path "wb")
        (lambda (out)
          (for-each (lambda (path)
                      (write-ustar-path out path))
                    paths)
          (write-ustar-footer out))))
    (lambda (key subr message args . rest)
      (false-if-exception (delete-file output-path))
      (format (current-error-port) "ERROR: ~a\n"
              (apply format #f message args))
      (exit 1))))

(define (main args)
  (match args
    ((program output-path paths ...)
     (write-ustar-archive output-path paths))
    (_ (display "Usage: ustar <archive> <file> ...\n" (current-error-port))
       (exit 1))))

;;; Local Variables:
;;; mode: scheme
;;; eval: (put 'call-with-port* 'scheme-indent-function 1)
;;; eval: (put 'call-with-dirstream* 'scheme-indent-function 1)
;;; End:
5 References
Allen, John. 1978. Anatomy of Lisp. New York: McGraw-Hill.
ANSI X3.226-1994. American National Standard for Information Systems - Programming Language - Common Lisp.
The IEEE and The Open Group. 2001-2004. The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition.
Index
Any inaccuracies in this index may be explained by the fact that it has been prepared with the help of a computer.
- Donald E. Knuth, Fundamental Algorithms (Volume 1 of The Art of Computer Programming)
(Index is nonexistent)