
A Modern Batch Programming Tutorial (Win 2k/XP)


This page teaches you some modern NT-based batch programming and has some fairly advanced and pretty useful batch scripts to help you get started. Batch files are the simple and rather archaic interpreted scripting language of MS-DOS and its derivatives and successors, such as Windows 9x and the NT family (NT, 2k, XP). Basic MS-DOS knowledge is assumed, and it does help if you know some programming language already, although this is not strictly necessary; I'll try to explain most of the terminology on the way. I've written this in tutorial form, so be sure to check out other options and commands not covered here extensively. Although many of the commands and scripts presented here will work on MS-DOS and later, many of the newer command options that are essential to advanced batch programming do require Windows NT, 2000 or XP (of course 2003 will also do, as will probably Longhorn once it comes out).

Although I didn't know it when I started writing this tutorial, I later realized that the batch language is equivalent to the old BASICs of the late 70s, having commands roughly equivalent to input, print, let, if and goto. And because repetition can be expressed with goto and if, as in most assembly languages, you can build more elaborate programs out of these primitives.

You may wonder why one would write batch files these days, as there are really good, real programming languages around: things like Perl, C++ or Java, to name a few. For full-blown programs I prefer Java or C++, true, and Perl needs to be installed separately. Nowadays I tend to do all of these little programs with Perl and use batch files just on systems that don't have Perl installed. But back to batches. I do find it an intellectual challenge to accomplish as much with batches as is practical. Unlike Perl and many other languages, batch files are native to DOS, and you can count on them on every machine without having to download tens of megs of third-party software (well, only about a floppy in the case of tiny Perl). Batch syntax is also relatively easy and the language is interpreted. OK, there are Visual Basic Script and JavaScript, but I don't really want to learn either of those for various reasons, including platform dependence.

WARNING:

While I've been working with DOS since MS-DOS 5 and still do use the DOS box occasionally, it is still possible that some information in this document is not 100 percent accurate, as I don't know the formal syntax of batch files exactly, nor do I know everything about DOS or NT specific batch commands. So, not to be used for mission critical stuff, hehe. Any comments, additions and corrections would be welcome, though.

NOTE:

I tend to use the term NT to mean the set of operating systems based on the NT kernel (NT, 2000, XP and 2003 currently). In most cases it means Windows 2000 or XP as most of the command-extensions were added in 2k, I believe.

Contents

• The Basics of Batch Programming
   • Batch Syntax and Some Practical Notes
   • Relative Paths
   • Wildcards
   • Redirection
   • Piping
   • Variables
   • Batch-specific Commands
   • Good Batch Programming Style
• NT Specific Command Extensions
   • String Input, Integer Arithmetic and Looping
   • The Else clause, String and Numeric Comparisons
   • The Indispensable For Command
   • More String Processing and Some Magic Variables
• Bigger Example Programs
   • Emulating Gosub
   • Random Lines
   • Array Emulation and Sorting
• Epilog

The Basics of Batch Programming

This section introduces the basic MS-DOS batch commands and concepts like relative paths, redirection, pipes and variables (but some Windows pitfalls are also mentioned). If you are an experienced batch programmer, you might want to skip this section if you feel like it. However, if you've done batches but your batch knowledge is a bit rusty, for instance, acquired in the good old DOS days, you may wish to quickly browse through this in case there's anything new. Let me emphasize that I'm not trying to be complete and cover all the options and idioms, just the most useful ones that you'll likely use most of the time.

Batch Syntax and Some Practical Notes

Batch files are series of MS-DOS commands typed in a file, one command per line. The file uses the MS-DOS character set, has an extension of .bat and is run automatically if you type its base name without the extension. If an exe or com file with the same base name exists, you might want to explicitly specify the .bat extension to guarantee that the batch file gets executed. If you need to use high-ASCII characters in the file, such as umlauts and special graphics symbols, you must save it as MS-DOS text, otherwise the characters might be different from what you expected. Many of the better Windows text editors can save as MS-DOS text (even Wordpad can), and there's good old Edit, which still writes out MS-DOS files.

The philosophy of batch programming is that nearly all of the batch constructs are ordinary commands that can also be used outside batch scripts in MS-DOS. Although some of these commands are virtually never used outside batches, they are still there in DOS. So most of the commands you'll likely be using are ordinary DOS commands and work just as you would expect them to.

By the way, if you run into problems in a batch file (e.g. it doing unwanted things, getting stuck in a loop etc...) you can in most cases quit the execution of a batch file by pressing ctrl+c and answering y when asked if you really want to terminate the batch job.

To get a list of all MS-DOS commands, type help at the prompt. For help on an individual command, type its name followed by a slash and a question mark (e.g. copy /?). I recommend getting to know most of the commands that look interesting, so you'll be familiar with the set of tools used in real world batch scripting.

In order for help to work in Windows 9X, you need to download this set of old DOS commands (ftp://ftp.microsoft.com/Products/Windows/Windows95/CDRomExtras/OtherUtilities/olddos.exe), extract it and run help in the current directory.

One excellent resource covering pretty much everything from ancient DOS utilities like edlin to cmd extensions and little known commands such as findstr is the Microsoft Windows XP Command Reference (http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds.mspx). I definitely recommend it even over the DOS command help pages.

Relative Paths

Although many people know the cd and md commands, relative path names are not used that much in DOS. The path names are called relative because they are specified relative to the current directory (.). Here are some examples:

cd games\duke3d

Will change to .\games\duke3d, where . is the current directory. It is not necessary to type

cd games
cd duke3d

separately, but on the other hand doing so means less re-typing if you make a mistake. I'd suggest using complete pathnames as much as possible to decrease the number of lines in your batch scripts, though.

Notice that

md games\mygame

doesn't work as you would expect if the games directory doesn't exist yet, unless you have NT command extensions on, but that's another story.

Another way of using relative paths is to refer to directories that are closer to the root of the directory hierarchy. Say we are in \temp:

copy ..\autoexec.bat .

Will copy autoexec.bat from the root directory to the subdir temp. This can also be risky; don't ever type del . in temp like I once did. The two periods can be chained like this ..\..\..\ to climb even further up the directory tree. You can also continue directory specifications after the periods. Supposing we are in \games\dukebackup:

xcopy /s ..\duke3d\* .

Will copy everything under \games\duke3d to \games\dukebackup, including sub-directories. Notice that \games\duke3d is actually also a relative path name (it's relative to the current drive).

Changing drives can also be done relatively: d: will change to the directory in which you previously were on drive d, whereas d:\ refers to the root of drive d. Finally, environment variables (see the section on variables) can be used as pathnames. A common example is to refer to the Windows system32 directory regardless of the actual Windows directory name with:

%windir%\system32

The variable named windir can be used in NT and later, in which it's defined to contain the location of your Windows directory.

Wildcards

Wildcards are an extremely useful tool for processing sets of files. Most, though not all, DOS commands do support wildcards. The idea is that instead of giving the name of a single file, you use some wildcards to make the name more generic. All the file names that match the given wildcard expression will be processed. The most common wildcard is the asterisk sign *, which replaces any number (including 0) of any characters. Examples:

*.txt Will select all files that end in .txt for processing.

*.* and * are strictly speaking different. *.* selects all file names with extensions, that is, files that have a period in the name, whereas * will also select files that don't have any extension. Actually, DOS doesn't seem to make the difference between the two, even if it should, although most Unix shells likely do.

foo*bar* Will select everything starting with foo and also containing the text bar: foobar.aaa, foobarb.ab, foo.bar and fooblabar.txt would all be selected.

The other wildcard character is the question mark, which replaces a single character (at the end of a name or extension it may also match nothing at all). *.st? would select all files whose extension starts with st; the third character may be anything, and even files with the extension .st alone would be selected.

Redirection

Redirection is the process of directing command output to a file or reading keyboard input for a program from a file. It's a standard trick of Unix junkies but not too well known in DOS. Here's how it works:

command <input >output

Will read lines of keyboard input from the file input and write the output of the command to the file output. Either of the redirection symbols (< or >) can be omitted at will. If the output file doesn't exist, it will be created. If it does exist, however, it is overwritten. To be able to append the output of several commands into one file, use the form command >>output instead.

Normally both error messages and ordinary output, say from an echo command, go to the same place, be it a file or the screen. Sometimes it would be useful to separate the two streams, for instance if you want to append errors to some kind of a log file while showing ordinary output to the user. I found some vague instructions on this on the MS Web site but could not fully figure them out. Here's the MS article about advanced redirection in case you want to read up on it: About Redirection by Microsoft (http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/redirection.mspx).

I got some help with this problem. The following example will redirect the standard output and error streams to separate files:

command 1>stdout.txt 2>stderr.txt
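As a small sketch of the above, assuming cmd.exe on an NT-based Windows: the file names are made up for illustration, and nosuchfile.xyz is simply a name that is not expected to exist, so that dir has something to complain about. The 2>&1 form, which sends the error stream to wherever standard output currently points, is a cmd.exe feature and is not expected to work in plain MS-DOS.

```batch
@echo off
rem Sketch only: listing.txt, errors.txt and all.log are made-up file names.
rem Ordinary output goes to one file, error messages to another.
dir c:\ nosuchfile.xyz 1>listing.txt 2>errors.txt

rem Append both streams to the same log file.
rem 2>&1 means "send stream 2 (errors) to wherever stream 1 (output) goes".
dir c:\ nosuchfile.xyz >>all.log 2>&1
```

Note that the order matters in the second form: the file redirection has to come before 2>&1.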

Piping

Pipes are used to pass the output of one program as the (standard) input of another. This could be done with:

commandA >infile
commandB <infile
del infile

But pipes allow you to do the same on one line without user-managed temporary files:

commandA | commandB

Practically any number of pipes is allowed:

commandA | commandB | commandN

However, pipes and redirection cannot be combined the way you would expect (neither in DOS nor in most Unix shells). That is, something like:

dir /b >unsorted.txt |sort >sorted.txt

will run but will not work as expected. One reason for this is that redirecting the output to a file doesn't output the redirected content on the screen, so the next program in the pipe has nothing to process. In Unix, there's a command called tee, which will output its input both to a file and on the screen, so the limitation can be worked around. Although there are ports of this utility to DOS, there's no Microsoft equivalent in DOS or Windows.
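Without tee, one way around the limitation is to redirect only at the very end of the pipeline, or to fall back on an explicit intermediate file. A minimal sketch, reusing the file names from the example above:

```batch
@echo off
rem Redirect only after the last command, so sort still receives dir's output.
dir /b | sort >sorted.txt

rem If the unsorted listing is needed as well, write it first
rem and sort the file in a second step.
dir /b >unsorted.txt
sort <unsorted.txt >sorted.txt
```

Because the output files live in the directory being listed, their own names may show up in the listings.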

Variables

Another important concept in batch programming is environment variables. They are named placeholders for character data and are usually global in DOS (they will persist till reboot and be accessible in all batch files). When running in a DOS box, all user-created variables will be lost upon exiting the command interpreter (in real DOS they persist till reboot). Environment variable names are not case sensitive.

You use the set command to manage variables like this: set name=value. If value is empty, name is unset, that is, it will be removed. Calling set with no parameters lists all environment variables in the system. Examples:

set name=here's my name
set prompt=

The values of variables can be used in batch files by enclosing the variable name in percent signs. %name% will expand to the value of name, and it can be used almost everywhere in a batch. A classic example of this is appending something to the path:

set path=%path%;newDirectoryEntry

Variables are good for short term, local storage and particularly useful in NT-based batch commands.

Each batch file has a special set of variables named %0 to %9. %1 through %9 correspond to the command-line parameters passed to the batch program, and %0 is the relative path to the batch file being run, or simply the base name if the batch is in the path. Command-line arguments (parameters) are separated by spaces and get their values as follows:

programname arg1 arg2 arg3 ... arg9 arg10 ... argN

All arguments that are not explicitly specified are empty by default.

The memory (space) allocated to the variables is limited. You can set the size in config.sys in DOS (with the /e switch on the shell line), but in 9X and later Windows won't usually run out of storage space (the virtual DOS machine is able to allocate more memory on the fly). Usually you'll be using only a couple of variables in batch scripts, so memory is not a problem.

Batch-specific Commands

Here's an overview of the most useful DOS commands, that are generally speaking thought to be batch file specific. I'll call them batch commands, for clarity. For more information I recommend checking the help for each of these commands in turn.

Before we delve into the various batch commands, here's a pair of short batches that don't require any batch specific syntax, yet manage to be fairly useful. Here you'll see redirection and environment variables in action.

The following pair of programs is a poor man's substitute for the Unix locate command. build builds an index of files that are found in the given path containing wildcards, and locate will do searches in the file database that was built with build.bat.

build.bat:

dir /s /b %1 >%temp%\dbindex.dat

locate.bat:

find "%1" %temp%\dbindex.dat

If the use of find and dir seems alien to you, find /? and dir /? |more should shed some light on this matter.

Now on to the commands.

The cls command will clear the screen. It doesn't take any arguments and is mainly used to clear the screen before displaying any longer output to the user.

Echo can be used to print messages on the screen.

echo this is a test

To echo an empty line, put a period right after the command name:

echo.

By default MS-DOS will echo every command in the batch script on the screen before running it, which can be handy in debugging situations. If you want to get rid of command-echoing, though, it's done with the following:

@echo off

The at sign keeps the echo off command itself from being echoed. Command-echoing can also be turned on at any time by typing echo on.

The pause command prints a generic prompt and waits for some user keyboard input (a bit like reading a single character in C and other languages). It can be useful if you want to be interactive, but in most batch scripts that are meant to run unattended, it's rarely used.

The more command is a bit like pause, but it will read input and stop only when a screenful is printed. Usually you'll be piping stuff to more, like type text.txt |more. More is rarely used in batch programs unless you need to display multiple pages of output for the user.

The infamous goto command will jump to a specified label in the batch file and continue execution from there. goto foo would jump to a label titled :foo. Label names always start with a colon, and the first eight characters of a label name must be unique inside a batch file. Goto is mainly useful in breaking up batches into clear, manageable parts and is of special interest when used with the if command. It also provides the basis for proper, finite looping when we get to NT-specific batch commands.

The if command does simple conditional processing. If the condition is true, the following command is executed. Only one command may be specified after if, and there's no easy way of specifying an else clause, but if does support not (if not condition), which reverses the sense of the test, e.g. for testing that something doesn't exist. Normally one would use the goto command after an if to work around the single-command limitation. If has four forms, which are: string equivalence (if a==b), file existence (if exist a), directory existence (if exist dir\nul) and error level check (if errorlevel number, which is true when the error level is number or higher). Error levels are mainly useful with the choice command, and getting the most out of string equivalence requires the use of environment variables. To check if a string, like a command-line parameter, is empty use: if stringx==x, for instance if %1x==x. It will evaluate to x==x (true) if the string is empty.
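To make the if and goto pairing concrete, here is a minimal sketch that shows a text file page by page after checking its argument. The name show and the labels are made up for this example, and it assumes a file name without spaces.

```batch
@echo off
rem Sketch: hypothetical show.bat, called as: show filename
rem An empty first argument makes the comparison read x==x, which is true.
if %1x==x goto usage
rem "not" reverses the test: jump away if the file is missing.
if not exist %1 goto missing
type %1 | more
goto end

:missing
echo Cannot find %1
goto end

:usage
echo Usage: show filename

:end
```

Each if is followed by a single goto, which is how the one-command limitation is usually worked around.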

The shift command shifts the command-line variables %0 to %9 around, replacing %n with %n+1. %9 is replaced by the 10th command-line argument (it can be a real argument, or the variable is empty if there's no argument). A classic example of using shift is when you want to iterate through all command-line arguments without knowing beforehand how many there will be:

argiter.bat:

rem this is a comment
@echo off
:start
if %1x==x goto end
rem doStuff to %1, only a single command
shift
goto start
:end
rem The above is a label.

Call will call another batch file inside the one you are currently in and return execution when the called batch file terminates. If you use a batch file name without call, execution will not return to the caller afterwards! Calling a batch file is simple:

call dostuff.bat

and it is also a great way to modularize and reuse batch code. Note that user-specified command-line arguments and the scratch variables in a for loop won't show up in called sub-batch scripts, only in the main script. To work around this you have to pass the command-line arguments or for variables explicitly to the batch script you are calling. Here's a simple example passing the first command-line argument as %1 and the current value of the scratch variable in for as %2.

for %%a in (*) do call dostuff.bat %1 %%a

The for command does seem cryptic, and it is what we'll be covering next.

Calling another batch file to do part of the processing is very close to calling a sub-routine or a function in a real programming language. Command-line parameters are passed as plain text, but environment variables are shared between the batches, which in effect works like passing by reference: if the called batch file modifies a variable, the changes will show up in the caller too. You can even return values from batch files, but the only way to do it is through a (global) variable that both batches will see. One good name candidate is %retval%, standing for return value.

The for command will iterate through a set of filenames executing the specified single command for each file. It's a nice way of emulating wildcards (* and ?) for programs that don't support them natively (e.g. type in DOS). The syntax is:

for var in (set) do command var

Where var is a case-sensitive variable named %a to %z (you must write %%a to %%z inside batch files; single percent signs will only do on the command line), set is a set of files specified with wildcards (e.g. sta*.sq?) and command is the command to be carried out. Note that you don't have to use the scratch variable (e.g. %%a) as a parameter for the command after do, although you'd probably want to in most cases. The single-command limitation can be broken by calling a batch file in the for loop. You can also nest for commands inside each other, although you'll have to use a unique variable name for each loop. Variables of the outer loops can be used in the inner ones.

One classic example of using for is to make the MS-DOS type support wildcards:

for %%a in (%1) do type %%a |more

Notice the %1, which is the first command-line parameter passed to the batch script (e.g. *.txt). Pay special attention to the double percent signs, which are strictly necessary in batch files, as well as to piping the result to more.

Some amazingly simple things can be done with just the call and for commands alone. Here's a simple example that will sort the files in the current directory into sub-directories by letter (a-z) and will put the rest in a subdir called 0-9. Here's how you do it:

_indexby.bat:

@echo off
md %1
for %%f in (%1*) do move "%%f" %1

index.bat:

@echo off
call _indexby.bat a
call _indexby.bat b
call _indexby.bat c
call _indexby.bat d
call _indexby.bat e
call _indexby.bat f
call _indexby.bat g
call _indexby.bat h
call _indexby.bat i
call _indexby.bat j
call _indexby.bat k
call _indexby.bat l
call _indexby.bat m
call _indexby.bat n
call _indexby.bat o
call _indexby.bat p
call _indexby.bat q
call _indexby.bat r
call _indexby.bat s
call _indexby.bat t
call _indexby.bat u
call _indexby.bat v
call _indexby.bat w
call _indexby.bat x
call _indexby.bat y
call _indexby.bat z
md 0-9
move * 0-9

Here the auxiliary script _indexby.bat does all the work. It simply creates a directory whose name is the first command-line argument and then moves all of the files that start with the very same argument string to the directory which it created. The main program calls this auxiliary batch with strings from a to z and moves all the files still remaining in the current directory to the directory named 0-9. You could also move sets of files with a wildcard directly in the auxiliary script, but move will complain if it doesn't find the files, whereas if nothing matches the set in for, the loop is never run, not even once.

You don't have to worry about case in Windows or DOS. The policy in at least Windows is that case will be recorded, but it's ignored as far as file names are concerned in practice. Thus a?.txt would match both a.txt and A.TXT (in caps).

Now that Windows has long file names, it is often necessary to enclose file names in double quotes to prevent mis-interpretation when filenames contain spaces. This is what I also did in the previous example.

There's also a command called choice, which is used to get single character input from the user. It returns an error level depending on which key the user pressed. Although the command can be handy, I won't go into the details here for a number of reasons. Firstly, choice is not supported under Windows NT and later, nor before DOS 6, so it won't be of use in modern batch programming. Besides, the functionality of choice can be pretty much duplicated with the new set options in NT. Still, choice can be a nice command, so if you've got DOS 6 or Win 9x, type in choice /? to see the help screen. Choice is about the only command with which the if errorlevel number goto label style idiom can be useful.

Good Batch Programming Style

As the batch language is very simple, there aren't many coding style issues to be considered in general. However, here are some guidelines to help you make batch files easier to read and maintain. These apply to both DOS and Windows in general.

• Use environment variables as quick scratch variables as much as possible. This should always be preferred to hard-coding values because it's easier to change an environment variable reference in one place than it is to do a find and replace operation. This also keeps the code more readable.

• Always make @echo off the first line of finished batch files. Normally the user of the batch doesn't really want to know what's going on under the hood.

• Initialize all of your temporary variables at the beginning of the batch file and clear them at the end.

:Start
set _foo=0
set _bar=temp.dat

:end
set _foo=
set _bar=

This can be partially automated with the _init script demonstrated later on.

• As far as naming conventions go, prefix your temporary batch-specific variables with an underscore character to avoid name clashes. You might want to extend this practice to temporary text files that are created by batch files as well.

• Take full advantage of variables specified on the command-line and prefer that approach to asking stuff interactively whenever you can. This is handy in unattended batch sessions and when batchfiles call each other.

• Don't use short cryptic comments inside the file; rather, document the batch in a text file with the same base name if you feel like docs are necessary. Also, if your batch has no docs with it, be sure to print some usage screen if no parameters are given or if /? is specified.

• If it seems you are using the same couple of commands again and again by copy-pasting, it's a good idea to move that part into another batch file and call it in your main file. I prefer to prefix my auxiliary batches with an underscore to signal that they are private. Not private in the OOP sense, but rather in the sense that they are something the batch file has to deal with and the user shouldn't know or care about. Just like the variables that start with _ are temporary variables that the user of the batch shouldn't see.

• If you want even more control, you can use several evaluated labels inside a batch to build small function libraries. See the example regarding gosub for more information. It turned out there's an even cleaner alternative: the call command can take a label name as the very first argument, like this:

CALL :LABEL ARGS

:LABEL
rem some code
goto :EOF

The label should be in the same file. The arguments themselves get stored in the batch file's command-line arguments %1, %2 etc. Instead of ending the file, the goto :EOF bit merely returns from a GOSUB-style label. You need another goto :EOF to exit the file.

• Prefer passing parameters to other batch files as command-line parameters, instead of putting them in a global variable that both of the batches will see (all environment variables apart from for-command "temporaries" and command-line arguments are global). This approach minimizes dependencies and encourages reuse.

• If you take advantage of the ability to put multiple statements after the if, else or do clauses (available at least in Win XP), be sure to indent the commands in parentheses with a fixed number of spaces; I prefer three. Don't use tabs, as their size depends on the text editor.

• For easy access to commonly used batch files, make a directory such as c:\bats\ and include it in your path so that all of your batches can be used in any directory. The procedure varies, in DOS and 9x, it's about adding a directory entry in the path environment variable set in autoexec.bat. If there are multiple paths, they must be separated by semicolons. In NT and later the same can be achieved graphically through system properties (right click on My Computer, choose properties etc...).

• If you want to distribute batch files to others, make sure you add sufficient error checking to make the batch files more reliable (OK, I didn't do this in these samples to keep them brief, I admit it). A classic example is a plain move batch that copies the source and then deletes it. If the copy fails, the source will still be deleted. Always make backups when you are processing really important files, though, no matter how tough the error checking code.
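The move example from the last guideline could be guarded roughly like this. It is a sketch under a few assumptions: the name safemove and the labels are made up, both arguments are free of spaces, and copy sets an error level of 1 or higher on failure, as the copy in cmd.exe does.

```batch
@echo off
rem Sketch: hypothetical safemove.bat, called as: safemove source destination
rem If the second argument is missing, the first may be missing too.
if %2x==x goto usage
if not exist %1 goto missing

copy %1 %2 >nul
rem "if errorlevel 1" is true for an error level of 1 or higher.
if errorlevel 1 goto failed

rem Only now is it safe to remove the source.
del %1
goto end

:failed
echo Copy failed, %1 was left untouched.
goto end

:missing
echo Cannot find %1
goto end

:usage
echo Usage: safemove source destination

:end
```

The point is simply that del is only reached when the copy reported success.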

NT Specific Command Extensions

This section introduces the most useful additions to the MS-DOS commands that are used in Windows 2000, XP and later. To get these new so called command extensions working, you should start cmd.exe instead of the old command.com. Typing in cmd in the run box does the trick. This section is mostly about the new options for the if, for, set and findstr commands. For full details on these, check their help screens. The help is fairly long and it even has some examples in it.

String Input, Integer Arithmetic and Looping

The enhanced set command is an essential tool in modern batch programming because it allows you to do some of the most important feats in script programming, sadly missing in the batch scripts in the DOS days, at least without auxiliary programs. These two important features are, namely, signed integer arithmetic and string input.

Let's start with the input first. The /p switch for set allows you to read a line of input from the user (or a file) with

set /p variableName=promptString

variableName is the name of the environment variable in which the input is stored, and promptString, which may be omitted, specifies what text to display for the user. Normally, white space at the end of the prompt string is eaten, so if you want a space after the prompt you have to enclose the prompt in double quotes (luckily the quotes don't show up in the output).

The syntax looks a bit quirky at first, notice that you are not, I stress, not assigning variable name the string prompt, although it sure does look like it.

Now that we can do string input, I'll show you how the famous "hello" programming example goes as a batch script. It will ask for the user's name and print it on the screen.

hiname.bat:

@echo off
set /p _name=Enter your name:
echo Hello %_name%!.
set _name=

The above input statement would be like this if you don't want a prompt to be displayed:

set /p _name=

Note that the equals sign is mandatory. Again, as far as the old set syntax is concerned, it looks as though you are unsetting _name, whereas in reality you are reading a value from the standard input.
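As a rough illustration of how set /p can stand in for the old choice command, here is a sketch of a yes/no question. The variable _answer, the labels and the question itself are made up for this example; the /i switch of if, which ignores case, is one of the NT extensions to if.

```batch
@echo off
rem Sketch: ask until the user answers y or n (needs cmd.exe extensions).
:ask
rem Clear the variable first: if the user just presses Enter,
rem set /p leaves the old value in place.
set _answer=
set /p _answer="Delete temporary files (y/n)? "
rem The quotes keep the comparison valid when the answer is empty.
if /i "%_answer%"=="y" goto yes
if /i "%_answer%"=="n" goto no
goto ask

:yes
echo Deleting...
goto done

:no
echo Leaving the files alone.

:done
set _answer=
```

Input that itself contains double quotes can still break the comparison lines.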

The other important feature of set is the ability to perform arithmetic with signed integers. This feature is activated with the /a switch and the syntax is set /a variable=expression. Naturally, it evaluates expression and assigns the result to variable.

Firstly, this might not seem that revolutionary as you'd hardly want to code any larger, complex math programs with batch scripts. However, set can be used to increment counters and thus provides a way of getting out of a loop made with goto by testing the value of a variable on each iteration and jumping out if the variable matches with the given string. Here's one of the simplest examples, it will count from 0 to 9:

to9.bat:

@echo off
set _number=0
set _max=10
:Start
if %_number%==%_max% goto end
echo %_number%
set /a _number=_number + 1
goto start

:end
set _number=
set _max=

Variable names may generally speaking be specified without the percent signs in math expressions. They will automatically be expanded to their value in math. However, if a variable contains an expression you wish to evaluate, you must still use the percent signs around the variable name.
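To make the difference concrete, here is a small sketch of my own (the variable names _x, _expr, _a and _b are made up for the illustration). The first set /a uses a bare variable name holding a number; the second needs the percent signs because _expr holds an expression rather than a number.

```batch
@echo off
set _x=5
rem _expr holds text that is itself an expression.
set _expr=_x * 2 + 1
rem Bare name: the number stored in _x is used directly.
set /a _a=_x + 1
rem Percent signs: the text of _expr is pasted in first, then evaluated.
set /a _b=%_expr%
echo %_a% %_b%
set _x=
set _expr=
set _a=
set _b=
```

Run as a batch file, this should print 6 11.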

Instead of using an explicit end label and jumping to it with goto, you can use:

goto :EOF

Notice the explicit colon in there, which is normally not required in front of label names after the goto command. You don't have to define a label named EOF at all. For those of you who are interested, EOF is programming jargon and very common in languages like C. It stands for end of file. In some text files the end of the file is marked by a special character. One popular option in DOS apps was ctrl+z (ASCII value 26), which is also the keypress you need to enter to end input that is read from the keyboard (standard input, that is STDIN).
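As a rough sketch, the counting example could be written without an end label like this. The label name loop is my own choice, and goto :EOF assumes that command extensions are on, which is the default in Windows 2000 and XP.

```batch
@echo off
set _number=0
:loop
rem Leave the batch file as soon as the counter reaches 10.
if %_number%==10 goto :EOF
echo %_number%
set /a _number+=1
goto loop
```

The price of the shortcut is that nothing after the loop gets run, so the variable _number is left set when the batch file ends.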

NOTE ON STRINGS

OK, I wonder why this took me so long to discover, but I've noticed that there's a serious bug or omission in the string handling of batch files. Consider the equality comparison:

if %_a%==%_b%

Unlike in even the most primitive of proper programming languages, strings are always embedded directly in expressions and, as far as the parser (computer) goes, are indistinguishable from the code itself. Let me demonstrate. Suppose that the variable _a has the text not in it. One would expect the comparison to be evaluated as:

"not"==%_b%

However, in batches the text not is just text, so it gets interpreted differently, as not==%_b%. As the keyword not can't be put right here, it generates a syntax error at runtime. The worst thing is that this exceptional situation cannot be handled or trapped in any way. Thus it is impossible to handle quite a large set of strings in a batch file. This would be a security problem if batch files were secure to begin with.

But what if we just stick quotes around the variables when they are compared, like this:

if "%_a%"=="%_b%"

Though this hack seems better, there's still the problem of the double quotes themselves. Merely inputting one double quote character breaks the script, and there are no escape constructs as in programming languages in general. Lastly, delayed variable expansion might be one way around the issue and I ought to include it to make this tutorial complete. However, by the time I had found out there's such a thing, I had already lost most of my interest in batches and so I've dropped it. You can read up on the subject in the set and cmd help screens. Do let me know if you're able to devise a solution.
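For what it's worth, here is a minimal sketch of how the delayed expansion route might look. It is an assumption on my part rather than something tested against every awkward input, and it relies on setlocal enabledelayedexpansion, which needs the command extensions of Windows 2000 or later. Because !_a! and !_b! are substituted only after the line has been parsed, their contents should not be mistaken for keywords or quotes.

```batch
@echo off
rem Assumption: cmd.exe with command extensions (Windows 2000 or later).
setlocal enabledelayedexpansion
set /p _a="First string: "
set /p _b="Second string: "
rem The values are filled in after parsing, so text such as not
rem or a lone double quote is treated as data, not as code.
if "!_a!"=="!_b!" (echo equal) else (echo different)
endlocal
```

The endlocal at the end also discards _a and _b, so no separate cleanup is needed.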

But despite the flawed string handling, let's dive into integer arithmetic now. The expression evaluator in set supports a number of operators including the usual: + - ( ) * / and % (integer remainder; in a batch file it has to be written as two percent signs, %%, and it is safest to enclose the expression in double quotes). In case the modulo operator is new to you, here's how it works with some example input (a minus sign on the left side of the percent sign "negates" the result, but one on the right has no effect):

left % right = modulo
1 % 2 = 1
7 % 9 = 7
8 % 8 = 0
10 % 3 = 1
7 % 4 = 3

You get the picture. The modulo operator is useful for getting a given range out of random numbers among other things.
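As a quick sketch of the random number use, the following rolls a six-sided die. The variable name _die is made up for the example; %RANDOM% is one of the magic variables of the NT-based command interpreter.

```batch
@echo off
rem In a batch file the modulo operator is written %%.
rem RANDOM %% 6 gives 0..5, so adding 1 gives 1..6.
set /a _die=%RANDOM% %% 6 + 1
echo You rolled %_die%
set _die=
```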

Other than these basic math operations, there are more exotic ones such as C-like bit shifts and bitwise logical operators. However, we won't be going through them here, as you rarely work with individual bits or binary data in batch programs.

Just like in C and Java, the following shorthands for + - * / and % are supported: += -= *= /= %=. They require only a single value on the right hand side and have the following equivalents:

a += b is a = a + b
a *= b is a = a * b

and so on.

These constructions are nice if you know you'll be using the previous value of the variable being assigned to in your calculations. Thus the counter in the previous program could have been increased with:

set /a _number+=1

Again, I strongly advise you to refer to the help screen of set for all the details on the various operators.

Although the arithmetic in set can be tested on the command line by simply typing set /a expression, here's a little calculator program that will evaluate user-specified expressions and show their results on the screen until the user types in exit. Note that the user may refer to the previous result by typing _result as part of the expression. Also, the percent signs around _string are still necessary when the result is calculated; without them set doesn't seem to evaluate _string as an expression.

calculate.bat:

@echo off
echo Type exit when done.
:start
set /p _string="> "
if /i "%_string%"=="exit" goto end
set /a _result=%_string%
echo %_result%

goto start

:end
echo Bye.
echo.

set _string=
set _result=

The Else clause, String and Numeric Comparisons

Although the changes are not as revolutionary as in set, the if command has also acquired some new features in Windows 2000 and XP. One of the handiest is the else-clause. Here's a simple example:

IF EXIST filename. (
    del filename.
    echo deleted
) ELSE (
    echo filename. missing.
    echo try again.
)

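The parentheses are necessary whenever a branch contains more than one command, and the ELSE keyword has to sit on the same line as the closing parenthesis of the first branch. Because of the DOS legacy the parser is really picky about the syntax. Sticking to the format:

if condition (
    do stuff
) else (
    do something else
)

helps to minimize errors and avoids most of the common pitfalls.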

Although I don't fully understand it, it seems that the DOS-like variable expansion is stupid and tends to expand variables unexpectedly and prematurely. It treats a whole for or if block as a single statement and does the variable substitution for it all at once in one fell swoop, giving unexpected and erroneous results. The help screen for set warns that the following won't work:

set LIST=
for %i in (*) do set LIST=%LIST% %i
echo %LIST%

One way to work around this is to enable delayed environment variable expansion (see set /? for details). The catch is that it has to be switched on first, for example by modifying the registry or by spawning another instance of the command interpreter with delayed variable expansion turned on (cmd /v:on /c "batname or command"). When you are using delayed variable expansion, the variables whose contents you want to be evaluated dynamically every time in a block must be enclosed between exclamation marks, that is: !variable! instead of %variable%. The above list example could be written as:

set LIST=
for %i in (*) do set LIST=!LIST! %i
echo %LIST%

provided that delayed variable expansion has been enabled, for instance in the registry or by spawning a new cmd.exe as described above. Generally speaking, I've found that the default-style immediate variable expansion works just fine in most cases. You ought to consider using delayed variable expansion if your batch scripts are giving erroneous and counter-intuitive results, or alternatively whenever you are using more than one statement in an if or for clause.
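Inside a batch file there appears to be a lighter alternative as well: the setlocal command accepts an enabledelayedexpansion argument on Windows 2000 and later. The sketch below is the list example rewritten for a batch file (hence the doubled percent signs) under that assumption.

```batch
@echo off
rem Assumption: cmd.exe with command extensions (Windows 2000 or later).
setlocal enabledelayedexpansion
set LIST=
rem !LIST! is looked up again on every iteration, unlike %LIST%.
for %%i in (*) do set LIST=!LIST! %%i
echo %LIST%
endlocal
```

Note that endlocal throws away the changes made since setlocal, so LIST is gone afterwards. File names containing exclamation marks would probably need extra care.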

Furthermore, you can now properly compare strings in textual form like this: if left comparison right, where left and right are strings or environment variables and comparison is one of the following: EQU (equal), NEQ (not equal), LSS (less), GTR (greater), LEQ (less or equal) or GEQ (greater or equal). Comparisons are case sensitive by default but you may change this with the /i switch right after the command if.

The following example program keeps asking for passwords until they are equal. We'll be covering both string input and comparisons here; also notice the use of parentheses:

password.bat:

@echo off
set _original=a
set _retry=b

:start

set /p _original="Type in your new password: "
set /p _retry="Retype the password, please: "
rem The quotes keep passwords containing spaces from breaking the comparison.
if "%_original%" equ "%_retry%" (
    echo password changed successfully.
    goto end
) else (
    echo The passwords don't match, please try again.
    echo.
    goto start
)

:end
set _original=
set _retry=

Another useful property of the string comparison operators is that if the strings are numbers, they are compared as such and not as ordinary strings. To demonstrate this concept, here's a program that will print all integer powers of the first command-line argument until the exponent equals the second argument:

power.bat:

@echo off
rem checking input
if %1x==x goto end
if %2x==x goto end

rem variable initialization
set /a _base = %1
set /a _power = 0
set /a _destination = %2
set /a _result = 1

echo BASE   POWER   RESULT

rem the main loop
:Start
if %_power%==%_destination% goto end
if %_result% LSS 0 goto overflow

echo %_base%   %_power%   %_result%
set /a _result = _result * _base
set /a _power += 1
goto start

rem In case the result overflows
:overflow
echo.
echo Too large a value. Exiting....
rem cleaning up.

:end
set _base=
set _power=
set _result=
set _destination=

There are a number of key points I'd like to emphasize here. Firstly, pay attention to the easy way of determining if a command-line argument is not given by comparing it and x against x. If the argument is empty the condition is x==x which is always true. This way is also portable, also working in Win 9x and DOS.

Unlike in many of the other batches, in which error checking was omitted completely for brevity, here we do check that the user supplies two command-line arguments to prevent an infinite loop.

In the strings passed to the echo statements a run of three spaces should be replaced by a real tab character (press tab in your text editor) so DOS will print the output in tabulated form.

We also use the LSS operator for arithmetic comparisons to prevent an overflow. An overflow is a situation in which the maximum range for some datatype is reached and the value wraps around to the minimum value of that datatype. The reason for this is deep in how numbers are represented in binary and we won't cover it in here. An extra bit: the opposite of overflow is called an underflow.
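The difference between numeric and textual comparison can be seen in a small sketch like the following. The operands are made up for the illustration; the point is that bare digit strings are compared as numbers, while quoted operands are no longer plain numbers and are compared as text.

```batch
@echo off
rem Both sides consist of digits only, so they are compared as numbers.
if 9 LSS 10 echo 9 is less than 10
rem With the quotes the operands are text, and the character 9
rem sorts after the character 1.
if "9" GTR "10" echo "9" sorts after "10"
rem /i makes the comparison ignore case.
if /i abc EQU ABC echo same letters
```

All three echo commands should run. This is worth remembering when quoting operands for safety: quotes change a numeric comparison into a textual one.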

The Indispensable For Command

The for command has got some extremely useful additions that make it almost as worthwhile as set is, if not even better. Just like the if command does nowadays, for will also accept more than one command if you use parentheses after the keyword do. Although this capability comes in handy, it is sometimes preferable to use the call command and break up the action into several batch files. Remember the index builder example?

If you need to pass on all the command-line arguments given to a batch file, e.g. in a for-loop, there is a special shortcut notation. Instead of saying %1 %2 .. %n-1 %n, where n is the number of command-line arguments passed to the batch file, you can say %*, which does exactly the same thing. The asterisk here means every element (except %0, which is batch specific anyway).
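A minimal sketch of %* in a for-loop follows. It simply echoes each argument on a line of its own.

```batch
@echo off
rem %* stands for every argument given to this batch file (but not %0).
for %%a in (%*) do echo Argument: %%a
```

Keep in mind that the plain for command treats arguments containing * or ? as file name patterns, so those would be expanded to matching file names.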

It is now quite possible to process not only the files in the current directory but also (recursively) all the other files in the sub-directories matching the criteria you specified. You do this with the new /r switch; otherwise the syntax is unchanged.

for /r c:\ %%a in (*.bak *.tmp) do del "%%a"

would delete all of your temporary files (those ending in .bak or .tmp) in all of the directories of drive c. Remember that if you wish to do the same thing on the command line, you must use only one percent sign. Similarly, the switch /d can be used to process directory names rather than file names. Another method to achieve the same thing would be to use the for command to process the output of dir with the /a switch. The /a switch lets you list only the files that match the given file attributes. The attribute d means a directory and -d means the inverse, i.e. a file.

In addition to iterating through a set of files or directories, for has several other tricks up its sleeve. One of them is the /l switch which, instead of going through a set of file names, goes through a range of numbers. The syntax is:

for /l %%a in (begin step end) do command

It's syntactically close to the previous for statements apart from the begin step end bit. Begin and end define an inclusive range of signed values (either or both may be negative) and step, which may also be negative, specifies a value which is added to begin on each iteration until end is passed. Perhaps a few examples will make things clearer:

for /l %%a in (0 2 100) do echo %%a

prints all even whole numbers between zero and one hundred. Similarly,

for /l %%a in (1 2 100) do echo %%a

would print all the odd ones (99 being the last), whereas

for /l %%a in (10 -3 -10) do echo %%a

would sweep down through the range 10 to -10 in steps of -3 (-8 being the last value printed). Finally,

for /l %a in (0 1 7) do @for /l %b in (0 1 7) do @echo %a%b

is a nested loop, written here for the command line with single percent signs. It prints the numbers 00 to 77, that is, all two-digit octal numbers.

For has enough sense to reject ranges whose end can never be reached. Things like:

for /l %%a in (1 -1 2) do echo %%a

will be rejected right off the bat, so nothing is echoed by echo %%a.

In addition, I just discovered that you may give the range or the step size in hexadecimal by prefixing the value with 0x. This is an undocumented feature. Octal will also probably work, though I haven't tried.

I haven't found any real use for this ability to generate increasing or decreasing sequences of numbers, however, here are some potential uses:

• The /l switch could be used like the conventional for-loop in other programming languages. However, I find manually increasing counters with set /a both more flexible and easier in the long run. Note that for commands can be nested, as the octal example above shows.

• Secondly, the counter could be used to generate a large number of variables. Although it would eat lots of memory, and there's not that much of it for environment variables, this could be used to emulate arrays as follows:

for /l %%a in (0 1 100) do set _%%a=0

creates 101 new variables named _0 to _100, whose numeric suffixes can then be used like array indices, among other things. This array emulation does work in principle, but accessing the elements is harder than one might think. Not to mention that the speed is terrible. For a working example program about arrays, see the end of this tutorial.
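To give an idea of why accessing the elements is awkward, here is a small sketch that fills five such variables with squares and prints them back. The trick of reading an element through call echo with doubled percent signs is my own assumption about one workable approach, not the array program referred to above.

```batch
@echo off
rem Fill _0 .. _4 with the squares of their indices.
for /l %%a in (0 1 4) do set /a _%%a=%%a * %%a
rem The line first becomes e.g. call echo _3 = %%_3%% in effect;
rem call then parses it a second time and expands the variable.
for /l %%a in (0 1 4) do call echo _%%a = %%_%%a%%
rem Clean up the emulated array.
for /l %%a in (0 1 4) do set _%%a=
```

The last line printed should be _4 = 16.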

One of the most useful switches of the for command is the /f switch. It will process a file or the output of a command breaking it up into lines. Then it will break up each line into tokens as specified by the user and allocate the bits it tokenized into scratch variables. In most cases the syntax is

for /f "options" %%a in (fileSet) do command %%a %%b %%c ...

The %%a and do command parts should already be familiar from other examples. Options is a double-quoted string of options that determines how lines are tokenized and which tokens are put into scratch variables. The format is:

"option[=attribute] option[=attribute] ..."

Here are some options you'll probably use very often:

• eol=character: If this character is encountered, ignore the rest of this line and move on to the next one.

• skip=number: Number of lines to skip from the beginning.

• delims=characters: Each of these characters is taken as a token delimiter; in other words, when any of them is encountered, it ends one token and starts another. The delimiters are tab and space by default. I recommend specifying the delimiters last, as this avoids ambiguity if you want to include space in the set of token delimiters (space is also the separator of the option=attribute pairs in the option string).

• tokens=numbers: Specifies which of the tokens found on each line are to be allocated their own scratch variables. tokens=2,4 would only take the 2nd and 4th token and tokens=1-5 would take the first five. The last character may be a star (*), which stands for the rest (the non-tokenized remains) of the line.

• usebackq: If this option is present, a string enclosed in back quotes (`), found after the "in (" part, is executed as a command and the output of that command is tokenized.

Notice that the string in back quotes must be a simple command; ordinary piping, for example, is not allowed and is treated as a syntax error. However, if you escape the pipe symbol with a caret, "^|" instead of just '|', it does work. The same escape is needed for the other redirection characters < > and >>. Finally, piping stuff in the do part of a for loop is quite possible, too.
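Before the heavier examples, here is a tiny sketch of the options at work. The sample string and the search word path are arbitrary choices of mine. The first loop tokenizes a literal string, the second one the output of a command that contains an escaped pipe.

```batch
@echo off
rem A string in plain double quotes is tokenized as it stands (no usebackq).
rem Tokens 1 and 3 end up in %%a and %%b.
for /f "tokens=1,3 delims=," %%a in ("red,green,blue") do echo %%a %%b
rem With usebackq the back-quoted command is run; note the caret before the pipe.
rem Only the text before the first equals sign is kept, that is the variable name.
for /f "usebackq delims==" %%a in (`set ^| find /i "path"`) do echo %%a
```

The first loop should print red blue. The second lists the names of the environment variables whose definition mentions path, so its output depends on the machine.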

Don't be troubled if this /f switch seems cryptic at first. It's very hard to explain clearly without resorting to some examples. So a number of them will follow.

Firstly, here's a heavy one that will run the help command, extracting all of the command-names from the listing. Command-names are strings that are at the very beginning of the line and are delimited by spaces. Then it will run all of the commands listed in help with the /? option, appending their help screens together. In the process, lines that have only the word "command:" in them will be added to make finding the start of the next command's help screen easier.

The listing generated by these two batch files is a very nice and clear command reference. Certainly a lot more convenient than hunting around the MS Website or browsing the help screens separately.

gethelp.bat:

@echo off
set _filename=help.dat
set _filename2=commands.dat
help |find /v "command-name" |findstr /r /v "^[^A-Z]" >>%_filename%
for /f "tokens=1 delims= " %%a in (%_filename%) do call _helper.bat %%a
del %_filename%
set _filename=
set _filename2=

_helper.bat:

@echo off
echo. >>%_filename2%
echo command: >>%_filename2%
%1 /? >>%_filename2%

Phew, that seemed cryptic, didn't it? Let's go through the hardest parts separately:

help |find /v "command-name" |findstr /r /v "^[^A-Z]" >>%_filename%

is a complex example of piping and redirection. It runs the help command, passing the output to the find command which, in turn, passes it to findstr. Finally, the output of findstr is appended to the file denoted by the variable %_filename%. The /v option tells find not to show the lines that contain the word command-name. This trick is necessary to get rid of the first line of the help command, which reads:

For more information on a specific command, type HELP command-name

This is a safe operation because this is the only line containing that word.

The findstr command is not part of the DOS heritage but was rather added in Windows 2000, I think. It's like find except much more powerful. The /r switch in particular tells it that, instead of looking for a single string, it should look for any one of a set of strings specified by a regular expression. Unless you are a programmer or have been working with Unix, you probably won't know what a regular expression is (basically a heavily boosted version of a DOS wildcard expression). The topic is fairly complex and I've just gotten started in it, so this is totally out of the scope of this tutorial. For more information, see this regular expression tutorial: http://etext.lib.virginia.edu/helpsheets/regex.html

To cut a long story short, "^[^A-Z]" matches the lines whose very first character is something other than an upper-case letter. The /v switch tells findstr to display all but the lines that match, so the end result is that only the lines having a command-name at the very beginning of the line are printed. Without this step, longer command descriptions that wrap to the next line and are indented with spaces would also be processed.

After this exhaustive filtering,

for /f "tokens=1 delims= " %%a in (%_filename%) do call _helper.bat %%a

extracts the command-names from each line in %_filename%. Only the first token is extracted and tokens are delimited by spaces. In other words, only the first word on a line, that is the command name, is extracted and assigned to the variable %%a. The variable %%a is then passed to _helper.bat, which appends the text command: to the file commands.dat, runs commandname /? and appends its output to the same file. Although some of the boosted DOS commands for Windows will normally pause between screenfuls of output, they are smart enough not to pause if run in a batch file.

If other tokens are extracted by specifying a different set of tokens, then additional variables like %%b, %%c and so on are allocated for the selected tokens, respectively. After %%z comes %%A, I think, and %%Z is the absolute last scratch variable, that is token number 52.

As another, actually a lot simpler, example of tokenization, consider a general purpose environment variable cleaner. Knowing that all of our temporary batch file variables start with an underscore makes it relatively painless to devise a general purpose temporary variable cleaner that can then be called at the end of most batch files. Here's how the script looks:

_cleanup.bat:

@echo off
for /f "usebackq tokens=1-2 delims==" %%a in (`set _`) do set %%a=

We'll take the first two tokens delimited by the equals sign. The usebackq option specifies in this case that we are dealing with command output, namely the output of the set command.

One addition to the set command which wasn't mentioned earlier is that passing it an ordinary string will list all environment variables that start with that string. Thus we list everything starting with an underscore and for each line call the set command again, passing it the first extracted token (the variable name before the equals sign) and another equals sign. This is how we effectively unset all variables starting with an underscore. Notice that we didn't actually use the second token anywhere, so we might just as well have specified tokens=1 or omitted the tokens part, because the default is only the first token.

Finally, if you are processing file names, the for command supports new variable substitution options. These options only work for for scratch variables and for the arguments %1 to %9. They will not work for ordinary environment variables. The syntax is %%~lettersa, where %%a is the variable name and letters can be, among others, any of the following file properties (see for /? for help):

f: the full path
d: drive letter
p: path
n: base name
x: extension
s: short name
a: attributes
t: date and time
z: size

Note that only including the tilde character (e.g. %~a) without any of these modifiers removes quotes around a file name string, which can come in quite handy.
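Here's a short sketch that prints a few of these properties for the file given as the first argument. The batch file name fileinfo.bat in the comment is only a suggestion.

```batch
@echo off
rem Usage (hypothetical file name): fileinfo.bat "some file.txt"
echo as typed:  %1
echo unquoted:  %~1
echo full path: %~f1
echo drive:     %~d1
echo folder:    %~p1
echo base name: %~n1
echo extension: %~x1
echo size:      %~z1
```

The same letters work on for scratch variables, for instance %%~na inside a batch file.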

Here's an example program that processes a set of files and adds sequential numbering to each file's base name:

number.bat:

@echo off
set _counter=0
for %%a in (%1) do call _rencount.bat %%a
call _cleanup.bat

_rencount.bat:

@echo off
for %%a in (%1) do ren %%a %%~na%_counter%%%~xa
set /a _counter+=1

Number.bat simply iterates through the user-specified set of file names, calling _rencount.bat and passing each file name to it as an argument. Notice the call to _cleanup.bat at the end (see the previous example).

_rencount.bat takes the file's base name, adds the value of _counter to it and appends the extension. Then it simply increments the counter.

The tricky part is:

for %%a in (%1) do ren %%a %%~na%_counter%%%~xa

As %1 is actually a single file name, the "inner" for-loop is run only once per file. Notice that environment variable names need percent signs around them, whereas the for scratch variables start with %%, then a ~ and any of the attribute letters (n for base name, x for extension) and finally the letter identifier, in this case a. And to confuse matters more, command-line arguments are prefixed by only a single percent sign followed by a number. Thus:

%%a: for scratch variable
%%~na: base name of scratch variable
%1: command-line argument (passed in call)
%_counter%: ordinary environment variable

There are pitfalls when combining several file attribute modifiers in printed output. To print the name, extension and size in bytes of each file in the current directory separated by spaces, you must use the following line (substitute % with %% if using this in a batch file):

for %a in (*) do @echo %~na %~xa %~za

It is likely you would have initially tried something like this:

for %a in (*) do echo %~nxza

Here are the differences. Although %~nxza explicitly specifies the fields name, extension and size in this order, the output is quite different. Regardless of the order given, a preset and undocumented field order is used. In the above example, the order is first the size (and a tab), followed by the file name and extension (the period is also taken to be part of the extension). In the first example we separated the arguments for echo by spaces and repeated the variable name %a for each field. This way it is always possible to guarantee a certain desired order. Secondly, notice that, as we are running this on the command line, the first form includes an at sign before the echo statement. The at sign can also be used outside batches, and without it all of the echo commands run by the for loop would also be shown.

If all of the fields are specified without repeating the scratch variable for each field, the full preset order and format is:

• Attributes (including NTFS-specific): a dash for an unset attribute and a respective letter for each set attribute like in the attrib command (r for read only, h for hidden etc.)

• Date and time: Most likely (if not always) in the format dd.mm.yyyy hh:mm

• Drive letter: In upper case followed by a colon.

• The path: Without the file name and starting and ending with a backslash. The full path option is different; it is equivalent to drive, path, base name and extension. If both the full path and the path are specified, or if other path related fields are requested, the full path option will work exactly like the path option (the path option is ~p whereas the full path is ~f).

• File size: In bytes, not rounded to the nearest cluster size and without the thousand separator or any unit indicator.

• Base name: The file name without the last period and everything following it.

• Extension: The sub-string starting from the last period of the file name till the end.

More String Processing and Some Magic Variables

There's yet another way of processing environment variables that, although listed as part of set, is not specific to the set command. The form %varname:expression% will snip or replace parts of a variable's value. %varname:first=second% will replace each occurrence of the substring first, if found in the value of the variable, with second. Similarly, %varname:first=% would delete all occurrences of first.

You can also extract sub-strings:

%varname:~2% (notice the colon before the tilde) would include the third character of the variable value and everything beyond that till the end of the value string (character indices are counted from zero).
%varname:~2,4% would extract characters 3 to 6 (offset 2, length 4).
%varname:~0,-2% would extract all but the last two characters.
%varname:~-2% would extract only the last two.

To show how string substitution works in practice, here's a batch file that will convert spaces in file names to underscores (_). This might be handy if preparing files for the WWW, for instance.
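A quick sketch with a made-up value shows each of these forms in action. The comment after each rem line states what the following echo should print.

```batch
@echo off
set _s=abcdefgh
rem From the third character to the end: cdefgh
echo %_s:~2%
rem Offset 2, length 4: cdef
echo %_s:~2,4%
rem All but the last two characters: abcdef
echo %_s:~0,-2%
rem Only the last two characters: gh
echo %_s:~-2%
rem Replace cd with XY: abXYefgh
echo %_s:cd=XY%
set _s=
```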

nospaces.bat:

@echo off
set _file=a
for %%a in ("* *") do call _killspace.bat "%%a"
set _file=

_killspace.bat:

@echo off
set _file=%1
ren %1 %_file: =_%

This is pretty simple in comparison to what you've gone through previously, and hopefully won't need much explaining. Notice that replacing ren %1 %_file: =_% with ren %1 %_file: =% alters the behavior so that instead of substituting spaces with underscores, the spaces are removed.

The most apparent limitation regarding strings in batch files is the lack of an indexing operator or command. There's no easy way of iterating through or picking each character in a string and doing something to it, nor is there any easy way of getting the length of a string, as far as I know. One way to indirectly calculate the length of a string, though, is to copy the contents to a file via output redirection and determine the length of the string from the file size. Usually the length in characters is the file size in bytes minus two, for the carriage return and line feed that the echo command adds (minus three if a space is left before the redirection sign, as that space is written to the file too). Each line in DOS and Windows ends in two nonprintable ASCII characters as on a teletype: the carriage return for getting to the beginning of the line and the line feed for scrolling down the paper, or the screen if you will. Because of the overhead related to actually reading and writing files on disk, this is many times slower than in programming languages supporting strings properly.

Another serious limitation worthy of mention is that you cannot nest environment variables inside of each other. This means that if you attempt to specify sub-string indices or replacements using an environment variable, the batch interpreter will not do what you might expect. It would be really cool if there was a workaround for this. To demonstrate with a snippet:

set _string=%1
set _index=%2
set _find=%3
set _replace=%4
set _result=%_string:~-%_index%%
echo %_result% Last %_index% characters of %_string%
set _result=%_string:%_find%=%_replace%%
echo %_result% with each %_find% replaced by %_replace% in %_string%
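One possible workaround, offered here as a sketch rather than a tested cure-all, is to let the call command parse the line a second time. Doubled percent signs survive the first pass as single ones, while the inner variables are expanded at once, so the second pass sees an ordinary sub-string or replacement expression. The file name substr.bat is hypothetical.

```batch
@echo off
rem Usage (hypothetical name): substr.bat text count find replace
set _string=%1
set _index=%2
set _find=%3
set _replace=%4
rem First pass: %% turns into % and %_index% into its value.
rem call then expands the resulting %_string:~-N% on a second pass.
call set _result=%%_string:~-%_index%%%
echo %_result%
rem The same idea for search and replace.
call set _result=%%_string:%_find%=%_replace%%%
echo %_result%
set _string=
set _index=
set _find=
set _replace=
set _result=
```

With the arguments banana 3 an AN this should print ana followed by bANANa.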

Finally, there's one additional set of magic variables originally documented as part of set but actually available almost everywhere. These variables should always be usable in batch files:

%CD%: current directory.
%DATE%: date in the same format as given by the date command.
%TIME%: time in the same format as given by the time command.
%RANDOM%: a pseudo-random, positive 15 bit integer.
%ERRORLEVEL%: the most recent error level (return value) from an application.

It is possible to assign to these names with set, but be careful: the assignment creates an ordinary environment variable that hides the magic value until you unset it again. So if you set errorlevel yourself, %ERRORLEVEL% stops reflecting the real error level and you might get odd results. To signal a return value from a batch file being called, end it with exit /b followed by the number. There's only one error level, so it is also a good idea to make the last bat-file finish with error level zero, which usually signals successful program execution (no errors).
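As a sketch of returning a value from a called batch file, the following helper reports through the error level whether a file exists. The name exists.bat is made up, and exit /b assumes the command extensions of Windows 2000 or later.

```batch
@echo off
rem exists.bat (hypothetical name): error level 0 if the file in %1 exists, 1 otherwise.
if exist "%~1" exit /b 0
exit /b 1
```

A caller could then say call exists.bat somefile on one line and if errorlevel 1 echo not found on the next.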

Bigger Example Programs

To wrap up this tutorial, here are some slightly larger example programs to get you started. In addition to demonstrating batch programming, some new concepts are covered on the way.

Emulating Gosub

Unless you are familiar with some dialect of BASIC, the term gosub may seem alien. It is basically an enhanced version of goto which remembers the label from which you started so you can go back to where you left. It can even remember more than one label. A gosub is not nearly as useful as true functions in a programming language but it is the next step up from the goto command. The example that follows is actually redundant. I was informed lately that batches do support a gosub syntax. See the section about batch programming style.

So to implement a gosub we need a data structure which lets you put in several labels, pieces of text, and retrieve the labels in the reverse order. That is, the last label you put in is the first to come out, the second last is the second to come out and so on. This is known as a last in first out order, LIFO, and the name for such a data structure is a stack. The operations of putting in data and retrieving it are called push and pop respectively.

From the implementation point of view, we cannot use an array as it is not natively supported (but see the third example). Instead, we can put the pieces of text in a single environment variable and delimit them with semicolons, just like the folder locations in the path environment variable (type in set path to see it). In order to make the stack more generally useful, we can also put several related functions inside a single file. Here's how the stack code would look:

_stack.bat:

@echo off
goto %1

:push
set _stack=%2;%_stack%
goto :EOF

:pop
for /f "usebackq tokens=2 delims=;=" %%a in (`set _stack`) do set _retval=%%a
for /f "usebackq tokens=2* delims=;=" %%a in (`set _stack`) do set _stack=%%b
goto :EOF

:peek
for /f "usebackq tokens=2 delims=;=" %%a in (`set _stack`) do set _retval=%%a
goto :EOF

:clear
set _stack=
goto :EOF

The variable _stack holds the stack contents and persists as long as it has not been unset. The file can hold several independent functions because the first argument passed to it is the name of the function to call, that is, the label to which we go. Each label in turn goes to the end of the file as soon as it has done its job. This way the batch file name _stack serves as our little namespace or module and each label corresponds to a function or procedure. Values are passed back through a global variable called _retval, short for return value.

As to handling the stack, the push command prepends the second argument to the current stack contents using the set command:

set _stack=%2;%_stack%

So the stack grows to the right while new values are entered on the left, next to the equals sign. Each value also ends in a semicolon to mark where the next one begins. This code will only work if our data contains no semicolons, but that's a fair bet, and we could change the separator or even let the user define it. Popping the data is about filtering the output of the set command so that only the text between the first equals sign and the first semicolon is assigned to _retval. Here's how that would look:

for /f "usebackq tokens=2 delims=;=" %%a in (`set _stack`) do set _retval=%%a

Then we need to remove the bit we just saved. The easiest option is to use another for, using the asterisk token to grab the rest of the line and assign it as the new stack contents as follows:

for /f "usebackq tokens=2* delims=;=" %%a in (`set _stack`) do set _stack=%%b

Note that as a special case, when there's only one element left, there's nothing after its semicolon, so the stack gets automatically unset as a side effect. Neat. Finally, the command peek is like pop without changing the stack, and clear simply unsets the _stack variable.

The order in which you push and pop things matters: which end of the list you read from, and which end grows when you add to it. In our stack we both add to and read from the left side, so the most recently added item is the one read first. Merely adding on the right instead is enough to turn the data structure into a queue, in which the thing you put in first is also read out first (first in first out, FIFO). The change needed to make it a queue would be: set _stack=%_stack%;%2 rather than set _stack=%2;%_stack%

The usual names for the queue operations are enqueue and dequeue rather than push and pop; in Perl you would pair push with shift, but that's beside the point. By the way, if the term shift rings a bell, you can think of the command-line parameters passed to a batch file as a queue.
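To make the queue idea concrete, here is a small single-file sketch. It is not the author's _stack.bat: the variable _queue and the labels enqueue and dequeue are names invented for this example, and it parses the variable's value directly rather than the output of set.

```batch
@echo off
set _queue=
call :enqueue a
call :enqueue b
call :enqueue c
call :dequeue
echo First out: %_retval%
call :dequeue
echo Second out: %_retval%
goto :EOF

:enqueue
rem New items are added on the right, each followed by a semicolon.
set _queue=%_queue%%1;
goto :EOF

:dequeue
rem The first token is the oldest item; the asterisk keeps the rest.
for /f "tokens=1* delims=;" %%a in ("%_queue%") do (
    set _retval=%%a
    set _queue=%%b
)
goto :EOF
```

Since a was put in first, it is also the first item to come out, followed by b. Dequeuing from an empty queue leaves _retval unchanged in this sketch.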

As the stack is a relatively complex beast, it would be a good idea to have some code for testing it and understanding its inner workings. Here's a test script which you can run to see how the stack behaves:

test_stack.bat:

@echo off
call _stack clear
echo empty stack:
set _stack
call _stack push 1
call _stack push 3
call _stack push 5
call _stack push 6
echo Stack after pushing 4 values:
set _stack
call _stack peek
echo Top item is:
set _retval
call _stack pop
echo Popped the top which is:
set _retval
echo The stack is now:
set _stack
call _stack pop
echo After another pop the stack is:
set _stack
echo and the value popped was
set _retval
call _stack pop
call _stack pop
echo After two more pops the stack is:
set _stack
echo And the last value popped:
set _retval
echo Push yet another item:
call _stack push last
echo Which grows the stack to:
set _stack
echo But now we clear it:
call _stack clear
set _stack

There are several limitations in this stack. The most significant of these, which I have not mentioned earlier, is that there can be only one stack in existence as the name of the stack variable is hard-coded. You might be able to have the user specify it on the command line, too, but I've left it out for simplicity. Not hard-coding the stack would add the ability to create more complex data structures, which would be groups of simple variables that the user need not know about. The syntax of batchname, datastructureName, function, parameters does bring object-oriented programming and abstract data types to mind. Though truth be told the similarity is mostly superficial; there's nothing even remotely object-oriented in batch programming.

But what's all this got to do with gosub? Well, gosub is simply a stack of labels or line numbers. When you go to something you make a label just before that point, and push it on the stack. Then when you need to go back you pop the topmost label and go to it. Of course you might equally well hard-code the same label name twice without using the stack, but it would not scale as cleanly and you wouldn't have a generally usable stack then. Still, it is apparent the batch syntax doesn't support gosub as smoothly as even BASIC does. My understanding is that in BASIC a gosub is a stack of line numbers and the interpreter does the bookkeeping, pushing and popping for you without having to code manual labels like this. Popping a return label from the stack is similar to how functions are implemented in many programming languages. Instead of labels you have addresses of machine code in memory to which the program counter is set, but other than that it mostly works the same, ignoring local variables and pass by value here.
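The push-a-label, jump, pop-and-return pattern described above could be sketched as follows. To keep the example in one file, push and pop are written here as local labels reached with call, which is an assumption of this sketch; with the author's file you would write call _stack push back1 and call _stack pop instead. The labels greet, back1 and back2 are made up.

```batch
@echo off
set _stack=

rem First "gosub": remember where to return, then jump.
call :push back1
goto greet
:back1

rem Second "gosub" to the same subroutine from a different place.
call :push back2
goto greet
:back2

echo Done.
goto :EOF

:greet
echo Hello from the subroutine.
rem "Return": pop the most recent label and go to it.
call :pop
goto %_retval%

:push
set _stack=%1;%_stack%
goto :EOF

:pop
for /f "tokens=1* delims=;" %%a in ("%_stack%") do (
    set _retval=%%a
    set _stack=%%b
)
goto :EOF
```

The subroutine is entered twice and each time returns to the label that was pushed just before the jump.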

Random Lines

This second script will use the random number generator to append arbitrary lines from a set of user-specified files. It could be used to build a funny random word generator (the funniest I've seen so far is the random tech phrase generator). This batch script is stretching the limits of find and set a bit but it's not that long after all.

randlines.bat:

@echo off
set _filename=index.dat
set _current=0
set _output=
set _linecount=1

:Start
if %1x==x goto end
call _rndline.bat %1
shift
goto start

:end
echo Output: %_output%
del %_filename%
call _cleanup.bat

_rndline.bat:

@echo off
for /f "usebackq tokens=2 delims=:" %%a in (`find /c /v "" %1`) do set _linecount=%%a
set /a _current="%random% %% %_linecount% + 1"
find /n /v "" %1 >%_filename%
for /f "usebackq delims=] tokens=2" %%a in (`findstr /r "\[%_current%]" %_filename%`) do set _output=%_output% %%a

The main program initializes some variables and then calls _rndline.bat once for each file given on the command line. Finally, it prints the concatenated output generated by _rndline.bat and cleans up when all the arguments have been processed.

The second batch file, _rndline.bat, is worthy of some more detailed scrutiny, though. The line

for /f "usebackq tokens=2 delims=:" %%a in (`find /c /v "" %1`) do set _linecount=%%a

is a clumsy way of getting the line count of a file. By looking for lines that don't contain the empty string, we get all of the lines in the file. Then we just tell find to count those lines and rip out the 2nd token (the first that has a colon on its left). This is because the output format of find /c is: ---------- filename: linecount

Next comes set /a _current="%random% %% %_linecount% + 1" which is a tricky bit of arithmetic. Notice the use of the modulo operator to limit the range of the random output values. It seems that the percent signs around random are mandatory, unlike with ordinary variables, which set /a can read by name alone. The modulo operator has to be written as two percent signs because, inside a batch file, a single percent sign starts a variable or parameter reference. And the help for the set command states that double quotes must be used around the expression if the modulo operator is used. These unexpected limitations are unfortunate but can be worked around.
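Here is the same arithmetic in isolation, as a small sketch you can save to a file and run a few times. The line count of 6 is a made-up value; in _rndline.bat it comes from find /c.

```batch
@echo off
rem Hypothetical line count standing in for the value read by find /c.
set _linecount=6
rem The remainder is between 0 and 5, so adding 1 gives a number from 1 to 6.
set /a _current="%random% %% %_linecount% + 1"
echo Picked line %_current% of %_linecount%.
```

Typed directly at the command prompt rather than in a batch file, the modulo operator would be written with a single percent sign.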

Then, find /n /v "" %1 >%_filename% is used to make a numbered index of the lines in this file. The n option prepends bracketed line numbers, starting from one, to the found lines. As there's only one greater-than sign, the file to which the output is directed is overwritten on every invocation of this auxiliary batch file.

Finally, for /f "usebackq delims=] tokens=2" %%a in (`findstr /r "\[%_current%\]" %_filename%`) do set _output=%_output% %%a finds the line having the chosen random number in the index file. Again some heavy use of findstr, although the same could likely be achieved with find alone. We need to use for for tokenization to display only the matching string without its bracketed number prefix. Notice that we are concatenating the result to the previous output to form a longer string.

Array Emulation and Sorting

As another, optional extra, consider dealing with a number of related variables in the form of an array. Handling an array in a batch file is so slow and impractical that this example is more like a proof of concept. You have been warned. Given an array of random numbers, the goal is to sort and print those numbers from smallest to largest. In programming an array is a sequential collection of similar elements (of the same type, e.g. integer). If we know the size of each element and the number of the element we want, it is easy, in some programming languages, to jump directly to the right address in memory and get at the element with the desired number, the array index. However, in batch files the best you can do, to my knowledge, is to create a group of variables that have a common base name followed by a number that can be used as an array index. I call this technique array emulation as batches have no direct, built-in support for arrays. To make this discussion more concrete, here's the sample code followed by a rather long-winded explanation of what's going on:

_index.bat:

rem @echo off
for /f "usebackq tokens=2 delims==" %%a in (`set %1%2 ^|find "%1%2="`) do set %3=%%a

array.bat:

@echo off
set _i=0
set _max=%1
set _retlabel=swapped1

:randomize
if %_i%==%_max% goto listbuilt
set _item%_i%=%random%
set /a _i+=1
goto randomize

:listbuilt
echo The %_max% unsorted elements are:
@echo on
set _item
@echo off

set /a _last=%_max%-1

:shorten
if %_last%==0 goto sorted

set _i=0

:swapping
if %_i%==%_last% goto afterswap
set /a _elem0index=%_i%
set /a _elem1index=%_i%+1
call _index.bat _item %_elem0index% _elem0
call _index.bat _item %_elem1index% _elem1
echo Comparing %_elem0% and %_elem1% at index %_i%.
if %_elem0% gtr %_elem1% goto swap
:swapped1
set /a _i+=1
goto swapping

:afterswap
set /a _last-=1
echo %_last% partially sorted elements left.
goto shorten

:sorted
echo And after sorting:
set _item
@echo off

goto end

:swap
set _temp=%_elem1%
set _elem1=%_elem0%
set _elem0=%_temp%
set _item%_elem0index%=%_elem0%
set _item%_elem1index%=%_elem1%
echo Swapped. New order is %_elem0% and %_elem1%
goto %_retlabel%

:end
call _cleanup.bat

The first thing we'll consider is representing the array. In this example it is a collection of environment variables from _item0 up to the maximum specified by the first command-line argument minus one. So if you wanted an array of ten elements you would have variables _item0 to _item9 respectively. For the sorting we'd like to make these variables random. The variable random produces such numbers easily enough, but the obvious choice of using a for /l loop and set for generating the numbers would fail. The reason has to do with immediate variable expansion: the variable random is evaluated only once, at the very beginning of the for command, and thus all items get the very same random number. One fix is using another batch to generate the numbers, but I've chosen a different approach relying on set, if and labels as expected. See the code associated with the label randomize for details.
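The expansion problem is easy to see in a short experiment. The sketch below also shows one possible workaround using delayed variable expansion, which is not the approach taken in array.bat; the variable names _same and _diff are invented for the example.

```batch
@echo off
setlocal enabledelayedexpansion

rem Naive version: the percent form of random is expanded once, when the
rem whole for line is read, so all five variables get the same value.
for /l %%i in (0,1,4) do set _same%%i=%random%
set _same

rem Delayed expansion: the exclamation-mark form is expanded each time the
rem set command runs, so every variable gets its own value.
for /l %%i in (0,1,4) do set _diff%%i=!random!
set _diff
```

The first listing should show five identical numbers, the second five numbers that were each drawn separately.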

When it comes to printing the produced numbers, I took the lazy route and merely used set to display all variables beginning with _item. Though easy, this approach will fail if some other variables start with the same name and you don't have too much control over the display format, either. The code is below the label listbuilt.

Before we consider the sorting, there's the problem of accessing the newly created array elements. Unfortunately, I have not found a way that would let you directly get the element whose number is specified by a variable. Your first attempt might be something like this:

echo %_item%%_i%

Instead of getting the value of the variable, it prints the expression that would, if re-evaluated, produce the desired value. Even if you stored this expression in yet another variable and tried running it as part of a batch file, it would not, oddly enough, work as expected. Neither will the command-line arguments nor for loop temporaries interpolate inside other environment variables; I've tried that, too. I've been told delayed variable expansion, which I don't use in this tutorial, may provide a solution: !_array%_index%!

The best solution I can offer so far is to resort to an external batch file again. Please let me know if you know of a better or faster way to index an emulated array. The auxiliary batch _index.bat does the job given the base name of an array, the index of the desired array element (starting from 0) and the name of the variable to which the array value should be copied. The code is the longest line in the example and it uses the output of set parsed through a for command. Yet again set fetches a list of all variables starting with the combination of the base name for the array and its index. I say starting with because there can be many such variables. For example, if you called the batch as: call _index.bat _item 5 _sixth you would be requesting the 6th element, remember we count from 0, of the array named _item and the result should appear in _sixth. However, if there are over 50 elements, _item50 also begins with _item5 and is listed in the set output. In the script the output of set is parsed through for to get at the variable value on the right side of the first equals sign. As all lines are processed, however, the last line, and so the largest matching array index, sets the final variable value. In conclusion, multiple lines in the set output must be eliminated, and this is accomplished by piping the set output through find so that the match also includes the equals sign right after the variable name. In the above example only the set line matching _item5 followed by an equals sign will be printed, and so we have guaranteed that the correct array element is assigned to _sixth.
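For readers looking for an alternative to the external batch file, here are two other ways of indexing that should work on NT-based systems. Neither comes from the original scripts: the sample values and the variable names _viaCall and _viaDelayed are made up, and I have not compared their speed against _index.bat.

```batch
@echo off
setlocal enabledelayedexpansion

rem Hypothetical sample data standing in for the emulated array.
set _item0=42
set _item1=7
set _item2=19
set _i=1

rem Variant 1: the doubled percent signs survive the first parse as single
rem ones, and call then parses the line a second time to fetch the value.
call set _viaCall=%%_item%_i%%%
echo call set gives: %_viaCall%

rem Variant 2: the index is substituted first and the exclamation-mark
rem reference is resolved when the set command actually runs.
set _viaDelayed=!_item%_i%!
echo delayed expansion gives: %_viaDelayed%
```

Both variants read the element at index 1 of the sample data. Because the full variable name is built before the lookup, the _item5 versus _item50 ambiguity described above does not arise.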

One common programming error with arrays is trying to get at array elements that do not exist. This is referred to as a buffer overrun as you run over the end of an array. Using my array indexing script this error is checked in set and you get a warning about a non-existent variable. To make this a more fatal error, you could check the error level of set, which is non-zero if something went wrong. I've omitted this check for brevity.

Now that we can access the array elements, the last problem is sorting the array. The algorithm used here is a Bubble Sort, which is one of the simplest sorting methods out there. The basic idea is going through the array multiple times comparing successive elements, e.g. element numbers 0 and 1, 1 and 2, 2 and 3 and so on. In the script the variable _i is used for this purpose. If the elements are in the wrong order, that is the first (_elem0) is greater than the second (_elem1) when sorting in ascending order, then the two elements are swapped. Once the end of the array is reached, we know that the largest element has bubbled up to the end of the array.

http://en.wikipedia.org/wiki/Bubble_sort

Say we find an element in the middle of the array which happens to be the largest. We compare it with the next one, discover that our element is larger, and swap the two. After this we move on to the next pair, so the item that ended up second after the swap is the first one in our next comparison. This is why a single element can move more than one position during our pass through the list. This bubbling thing, if you will, made bubble sort hard for me to understand initially.

Once we have gone through the whole list, the largest element is at the very end and so we know we won't need to check that element again (the variable _last tracks this in the code). But we do need to go through the list again to make sure everything is in order (pun intended). After the second pass we have the second biggest item in place. This multi-pass approach continues until we have sorted the whole list. For more info and better explanations, Google or check the Wiki article to which I linked earlier.

As far as the code itself is concerned, the swap procedure is worth checking out. To exchange two values, regardless of language, you need to save away one of the two (_elem1), overwrite that variable with the other value (_elem0) and lastly assign the saved value (_temp) to the other variable (_elem0). In code this is expressed as:

set _temp=%_elem1%
set _elem1=%_elem0%
set _elem0=%_temp%

Despite the index numbers, the two elements are not really treated as an array, though conceptually you could think of them as a 2-element array. To further confuse matters, the changes are not made in the array directly. The swapping procedure, the label swap, works with copies of the elements, namely _elem0 and _elem1, because it is easier syntactically. However, near the end of the label the changes are committed to the array directly to save the caller from doing that.

Though the program is not exactly modular, you could imagine a situation in which one might need to go to the swap label from two different places. Thus a question arises: how do you get back to where you left off? If the swap label had no goto, execution would fall through to end, which is obviously undesirable. To remedy this we could add a goto to a fixed label as the last thing under the swap label. This does work, but execution after the swap label will always return to the same point no matter where you started from. As a more general solution, I've used a variable called _retlabel which the caller can set to tell swap to which label it should return. This is analogous to pushing the return address on the call stack when you call a function in a programming language. Without that return address the computer would forget where the function was called from. This _retlabel is still a bit clumsy and a solution including nested gotos soon gets very ugly, oh well.

Before I let you go, you might be wondering what the performance of this array emulation and sorting might be. I ran a benchmark and was shocked to find out that on a 1.8 GHz mobile Pentium with a gig of RAM, sorting fifteen items took twenty seconds. This is really depressing, and obviously with this mode of array emulation, working with larger data sets makes little sense. I next tried removing the debug prints (echo commands) but it had no effect on the performance. I reckon having to run call, find, set and for for each array indexing operation takes its toll, as does managing numerous environment variables and doing all the swaps. The sorting method matters very little at this point, as a bubble sort should be good up to a few thousand items. For those of you who are interested, the time it takes to run a bubble sort, regardless of language, is proportional to the square of the number of elements being sorted.
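If the per-element calls to an external batch file are indeed the bottleneck, a version that stays inside one file might fare better. The following is only a sketch under that assumption and I have not timed it. It relies on delayed variable expansion, which array.bat does not use, and the default of ten elements is an arbitrary choice for the example.

```batch
@echo off
setlocal enabledelayedexpansion

rem Assumption: the first argument is the element count, defaulting to 10.
set _max=%1
if "%_max%"=="" set _max=10
set /a _top=_max-1

rem Fill the emulated array with random numbers, one draw per element.
for /l %%i in (0,1,%_top%) do set _item%%i=!random!

echo Unsorted:
for /l %%i in (0,1,%_top%) do echo   !_item%%i!

rem Bubble sort. After each pass the largest remaining value sits at the
rem index held in the outer loop variable, so the next pass stops earlier.
rem The computed index of the left neighbour is copied into its own for
rem variable so that it can be used inside the exclamation marks.
for /l %%p in (%_top%,-1,1) do (
    for /l %%i in (1,1,%%p) do (
        set /a _prev=%%i-1
        for %%j in (!_prev!) do (
            if !_item%%j! gtr !_item%%i! (
                set _temp=!_item%%j!
                set _item%%j=!_item%%i!
                set _item%%i=!_temp!
            )
        )
    )
)

echo Sorted:
for /l %%i in (0,1,%_top%) do echo   !_item%%i!
```

The algorithm is the same bubble sort as before, so the running time still grows with the square of the element count; only the cost of each comparison and swap should be lower.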

I started this tutorial by mentioning that I tend to use Perl for programs like this. As a teaser, here's how you might write the above program in Perl, though the sorting algorithm, which I've been going on about for pages, comes pre-supplied:

Perl code:

use strict;
my $max = 100; # Maximum number.
my $count = shift; # How many items.
my @numbers; # The array of numbers.
push @numbers, int($max * rand) for(1 .. $count); # Generate the numbers.
print "$count unsorted elements: @numbers\n";
@numbers = sort { $a <=> $b } @numbers; # Compare as numbers, not as strings.
print "And afterwards: @numbers\n";

Or if you are a lazy typer:

push @nums, int(100 * rand) for(1 .. $ARGV[0]);
print scalar(@nums) . " unsorted: @nums\n", "afterwards: ", join ' ', sort { $a <=> $b } @nums;

As far as performance goes, Perl can randomize and sort ten million items in less than a minute, even though it is interpreted. The method here is merge sort, though, which beats bubble sort quite easily. See the end of this page for a Perl tutorial.

Epilog

Now that this tutorial is drawing to a close, I'd like to wish you good luck with your own batch files. I truly hope this information was enough to get you started with both the more advanced DOS stuff and Windows NT specific additions. Lastly, it would be nice to hear what you think of this tutorial. Drop me a line in case of any questions, additions, corrections and so on.

Thanks to Petteri Järvinen for writing the most excellent DOS book in Finnish called PC käsikirja DOS 6.22. Also thanks to MS for writing a relatively friendly disk operating system and taking the time to improve its shell in recent years.

Related Links

A Perl Tutorial in a Similar Style to This One: http://www.student.oulu.fi/~vtatila/perltut.html

If you have any questions, comments or suggestions, drop me a line: http://www.student.oulu.fi/~vtatila/mail.html
