DCS/100: Procedural Programming
Week 6: File Input and Output
Queen Mary, University of London
DCS/100: wk 6 – p.1/51
Last week
By now you should be secure in writing simple programs that:
ask the user for input
compute an answer based on the user’s input
print out an appropriate response
You should understand branching and for loops.
You should also be starting to get to grips with while loops.
A program from last week
int ans=0;
int count=0,sum=0;
out.write("Enter a number (eof to terminate): ");
while (in.digit())
{ ans = in.readint();
  sum = sum+ans;
  out.write("Enter a number (eof to terminate): ");
  in.readblanks();
}
out.writeln("The sum was "+sum);
This Week: Learning Outcomes
By the end of this week, you should be able to:
write programs that store information in files
write programs that retrieve information from files
explain how data can be stored persistently
explain different ways characters are stored
coursework
Files
Files are the operating system’s mechanism for permanently storing data.
...storing data persistently so that it survives a program terminating
Files are kept in directories, or folders.
Together these make up a filesystem.
Each operating system has its own form of filesystem. (Usually more than one.)
But (these days) each operating system also has to be able to handle “foreign” filesystems as seamlessly as possible.
Files
Files can contain either data (datafiles)
or text (textfiles).
You can read textfiles in a text editor, but you cannot usually read datafiles.
Each kind of file keeps its information in a particular layout (or format).
Under Unix you can get a good guess at what kind of information a file holds (really, what format it is in) by running the file command.
Text Files
Not all text files are to be read or written by you.
Some are supposed to be read or written by programs.
When you write a Java program it is supposed to be read by the Java compiler (javac).
A PostScript file (.ps) is usually written by a program (save as PostScript) and read by a printer or display program. It is a text file, but it’s rarely looked at by a human.
Configuration files are often text files. They are generated by programs when you edit preferences, and read by programs to set them up to your liking. (.gnome/gedit).
Text Files
These days databases and spreadsheets are increasingly being saved in text-based formats:
CSV: comma-separated values
XML: Extensible Markup Language
You will learn about XML in the Language course next term.
Accessing Files
You can read from a file.
You can write to a file.
It is a poor idea to try to do both at the same time!
It is also a poor idea to have more than one thing trying to write to the same file! (Having more than one thing reading is OK.)
Typically you can only read a file sequentially, and anything you write goes at the end of it.
Text Files
Here is a little text file:
There was a young lady of Niger,
Who went for a ride on a tiger.
  They returned from the ride,
  With the lady inside,
And a smile on the face of the tiger!

      (traditional limerick)
As far as the computer is concerned, it doesn’t see it as nicely laid out in lines like this.
Text Files
What the computer sees is
There␣was␣a␣young␣lady␣of␣Niger,↩Who␣went␣for␣a␣ride␣on␣a␣tiger.↩␣␣They␣returned␣from␣the␣ride,↩␣␣With␣the␣lady␣inside,↩And␣a␣smile␣on␣the␣face␣of␣the␣tiger!↩↩␣␣␣␣␣␣(traditional␣limerick)↩<EOF>
The computer sees it as one long stream of characters, some of which you can’t really see on the screen.
␣ is a space
↩ is a new line
<EOF> is an end-of-file character
Text Files
␣ is a standard character (ASCII 32)
↩ can be represented in different ways by different operating systems:
Unix: newline (ASCII 10) separates lines
Microsoft: carriage return - linefeed (ASCII 13 10) separates lines (also used for some mail file attachments)
Mac OS: carriage return (ASCII 13) separates lines (classic Mac OS; Mac OS X and later use the Unix newline)
<EOF> can be handled in different ways:
Unix: ctrl-d (ASCII 4) typed at the keyboard signals end of file; a file on disk simply ends, with no eof character stored in it.
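The three newline conventions can be told apart by their character codes. Here is a minimal sketch in standard Java (not Brinch Hansen's library); the class name LineEndings and the method splitLines are made up for this illustration. It splits text on any of the three line-ending styles.

```java
import java.util.Arrays;

// Hypothetical helper class, invented for this example.
class LineEndings {
    // Split text into lines, accepting Unix (LF), Microsoft (CR LF)
    // and classic Mac OS (CR) line endings. CR LF is listed first so
    // that it counts as one line break rather than two.
    static String[] splitLines(String text) {
        return text.split("\r\n|\r|\n");
    }

    public static void main(String[] args) {
        System.out.println((int) '\n');   // 10: newline (line feed)
        System.out.println((int) '\r');   // 13: carriage return
        System.out.println((int) ' ');    // 32: space
        String mixed = "one\ntwo\r\nthree\rfour";
        System.out.println(Arrays.toString(splitLines(mixed)));   // [one, two, three, four]
    }
}
```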
Brinch Hansen’s File Output
Writing to files is very similar to writing to the screen.
In order to write to a file you need an output stream targeted at that file:
output outstream = new output("filename");
creates such an output stream connecting variable outstream with the file called filename.
To write to it, you just use
outstream.write("blah");
and
outstream.writeln("blah blah");
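For comparison, here is a rough equivalent in standard Java, without the output class. This is only a sketch: it assumes java.io.PrintWriter as the stand-in, and the class name WriteDemo and the file name demo.txt are invented for the example.

```java
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical demo class; PrintWriter stands in for the course's output class.
class WriteDemo {
    // Create (or overwrite) the named file and write two pieces of text.
    static void writeGreeting(String filename) throws IOException {
        // try-with-resources closes the stream even if writing fails
        try (PrintWriter outstream = new PrintWriter(filename)) {
            outstream.print("blah");          // plays the role of outstream.write("blah")
            outstream.println("blah blah");   // plays the role of outstream.writeln("blah blah")
        }
    }

    public static void main(String[] args) throws IOException {
        writeGreeting("demo.txt");
    }
}
```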
Exercise: What does this do?
class hello extends basic
{
  public static void main (String param[]) throws Exception
  {
    output out = new output("morgan.txt");
    out.writeln("Hello World!");
    out.close();
  }
}
Exercise
Write a program that saves your name into the file called wombat.txt
Brinch Hansen’s File Output
The argument taken by output is a String.
You can put anything that gives a String there.
For example, a string variable filled using keyboard input, so the user can specify the file.
DANGER
If you close a file for output, and then open it again,
you will erase the previous contents
you will not simply add things at the end of the file!!
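The same danger exists in standard Java, where it is easy to demonstrate. The sketch below uses java.io.FileWriter, whose second constructor argument chooses between erasing and appending; whether Brinch Hansen's output class offers an append mode is not stated here, so this is shown with the standard library only. The class name AppendDemo and file name demo.txt are invented.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical demo class using standard Java, not the course library.
class AppendDemo {
    // append=false erases the old contents first; append=true adds at the end.
    static void write(String filename, String text, boolean append) throws IOException {
        try (FileWriter out = new FileWriter(filename, append)) {
            out.write(text);
        }
    }

    // Return the whole file as one String.
    static String contents(String filename) throws IOException {
        return new String(Files.readAllBytes(Paths.get(filename)));
    }

    public static void main(String[] args) throws IOException {
        write("demo.txt", "first", false);
        write("demo.txt", "second", false);         // reopening erases "first"
        System.out.println(contents("demo.txt"));   // prints second
        write("demo.txt", "third", true);           // append mode keeps "second"
        System.out.println(contents("demo.txt"));   // prints secondthird
    }
}
```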
class hello2 extends basic
{
  public static void main (String param[]) throws Exception
  {
    String mystring="file";
    output out = new output(mystring+".txt");
    out.writeln("Hello World!");
    out.close();
  }
}
Brinch Hansen’s File Input
Reading from a file is also similar to reading from the keyboard.
In order to read from a file you need an input stream whose source is that file:
input instream = new input("filename");
creates such an input stream connected to the file called filename.
To read a single character from it, you use
mychar = instream.read();
and to read a line:
mystring = instream.readline();
etc.
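In standard Java the nearest counterparts are read() and readLine() on a java.io.BufferedReader. The sketch below is an assumption-laden stand-in, not the input class itself: the class name ReadDemo and the method firstCharThenRest are invented, and a StringReader replaces a real file so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class using standard Java readers.
class ReadDemo {
    // Read one character, then the rest of that line, from any Reader.
    static String firstCharThenRest(Reader source) throws IOException {
        try (BufferedReader instream = new BufferedReader(source)) {
            int code = instream.read();          // next character as an int, -1 at end of input
            String rest = instream.readLine();   // rest of the line without its newline, null at end of input
            return (char) code + "|" + rest;
        }
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for new FileReader("filename")
        System.out.println(firstCharThenRest(new StringReader("Hello\nWorld\n")));   // prints H|ello
    }
}
```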
Brinch Hansen’s File IO
Think of an input file as having a pointer pointing at where you are in the file. (In the picture, ↑ marks the pointer; the next character to be read follows it.)
There␣was␣a␣young␣lady␣of␣Niger,↩Who␣went␣for␣a
↑␣ride␣on␣a␣tiger.↩␣␣They␣returned␣from␣the␣ride,↩␣␣With␣the␣lady␣inside,↩And␣a␣smile␣on␣the␣face␣of␣the␣tiger!↩↩␣␣␣␣␣␣(traditional␣limerick)↩<EOF>
Brinch Hansen’s File IO
mychar=infile.read()
reads the next character (␣), and moves the pointer on.
There␣was␣a␣young␣lady␣of␣Niger,↩Who␣went␣for␣a
↑␣ride␣on␣a␣tiger.↩␣␣They␣returned␣from␣the␣ride,↩␣␣With␣the␣lady␣inside,↩And␣a␣smile␣on␣the␣face␣of␣the␣tiger!↩↩␣␣␣␣␣␣(traditional␣limerick)↩<EOF>
Brinch Hansen’s File IO
mychar=infile.next()
reads the next character (␣), but does not move the pointer on.
There␣was␣a␣young␣lady␣of␣Niger,↩Who␣went␣for␣a
↑␣ride␣on␣a␣tiger.↩␣␣They␣returned␣from␣the␣ride,↩␣␣With␣the␣lady␣inside,↩And␣a␣smile␣on␣the␣face␣of␣the␣tiger!↩↩␣␣␣␣␣␣(traditional␣limerick)↩<EOF>
Brinch Hansen’s File IO
infile.readnext()
moves the pointer on one character, but does not tell you what that character was.
There␣was␣a␣young␣lady␣of␣Niger,↩Who␣went␣for␣a
↑␣ride␣on␣a␣tiger.↩␣␣They␣returned␣from␣the␣ride,↩␣␣With␣the␣lady␣inside,↩And␣a␣smile␣on␣the␣face␣of␣the␣tiger!↩↩␣␣␣␣␣␣(traditional␣limerick)↩<EOF>
Brinch Hansen’s File IO
mystring=infile.readline()
reads the file up to the next newline character, and moves the pointer just past it (the String it returns does not contain the newline).
There␣was␣a␣young␣lady␣of␣Niger,↩Who␣went␣for␣a␣ride␣on␣a␣tiger.↩
↑␣␣They␣returned␣from␣the␣ride,↩␣␣With␣the␣lady␣inside,↩And␣a␣smile␣on␣the␣face␣of␣the␣tiger!↩↩␣␣␣␣␣␣(traditional␣limerick)↩<EOF>
Brinch Hansen’s File IO
infile.readln()
moves the pointer just past the next newline character, but does not give you the string of characters you’ve just passed.
There␣was␣a␣young␣lady␣of␣Niger,↩Who␣went␣for␣a␣ride␣on␣a␣tiger.↩
↑␣␣They␣returned␣from␣the␣ride,↩␣␣With␣the␣lady␣inside,↩And␣a␣smile␣on␣the␣face␣of␣the␣tiger!↩↩␣␣␣␣␣␣(traditional␣limerick)↩<EOF>
Brinch Hansen’s File IO
infile.more()
returns true just when the next character is not <EOF>, the end of file (but does not move the pointer).
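The pointer operations described on these slides can be imitated in standard Java. The class below is a hypothetical stand-in written for this illustration, built on java.io.PushbackReader; it is not Brinch Hansen's input class and may differ from it in detail (for instance, this readline() stops quietly at end of input if there is no final newline).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical imitation of the pointer model; method names follow the slides.
class PointerInput {
    private final PushbackReader in;

    PointerInput(Reader source) { in = new PushbackReader(source, 1); }

    // True when a character remains; the pointer does not move.
    boolean more() throws IOException {
        int c = in.read();
        if (c == -1) return false;
        in.unread(c);              // put the character back
        return true;
    }

    // Return the next character and move the pointer on.
    char read() throws IOException {
        int c = in.read();
        if (c == -1) throw new EOFException("read past end of input");
        return (char) c;
    }

    // Return the next character without moving the pointer.
    char next() throws IOException {
        char c = read();
        in.unread(c);
        return c;
    }

    // Move the pointer on one character, discarding it.
    void readnext() throws IOException { read(); }

    // Return the characters up to the next newline and move just past it.
    String readline() throws IOException {
        StringBuilder line = new StringBuilder();
        while (more()) {
            char c = read();
            if (c == '\n') break;
            line.append(c);
        }
        return line.toString();
    }

    // Move just past the next newline, discarding what was passed.
    void readln() throws IOException { readline(); }

    public static void main(String[] args) throws IOException {
        PointerInput infile = new PointerInput(new StringReader("for a ride\nThey"));
        System.out.println(infile.read());       // f (pointer moves on)
        System.out.println(infile.next());       // o (pointer stays)
        infile.readnext();                       // skips the o
        System.out.println(infile.readline());   // r a ride
        System.out.println(infile.more());       // true, "They" remains
    }
}
```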
Brinch Hansen’s File IO
There are many more functions; see Brinch Hansen p. 228. These allow you to:
infile.blank(): check if the next character is a blank
infile.digit(): check if the next character is a digit
infile.letter(): check if the next character is a letter
infile.readboolean(): read the next boolean (skipping blanks)
infile.close(): close the file for further input
Good idioms
read file
while (infile.more())
{ body containing read
}
and more
These are fragments of code or sketches or programs that you will use again and again.
At a higher level they are called “design patterns”.
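The same read-file idiom has a well-known shape in standard Java, sketched here under the assumption that java.io.BufferedReader replaces the input class. The class name CountLines is invented; the loop body just counts lines, but any processing of the line could go there.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example of the read-file idiom in standard Java.
class CountLines {
    static int countLines(Reader source) throws IOException {
        int count = 0;
        try (BufferedReader infile = new BufferedReader(source)) {
            String line;
            // readLine() returns null at end of input, which plays
            // the role of infile.more() becoming false.
            while ((line = infile.readLine()) != null) {
                count = count + 1;   // body: process the line just read
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(new StringReader("a\nb\nc\n")));   // prints 3
    }
}
```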
File Copy
class copy extends basic
{
  public static void main (String param[]) throws Exception
  {
    input infile = new input("in.txt");
    output outfile = new output("out.txt");
    while (infile.more())
    { char next = infile.read();
      outfile.write(next);
    }
    infile.close(); outfile.close();
  }
}
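For readers without the course library, the copy loop can be sketched in standard Java as follows. The class name CopyDemo is invented, and in-memory streams are used so the example runs without any files present; swapping in FileReader and FileWriter gives a real file copy.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical standard-Java version of the character-by-character copy.
class CopyDemo {
    static void copy(Reader infile, Writer outfile) throws IOException {
        int next = infile.read();        // -1 signals end of input
        while (next != -1) {
            outfile.write(next);         // writes one character
            next = infile.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // For real files use new FileReader("in.txt") and
        // new FileWriter("out.txt") in place of these two streams.
        Reader infile = new StringReader("line one\nline two\n");
        Writer outfile = new StringWriter();
        copy(infile, outfile);
        infile.close();
        outfile.close();
        System.out.print(outfile);       // prints the two copied lines
    }
}
```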
Bug Warning
There is a bug in readline(), readln() and read().
If you are reading a textfile, and the last line does NOT end in a newline character, then readline() will hang waiting for that newline, and the program will not terminate. Some editors automatically put in that last newline, others do not.
readln() behaves similarly.
If you attempt to read() beyond the last character of a file, it will also hang.
I haven’t tested other file programs.
Newline characters
The editor gedit does not put a newline at the end of a text file by default.
To check if you do have a newline (without one, the shell prompt appears on the same line as the end of the file):
cat <file>
(cat is short for “concatenate”).
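The check can also be done from a program. This is a small sketch in standard Java; the class name TrailingNewline is invented, and it reads the whole file into memory, which is fine for the small text files used here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical checker: does the file end with a newline (ASCII 10)?
class TrailingNewline {
    static boolean endsWithNewline(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return bytes.length > 0 && bytes[bytes.length - 1] == '\n';
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java TrailingNewline <file>");
            return;
        }
        System.out.println(endsWithNewline(Paths.get(args[0])));
    }
}
```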
Characters
Computers need some way of representing letters (and digits and punctuation) as bit patterns. You need to know about two of these:
ASCII
Unicode
ASCII
American Standard Code for Information Interchange
For many years this has been the standard. (It was originally set up when the only printers were line-printers.)
It will still be used by any computer you have.
It has 128 characters (0-127: 7 bits):
0-9 are numbers 48-57
A-Z are numbers 65-90
a-z are numbers 97-122
See Chapter 6.6 of Brinch Hansen.
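These codes are easy to inspect in Java, because a char converts to its number with a cast. A small sketch (the class and method names are invented for the example):

```java
// Hypothetical example: looking at ASCII codes from Java.
class AsciiCodes {
    // The numeric code of a character.
    static int code(char c) {
        return (int) c;
    }

    // The value of a digit character; relies on 0-9 having consecutive codes.
    static int digitValue(char c) {
        return c - '0';
    }

    public static void main(String[] args) {
        System.out.println(code('0'));         // 48
        System.out.println(code('A'));         // 65
        System.out.println(code('a'));         // 97
        System.out.println(digitValue('7'));   // 7
    }
}
```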
Drawbacks of ASCII
ASCII has only 128 characters.
It only has basic Anglo-Saxon characters. There’s no room for other stuff.
One could use the full 8 bits, but then one would have to use a different 8-bit set for different languages. (This is done...)
Unicode
Unicode is a newer system: www.unicode.org
In Java a Unicode character is written \uxxxx, where each x is a digit in hexadecimal (base 16): 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f. Four hexadecimal digits make 16 bits (not 32), which is the size of a Java char.
It extends ASCII (first 128 characters are as in ASCII).
But it can represent huge numbers of characters.
Nevertheless there are still problems. . .
Unicode
Standard English characters are sequential in alphabetical order.
But some languages use some of the same characters as others.
This means you have a choice.
Either you have different representations of the same character
Or you have alphabets that go all over the place.
Or you have both.
Unicode
In ASCII some simple tricks work:
newchar = (char) (mychar + 'A' - 'a');
capitalises mychar (provided it is one of the lower-case letters a-z)
This won’t work reliably in Unicode (not across all scripts).
Instead there is the function:
Character.toUpperCase(char c)
newchar = Character.toUpperCase(mychar);
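To see the difference, compare the arithmetic trick with Character.toUpperCase on a letter outside ASCII. The sketch below uses the Polish letter l-with-stroke (U+0142, whose capital is U+0141); the class name Capitalise and method asciiUpper are invented for the example.

```java
// Hypothetical comparison of the ASCII trick with Character.toUpperCase.
class Capitalise {
    // The arithmetic trick: only correct for the letters a-z.
    static char asciiUpper(char mychar) {
        return (char) (mychar + 'A' - 'a');
    }

    public static void main(String[] args) {
        System.out.println(asciiUpper('q'));                // Q
        System.out.println(Character.toUpperCase('q'));     // Q

        char lower = '\u0142';   // small l with stroke
        char upper = '\u0141';   // capital L with stroke
        System.out.println(Character.toUpperCase(lower) == upper);   // true
        System.out.println(asciiUpper(lower) == upper);              // false: the trick lands on another letter
        System.out.println(asciiUpper('Q'));                // 1: the trick also breaks on a capital
    }
}
```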
Unicode
Similarly:
Character.toLowerCase(char c)
Character.isUpperCase(char c)
Character.isLowerCase(char c)
Character.isDigit(char c)
Character.isLetter(char c)
If you want your program to work reliably internationally, you have to use these.
Standard character escapes
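You can get any character using its Unicode form:
\uxxxx
where xxxx is its four-digit hexadecimal Unicode number.
Examples:
newline: \u000a (as a number only: javac rejects the literal '\u000a', because the escape becomes a real line break before the literal is read, so write '\n')
space: '\u0020'
tab: '\u0009'
But many of them have more convenient mnemonic escapes, in particular:
newline: '\n'
tab: '\t'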
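A quick way to convince yourself that the two kinds of escape name the same characters is to compare them in a program. In this sketch the class name Escapes and the helper hex are invented; note that the newline is only ever written as '\n', since Java does not accept the Unicode escape for newline inside a character literal.

```java
// Hypothetical example comparing Unicode escapes with mnemonic escapes.
class Escapes {
    // Four-digit hexadecimal code of a character, e.g. "0020" for a space.
    static String hex(char c) {
        return String.format("%04x", (int) c);
    }

    public static void main(String[] args) {
        System.out.println('\u0020' == ' ');    // true: both are the space character
        System.out.println('\u0009' == '\t');   // true: both are the tab character
        System.out.println(hex('\n'));          // 000a
        System.out.println(hex('\t'));          // 0009
        System.out.println("name:\tvalue");     // a tab inside a string
    }
}
```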
Program Design
See Brinch Hansen pp. 70-73
Example: Week 5 Exercise 8:
Copy a file to another file removing all blank lines.
Program Design
Typical Input
Here is an

input text with just two

blank lines.
Typical Output
Here is an
input text with just two
blank lines.
Program Design
Don’t think that because this is “programming” the way you do this will be radically different from the way a human would do it, say in a text editor.
In a text editor you would:
read in the file
go through the file line by line, starting at the beginning, deleting any blank lines
save the file
Program Design
You can’t quite do this in a program: you don’t have the data structures to read in a whole file yet.
So you have to make a slight transformation to the algorithm to cope:
go through the file line by line, starting at the beginning
read in a line
if it is not blank, save it to the target file
otherwise do nothing
Put this in a template file.
Program Design
At this point you realise there is some bureaucracy you have to do about opening and closing files.
That may as well go in (so you have a working program).
Program Design
Then start to fill in the design.
//go through file
expands to
while (infile.more()) {
Program Design
Now we have to check whether the String line contains only blanks.
This means pulling it apart.
We don’t know how to do that!!!
Two possibilities:
1. RESEARCH! Find out how to pull strings apart.
2. Find another way (reconsider using readline).
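For the record, the research option has a short answer in standard Java: a String can be pulled apart with length() and charAt(). The sketch below shows one way to test a line for blankness; the class name BlankCheck and method isBlank are invented for the illustration, and the slides go on to use the second option instead.

```java
// Hypothetical example of pulling a String apart character by character.
class BlankCheck {
    // True when the line is empty or contains only white space.
    static boolean isBlank(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (!Character.isWhitespace(line.charAt(i))) {
                return false;   // found a non-blank character
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBlank(""));          // true
        System.out.println(isBlank("   \t"));     // true
        System.out.println(isBlank("  x "));      // false
    }
}
```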
Program Design
Instead of reading in the line using readline(), read it character by character checking if they’re blanks.
Program Design
We’ll need to do two things:
1. keep assembling the line
2. keep track of whether we’ve seen a non-blank (for this we use a “flag”: a boolean variable).
Program Design
To assemble the line, keep adding the current character at the end of a string line, initially empty.
Program Design
To check if non-blank seen, use boolean blank_line, initially true but becomes false if non-blank seen.
Program Design
After we have completed this loop, we need to write non-blank lines to the output file.
“if line not blank” becomes
if (!blank_line) {}
and “write to output file” is
outfile.writeln(line);
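Putting the pieces of this design together gives something like the following. It is a sketch in standard Java rather than the course library (Reader and Writer stand in for input and output, and read() returning -1 stands in for more() being false); the class name RemoveBlankLines is invented, while line and blank_line are the names used in the design above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical assembly of the design: copy a text, dropping blank lines.
class RemoveBlankLines {
    static void copyNonBlankLines(Reader infile, Writer outfile) throws IOException {
        int code = infile.read();
        while (code != -1) {                        // go through file
            String line = "";                       // the line assembled so far
            boolean blank_line = true;              // flag: no non-blank seen yet
            while (code != -1 && code != '\n') {    // read one line, character by character
                char c = (char) code;
                line = line + c;
                if (!Character.isWhitespace(c)) {
                    blank_line = false;
                }
                code = infile.read();
            }
            if (!blank_line) {                      // if line not blank
                outfile.write(line + "\n");         // write to output file
            }
            code = infile.read();                   // step past the newline (stays -1 at the end)
        }
    }

    public static void main(String[] args) throws IOException {
        Reader infile = new StringReader("Here is an\n\ninput text with just two\n\nblank lines.\n");
        Writer outfile = new StringWriter();
        copyNonBlankLines(infile, outfile);
        System.out.print(outfile);                  // the three text lines, no blank ones
    }
}
```

Unlike the library version described in the bug warning, this sketch does not hang when the last line lacks a newline; it writes that line and adds one.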
This Week: Learning Outcomes
By the end of this week, you should be able to:
write programs that store information in files
write programs that retrieve information from files
explain how data can be stored persistently
explain different ways characters are stored
Reading
Computing Without Computers Chapter 8
Brinch Hansen chapter 6.