# fast I/O

Henry Cejtin henry@sourcelight.com
Tue, 19 Sep 2000 01:23:01 -0500

I  have  thought some more about the general problem of mutable vs. immutable
I/O stuff, and here is the problem as I  see  it:  First  note  that  reading
almost  anything  requires  at  least  one character of look ahead.  Consider
reading an integer: you don't know it is done until you  look  one  character
too far.

This fact does not, of course, require immutable objects marking positions in
I/O streams of course.  It doesn't technically require any  support  at  all.
Unfortunately,  without  any support from the underlying I/O system, you will
be forced to hold the looked ahead character in a variable  and  to  pass  it
around.  What's worse, some times you won't have it, so you will have to read
it just to pass in.

Rejecting that approach, we have to add some support in the I/O system.  What
Pascal  did,  which was quite elegant, but kind of a pain to use, was to have
the notion of 1 character look ahead built in by providing the equivalent  of
a  `peek'  function.   This returned the next character, but did NOT advance.
The reason why this is nice is because all combinations of  calls  are  legal
(see  the  C  solution  for a system where this is not the case), but it is a
pain to use.  Consider using this the code to read  in  a  (weakly)  positive
integer (no white space skipping):

fun getuint (stream): int =
let fun loop ac =
let val ch = peek stream
in if isDigit ch
loop (10*ac + ord ch - ord #"0"))
else ac
val ch = peek stream
in if isDigit ch
then loop (ord ch - ord #"0")
else raise ???
end

Note  the  need  to  advance  the stream.  2 calls per character, and even if
advance returns the next character, it is a pain to use.

The approach used in C is to provide ungetc instead of peek.  This  is  quite
ugly  because:  it  isn't  legal  to  call  unget  twice  in a row without an
intervening getc call, and also what happens if  you  unget  something  other
than  the last character read?  What does ungetting EOF mean?  Ignoring that,
it is easy to use and a bit more compact than peek.

You could, of course, generalize unget  so  that  you  could  unget  as  many
characters in a row as you wanted (and then you would get them in the reverse
order).  This is all quite consistent and nice, but note it  isn't  quite  as
modular  as the functional approach.  The reason is that anything that does a
read which may have to be pushed back has to know that so that it  will  save
the  characters.  With the functional approach, no one has to know except for
the routine that is going to decide to go back.  The down  side  of  this  is
that only the GC will know that some save set of characters can be gotten rid
of now.  The code generator is almost always going to have  to  generate  the
saved  state  even  though  the code typically only wants 1 character of look