new new TextIO

Fri, 29 Sep 2000 14:48:11 -0700 (PDT)

I finished the implementation of the new functional streams underlying text-io.
The basic goal is as before -- if you never use functional streams (via
scanStream or otherwise) you shouldn't have to pay for them.  In the spirit of
"show me the data structures and I'll show you the code", here are the relevant
datatypes.

--------------------------------------------------------------------------------

datatype Buf.t = 
	 T of {buf: Array.array,
	       closed: bool ref,
	       eof: bool ref,
	       fd: FS.file_desc,
	       first: int ref, (* index of first character *)
	       last: int ref  (* one past the index of the last char *)
	       }

datatype StreamIO.t =
	 T of {buf: Buf.t,
	       chars: {string: string,
		       pos: int,
		       next: t} option ref}

datatype t' =
   Buf of Buf.t
  | Stream of StreamIO.t

datatype TextIO.instream = T of t' ref

--------------------------------------------------------------------------------

In the typical case of no functional streams, an instream will just be a Buf.t
ref, and all the operations will side-effect the array in the Buf.t.  In fact,
if MLton were smart enough, it could get rid of the Buf.t ref in this case, but
it doesn't know how to remove the indirection for refs that are !'ed but never
:='ed.  In any case, if functional streams are used, then the instream is
represented as a StreamIO.t ref, which is essentially a linked list of strings.
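
To make the two representations concrete, here is a minimal sketch of how they
might fit together.  It is not the actual implementation: the closed, eof, and
fd fields are left out, the buffer is preloaded from a string instead of being
filled from a file descriptor, and fromString, drain, fresh, the In constructor
and the structure name Stream (standing in for StreamIO) are names made up for
this illustration.  How the chars cell gets filled in is also my guess.

```sml
(* Sketch only: names and refill logic are illustrative assumptions. *)
structure Buf =
struct
   datatype t =
      T of {buf: CharArray.array,
            first: int ref,  (* index of first unread character *)
            last: int ref}   (* one past the index of the last character *)

   (* Hypothetical helper: preload a buffer instead of reading an fd. *)
   fun fromString s =
      T {buf = CharArray.tabulate (size s, fn i => String.sub (s, i)),
         first = ref 0,
         last = ref (size s)}

   (* Imperative read: just bump `first` in place. *)
   fun input1 (T {buf, first, last}) =
      if !first < !last
         then
            let
               val c = CharArray.sub (buf, !first)
            in
               first := !first + 1;
               SOME c
            end
      else NONE

   (* Hypothetical helper: hand all unread characters over as a string. *)
   fun drain (T {buf, first, last}) =
      let
         val s = CharVector.tabulate (!last - !first,
                                      fn i => CharArray.sub (buf, !first + i))
      in
         first := !last;
         s
      end
end

structure Stream =
struct
   (* A position in a chain of strings; an unforced node holds NONE. *)
   datatype t =
      T of {buf: Buf.t,
            chars: {string: string, pos: int, next: t} option ref}

   fun fresh buf = T {buf = buf, chars = ref NONE}

   fun input1 (strm as T {buf, chars}) : (char * t) option =
      case !chars of
         SOME {string, pos, next} =>
            if pos < size string
               then SOME (String.sub (string, pos),
                          T {buf = buf,
                             chars = ref (SOME {string = string,
                                                pos = pos + 1,
                                                next = next})})
            else input1 next
       | NONE =>
            (* Force the node: pull from the buffer and memoize the result. *)
            let
               val s = Buf.drain buf
            in
               if s = ""
                  then NONE
               else (chars := SOME {string = s, pos = 0, next = fresh buf};
                     input1 strm)
            end
end

datatype t' = Buf of Buf.t | Stream of Stream.t
datatype instream = In of t' ref

(* Imperative input1: one dispatch on the ref, then the cheap Buf path. *)
fun input1 (In r) =
   case !r of
      Buf b => Buf.input1 b
    | Stream s =>
         (case Stream.input1 s of
             NONE => NONE
           | SOME (c, s') => (r := Stream s'; SOME c))

(* Asking for the functional stream switches the representation. *)
fun getInstream (In r) =
   case !r of
      Stream s => s
    | Buf b =>
         let
            val s = Stream.fresh b
         in
            r := Stream s;
            s
         end
```

Until getInstream is called, every input1 takes the Buf branch and only touches
the array and the first index, which is the "don't pay for what you don't use"
property.  Once it is called, reading from the same Stream.t twice yields the
same character, because the forced node is memoized in its chars cell.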

So, the per-character overhead should be small.
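
For reference, here is roughly what a newline-counting loop looks like in each
style, written against the standard Basis Library TextIO interface.  This is a
sketch of the kind of loop I mean, not the exact benchmark source; the function
names countImperative and countFunctional are made up for the illustration.

```sml
(* Imperative style: TextIO.input1 side-effects the instream. *)
fun countImperative (ins: TextIO.instream) : int =
   let
      fun loop n =
         case TextIO.input1 ins of
            NONE => n
          | SOME #"\n" => loop (n + 1)
          | SOME _ => loop n
   in
      loop 0
   end

(* Functional style: StreamIO.input1 returns the character together with
 * the rest of the stream, and the old stream value stays valid. *)
fun countFunctional (ins: TextIO.instream) : int =
   let
      fun loop (s, n) =
         case TextIO.StreamIO.input1 s of
            NONE => n
          | SOME (#"\n", s') => loop (s', n + 1)
          | SOME (_, s') => loop (s', n)
   in
      loop (TextIO.getInstream ins, 0)
   end
```

Both can be tried without a file by using TextIO.openString, e.g.
countImperative (TextIO.openString "a\nb\n").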

Now, for the benchmarks.  For a loop that counts the number of newlines in a
file, the imperative IO runs at 7M/s on my machine and the functional IO runs at
3M/s.  Unfortunately this change has cost a factor of 2 slowdown in the
imperative IO (it was up to 15M/s a few days ago).  This is due to a weakness in
MLton's elimination of useless constructors that causes it to not eliminate the
Stream constructor.  The problem comes from code like the following:

type a = ...
val fA: a -> a = ...
type b = ...
val fB: b -> b = ...
datatype t = A of a | B of b
val f = fn A a => A(fA a) | B b => B(fB b)
val _ = f(A ...)

In this code, MLton will not eliminate the B constructor, because it sees the
explicit construction of a B object.  It does not notice that this construction
could only happen if there were already another B object, which there isn't.
Anyway, it should be easy to catch this case -- it's on my todo list.
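
Here is a complete instance of the pattern, with the elided parts filled in by
arbitrary choices of mine (int and string for the two types, trivial bodies for
fA and fB, and 41 as the argument), just so that it can be compiled and
inspected.

```sml
(* The concrete types and function bodies are placeholders chosen for
 * illustration; only the shape of the program matters. *)
type a = int
val fA: a -> a = fn n => n + 1
type b = string
val fB: b -> b = fn s => s ^ "!"

datatype t = A of a | B of b

(* The only place a B is constructed is in the branch that has already
 * matched a B, so a B can be built only if one already exists. *)
val f = fn A a => A (fA a) | B b => B (fB b)

(* The only call site passes an A, so no B value is ever created. *)
val result = f (A 41)
```

Since the sole construction of B is guarded by a match on B, and nothing else
in the program ever builds one, the B branch of f is dead and the datatype could
be collapsed to the single constructor A.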

On another note, the zulu4 benchmark now runs on my machine.  Here is the
gc-summary data.

max semispace size(bytes): 183,500,800
GC time(ms): 27,730 (53.3%)
maxPause(ms): 2,600
number of GCs: 45
bytes allocated: 2,179,577,480
bytes copied: 666,763,184
max bytes live: 70,040,432