[MLton-devel] Fwd: Mark Tuttle

Matthew Fluet fluet@cs.cornell.edu
Fri, 2 May 2003 17:18:17 -0400 (EDT)


> It is super important for my stuff that I/O is fast.  Is there any fix to the
> slow I/O in the new implementation?

Time and effort would probably get the new I/O up to snuff.  If you stay entirely
in the imperative I/O (without ever extracting the underlying stream),
then the new IO should be pretty close to the old IO.  Here are some notes
I was working on about the IO (which apply to the benchmarks I sent out
yesterday):

With the exception of wc-inputLine.long.sml, all benchmarks:

1. create a 100000 byte file
   with a newline every 10 chars and #"a" elsewhere
2. for 8000 iterations
   2.1 open file
   2.2 use an input function to read some data
   2.3 if EOF, goto 2.6
   2.4 count newlines in data
   2.5 goto 2.2
   2.6 close file
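
To make the steps above concrete, here is a minimal sketch of one such
benchmark against the Basis Library TextIO structure.  It is not the
actual benchmark source: the file name, the function names, and the
choice of putting the newline at every 10th position are assumptions
made for illustration.

```sml
(* Hypothetical file name; the real benchmarks' name is not given. *)
val file = "wc-input.dat"

(* Step 1: 100000 bytes, a newline as every 10th char, #"a" elsewhere. *)
fun createFile () =
  let
    val out = TextIO.openOut file
    val data = CharVector.tabulate
      (100000, fn i => if i mod 10 = 9 then #"\n" else #"a")
  in
    TextIO.output (out, data);
    TextIO.closeOut out
  end

(* Step 2.4: count the newlines in one chunk of input. *)
fun countNewlines (s : string) : int =
  CharVector.foldl (fn (c, n) => if c = #"\n" then n + 1 else n) 0 s

(* Steps 2.1-2.6 for one iteration, staying in Imperative IO. *)
fun wcOnce () : int =
  let
    val inp = TextIO.openIn file              (* 2.1 *)
    fun loop n =
      let val s = TextIO.input inp            (* 2.2 *)
      in
        if s = "" then n                      (* 2.3: empty string means EOF *)
        else loop (n + countNewlines s)       (* 2.4, 2.5 *)
      end
    val total = loop 0
  in
    TextIO.closeIn inp;                       (* 2.6 *)
    total
  end

(* Step 2: repeat the whole open/read/close cycle. *)
fun run (iters : int) : int =
  let
    fun go (0, last) = last
      | go (k, _) = go (k - 1, wcOnce ())
  in
    go (iters, 0)
  end
```

Calling createFile () once and then run 8000 would correspond to steps
1 and 2.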

wc-*.sml benchmarks have:
   2.1  val inp = openIn f
and 2.2 input functions are from Imperative IO.

wc-*F.sml benchmarks have:
   2.1  val inp = openIn f
        val _ = getInstream inp
and 2.2 input functions are from Imperative IO.

The getInstream inp forces the underlying stream to be realized.  For
Imperative IO implementations that use both Buffer IO and Stream IO,
this has the effect of always using the Stream IO, but the compiler
isn't able to eliminate the datatype that distinguishes between Buffer
IO and Stream IO, so each input function requires an (extraneous) case
dispatch.  This essentially corresponds to the case when one primarily
uses Imperative IO, but occasionally needs Stream IO.  I.e., how much
does a single use of Stream IO cost?
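
A minimal sketch of the wc-*F shape, assuming the input file already
exists (wcForced and countNewlines are illustrative names, not taken
from the benchmark source).  The only difference from the plain
Imperative IO loop is the discarded getInstream call right after
openIn:

```sml
fun countNewlines (s : string) : int =
  CharVector.foldl (fn (c, n) => if c = #"\n" then n + 1 else n) 0 s

fun wcForced (file : string) : int =
  let
    val inp = TextIO.openIn file
    (* Realize the underlying stream once; the result is thrown away. *)
    val _ = TextIO.getInstream inp
    (* All reads still go through the Imperative IO functions. *)
    fun loop n =
      let val s = TextIO.input inp
      in
        if s = "" then n else loop (n + countNewlines s)
      end
    val total = loop 0
  in
    TextIO.closeIn inp;
    total
  end
```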

wc-*S.sml benchmarks have:
   2.1  val inp = getInstream (openIn f)
        open StreamIO
and 2.2 input functions are from Stream IO.

All input will be using the Stream IO, but no (extraneous) case
dispatch is needed at each input.  This essentially corresponds to
pure Stream IO input.  I.e., how much does always using Stream IO cost?
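
A minimal sketch of the wc-*S shape, again assuming the input file
already exists (wcStream and countNewlines are illustrative names).
Stream IO is functional, so each read returns the data together with
the rest of the stream, and the loop threads that stream through:

```sml
structure S = TextIO.StreamIO

fun countNewlines (s : string) : int =
  CharVector.foldl (fn (c, n) => if c = #"\n" then n + 1 else n) 0 s

fun wcStream (file : string) : int =
  let
    (* Extract the functional stream right away; never use TextIO.input. *)
    val strm0 = TextIO.getInstream (TextIO.openIn file)
    fun loop (strm, n) =
      let val (s, strm') = S.input strm
      in
        if s = "" then (S.closeIn strm'; n)   (* empty string means EOF *)
        else loop (strm', n + countNewlines s)
      end
  in
    loop (strm0, 0)
  end
```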

wc-scanStream.sml benchmark uses the Imperative IO scanStream
function, which essentially uses StreamIO.input1 to repeatedly fill
the stream.  Hence, it's almost identical to wc-input1S.sml.
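
For reference, a sketch of how a newline count can be driven through
TextIO.scanStream.  The scanner below (scanNewlines) is my own
illustration and may differ from what the benchmark does; it pulls
characters one at a time through the getc reader that scanStream
supplies, which is why the cost profile resembles repeated input1
calls on the stream.

```sml
(* Consume the whole stream one character at a time and return the
   number of newlines seen, paired with the remaining (empty) stream. *)
fun scanNewlines getc strm =
  let
    fun loop (s, n) =
      case getc s of
        NONE => SOME (n, s)
      | SOME (c, s') => loop (s', if c = #"\n" then n + 1 else n)
  in
    loop (strm, 0)
  end

fun wcScan (file : string) : int =
  let
    val inp = TextIO.openIn file
    val total = Option.getOpt (TextIO.scanStream scanNewlines inp, 0)
  in
    TextIO.closeIn inp;
    total
  end
```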

wc-inputN*.short.sml benchmarks take input in 255 byte chunks.  This
is relatively "short" compared to the input buffer of 4096 bytes.

wc-inputN*.long.sml benchmarks take input in 16000 byte chunks.  This
is relatively "long" compared to the input buffer of 4096 bytes.  (New
IO's Stream IO could be improved by having inputN extend the stream by
the number of bytes needed, instead of by the buffer size.  But New IO's
Stream IO is already quite fast compared to the Old IO on inputN.long.
The current implementation and benchmark essentially simulate the effect
of doing "long" inputN's on an input stream that has already been forced
with buffers of the default size.)
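
A sketch of the inputN loop with the chunk size as a parameter, so that
the "short" and "long" variants differ only in the argument (wcInputN
and countNewlines are illustrative names; the file is assumed to
exist):

```sml
fun countNewlines (s : string) : int =
  CharVector.foldl (fn (c, n) => if c = #"\n" then n + 1 else n) 0 s

fun wcInputN (file : string, chunk : int) : int =
  let
    val inp = TextIO.openIn file
    fun loop n =
      (* inputN returns fewer than chunk chars only when EOF is reached,
         and the empty string once nothing is left. *)
      let val s = TextIO.inputN (inp, chunk)
      in
        if s = "" then n else loop (n + countNewlines s)
      end
    val total = loop 0
  in
    TextIO.closeIn inp;
    total
  end
```

With this shape, wcInputN (f, 255) would play the role of the "short"
case and wcInputN (f, 16000) the "long" one.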

wc-inputRand*.sml benchmarks randomly choose an input function for each
input.  This eliminates the super "fast path" that comes from constantly
looping through exactly one input function.
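
A rough sketch of the random-choice idea.  Everything here is an
assumption for illustration: the real benchmark may pick from a
different set of input functions, and the small linear congruential
generator is only a stand-in for whatever source of randomness it
actually uses.

```sml
fun countNewlines (s : string) : int =
  CharVector.foldl (fn (c, n) => if c = #"\n" then n + 1 else n) 0 s

(* Stand-in pseudo-random source; word arithmetic wraps on overflow. *)
val seed : word ref = ref 0w12345

fun nextChoice () : int =
  ( seed := !seed * 0w1103515245 + 0w12345
  ; Word.toInt (Word.mod (Word.>> (!seed, 0w16), 0w3)) )

fun nonEmpty (s : string) : int option =
  if s = "" then NONE else SOME (countNewlines s)

(* One read with a randomly chosen function.
   NONE means EOF, SOME k means k newlines were seen in this read. *)
fun readSome (inp : TextIO.instream) : int option =
  case nextChoice () of
    0 => (case TextIO.input1 inp of
            NONE => NONE
          | SOME c => SOME (if c = #"\n" then 1 else 0))
  | 1 => nonEmpty (TextIO.input inp)
  | _ => nonEmpty (TextIO.inputN (inp, 255))

fun wcRand (file : string) : int =
  let
    val inp = TextIO.openIn file
    fun loop n =
      case readSome inp of
        NONE => n
      | SOME k => loop (n + k)
    val total = loop 0
  in
    TextIO.closeIn inp;
    total
  end
```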

wc-inputLine*.long.sml benchmarks change steps 1 & 2 to
1. create a 160000 byte file
   with a newline every 16000 chars and #"a" elsewhere
2. for 2000 iterations
Thus, a line is relatively "long" compared to the input buffer of 4096
bytes.
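
A sketch of the long-line variant.  It assumes the string option result
type that inputLine has in the published Basis Library; the file name,
function names, and exact newline positions are likewise assumptions.

```sml
(* Hypothetical file name. *)
val file = "wc-inputLine.dat"

(* Modified step 1: 160000 bytes, a newline as every 16000th char. *)
fun createFile () =
  let
    val out = TextIO.openOut file
    val data = CharVector.tabulate
      (160000, fn i => if i mod 16000 = 15999 then #"\n" else #"a")
  in
    TextIO.output (out, data);
    TextIO.closeOut out
  end

fun countNewlines (s : string) : int =
  CharVector.foldl (fn (c, n) => if c = #"\n" then n + 1 else n) 0 s

(* One iteration: every inputLine call has to span several 4096-byte
   buffers before it finds the end of the line. *)
fun wcLines () : int =
  let
    val inp = TextIO.openIn file
    fun loop n =
      case TextIO.inputLine inp of
        NONE => n
      | SOME line => loop (n + countNewlines line)
    val total = loop 0
  in
    TextIO.closeIn inp;
    total
  end
```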



There is a little extra overhead in order to really support transitioning
between all the I/O layers.  For example, as Henry notes, we maintain the
file input and output position in the PrimitiveIO.{reader,writer}.  There
are some other overheads (e.g., the open/closed status of the file is
maintained both in the PrimitiveIO.{reader,writer} and in the Stream
abstraction).  As I wrote quite some time ago on the Basis library, there
are some ambiguities that make it unclear whether we can drop some of
those redundancies.  Maybe we can pin Reppy down later this month.


In any event, Henry, I'd love it if you pulled down the experimental
release and tried it out on one of your programs where you need fast IO.
While the wc-* benchmarks are good, they are a little artificial, so I'd
be interested in seeing some real-world results.



