[MLton-devel] Re: SML Basis review

Wed, 20 Aug 2003 09:54:00 -0400 (EDT)

> I have a slew of benchmarks that perform a line count on a file using each
> of the ImperativeIO and StreamIO input functions, comparing with and
> without forcing the underlying stream, and using various implementations.
> I could run them again if anyone wants hard numbers.

Out of curiosity about the current state of affairs, I ran these
benchmarks last night.  I've written up some notes on the benchmarks and
hints on MLton's implementation.  There's probably nothing there that
directly impacts the Basis Library, other than a word of caution about
making generalizations about the relative efficiencies of various I/O
functions which may not hold true in all implementations.  As always, your
mileage may vary.  Probably the most important thing to take away from
this is that the MLKit doesn't support StreamIO at all, and neither did
MLton until the 20030312 release (and I only added for completeness; no
user has ever requested the Stream IO functionality).  So, despite the
Basis Library's slant on Stream I/O, it doesn't seem as though the world
at large is taking that view.

Benchmarks:

All benchmarks:

1. create a 100000 byte file with
   with a newline every 10 chars and #"a" elsewhere
2. for 3000 iterations
   2.1 open file
   2.2 use input fnct to read some data
   2.3 if EOF, goto 2.6
   2.4 count newlines in data
   2.5 goto 2.2
   2.6 close file

wc-*.sml benchmarks have:
   2.1  val inp = openIn f
and 2.2 input fncts are from Imperative IO.

This essentially corresponds to only using Imperative IO.  I.e., how
much does always using Imperative IO cost.

wc-*F.sml benchmarks have:
   2.1  val inp = openIn f
        val _ = getInstream inp
and 2.2 input fncts are from Imperative IO.

The getInstream inp forces the underlying stream to be realized.  For
Imperative IO implementations that use both Buffer I and Stream IO,
this has the effect of always using the Stream IO, but the compiler
isn't able to eliminate the datatype that distinguishes between Buffer
I and Stream IO, so each input function requires an (extraneous) case
dispatch.  This essentially corresponds to the case when one primarily
uses Imperative IO, but occasionally needs Stream IO.  I.e., how much
does a single use of Stream IO cost (in impact on of the Imperative IO
operations).

wc-*S.sml benchmarks have:
   2.1  val inp = getInstream (openIn f)
        open StreamIO
and 2.2 input fncts are from Stream IO.

All input will be using the Stream IO, but no (extraneous) case
dispatch is needed at each input.  This essentially corresponds to
pure Stream IO input.  I.e., how much does always using Stream IO cost.

wc-scanStream.sml benchmark uses the Imperative IO scanStream
function, which essentially uses StreamIO.input1 to repeatedly fill
the stream.  Hence, it's almost identical to wc-input1S.sml.

wc-inputN*.short.sml benchmarks take input in 255 byte chunks.  This
is relatively "short" compared to the input buffer of 4096 bytes.

wc-inputN*.long.sml benchmarks take input in 16000 byte chunks.  This
is relatively "long" compared to the input buffer of 4096.

wc-inputRand*.sml benchmarks use MLton.Random.rand to choose a random
input function from among {input, input1, inputLine, inputN}.  This
simulates mixed inputs and prevents the compiler from over-optimizing
a single read loop.

mlton.A -- FastImperativeIO
mlton.B -- NaiveImperativeIO

NaiveImperativeIO
      datatype instream = In of StreamIO.instream ref

 * I.e., all input uses Stream IO

FastImperativeIO
      datatype instream' = Buffer of BufferI.inbuffer
	                 | Stream of StreanIO.instream
      datatype instream = In of instream' ref

 * I.e., use Buffer I unless stream realized

(Latest public releases of all compilers, except MLton which is the
current CVS.  Benchmarks run on a dual 1.1GHz Pentium III, 1GB real
memory, 2GB swap, RedHat 9.0 Linux.)

run time
benchmark        MLtonA MLtonB ML-Kit Moscow-ML Poly/ML SML/NJ
wc-input          24.07  35.20 291.89         *   68.13 159.40
wc-input1         29.38  31.17 399.18         *  205.10 615.57
wc-inputLine     118.93 131.75 915.31         *  249.53 351.97
wc-inputN.short   42.84  73.13 305.99         *   84.43 298.50
wc-inputN.long    31.12  78.50 302.44         *   95.26 322.90
wc-inputAll       80.95  81.09 350.24         *  146.05 171.85
wc-inputRand      47.19  62.13      *         *       *      *
wc-inputF         37.56  42.68      *         *   78.62 166.42
wc-input1F       484.38  38.23      *         *       * 593.75
wc-inputLineF    225.72 166.98      *         *       * 363.18
wc-inputNF.short  68.63  72.45      *         *   92.30 299.18
wc-inputNF.long   78.84  77.42      *         *   94.74 322.17
wc-inputAllF      79.28  81.29      *         *  144.91 171.52
wc-inputRandF     60.88  60.92      *         *       *      *
wc-inputS         40.95  43.12      *         *   80.74 169.41
wc-input1S        34.87  38.59      *         *       * 443.49
wc-inputLineS    197.37 165.89      *         *       * 334.89
wc-inputNS.short  66.96  73.44      *         *  102.70 306.54
wc-inputNS.long   78.34  77.70      *         *  101.44 375.27
wc-inputAllS      84.40  83.57      *         *  155.16 179.98
wc-inputRandS     64.23  66.86      *         *       *      *
wc-scanStream     33.00  37.60 996.88         *       * 451.94

I haven't investigated MoscowML.  There is clearly something wrong
with the benchmark program.  ML-Kit doesn't appear to implement the
Stream IO layer at all.  (This certainly belies the Basis Library's
stance that the only implementation of Imperative IO is using Stream
IO.)  Likewise, I haven't investigated Poly/ML; the last benchmarks
that Stephen ran and posted to the MLton website have Poly/ML numbers
for wc-scanStream.  Obviously, no other compiler supports
wc-inputRand*, since it uses the MLton.Random structure.

In any event, comparing wc-* with wc-*S under MLtonA shows the
sometimes dramatic speedup of staying entirely within Imperative IO.
Comparing wc-* under MLtonA and MLtonB shows that lifting the Stream
IO operations to Imperative IO isn't as efficient as the Buffer I
implementation.

I see that SML/NJ does pay a heavy price for input1 operations.  I
suspect that the effect is magnified under Imperative IO where the
indirection and ref update on ever character adds up.

-Matthew

-------------------------------------------------------
This SF.net email is sponsored by: VM Ware
With VMware you can run multiple operating systems on a single machine.
WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines
at the same time. Free trial click here:http://www.vmware.com/wl/offer/358/0
_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel