[MLton-devel] Re: SML Basis review

Stephen Weeks MLton@mlton.org
Fri, 22 Aug 2003 19:24:58 -0700


> Comparing wc-* under MLtonA and MLtonB shows that lifting the Stream
> IO operations to Imperative IO isn't as efficient as the Buffer I
> implementation.

I don't see this in the numbers.  Here are the two best cases for
Naive.

> mlton.A -- FastImperativeIO
> mlton.B -- NaiveImperativeIO
...
> run time
> benchmark        MLtonA MLtonB
...
> wc-input1F       484.38  38.23
> wc-inputLineF    225.72 166.98

In one case, Naive is more than 10X faster than Fast.  The only
benchmark where Fast beat Naive by more than 2X was

> wc-inputN.long    31.12  78.50

And on what I expect to be a common case, input1, the two are almost a
wash.

> wc-input1         29.38  31.17
> wc-inputF         37.56  42.68
> wc-input1S        34.87  38.59

So, it's not clear to me why we use Fast instead of Naive.
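
For anyone who wants to check the ratios above, here is a quick sketch.
Python is used purely as a calculator; the times are copied from the
quoted table, and the names `times` and `fast_over_naive` are mine, not
anything from the benchmark harness.

```python
# Run times in seconds, copied from the quoted table.
# Each entry is (MLtonA = FastImperativeIO, MLtonB = NaiveImperativeIO).
times = {
    "wc-input1F": (484.38, 38.23),
    "wc-inputLineF": (225.72, 166.98),
    "wc-inputN.long": (31.12, 78.50),
    "wc-input1": (29.38, 31.17),
    "wc-inputF": (37.56, 42.68),
    "wc-input1S": (34.87, 38.59),
}


def fast_over_naive(name):
    """Ratio of Fast time to Naive time; above 1 means Naive is faster."""
    fast, naive = times[name]
    return fast / naive


ratios = {name: fast_over_naive(name) for name in times}
for name, ratio in ratios.items():
    print(f"{name:16s} Fast/Naive = {ratio:.2f}")
```

A ratio above 1 favors Naive and a ratio below 1 favors Fast, so the
three input1-style rows at the bottom land close to 1.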

> In any event, comparing wc-* with wc-*S under MLtonA shows the
> sometimes dramatic speedup of staying entirely within Imperative IO.

It's not as dramatic as I would expect.  Here are the benchmarks
grouped for easier comparison.

> benchmark        MLtonA MLtonB
> wc-input          24.07  35.20
> wc-inputF         37.56  42.68
> wc-inputS         40.95  43.12

> wc-input1         29.38  31.17
> wc-input1F       484.38  38.23
> wc-input1S        34.87  38.59

> wc-inputLine     118.93 131.75
> wc-inputLineF    225.72 166.98
> wc-inputLineS    197.37 165.89

> wc-inputN.short   42.84  73.13
> wc-inputNF.short  68.63  72.45
> wc-inputNS.short  66.96  73.44

> wc-inputAll       80.95  81.09
> wc-inputAllF      79.28  81.29
> wc-inputAllS      84.40  83.57

> wc-inputRand      47.19  62.13
> wc-inputRandF     60.88  60.92
> wc-inputRandS     64.23  66.86

None have more than a 2X slowdown going from wc* to wc*S.  Even
input1, where the effect should be most pronounced, only goes from
29.38 to 34.87 seconds.  Similarly when comparing wc* to wc*F.  The
dispatch doesn't really hurt -- except in input1, where it kills.  I
find the MLtonA number for wc-input1F really suspect.  Why would
adding the dispatch cost so much, and why would going to streams
(wc-input1S) recover almost all that cost?  We've certainly seen weirder
with whole-program optimization, but this one might be nice to figure
out.  It might also give more justification for using Fast instead of
Naive.
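
The slowdown claims can be checked the same way.  This is only a sketch
over the MLtonA column of the grouped table above; the dictionary layout
and the name `slowdown` are my own choices for illustration.

```python
# MLtonA run times in seconds from the grouped table:
# (wc-X, wc-XF, wc-XS) for each benchmark family X.
mlton_a = {
    "input": (24.07, 37.56, 40.95),
    "input1": (29.38, 484.38, 34.87),
    "inputLine": (118.93, 225.72, 197.37),
    "inputN.short": (42.84, 68.63, 66.96),
    "inputAll": (80.95, 79.28, 84.40),
    "inputRand": (47.19, 60.88, 64.23),
}

# Slowdown relative to the plain wc-X run: (F / base, S / base).
slowdown = {
    name: (f / base, s / base) for name, (base, f, s) in mlton_a.items()
}

for name, (f_ratio, s_ratio) in slowdown.items():
    print(f"{name:14s} F: {f_ratio:6.2f}X   S: {s_ratio:5.2f}X")
```

Every S ratio stays under 2X, and so does every F ratio except input1,
which is the outlier in question.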

