[MLton-devel] Re: SML Basis review

Matthew Fluet fluet@cs.cornell.edu
Sat, 23 Aug 2003 13:16:19 -0400 (EDT)


> > Comparing wc-* under MLtonA and MLtonB shows that lifting the Stream
> > IO operations to Imperative IO isn't as efficient as the Buffer I
> > implementation.

By wc-*, I meant the first set of benchmarks (i.e., with neither F nor S
suffix) where only Imperative IO is used.  MLtonA always beats MLtonB in
this case.  That was the source of my claim that lifting the Stream IO
operations to Imperative IO isn't as efficient as the Buffer I
implementation (so long as you stay within Imperative IO).

> So, it's not clear to me why we use Fast instead of Naive.

In Oct 2002, when I first introduced the StreamIO functor, Stephen
claimed:

    I'm not convinced that even with a lot of effort you can get the
    imperative-layered-on-functional approach to be as fast as the current
    imperative approach.

Granted, since then, I have put a lot of effort into improving the
StreamIO layer, mostly trying to bring it up to speed with the old stream
IO layer.  In any event, I'm perfectly happy using Naive instead of Fast.
There are probably still a few improvements that could be made to the
StreamIO layer.

> None have more than a 2X slowdown going from wc* to wc*S.  Even
> input1, where the effect should be most pronounced, only goes from
> 29.38 to 34.87 seconds.  Similarly when comparing wc* to wc*F.  The
> dispatch doesn't really hurt -- except in input1, where it kills.  I
> find the MLtonA number for wc-input1F really suspect.  Why would
> adding the dispatch cost so much and why would going to streams
> wc-input1S recover almost all that cost?

Look at the ssa for wc-input1{,F,S}.  wc-input1F.ssa has the following
datatype:

instreamP_0 = Buffer_0 of ...
	    | Stream_0 of ...

while neither of the other two has such a datatype.  In the wc-input1
case, we never use the Stream_0 constructor.  Likewise, in the wc-input1S
case, we never use the Buffer_0 constructor (or, rather, it is used only
ephemerally between the openFile and the getInstream, so it is optimized
away by known-case).  So, I think we are able to completely eliminate the
ref update in wc-input1, but not in wc-input1F.
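Concretely, the dispatch behind that datatype looks something like the
following (a sketch only; the buffer fields are hypothetical and buffer
refill is elided):

(* A sketch of the Fast representation's dispatch between a mutable buffer
   and a functional stream. *)
datatype instreamP =
   Buffer of {buf: CharArray.array, pos: int ref, lim: int ref}
 | Stream of TextIO.StreamIO.instream

type instream = instreamP ref

fun input1 (r: instream): char option =
   case !r of
      Buffer {buf, pos, lim} =>
         (* fast path: read straight out of the mutable buffer *)
         if !pos < !lim
            then (pos := !pos + 1; SOME (CharArray.sub (buf, !pos - 1)))
         else NONE  (* refill elided in this sketch *)
    | Stream s =>
         (* slow path: pull from the functional stream, update the ref *)
         (case TextIO.StreamIO.input1 s of
             NONE => NONE
           | SOME (c, s') => (r := Stream s'; SOME c))

If only one constructor is ever used, known-case can remove the dispatch and,
with it, the need to keep updating the ref; when both constructors are live,
as in wc-input1F, the dispatch and the update stay.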

> We've certainly seen weirder
> with whole-program optimization, but this one might be nice to figure
> out.  It might also give more justification for using Fast instead of
> Naive.

Well, I wouldn't give too much credit to the wc-*{,S,F} benchmarks.  The
"work loop" is too small and MLton just optimizes the tight loop.  I don't
think they are representative of a "real" I/O intensive application.  Mind
you, I don't know what is, but when I was working on the StreamIO functor I
looked at the profiling, and the entire hot loop ends up in one SSA
function, which is probably unlikely in a real program.
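For reference, the kind of work loop in question is roughly the following (a
sketch of a wc-style benchmark, not the actual benchmark source):

(* Sketch of a wc-style work loop: count newlines one character at a time
   with TextIO.input1. *)
fun countLines (ins: TextIO.instream): int =
   let
      fun loop n =
         case TextIO.input1 ins of
            NONE => n
          | SOME c => loop (if c = #"\n" then n + 1 else n)
   in
      loop 0
   end

The entire benchmark is essentially this one tight loop, so MLton ends up
optimizing it as a whole; a real I/O-intensive application would spread its
hot path over much more code.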
