[MLton] new TextIO

Wed, 14 Jan 2004 19:49:45 -0800

> I tested out a mod on January 5th and it sped things up by 17%.  Are you
> talking about that or the new experimental MLton (2004-01-06) with the single
> test in the common case?  I didn't do that but I will do so now ...
> 
> I just tried it and it seems to have made no difference compared to the
> change I tested on January 5th.

Yeah, I was referring to the checkin I made on Jan 12.  Like I said,
I'm not surprised you didn't see an improvement.

> As a comparison, I just tried the tweaked version and an identical one where
> each call to getchar and putchar called a C function which then did the
> inlined getchar/putchar.  Here are the timings:
> 
>         new MLton       15.385 seconds
>         fast C           9.000 seconds
>         slow C          19.770 seconds
> 
> I claim (weakly) that this argues that it is the function call that has to be
> eliminated from MLton for the common input1/output1 case.

Plausible.  I am unsure why you are seeing less than a factor of 2
here but more than a factor of 3 in your other tests.

> I haven't looked at the new one-test source yet, but that probably
> makes it easier to arrange for the inliner to not inline the
> uncommon case by just putting that code in a function and then
> calling the function many times from other places in an un-taken
> branch of a top-level if.  I looked at doing this before this final
> mod, but the code was a bit too spread out.

Yes, it is easier now.  I just did so using the following function.

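(* Run the thunk f through a recursive helper, which is intended to keep
 * the inliner from inlining f's body at the call site.  Only recur 0 is
 * ever called, so f runs exactly once and the recursive calls are never
 * reached.
 *)
val dontInline: (unit -> 'a) -> 'a =
   fn f =>
   let
      val rec recur: int -> 'a =
         fn i =>
         if i = 0
            then f ()
         else (recur (i - 1)
               ; recur (i - 2))
   in
      recur 0
   end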

I put the dontInline around the else branch in the input1 function in
imperative-io.fun.  In looking at some SSA, it looks like it did the
right thing.  While doing so, I noticed that there was still an extra
bounds check in the hot code -- I've already checked in a fix.
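
For illustration, here is a minimal sketch of what wrapping the else
branch can look like.  This is not the code in imperative-io.fun: the
mkInput1 function, its string-backed buffer, and the refill helper are
all hypothetical stand-ins, assumed only to show a hot path that stays
inline and a cold path routed through dontInline.

```sml
(* dontInline as given earlier in this message. *)
val dontInline: (unit -> 'a) -> 'a =
   fn f =>
   let
      val rec recur: int -> 'a =
         fn i =>
         if i = 0
            then f ()
         else (recur (i - 1)
               ; recur (i - 2))
   in
      recur 0
   end

(* Hypothetical buffered reader over a string, standing in for a stream.
 * Returns an input1-like function: SOME c for the next char, NONE at end.
 *)
fun mkInput1 (data: string, bufSize: int): unit -> char option =
   let
      val buf = CharArray.array (bufSize, #"\000")
      val first = ref 0   (* index of the next unread char in buf *)
      val last = ref 0    (* one past the last valid char in buf *)
      val pos = ref 0     (* number of chars of data consumed so far *)
      (* Cold path: refill the buffer and return its first char. *)
      fun refill (): char option =
         let
            val n = Int.min (bufSize, String.size data - !pos)
         in
            if n = 0
               then NONE
            else
               (CharArray.copyVec {src = String.substring (data, !pos, n),
                                   dst = buf, di = 0}
                ; pos := !pos + n
                ; first := 1
                ; last := n
                ; SOME (CharArray.sub (buf, 0)))
         end
      (* Hot path: a single test in the common case. *)
      fun input1 (): char option =
         let
            val i = !first
         in
            if i < !last
               then (first := i + 1
                     ; SOME (CharArray.sub (buf, i)))
            else dontInline (fn () => refill ())
         end
   in
      input1
   end
```

With a deliberately small buffer, e.g. val next = mkInput1 ("hello", 2),
repeated calls to next () go through the refill path every two
characters and return NONE once the string is exhausted.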

I am curious to see if you can achieve any speedup using the new code
without the bounds check and in addition using dontInline.

One note: you may need to massage input1 further to avoid a closure
allocation for the dontInline stuff.
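
One possible direction, sketched under the assumption that the
allocation in question comes from the thunk handed to dontInline
capturing stream state: take the function and its argument separately,
so the call site passes the state explicitly instead of building a
thunk.  The name dontInline1 is hypothetical, and whether this actually
removes the allocation would need to be checked in the SSA.

```sml
(* Hypothetical variant of dontInline that takes the function and its
 * argument separately, so the caller does not have to wrap the slow
 * path in a (unit -> 'b) thunk.
 *)
val dontInline1: ('a -> 'b) -> 'a -> 'b =
   fn f => fn x =>
   let
      val rec recur: int -> 'b =
         fn i =>
         if i = 0
            then f x
         else (recur (i - 1)
               ; recur (i - 2))
   in
      recur 0
   end

(* Hypothetical slow path with no free variables: everything it needs
 * arrives through its argument.
 *)
fun atEnd (first: int ref, last: int ref): char option =
   (first := 0
    ; last := 0
    ; NONE)

(* Call site: the state is passed as an ordinary argument. *)
val result: char option = dontInline1 atEnd (ref 3, ref 3)
```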