Some results

Stephen Weeks sweeks@intertrust.com
Fri, 21 Jul 2000 16:51:41 -0700 (PDT)


Wow!  Great!  Yippee!  :-)  Maybe a self-compile by mid-August is
possible after all.

Actually, you might look into modifying MLton just enough so that it
doesn't use any Real_ prims.  It might then be possible to try a
self-compile without floating point.

>    The files thread1.sml and thread2.sml both segfault apparently due to
>    some corruption of the gcState.currentThread variable (one of them
>    fails in an assert when the expression s->currentThread->stack->used
>    goes through an invalid memory address, the other one fails when the
>    instruction pointer is set to 0x0 -- probably because some
>    currentThread is pointing to a piece of stack with 0x0 where a return
>    address was expected).  Maybe someone else has an idea why I'm getting
>    these errors.  There are very few thread primitives that get compiled
>    to native assembly and as best I can tell, they are working correctly.

My best guess is that Thread_switchTo is screwed up.  Which MLton
version of machine.h did you base Thread_switchTo on?  If you want to
put a snapshot up, I'd be happy to investigate.
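For concreteness, here's a minimal sketch of the ordering invariant a
Thread_switchTo has to maintain.  The struct and function names below are
hypothetical simplifications, not the real machine.h definitions, but the
failure mode matches the symptoms above: if currentThread is redirected
before the old thread's stack state is saved (or the cached stack top isn't
reloaded from the new thread), s->currentThread->stack->used reads garbage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified runtime structures (the real ones are in
 * machine.h and differ in detail). */
typedef struct Stack {
	size_t used;     /* bytes of stack currently in use */
	size_t reserved; /* bytes allocated for this stack  */
	char *bottom;    /* base of the stack's memory      */
} Stack;

typedef struct Thread {
	Stack *stack;
} Thread;

typedef struct GCState {
	Thread *currentThread;
	char *stackTop;  /* cached top-of-stack of the running thread */
} GCState;

/* Save the old thread's stack state BEFORE redirecting currentThread,
 * and reload the cached stack top from the new thread AFTER. */
void switchTo(GCState *s, Thread *to) {
	Thread *from = s->currentThread;
	/* 1. Flush the cached stack top into the outgoing thread. */
	from->stack->used = (size_t)(s->stackTop - from->stack->bottom);
	assert(from->stack->used <= from->stack->reserved);
	/* 2. Only now redirect currentThread. */
	s->currentThread = to;
	/* 3. Reload the cached stack top from the incoming thread. */
	s->stackTop = to->stack->bottom + to->stack->used;
	assert(to->stack->used <= to->stack->reserved);
}
```

Checking that each native thread prim preserves steps 1-3 in that order
(and that nothing reads currentThread between 1 and 2) would be the first
thing I'd look at.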

> 2. The (sort of) bad news.  Performance isn't really where we would like
>    it to be.  The results of all of the integer benchmarks are included
>    below, but here's a typical entry:
... 
> Let's start with compile time.  (Obviously, this is using an SML/NJ
> version of mlton.)

This will skew the times a lot, since it makes the SML part of the
MLton compile twice as slow, and hence the gcc time less noticeable.

Here's a condensed version of your compile time info.  I just took the
user + sys, and put everything in one chart.

			mlton	x86
		mlton	global	mlton
checksum	   7.3	   7.4	  8.2
count-graphs	  30.5	  31.0	 29.8
fib		   5.3	   5.3	  6.1
knuth-bendix	  42.2	  42.7	 39.1
life		  18.9	  19.3	 19.0
logic		  96.3	 132.5	 90.8
mlyacc		1030.1	1092.2	983.3
mpuz		  12.3	  12.2	 12.8
ratio-regions	  55.3	  56.4	 55.2
tak		   5.7	   5.8	  6.5
wc		  23.5	  24.0	 23.6

> So, you can see that for a native compile, we're really doing three
> invocations of mlton (two of which make calls to gcc), plus two
> invocations of the assembler.  I suspect that adds up.

Yeah.  One thing that would be interesting to see would be the time
for your pass to generate assembly from machineOutput.  That should
give us a good feel for the speedup.  

> Looking at the
> compile time for some of the larger benchmarks, there is some decent
> improvement.  I think we'll see even better performance when we can make
> the assembler call from within mlton. 

Separate assembly will help a lot too.


> Runtime performance isn't really that impressive right now.  However,
> there are a number of different factors at work, and I think I can explain
> some of them.

Here's a condensed version of your run time info.  I just took the
user + sys, and put everything in one chart.

			mlton	x86
		mlton	global	mlton
checksum	 11.7	 31.3	 28.2
count-graphs	 19.1	 43.4	 27.6
fib		 21.9	 32.6	 22.2
knuth-bendix	 37.1	 43.6	 42.7
life		100.3	178.7	138.0
logic		 91.8	107.2	 89.5
mlyacc		 41.6	x	 46.9
mpuz		 76.2	145.1	113.6
ratio-regions	 41.9	110.0	 65.6
tak		 46.5	 65.2	 53.1
wc		 25.1	 46.6	 42.1

I think the times are pretty good for round one.  My impression from
what you wrote is that you feel the lack of liveness information is
killing us.  There are three solutions to this that I see:

(1) Do some local liveness analysis on your IL
(2) Propagate liveness information from the Cps IL down.
(3) Change the register allocator (backend/allocate-registers.fun) to
    enforce some invariants that let you know when you can throw away
    pseudo-regs.

(1) seems silly, since we already computed the information on the Cps
IL, and I don't see how you could compute anything better.

(2) seems feasible to me.  The Cps IL has liveness information at
every label, and so should be able to give you liveness information at 
every block.

(3) Might be ok for a quick hack.
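Whichever option wins, the per-block computation itself is just the
standard backward dataflow fixpoint over use/def sets.  Here's a minimal
sketch in C; the Block representation and the 32-pseudo-reg bitset limit
are assumptions for illustration, not anything in MLton's backend.

```c
/* Per-block liveness as a backward dataflow fixpoint.  One 32-bit word
 * serves as a register set, so at most 32 pseudo-regs in this sketch. */
typedef struct Block {
	unsigned use;  /* regs read before any write in this block */
	unsigned def;  /* regs written in this block               */
	int nsucc;
	int succ[2];   /* indices of successor blocks              */
} Block;

/* Iterate  live_in[b]  = use[b] | (live_out[b] & ~def[b])
 *          live_out[b] = union of live_in over b's successors
 * until nothing changes. */
void liveness(Block *bs, int n, unsigned *live_in, unsigned *live_out) {
	for (int i = 0; i < n; i++)
		live_in[i] = live_out[i] = 0;
	int changed = 1;
	while (changed) {
		changed = 0;
		/* Visiting blocks in reverse order converges faster,
		 * since information flows backward. */
		for (int b = n - 1; b >= 0; b--) {
			unsigned out = 0;
			for (int i = 0; i < bs[b].nsucc; i++)
				out |= live_in[bs[b].succ[i]];
			unsigned in = bs[b].use | (out & ~bs[b].def);
			if (in != live_in[b] || out != live_out[b]) {
				live_in[b] = in;
				live_out[b] = out;
				changed = 1;
			}
		}
	}
}
```

The point for the x86 backend is live_out: once a pseudo-reg is absent
from live_out at the end of a block, its slot can be reused or thrown
away, which is exactly the information options (1) and (2) would supply.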