[MLton-user] Questions on MLton

shivers@cc.gatech.edu shivers@cc.gatech.edu
Sat, 4 Oct 2003 17:17:05 -0400


Stephen-

Thanks for the near-instantaneous reply!

   Also, the write barrier in MLton is very cheap.  First off, we use
   card marking, which is only a few instructions.  Second, because MLton
   has the whole program and knows the types of everything, it doesn't
   have to insert card marks at every ref update, only at pointer
   updates.  That cuts it down on the number of writes.

What I am doing is hacking linked lists destructively -- so type analysis
won't help, right? Doing a SET-CDR! in SML smashes a list into memory, not a
bool or char or something where you could sleaze the write barrier. Alas.

   I'd be interested to find what you learn with your application.

I will report!

   > 3. Is there a way simply to time code? The profiler seems like a very
   >    hairy thing, and I essentially just want to have a little timer
   >    that I can click on & off when I run the function being timed.
   >    Or am I fighting the system, and should just suck it up & figure
   >    out how to extract this functionality from the profiler?

   You can use the Timer structure from the basis library to do timing in
   source code.

   However, I think you can get what you want easily from the profiler.
   If you want the total amount of time spent in a function and all of
   its callees, you do the following three steps.

	   % mlton -profile true -profile-stack true z.sml
	   % z
	   % mlprof -raw true z mlmon.out

   Then, look for the line with the function you want in the output of
   mlprof.

OK, let me unburden myself with the whole plan. I am timing a slew (~ 10-15)
of sorting algorithms. Each algo gets executed on a mess of random lists.
But these lists are produced by about 12 different "list generators."
  - One generator just makes random lists. 
  - One makes lists that tend to have short increasing runs of data.
  - One makes lists that tend to have short increasing & decreasing runs.
  - Others ramp up the length of these runs.
So I end up with a matrix:
  - the rows are the sort algos
  - the columns are the various kinds of lists.

Hence, I split out timing info for each elt of the matrix. Note that it's
interleaved like this: generate a list, then sort it with each algo. Then
make up another list, rinse, repeat.

Now, I see in the doc that one can allocate different profiler structures
and somehow do some sort of thing or another with these things. But it's
a bit confusing. No, actually, it's totally confusing -- I really am
having a lot of trouble figuring out how to work the profiler stuff.

   I would also guess that the profiler would perturb the program less
   than using Timer would.

BTW, one of the functions that typically runs inside the top call to the sort
is list merge... which can push *thousands* of stack frames. Is this going
to make the stack profiler unhappy? (There's a note about increased overhead
in the doc.) I have no interest in tracking time spent inside the merge
call (which is lexically nested inside the sort function). I am *only*
interested in the time spent dynamically inside the initial call to the
top-level sort function -- and the entry points for the sort functions 
are *not* recursive.

Can you advise me further?
    -Olin

P.S. While I'm flaming, it is super-annoying to run carefully hand-tensed code
on an X86 because there are no frigging registers. I need multiple regs
available across call *and* across return *and* during loops to hold values.
(E.g., returning a three-tuple from a local recursion, or a loop with six or
seven iteration vars.) On a Sparc, this is a given. But, on the one hand,
Mlton doesn't produce asm for Sparc, and, on the other, I don't have faith in
SML/NJ's optimisation powers.

Someone really needs to put a stake through the x86's heart.