[MLton-user] Problem with Timer.checkCPUTimer on Sparc
Mon, 10 Nov 2003 14:27:07 -0500
> - Clearly, the recursive impl is the big winner on the SPARC. But it's such a
> huge winner it's suspicious.
Indeed. If your program doesn't inspect the output of merge1, then
MLton's optimizer will be smart enough to avoid allocating the result.
At that point, merge1 becomes a simple tail recursive walk down two
lists until either one is empty. Sadly, MLton's optimizer is not
smart enough to know that lists are of finite length, and so cannot
optimize away that loop :-). In any case, I suspect that this is
what's happening. You can confirm by compiling with -profile alloc and
checking that no allocation is attributed to merge1, or by looking at
the .ssa if you really want.
Argh. OK, I'll try to fake out the optimiser and see if that restores
sanity to that measurement.
> - First, the measurements in the sexp in the left columns are
> #usr(CheckCPUTimer tm) (usr time -- gc & non-gc)
> #sys(CheckCPUTimer tm) (sys time -- gc & non-gc)
> CheckGCTime tm (gc time -- usr & sys)
> in usec. The *meaning* of these things is, as indicated to the side,
> your older semantics (as opposed to your brand-new fixup).
To make sure we're on the same page, here is my understanding of how
the timings in 20030716-2 work. Let's allocate every unit time spent
in the program to one of the four bins.
          gc   non-gc
user       A      B
sys        C      D
Then, the Timer structure gives you
#usr(CheckCPUTimer tm) = A + B + C
#sys(CheckCPUTimer tm) = C + D
CheckGCTime tm = A + C
Ohhhhh. We're *not* on the same page. I assumed no double-entry bookkeeping,
i.e., when you said C appeared in the #usr time, I assumed that meant it
*wasn't* in the #sys time.
So, your non-gc = (A + B + C) + (C + D) - (A + C) = B + C + D.
I.e., it includes gc system time.
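To make the bookkeeping concrete, here is a small sketch of the four-bin accounting with made-up illustrative numbers (the bin values are assumptions, not real measurements):

```python
# Hypothetical per-bin times in usec, matching the table above:
# gc/non-gc crossed with user/sys.
A = 300_000    # gc time charged to user
B = 1_200_000  # non-gc time charged to user
C = 50_000     # gc time charged to sys
D = 100_000    # non-gc time charged to sys

# What Timer reports under the 20030716-2 semantics described above:
usr = A + B + C  # #usr(checkCPUTimer tm)
sys = C + D      # #sys(checkCPUTimer tm)
gc  = A + C      # checkGCTime tm

# Subtracting gc from usr + sys removes A and one copy of C, but C was
# counted twice (once in usr, once in sys), so one copy survives:
non_gc = usr + sys - gc
assert non_gc == B + C + D  # gc system time (C) is still included
print(non_gc)
```

The point of the assertion is exactly the surprise above: the derived "non-gc" figure still contains the gc's system time C.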
OK. I'll redo my measurements. But, here's a question: why would the
gc rack up *any* system time *at all*? It never does a system call, does it?
> - It's kind of suspicious that the usec-precise times are always in
> 10msec units. 1/100th of a second is pretty coarse!
Timer is implemented using getrusage(), which appears to return values
with this granularity, at least on my Linux and Sparc machines. I'd
be interested to hear from any experts out there who can tell me how
to get more precision out of getrusage or some other function.
But, I always like to run benchmarks for at least 5-10 seconds anyways,
to try to avoid other noise and make this issue moot.
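For anyone who wants to poke at the granularity themselves, here is a quick probe using Python's binding to getrusage (Unix-only; how much sub-10ms precision the reported times actually carry depends on the kernel's accounting tick):

```python
import resource

# getrusage returns cumulative user/sys CPU time for this process.
# ru_utime and ru_stime are floats in seconds.
before = resource.getrusage(resource.RUSAGE_SELF)
print(f"user: {before.ru_utime:.6f}s  sys: {before.ru_stime:.6f}s")

# Burn a little CPU, then sample again to see the reported step size.
total = sum(i * i for i in range(500_000))
after = resource.getrusage(resource.RUSAGE_SELF)
print(f"user: {after.ru_utime:.6f}s  "
      f"(delta {after.ru_utime - before.ru_utime:.6f}s)")
```

If the deltas only ever move in 10ms steps, that is the clock-tick granularity showing through, which matches what Timer reports.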
OK, so I just heard you say that this is The Way Things Are, so I won't treat
this as weird, just sleazy. It's surprising that the OS can't do better.
I'll report on new timings in a bit.