Time bug is kernel bug, not MLton

Stephen Weeks MLton@sourcelight.com
Mon, 16 Jul 2001 10:11:06 -0700


> Yay:  the Time bug is definitely a kernel bug, not a MLton bug.  The hint
> was the mail from Matthew on the .99.  I have C code that will cause the bug
> to show up.  It only seems to happen on an SMP machine, but if I run 3 copies
> of the program, one usually gets the axe within a few minutes.  The bug seems
> to be a race between storing the seconds part and the microseconds part in
> the struct rusage.  It reads the counter kept in the process struct (which
> keeps times in `ticks' (1/100 of a second)), extracts the seconds part and
> stores that in the tv_sec field, then re-reads the counter, extracts the
> microseconds part and stores that in the tv_usec field.  The result is that
> if the other CPU increments the counter between the two reads, and that
> increment causes it to wrap, then you get the old seconds and the new
> microseconds, with the latter being 0.

BTW, why can't they get it right in the kernel by reading the counter once
instead of twice?  That seems like a trivial fix.
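
To make the race and the one-read fix concrete, here is a small
user-space sketch in C.  It is not the kernel's code: the names
(sim_timeval, tv_two_reads, tv_one_read, HZ) are made up for
illustration, and the "other CPU" is simulated by a flag that bumps
the counter between the two reads, so the result is deterministic.

```c
/* Hypothetical simulation of the race described above; not kernel code. */

#define HZ 100                        /* ticks per second (1/100 s per tick) */
#define USEC_PER_TICK (1000000 / HZ)  /* microseconds in one tick */

/* Stand-in for the tv_sec/tv_usec pair stored in struct rusage. */
struct sim_timeval {
    long tv_sec;
    long tv_usec;
};

/*
 * Buggy conversion: reads the tick counter twice.  If `bump` is
 * nonzero, the counter is incremented between the two reads, which
 * stands in for the other CPU updating it concurrently.
 */
struct sim_timeval tv_two_reads(volatile unsigned long *counter, int bump)
{
    struct sim_timeval t;

    t.tv_sec = (long)(*counter / HZ);                     /* first read */
    if (bump)
        (*counter)++;                                     /* simulated other CPU */
    t.tv_usec = (long)((*counter % HZ) * USEC_PER_TICK);  /* second read */
    return t;
}

/*
 * One-read conversion: takes a single snapshot of the counter and
 * derives both fields from it, so the fields always agree even if the
 * counter changes afterwards.
 */
struct sim_timeval tv_one_read(volatile unsigned long *counter, int bump)
{
    unsigned long snap = *counter;                        /* the only read */
    struct sim_timeval t;

    if (bump)
        (*counter)++;                                     /* simulated other CPU */
    t.tv_sec = (long)(snap / HZ);
    t.tv_usec = (long)((snap % HZ) * USEC_PER_TICK);
    return t;
}
```

Starting from a counter of 199 ticks (1.99 s) with one simulated
increment, tv_two_reads returns 1 s and 0 us, so the reported time
jumps back to 1.00, while tv_one_read returns 1 s and 990000 us.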