[MLton] C codegen and world* regressions

Matthew Fluet fluet@cs.cornell.edu
Fri, 16 Jun 2006 23:23:42 -0400 (EDT)


> In any case, I'm getting errors on all of the world* regressions:
>
> unhandled exception: SysErr: Success [<UNKNOWN>]
> unhandled exception: Fail: child failed
> Nonzero exit status.
>
> I'm guessing this is due to the reorganization of world save/restore in r4642 
> and r4648; it'll cause problems with bootstraps on non-x86 systems.

The problem was introduced at r4642:

   The second change is how the SML code saves the world:
   Now the filename is passed to C, and a failure does not abort the program.
   Instead, we check return codes and propagate the error code back to an SML
   exception, raised with the correct error status.

   To effect this change, I had to change the save primitive's prototype.
   Also, I had to output nicer C code in the codegen to keep error status.

The problem is due to changing the primitive from
   val save = _prim "World_save": C.Fd.t -> unit
to
   val save = _prim "World_save": NullString8_t -> bool C_Error.t

What happens is that the late stages of the compiler turn this primitive 
into a C call.  When the program is saving a world, the ML code calls 
GC_saveWorld, and, in the absence of errors, control returns to the ML 
code _with a true return value_.  When the program loads the saved world, 
the runtime initialization prepares the ML heap and then control returns 
to the ML code _without a return value_.  Luck of the draw, it appears 
that the main function in x86-main.h happens to leave a non-zero value in 
%eax.  In the C-codegen, however, we trampoline into the ML code at a 
point where it expects the C variable CReturnW32 to have been assigned the 
return value of GC_saveWorld; this time, we're reading a 0 value from the 
uninitialized CReturnW32, so the ML code thinks there was an error saving 
the world.
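
To make the two control paths concrete, here is a minimal C sketch of the
situation described above.  It is not the generated code: CReturnW32 is
modeled as an ordinary global, and fake_saveWorld, continuation_after_save,
run_save_path and run_restore_path are hypothetical names invented for the
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the C codegen's return-value slot. */
uint32_t CReturnW32;

/* Hypothetical stand-in for GC_saveWorld; pretend the save succeeded. */
uint32_t fake_saveWorld(void) { return 1; }

/* Models the ML code that runs right after the World_save C call.
 * It assumes the slot holds the result of the call. */
bool continuation_after_save(void) { return CReturnW32 != 0; }

/* Path 1: the program saves a world.  The slot is assigned before the
 * continuation runs, so the continuation sees "true". */
bool run_save_path(void) {
    CReturnW32 = fake_saveWorld();
    return continuation_after_save();
}

/* Path 2: the program starts from a saved world.  Runtime initialization
 * trampolines straight into the same continuation, and nothing has
 * assigned the slot.  The explicit reset models the 0 that a fresh
 * process happens to see. */
bool run_restore_path(void) {
    CReturnW32 = 0;
    return continuation_after_save();
}
```

Under these assumptions run_save_path() yields true and run_restore_path()
yields false, which the ML side would read as a failed save.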

See the comment in prim-mlton.sml about Thread_copyCurrent for a similar 
situation.

I think we should go back to the -> unit behavior.  That leaves two 
choices:
  1) abort the program if saving fails; this isn't quite the old behavior,
     because the old version opened and closed the file in ML, so access
     errors were reported as SysErr exceptions, whereas now they would
     abort the program.
  2) return the status of the save out-of-band; this is essentially the
     solution used by the thread primitives; we could add a
       val getSaveStatus =
          _import "GC_saveWorldStatus" : GCState.t -> bool C_Errno.t
     and check the status after the save.  I think this is the way to go.
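
A rough C sketch of option 2, under stated assumptions: the struct and the
function names (fake_gc_state, fake_save_world, fake_get_save_status,
save_then_continue, restore_then_continue) are hypothetical and only
illustrate the shape of the idea.  The save itself returns nothing, and the
status lives in the GC state, which runtime initialization can set before
jumping back into the ML code.

```c
#include <stdbool.h>

/* Hypothetical slice of the GC state that records the last save's outcome. */
struct fake_gc_state {
    bool save_status;
};

/* The save has a "-> unit" shape: the outcome is stored, not returned.
 * io_ok stands in for whether writing the world file succeeded. */
void fake_save_world(struct fake_gc_state *s, bool io_ok) {
    s->save_status = io_ok;
}

/* Out-of-band query, called by the ML side after the save point. */
bool fake_get_save_status(const struct fake_gc_state *s) {
    return s->save_status;
}

/* Models the ML code after the save point: it no longer depends on a
 * return value, so it behaves the same on both paths. */
bool continuation_after_save(const struct fake_gc_state *s) {
    return fake_get_save_status(s);
}

/* Saving process: perform the save, then continue. */
bool save_then_continue(struct fake_gc_state *s, bool io_ok) {
    fake_save_world(s, io_ok);
    return continuation_after_save(s);
}

/* Restored process: initialization records success explicitly before
 * control re-enters the ML code. */
bool restore_then_continue(struct fake_gc_state *s) {
    s->save_status = true;
    return continuation_after_save(s);
}
```

The point of the design is that the continuation reads a value that both
paths set explicitly, so nothing depends on what a register or a C local
happens to contain.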