[MLton] Code being dropped that shouldn't

Matthew Fluet fluet at tti-c.org
Thu Oct 9 20:00:13 PDT 2008


On Wed, 8 Oct 2008, Wesley W. Terpstra wrote:
> I've been trying to track down the last bug in the library regression
> test, but I think now it's not a bug in the test at all. Try an i386
> compiler (windows and linux both show the same behaviour). Run:
> ./library-test -debug true -keep g -codegen c -profile time
...
> I know there's some sort of basis-specific code
> elimination. Is it possible that librarySuffix is getting hit by this?

No.

Turns out that it was a (long-standing) bug with 'MLton_callFromC' in 
c-main.h.  In the library test, there is a point when the call-stack for 
libm5 looks like:
   m5_open()
   libm5.sml (* top-level SML-code, via m5_open trampoline *)
   libm5confirmC()
   libm5smlFn{Private,Public}() (* _export-ed from libm5.sml *)
   MLton_callFromC()
   libm5.sml (* libm5smlFn{Private,Public}, via MLton_callFromC trampoline *)

The _export-ed 'libm5smlFn{Private,Public}' functions terminate the 
'MLton_callFromC' trampoline by setting the global 'returnToC' to 'TRUE'. 
Control then returns to the top-level SML-code from libm5.sml, which is 
executing under the 'm5_open' trampoline.  However, 'returnToC' is still 
set to 'TRUE', so that trampoline terminates at the first inter-chunk 
transfer, without completing the top-level SML-code from libm5.sml.  In 
particular, it doesn't execute as far as 'suffixArchiveOrLibrary' and 
doesn't suspend the ML stack at the point just before the 'clean atExit'. 
Rather, it suspends the ML stack much earlier in the execution.  When the 
top-level C-code (as executed by check.sml) later calls 'm5_close', that 
starts a new trampoline (setting 'returnToC' to 'FALSE') and resumes the 
ML code somewhere shortly after the call to 'libm5confirmC()'.  From there 
it executes up to the first 'Primitive.MLton.Thread.returnToC' in 
'suffixArchiveOrLibrary' and returns to 'm5_close'.  That explains why the 
'clean atExit' was never executed.

The reason that this erroneous behavior doesn't occur in all of the 
C-codegen library tests is that it depends upon how the program is 
chunkify-ed.  Chunkification is simply the coarse grouping of RSSA IL 
functions into larger 'chunks', to create larger/fewer .c files.  However, 
it is a size-based grouping, so small changes to the RSSA IL (such as the 
extra code inserted for time profiling) can change the grouping and place 
an inter-chunk transfer at a point where it manifests the bug.  One can 
make the bug manifest in the other C-codegen tests by compiling with 
'-chunkify func', which uses the finest chunking (increasing the number of 
inter-chunk transfers).

The fix is trivial: set 'returnToC' back to 'FALSE' after completing a 
'MLton_callFromC' trampoline:

Modified: mlton/trunk/include/c-main.h
===================================================================
--- mlton/trunk/include/c-main.h        2008-10-09 21:35:20 UTC (rev 6918)
+++ mlton/trunk/include/c-main.h        2008-10-10 02:22:13 UTC (rev 6919)
@@ -39,6 +39,7 @@
          do {                                                            \
                  cont=(*(struct cont(*)(void))cont.nextChunk)();         \
          } while (not returnToC);                                        \
+        returnToC = FALSE;                                              \
          s->atomicState += 1;                                            \
          GC_switchToThread (s, GC_getSavedThread (s), 0);                \
          s->atomicState -= 1;                                            \


