[MLton] Crashes with 64-bit native code generator on Windows

Mon Nov 30 09:14:33 PST 2009

On Mon, Nov 30, 2009 at 11:00 AM, David Hansel
<hansel at reactive-systems.com> wrote:

> 1) In my previous post I included a disassembly of the location where the
> crash happens.  With some creative grep-ing I was able to find the location
> of that code within the assembly code that MLton produces for our program:
>
> I was able to reproduce this with several examples for which the crash
> occurs (all of which unfortunately include a large part of our code so
> I cannot make them available here).  The crash always occurs in the
> "*applyFFTempFun" call and always because applyFFTempFun is NULL.
>

I agree that that seems to pinpoint the source of the crash.

> 2) As I mentioned before, if I compile the program from the SML
> code and just insert a "print" statement in function "get" within
> MLton/lib/mlton/sml/mlnlffi-lib/memory/linkage-libdl.sml, the crash
> also does not occur.  Interestingly, the MLton-produced assembly code
> for that version (only change is the "print" statement) does not contain
> ANY calls to "applyFFTempFun".
>

More evidence.  In this case, though, I suspect that there is still an
indirect call in the assembly code.  It simply doesn't go through the
temporary variable; the function pointer gets allocated to a register and
stays there.

> 3) Looking at the MLton source code (amd64-generate-transfers.fun), I can
> see that calls to "applyFFTempFun" seem to be inserted for "Indirect" FFI
> calls.  I do not know enough about the code generator or the FFI interface
> to make much sense out of this.
> However, I can see that the MLton-produced code with the crash only
> contains a call to "applyFFTempFun" (which I assume is created in line 1566
> of file amd64-generate-transfers.fun) but never any code that would set the
> value of "applyFFTempFun" (which I assume should be created in line 1183 of
> file amd64-generate-transfers.fun).
>
> Given these observations,  does anyone have any suggestions about MLton
> debugging options or other ways to shed more light on what might be going
> wrong here?
>

Sounds like a bug in the amd64 codegen simplifier and/or register
allocator.  It seems that somewhere along the line, the definition of the
applyFFTempFun variable is being dropped, but the use in the indirect call
is being retained.  When the register allocator comes along and doesn't
find the def point of applyFFTempFun locally, it has to fetch the value from
the (uninitialized) variable.
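
One rough way to check for that pattern in the kept assembly is to look
for indirect calls through applyFFTempFun that have no matching store.
The following is only a minimal Python sketch.  It assumes AT&T-syntax
output in which the store shows up as a mov with applyFFTempFun as the
destination (last) operand; the helper names classify and use_without_def
are hypothetical and are not part of MLton.

```python
import re

SYMBOL = "applyFFTempFun"  # the temporary named in this thread


def classify(asm_lines, symbol=SYMBOL):
    """Return (calls, stores) as lists of 1-based line numbers.

    calls:  indirect calls whose operand mentions `symbol`
    stores: mov instructions whose destination operand is `symbol`
    Assumes AT&T syntax (destination operand comes last).
    """
    sym = re.escape(symbol)
    call_re = re.compile(r"^\s*call[a-z]*\s+\*.*\b" + sym + r"\b")
    store_re = re.compile(r"^\s*mov[a-z]*\s+.*,\s*\(?" + sym + r"\b")
    calls, stores = [], []
    for n, line in enumerate(asm_lines, 1):
        if call_re.search(line):
            calls.append(n)
        elif store_re.search(line):
            stores.append(n)
    return calls, stores


def use_without_def(asm_lines, symbol=SYMBOL):
    """True when `symbol` is called through but never written to."""
    calls, stores = classify(asm_lines, symbol)
    return bool(calls) and not stores
```

Passing the lines of a kept assembly file to use_without_def would flag
the situation described above.  This is a whole-file check, so it is
coarser than a real per-block def/use analysis and can miss a def that
exists elsewhere but does not reach the call.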

Could you compile with "-native-commented 3 -native-split 0 -keep g" and
post the basic block that has the call through applyFFTempFun?  It will be
pretty noisy, but should shed some light on what the native codegen is doing
(wrong).