[MLton] Memory problems with latest CVS?

Matthew Fluet fluet@cs.cornell.edu
Thu, 9 Sep 2004 14:51:45 -0400 (EDT)


> Given how sensitive this bug is, I'm not sure it's worth spending more
> effort trying to recreate your environment on my machine.  Perhaps we
> could proceed with debugging on your machine.  In mail last week, you
> mentioned that you could reliably get the segfault during a gc, while
> in the mail from last night, the segfault was after a gc.  It might be
> easier if you can get back to the situation where the segfault is
> during the GC, since we may be able to more easily understand the code

I may have been mistaken about the segfault happening during the GC.  I
think I assumed it was part of the GC because there was no MLton output
between the finished GC and the segfault.  I'll see what I can turn up.

> In any case, have you tried bootstrapping with -debug true, so the
> second round of compilation (the one that is failing) is using a
> runtime with asserts on?

Here's what I did:  bootstrapped with mlton-20040819, but rather than
  mlton @MLton ram-slop 0.7 gc-summary  -- \
        -default-ann 'sequenceUnit true' \
        -target self -verbose 2 -output mlton-compile \
        mlton-stubs.cm
I did
  mlton @MLton ram-slop 0.7 gc-summary  -- \
        -default-ann 'sequenceUnit true' \
        -target self -verbose 2 -output mlton-compile \
        -keep machine -stop g \
        mlton-stubs.cm
so that I could experiment with the resulting code.

So, next I did:
  mlton @MLton ram-slop 0.7 gc-summary  -- \
        -default-ann 'sequenceUnit true' \
        -target self -verbose 2 -output mlton-compile \
        mlton-stubs.*.c mlton-stubs.*.S
and verified that the resulting compiler would segfault in the manner I
described last night.

Then, I did:
  mlton @MLton ram-slop 0.7 gc-summary  -- \
        -default-ann 'sequenceUnit true' \
        -target self -verbose 2 -output mlton-compile \
        -debug true \
        mlton-stubs.*.c mlton-stubs.*.S
which links with the debug runtime.  I verified that the resulting
compiler still segfaults in the manner I described last night, and I get
no assertion failures from the runtime system.

Finally, I started the segfaulting command, attached gdb to the
running process, and let it go.  As I said before, sometimes instead of a
segfault, the process eventually just "hangs", which is what happened in
this case:

[fluet@tiger lib 27]% ps -u fluet | grep mlton
11175 pts/1    00:00:10 mlton-compile
[fluet@tiger lib 28]% gdb mlton-compile 11175
GNU gdb Red Hat Linux (6.1post-1.20040607.17rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /amd/panda/a/fluet/mlton/mlton.cvs.HEAD.TEMP/build/lib/mlton-compile, process 11175
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x088d870f in foreachPointerInObject (s=0x88d8c72, p=0x8a93360 "\214\037",
    skipWeaks=-1531246708, f=0) at gc.c:1010
1010    gc.c: No such file or directory.
        in gc.c
(gdb) cont
Continuing.


Program received signal SIGINT, Interrupt.
0xb7565579 in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
(gdb) bt
#0  0xb7565579 in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
#1  0xb74f100c in _L_mutex_lock_2660 () from /lib/tls/libc.so.6
#2  0x088e427b in __gmp_default_reallocate ()
#3  0xb74f0303 in realloc_hook_ini () from /lib/tls/libc.so.6
#4  0xb74ed06c in realloc () from /lib/tls/libc.so.6
#5  0x088e427b in __gmp_default_reallocate ()
#6  0x088dc407 in __gmpz_realloc ()
#7  0x088dc248 in __gmpz_mul ()
#8  0x088c9821 in binary (lhs=0x2d5ba35c "\t    ",
    rhs=0x14 <Address 0x14 out of bounds>, bytes=235703092,
    binop=0x88cd03c <Posix_IO_write+28>) at basis/IntInf.c:193
#9  0xbfffc410 in ?? ()
#10 0xbfffc430 in ?? ()
#11 0xbfffc420 in ?? ()
#12 0x2d5ba368 in ?? ()
#13 0x3b9aca00 in ?? ()
#14 0x00000011 in ?? ()
#15 0x139fc2f3 in ?? ()
#16 0x00000014 in ?? ()
#17 0xe1bb8f0b in ?? ()
#18 0x00000000 in ?? ()
#19 0x2d5ba3d4 in ?? ()
#20 0x08a93360 in globalWord16 ()
#21 0x00000002 in ?? ()
#22 0x00000001 in ?? ()
#23 0xbfffc400 in ?? ()
#24 0x00000014 in ?? ()
#25 0x00000002 in ?? ()
#26 0x00000001 in ?? ()
#27 0xbfffc408 in ?? ()
#28 0x2d5ba35c in ?? ()
#29 0x00000014 in ?? ()
#30 0x0e0c8b34 in ?? ()
#31 0x088cd03c in Posix_IO_write (fd=16,
    b=0x88dc030
"U\211WVS\203<\213E\f\213}\020\213p\004\213_\004\2111\205\211U\017\210\003\002",
i=-1375503360, s=153313034) at Posix/IO/write.c:6
#32 0x088c98d9 in IntInf_mul (
    lhs=0x88dc030
"U\211WVS\203<\213E\f\213}\020\213p\004\213_\004\2111\205\211U\017\210\003\002",
rhs=0xae037c00 "", bytes=153313034)
    at basis/IntInf.c:215
#33 0x273f85e7 in ?? ()
#34 0x77359401 in ?? ()
#35 0x00000010 in ?? ()
#36 0x088dc030 in __gmpz_ior ()


I note a couple of items:
1) We seem to hang in the gmp code,
   where it's trying to do some reallocation.
2) The last call MLton was responsible for was the call to IntInf_mul in
   runtime/basis/IntInf.c, which gcc tail-call eliminated into the call
   to binary in the same file.  You'll note that the rhs (0x14) doesn't
   seem to be a valid pointer.
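
To make the second item concrete, here is a minimal sketch of the kind of sanity check one could apply to the lhs/rhs words seen in the backtrace.  It rests on assumptions that this thread does not confirm: that a small IntInf is a word with its low bit set, that anything else is a 4-byte-aligned pointer into the ML heap, and that the heap bounds are known.  The function names and the heap range are hypothetical and are not taken from the MLton runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: 32-bit words, as in the i386 backtrace above. */
#define WORD_ALIGN 4u

/* ASSUMPTION (not confirmed in this thread): a small IntInf is stored
   directly in the word, tagged by setting the low bit to 1. */
static bool intinf_word_is_small(uintptr_t w)
{
    return (w & 1u) != 0;
}

/* Decode a small word by dropping the tag bit.  Subtracting the tag
   first makes the division exact for negative values too. */
static intptr_t intinf_small_value(uintptr_t w)
{
    assert(intinf_word_is_small(w));
    return ((intptr_t)w - 1) / 2;
}

/* Hypothetical plausibility check: a word is acceptable if it is a
   tagged small value, or an aligned pointer inside the half-open heap
   range [heap_start, heap_end) supplied by the caller. */
static bool intinf_word_is_plausible(uintptr_t w,
                                     uintptr_t heap_start,
                                     uintptr_t heap_end)
{
    if (intinf_word_is_small(w))
        return true;
    return (w % WORD_ALIGN == 0) && w >= heap_start && w < heap_end;
}
```

Under these assumptions, rhs=0x14 has a clear low bit, so it would be treated as a pointer, yet it lies far below any reasonable heap address; lhs=0x2d5ba35c would pass for a heap range that covers it.  Since the frames above binary are clearly garbled, the argument values gdb prints may themselves be unreliable, so this is a hint rather than proof.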