[MLton] Crash during cheney-copy on Windows

Nicolas Bertolotti Nicolas.Bertolotti at mathworks.fr
Tue Sep 1 10:17:00 PDT 2009


I am seeing a sporadic signal 11 (segmentation fault) on Linux, which could be caused by the same bug.

After enabling assertions, I was able to identify a sporadic assertion failure in updateCrossMap():
void updateCrossMap (GC_state s) {
  GC_cardMapIndex cardIndex;
  pointer cardStart, cardEnd;
…
  cardEnd = cardStart + CARD_SIZE;
loopObjects:
  assert (objectStart < oldGenEnd);     /* <= this assertion may sporadically fail */
  assert ((objectStart == s->heap.start or cardStart < objectStart)
          and objectStart <= cardEnd);
…
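
For reference, the two assertions at loopObjects can be restated as a single predicate over the current object and card, which makes it easy to exercise the boundary cases outside the runtime. This is a minimal sketch: the function name objectStartIsValid and the pointer typedef are hypothetical stand-ins, not part of the MLton runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's byte-pointer type. */
typedef uint8_t *pointer;

/* Mirrors the two assertions at loopObjects in updateCrossMap():
   the current object must lie inside the old generation, and it must
   start strictly after cardStart (unless it is the very first object
   in the heap) and no later than cardEnd. */
static bool objectStartIsValid (pointer objectStart, pointer oldGenEnd,
                                pointer heapStart,
                                pointer cardStart, pointer cardEnd) {
  return objectStart < oldGenEnd
         && (objectStart == heapStart || cardStart < objectStart)
         && objectStart <= cardEnd;
}
```

An objectStart at or beyond oldGenEnd makes the predicate false regardless of the card bounds, which corresponds to the first assertion, the one that fails here.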

The assertion fails during a minor Cheney copy that occurs after two calls to GC_pack(), in a situation where the heap size did not change during the second GC_pack().

I finally found that SVN revision r6776 introduced a change motivated by the fact that the cross map must be cleared after every major GC. However, the code shows that the cross map is only cleared when the ‘mayResize’ flag is set (and this flag is, in fact, not set by GC_pack()):
void majorGC (GC_state s, size_t bytesRequested, bool mayResize) {
…
  if (mayResize) {
    resizeHeap (s, s->lastMajorStatistics.bytesLive + bytesRequested);
    setCardMapAndCrossMap (s);
  }
…
}
Since r6776 also removed some calls to clearCrossMap() that were previously performed systematically at the end of a major Cheney copy or a major mark-compact, it seems to me that setCardMapAndCrossMap(s) should actually always be called (or perhaps adding an else { clearCrossMap(s); } branch would be enough).

I moved the call to setCardMapAndCrossMap(s) after the if, and it seems to solve the issue (although, since the failure was sporadic, I am not entirely sure).
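
To make the control-flow difference concrete, here is a toy model of the two versions of majorGC(). It is a sketch under a deliberately crude assumption: the whole cross map is reduced to a single "stale" flag that a major GC sets and that setCardMapAndCrossMap() clears. ToyState and all toy* functions are hypothetical and do not reproduce MLton's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: ToyState and the toy* functions are hypothetical.
   crossMapStale records whether the cross map still describes the old
   generation as it was before the last major collection. */
typedef struct {
  bool crossMapStale;
  int resizeCount;
} ToyState;

static void toyResizeHeap (ToyState *s) { s->resizeCount++; }

/* Stands in for setCardMapAndCrossMap(): rebuilds the maps from scratch. */
static void toySetCardMapAndCrossMap (ToyState *s) { s->crossMapStale = false; }

/* Original control flow: the maps are only reset when mayResize is set. */
static void toyMajorGCOriginal (ToyState *s, bool mayResize) {
  s->crossMapStale = true;   /* the major GC rewrites the old generation */
  if (mayResize) {
    toyResizeHeap (s);
    toySetCardMapAndCrossMap (s);
  }
}

/* Proposed control flow: the reset is hoisted out of the if. */
static void toyMajorGCPatched (ToyState *s, bool mayResize) {
  s->crossMapStale = true;
  if (mayResize)
    toyResizeHeap (s);
  toySetCardMapAndCrossMap (s);
}

/* The invariant r6776 relies on: no stale cross map survives a major GC. */
static void checkCrossMapFresh (const ToyState *s) {
  assert (! s->crossMapStale);
}
```

With mayResize == false (the GC_pack() case), toyMajorGCOriginal() leaves the flag set, while toyMajorGCPatched() always resets it. In this model the else { clearCrossMap(s); } alternative would behave the same, since only the cross map is represented.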

What do you think?

Nicolas

> -----Original Message-----
> From: mlton-bounces at mlton.org [mailto:mlton-bounces at mlton.org] On Behalf Of Nicolas Bertolotti
> Sent: Thursday, February 12, 2009 7:56 PM
> To: Daniel Spoonhower
> Cc: mlton at mlton.org
> Subject: RE: [MLton] Crash during cheney-copy on Windows
> 
> > It's not clear to me exactly what debugging you are able to enable and
> > still observe the problem.  I believe the most important check would
> > be "invariantForGC" which is run at the beginning and end of each
> > collection.  Are you able to run this function and still observe the
> > problem?  (It is in the debug version of the runtime and is also
> > enabled by -DASSERT=1.)
> 
> The assertions are enabled so invariantForGC() is called and does not
> reveal anything.
> 
> >
> > If you are suspicious of old data in the heap, you could try
> > explicitly clearing the new heap at the beginning of a Cheney copy
> > (i.e. in majorCheneyCopyGC).
> 
> I examined the array allocation routine, and it does properly reset the
> contents of all the cells.
> 
> It is not a boundary case either (such as a '<' instead of a '<='
> somewhere), as the crash appears to occur around cell 15000 of a
> 32768-cell array.
> 
> Still investigating ...
> 
> >
> >
> > --djs
> >
> > Nicolas Bertolotti wrote:
> > > Hello,
> > >
> > > I am currently facing a crash of my product during the Cheney-copy
> > > operation on Windows. This crash is very hard to reproduce because it
> > > is very volatile (slight changes in the SML code make it disappear; it
> > > also depends on the amount of memory, etc.).
> > >
> > > I was finally able to activate some debug messages and assertions (not
> > > the full debugging mode, because enabling it makes the issue
> > > disappear):
> > >
> > > [GC: Starting gc #73; requesting 512 nursery bytes and 0 old-gen bytes,]
> > > [GC:    heap at 0x31880000 of size 710,967,296 bytes,]
> > > [GC:    with nursery of size 617,405,820 bytes (86.8% of heap),]
> > > [GC:    and old-gen of size 93,561,476 bytes (13.2% of heap),]
> > > …
> > > [GC: Starting major Cheney-copy;]
> > > [GC:    from heap at 0x31880000 of size 710,967,296 bytes,]
> > > [GC:    to heap at 0x08fc0000 of size 710,967,296 bytes.]
> > > …
> > > [GC: Finished major Cheney-copy; copied 97,279,788 bytes.]
> > > …
> > > [GC: Starting gc #77; requesting 512 nursery bytes and 0 old-gen bytes,]
> > > [GC:    heap at 0x08fc0000 of size 710,967,296 bytes,]
> > > [GC:    with nursery of size 612,833,480 bytes (86.2% of heap),]
> > > [GC:    and old-gen of size 98,133,816 bytes (13.8% of heap),]
> > > …
> > > [GC: Starting major Cheney-copy;]
> > > [GC:    from heap at 0x08fc0000 of size 710,967,296 bytes,]
> > > [GC:    to heap at 0x31880000 of size 710,967,296 bytes.]
> > > …
> > > foreachObjptrInObject (0x318c2318)  header = 000004c7  tag = ARRAY  bytesNonObjptrs = 0  numObjptrs = 1
> > > forwardObjptr  opp = 0x318c2318  op = 0x00000000091b7714  p = 0x091b7714
> > > forwardObjptr --> *opp = 0x319f57dc
> > > …
> > > forwardObjptr  opp = 0x318d1354  op = 0x00000000091d78c4  p = 0x091d78c4
> > > forwardObjptr --> *opp = 0x31a173ac
> > > forwardObjptr  opp = 0x318d1358  op = 0x00000000318da094  p = 0x318da094
> > > Assertion failed at line 58 of file gc/object.c
> > >
> > > ===> corresponds to the “assert (1 == (header & GC_VALID_HEADER_MASK));”
> > > at the beginning of the splitHeader() function
> > >
> > > Using those debug messages, we can see that all calls to forwardObjptr()
> > > are performed on objects whose address is, as expected, in the “from”
> > > heap, whereas the last call, which leads to the crash, receives an
> > > invalid pointer whose address is in the “to” heap.
> > >
> > > We can also see that both heaps were allocated during a previous call to
> > > the garbage collector, and that a previous Cheney copy had already been
> > > performed between them.
> > >
> > > I suspect that a previous GC operation may have left some stale pointers
> > > in one of the heaps, and that these were not properly cleared during a
> > > subsequent object allocation or similar operation.
> > >
> > > In any case, I simply don’t know how to identify the root cause of the
> > > issue. Any hints?
> > >
> > > Best regards
> > >
> > > *Nicolas Bertolotti*
> > > Senior Development Engineer
> > >
> > > 2 Rue de Paris
> > > 92196 Meudon Cedex
> > > France
> > >
> > > Nicolas.Bertolotti at mathworks.fr
> > >
> > > tel: +33.1.41.14.88.55
> > > fax: +33.1.55.64.06.64
> > > mobile: +33.6.86.41.87.15
> > >
> > > ----------------------------------------------------------------------
> > >
> > > _______________________________________________
> > > MLton mailing list
> > > MLton at mlton.org
> > > http://mlton.org/mailman/listinfo/mlton

