mlprof

Wed, 21 Mar 2001 20:36:42 -0800 (PST)

So that I understand what's going on, I read through the mail from last summer
regarding signal stacks.  I've appended the mail to the end of this message.
Here's my understanding.

The kernel uses %esp, which normally corresponds to the C stack, to allocate
stack space for signal handlers.  Because MLton uses %esp as a general purpose
register, we need to tell the kernel to use an alternate space for handlers.
So, when a MLton executable starts, the runtime system calls sigaltstack to tell
the kernel to use an alternate signal stack, which the runtime allocates.  The
runtime allocates twice as much space as is necessary for the evaluation of a
signal handler, and wraps the space with dead zones on each side that will cause
a SEGV if touched.  The runtime tells the kernel to use the upper half of the
space it has allocated as the alternate signal stack.  When a signal arrives, if
%esp points into what the kernel thinks is the alternate stack, i.e. the upper
half of the mmap'ed space, then the kernel will use that as the stack.  Because
we have doubled the space, even if %esp points at the bottom of the space the
kernel knows about, we have an extra chunk below that it can safely use as the
stack.

Until today, there were no known problems with this scheme.

That's the end of my summary.  Please correct any of my misconceptions.

Here's some further evidence that the profiling signal handler is doing
something weird.  I prefixed the following onto logic.sml and compiled -p.

local
   fun handler () = TextIO.output (TextIO.stdErr, "got a sigprof\n")
   open MLton
   open Signal
   val _ = Handler.set (Signal.prof, Handler.simple handler)
in
end

With this prefix, the program ran fine.   As far as I can tell, the only
difference between this and the buggy version is that GC_handler is running
instead of the profiling signal handler.  Henry, can you mail out whatever code
is getting run by the profiling signal handler?  Thanks.

Here's all that old mail.

--------------------------------------------------------------------------------

From: Henry Cejtin <henry@sourcelight.com>
To: fluet@research.nj.nec.com, MLton@sourcelight.com
Subject: Re: Some results
Date: Fri, 21 Jul 2000 17:01:39 -0500

The signal stuff I understand (I think).  I'm pretty sure that unless you do
some magic, the signal handler stuff is going to use the running process's
stack.  This clearly is going to fail if %esp isn't pointing to a convenient
place.  There are system calls (again, I'm pretty sure) to force a switch to
a different stack, which we will have to use for the small code that actually
runs in response to a Unix signal.  I'll investigate and send more after I have
checked it out.

--------------------------------------------------------------------------------

From: Henry Cejtin <henry@sourcelight.com>
To: MLton@research.nj.nec.com
Subject: alternate stacks for signal handlers
Date: Fri, 21 Jul 2000 20:25:19 -0500

Ok, here is the story with signal stacks.

First,  you  must  allocate a chunk of memory to be used as the signal stack.
Its size must be at least MINSIGSTKSZ, and really should be at least SIGSTKSZ
(these  are defined by including signal.h, and the latter is 8K).  (Actually,
I would make it 2 or 3 times that just to be safe  against  multiple  signals
coming  in really fast.)  Note, for protection against overflowing the signal
stack (not really a problem for us, but we should be safe) you should use the
mmap trick to surround it with regions mapped with no permissions.  Note, the
Intel stack grows by decreasing, so the most important boundary is on the low
side.

You  now  have  to execute a system call to indicate that the alternate stack
should be used.  The system call is sigaltstack() and it takes  2  arguments.
The  second  one  is how it returns the current state, and NULL is fine.  The
first is a pointer to a `stack_t', which is a  typedef'd  struct  having  the
following members:

    ss_sp           base address of signal stack
    ss_size         size of signal stack
    ss_flags        0

Next,  in  every  call to sigaction you should set the sa_flags member of the
struct sigaction to
    SA_RESTART | SA_ONSTACK
The SA_RESTART makes sure that any system call going on when  the  signal  is
delivered  will be restarted (instead of returning failure with errno EINTR).
The SA_ONSTACK is the flag that says to use the signal stack.

With this in place, there should be no  problem  with  interrupts.   When  an
interrupt  happens,  if  the  stack pointer does NOT point into the alternate
stack, it will be set to point there.  Then all the registers (including  the
original  stack  pointer)  will  be  saved on that stack and the process will
start running the signal handler code.

Note, if a signal comes in after sigaction has been called with SA_ONSTACK in
the  sa_flags  member  and  before sigaltstack() has been called, the process
will get a SIGSEGV, so the moral is call sigaltstack() first.
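
The steps above can be sketched in C roughly as follows.  This is only a
minimal illustration, not the MLton runtime's code: the names alt_stack,
handler_sp, on_sigusr1, install_alt_stack and handler_ran_on_alt_stack are
made up for the example, the stack is simply malloc'ed (no dead zones), and
SIGUSR1 is an arbitrary choice of signal.  The handler records the address of
one of its locals so that one can check afterwards that it really ran on the
alternate stack.

```c
#define _DEFAULT_SOURCE   /* glibc feature macro; harmless elsewhere */
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical, chosen for the example. */
static stack_t alt_stack;
static volatile uintptr_t handler_sp;  /* address of a local seen in the handler */

static void on_sigusr1(int signo) {
    int local;
    (void)signo;
    handler_sp = (uintptr_t)&local;    /* approximates the handler's stack pointer */
}

/* Returns 0 on success, -1 on failure. */
static int install_alt_stack(void) {
    struct sigaction sa;

    alt_stack.ss_size = 4 * SIGSTKSZ;  /* a few times SIGSTKSZ, to be safe */
    alt_stack.ss_sp = malloc(alt_stack.ss_size);
    alt_stack.ss_flags = 0;
    if (alt_stack.ss_sp == NULL)
        return -1;

    /* Call sigaltstack() first, and only then sigaction() with SA_ONSTACK. */
    if (sigaltstack(&alt_stack, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_ONSTACK;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* True if the handler's local lived inside the registered alternate stack. */
static int handler_ran_on_alt_stack(void) {
    uintptr_t lo = (uintptr_t)alt_stack.ss_sp;
    return handler_sp >= lo && handler_sp < lo + alt_stack.ss_size;
}
```

After install_alt_stack() succeeds, a raise(SIGUSR1) followed by
handler_ran_on_alt_stack() gives a quick sanity check that the switch happens.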

--------------------------------------------------------------------------------

From: Matthew Fluet <fluet@research.nj.nec.com>
To: MLton@sourcelight.com
Subject: Re: alternate stacks for signal handlers
Date: Tue, 25 Jul 2000 10:39:38 -0400 (EDT)

The alternate stacks for signal handlers don't seem to be the problem
with threads and signals.  I made the changes that Henry outlined, but
they haven't changed any behavior -- either in the C-codegen (which still
works) or in the x86-codegen (which still doesn't work).

--------------------------------------------------------------------------------

From: Matthew Fluet <fluet@research.nj.nec.com>
To: MLton@sourcelight.com
Subject: Re: alternate stacks for signal handlers
Date: Tue, 25 Jul 2000 19:12:55 -0400 (EDT)

> Well that is certainly disappointing.  Could you add something to the C
> signal handler so that we could see if it is making it into the signal handler?
> I.e., just
> 	{
> 		static char	msg[] = "Signal handler entered\n";
> 
> 		write(2, msg, sizeof(msg) - 1);
> 	}
> Actually, the best thing would be to print out the stack pointer at that
> stage and make sure that it really is switching to the new stack.
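
The check suggested in the quoted text might be sketched like this.  It is
an illustration only: fmt_hex and report_sp are hypothetical names, and the
address of a local variable is used as an approximation of the stack pointer.
The formatting is done by hand because printf is not safe to call from a
signal handler, whereas write is.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: writes v as zero-padded "0x..." hex into buf, which
 * must hold at least 2 + 2*sizeof(v) bytes.  Returns the number of bytes
 * written; no terminating NUL is added. */
static size_t fmt_hex(uintptr_t v, char *buf) {
    static const char digits[] = "0123456789abcdef";
    size_t n = 0;
    int shift;

    buf[n++] = '0';
    buf[n++] = 'x';
    for (shift = (int)(sizeof v * 8) - 4; shift >= 0; shift -= 4)
        buf[n++] = digits[(v >> shift) & 0xf];
    return n;
}

/* Hypothetical handler body: reports roughly where the stack pointer is. */
static void report_sp(int signo) {
    static const char msg[] = "Signal handler entered, sp near ";
    char buf[2 + 2 * sizeof(uintptr_t) + 1];
    int marker;                      /* lives on the handler's stack */
    size_t n;

    (void)signo;
    n = fmt_hex((uintptr_t)&marker, buf);
    buf[n++] = '\n';
    if (write(2, msg, sizeof(msg) - 1) < 0 || write(2, buf, n) < 0) {
        /* nothing safe to do about a failed write here */
    }
}
```

Comparing the printed address against the range passed to sigaltstack shows
whether the handler is on the new stack.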

I think I finally figured the thread/signals stuff out.  Turned out to be
two additional bugs.  One was a problem with register allocation, which
I'm surprised wasn't flushed out by something else.  It was related to
having a long basic block after a C-call and ending up with one memory
location being cached by more than one register.  The second bug was in
the way I was implementing Thread_switchTo.  I haven't coded up the
"inline" version that appears in the current version of machine.h;
instead, I've been doing an invokeRuntime of GC_switchToThread.  Contrary
to the comment in machine.h, this method isn't entirely correct.  After
some playing around, I figured out that the crucial difference is that
GC_switchToThread did not perform the atomicEnd operation s->canHandle--.  
So, everything got screwed up because the GC always thought it was already
handling a thread operation.

Just for kicks, I tried turning off the alternate signal stack handler to
see what happens.  Stuff still seems to work, but then again, the only
example program I've got to check signal handling is the signals.sml file.
I think that makes sense, since the child thread is spending most of its
time sleeping -- which gets turned into a C-call, so when it receives
the signal, it can use the C-stack that got set up for the call.  But,
if I turn the sleep into a busy loop (i.e., no C-call), then I get
erroneous behavior.  So, the alternate signal stack is important.

--------------------------------------------------------------------------------

From: Matthew Fluet <fluet@research.nj.nec.com>
To: MLton@sourcelight.com
Subject: Re: alternate stacks for signal handlers
Date: Wed, 26 Jul 2000 12:04:11 -0400 (EDT)

> Excellent  bug tracking.  Looking at the Linux kernel source, I'm still a bit
> nervous about the alternate stack stuff.  It seems that the kernel decides if
> you  are  already  using  the alternate stack by looking at the current stack
> pointer.  If this is true, then it means that you can't use the stack pointer
> register  as  a general purpose register.  If it just so happens to contain a
> value which looks like a pointer into the alternate  stack  then  the  kernel
> concludes  that  it  is  already in the alternate stack and doesn't reset the
> stack pointer.  This  is  really  horrible,  and  instead  the  kernel  could
> maintain  a  flag  (really  a  counter)  if  it  has  already switched to the
> alternate stack.  I don't know what SML/NJ does.  At any rate, this shouldn't
> be  a  problem  if  you always have a pointer to some other data in the stack
> pointer register.

I think we should be o.k. with using the stack pointer.  Most of the time,
%esp will look like gcState.frontier.  It will be a pointer into memory
and potentially memory "near" to where I mmaped the alternate signal
stack.  But, if the kernel is just doing a range check, that will be fine.
Just before a C-call, %esp is restored to the original C-stack pointer.
After a C-call, %esp is used more as a general purpose register.
Immediately after the call, it will still point to the C-stack, but the
register allocator is free to do anything it wants with it.  I don't
immediately re-cache gcState.frontier, because sometimes we get a lot of
C-calls in a row (particularly for IO) and we end up shuffling values into
and out of memory for no reason.

I guess there is a small chance that %esp could end up being a value that
happens to fall into the alternate signal stack range.  But, I think that
falls back on the original point that the kernel should really track
whether or not it is currently handling a signal.

--------------------------------------------------------------------------------

From: Henry Cejtin <henry@sourcelight.com>
To: fluet@research.nj.nec.com, MLton@sourcelight.com
Subject: Re: alternate stacks for signal handlers
Date: Thu, 27 Jul 2000 01:44:24 -0500

I  don't  see  how,  given  the  Linux  kernel's cheap test to see if you are
already using the alternate stack, that you can ever view esp  as  a  general
register.   I  just did a test, with a bit of tweaked assembler code, and the
kernel does just what it appears to do: If a signal  comes  in  and  the  esp
register  happens  to  point  in  the  range of locations where you said your
alternate stack was, then the stack pointer is not changed, it is  just  used
as  is.   This `small' chance that esp happens to point into that region WILL
happen eventually, and you won't be able to duplicate the failure.  I  REALLY
hate that kind of bug.

Ah,  here  is a really grotesque hack.  If the size you decide you need for a
signal stack is N, then you allocate 2*N space and in the call to sigaltstack
you say that the alternate stack is
    start of space + N
and  N bytes long.  If a signal comes in and you were NOT using the alternate
stack, but the esp register happens to be in  this  range,  then  the  kernel
won't  bother  changing it and you will start to use it, but since you have N
bytes below (below because the stack grows on Intel chips by decreasing) that
you can safely write in, you still have your N bytes at least of stack.

Of  course  you also need to allocate some dead space so that if you overflow
the signal stack you  will  die  instead  of  silently  corrupting  yourself.
Again,  since  the  Intel stack grows by decreasing, you have to put the dead
page before the start of the 2*N bytes.
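
To make the arithmetic of the hack concrete, here is a small C model of it.
It is a sketch under stated assumptions, not kernel or MLton code: the struct
alt_layout and the functions registered_base, registered_size,
looks_like_alt_stack and headroom are invented for the example, and the range
test is a simplified stand-in for whatever the kernel actually does at the
boundaries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the layout; none of these names come from MLton. */
struct alt_layout {
    uintptr_t space;   /* start of the 2*N usable bytes */
    size_t    n;       /* N: stack bytes a signal handler needs */
};

/* What is reported to sigaltstack(): only the upper half of the space. */
static uintptr_t registered_base(const struct alt_layout *l) {
    return l->space + l->n;
}
static size_t registered_size(const struct alt_layout *l) {
    return l->n;
}

/* Simplified model of the kernel's test: does sp fall in the registered range? */
static int looks_like_alt_stack(const struct alt_layout *l, uintptr_t sp) {
    return sp >= registered_base(l)
        && sp < registered_base(l) + registered_size(l);
}

/* Bytes the handler can push before running off the bottom of the space,
 * given the value the stack pointer register held when the signal arrived. */
static size_t headroom(const struct alt_layout *l, uintptr_t sp_at_signal) {
    uintptr_t start = looks_like_alt_stack(l, sp_at_signal)
        ? sp_at_signal                               /* kernel leaves sp alone */
        : registered_base(l) + registered_size(l);   /* kernel switches to the top */
    return (size_t)(start - l->space);
}
```

In this model the worst case is a stack pointer sitting exactly at the bottom
of the registered half, and headroom() still comes out as N; any other value
gives more.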

I did a quick check of SML/NJ to see what they do, and I am  confused.   They
don't  ever  seem  to  even call sigaction, but they must be doing something.
I'll investigate more later.  I'm quite curious what they do.

--------------------------------------------------------------------------------

From: Matthew Fluet <fluet@research.nj.nec.com>
To: MLton@sourcelight.com
Subject: Re: alternate stacks for signal handlers
Date: Thu, 27 Jul 2000 09:23:51 -0400 (EDT)

That's really unfortunate the way that Linux handles alternate signal
stacks.  I'll set up the register allocator so that %esp always
corresponds to either the C-stack or to gcState.frontier.  I'm going to
arrange it as follows: on entry to a block, %esp will be gcState.frontier.
At the first C-call, %esp will be the C-stack.  On return from the C-call,
%esp will continue to correspond to the C-stack (so a subsequent C-call
doesn't incur the overhead of reloading the C-stack).  At the end of the
block, %esp will be forced to correspond to gcState.frontier (which will
do nothing if it never got bumped for a C-call).  I think that should
solve the general problem.  We lose out a little bit if we need
gcState.frontier sometime after a C-call, but that should be o.k.

--------------------------------------------------------------------------------

From: "Stephen Weeks" <sweeks@intertrust.com>
To: MLton@sourcelight.com
Subject: Re: alternate stacks for signal handlers
Date: Thu, 27 Jul 2000 10:22:50 -0700 (PDT)

> That's really unfortunate the way that Linux handles alternate signal
> stacks.  I'll set up the register allocator so that %esp always
> corresponds to either the C-stack or to gcState.frontier.  

Just curious -- why did you choose this instead of Henry's double the
space hack?  I think the space cost of his is negligible, and it does
let you have one more register.

--------------------------------------------------------------------------------

From: Matthew Fluet <fluet@CS.Cornell.EDU>
To: MLton@sourcelight.com
Subject: Re: alternate stacks for signal handlers
Date: Mon, 31 Jul 2000 17:37:45 -0400 (EDT)

In cleaning up gc.c, I wanted to add in the dead-zones around the alternate
signal stack.  Henry, does this look sufficient for mmapping an alternate
signal stack with dead-zones?  Also, how big should the dead zones be?
Currently, I'm mmapping a space 2 * 4 * SIGSTKSZ -- a 4 * SIGSTKSZ "real"
stack with the doubling trick.  Should just 1K or 2K be sufficient around
the stack?  Or do I need any other page alignments?

/* A super-safe mmap.
 *  Allocates a region of memory with dead zones at the high and low ends.
 *  Any attempt to touch the dead zone (read or write) will cause a
 *   segmentation fault.
 */
static void *ssmmap(size_t length, size_t dead_low, size_t dead_high) {
  char *base, *low, *result, *high;

  /* Reserve one contiguous range, then release it so that the three
   * pieces below can be mapped at fixed addresses inside it. */
  base = smmap(length + dead_low + dead_high);
  smunmap(base, length + dead_low + dead_high);

  /* Low dead zone: no access permitted. */
  low = mmap(base, dead_low, PROT_NONE,
	     MAP_FIXED | MAP_PRIVATE | MAP_ANON, -1, 0);
  if (low == MAP_FAILED)
    die("mmap failed");

  /* The usable region. */
  result = mmap(low + dead_low, length, PROT_READ | PROT_WRITE,
		MAP_FIXED | MAP_PRIVATE | MAP_ANON, -1, 0);
  if (result == MAP_FAILED)
    die("mmap failed");

  /* High dead zone: no access permitted. */
  high = mmap(result + length, dead_high, PROT_NONE,
	      MAP_FIXED | MAP_PRIVATE | MAP_ANON, -1, 0);
  if (high == MAP_FAILED)
    die("mmap failed");

  return result;
}

--------------------------------------------------------------------------------

From: Matthew Fluet <fluet@CS.Cornell.EDU>
To: MLton@sourcelight.com
Subject: Re: alternate stacks for signal handlers
Date: Mon, 31 Jul 2000 19:40:18 -0400 (EDT)

O.K.  The mprotect version works fine.

On Mon, 31 Jul 2000, Henry Cejtin wrote:

> The  hardware  is incapable of doing anything that isn't a multiple of a page
> size (4K on Intel).
> 
> Also, instead of using mmap/munmap to get where to put it, why not just  use
> mprotect()?   I.e.,  mmap  one  region  with  all  the  protection  you want,
> including the 2 dead  zones,  and  then  use  mprotect()  to  make  the  ends
> inaccessible.  (Mind you, I think I'm responsible for the original version of
> this in MLton.  I must have forgotten about mprotect at the time.)
> 
> As to sizes, I would think that 8K of stack should be  plenty  (since  it  is
> just for the C routine that sets some flag).  Thus I would say that the total
> mmap would be for 2*8K (2 times the stack because of the hack) plus 2*4K (one
> page at each end).  Actually, you don't need the page at the high end because
> the stack grows by using lower addresses, so I would go for a total  of  20K,
> with  the  bottom 4K being inaccessible and the region officially used as the
> alternate stack being from base+12K to base+20K.
> 
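
For illustration, an mprotect-based setup along the lines of the quoted
suggestion might look like the sketch below.  It is not the code that went
into gc.c: setup_alt_stack is a hypothetical name, the page size is queried
with sysconf rather than assumed to be 4K, and the stack size n is left as a
parameter.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANON and sigaltstack() on glibc */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not MLton's code.  Maps page + 2*n bytes, makes the
 * lowest page inaccessible, and registers the top n bytes as the alternate
 * signal stack.  With 4K pages and n = 8K this is the 20K layout above:
 *   [base,     base+4K)   dead zone
 *   [base+4K,  base+12K)  spare half for the doubling hack
 *   [base+12K, base+20K)  range reported to sigaltstack()
 * Returns base, or NULL on failure. */
static void *setup_alt_stack(size_t n) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = page + 2 * n;
    stack_t ss;
    char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANON, -1, 0);

    if (base == MAP_FAILED)
        return NULL;

    ss.ss_sp = base + page + n;   /* upper half only */
    ss.ss_size = n;
    ss.ss_flags = 0;

    /* Turn the bottom page into a dead zone, then register the stack. */
    if (mprotect(base, page, PROT_NONE) != 0 || sigaltstack(&ss, NULL) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base;
}
```

Only the low end gets a dead page here, since the stack grows toward lower
addresses; n should be at least MINSIGSTKSZ or sigaltstack will refuse it.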

--------------------------------------------------------------------------------