[MLton] interrupted system call

Matthew Fluet fluet@cs.cornell.edu
Thu, 25 Mar 2004 09:50:03 -0500 (EST)


> Good question.  I think the right thing to expect is that the behavior
> is the same whether or not any signals arrive during the critical
> section.  That could be achieved by

> 1. Making atomicBegin block all signals and atomicEnd unblock all
>    signals.
>    (the observation that this lets us get rid of canHandle seems correct)
> The problem with 1 is that it is slow, making a system
> call for both atomicBegin and atomicEnd.

It's also overkill;  there are plenty of instances where one wants a
critical section to protect access to a ref cell (or other mutable
data) where no system calls are going to occur in the critical section.
And for these instances, we don't want the slowdown.
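For the ref-cell case, a counter-based critical section is enough. The C-level handler only records that a signal arrived, and the ML handler runs at the outermost atomicEnd, so no system call is needed on entry or exit. Below is a minimal sketch of that idea, with signal delivery simulated in plain SML. atomicDepth, pendingSignal, deliverSignal, mlHandler, and incrWithSignal are hypothetical names, not MLton's runtime interface.

```sml
(* Hypothetical stand-ins: a nesting depth for critical sections, a flag
   the "C handler" sets, and a count of how often the ML handler ran. *)
val atomicDepth = ref 0
val pendingSignal = ref false
val handlerRuns = ref 0

(* Stand-in for the ML-level signal handler. *)
fun mlHandler () = handlerRuns := !handlerRuns + 1

(* Stand-in for the C-level handler: run the ML handler right away only
   when no critical section is active; otherwise just record the signal. *)
fun deliverSignal () =
   if !atomicDepth = 0 then mlHandler () else pendingSignal := true

(* No system calls: entering and leaving are plain ref updates. *)
fun atomicBegin () = atomicDepth := !atomicDepth + 1

fun atomicEnd () =
   (atomicDepth := !atomicDepth - 1
    ; if !atomicDepth = 0 andalso !pendingSignal
         then (pendingSignal := false; mlHandler ())
      else ())

(* Update a ref cell inside a critical section while a signal arrives;
   returns how many times the handler had run before atomicEnd. *)
val counter = ref 0
fun incrWithSignal () =
   (atomicBegin ()
    ; counter := !counter + 1
    ; deliverSignal ()
    ; let val ranInside = !handlerRuns
      in atomicEnd (); ranInside end)
```

Calling incrWithSignal () returns 0 (the handler was deferred), and afterwards the handler has run exactly once.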

> 2. a. Preventing the ML signal handler from running in the critical section
>    b. Restarting any system calls that are interrupted by a signal.
>    c. Don't let signals prevent a system call from completing.  This
>       is done by blocking (some or all) signals when restarting.

I like this.  But I'll argue below that we should only block some signals
when restarting.
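For 2b, the restart loop itself is small. A minimal sketch, assuming an interrupted call surfaces as OS.SysErr carrying Posix.Error.intr. restart and fakeRead are hypothetical; fakeRead stands in for a real system call that is interrupted twice before it completes.

```sml
(* Hypothetical helper: rerun f for as long as it fails with EINTR;
   any other exception propagates unchanged. *)
fun restart (f : unit -> 'a) : 'a =
   f ()
   handle e as OS.SysErr (_, SOME err) =>
      if err = Posix.Error.intr then restart f else raise e

(* Stand-in for a system call that is interrupted twice, then succeeds. *)
val interruptsLeft = ref 2
fun fakeRead () : int =
   if !interruptsLeft > 0
      then (interruptsLeft := !interruptsLeft - 1
            ; raise OS.SysErr ("read", SOME Posix.Error.intr))
   else 42
```

Note that restart on its own does nothing about 2c: a signal that keeps arriving can keep the loop spinning, which is why the restart path also has to block signals.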

> > I think it's just two sides of the same coin:
> > A) Do I go out of my way to make system calls outside of critical regions
> >    so that my signal handler always gets a chance to run.
> > B) Do I go out of my way to handle intr exceptions so that I can put
> >    system calls inside critical regions.
>
> Maybe I'm still confused, but I don't see why you have to take either
> of these choices with 2abc (and maybe you weren't even saying that you
> do :-).

All I was trying to say is that there seemed to me to be two different
programming models on the table.  I was thinking back to Stephen's original
argument against using SA_RESTART:

> I've been playing around with using itimers to do timeouts, and I
> think there is a problem if we use SA_RESTART.  Suppose we have an SML
> program that installs an SML signal handler for Posix.Signal.usr1 and
> then makes a call to Posix.IO.readVec, which blocks at the read system
> call.  If a SIGUSR1 arrives, then the C runtime system handler will
> run, setting the limit so that at the next limit check the SML handler
> will run, and then terminate.  The system call will then restart and
> block.
>
> So, it looks like to me that the SML signal handler will never get a
> chance to run.  This seems wrong.

If I always want a chance to abort the read when a signal arrives, then I
need to program differently, depending on which model is being used.  If
the model is along the lines of 2abc, then in order to abort, I can't call
Posix.IO.readVec from within a critical region.  (Which makes perfect
sense, since by entering the region I gave up the ability to run the signal
handler.)  But I can put the abort code in the signal handler (or in a
thread the signal handler switches to).  If the model is along the lines
of the canHandle check that I suggested, then I can call Posix.IO.readVec
from within a critical section, but I will need to handle the OS.SysErr
exception raised for EINTR; in exchange, I can put the abort code in the
exception handler.

So, I still claim that with 2abc, you need to program along the lines of A.
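A sketch of the canHandle-style alternative, where the abort code lives in the exception handler. readOrAbort is a hypothetical helper, and the EINTR check assumes an interrupted call raises OS.SysErr with Posix.Error.intr.

```sml
(* Hypothetical helper: run a read; if a signal interrupts it (EINTR),
   take the abort path and return NONE.  Other errors propagate. *)
fun readOrAbort (read : unit -> string) : string option =
   SOME (read ())
   handle e as OS.SysErr (_, SOME err) =>
      if err = Posix.Error.intr
         then NONE  (* abort code goes here *)
      else raise e
```

Under 2abc this handler branch would never see EINTR, so the equivalent abort would have to be triggered from the signal handler instead.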

> > I'm just thinking about what it would take to make the basis library
> > thread safe.  If we wrap Posix.IO.readVec under a restart loop, then
> > either
> > 1) we must always call Posix.IO.readVec outside a critical region;
> >    seems like it would be hard to make StreamIO and ImperativeIO
> >    thread safe.
> > 2) we allow calls to Posix.IO.readVec inside a critical region of the
> >    Basis;  so, the programmer can assume that their USR1 signal handler
> >    will run if they call Posix.IO.readVec outside a critical region, but
> >    not if they call TextIO.inputLine.
>
> (2) seems fine to me.  We use critical sections to make the basis
> library thread safe.  It seems OK to expect the programmer to know
> that, and hence to not expect their signal handler to be able to run.

I agree that (2) seems more natural.

The one other lingering issue that I want to bring up concerns the system
calls that require pre- and post-processing.  For example, recall
Posix.FileSys.{,l,f}stat:

            fun fromC (): stat =
               T {dev = Stat.dev (),
                  ino = Stat.ino (),
                  mode = Stat.mode (),
                  nlink = Stat.nlink (),
                  uid = Stat.uid (),
                  gid = Stat.gid (),
                  size = Stat.size (),
                  atime = Time.fromSeconds (Stat.atime ()),
                  mtime = Time.fromSeconds (Stat.mtime ()),
                  ctime = Time.fromSeconds (Stat.ctime ())}

      local
         fun make (prim, f) arg =
            (checkResult (prim (f arg))
             ; ST.fromC ())
      in
         val stat = make (Prim.Stat.stat, NullString.nullTerm)
         val lstat = make (Prim.Stat.lstat, NullString.nullTerm)
         val fstat = make (Prim.Stat.fstat, fn FD fd => fd)
      end

The fromC function calls a bunch of little C functions to read out the
fields from the static struct stat statbuf.  So, make should probably be:

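         fun make (prim, f) arg =
            (atomicBegin ()
             ; ((checkResult (prim (f arg))
                 ; ST.fromC ())
                (* leave the critical section even if checkResult raises *)
                handle e => (atomicEnd (); raise e))
               (* return the stat record, not atomicEnd's unit *)
               before atomicEnd ())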

because we don't want to switch threads between the system call that
fills statbuf and all the calls that read statbuf back.  Now, it turns
out that stat does not fail with EINTR, so we don't need a restart loop
here.  So this short critical region is OK; we'll exit it soon enough for
the ML signal handler to run.

What I'm worried about is a system call that needs this kind of pre- or
post-processing and that can also fail with EINTR.  (I don't know whether
we have any, but it seems likely.)
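To make the worry concrete, here is a sketch of what make might look like for such a call, with everything simulated in plain SML. depth, statbuf, primStat, and restart are hypothetical stand-ins (not the real runtime or Prim functions), and primStat fails with EINTR once before filling the buffer.

```sml
(* Hypothetical stand-ins for the runtime's critical-section calls. *)
val depth = ref 0
fun atomicBegin () = depth := !depth + 1
fun atomicEnd () = depth := !depth - 1

(* Stand-in for a static C buffer (like statbuf) that the call fills. *)
val statbuf = ref 0

(* Stand-in for a system call that fails with EINTR once, then fills
   the buffer. *)
val interruptsLeft = ref 1
fun primStat (x : int) : unit =
   if !interruptsLeft > 0
      then (interruptsLeft := !interruptsLeft - 1
            ; raise OS.SysErr ("stat", SOME Posix.Error.intr))
   else statbuf := x * 10

(* Rerun f while it fails with EINTR. *)
fun restart f =
   f ()
   handle e as OS.SysErr (_, SOME err) =>
      if err = Posix.Error.intr then restart f else raise e

(* Call, retry on EINTR, and read the buffer back, all in one critical
   section; leave the section even if a non-EINTR error escapes. *)
fun make prim arg =
   (atomicBegin ()
    ; ((restart (fn () => prim arg); !statbuf)
       handle e => (atomicEnd (); raise e))
      before atomicEnd ())
```

The retry and the read-back share one critical section, so the buffer can't be refilled by another thread in between. The cost is visible too: every retry happens with depth > 0, so the ML signal handler can't run until the call finally succeeds, and if the call blocks indefinitely, it never runs.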

> I think this problem also goes away once we get CML in.

Watch this space. ;-)
But you're right that CML gives you
TextIO.inputLineEvt : instream -> (string option) CML.event,
although you essentially need to rebuild the IO hierarchy with mvars
instead of refs.

> > I meant the latter.  But, I don't think you should block all signals,
> > just the one you received.
>
> OK.  I think either way works.  Blocking only one signal means that
> the restart loop really needs to be a loop, because it can go around
> once for each signal.  Blocking all signals means that we only need to
> restart the system call once.  I don't see much reason to prefer one
> to the other.

Because if you block all signals, you're obliged to unblock all signals
when the ML signal handler gets a chance to run.  But that may change the
(admittedly murky) semantics of the user's program: if they had already
blocked some (but not all) signals, then they expect those signals to
still be blocked, even after the ML signal handler runs for another
signal.
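A sketch of the block-only-what-you-received loop, with the user's mask saved on entry and restored on exit rather than unblocking everything. Signals are modeled as small ints and the mask as an int list; blocked, recurring, attempt, and restartBlockingReceived are hypothetical, and recurring models signals (like a fast itimer) that would interrupt every attempt unless blocked.

```sml
(* Signals are modeled as small ints and the mask as the list of blocked
   signals.  The user has already blocked signal 30. *)
val blocked : int list ref = ref [30]

fun isBlocked s = List.exists (fn b => b = s) (!blocked)

(* Signals that fire during every attempt unless blocked, like a
   fast-repeating itimer. *)
val recurring = [10, 30, 12]

(* Stand-in for one attempt at the system call: SOME s if unblocked
   signal s interrupts it (EINTR), NONE if it completes. *)
fun attempt () : int option =
   List.find (fn s => not (isBlocked s)) recurring

(* Block only the signal that interrupted us and retry; on exit restore
   the mask saved on entry instead of unblocking everything.  Returns
   the number of restarts. *)
fun restartBlockingReceived () : int =
   let
      val saved = !blocked
      fun loop restarts =
         case attempt () of
            SOME s => (blocked := s :: !blocked; loop (restarts + 1))
          | NONE => restarts
   in
      loop 0 before blocked := saved
   end
```

The loop goes around once per distinct unblocked signal (twice here), and signal 30, which the user had blocked beforehand, is still blocked afterwards.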