[MLton] Re: [MLton-commit] r6699

Matthew Fluet fluet at tti-c.org
Mon Jun 15 16:13:18 PDT 2009


On Mon, 15 Jun 2009, Wesley W. Terpstra wrote:
> On Mon, Jun 15, 2009 at 5:21 PM, Matthew Fluet<fluet at tti-c.org> wrote:
>> It's about the Win32 spawn* functions (and possibly the CreateProcess
>> function), which provide fork/exec-like functionality.
>>
>> The issue (as I understand it) is that the  char **argv  argument passed to
>> spawnv{,p}{,e} becomes the  const char **argv  argument passed to main of
>> the created process.  One doesn't expect the contents of those character
>> arrays to be changed from spawn{,p}{,e} to main (that is, one shouldn't need
>> to do any escaping at all and one certainly doesn't need to for the *nix
>> exec{,p}{,e} functions), but there is some evidence that MinGW does (or
>> un-does?) escaping of the arguments.
>
> The root problem is that windows does not have an **argv. That's a
> unix convention. Windows programs receive a single flat array (see
> CreateProcess). The crt has code which parses and splits this flat
> array to emulate argv functionality. exec() and spawn() functions have
> code which pastes the arguments together. Unfortunately, a
> long-standing bug in windows is that these pasting and parsing
> operations are NOT compatible.
>
> The MinGW (/ windows CRT) version of pasting is simply ("a", "b", "c")
> -> "a b c". Obviously this breaks for ("a b", "c") -> "a b c". That's
> why MinGW needs to escape arguments to spawn as well as CreateProcess.
> The escaping function in mlton/process.sml was hand-crafted to match
> the parsing done by the Windows CRT at program start-up. The
> launchWithCreate method similarly combines ("a b", "c") -> "a b c",
> but after it escapes its arguments the same as it would for spawn().
>
> Cygwin has to paste and parse arguments just as MinGW does, however,
> it's possible that the cygwin parsing/pasting actually matches (but I
> wouldn't bet on this). If they do match, then no escaping is needed
> for spawn. However, like MinGW, Cygwin sometimes calls CreateProcess.
> The arguments will need to be escaped and pasted together in whatever
> way matches the cygwin runtime. I don't know how the cygwin runtime
> parses its single argument, but what I read said:
>
>      (* In cygwin, according to what I read, \ should always become \\.
>       * Furthermore, more characters cause escaping as compared to MinGW.
>       * From what I read, " should become "", not \", but I leave the old
>       * behaviour alone until someone runs the spawn regression.
>       *)
>
> However, I didn't (and don't) have a cygwin to poke for the parsing
> algorithm used.
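
To make the pasting/parsing mismatch described above concrete, here is a minimal sketch in Python (not the MLton code). `naive_paste` mimics the plain space-join described for the CRT spawn functions. `crt_style_split` is a hypothetical, simplified parser written for this illustration, following the commonly documented Microsoft C runtime rules (whitespace separates arguments outside quotes, a double quote toggles quoted mode, backslashes are special only directly before a double quote); it ignores corner cases such as doubled quotes inside a quoted region. `subprocess.list2cmdline` is an undocumented helper in Python's standard library that quotes arguments for those same rules; it stands in here for an escaping function matched to the parser.

```python
import subprocess


def naive_paste(args):
    """Join arguments with single spaces and no quoting at all."""
    return " ".join(args)


def crt_style_split(cmdline):
    """Split a flat command line roughly the way the Microsoft CRT does.

    Simplified sketch, written for illustration only.
    """
    args, current = [], []
    in_quotes = False   # inside a "..." region
    started = False     # an argument is in progress (may be empty)
    i = 0
    while i < len(cmdline):
        ch = cmdline[i]
        if ch == "\\":
            # Measure the run of backslashes starting here.
            j = i
            while j < len(cmdline) and cmdline[j] == "\\":
                j += 1
            n = j - i
            if j < len(cmdline) and cmdline[j] == '"':
                # 2k backslashes + quote: k backslashes, quote is a delimiter.
                # 2k+1 backslashes + quote: k backslashes and a literal quote.
                current.append("\\" * (n // 2))
                if n % 2 == 1:
                    current.append('"')
                    j += 1  # consume the escaped quote
            else:
                # Backslashes not followed by a quote are literal.
                current.append("\\" * n)
            started = True
            i = j
        elif ch == '"':
            in_quotes = not in_quotes
            started = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if started:
                args.append("".join(current))
                current, started = [], False
            i += 1
        else:
            current.append(ch)
            started = True
            i += 1
    if started:
        args.append("".join(current))
    return args


args = ["a b", "c"]
print(crt_style_split(naive_paste(args)))              # ['a', 'b', 'c']
print(crt_style_split(subprocess.list2cmdline(args)))  # ['a b', 'c']
```

The first line shows the argument boundary being lost by the plain join; the second shows it surviving once the caller escapes in a way that matches the parser on the receiving end.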

While I can understand the marshalling/unmarshalling of arguments through 
a single string, what I'm unclear on is where Cygwin and MinGW interpose 
their own conventions.  That is, spawn{,p}{,e} and CreateProcess are Win32 
functions (right?) --- yet Cygwin and MinGW interpose their own version 
that (may or may not) munge the arguments (before calling the "real" 
spawn{,p}{,e} and CreateProcess)?  Similarly, starting a program from the 
console should begin execution at main; though, technically, it is 
wherever the loader begins execution, so Cygwin and MinGW could provide 
their own _start (or whatever symbol it is in Windows) that (may or may 
not) unmunge the arguments before calling main.

Of course, when calling spawn{,p}{,e} or CreateProcess from a Cygwin or 
MinGW program, it can't know whether the called executable is itself a 
Cygwin, MinGW, or plain Windows program.  Similarly, when starting up, a 
Cygwin or MinGW executable can't know whether it was called via 
spawn{,p}{,e} or CreateProcess by a Cygwin, MinGW, or plain Windows 
(including CMD.exe) program.  So, I don't see why it is sensible for 
Cygwin or MinGW to munge/unmunge arguments at all, since it can't know 
what was/will-be done on the other end.
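
The concern above can be sketched with two made-up quoting conventions. Both conventions below are hypothetical and chosen only for illustration; neither is claimed to be what Cygwin, MinGW, or the Windows CRT actually implements. One escapes an embedded double quote as \" (and a backslash as \\), the other as "". Each round-trips with its own parser, but an argument escaped under one convention and parsed under the other comes out changed.

```python
def escape_backslash(arg):
    """Hypothetical convention A: wrap in quotes, \\ -> \\\\ and " -> \\"."""
    return '"' + arg.replace("\\", "\\\\").replace('"', '\\"') + '"'


def unescape_backslash(quoted):
    """Inverse of convention A: a backslash makes the next character literal."""
    body = quoted[1:-1]  # drop the surrounding quotes
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def escape_doubled(arg):
    """Hypothetical convention B: wrap in quotes, " -> ""."""
    return '"' + arg.replace('"', '""') + '"'


def unescape_doubled(quoted):
    """Inverse of convention B: "" inside the quotes means one literal quote."""
    return quoted[1:-1].replace('""', '"')


arg = 'say "hi"'

# Matched sender and receiver: the argument survives.
print(unescape_backslash(escape_backslash(arg)) == arg)  # True
print(unescape_doubled(escape_doubled(arg)) == arg)      # True

# Mismatched sender and receiver: the argument is altered.
print(unescape_doubled(escape_backslash(arg)) == arg)    # False
print(unescape_backslash(escape_doubled(arg)) == arg)    # False
```

Since the caller cannot tell which runtime the child was built against, it has no way of picking the matching escape function ahead of time.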

