[MLton] Windows port of MLton using the Microsoft tools (i.e. without MinGW)

Nicolas Bertolotti nicolas.bertolotti at polyspace.com
Thu Jul 26 06:42:50 PDT 2007


> > I've personally tried to evaluate the amount of work that would be
> > needed by doing the following:
> >
> > - port the GNU MP library in order to build it using the Microsoft
> >   C compiler
> 
> MLton uses only a fraction of the GNU MP library, and uses it out of
> convenience and performance, not necessity.  That is, implementing IntInf

I don't know what to say about GNU MP. What I did seems to work, at least
for MLton, so it was fine for me. Note that GNU MP provides some functions
written in assembly code, which I had to deactivate (falling back to the
generic C versions that are also provided).

> 
> > o convert the MLton runtime C code to C89 (it is written in C99,
> >   which leads to a number of minor issues)
> 
> How extensive are these changes?  I'm wary of falling back to C89, since I
> imagine that one still needs to make use of many extensions.  For example,
> AFAIK, C89 didn't provide the uint<n>_t and int<n>_t fixed-width integer
> types, which makes specifying the C representations for the ML types
> Word<N>.word and Int<N>.int slightly more difficult.
> 

Most of the changes stem from the fact that C89 only allows variable
declarations at the start of a block, so constructs such as these are
rejected:
       for (int i = 0; ...
       x++; int y; ...
These are really not big issues.
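A minimal C89 sketch of both points in this exchange, assuming (as in the port described here) a compiler that accepts only C89. The first part shows the mechanical rewrite described above: every declaration is hoisted to the start of its block. The second part shows one common way to get fixed-width integer types without <stdint.h>, by selecting types from <limits.h> and checking widths at compile time. The names Word8, Word32, etc. are hypothetical and not necessarily those used by the MLton runtime; the 64-bit types rely on __int64 (an MSVC extension) or long long (not part of C89).

```c
#include <limits.h>

/* --- 1. Declarations hoisted to the start of the block ---------------
 *
 * C99 version, rejected by a C89-only compiler:
 *
 *     int total = 0;
 *     for (int i = 0; i < n; i++)    <- declaration in the for-init clause
 *         total += a[i];
 *     total++;
 *     int doubled = total * 2;       <- declaration after a statement
 */
static int sum_plus_one_doubled(const int *a, int n)
{
    int total = 0;
    int i;          /* hoisted out of the for statement */
    int doubled;    /* hoisted above the first statement */

    for (i = 0; i < n; i++)
        total += a[i];
    total++;
    doubled = total * 2;
    return doubled;
}

/* --- 2. Fixed-width integer types without <stdint.h> ------------------ */
typedef signed char    Int8;
typedef unsigned char  Word8;
typedef short          Int16;
typedef unsigned short Word16;

#if UINT_MAX == 0xFFFFFFFFUL
typedef int            Int32;
typedef unsigned int   Word32;
#elif ULONG_MAX == 0xFFFFFFFFUL
typedef long           Int32;
typedef unsigned long  Word32;
#else
#error "no 32-bit integer type available"
#endif

#if defined(_MSC_VER)
typedef __int64            Int64;   /* MSVC extension */
typedef unsigned __int64   Word64;
#else
typedef long long          Int64;   /* C99; GCC accepts it as an extension */
typedef unsigned long long Word64;
#endif

/* C89-compatible compile-time checks: the array size becomes -1 (a compile
   error) if a width assumption does not hold. */
typedef char check_word8 [(CHAR_BIT == 8) ? 1 : -1];
typedef char check_word16[(sizeof(Word16) == 2) ? 1 : -1];
typedef char check_word32[(sizeof(Word32) == 4) ? 1 : -1];
typedef char check_word64[(sizeof(Word64) == 8) ? 1 : -1];
```

The negative-array-size trick stands in for static_assert, which C89 does not have.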

> > o take into account the fact that the CRT functions which are
> >   equivalent to some LIBC functions are prefixed with an '_'
> 
> Is this an issue in C code or an issue in assembly code?  On *-darwin and
> x86-mingw, we need to add a leading underscore to symbols in the assembly
> files, so there is a compiler hook for ensuring that leading underscores
> are added -- but that would be to all symbols, not just some functions.
> 

It is an issue in the C code. The MinGW libraries provide aliases without
the leading underscore, which is why we don't face the same issue with MinGW.
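When building with the Microsoft tools, one option is a small compatibility header that maps the POSIX names used by the runtime onto the underscore-prefixed CRT functions. This is a hypothetical sketch covering only three functions (close, dup, getpid); in the Microsoft CRT, _close and _dup are declared in <io.h> and _getpid in <process.h>. The two helper functions are illustrative only and are not part of MLton.

```c
/* Hypothetical compatibility shim: runtime code keeps calling the POSIX
   names, while the Microsoft CRT supplies the underscore-prefixed versions. */
#ifdef _MSC_VER
#  include <io.h>        /* _close, _dup */
#  include <process.h>   /* _getpid */
#  define close   _close
#  define dup     _dup
#  define getpid  _getpid
#else
#  include <unistd.h>    /* close, dup, getpid */
#endif

/* Illustrative helper: duplicates a descriptor and closes the copy.
   Returns 0 on success and -1 if either call fails. */
static int dup_then_close(int fd)
{
    int copy = dup(fd);

    if (copy < 0)
        return -1;
    return close(copy);
}

/* Illustrative helper: the current process id, widened to long. */
static long current_pid(void)
{
    return (long)getpid();
}
```

A macro-based mapping is the least invasive, but an object-like macro such as close renames every token spelled close in the translation unit (including struct members), so small wrapper functions may be safer in a larger code base.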

> > o include the code for some MinGW-specific extensions in the MLton
> >   runtime (opendir/readdir/closedir/rint etc.)
> 
> Here, I would be worried about the licensing issues of borrowing
> significant portions of MinGW code.  It's not a blocking issue for this
> investigation, and we could certainly look at more native Win32 functions
> that accomplish the same thing; also, look at the way SML/NJ handles this
> functionality, since they don't use MinGW or Cygwin, and also have a
> compatible license.

The MinGW runtime code is not copyrighted; it is placed in the public
domain (it's not even GPL).
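Some of these helpers are also small enough to rewrite from scratch. As an illustration, here is a hypothetical replacement for rint, one of the functions listed above. It is a sketch under stated assumptions: IEEE 754 doubles, a 64-bit long long (older Microsoft compilers may need __int64 instead), and only the default round-to-nearest-even behaviour, whereas the real rint follows the current rounding mode.

```c
/* Hypothetical rint replacement (round half to even).
   Differences from the real rint: only round-to-nearest-even is
   implemented, and negative inputs that round to zero give +0.0
   where rint gives -0.0. */
static double rint_sketch(double x)
{
    const double two52 = 4503599627370496.0;  /* 2^52: doubles this large are integral */
    long long n;
    double frac;

    if (x != x || x >= two52 || x <= -two52)
        return x;                 /* NaN, infinity, or already an integer */
    n = (long long)x;             /* truncates toward zero; exact since |x| < 2^52 */
    frac = x - (double)n;         /* exact fractional part, in (-1, 1) */
    if (frac > 0.5 || (frac == 0.5 && n % 2 != 0))
        n++;                      /* round up; ties go to the even neighbour */
    else if (frac < -0.5 || (frac == -0.5 && n % 2 != 0))
        n--;                      /* round down; ties go to the even neighbour */
    return (double)n;
}
```

For example, 2.5 rounds to 2.0 and 3.5 to 4.0, matching rint in its default mode.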

> 
> > o the generated C code may define some empty arrays, which are not
> >   accepted by the Microsoft compiler
> 
> Those can probably be eliminated by the C codegen.
> 

Sure. I just didn't want to break everything before I could evaluate the
concept.
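One possible way for a C code generator to avoid empty arrays, assuming it can emit each table's real length alongside the table: always emit at least one element, and let the recorded length (not sizeof) decide how many entries are meaningful. An empty initializer list is a GCC extension before C23. The table and function names below are hypothetical and do not correspond to arrays MLton actually generates.

```c
#include <stddef.h>

/* Hypothetical generated table that is logically empty. A placeholder
   element keeps the declaration valid for compilers that reject empty
   initializer lists and zero-size arrays; the separate length says how
   many entries are real. */
static const int exampleTable[] = { 0 /* placeholder, never read */ };
static const size_t exampleTableLength = 0;

/* Code that walks the table must use the recorded length, not sizeof. */
static long exampleTableSum(void)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < exampleTableLength; i++)
        sum += exampleTable[i];
    return sum;
}
```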

> > - write a Perl script that converts the assembly code MLton
> >   generates in order to build it using the Microsoft assembler
> 
> What changes were needed here?
> 
> > Unfortunately, the binaries I get by doing this always crash at the
> > beginning of the execution.
> 
> Hard to say what might be wrong without more information.

The syntax is really different:
- The Microsoft assembler uses Intel syntax, in which opcodes carry no
  suffix indicating the size of the operands; GAS uses AT&T syntax, in
  which they do (e.g. the 'l' in movl).
- The source and destination operands have to be swapped.
- etc.
Anyway, it remains assembly code, so it should be a one-to-one (isomorphic)
transformation.
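To illustrate the two differences listed above, here is a hypothetical C sketch (not the Perl script mentioned earlier) that rewrites a single AT&T instruction with two register or decimal-immediate operands into Intel syntax: it drops the size suffix and the % and $ sigils, and swaps the operands. It assumes every mnemonic carries a b/w/l/q suffix. Memory operands are deliberately rejected, because Intel syntax uses a different addressing form for them and often an explicit size such as DWORD PTR.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Strips the AT&T sigil from a register (%eax) or decimal immediate ($4)
   operand. Returns 0 on success, -1 for operands outside this sketch's
   scope (memory references, symbols, hex immediates, ...). */
static int convert_operand(const char *in, char *out, size_t outlen)
{
    const char *p;

    if (in[0] == '%') {
        in++;                               /* register: drop '%' */
    } else if (in[0] == '$') {
        in++;                               /* immediate: drop '$' */
        p = (*in == '-') ? in + 1 : in;
        if (*p == '\0')
            return -1;
        for (; *p != '\0'; p++)
            if (!isdigit((unsigned char)*p))
                return -1;                  /* only decimal constants */
    } else {
        return -1;                          /* memory or symbolic operand */
    }
    if (*in == '\0' || strlen(in) + 1 > outlen)
        return -1;
    strcpy(out, in);
    return 0;
}

/* Converts e.g. "movl %eax, %ebx" to "mov ebx, eax".
   Returns 0 on success, -1 if the line is outside the supported subset. */
static int att_to_intel(const char *att, char *out, size_t outlen)
{
    char mnem[16], src[32], dst[32], isrc[32], idst[32];
    size_t n;

    if (sscanf(att, " %15s %31[^,], %31s", mnem, src, dst) != 3)
        return -1;
    n = strlen(mnem);
    if (n < 2 || strchr("bwlq", mnem[n - 1]) == NULL)
        return -1;                          /* expected a size suffix */
    mnem[n - 1] = '\0';                     /* Intel syntax: no suffix */
    if (convert_operand(src, isrc, sizeof isrc) != 0 ||
        convert_operand(dst, idst, sizeof idst) != 0)
        return -1;
    if (strlen(mnem) + strlen(idst) + strlen(isrc) + 4 > outlen)
        return -1;                          /* room for " ", ", " and NUL */
    sprintf(out, "%s %s, %s", mnem, idst, isrc);  /* operands swapped */
    return 0;
}
```

For example, "addl $4, %esp" becomes "add esp, 4". A real converter would also have to translate assembler directives and symbol references, which this sketch does not attempt.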

As I mentioned, I'm not an expert, and I could not find a way to build an
object file that does not crash. This is the part where I would appreciate
some help.

> 
> > As I'm not really an expert in using the Microsoft assembler, I tried to
> > simplify this a bit by skipping the assembly code conversion script and
> > using the GNU assembler anyway (but still compiling the runtime and
> > linking using the Microsoft tools).
> 
> You could also try starting with just the C codegen.  That tends to be a
> little more portable and easier to debug.
> 

Yes, and it seemed to work, which is why I decided to go further. However,
we get a 40% performance increase with the native codegen, so in the end a
port that only works with the C codegen won't be usable for me.

> > This time, I could get some binaries (even some big ones) that run
> > pretty well.
> 
> Very, very nice!
> 
> Nicolas, you might prepare a patch of the changes you needed to make and
> post it to:
>    http://www.mlton.org/TemporaryUpload
> That would let us take a look at what has been done.

I worked on the source code I downloaded from the amd64 branch before you
merged it into the trunk. It's probably better that I redo the changes on a
clean copy of the trunk. A lot is happening at my company these days and
I'll soon be on vacation, so I'm not sure I can do it very quickly.




