[MLton-user] From machine code back to the source code

Stephen Weeks MLton@mlton.org
Fri, 27 May 2005 17:25:55 -0700


> For various reasons (checking code quality, running oprofile) I'd like
> to find the approximate source code to parts of the machine code MLton
> has emitted.

A search for oprofile at the MLton site points at the following
thread.

  http://mlton.org/pipermail/mlton/2004-November/026347.html

So, at least we know that it works, although it looks like the poster
got stuck on finding the names, just as you did.

> Is it possible to instruct MLton to add comments with source location
> information to the assembler file (or produce a map file)?  At least
> when profiling information is generated, this data has to be stored
> somewhere.

There is no way to cause the assembler to be annotated.  For
profiling, the information is stored in various C structures in the
generated C file.  If you compile with "-keep g -profile time", the .S
files will have a label at the beginning of each basic block,
MLtonProfile<N>.  The source positions corresponding to these labels
can be gleaned from the .c file.

The sourceLabels array has one element for each MLtonProfile label.
Each element is a struct GC_sourceLabel, in which the label is
MLtonProfile<N> and the sourceSeqsIndex is an index into sourceSeqs.
That element of sourceSeqs is a pointer to an array of indices into the
sources array (the first element is the length).  Each element of
sources is a struct GC_source, where the nameIndex is an index into
sourceNames, which will actually give you a human readable name for
the source position as a string.
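
To make the chain of lookups concrete, here is a minimal sketch in
Python that models the four arrays as plain lists.  The contents are
made up for illustration (the name strings and indices are
hypothetical, and only the fields mentioned above are modelled); the
real data lives in the generated .c file.

```python
# Hypothetical miniature of the arrays found in the generated .c file.
# All values below are invented for illustration.
source_names = ["<unknown>", "fib  fib.sml: 3", "main  fib.sml: 10"]

# Each source is modelled with only its nameIndex field.
sources = [{"nameIndex": 1}, {"nameIndex": 2}]

# Each sequence: first element is the length, the rest index into sources.
source_seqs = [[1, 1], [2, 1, 0]]

# One entry per MLtonProfile<N> label.
source_labels = [
    {"label": "MLtonProfile0", "sourceSeqsIndex": 0},
    {"label": "MLtonProfile1", "sourceSeqsIndex": 1},
]


def positions_for_label(label):
    """Follow sourceLabels -> sourceSeqs -> sources -> sourceNames."""
    for entry in source_labels:
        if entry["label"] == label:
            seq = source_seqs[entry["sourceSeqsIndex"]]
            length = seq[0]
            return [source_names[sources[i]["nameIndex"]]
                    for i in seq[1:1 + length]]
    raise KeyError(label)


if __name__ == "__main__":
    for entry in source_labels:
        print(entry["label"], positions_for_label(entry["label"]))
```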

There is already a runtime routine in gc.c, showProf, that prints out
the data that mlprof needs from the C arrays.  You can invoke this
routine by calling an executable with "@MLton show-prof --".  What's
missing from that data is the connection between the MLtonProfile
labels and the sourceSeqsIndex, i.e. the sourceLabels array.  That
isn't needed by mlprof because it's only used at runtime to map the
program counter to the appropriate sourceSeqsIndex.

You could write a variant of showProf that prints out the
sourceLabels array in addition to the other data.  In fact, the loop is
already there, in the code conditioned on DEBUG_PROFILE in
profileTimeInit.  That loop prints the addresses of the MLtonProfile
labels.  Then, all that's missing is the mapping between the human
name MLtonProfile<N> and its address.  nm on the executable gives you
that.  So, a wrapper that calls the showProf variant and matches up
the data with what nm gives you should provide a nicely displayed map
from MLtonProfile<N> to sequence of source positions.
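
A rough sketch of the matching step in such a wrapper, in Python.  It
assumes nm's default output of "<hex address> <type> <name>" per
symbol, and it assumes the showProf variant prints one
"<hex address> <sourceSeqsIndex>" pair per line; that second format is
purely hypothetical, since the variant does not exist yet.  The sample
addresses are made up.

```python
def parse_nm(nm_text):
    """Map address -> symbol name for MLtonProfile<N> labels in nm output."""
    table = {}
    for line in nm_text.splitlines():
        parts = line.split()
        # Defined symbols appear as "<hex address> <type> <name>".
        if len(parts) == 3 and parts[2].startswith("MLtonProfile"):
            table[int(parts[0], 16)] = parts[2]
    return table


def parse_label_dump(dump_text):
    """Parse the hypothetical dump: "<hex address> <sourceSeqsIndex>" lines."""
    pairs = []
    for line in dump_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            pairs.append((int(parts[0], 16), int(parts[1])))
    return pairs


def join_labels(nm_text, dump_text):
    """Return {label name: sourceSeqsIndex}, matched by address."""
    by_addr = parse_nm(nm_text)
    return {by_addr[addr]: idx
            for addr, idx in parse_label_dump(dump_text)
            if addr in by_addr}


# Made-up sample inputs, for illustration only.
NM_SAMPLE = """0804a1b0 t MLtonProfile0
0804a1f4 t MLtonProfile1
0804a000 T main
"""
DUMP_SAMPLE = """0x0804a1b0 0
0x0804a1f4 1
"""

if __name__ == "__main__":
    for name, idx in sorted(join_labels(NM_SAMPLE, DUMP_SAMPLE).items()):
        print(name, "-> sourceSeqsIndex", idx)
```

In practice the two input strings would come from running nm on the
executable and from running the executable with whatever runtime
argument the showProf variant ends up using.  From each
sourceSeqsIndex, the source positions follow from the sourceSeqs,
sources, and sourceNames data that show-prof already prints.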