[MLton] can mlbasis rename the top-level?

Matthew Fluet fluet@cs.cornell.edu
Tue, 6 Sep 2005 22:49:00 -0400 (EDT)


> In all fairness, if we agree that people would mostly be using the feature to
> embed short snippets of SML, then I don't find this a convincing argument. The
> shorter the snippet of code is, the faster you can spot unbalanced parentheses.
>
> I would also expect that parentheses would be relatively rare in 
> embedded SML. The reason for this is that you would mostly be dealing 
> with modules and renaming top-level declarations. Not much need to use 
> parentheses there.

I don't agree that the feature is just for short snippets of SML.  In 
particular, all the renaming of module-level constructs (structures, 
signatures, and functors) is already available in MLBs without the 
embedding feature.  If you want a self-contained script, then you'll have 
a lot of SML embedded.

>> But, suppose you sent this to mlton.  What's likely to happen is that 
>> you completely throw the parser off.  Worse, if you had an extra close 
>> parenthesis in the middle of the SML code (i.e., ending the embedded 
>> SML early), then you're likely to have a horribly opaque MLB parse 
>> error pointing into the middle of what is manifestly SML code.
>
> Isn't that (having too many/few parens inside the SML code) also possible
> with the {<{ and }>} delimiters?
>
> Unless I'm mistaken, use of {<{ and }>} only makes it easier to see that the
> MLB delimiters are in place. It doesn't seem to help with delimiters inside
> the embedded SML code.

No, the point is that the lexer simply slurps up _everything_ from {<{ to 
}>} without interpreting it at all.  No trying to balance parens or 
grammatical constructs.  To be clear: I'm in favor of not even 
trying to recognize }>} inside SML constants or strings.
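
To make the "slurp everything" behaviour concrete, here is a minimal sketch 
in Python. It is not MLton's lexer; the function name and the surrounding 
MLB text in the demo are made up for illustration. It only shows that the 
scan stops at the first }>} and looks at nothing in between.

```python
OPEN, CLOSE = "{<{", "}>}"

def slurp_embedded(src, start=0):
    """Return (sml_text, index_after_close) for the first embedded block
    at or after `start`, or None if there is no opening delimiter.

    Hypothetical helper: the text between the delimiters is copied
    verbatim, with no attempt to balance parens or skip SML strings.
    """
    i = src.find(OPEN, start)
    if i < 0:
        return None
    body_start = i + len(OPEN)
    j = src.find(CLOSE, body_start)
    if j < 0:
        raise SyntaxError("unterminated embedded SML: missing " + CLOSE)
    return src[body_start:j], j + len(CLOSE)

# An unbalanced paren passes straight through; the SML parser, not the
# MLB lexer, would be the one to complain about it.
mlb = 'local {<{ val x = (1 + 2 }>} in end'
text, rest = slurp_embedded(mlb)

# A }>} inside an SML string literal ends the block early, which is the
# trade-off accepted above.
early, _ = slurp_embedded('{<{ val s = "}>}" }>}')
```

Here `text` is ' val x = (1 + 2 ' and `early` is ' val s = "', so the 
delimiters are the only thing the MLB side ever inspects.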

>> Similarly, if you try counting {local,let,sig,struct}...end blocks, an SML
>> parse error might manifest itself as an inexplicable MLB parse error.
>
> That seems preferable to me. To be more precise, I think that it would be
> better to get a parse error while parsing the MLB file rather than possibly
> much later while parsing the embedded SML code. It seems to me that counting
> blocks would allow earlier reporting of errors.

But now you'll have some ad-hoc error generated in the MLB lexer, which 
won't be the same as the corresponding SML grammar error that you would 
have gotten if the code were in a separate file.
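
For comparison, a deliberately naive sketch of the block-counting 
alternative, again in Python and again hypothetical (the function name and 
error messages are invented, and it ignores SML strings and comments). It 
illustrates that any mismatch surfaces as an error invented by the MLB 
side, and that keywords inside string literals can fool the count.

```python
import re

# Keywords assumed to open a block that is closed by `end`.
OPENERS = {"local", "let", "sig", "struct"}

def find_embedded_end(src):
    """Return the index just past the `end` that closes the first block
    opened in `src`. Raises an ad hoc error if the count never balances.
    """
    depth = 0
    for m in re.finditer(r"[A-Za-z_][A-Za-z0-9_']*", src):
        word = m.group()
        if word in OPENERS:
            depth += 1
        elif word == "end":
            depth -= 1
            if depth < 0:
                raise SyntaxError("MLB lexer: unexpected 'end' in embedded SML")
            if depth == 0:
                return m.end()
    raise SyntaxError("MLB lexer: unbalanced block in embedded SML")

ok = 'struct val x = 1 end'
ok_end = find_embedded_end(ok)            # consumes the whole string

fooled = 'struct val s = "end" end'
fooled_end = find_embedded_end(fooled)    # stops inside the string literal
```

A missing `end`, as in 'struct val x = let val y = 1 in y end', is 
reported here as "unbalanced block", which is not the message an SML 
parser would give for the same code in a separate file.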