[MLton-user] SML unicode support

Alexandre Xlex0x835@rambler.ru
Wed, 5 Jan 2005 21:14:37 +0300

Dear subscribers,

this message was "moved" from comp.lang.ml because of its slow 
message moderation.

 >The SML Basis library has an optional structure, WideChar, that
 >supports Unicode.  However, neither MLton nor SML/NJ implements
 >WideChar.  Also, neither compiler supports UTF-8 (or otherwise)
 >encoded string constants.

And what's the problem with WideChar? Is it difficult to implement?
As far as I understand, I can store a UTF-8 character in a C char 
variable. At least the following C example (written to test this idea) 
works fine with mixed Russian and English UTF-8 text (it just copies 
symbol by symbol from one text file, test_file.in, to another, 
test_file.out; tested on Darwin 7.7):
#include <stdio.h>
#include <stdlib.h>

int main (void) {

	FILE *testfile_in, *testfile_out;
	unsigned char ch1 = 0, ch2 = 0;
	int c; /* getc returns int so EOF can be told apart from any byte */
	printf("sizeof ch1 - %zu bytes\n", sizeof ch1);
	printf("sizeof ch2 - %zu bytes\n", sizeof ch2);
	testfile_in = fopen("test_file.in", "r");
	if (testfile_in == NULL) {
		fprintf(stderr, "Input file open error.\n");
		return EXIT_FAILURE;
	}/* if */

	testfile_out = fopen("test_file.out", "w");
	if (testfile_out == NULL) {
		fprintf(stderr, "Output file open/creation error.\n");
		fclose(testfile_in);
		return EXIT_FAILURE;
	}/* if */

	while ((c = getc(testfile_in)) != EOF) {
		ch1 = (unsigned char) c;
		ch2 = ch1; /* make a copy */
		putc(ch2, testfile_out);
	}/* while */

	fclose(testfile_in);
	fclose(testfile_out);
	return 0;
}/* main */

So, from http://mlton.org/ForeignFunctionInterfaceTypes it seems 
possible to conclude that an SML char/string (which correspond to C 
char/char* respectively) should be able to handle a UTF-8 
character/string... Or am I misunderstanding something?

 >We are working on adding support for Unicode to MLton, and expect it
 >to be in our next release.

That's nice! Actually, I'm amazed by the MLton team, and Stephen in 
particular - they have managed to do so many things and do them so well! ;)

 >In the meantime, you might have a look at fxp, which implements
 >Unicode encoders and decoders in SML without any compiler support.

Thanks for the link.