[MLton] symbol scopes

Wesley W. Terpstra wesley at terpstra.ca
Wed Nov 12 10:56:26 PST 2008


On Wed, Nov 12, 2008 at 7:17 PM, Matthew Fluet <fluet at tti-c.org> wrote:
> I still don't know that I understand the necessity for the difference
> between public and external for an imported symbol.  [Indeed, on ELF, there
> appears to be no difference.]

That's correct. They're the same on ELF.

On Darwin, however, you need to output that function call stub to
access external functions. One could output it always (MLton used to
do this), so in this case it's only an optimization.

> My intuition is the following: a symbol that
> is publicly exported can be accessed via an 'external' assembly sequence
> by another DSO; since that sequence works for another DSO, why doesn't it
> also work for this DSO?

This is the case for every single platform except Windows. The problem
there is that the address of an imported external symbol is copied by
the runtime loader into the local symbol __imp__symname. These
__imp__ symbols appear in your program when you link against the
import library for a DLL. So within the DLL itself, __imp__symname
doesn't exist (the import library is a separate thing) and you can't
access the symbol this way. Outside of the DLL, you need to look in
__imp__symname to find the address of the symbol you want to access.
From this pure point of view you would always have to get
public/external right for linking to work on Windows.

However, as a compatibility hack to make *nix-like function calls
work, MinGW also adds to the import libraries a local function called
'symname' which calls *__imp__symname. That's why naive code can
often get away with just accessing a function without the __imp__
during both static and dynamic linking. However, that function is
local and has a different address than the one stored in
__imp__symname, so if you take the address of symname, it is your
local proxy function, not *__imp__symname like it should be.
Furthermore, that trick only works for functions, not variables.
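
To make the proxy problem concrete, here is a rough model in portable
C. It does not touch a real DLL: dll_symname, imp_symname,
model_loader_bind and symname_proxy are all hypothetical names made up
for this sketch, with an ordinary function pointer standing in for the
__imp__symname slot that the loader would fill in.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the function that really lives in the DLL. */
int dll_symname(int x) { return x + 1; }

/* Models the import slot __imp__symname: a pointer variable that the
   runtime loader fills in with the function's address. */
int (*imp_symname)(int) = NULL;

/* Models the loader binding the import slot when the DLL is loaded. */
void model_loader_bind(void) { imp_symname = dll_symname; }

/* Models the MinGW-style proxy 'symname' from the import library:
   a local function that calls through the import slot. */
int symname_proxy(int x) { return (*imp_symname)(x); }

/* Binds the slot, then checks both observations from the text. */
void demo_import_slot(void) {
    model_loader_bind();
    /* Calling through the proxy reaches the real function... */
    assert(symname_proxy(41) == 42);
    /* ...but the proxy's own address is not the address in the slot. */
    assert(symname_proxy != imp_symname);
    printf("proxy and slot agree on results, differ in address\n");
}
```

Calling demo_import_slot() from main() shows why a plain call works
through the proxy while taking the function's address gives you
something other than the DLL's function.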

> Is it just an optimization?  Within the DSO, I can get to the public symbol
> via relative addressing, but from another DSO (with PIC), I need an
> indirection?

Actually, on ELF the optimization is going from public to private.
There is no benefit to public over external. If you try to access a
public symbol via relative addressing, the linker will reject it. This
is due to a "feature" of ELF where you can override symbols. If I have
a library with two public methods "foo" and "bar", with foo calling
bar, and you link in my library and provide your own bar, my foo will
call your bar. I personally think this is a horrible idea, but it is
part of the ELF ABI. For this reason, on ELF public lookups cannot be
done via relative addressing but must use the PLT.
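
The foo/bar situation can be sketched as a small library source. This
is only an illustration: DEMO_HIDDEN, bar_private, foo_private and
demo_interposition are names invented here, and the build line in the
comment is one plausible way to make a shared object with GCC, not a
required one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical library source; as a shared object it might be built
   with something like: cc -fPIC -shared -o libdemo.so demo.c */
#if defined(__GNUC__) && !defined(_WIN32)
#define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
#define DEMO_HIDDEN /* no ELF-style visibility on this toolchain */
#endif

/* Public: default visibility, so another module may supply its own bar. */
int bar(void) { return 1; }

/* Private: hidden visibility, always bound inside this DSO. */
DEMO_HIDDEN int bar_private(void) { return 1; }

/* In a shared object this call typically goes through the PLT, so a
   bar defined by the main program would be called instead. */
int foo(void) { return bar() + 10; }

/* This call cannot be overridden and may use a direct relative call. */
int foo_private(void) { return bar_private() + 10; }

/* With no competing definition of bar, both paths give the same answer. */
void demo_interposition(void) {
    assert(foo() == foo_private());
    printf("foo() = %d, foo_private() = %d\n", foo(), foo_private());
}
```

Compiled into a single program, nothing overrides bar and both calls
agree. The difference only shows up once this code sits in a DSO and
the program linking against it defines its own bar.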

So, no. Except for Darwin, public/external is not an optimization. On
Windows you really do need to get it right, and on ELF it makes no
difference.

> The other thing I don't understand is how gcc gets by with just two
> visibility directives ("default" and "hidden"); how does it make the
> private/public/external distinction?

Note that default/hidden are ELF specific. As you noted earlier, on ELF
public=external in every respect. This is why two suffice.

There are also more visibility attributes than just those two: gcc
also accepts internal and protected. However, these don't really
matter except as optimizations.

> I guess it is just for Win{32,64} that
> the EXTERNAL/PUBLIC/PRIVATE macros map to different annotations

Correct.

> (though, I think, that is partly due to the fact that a single C declaration needs to
> serve as both an export directive (for the definition) and an import
> directive (for other uses)).

No. They really do do different things as I explained above.

> But, for Darwin, both EXTERNAL and PUBLIC map
> to the same annotation --- is the MLton x86/amd64 codegen for Darwin simply
> avoiding a cleanup step that would be performed by the linker (if we treated
> 'public' as 'external')?

I'm not sure what you're asking me. Currently the Darwin codegen
avoids making PLT stub indirections for public symbols, but that's
just an optimization.

Just a note, I think if you tried to use _import "foo" external for an
_export "foo" private, you would have problems on both Darwin and ELF.

Anyway, it's true that no single operating system (so far) needs all
three cases at once, although I wouldn't be so sure for other
architectures. These three categories were the simplest framework I
could come up with that, when used according to the rules, would work
on every system I've seen.

The rules aren't that complicated:
1. match private/public within a DSO
2. import using external from another DSO
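
As a sketch of how the two rules look from the C side, here is one
possible mapping of the three categories onto compiler annotations.
The DEMO_* macro names and the functions are hypothetical, and the
choice of __declspec(dllimport)/__declspec(dllexport) on Windows and
the visibility attribute elsewhere is my assumption about a reasonable
mapping, not a copy of MLton's actual headers.

```c
#include <assert.h>
#include <stdio.h>

#if defined(_WIN32) || defined(__CYGWIN__)
/* Windows: the three categories really are three different things. */
#define DEMO_EXTERNAL __declspec(dllimport)
#define DEMO_PUBLIC   __declspec(dllexport)
#define DEMO_PRIVATE
#elif defined(__GNUC__)
/* ELF-style visibility: public and external collapse into "default". */
#define DEMO_EXTERNAL __attribute__((visibility("default")))
#define DEMO_PUBLIC   __attribute__((visibility("default")))
#define DEMO_PRIVATE  __attribute__((visibility("hidden")))
#else
#define DEMO_EXTERNAL
#define DEMO_PUBLIC
#define DEMO_PRIVATE
#endif

/* Rule 2: a symbol from another DSO is imported as external.
   other_dso_fn is hypothetical and deliberately never called here. */
DEMO_EXTERNAL int other_dso_fn(int);

/* Rule 1: symbols defined in this DSO are private or public, and every
   declaration of them within this DSO uses the same category. */
DEMO_PRIVATE int demo_helper(int x);
DEMO_PUBLIC  int demo_api(int x);

DEMO_PRIVATE int demo_helper(int x) { return x * 2; }
DEMO_PUBLIC  int demo_api(int x) { return demo_helper(x) + 1; }

/* Exercises the public entry point, which uses the private helper. */
void demo_scopes(void) {
    assert(demo_api(2) == 5);
    printf("demo_api(2) = %d\n", demo_api(2));
}
```

The point of the sketch is only that user code picks one of three
words and the platform-specific part stays inside the macro
definitions.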

I also had hoped that the three categories are simple to understand.
It's not obvious how they map to each platform until you've really
understood the ABI of the target, but I am confident that they do
work. I am a bit nervous that an as-yet-unsupported platform might
have an ABI that doesn't fit into this categorization, but it's hard
to imagine how you could break scope further down than
private/public/external.


