Difference between revisions of "DIP45"

From D Wiki
Line 1: Line 1:
{| class="wikitable"
+
{| class="wikitable" !Title: !'''making export an attribute''' |-
!Title:
+
|DIP: |45 |- |Version: |2 |- |Status: |Draft |- |Created: |2013-08-27
!'''making export an attribute'''
+
|- |Last Modified: |2013-09-06 |- |Author: |Benjamin Thaut, Martin
|-
+
Nowak |- |Links: |
|DIP:
 
|45
 
|-
 
|Version:
 
|1
 
|-
 
|Status:
 
|Draft
 
|-
 
|Created:
 
|2013-08-27
 
|-
 
|Last Modified:
 
|2013-09-01
 
|-
 
|Author:
 
|Benjamin Thaut
 
|-
 
|Links:
 
|
 
 
*http://forum.dlang.org/post/5112D61B.5010905@digitalmars.com
 
*http://forum.dlang.org/post/5112D61B.5010905@digitalmars.com
 
*http://forum.dlang.org/post/kvhu2c$2ikq$1@digitalmars.com
 
*http://forum.dlang.org/post/kvhu2c$2ikq$1@digitalmars.com
*http://d.puremagic.com/issues/show_bug.cgi?id=9816
+
*http://d.puremagic.com/issues/show_bug.cgi?id=9816 |}
|}
 
  
 
==Abstract==
 
==Abstract==
  
Export and its behavior need to be changed in serveral ways to make it work on Windows and allow better code generation for other plattforms. The Rationale section explains the problem and shows how this DIP solves it.
+
Export and its behavior need to be changed in serveral ways to make it
 +
work on Windows and allow better code generation for other
 +
plattforms. The Rationale section explains the problems and shows how
 +
this DIP solves them.
  
 
==Description==
 
==Description==
  
* The 'export' protection level should be turned into a 'export' attribute.
+
* The '''export''' protection level should be turned into a
* If a module contains a single symbol annotated with the 'export' attribute all compiler internal symbols of this module should recieve the 'export' attribute too (e.g. module info).
+
* '''export''' attribute. If a module contains a single symbol
* If a class is annotated with the 'export' attribute, all of its public and protected functions and members will automatically recieve the 'export' attribute. Also all its hidden compiler specific symbols will recieve the 'export' attribute.
+
* annotated with the 'export' attribute all compiler internal symbols
* It should be possible to access TLS variables across DLL / shared library boundaries.
+
* of this module should recieve the 'export' attribute too
* On *nix systems default symbol visibility is changed to hidden, and only with export marked symbols are visible.
+
* (e.g. module info). If a class is annotated with the 'export'
 +
* attribute, all of its public and protected functions and members
 +
* will automatically recieve the 'export' attribute. Also all its
 +
* hidden compiler specific symbols will recieve the 'export'
 +
* attribute. There should be only one meaning of 'export'.  It should
 +
* be possible to access TLS variables across DLL / shared library
 +
* boundaries. On *nix systems default symbol visibility is changed to
 +
* hidden, and only symbols marked with export become visible.
  
 
==Rationale==
 
==Rationale==
Line 44: Line 33:
 
===Turning export into an attribute===
 
===Turning export into an attribute===
  
Currently the export protection level is the highest level of visibility. This however does not allow exporting protected members. Exporting protected members is neccessary sometimes though.
+
Currently '''export''' is a protection level, the highest level of
Consider the following example.
+
visibility actually. This however conflicts with the need to export
 +
'protected' symbols. Consider a Base class in a shared library.
  
<syntaxhighlight lang=D>
+
<syntaxhighlight lang=D> module sharedLib;
module sharedLib;
 
  
class Base
+
class Base { protected final void doSomething() { ... } }
{
 
  protected final void DoSomething() { ... }
 
 
 
  public void func()
 
  {
 
    DoSomething();
 
  }
 
}
 
 
</syntaxhighlight>
 
</syntaxhighlight>
  
  
<syntaxhighlight lang=D>
+
<syntaxhighlight lang=D> module executable; import sharedLib;
module executable;
 
import sharedLib;
 
  
class Derived
+
class Derived : Base { public void func() { doSomething(); } }
{
 
  protected final void DoSomethingAdditional() { ... }
 
 
 
  public ovveride void func()
 
  {
 
    DoSomething();
 
    DoSomethingAdditional();
 
  }
 
}
 
 
</syntaxhighlight>
 
</syntaxhighlight>
  
In the above example 'DoSomething' should only be visible to derived classes but should still be exported from a shared library otherwise it will lead to a linker error. This does not work with the current implementation of 'export'. Turning 'export' into an attribute will make this work.
+
In the above example 'doSomething' should only be visible to derived
 +
classes but it still needs to be exportable from a shared library.
 +
Therefor '''export''' should become a normal attribute which behaves
 +
orthogonal to protection.
  
===Implicitly exporting compiler internal symbols of a module===
+
===Implicitly exporting compiler internal symbols===
  
Currently compiler internal symbols of modules are not exported at all. This leads to linker errors when linking against shared libraries on windows. By marking compiler internal symbols as 'export' as soon as there is a single exported symbol in the module will fix this problem.
+
All compiler internal symbols need to be treated as exported if using
 +
an exported symbol might implicitly reference them to avoid link
 +
errors. The most prominent example is the ModuleInfo which needs
 +
linkage if the module has a ''static this()''.
  
===export attribute inferrence===
+
===export attribute inference===
  
Currently export has to be specified in a lot of places to export all neccessary functions and data symbols. Export should be transitive in such a sense that it only needs to be specified once in a module to export all of its functions / data members including classes and their members / data symbols. Consider the following example:
+
Currently export has to be specified in a lot of places to export all
 +
neccessary functions and data symbols. Export should be transitive in
 +
such a sense that it only needs to be specified once in a module to
 +
export all of its functions / data members including classes and their
 +
members / data symbols. Consider the following example:
  
<syntaxhighlight lang=D>
+
<syntaxhighlight lang=D> module sharedLib:
module sharedLib:
 
  
 
export:
 
export:
Line 97: Line 76:
 
void globalFunc() { ... } // should be exported
 
void globalFunc() { ... } // should be exported
  
class A // compiler internal members should be exported
+
class A // compiler internal members should be exported { private: int
{
+
m_a;
  private:
 
    int m_a;
 
  
 
     static int s_b; // should not be exported
 
     static int s_b; // should not be exported
Line 106: Line 83:
 
     void internalFunc() { ... } // should not be exported
 
     void internalFunc() { ... } // should not be exported
  
   protected:
+
   protected: void interalFunc2() { ... } // should be exported
     void interalFunc2() { ... } // should be exported
+
 
 +
  public: class Inner // compiler internal members should be exported
 +
     { static s_inner; // should be exported
 +
 
 +
      void innerMethod() { ... } // should be exported }
  
  public:
+
     void method() { ... } // should be exported }
    class Inner // compiler internal members should be exported
 
     {
 
      static s_inner; // should be exported
 
  
      void innerMethod() { ... } // should be exported
+
private class C // should not be exported { public void method() {
    }
+
... } // should not be exported } </syntaxhighlight>
  
    void method() { ... } // should be exported
+
===A single meaning of '''export'''===
}
 
  
private class C // should not be exported
+
The classical solution to handle dllexport/dllimport attributes on
{
+
Windows is to define a macro that depending on the current build
  public void method() { ... } // should not be exported
+
setting expands to __declspec(dllexport) or to __declspec(dllimport).
}
+
This complicates the build setup and means that object files for a
</syntaxhighlight>
+
static library can't be mixed well with object files for a DLL.
 +
Instead we propose that export symbols are accompanied with weak
 +
import aliases and always accessed through them. See the
 +
implementation detail section for how this will work for
 +
[[#Data_Symbols|data symbols]] and [[#Function_Symbols|function
 +
symbols]]. That way a compiled object file can be used for a DLL or a
 +
static library. And vice versa an object file can be linked against an
 +
import library or a static library.
  
 
===Access TLS variables===
 
===Access TLS variables===
  
Currently it is not possible to access TLS variables across shared library boundaries. This should be implemented. See implementation details.
+
Currently it is not possible to access TLS variables across shared
 +
library boundaries on Windows. This should be implemented (see
 +
[[#TLS_variables|implementation details]] for a proposal).
  
 
===Change symbol visibility on *nix systems===
 
===Change symbol visibility on *nix systems===
  
When building shared libraries on *nix systems all symbols are visible by default. This can lead to long loading time of shared libraries if they contain a large number of symbols.
+
When building shared libraries on *nix systems all symbols are visible
Also everything visible by deafult makes it hard to commit to a certain stable interface, as one has no control over the visible symbols.
+
by default. This is a main reason for the performance impact of PIC
 +
because every data access and every function call go through the GOT
 +
or PLT indirection.  It also leads to long loading time because an
 +
excessive number of relocations have to be processed and it
 +
significantly reduces the size of the dynamic symbol table (faster
 +
lookup and smaller libraries).  See http://gcc.gnu.org/wiki/Visibility
 +
and http://people.redhat.com/drepper/dsohowto.pdf for more details.
 +
 
 +
Also making every symbol accessible can inadvertently cause ABI
 +
dependencies making it harder to maintain libraries.
  
 
==Implementation Details==
 
==Implementation Details==
Line 141: Line 136:
 
==== Data Symbols ====
 
==== Data Symbols ====
  
For data symbols the 'export' attribute always means 'dllexport' when compiling a module and always 'dllimport' when importing a module. When compiling a module the compiler will generate a import symbol containing the address of the data symbol for each data symbol it encounters. The mangling of the import symbol should be the same as that of the symbol it referes to with a additional prefix. This prefix is '_imp_' on windows 32-bit and '__imp_' on windows 64-bit. So if the data symbol's mangling is '_D5ivar' the import symbol mangling is '_imp__D5ivar' on 32-bit and '__imp__D5ivar' on 64-bit. The import symbol should NOT be expoted to avoid a conflict with the import symbols which are generated within the import library by the linker.
+
For data symbols the 'export' attribute always means 'dllexport' when
 
+
defining a symbol and 'dllimport' when accessing a symbol.  That is
<syntaxhighlight lang=D>
+
accessing an exported variable is done through dereferencing it's
module a;
+
corresponding import symbols. When defining an exported variable the
export __gshared int var = 5;
+
compiler will emit a corresponding import symbol that is initialized
__gshared int* _imp_var = &var; // import symbol generated by the compiler
+
with address of the variable.  The import symbol can be located in the
 +
read only data segment. The mangling of the import symbol consists of
 +
the '_imp_'/'__imp_' (Win32/Win64) prefix followed by the mangled name
 +
of the variable. Import symbols itself are not exported. When an
 +
exported variable of the same module is accessed the compiler might
 +
avoid the indirection and perform a direct access.
  
void func()
+
<syntaxhighlight lang=D> module a; export __gshared int var = 5;
{
+
__gshared int* _imp__D1a3vari = &var; // import symbol generated by
  var = 3; // accesses var directly, because in the same module
+
the compiler
}
 
</syntaxhighlight>
 
  
 +
void func() { var = 3; // accesses var directly, because in the same
 +
module } </syntaxhighlight>
  
<syntaxhighlight lang=D>
 
module b;
 
import a;
 
  
void bar()
+
<syntaxhighlight lang=D> module b; import a;
{
 
  var = 5; // accesses var through the _imp_var indirection because var is marked as export and is located in a different module
 
  // *_imp_var = 5; // code generated by the compiler
 
}
 
</syntaxhighlight>
 
  
When accessing data symbols which are marked with 'export' the access should always be done through the additional level of indirection using the import symbol unless the data symbol is located within the same module currently beeing compiled. Access through data symbols not marked with 'export' does not change in any way.
+
void bar() { var = 5; // accesses var through the *_imp__D1a3vari
 +
indirection because var is marked as export and is located in a
 +
different module // *_imp__D1a3vari = 5; // code generated by the
 +
compiler } </syntaxhighlight>
  
 
==== Function Symbols ====
 
==== Function Symbols ====
  
For function symbols the 'export' attribute always means 'dllexport' when compiling a module and 'export' always equals a no-op when importing a module. Because the import library will generate method stubs with the correct symbol names function symbols can be called normally no matter if they are imported from a shared library or linked in from a static library.
+
For function symbols the 'export' attribute always means 'dllexport'
 +
when defining a function and 'dllimport' when calling a function.
 +
That is calling an exported function is done through it's
 +
corresponding import symbol. When defining an exported function the
 +
compiler will emit a corresponding import symbols that is an alias to
 +
the function (See [http://blog.omega-prime.co.uk/?p=121#windows COFF weak externals]
 +
and [http://www.azillionmonkeys.com/qed/Omfg.pdf OMF ALIAS record]
 +
on how to implement aliases).  Thus calling an exported
 +
function becomes compatible with both import libraries and static
 +
libraries, in the later case without indirection.
  
 
==== TLS variables ====
 
==== TLS variables ====
  
On windows plattforms the compiler should generate a internal method for each TLS variable returing the addess of the TLS variable. These internal methods should have some kind of unified prefix to mark them as TLS import helpers. I propose "__access_tls_". These internal methods should also be exported from a shared library implicitly.
+
For each exported TLS variable the compiler should generate a function
 +
that returns the address of the TLS variable in the current
 +
thread. These internal methods should have some kind of unified prefix
 +
to mark them as TLS import helpers. I propose "__tlsstub_". These
 +
internal methods are also exported.  So when accessing an exported TLS
 +
variable the compiler will insert a call to
 +
'_imp__D1a15__tlsstub_g_tlsFZPi' instead.  As an optimization accesses
 +
to exported TLS variables within the same module can be performed
 +
directly.
  
<syntaxhighlight lang=D>
+
<syntaxhighlight lang=D> module a; export int g_tls = 5; // thread
module a;
+
local storage
export int g_tls = 5; // thread local storage
 
  
export int* __access_tls_g_tls() // generated by the compiler
+
export int* __tlsstub__g_tls() // generated by the compiler { return
{
+
&g_tls; } alias _imp___tlsstub__g_tls = __tlsstub__g_tls; // also
  return &g_tls;
+
generated by the compiler
}
 
  
void func()
+
void func() { g_tls = 3; // direct access because marked as export and
{
+
in the same module } </syntaxhighlight>
  g_tls = 3; // direct access because marked as export and in the same module
 
}
 
</syntaxhighlight>
 
  
  
<syntaxhighlight lang=D>
+
<syntaxhighlight lang=D> module b; import a;
module b;
 
import a;
 
  
void bar()
+
void bar() { g_tls = 10; // access through _imp___tlsstub__g_tls
{
+
function because marked as export and in a different module //
  g_tls = 10; // access through __access_tls_g_tls function because marked as export and in a imported module
+
*_imp___tlsstub__g_tls() = 10; // code generated by the compiler }
  // *__access_tls_g_tls() = 10; // code generated by the compiler
 
}
 
 
</syntaxhighlight>
 
</syntaxhighlight>
 
When accessing a TLS variable marked with export and located within the same module currently compiled direct access should be done. When accessing a TLS variable marked with export from a imported module the access should be going through a additional indirection using __access_tls_ helper function.
 
  
 
=== *nix ===
 
=== *nix ===
  
On *nix systems the default symbols visibility needs to be changed to hidden and only symbols marked with 'export' should be visible when compiling a shared library.
+
On *nix systems the default symbols visibility should be changed to
 +
hidden, i.e. -fvisibility=hidden argument of gcc.  Only symbols marked
 +
with '''export''' should get the attribute visible.
  
 
== Copyright ==
 
== Copyright ==
 +
 
This document has been placed in the Public Domain.
 
This document has been placed in the Public Domain.
  
 
[[Category: DIP]]
 
[[Category: DIP]]

Revision as of 05:21, 6 September 2013

Last Modified: 2013-09-06 | Author: Benjamin Thaut, Martin Nowak

Abstract

Export and its behavior need to be changed in several ways to make it work on Windows and to allow better code generation on other platforms. The Rationale section explains the problems and shows how this DIP solves them.

Description

  • The export protection level should be turned into an export attribute.
  • If a module contains a single symbol annotated with the 'export' attribute, all compiler internal symbols of this module should receive the 'export' attribute too (e.g. module info).
  • If a class is annotated with the 'export' attribute, all of its public and protected functions and members will automatically receive the 'export' attribute. All of its hidden compiler specific symbols will receive the 'export' attribute as well.
  • There should be only one meaning of 'export'.
  • It should be possible to access TLS variables across DLL / shared library boundaries.
  • On *nix systems default symbol visibility is changed to hidden, and only symbols marked with export become visible.

Rationale

Turning export into an attribute

Currently export is a protection level, in fact the highest level of visibility. This conflicts with the need to export 'protected' symbols. Consider a Base class in a shared library.

 module sharedLib;

class Base { protected final void doSomething() { ... } }


 module executable; import sharedLib;

class Derived : Base { public void func() { doSomething(); } }

In the above example 'doSomething' should only be visible to derived classes, but it still needs to be exportable from a shared library. Therefore export should become a normal attribute that behaves orthogonally to protection.

Implicitly exporting compiler internal symbols

Compiler internal symbols need to be treated as exported whenever using an exported symbol might implicitly reference them; otherwise link errors result. The most prominent example is the ModuleInfo, which needs linkage if the module has a static this().
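As a rough illustration of why the ModuleInfo matters, the following sketch checks at run time that a module with a static this() has a ModuleInfo record registered with the runtime. The helper name hasModuleInfo and the variable initialized are hypothetical and exist only for this example; the sketch runs inside a single executable and does not involve a shared library.

```d
import std.stdio : writeln;

int initialized; // thread-local; set by the module constructor below

// A module constructor. The runtime locates it through this module's
// ModuleInfo, which is why that symbol must be linkable.
static this()
{
    initialized = 42;
}

// Hypothetical helper: returns true if the runtime knows a ModuleInfo
// record with the given fully qualified module name.
bool hasModuleInfo(string name)
{
    foreach (m; ModuleInfo)
    {
        if (m.name == name)
            return true;
    }
    return false;
}

void main()
{
    writeln(initialized);
    writeln(hasModuleInfo(__MODULE__));
}
```

If the module lived in a DLL and its ModuleInfo were not exported, a client that pulls in the module would presumably fail at link time rather than at run time.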

export attribute inference

Currently export has to be specified in a lot of places to export all necessary functions and data symbols. Export should be transitive in the sense that it only needs to be specified once in a module to export all of its functions and data symbols, including classes and their members. Consider the following example:

module sharedLib;

export:

__gshared int g_Var; // should be exported

void globalFunc() { ... } // should be exported

class A // compiler internal members should be exported
{
  private:
    int m_a;

    static int s_b; // should not be exported

    void internalFunc() { ... } // should not be exported

  protected:
    void internalFunc2() { ... } // should be exported

  public:
    class Inner // compiler internal members should be exported
    {
      static int s_inner; // should be exported

      void innerMethod() { ... } // should be exported
    }

    void method() { ... } // should be exported
}

private class C // should not be exported
{
  public void method() { ... } // should not be exported
}

A single meaning of export

The classical solution for handling dllexport/dllimport attributes on Windows is to define a macro that, depending on the current build setting, expands to __declspec(dllexport) or to __declspec(dllimport). This complicates the build setup and means that object files for a static library can't be mixed well with object files for a DLL. Instead we propose that exported symbols are accompanied by weak import aliases and are always accessed through them. See the implementation details section for how this will work for data symbols and function symbols. That way a compiled object file can be used for a DLL or a static library, and conversely an object file can be linked against an import library or a static library.

Access TLS variables

Currently it is not possible to access TLS variables across shared library boundaries on Windows. This should be implemented (see implementation details for a proposal).

Change symbol visibility on *nix systems

When building shared libraries on *nix systems all symbols are visible by default. This is a main reason for the performance impact of PIC, because every data access and every function call goes through the GOT or PLT indirection. It also leads to long loading times because an excessive number of relocations has to be processed. Hiding symbols by default avoids this and significantly reduces the size of the dynamic symbol table (faster lookup and smaller libraries). See http://gcc.gnu.org/wiki/Visibility and http://people.redhat.com/drepper/dsohowto.pdf for more details.

Also, making every symbol accessible can inadvertently create ABI dependencies, which makes libraries harder to maintain.

Implementation Details

Windows

Data Symbols

For data symbols the 'export' attribute always means 'dllexport' when defining a symbol and 'dllimport' when accessing a symbol. That is, accessing an exported variable is done by dereferencing its corresponding import symbol. When defining an exported variable the compiler will emit a corresponding import symbol that is initialized with the address of the variable. The import symbol can be located in the read only data segment. The mangling of the import symbol consists of the '_imp_'/'__imp_' (Win32/Win64) prefix followed by the mangled name of the variable. Import symbols themselves are not exported. When an exported variable of the same module is accessed, the compiler may avoid the indirection and perform a direct access.

module a;

export __gshared int var = 5;
__gshared int* _imp__D1a3vari = &var; // import symbol generated by the compiler

void func()
{
  var = 3; // accesses var directly, because in the same module
}


module b;
import a;

void bar()
{
  var = 5; // accesses var through the *_imp__D1a3vari indirection because var is marked as export and is located in a different module
  // *_imp__D1a3vari = 5; // code generated by the compiler
}
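The indirection described above can be imitated by hand in ordinary D. This is only a sketch: imp_var, storeThroughImport and importSymbolName are hypothetical names chosen for the illustration, and a real compiler would emit the import symbol itself under its mangled name.

```d
import std.stdio : writeln;

// Stand-in for the exported variable.
__gshared int var = 5;

// Hand-written stand-in for the compiler-generated import symbol:
// a pointer slot initialized with the address of `var`.
__gshared int* imp_var = &var;

// What an access from another module would be lowered to:
// one extra dereference through the import symbol.
void storeThroughImport(int value)
{
    *imp_var = value;
}

// Builds an import symbol name from a mangled name, using the
// '_imp_' (Win32) and '__imp_' (Win64) prefixes named in the text.
string importSymbolName(string mangled, bool win64)
{
    return (win64 ? "__imp_" : "_imp_") ~ mangled;
}

void main()
{
    storeThroughImport(3);
    writeln(var);                                  // 3
    writeln(importSymbolName("_D1a3vari", false)); // _imp__D1a3vari
    writeln(importSymbolName("_D1a3vari", true));  // __imp__D1a3vari
}
```

The extra dereference is the whole cost of the scheme for data, which is why the text allows the compiler to skip it for accesses inside the defining module.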

Function Symbols

For function symbols the 'export' attribute always means 'dllexport' when defining a function and 'dllimport' when calling a function. That is, calling an exported function is done through its corresponding import symbol. When defining an exported function the compiler will emit a corresponding import symbol that is an alias to the function (see COFF weak externals and OMF ALIAS record on how to implement aliases). Thus calling an exported function becomes compatible with both import libraries and static libraries, in the latter case without indirection.
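The two linking situations can be approximated in plain D as follows. This is a sketch under the assumption that a language-level alias behaves like the linker alias for the static library case and that a function pointer behaves like the import library slot; twice, imp_twice_alias and imp_twice_slot are hypothetical names.

```d
import std.stdio : writeln;

// Stand-in for an exported function.
int twice(int x)
{
    return 2 * x;
}

// Static library case: the import symbol is just another name for the
// function, so a call through it is a direct call.
alias imp_twice_alias = twice;

// Import library case: the import symbol is a pointer slot that the
// loader would fill in, so a call through it is an indirect call.
// Here the slot is filled by hand.
__gshared int function(int) imp_twice_slot = &twice;

void main()
{
    writeln(imp_twice_alias(21)); // 42
    writeln(imp_twice_slot(21));  // 42
}
```

Both calls look identical at the call site, which is the property that lets one object file serve either kind of library.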

TLS variables

For each exported TLS variable the compiler should generate a function that returns the address of the TLS variable in the current thread. These internal methods should have some kind of unified prefix to mark them as TLS import helpers. I propose "__tlsstub_". These internal methods are also exported. So when accessing an exported TLS variable the compiler will insert a call to '_imp__D1a15__tlsstub_g_tlsFZPi' instead. As an optimization accesses to exported TLS variables within the same module can be performed directly.

module a;

export int g_tls = 5; // thread local storage

export int* __tlsstub_g_tls() // generated by the compiler
{
  return &g_tls;
}
alias _imp___tlsstub_g_tls = __tlsstub_g_tls; // also generated by the compiler

void func()
{
  g_tls = 3; // direct access because marked as export and in the same module
}


module b;
import a;

void bar()
{
  g_tls = 10; // access through the _imp___tlsstub_g_tls function because marked as export and in a different module
  // *_imp___tlsstub_g_tls() = 10; // code generated by the compiler
}
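A hand-written version of such an accessor shows the property that matters: the returned address belongs to the calling thread. The sketch below stays inside one executable; tlsstub_g_tls and readInNewThread are hypothetical names, and the compiler-generated stub in the proposal would carry the "__tlsstub_" prefix instead.

```d
import core.thread : Thread;
import std.stdio : writeln;

int g_tls = 5; // module-level variables are thread-local by default in D

// Hand-written stand-in for the generated helper: returns the address
// of g_tls in the thread that calls it.
int* tlsstub_g_tls()
{
    return &g_tls;
}

// Reads g_tls through the helper from a freshly started thread.
int readInNewThread()
{
    int seen;
    auto t = new Thread({ seen = *tlsstub_g_tls(); });
    t.start();
    t.join();
    return seen;
}

void main()
{
    *tlsstub_g_tls() = 10;      // write through the helper in the main thread
    writeln(g_tls);             // 10
    writeln(readInNewThread()); // 5: the new thread has its own copy
}
```

Because the helper is an ordinary function, exporting it across a DLL boundary needs no special TLS support from the linker, which appears to be the motivation for this design.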

*nix

On *nix systems the default symbol visibility should be changed to hidden, which corresponds to the -fvisibility=hidden argument of gcc. Only symbols marked with export should get visible visibility.
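In source terms, a library author would then mark only the intended interface. The sketch below is valid D today, but the hiding of the unmarked function is the behavior proposed here, not something this code demonstrates by itself; lib_add and addImpl are hypothetical names.

```d
import std.stdio : writeln;

// Intended public interface: marked export, so under the proposal it
// would stay visible in the shared library's dynamic symbol table.
export extern (C) int lib_add(int a, int b)
{
    return addImpl(a, b);
}

// Implementation detail: not marked export, so under the proposal it
// would get hidden visibility and could be called without going
// through the PLT.
private int addImpl(int a, int b)
{
    return a + b;
}

void main()
{
    writeln(lib_add(2, 3)); // 5
}
```

On an ELF system the resulting set of visible symbols could be inspected with a tool such as nm -D once a compiler implements the proposed default.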

Copyright

This document has been placed in the Public Domain.

DIP: 45 | Version: 2 | Status: Draft | Created: 2013-08-27