Difference between revisions of "DIP38"

From D Wiki
Jump to: navigation, search
Line 28: Line 28:
  
 
== Abstract ==
 
== Abstract ==
In short, the compiler internally annotates ref-return functions with ref(i1,...,iN) indicating that the function may return argument j (j=i1...iN) by reference (possibly via field accesses), where j is also a ref input argument.
+
In short, the compiler internally annotates each ref input argument ai of a ref-return function with inref or outref indicating whether or not the function may return ai by reference (possibly via field accesses). If the function is a method or internal function, the function itself is marked as inref or outref as reference to the implicit 'this' parameter.
This list can be empty, and if the function is a method or internal function, argument 0 refers to implicit 'this' parameter.
+
 
These annotations are used to validate/invalidate ref return functions that call such a ref return function.
+
These annotations are used to validate/invalidate safety of ref return functions.
These annotations are also written in the automatically generated di interface files.
+
 
 +
They're written in the automatically generated di interface files.
  
 
== Internal Compiler Annotation ==
 
== Internal Compiler Annotation ==
 
Dconf13 introduced safe references enabled by a runtime check (see email thread from Walter: 'Rvalue references - The resolution'). I propose a formulation that is safe, yet doesn't require any runtime check.
 
Dconf13 introduced safe references enabled by a runtime check (see email thread from Walter: 'Rvalue references - The resolution'). I propose a formulation that is safe, yet doesn't require any runtime check.
The compiler automatically annotates any return by ref function with a dependency on input arguments:
+
The compiler automatically annotates any ref return function with inref/outref on ref input arguments:
  
 
eg:
 
eg:
Line 54: Line 55:
 
will be rewritten internally by the compiler as having the signature:
 
will be rewritten internally by the compiler as having the signature:
 
<syntaxhighlight lang="d">
 
<syntaxhighlight lang="d">
ref(0,2) T foo(ref T a, ref T b, ref U c);
+
ref T foo(outref T a, inref T b, outref U c);
 
</syntaxhighlight>
 
</syntaxhighlight>
indicating that ref depends on ref arguments 0(a) and 2 (c) (dependency on c is via field access). The other arguments are not in the list because they're not input ref arguments, or not returned.  
+
indicating that ref depends on ref arguments a and c (dependency on c is via field access). The other input ref arguments are marked 'inref' because they can't be returned by ref.  
  
Second example: when the function is a member (say of a struct), the 'this' parameter is implicitly argument number 0, but the same rules apply:
+
Second example: when the function is a member (say of a struct), the 'this' parameter is implicit, and the same rules apply:
  
 
<syntaxhighlight lang="d">
 
<syntaxhighlight lang="d">
Line 67: Line 68:
  
 
<syntaxhighlight lang="d">
 
<syntaxhighlight lang="d">
struct S { T t; ref(0,1) T fooc(ref T a); }
+
struct S { T t; ref T fooc(outref T a) outref; }
 
</syntaxhighlight>
 
</syntaxhighlight>
because there's a ref dependency on 0(this) and 1(a).
+
because there's a ref dependency on a and the implicit 'this' argument (the annotation for 'this' is at the method level, as const would be).
 
 
Note, if there's no dependency on input ref arguments (for example, it returns by ref a global), then the ref list is empty, ie ref().
 
  
 
The di files will have to write those annotations written down, which shall be done automatically.
 
The di files will have to write those annotations written down, which shall be done automatically.
Line 77: Line 76:
 
== Safe ref validation at compile time ==
 
== Safe ref validation at compile time ==
  
Given those compiler generated annotations, it is easy to validate/invalidate ref safety:
+
Given those inref/outref annotations, it is easy to validate/invalidate ref safety; we simply check whether the program typechecks under the following conversion rules:
the rule is: in any return statement of a ref-return function, the ref dependency list can only refer to:
+
 
A) global variables (gc-alloced, static etc)
+
Allowed type conversions:
B) ref inputs
+
 
This excludes locals.  
+
global => outref                                          //global: gc-allocated, static, etc.
 +
output of ref-return function call => outref 
 +
outref . field => outref                                // field access
 +
local => inref
 +
global => inref
 +
outref => inref
 +
temporary => inref
  
 
Examples: taken from Walter's above mentioned email:
 
Examples: taken from Walter's above mentioned email:
Line 89: Line 94:
 
//Case A:
 
//Case A:
 
     ref T fooa(ref T t) { return t; }
 
     ref T fooa(ref T t) { return t; }
     //=> ref(0) T fooa(ref T t);
+
     //=> ref T fooa(outref T t);
     ref T bar() { T t; return fooa(t); } // error since 0 refers to t which is a local
+
     ref T bar() { T t; return fooa(t); } // error: wrong conversion local => outref
  
 
//Case B:
 
//Case B:
 
     ref T foob(ref U u) { return u.t; }  
 
     ref T foob(ref U u) { return u.t; }  
//=>ref(0) T foob(ref U u) { return u.t; }  
+
//=>ref T foob(outref U u) { return u.t; }  
     ref U bar() { T t; return foob(t); } // error since 0 refers to t which is a local
+
     ref U bar() { T t; return foob(t); } // error: wrong conversion local => outref
  
 
//Case C:
 
//Case C:
 
     struct S { T t; ref T fooc() { return t; } }
 
     struct S { T t; ref T fooc() { return t; } }
//=>struct S { T t; ref(0) T fooc(); } //0 refers to 'this'
+
//=>struct S { T t; ref T fooc() outref; } //outref refers to hidden this parameter
     ref T bar() { S s; return s.fooc(); } // error since 0 refers to this = s, which is local
+
     ref T bar() { S s; return s.fooc(); } // error: wrong conversion local => outref
  
 
//Case D:
 
//Case D:
    Returning ref to uplevel local:
 
 
 
     ref T food() {
 
     ref T food() {
 
         T t;
 
         T t;
 
         ref T bar() { return t; }
 
         ref T bar() { return t; }
//=>ref(0) T bar(); // 0 here indicates an implicit 'this' pointing to local stack allocated variables (wouldn't be the case if t were static variable)
+
//=>ref T bar() outref; //outref refers to hidden this parameter, this could be rewritten as: ref T bar(outref void*this) ;
         return bar(); //error since 0 refers to local stack
+
         return bar(); // error: wrong conversion local => outref (since 'this' refers to local stack)
 
     }
 
     }
  
 
//case E:
 
//case E:
 
     Transitively calling other functions:
 
     Transitively calling other functions:
 
+
     ref T fooe(T t) { return ref  fooa(t); } //error because of conversion local=>outref when attempting to call fooa(t).
     ref T fooe(T t) { return ref  fooa(t); } //error since we have the compiler rewrite 'ref(0) T fooa(ref T t);' and 0 refers to t, a local variable (t is not a ref variable in fooe).
 
 
 
 
</syntaxhighlight>
 
</syntaxhighlight>
  
 
== Algorithmic details for ref dependency analysis==
 
== Algorithmic details for ref dependency analysis==
Note, the algorithm proposed below does NOT require inter-procedure analysis, as only function signatures (with internal annotation) of functions called in a return statement is required when considering a function's ref dependencies.
 
 
 
Let's take the following example for illustration:
 
Let's take the following example for illustration:
 
<syntaxhighlight lang="d">
 
<syntaxhighlight lang="d">
Line 130: Line 129:
 
The propagation algorithm goes as follows
 
The propagation algorithm goes as follows
  
* introduce the internal notation ref(i0,...,iN)? meaning that a ref return function has dependency on at least ref input arguments i0...iN (as above), but perhaps more.
+
* initialize each ref argument of ref-return functions with 'ref' (ie we don't know yet whether it's inref or outref)
* initialize each ref-return function with annotation ref()?
 
 
* construct an oriented graph:
 
* construct an oriented graph:
 
   * nodes are ref-return functions  
 
   * nodes are ref-return functions  
 
   * edges are ref-return dependencies (one edge per return statement in a ref return function): with example above, there is a graph with 2 nodes (foo1 and foo2) and a single edge (foo1 -> foo2).
 
   * edges are ref-return dependencies (one edge per return statement in a ref return function): with example above, there is a graph with 2 nodes (foo1 and foo2) and a single edge (foo1 -> foo2).
 
* while some ref annotations have changed do:
 
* while some ref annotations have changed do:
   * for each node with a '?' annotation
+
   * for each node with a 'ref' annotation
 
       * recompute annotations and remove '?' if all outgoing edges have no '?' annotations
 
       * recompute annotations and remove '?' if all outgoing edges have no '?' annotations
 
* case A1) if there are no nodes with '?' annotations, then we have succeeded in compile time inference of ref dependency
 
* case A1) if there are no nodes with '?' annotations, then we have succeeded in compile time inference of ref dependency

Revision as of 05:15, 7 May 2013

DIP38: Safe references and rvalue references without runtime checks.

Title: Safe references and rvalue references without runtime checks
DIP: 38
Version: 1
Status: Draft
Created: 2013-05-06
Last Modified: 2013-05-06
Author: Timothee Cour
Links:

Abstract

In short, the compiler internally annotates each ref input argument ai of a ref-return function with inref or outref indicating whether or not the function may return ai by reference (possibly via field accesses). If the function is a method or internal function, the function itself is marked as inref or outref as reference to the implicit 'this' parameter.

These annotations are used to validate/invalidate safety of ref return functions.

They're written in the automatically generated di interface files.

Internal Compiler Annotation

Dconf13 introduced safe references enabled by a runtime check (see email thread from Walter: 'Rvalue references - The resolution'). I propose a formulation that is safe, yet doesn't require any runtime check. The compiler automatically annotates any ref return function with inref/outref on ref input arguments:

eg:

struct U{T x;}
ref T foo(ref T a, ref T b, ref U c){
  static T d;
  if(condition)
    return a;
  else if(condition(b))
    return c.x;
  else
     return d;
}

will be rewritten internally by the compiler as having the signature:

ref T foo(outref T a, inref T b, outref U c);

indicating that ref depends on ref arguments a and c (dependency on c is via field access). The other input ref arguments are marked 'inref' because they can't be returned by ref.

Second example: when the function is a member (say of a struct), the 'this' parameter is implicit, and the same rules apply:

struct S { T t; ref T fooc(ref T a) { if(condition) return t; else return a;} }

will be rewritten internally by the compiler as having the signature:

struct S { T t; ref T fooc(outref T a) outref; }

because there's a ref dependency on a and the implicit 'this' argument (the annotation for 'this' is at the method level, as const would be).

The di files will have to write those annotations written down, which shall be done automatically.

Safe ref validation at compile time

Given those inref/outref annotations, it is easy to validate/invalidate ref safety; we simply check whether the program typechecks under the following conversion rules:

Allowed type conversions:

global => outref //global: gc-allocated, static, etc. output of ref-return function call => outref outref . field => outref // field access local => inref global => inref outref => inref temporary => inref

Examples: taken from Walter's above mentioned email:

//Case A:
    ref T fooa(ref T t) { return t; }
    //=> ref T fooa(outref T t);
    ref T bar() { T t; return fooa(t); } // error: wrong conversion local => outref

//Case B:
    ref T foob(ref U u) { return u.t; } 
//=>ref T foob(outref U u) { return u.t; } 
    ref U bar() { T t; return foob(t); } // error: wrong conversion local => outref

//Case C:
    struct S { T t; ref T fooc() { return t; } }
//=>struct S { T t; ref T fooc() outref; } //outref refers to hidden this parameter
    ref T bar() { S s; return s.fooc(); } // error: wrong conversion local => outref

//Case D:
    ref T food() {
        T t;
        ref T bar() { return t; }
//=>ref T bar() outref; //outref refers to hidden this parameter, this could be rewritten as: ref T bar(outref void*this) ; 
        return bar(); // error: wrong conversion local => outref (since 'this' refers to local stack)
    }

//case E:
    Transitively calling other functions:
    ref T fooe(T t) { return ref  fooa(t); } //error because of conversion local=>outref when attempting to call fooa(t).

Algorithmic details for ref dependency analysis

Let's take the following example for illustration:

    ref T foo1(ref T a, T b, ref T c) { if(...) return foo2(a); else return foo2(c); }
    ref T foo2(ref T a) { return a; }

The propagation algorithm goes as follows

  • initialize each ref argument of ref-return functions with 'ref' (ie we don't know yet whether it's inref or outref)
  • construct an oriented graph:
  * nodes are ref-return functions 
  * edges are ref-return dependencies (one edge per return statement in a ref return function): with example above, there is a graph with 2 nodes (foo1 and foo2) and a single edge (foo1 -> foo2).
  • while some ref annotations have changed do:
  * for each node with a 'ref' annotation
     * recompute annotations and remove '?' if all outgoing edges have no '?' annotations
  • case A1) if there are no nodes with '?' annotations, then we have succeeded in compile time inference of ref dependency
  • case A2) otherwise, for each node with '?' annotations, then there are loops in the graph, and for these nodes we fall back in runtime check on return addresses as proposed in Dconf13. This case should be rare in practice. However there might be a slightly more complex algorithm in that case too that doesn't require runtime check (I will think about it).

For the above example we have: iteration 0(initialization): foo1:ref()? foo2:ref()?

iteration 1: foo1:ref()? foo2:ref(0)

iteration 2: foo1:ref(0,2) foo2:ref(0)

Loops in the graph (case A2) correspond to the case of mutually recursive ref return functions. For example:

    ref T foo1(ref T a, T b, ref T c) { if(...) return foo2(a,0,c); else return a; }
    ref T foo2(ref T a, T b, ref T c) { if(...) return foo1(a,1,c); else return c; }

Rvalue references

As for rvalue references, the compiler shall introduce a temporary variable before calling the ref function as has been discussed elsewhere. The same rules apply.

Copyright

This document has been placed in the Public Domain.