Difference between revisions of "DIP38"

From D Wiki
Jump to: navigation, search
(Abstract)
m
 
(19 intermediate revisions by one other user not shown)
Line 1: Line 1:
== DIP38: Safe references and rvalue references without runtime checks. (STILL EDITING) ==
+
== DIP38: Safe references without runtime checks. ==
  
 
{| class="wikitable"
 
{| class="wikitable"
 
!Title:
 
!Title:
!''Safe references and rvalue references without runtime checks''
+
!''Safe references without runtime checks''
 
|-
 
|-
 
|DIP:
 
|DIP:
Line 18: Line 18:
 
|-
 
|-
 
|Last Modified:
 
|Last Modified:
|2013-05-06
+
|2013-05-10
 
|-
 
|-
 
|Author:
 
|Author:
Line 28: Line 28:
  
 
== Abstract ==
 
== Abstract ==
Safe references and rvalue references without runtime checks.
+
Dconf13 introduced safe references enabled by a runtime check (see email thread from Walter: ([http://forum.dlang.org/post/km3k8v$80p$1@digitalmars.com 'Rvalue references - The resolution']). We propose a formulation that is safe (guaranteed safety at compile time), efficient (doesn't require any runtime bounds checks) and simple.
In short, the compiler internally annotates ref-return functions with ref(i1,...,iN) indicating that the function may return argument j (j=i1...iN) by reference (possibly via field accesses), where j is also a ref input argument.
+
 
This list can be empty, and if the function is a method or internal function, argument 0 refers to implicit 'this' parameter.
+
We introduce 2 types of references for ref input arguments of ref return functions: inref and outref (the exact keywords can be discussed later) to distinguish whether a given input argument can be escaped or not (possibly via field accesses):
These annotations are used to validate/invalidate ref return functions that call such a ref return function.
+
ref fun(int inref a, int outref b, int outref c, int d);
These annotations are also written in the automatically generated di interface files.
+
indicates that b and c  can be escaped by ref return (there could indeed be multiple return statements), but not a.
 +
 
 +
We argue that these annotations (inref and outref) are sufficient for guaranteeing ref safety, simply by typechecking a program under a set of allowable conversions.
 +
 
 +
We propose two schemes:
 +
 
 +
* scheme A: the user annotates the ref return functions on his own (just a choice between inref or outref is needed for each ref input arg)
 +
* scheme B: the compiler takes care of the annotations via a proposed procedural analysis
 +
 
 +
If the function is a method or internal function, the function itself is marked as inref or outref as reference to the implicit 'this' parameter.
  
== Description ==
+
The annotations are part of the type system and written in the automatically generated di interface files.
Dconf13 introduced safe references enabled by a runtime check (see email thread from Walter: 'Rvalue references - The resolution'). I propose a formulation that is safe, yet doesn't require any runtime check.
 
The compiler automatically annotates any return by ref function with a dependency on input arguments:
 
  
eg:
+
== Examples ==
 
<syntaxhighlight lang="d">
 
<syntaxhighlight lang="d">
  
 
struct U{T x;}
 
struct U{T x;}
ref T foo(ref T a, ref T b, ref U c, T d){
+
ref T foo(ref T a, ref T b, ref U c, int d){
   if(condition)
+
  static T e;
    return a;
+
   if(...) return a;
   else if(condition(b))
+
   else if(...) return c.x;
    return c.x;
+
   else return e;
   else
 
    return d;
 
 
}
 
}
 
</syntaxhighlight>
 
</syntaxhighlight>
  
will be rewritten internally by the compiler as having the signature:
+
shall have the new signature:
 
<syntaxhighlight lang="d">
 
<syntaxhighlight lang="d">
ref(0,2) T foo(ref T a, ref T b, ref T c, T d);
+
ref T foo(outref T a, inref T b, outref U c, int d);
 
</syntaxhighlight>
 
</syntaxhighlight>
indicating that ref depends on ref arguments 0(a) and 2 (c) (dependency on c is via field access). The other arguments are not in the list because they're not input ref arguments, or not returned.  
+
indicating that it may return by ref a and c only (dependency on c is via field access).
  
Second example: when the function is a member (say of a struct), the 'this' parameter is implicitly argument number 0, but the same rules apply:
+
Second example: when the function is a member (say of a struct), the 'this' parameter is implicit, and the same rules apply:
  
 
<syntaxhighlight lang="d">
 
<syntaxhighlight lang="d">
struct S { T t; ref T fooc(ref T a) { if(condition) return t; else return a;} }
+
struct S { T t; ref T fooc(ref T a) { if(...) return t; else return a;} }
 
</syntaxhighlight>
 
</syntaxhighlight>
  
will be rewritten internally by the compiler as having the signature:
+
shall have the new signature:
  
 
<syntaxhighlight lang="d">
 
<syntaxhighlight lang="d">
struct S { T t; ref(0,1) T fooc(ref T a); }
+
struct S { T t; ref T fooc(outref T a) outref; }
 
</syntaxhighlight>
 
</syntaxhighlight>
because there's a ref dependency on 0(this) and 1(a).
+
indicating that it may return by ref a and the hidden 'this' parameter. The annotation for 'this' is at the method level, same as where const would be.
  
Note, if there's no dependency on input ref arguments (for example, it returns by ref a global), then the ref list is empty, ie ref().
+
The di file will also have those inref/outref annotations.
  
Given those compiler generated annotations, it is easy to validate/invalidate ref safety:
+
== Safe ref validation at compile time ==
the rule is: in any return statement of a ref-return function, the ref dependency list can only refer to:
 
A) global variables (gc-alloced, static etc)
 
B) ref inputs
 
This excludes locals.
 
  
Examples: taken from Walter's above mentioned email:
+
Given those inref/outref annotations, it is easy to validate/invalidate ref safety; we simply check whether the program typechecks under the following conversion rules:
 +
 
 +
Allowed type conversions:
 +
 
 +
* global => outref //global: gc-allocated, static, etc.
 +
* outref 'dot' field => outref // field access
 +
* ref function(args) where each outref arg is an outref expression => outref
 +
* ref function(args) where at least one outref arg is not an outref expression => local
 +
* inref => local
 +
* return outref => outref
 +
* return local => local // compile error if this is a ref return function
 +
 
 +
Examples taken from Walter's above mentioned email. Each one yields an error, and an explanation is given.
  
 
<syntaxhighlight lang="d">
 
<syntaxhighlight lang="d">
Line 85: Line 98:
 
//Case A:
 
//Case A:
 
     ref T fooa(ref T t) { return t; }
 
     ref T fooa(ref T t) { return t; }
     //=> ref(0) T fooa(ref T t);
+
     //=> ref T fooa(outref T t);
     ref T bar() { T t; return fooa(t); } // error since 0 refers to t which is a local
+
     ref T bar() { T t; return fooa(t); } // T t: local; fooa(t): local because ref fooa takes t as outref and t is a local expression. return fooa(t) therefore returns a local, which is an error.
  
 
//Case B:
 
//Case B:
 
     ref T foob(ref U u) { return u.t; }  
 
     ref T foob(ref U u) { return u.t; }  
//=>ref(0) T foob(ref U u) { return u.t; }  
+
//=>ref T foob(outref U u) { return u.t; }  
     ref U bar() { T t; return foob(t); } // error since 0 refers to t which is a local
+
     ref U bar() { T t; return foob(t); } // same as above, using rule 'outref 'dot' field => outref'.
  
 
//Case C:
 
//Case C:
 
     struct S { T t; ref T fooc() { return t; } }
 
     struct S { T t; ref T fooc() { return t; } }
//=>struct S { T t; ref(0) T fooc(); } //0 refers to 'this'
+
//=>struct S { T t; ref T fooc() outref; } //outref refers to hidden this parameter
     ref T bar() { S s; return s.fooc(); } // error since 0 refers to this = s, which is local
+
     ref T bar() { S s; return s.fooc(); } // same as above
  
 
//Case D:
 
//Case D:
    Returning ref to uplevel local:
 
 
 
     ref T food() {
 
     ref T food() {
 
         T t;
 
         T t;
 
         ref T bar() { return t; }
 
         ref T bar() { return t; }
//=>ref(0) T bar(); // 0 here indicates an implicit 'this' pointing to local stack allocated variables (wouldn't be the case if t were static variable)
+
//=>ref T bar() outref; //outref refers to hidden this parameter, this could be rewritten as: ref T bar(outref void*this) ;
         return bar(); //error since 0 refers to local stack
+
         return bar(); // same error as above (since 'this' refers to local stack)
 
     }
 
     }
  
 
//case E:
 
//case E:
 
     Transitively calling other functions:
 
     Transitively calling other functions:
 +
    ref T fooe(T t) { return fooa(t); } //same error because t is a local.
 +
</syntaxhighlight>
  
    ref T fooe(T t) { return ref  fooa(t); } //error since we have the compiler rewrite 'ref(0) T fooa(ref T t);' and 0 refers to t, a local variable (t is not a ref variable in fooe).
+
== scheme A: the user annotates the ref return functions on his own==
 +
Just a choice between inref or outref is needed for each ref input arg.
  
 +
== scheme B: the compiler takes care of the annotations via a proposed procedural analysis ==
 +
We sketch an algorithm to infer inref/outref attributes. Let's take the following example for illustration:
 +
<syntaxhighlight lang="d">
 +
    ref T foo1(ref T a, T b, ref T c) { if(...) return foo2(a); else return foo2(c); }
 +
    ref T foo2(ref T a) { return a; }
 
</syntaxhighlight>
 
</syntaxhighlight>
  
 +
The propagation algorithm goes as follows
  
Final point: the di files will have to write those annotations written down, which shall be done automatically.
+
* initialize each ref argument of ref-return functions with 'ref' (ie we don't know yet whether it's inref or outref)
 +
* construct an oriented graph:
 +
  * nodes are ref-return functions
 +
  * edges are ref-return dependencies (one edge per return statement in a ref return function): with example above, there is a graph with 2 nodes (foo1 and foo2) and a single edge (foo1 -> foo2).
 +
* while some ref annotations have changed do:
 +
  * for each node with a 'ref' annotation
 +
      * recompute annotations and remove uncertainty if all all outgoing edges have no uncertainty (ie ref instead of inref or outref)
 +
* case A1) if there are no nodes with ref annotations, then we have succeeded in compile time inference of ref dependency
 +
* case A2) otherwise, for each node with ref annotations, then there are loops in the graph, and for these nodes we fall back in runtime check on return addresses as proposed in Dconf13. This case should be rare in practice. However there might be a slightly more complex algorithm in that case too that doesn't require runtime check (will think about it).
  
== Usage ==
+
For the above example we have:
<div style="padding: 1ex 1ex; background: #ffd;">This section has been adapted for MediaWiki.</div>
+
iteration 0(initialization):
 +
foo1(ref T a, T b, ref T c);
 +
foo2(ref T a);
  
To start a new DIP you can go to Edit link and copy the source of this DIP, then go to [[DIPs|DIP index]] and add a new item in the list. The DIP number should be one more than the last DIP in the index (for example, if the DIP1 is the last DIP, your DIP should be DIP2). The link in the index should have the form: <nowiki>[[DIPx]]</nowiki>, Title, Status, resume. Where x is the DIP number, title is the DIP title and resume is a short description about the DIP.
+
iteration 1:
 +
foo1(ref T a, T b, ref T c);
 +
foo2(outref T a);
  
Save the [[DIPs|DIP index]] page and click on the new red link. Now you are editing the new DIP you just created, now paste the copied source text from this template and replace all the you need.
+
iteration 2:
 +
foo1(outref T a, T b, outref T c);
 +
foo2(outref T a);
  
Remember to update the metadata at the start of the DIP, and keep it as a Draft at the beginning. When your DIP is done, you should announce it in the News Group for discussion, with a subject like this: new DIPx: title (where one more time x is the DIP number and title is the DIP title).
+
Loops in the graph (case A2) correspond to the case of mutually recursive ref return functions. For example:
 
+
<syntaxhighlight lang="d">
You should always put you DIPs in the Public Domain (or a similarly permissive license but use Public Domain unless you're very sure of what you're doing).
+
    ref T foo1(ref T a, T b, ref T c) { if(...) return foo2(a,0,c); else return a; }
 
+
    ref T foo2(ref T a, T b, ref T c) { if(...) return foo1(a,1,c); else return c; }
== Recommendations ==
+
</syntaxhighlight>
When writing a DIP, try not to express your opinion. DIPs should provide facts and be as objective as possible. Even when that's pretty hard, you can make the DIP look more objective by not using, for example, "I prefer XXX because YYY". If YYY is an objective advantage, write that in a more objective way, like "XXX can be a better option because YYY". Try to leave non-technical personal preferences aside; "XXX can be a better option because the syntax is nicer" is not good enough even when you don't say "I prefer".
 
 
 
Try not to include half-baked ideas. If you are not sure about something, leave it outside the DIP and write it on the NG announcement instead for further discussion. The idea can be added to the DIP in the future when it is in a better shape.
 
 
 
== Abstract ==
 
Make the abstract as descriptive as possible (while keeping it brief). From an abstract you should be able to tell what the DIP is about, you should introduce for every non-trivial concept a person should know for understanding the DIP (or provide links if you can't briefly describe those concepts in the abstract). Don't copy the title of the DIP to use it as an abstract. Ideally an abstract should be a paragraph 5 to 10 lines long.
 
 
 
== Rationale ==
 
Rationale should be complete. When the DIP tries to solve a problem, try to describe that problem as detailed as possible. If you have links to the NG describing the problem more deeply, used them. All the background information is welcome.
 
  
== NG Announcement ==
+
== Rvalue references ==
When posting the DIP announcement to the NG, please copy the abstract, so people can easily know what is it about and follow the link if they are interested.
 
  
 +
See DIP39 for how to handle those safely in conjunction with this DIP38.
  
 
== Copyright ==
 
== Copyright ==

Latest revision as of 16:48, 26 May 2013

DIP38: Safe references without runtime checks.

Title: Safe references without runtime checks
DIP: 38
Version: 1
Status: Draft
Created: 2013-05-06
Last Modified: 2013-05-10
Author: Timothee Cour
Links:

Abstract

Dconf13 introduced safe references enabled by a runtime check (see email thread from Walter: ('Rvalue references - The resolution'). We propose a formulation that is safe (guaranteed safety at compile time), efficient (doesn't require any runtime bounds checks) and simple.

We introduce 2 types of references for ref input arguments of ref return functions: inref and outref (the exact keywords can be discussed later) to distinguish whether a given input argument can be escaped or not (possibly via field accesses): ref fun(int inref a, int outref b, int outref c, int d); indicates that b and c can be escaped by ref return (there could indeed be multiple return statements), but not a.

We argue that these annotations (inref and outref) are sufficient for guaranteeing ref safety, simply by typechecking a program under a set of allowable conversions.

We propose two schemes:

  • scheme A: the user annotates the ref return functions on his own (just a choice between inref or outref is needed for each ref input arg)
  • scheme B: the compiler takes care of the annotations via a proposed procedural analysis

If the function is a method or internal function, the function itself is marked as inref or outref as reference to the implicit 'this' parameter.

The annotations are part of the type system and written in the automatically generated di interface files.

Examples

struct U{T x;}
ref T foo(ref T a, ref T b, ref U c, int d){
  static T e;
  if(...) return a;
  else if(...) return c.x;
  else return e;
}

shall have the new signature:

ref T foo(outref T a, inref T b, outref U c, int d);

indicating that it may return by ref a and c only (dependency on c is via field access).

Second example: when the function is a member (say of a struct), the 'this' parameter is implicit, and the same rules apply:

struct S { T t; ref T fooc(ref T a) { if(...) return t; else return a;} }

shall have the new signature:

struct S { T t; ref T fooc(outref T a) outref; }

indicating that it may return by ref a and the hidden 'this' parameter. The annotation for 'this' is at the method level, same as where const would be.

The di file will also have those inref/outref annotations.

Safe ref validation at compile time

Given those inref/outref annotations, it is easy to validate/invalidate ref safety; we simply check whether the program typechecks under the following conversion rules:

Allowed type conversions:

  • global => outref //global: gc-allocated, static, etc.
  • outref 'dot' field => outref // field access
  • ref function(args) where each outref arg is an outref expression => outref
  • ref function(args) where at least one outref arg is not an outref expression => local
  • inref => local
  • return outref => outref
  • return local => local // compile error if this is a ref return function

Examples taken from Walter's above mentioned email. Each one yields an error, and an explanation is given.

//Case A:
    ref T fooa(ref T t) { return t; }
    //=> ref T fooa(outref T t);
    ref T bar() { T t; return fooa(t); } // T t: local; fooa(t): local because ref fooa takes t as outref and t is a local expression. return fooa(t) therefore returns a local, which is an error.

//Case B:
    ref T foob(ref U u) { return u.t; } 
//=>ref T foob(outref U u) { return u.t; } 
    ref U bar() { T t; return foob(t); } // same as above, using rule 'outref 'dot' field => outref'.

//Case C:
    struct S { T t; ref T fooc() { return t; } }
//=>struct S { T t; ref T fooc() outref; } //outref refers to hidden this parameter
    ref T bar() { S s; return s.fooc(); } // same as above

//Case D:
    ref T food() {
        T t;
        ref T bar() { return t; }
//=>ref T bar() outref; //outref refers to hidden this parameter, this could be rewritten as: ref T bar(outref void*this) ; 
        return bar(); // same error as above (since 'this' refers to local stack)
    }

//case E:
    Transitively calling other functions:
    ref T fooe(T t) { return fooa(t); } //same error because t is a local.

scheme A: the user annotates the ref return functions on his own

Just a choice between inref or outref is needed for each ref input arg.

scheme B: the compiler takes care of the annotations via a proposed procedural analysis

We sketch an algorithm to infer inref/outref attributes. Let's take the following example for illustration:

    ref T foo1(ref T a, T b, ref T c) { if(...) return foo2(a); else return foo2(c); }
    ref T foo2(ref T a) { return a; }

The propagation algorithm goes as follows

  • initialize each ref argument of ref-return functions with 'ref' (ie we don't know yet whether it's inref or outref)
  • construct an oriented graph:
  * nodes are ref-return functions 
  * edges are ref-return dependencies (one edge per return statement in a ref return function): with example above, there is a graph with 2 nodes (foo1 and foo2) and a single edge (foo1 -> foo2).
  • while some ref annotations have changed do:
  * for each node with a 'ref' annotation
     * recompute annotations and remove uncertainty if all all outgoing edges have no uncertainty (ie ref instead of inref or outref)
  • case A1) if there are no nodes with ref annotations, then we have succeeded in compile time inference of ref dependency
  • case A2) otherwise, for each node with ref annotations, then there are loops in the graph, and for these nodes we fall back in runtime check on return addresses as proposed in Dconf13. This case should be rare in practice. However there might be a slightly more complex algorithm in that case too that doesn't require runtime check (will think about it).

For the above example we have: iteration 0(initialization): foo1(ref T a, T b, ref T c); foo2(ref T a);

iteration 1: foo1(ref T a, T b, ref T c); foo2(outref T a);

iteration 2: foo1(outref T a, T b, outref T c); foo2(outref T a);

Loops in the graph (case A2) correspond to the case of mutually recursive ref return functions. For example:

    ref T foo1(ref T a, T b, ref T c) { if(...) return foo2(a,0,c); else return a; }
    ref T foo2(ref T a, T b, ref T c) { if(...) return foo1(a,1,c); else return c; }

Rvalue references

See DIP39 for how to handle those safely in conjunction with this DIP38.

Copyright

This document has been placed in the Public Domain.