DIP69

From D Wiki
Revision as of 08:53, 4 December 2014 by WalterBright (talk | contribs) (Initial edit)
Title: Implement scope for escape proof references
DIP: 69
Version: 1
Status: Draft
Created: 2014-12-04
Last Modified: 2014-12-04
Author: Marc Schütz, deadalnix, Andrei Alexandrescu and Walter Bright
Links: Proposals

Scope Proposal by Marc Schütz. The above document is derived from this one.

- DIP36: Scope References by Dicebot

- DIP35: Addition to DIP25: Sealed References by Zach Tollen

- DIP25: Sealed References by Andrei Alexandrescu and Walter Bright

Discussions

- Why is scope planned for deprecation?

- RFC: scope and borrowing

- borrowed pointers vs ref

- Proposal for design of scope

- Proposal for the design of scope pt 2

- scope escaping


$(H2 Abstract)

$(P A garbage collected language is inherently memory safe. References to data can be passed around without concern for ownership, lifetimes, etc. But this runs into difficulty when combined with other sorts of memory management, like stack allocation, malloc/free allocation, reference counting, etc. )

$(P Knowing when the lifetime of a reference is over is critical for safely implementing memory management schemes other than GC. It is also critical for the performance of reference counting systems, as it will expose opportunities for elision of the inc/dec operations. )

$(P $(D scope) provides a mechanism to guarantee that a reference cannot escape lexical scope. )

$(H2 Benefits)

$(UL
$(LI References to stack variables can no longer escape.)
$(LI Delegates currently defensively allocate closures with the GC. Few actually escape, and with scope only those that actually escape need to have the closures allocated.)
$(LI $(D @system) code like $(D std.internal.scopebuffer) can be made $(D @safe).)
$(LI Reference counting systems need not adjust the count when passing references that do not escape.)
$(LI Better self-documentation of encapsulation.)
)

$(H2 Definitions)

$(H3 Lifetime)

$(P Any object exists for a specific period of time, with a well defined beginning and end point: from the point it is created (constructed), to the point it is released (destroyed). This is called its $(I lifetime). A reference that points to an object whose lifetime has ended is a dangling reference. Use of such references can cause all kinds of errors, and must therefore be prevented. )

$(P The lifetime of variables is based purely on their lexical scope and order of declaration. The following rules define a hierarchy of lifetimes: )

$(UL
$(LI A variable's lifetime starts at the point of its declaration, and ends with the lexical scope it is defined in.)
$(LI An (rvalue) expression's lifetime is temporary; it lives until the end of the statement that it appears in.)
$(LI The lifetime of A is higher than that of B if A appears in a higher scope than B, or if both appear in the same scope but A comes lexically before B. This is consistent with the order of destruction of local variables, which are destroyed in reverse order of declaration.)
$(LI The lifetime of a function parameter is higher than that of that function's local variables, but lower than that of any variables in higher scopes.)
)
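$(P The ordering rules above can be observed in today's D through the order in which destructors run. The following is a minimal sketch, not part of the proposal; $(D Tracer), $(D destroyed) and $(D lifetimeDemo) are hypothetical names introduced only for illustration.)

```d
// Hypothetical log of the order in which locals are destroyed.
string[] destroyed;

// Hypothetical tracer: records its name when its lifetime ends.
struct Tracer
{
    string name;
    ~this() { if (name.length) destroyed ~= name; }
}

void lifetimeDemo()
{
    destroyed = null;
    auto a = Tracer("a");      // outer scope, declared first: highest lifetime
    {
        auto c = Tracer("c");  // inner scope: lifetime ends at the closing brace
    }
    auto b = Tracer("b");      // same scope as a, declared later: lower lifetime than a
}
```

$(P After a call to $(D lifetimeDemo), the log lists the names in order of increasing lifetime: the inner-scope variable ends first, then the later declaration, then the earlier one.)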

$(H3 Ownership)

$(P A variable $(I owns) the data it contains if, when the lifetime of the variable is ended, the data can be destroyed. There can be at most one owner for any piece of data. )

$(P Ownership may be $(I transitive), meaning anything reachable through the owned data is also owned, or $(I head) where only the top level reference is owned.)

$(H3 View)

$(P Other references to $(I owned) data are called $(I views). Views must not survive the end of the lifetime of the owner of the data. A view v$(SUB 2) may be taken of another view v$(SUB 1), and the lifetime of v$(SUB 2) must be a subset of the lifetime of v$(SUB 1). Views may also be transitive or head.)
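$(P As an informal illustration of these terms in existing D, a stack-allocated static array can be read as an owner and slices of it as views. This sketch assumes that reading; the function name $(D writeThroughView) is hypothetical, and the code compiles as ordinary $(D @system) code without any of the checks proposed here.)

```d
// Returns the element of the owner that was modified through a view of a view.
int writeThroughView()
{
    int[4] owner = [1, 2, 3, 4]; // owner holds the storage, released when the function returns
    int[] v1 = owner[];          // v1 is a view of the owner's data
    int[] v2 = v1[1 .. 3];       // v2 is a view taken of view v1
    v2[0] = 20;                  // writes through to owner[1]
    return owner[1];
}
```

$(P Both views go out of scope before $(D owner) does, so neither survives the owner. Returning $(D v1) or $(D v2) instead would let a view outlive its owner, which is the situation $(D scope) is meant to rule out.)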

$(H2 Scope Fundamentals)

$(P The purpose of $(D scope) is to provide a means for ensuring the lifetime of a viewer is a subset of the lifetime of the viewee. )


$(P $(D scope) is a storage class, and affects declarations. It is not a type constructor. There is no change to existing $(LINK2 http://dlang.org/attribute, $(D scope) grammar). It fits in the grammar as a storage class. )

$(P Scope affects:)

$(UL $(LI local variables allocated on the stack) $(LI function parameters) $(LI non-static member functions (applying to the $(D this))) $(LI delegates (applying to the $(D this))) $(LI return value of functions) )

$(P It is ignored for other declarations. It is ignored for declarations which are not views.)

scope enum e = 3;  // scope is ignored for enums
scope int i;       // scope is ignored because integers are not references and so are not views

$(P Scope affects variables according to these rules:)

$(OL
$(LI A scope variable's value is a head view that lasts for the lifetime of the variable.)
$(LI Scope variables can only be assigned values that have lifetimes that are a superset of the variable's lifetime.)
$(LI Scope variables can only be assigned to scope variables with a lifetime that is a subset.)
$(LI A variable is inferred to be scope if it is initialized with a view that has a non-$(INF) lifetime.)
$(LI A scope variable cannot be initialized with the address of a scoped variable.)
$(LI A scope ref variable can be initialized with another scope ref variable - scope ref is idempotent.)
)

Basic operation:

int global_var;
int* global_ptr;
 
void bar(scope int* input);
 
void foo() {
    scope int* a;
    a = &global_var;       // Ok, `global_var` has a greater lifetime than `a`
    scope b = &global_var; // Ok, type deduction
    int c;
 
    if(...) {
        scope x = a;       // Ok, copy of reference, `x` has shorter lifetime than `a`
        scope y = &c;      // Ok, lifetime(y) < lifetime(&c)
        int z;
        b = &z;            // Error, `b` will outlive `z`
        int* d = a;        // Ok: d is inferred to be `scope`
    }
 
    bar(a);                // Ok, scoped pointer is passed to scoped parameter
    bar(&c);               // Ok, lifetime(parameter input) < lifetime(c)
    int* e;
    e = &c;                // Error, lifetime(e's view) is $(INF) and is greater than lifetime(c)
    a = e;                 // Ok, lifetime(a) < lifetime(e)
    scope int** f = &a;    // Error, rule 5
    scope int** h = &e;    // Ok
    int* j = *h;           // Ok, scope is not transitive
}

void abc() {
    scope int* a;
    int* b;
    scope ref int* c = a;  // Error, rule 5
    scope ref int* d = b;  // Ok
    int* i = a;            // Ok, scope is inferred for i
    global_ptr = d;        // Error, lifetime(d) < lifetime(global_ptr)
    global_ptr = i;        // Error, lifetime(i) < lifetime(global_ptr)
    int* j;                // Ok, scope is not inferred for j
    global_ptr = j;        // Ok, j is not scope
}

$(H2 Return Statement)

$(P A view annotated with $(D scope) cannot be returned from a function.)

class C { ... }
scope C c;
return c;   // Error

scope int i;
return i;   // Ok, i is not a view

scope int* p;
return p;   // Error
return p+1; // Error, nice try!
return &*p; // Error, won't work either

ref int func(scope ref int r, scope out int s)
{
    return r; // Error
    return s; // Error, 'out' is treated like 'ref'
}

$(H2 Functions)

$(H3 Inference)

Scope is inferred for function parameters if not specified, under the same circumstances as $(D pure), $(D nothrow), $(D @nogc) and safety are inferred.
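For reference, this is how attribute inference already behaves for template functions in current D. The sketch below shows only that existing inference; it assumes, as the paragraph above proposes, that scope would be inferred in the same situations. The names twice and callsInferred are hypothetical.

```d
// Template function: the compiler infers its attributes from the body.
T twice(T)(T x)
{
    return x + x;
}

// Explicitly annotated caller: it may only call functions with matching attributes.
int callsInferred(int x) pure nothrow @nogc @safe
{
    return twice(x); // compiles because twice!int was inferred pure nothrow @nogc @safe
}
```

A non-template function with the same body and no annotations would not be callable from callsInferred, since no inference is performed for it.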

$(H3 Overloading)

$(P Scope does not affect overloading. If it did, then whether a variable was scope or not would affect the code path, making scope inference impractical. It also makes turning scope checking on/off impractical.)

T func(scope ref T);
T func(ref T);

T t; func(t); // Error, ambiguous
scope T u; func(u); // Error, ambiguous


$(H3 Implicit Conversion of Function Pointers and Delegates)

$(D scope) can be added to parameters, but not removed.

alias int function(ref T) fp_t;
alias int function(scope ref T) fps_t;

int foo(ref T);
int bar(scope ref T);

fp_t fp = &bar;    // Ok, scope behavior is subset of non-scope
fps_t fps = &foo;  // Error, fps_t demands scope behavior

$(H3 Inheritance)

Overriding functions inherit any $(D scope) annotations from their antecedents. Scope is covariant, meaning it can be added to overriding functions.

class C
{
    int foo(ref T);
    int bar(scope ref T);
}

class D : C
{
    override int foo(scope ref T); // Ok, can add scope
    override int bar(ref T);       // Error, cannot remove scope
}

$(H3 Mangling)

Scope will require additional mangling, as it affects the interface of the function. In cases where scope is ignored, it does not contribute to the mangling. Scope parameters will be mangled with ???.

$(H3 Nested Functions)

Nested functions have more objects available than just their arguments:

ref T foo() {
  T t;
  ref T func() { return t; }
  return func();  // disallowed
}

Nested functions are analyzed as if each variable accessed outside of its scope was passed as a ref parameter. All parameters have scope inferred from how they are used in the function body.
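The rewriting described above can be pictured with two functions that behave the same way: one uses a nested function that reaches into the enclosing frame, the other passes the variable explicitly by ref. This is only a sketch of the analysis model, not of how the compiler would implement it; all names are hypothetical.

```d
// Nested form: bump accesses t from the enclosing function's frame.
int viaNested()
{
    int t = 1;
    void bump() { t += 1; }
    bump();
    return t;
}

// Equivalent form used for analysis: the accessed variable becomes a ref parameter.
void bumpRef(ref int t) { t += 1; }

int viaRef()
{
    int t = 1;
    bumpRef(t);
    return t;
}
```

Under this model, the checks that apply to the ref parameter of bumpRef are the ones that would apply to the outer variable used by bump.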


$(H2 Ref)

$(H3 Escaping via Return)

The simple cases of this are disallowed:

T* func(T t) {
  T u;
  return &t; // Error: escaping reference to local t
  return &u; // Error: escaping reference to local u
}

But are easily circumvented:

T* func(T t) {
  T* p = &t;
  return p;  // no error detected
}

@safe currently deals with this by preventing taking the address of a local:

T* func(T t) @safe {
  T* p = &t; // Error: cannot take address of parameter t in @safe function func
  return p;
}

This is restrictive. The $(D ref) storage class was introduced which defines a special purpose pointer. $(D ref) can only appear in certain contexts, in particular function parameters and returns, only applies to declarations, cannot be stored, and cannot be incremented.

ref T func(T t) @safe {
  return t; // Error: escaping reference to local variable t
}

Ref can be passed down to functions:

void func(ref T t) @safe;
void bar(ref T t) @safe {
   func(t); // ok
}

But the following idiom is far too useful to be disallowed:

ref T func(ref T t) {
  return t; // ok
}

And if it is misused it can result in stack corruption:

ref T foo() {
  T t;
  return func(t); // currently, no error detected, despite returning pointer to t
}

The case

return func(t);

is detected when all of the following conditions are true:


  • foo() returns by reference
  • func() returns by reference
  • func() has one or more parameters that are by reference
  • One or more of the arguments to those parameters are stack objects local to foo()
  • Those arguments can be @safe-ly converted from the parameter to the return type.
  For example, if the return type is larger than the parameter type, the return type
  cannot be a reference to the argument. If the return type is a pointer, and the
  parameter type is a size_t, it cannot be a reference to the argument. The larger
  a list of these cases can be made, the more code will pass @safe checks without requiring
  further annotation.
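The conditions above can be summarized as a predicate. The following is a minimal model written for illustration, not compiler code: RefArg and mayEscapeLocal are hypothetical names, and the last condition (whether the parameter type could safely be viewed as the return type) is reduced to a precomputed flag.

```d
// Hypothetical description of one by-reference argument at the call site.
struct RefArg
{
    bool isLocalToCaller;     // argument is a stack object local to the caller
    bool convertibleToReturn; // parameter type could @safe-ly be viewed as the return type
}

// True when `return func(args)` has to be rejected under the conditions listed above.
bool mayEscapeLocal(bool callerReturnsRef, bool calleeReturnsRef,
                    const RefArg[] refArgs)
{
    if (!callerReturnsRef || !calleeReturnsRef)
        return false;          // nothing is returned by reference
    foreach (arg; refArgs)     // an empty list means func has no ref parameters
        if (arg.isLocalToCaller && arg.convertibleToReturn)
            return true;       // a local of the caller might be returned
    return false;
}
```

For the foo() example, both functions return by reference and the single ref argument t is local and of the return type, so the call is flagged.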

$(H3 Scope Ref)

The above solution is correct, but a bit restrictive. After all, func(t, u) could be returning a reference to non-local u, not local t, and so should work. To fix this, introduce the concept of $(D scope ref):

ref T func(scope ref T t, ref T u) {
  return t; // Error: escaping scope ref t
  return u; // ok
}

Scope means that the ref is guaranteed not to escape.

T u;
ref T foo() @safe {
  T t;
  return func(t, u); // Ok, u is not local
  return func(u, t); // Error: escaping scope ref t
}

This minimizes the number of $(D scope) annotations required.

$(H3 Scope Function Returns)

$(D scope) can be applied to function return values (even though it is not a type constructor). It must be applied to the left of the declaration, in the same way $(D ref) is:


int* foo() scope;     // applies to 'this' reference
scope: int* foo();    // applies to 'this' reference
scope { int* foo(); } // applies to 'this' reference
scope int* foo();     // applies to return value

The lifetime of a scope return value is the lifetime of an rvalue. It may not be copied in a way that extends its life.

int* bar(scope int*);
scope int* foo();
...
return foo();         // Error, lifetime(return) > lifetime(foo())
int* p = foo();       // Error, lifetime(p) is $(INF)
bar(foo());           // Ok, lifetime(foo()) > lifetime(bar())
scope int* q = foo(); // Error, lifetime(q) > lifetime(rvalue)

This enables scope return values to be safely chained from function to function; in particular it also allows a ref counted struct to safely expose a reference to its wrapped type.


$(H3 Out Parameters)

$(D out) parameters are treated like $(D ref) parameters when $(D scope) is applied.


$(H2 Expressions)

The $(I lifetime) of an expression is either $(INF) or the lifetime of a single variable. Which it is can be statically deduced by looking at the type and AST of the expression.

The root cases are:

$(TABLE1 $(THEAD root case, lifetime) $(TROW address of variable v, lifetime(v)) $(TROW v containing value with references, lifetime(v)) $(TROW otherwise, $(INF)) )


More complex expressions can be reduced to be one of the root cases:


$(TABLE1
$(THEAD expression, lifetime)
$(TROW $(D &(*e)), lifetime(e))
$(TROW $(D *(&e + integer)), lifetime(e))
$(TROW $(D *((e1? & e2 : &e3) + integer)), min(lifetime(e2), lifetime(e3)))
$(TROW $(D *e), $(INF))
$(TROW $(D e1,e2), lifetime(e2))
$(TROW $(D e1 = e2), lifetime(e1))
$(TROW $(D e1 op= e2), lifetime(e1))
$(TROW $(D e1 ? e2 : e3), min(lifetime(e2), lifetime(e3)))
$(TROW $(D ptr + integer), lifetime(ptr))
$(TROW $(D e1 op e2), min(lifetime(e1), lifetime(e2)))
$(TROW $(D op e), lifetime(e))
$(TROW $(D e++), lifetime(e))
$(TROW $(D e--), lifetime(e))
$(TROW $(D cast(type) e), lifetime(e))
$(TROW $(D new), $(INF))
$(TROW $(D e.field), lifetime(e))
$(TROW $(D (*e).field), lifetime(*e))
$(TROW $(D e.func(args)), min(lifetime(e, args), $(INF)))
$(TROW $(D func(args)), min(lifetime(non-scope-ref args), $(INF)))
$(TROW $(D e[]), lifetime(e))
$(TROW $(D e[i..j]), lifetime(e))
$(TROW $(D e[i]), lifetime(*e))
$(TROW $(I ArrayLiteral), min(lifetime(args)))
$(TROW $(I ArrayLiteral[constant]), lifetime(ArrayLiteral[constant]))
)
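A few rows of the table can be modeled directly to show how a lifetime is computed bottom-up from subexpressions. This is a hedged sketch: representing a lifetime as an integer rank (a larger rank outlives a smaller one) is an assumption made here, and Lifetime, INF and the helper functions are hypothetical names.

```d
import std.algorithm.comparison : min;

// Hypothetical model: a lifetime is a rank; a larger rank outlives a smaller one.
alias Lifetime = uint;
enum Lifetime INF = uint.max; // stands in for the infinite lifetime in the table

Lifetime addressOf(Lifetime v) { return v; }                             // &v
Lifetime deref(Lifetime e) { return INF; }                               // *e
Lifetime comma(Lifetime e1, Lifetime e2) { return e2; }                  // e1,e2
Lifetime assign(Lifetime e1, Lifetime e2) { return e1; }                 // e1 = e2
Lifetime binary(Lifetime e1, Lifetime e2) { return min(e1, e2); }        // e1 op e2
Lifetime conditional(Lifetime e2, Lifetime e3) { return min(e2, e3); }   // e1 ? e2 : e3
```

For example, with variables x and y of ranks 3 and 5, the expression c ? &x : &y has lifetime conditional(addressOf(3), addressOf(5)), which is 3: the result may not outlive the shorter-lived of the two variables.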

$(H2 Classes)

Scope class semantics are equivalent to a pointer to a struct.

$(H2 Static Arrays)

Scope static array semantics are equivalent to those of a scope struct:

T[3] a;
struct A { T t0, t1, t2; } A a;

$(H2 @safe)

Errors for scope violations are only reported in @safe code.


$(H2 Breaking Existing Code)

Some code will no longer work. Although inference will take care of a lot of cases, there are still some that will fail.

int i,j;
int* p = &i;  // Ok, scope is inferred for p
int* q;
q = &i;   // Error: too late to infer scope for q

Currently, $(D scope) is ignored, except that a new class instance used to initialize a scope variable is allocated on the stack. Fortunately, this can work with the new proposal through an optimization: if a new class instance is unique and is assigned to a scope variable, that instance can be placed on the stack.
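The existing behavior mentioned above can be seen in a short example. This sketch relies only on what current compilers do with a scope class variable; the class Resource and the function useScoped are hypothetical names.

```d
// Hypothetical class that counts its live instances.
class Resource
{
    static int live;
    this() { ++live; }
    ~this() { --live; }
}

int useScoped()
{
    scope r = new Resource; // the instance is placed on the stack
    return Resource.live;   // counts r while it is alive
}
```

The destructor runs when r goes out of scope rather than at some later garbage collection, so the count drops back to zero as soon as useScoped returns.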


$(H2 Implementation Plan)

Turning this on may cause significant breakage, and may also be found to be an unworkable design. Therefore, implementation stages will be:

  • enable new behavior with a compiler switch -scope
  • remove -scope, issue warning when errors are detected
  • replace warnings with deprecation messages
  • replace deprecations with errors