Difference between revisions of "DIP69"

From D Wiki
m (Fix subscript tags)
m (Fix some obvious typos, link anchor: attribute#scope, class D : C)

Revision as of 18:15, 4 December 2014

Title: Implement scope for escape proof references
DIP: 69
Version: 1
Status: Draft
Created: 2014-12-04
Last Modified: 2014-12-04
Authors: Marc Schütz, deadalnix, Andrei Alexandrescu and Walter Bright
Links: Proposals Discussions


Abstract

A garbage collected language is inherently memory safe. References to data can be passed around without concern for ownership, lifetimes, etc. But this runs into difficulty when combined with other sorts of memory management, like stack allocation, malloc/free allocation, reference counting, etc.

Knowing when the lifetime of a reference is over is critical for safely implementing memory management schemes other than GC. It is also critical for the performance of reference counting systems, as it will expose opportunities for elision of the inc/dec operations.

scope provides a mechanism to guarantee that a reference cannot escape lexical scope.

Benefits

  • References to stack variables can no longer escape.
  • Delegates currently defensively allocate closures with the GC. Few actually escape, and with scope only those that actually escape need to have the closures allocated.
  • @system code like std.internal.scopebuffer can be made @safe.
  • Reference counting systems need not adjust the count when passing references that do not escape.
  • Better self-documentation of encapsulation.
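
The delegate benefit can be sketched as follows. This is an illustration only: it assumes a compiler that honors scope on delegate parameters by skipping the GC closure, and the names apply and addOffset are hypothetical. Because addOffset is marked @nogc, it compiles only if no closure is allocated for the delegate literal.

```d
// Hypothetical helper: `dg` is promised not to escape, so the caller's
// stack frame can serve directly as the delegate's context.
int apply(scope int delegate(int) @nogc dg, int x) @nogc
{
    return dg(x);
}

// Captures the local `offset` in a delegate literal. The @nogc attribute
// makes the compiler reject this function if a GC closure were required.
int addOffset(int x) @nogc
{
    int offset = 3;
    return apply((int y) => y + offset, x);
}

unittest
{
    assert(addOffset(4) == 7);
}
```

The unittest block runs when the file is compiled with -unittest.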

Definitions

Lifetime

Any object exists for a specific period of time, with a well defined beginning and end point: from the point it is created (constructed), to the point it is released (destroyed). This is called its lifetime. A reference that points to an object whose lifetime has ended is a dangling reference. Use of such references can cause all kinds of errors, and must therefore be prevented.

The lifetime of variables is based purely on their lexical scope and order of declaration. The following rules define a hierarchy of lifetimes:

  • A variable's lifetime starts at the point of its declaration, and ends with the lexical scope it is defined in.
  • An (rvalue) expression's lifetime is temporary; it lives till the end of the statement that it appears in.
  • The lifetime of A is higher than that of B, if A appears in a higher scope than B, or if both appear in the same scope, but A comes lexically before B. This matches the order of destruction of local variables.
  • The lifetime of a function parameter is higher than that of that function's local variables, but lower than any variables in higher scopes.
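
The link between the lifetime hierarchy and destruction order can be observed with a small sketch. The names Tracer, destroyed and demo are hypothetical and exist only for this illustration; the destructor records the order in which locals are destroyed.

```d
// Hypothetical log of destruction order, filled in by Tracer's destructor.
string[] destroyed;

struct Tracer
{
    string name;
    ~this() { destroyed ~= name; }
}

void demo()
{
    auto a = Tracer("a");      // declared first: highest lifetime in demo
    {
        auto c = Tracer("c");  // inner scope: its lifetime ends first
    }
    auto b = Tracer("b");      // same scope as `a`, declared later
}

unittest
{
    destroyed = null;
    demo();
    // Lowest lifetime is destroyed first, highest last.
    assert(destroyed == ["c", "b", "a"]);
}
```

The unittest block runs when the file is compiled with -unittest.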


Ownership

A variable owns the data it contains if, when the lifetime of the variable is ended, the data can be destroyed. There can be at most one owner for any piece of data.

Ownership may be transitive, meaning anything reachable through the owned data is also owned, or head, where only the top level reference is owned.

View

Other references to owned data are called views. Views must not survive the end of the lifetime of the owner of the data. A view v2 may be taken of another view v1, and the lifetime of v2 must be a subset of the lifetime of v1. Views may also be transitive or head.

Scope Fundamentals

The purpose of scope is to provide a means for ensuring the lifetime of a viewer is a subset of the lifetime of the viewee.

scope is a storage class, and affects declarations. It is not a type constructor. There is no change to existing scope grammar. It fits in the grammar as a storage class.

Scope affects:

  • local variables allocated on the stack
  • function parameters
  • non-static member functions (applying to the this)
  • delegates (applying to the this)
  • return value of functions

It is ignored for other declarations. It is ignored for declarations which are not views.

scope enum e = 3;  // scope is ignored for enums
scope int i;       // scope is ignored because integers are not references and so are not views

Scope affects variables according to these rules:

  1. Scope variables can only be assigned values that have lifetimes that are a superset of the variable's lifetime.
  2. Scope variables can only be assigned to scope variables with a lifetime that is a subset.
  3. A variable is inferred to be scope if it is initialized with a view that has a non-∞ lifetime.
  4. A scope variable cannot be initialized with the address of a scoped variable.
  5. A scope ref variable can be initialized with another scope ref variable - scope ref is idempotent.


Basic operation:

int global_var;
int* global_ptr;
 
void bar(scope int* input);
 
void foo() {
    scope int* a;
    a = &global_var;       // Ok, `global_var` has a greater lifetime than `a`
    scope b = &global_var; // Ok, type deduction
    int c;
 
    if(...) {
        scope x = a;       // Ok, copy of reference, `x` has shorter lifetime than `a`
        scope y = &c;      // Ok, lifetime(y) < lifetime(&c)
        int z;
        b = &z;            // Error, `b` will outlive `z`
        int* d = a;        // Ok: d is inferred to be `scope`
    }
 
    bar(a);                // Ok, scoped pointer is passed to scoped parameter
    bar(&c);               // Ok, lifetime(parameter input) < lifetime(c)
    int* e;
    e = &c;                // Error, lifetime(e's view) is ∞ and is greater than lifetime(c)
    a = e;                 // Ok, lifetime(a) < lifetime(e)
    scope int** f = &a;    // Error, rule 4
    scope int** h = &e;    // Ok
    int* j = *h;           // Ok, scope is not transitive
}

void abc() {
    scope int* a;
    int* b;
    scope ref int* c = a;  // Error, rule 5
    scope ref int* d = b;  // Ok
    int* i = a;            // Ok, scope is inferred for i
    global_ptr = d;        // Error, lifetime(d) < lifetime(global_ptr)
    global_ptr = i;        // Error, lifetime(i) < lifetime(global_ptr)
    int* j;                // Ok, no initializer, so scope is not inferred for j
    global_ptr = j;        // Ok, j is not scope
}
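
Rule 1 can be expressed as a small executable model. This is a sketch under simplifying assumptions, not the compiler's algorithm: Lifetime, covers and canAssign are hypothetical names, and the model only compares variables that sit in the same or in enclosing scopes of one function.

```d
// Hypothetical model of a lifetime, following the hierarchy defined earlier:
// either ∞ (globals, GC data) or ranked by scope depth and declaration order.
struct Lifetime
{
    bool infinite;
    int depth;   // 0 = function body, larger = more deeply nested block
    int order;   // position of the declaration within its scope
}

// True if lifetime `a` is a superset of lifetime `b`.
bool covers(Lifetime a, Lifetime b)
{
    if (a.infinite) return true;
    if (b.infinite) return false;
    if (a.depth != b.depth) return a.depth < b.depth; // higher scope lives longer
    return a.order <= b.order;                        // earlier declaration lives longer
}

// Rule 1: a scope variable only accepts values that outlive it.
bool canAssign(Lifetime variable, Lifetime value)
{
    return covers(value, variable);
}

unittest
{
    auto global = Lifetime(true);
    auto b = Lifetime(false, 0, 1);   // scope b
    auto c = Lifetime(false, 0, 2);   // int c
    auto y = Lifetime(false, 1, 1);   // scope y, inside the if block
    auto z = Lifetime(false, 1, 2);   // int z, inside the if block

    assert(canAssign(b, global));     // scope b = &global_var
    assert(canAssign(y, c));          // scope y = &c
    assert(!canAssign(b, z));         // b = &z is rejected
}
```

The three assertions mirror the `&global_var`, `&c` and `&z` lines of foo() above.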

Return Statement

A view annotated with scope cannot be returned from a function.

class C { ... }
scope C c;
return c;   // Error

scope int i;
return i;   // Ok, i is not a view

scope int* p;
return p;   // Error
return p+1; // Error, nice try!
return &*p; // Error, won't work either

ref int func(scope ref int r, scope out int s)
{
    return r; // Error
    return s; // Error, 'out' is treated like 'ref'
}

Functions

Inference

Scope is inferred for function parameters if not specified, under the same circumstances as pure, nothrow, @nogc and safety are inferred.

Overloading

Scope does not affect overloading. If it did, then whether a variable was scope or not would affect the code path, making scope inference impractical. It would also make turning scope checking on/off impractical.

T func(scope ref T);
T func(ref T);

T t; func(t); // Error, ambiguous
scope T u; func(u); // Error, ambiguous


Implicit Conversion of Function Pointers and Delegates

scope can be added to parameters, but not removed.

alias int function(ref T) fp_t;
alias int function(scope ref T) fps_t;

int foo(ref T);
int bar(scope ref T);

fp_t fp = &bar;   // Ok, scope behavior is subset of non-scope
fps_t fps = &foo; // Error, fps_t demands scope behavior

Inheritance

Overriding functions inherit any scope annotations from their antecedents. Scope is covariant, meaning it can be added to overriding functions.

class C
{
    int foo(ref T);
    int bar(scope ref T);
}

class D : C
{
    override int foo(scope ref T); // Ok, can add scope
    override int bar(ref T);       // Error, cannot remove scope
}

Mangling

Scope will require additional mangling, as it affects the interface of the function. In cases where scope is ignored, it does not contribute to the mangling. Scope parameters will be mangled with ???.

Nested Functions

Nested functions have more objects available than just their arguments:

ref T foo() {
  T t;
  ref T func() { return t; }
  return func();  // disallowed
}

Nested functions are analyzed as if each variable accessed outside of its scope was passed as a ref parameter. All parameters have scope inferred from how they are used in the function body.


Ref

Escaping via Return

The simple cases of this are disallowed:

T* func(T t) {
  T u;
  return &t; // Error: escaping reference to local t
  return &u; // Error: escaping reference to local u
}

But are easily circumvented:

T* func(T t) {
  T* p = &t;
  return p;  // no error detected
}

@safe currently deals with this by preventing taking the address of a local:

T* func(T t) @safe {
  T* p = &t; // Error: cannot take address of parameter t in @safe function func
  return p;
}

This is restrictive. The ref storage class was introduced to define a special-purpose pointer. ref can appear only in certain contexts (in particular function parameters and returns), applies only to declarations, cannot be stored, and cannot be incremented.

ref T func(T t) @safe {
  return t; // Error: escaping reference to local variable t
}

Ref can be passed down to functions:

void func(ref T t) @safe;
void bar(ref T t) @safe {
   func(t); // ok
}

But the following idiom is far too useful to be disallowed:

ref T func(ref T t) {
  return t; // ok
}

And if it is misused it can result in stack corruption:

ref T foo() {
  T t;
  return func(t); // currently, no error detected, despite returning pointer to t
}

The return func(t); case is detected when all of the following conditions are true:


  • foo() returns by reference
  • func() returns by reference
  • func() has one or more parameters that are by reference
  • one or more of the arguments to those parameters are stack objects local to foo()
  • Those arguments can be @safe-ly converted from the parameter to the return type.

For example, if the return type is larger than the parameter type, the return type cannot be a reference to the argument. If the return type is a pointer, and the parameter type is a size_t, it cannot be a reference to the argument. The larger a list of these cases can be made, the more code will pass @safe checks without requiring further annotation.
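
The conditions above can be written as a predicate. This is a minimal sketch, assuming the compiler has already summarized the call site; RefArg and escapesLocal are hypothetical names and do not correspond to any existing compiler API.

```d
// Hypothetical summary of one argument bound to a by-reference parameter.
struct RefArg
{
    bool isLocalToCaller;      // argument is a stack object local to the caller
    bool convertibleToReturn;  // parameter type can @safe-ly convert to the return type
}

// True means `return func(args)` is rejected as escaping a local.
bool escapesLocal(bool callerReturnsRef, bool calleeReturnsRef,
                  const RefArg[] refArgs)
{
    if (!callerReturnsRef || !calleeReturnsRef)
        return false;
    foreach (arg; refArgs)  // an empty list means no by-reference parameters
        if (arg.isLocalToCaller && arg.convertibleToReturn)
            return true;
    return false;
}

unittest
{
    assert(escapesLocal(true, true, [RefArg(true, true)]));    // return func(t)
    assert(!escapesLocal(true, true, [RefArg(false, true)]));  // argument is not local
    assert(!escapesLocal(true, true, [RefArg(true, false)]));  // types rule it out
}
```

Every additional case in which convertibleToReturn can be proven false lets more code pass without annotation.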

Scope Ref

The above solution is correct, but a bit restrictive. After all, func(t, u) could be returning a reference to non-local u, not local t, and so should work. To fix this, introduce the concept of scope ref:

ref T func(scope ref T t, ref T u) {
  return t; // Error: escaping scope ref t
  return u; // ok
}

Scope means that the ref is guaranteed not to escape.

T u;
ref T foo() @safe {
  T t;
  return func(t, u); // Ok, u is not local
  return func(u, t); // Error: escaping scope ref t
}

This minimizes the number of scope annotations required.

Scope Function Returns

scope can be applied to function return values (even though it is not a type constructor). It must be applied to the left of the declaration, in the same way ref is:


int* foo() scope;     // applies to 'this' reference
scope: int* foo();    // applies to 'this' reference
scope { int* foo(); } // applies to 'this' reference
scope int* foo();     // applies to return value

The lifetime of a scope return value is the lifetime of an rvalue. It may not be copied in a way that extends its life.

int* bar(scope int*);
scope int* foo();
...
return foo();         // Error, lifetime(return) > lifetime(foo())
int* p = foo();       // Error, lifetime(p) is ∞
bar(foo());           // Ok, lifetime(foo()) > lifetime(bar())
scope int* q = foo(); // Error, lifetime(q) > lifetime(rvalue)

This enables scope return values to be safely chained from function to function; in particular it also allows a ref counted struct to safely expose a reference to its wrapped type.

Out Parameters

out parameters are treated like ref parameters when scope is applied.

Expressions

The lifetime of an expression is either ∞ or the lifetime of a single variable. Which it is can be statically deduced by looking at the type and AST of the expression.

The root cases are:

root case                             lifetime
address of variable v                 lifetime(v)
v containing value with references    lifetime(v)
otherwise                             ∞


More complex expressions can be reduced to be one of the root cases:

expression                            lifetime
&(*e)                                 lifetime(e)
*(&e + integer)                       lifetime(e)
*((e1 ? &e2 : &e3) + integer)         min(lifetime(e2), lifetime(e3))
*e                                    ∞
e1,e2                                 lifetime(e2)
e1 = e2                               lifetime(e1)
e1 op= e2                             lifetime(e1)
e1 ? e2 : e3                          min(lifetime(e2), lifetime(e3))
ptr + integer                         lifetime(ptr)
e1 op e2                              min(lifetime(e1), lifetime(e2))
op e                                  lifetime(e)
e++                                   lifetime(e)
e--                                   lifetime(e)
cast(type) e                          lifetime(e)
new                                   ∞
e.field                               lifetime(e)
(*e).field                            lifetime(*e)
e.func(args)                          min(lifetime(e, args), ∞)
func(args)                            min(lifetime(non-scope-ref args), ∞)
e[]                                   lifetime(e)
e[i..j]                               lifetime(e)
e[i]                                  lifetime(*e)
ArrayLiteral                          min(lifetime(args))
ArrayLiteral[constant]                lifetime(ArrayLiteral[constant])
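
The reduction can be sketched as a recursive walk over an expression tree. This is an illustration covering only a few rows, not the compiler's implementation: Expr, Kind and lifetimeOf are hypothetical names, lifetimes are encoded as integers where a larger value lives longer, and an expression that refers to no variable is assumed to have lifetime ∞.

```d
import std.algorithm.comparison : min;

enum int infinity = int.max;  // hypothetical encoding of the ∞ lifetime

enum Kind { addressOf, other, comma, conditional, binary }

// Hypothetical expression node; variableLifetime is used by addressOf only.
final class Expr
{
    Kind kind;
    int variableLifetime;
    Expr[] operands;

    this(Kind kind, int variableLifetime, Expr[] operands)
    {
        this.kind = kind;
        this.variableLifetime = variableLifetime;
        this.operands = operands;
    }
}

Expr addressOf(int lifetime) { return new Expr(Kind.addressOf, lifetime, null); }
Expr other() { return new Expr(Kind.other, 0, null); }
Expr comma(Expr a, Expr b) { return new Expr(Kind.comma, 0, [a, b]); }
Expr binary(Expr a, Expr b) { return new Expr(Kind.binary, 0, [a, b]); }
Expr conditional(Expr c, Expr t, Expr f) { return new Expr(Kind.conditional, 0, [c, t, f]); }

int lifetimeOf(const Expr e)
{
    final switch (e.kind)
    {
        case Kind.addressOf:   return e.variableLifetime;       // &v
        case Kind.other:       return infinity;                 // assumed ∞ root case
        case Kind.comma:       return lifetimeOf(e.operands[1]); // e1,e2
        case Kind.binary:      // e1 op e2
            return min(lifetimeOf(e.operands[0]), lifetimeOf(e.operands[1]));
        case Kind.conditional: // e1 ? e2 : e3
            return min(lifetimeOf(e.operands[1]), lifetimeOf(e.operands[2]));
    }
}

unittest
{
    auto x = addressOf(2), y = addressOf(5);
    assert(lifetimeOf(conditional(other(), x, y)) == 2);
    assert(lifetimeOf(comma(x, y)) == 5);
    assert(lifetimeOf(binary(other(), y)) == 5);
}
```

Since every rule either forwards one operand's lifetime or takes a minimum, the result is always ∞ or the lifetime of a single variable.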

Classes

Scope class semantics are equivalent to a pointer to a struct.

Static Arrays

Scope static array semantics are equivalent to a scope struct:

T[3] a;
struct A { T t0, t1, t2; } A a;

@safe

Errors for scope violations are only reported in @safe code.

Breaking Existing Code

Some code will no longer work. Although inference will take care of a lot of cases, there are still some that will fail.

int i,j;
int* p = &i;  // Ok, scope is inferred for p
int* q;
q = &i;   // Error: too late to infer scope for q

Currently, scope is ignored except that a new class used to initialize a scope variable allocates the class instance on the stack. Fortunately, this can work with this new proposal, with an optimization that recognizes that if a new class is unique, and assigned to a scope variable, then that instance can be placed on the stack.
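
The existing behavior can be observed through the destructor, which runs when the scope variable goes out of scope. The sketch below is illustrative; Resource, released and useResource are hypothetical names.

```d
// Hypothetical flag recording whether the destructor has run.
bool released;

class Resource
{
    ~this() { released = true; }
}

void useResource()
{
    scope r = new Resource;  // unique instance bound to a scope variable
    assert(!released);       // still alive inside the scope
}                            // destructor runs here, at the end of the scope

unittest
{
    released = false;
    useResource();
    assert(released);
}
```

The unittest block runs when the file is compiled with -unittest.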

Implementation Plan

Turning this on may cause significant breakage, and may also be found to be an unworkable design. Therefore, implementation stages will be:

  • enable new behavior with a compiler switch -scope
  • remove -scope, issue warning when errors are detected
  • replace warnings with deprecation messages
  • replace deprecations with errors