DIP69
Title: | Implement scope for escape proof references | |
---|---|---|
DIP: | 69 | |
Version: | 1 | |
Status: | Draft | |
Created: | 2014-12-04 | |
Last Modified: | 2014-12-04 | |
Authors: | Marc Schütz, deadalnix, Andrei Alexandrescu and Walter Bright | |
Links: | Proposals | Discussions |
Contents
Abstract
A garbage collected language is inherently memory safe. References to data can be passed around without concern for ownership, lifetimes, etc. But this runs into difficulty when combined with other sorts of memory management, like stack allocation, malloc/free allocation, reference counting, etc.
Knowing when the lifetime of a reference is over is critical for safely implementing memory management schemes other than tracing garbage collection. It is also critical for the performance of reference counting systems, as it will expose opportunities for elision of the inc/dec operations.
scope provides a mechanism to guarantee that a reference cannot escape lexical scope.
Benefits
- References to stack variables can no longer escape.
- Delegates currently defensively allocate closures with the GC. Few actually escape, and with scope only those that actually escape need to have the closures allocated.
- @system code like std.internal.scopebuffer can be made @safe.
- Reference counting systems need not adjust the count when passing references that do not escape.
- Better self-documentation of encapsulation.
Definitions
Visibility vs. lifetime
For each value we define two notions:
- Visibility is the lexical extent through which the value can be accessed. Notation: visibility(v).
- For a named variable, the visibility is the lexical scope of the variable per the language rules.
- For an rvalue, the visibility is the expression within it is used.
- Lifetime is the extent during which a value can be safely used. Notation: lifetime(v).
- For types without indirections such as int, accessibility and lifetime are equal.
- For values allocated on the garbage collected heap, lifetime is infinite whilst visibility is dependent on the references in the program bound to those values.
- For an unrestricted pointer, visibility is dictated by lexical scope. Lifetime is dictated by the lifetime of the data to which the pointer points to.
Examples:
void fun1() {
int x; // lifetime(x) and visibility(x) start here
int y = 42; // lifetime(42) and visibility(42) last through the initialization expression
...
// lifetime(y) and visibility(y) end here, before those of x
// lifetime(x) and visibility(x) end here
}
Lifetime
Any object exists for a specific period of time, with a well defined beginning and end point: from the point it is created (memory allocated and possibly constructor call), to the point it is released (possibly a call to the destructor followed by memory reclamation). This is called its lifetime. A reference that points to an object whose lifetime has ended is a dangling reference. Use of such references can cause all kinds of errors, and must therefore be prevented.
The lifetime of values is based on order in which they are introduced, lexical scope, and the static storage class when present. The following rules define a hierarchy of lifetimes:
- Data with static duration is considered to have infinite lifetime. The following data has static duration:
- string literals;
- variables defined at module level;
- variables introduced with static;
- variables introduced with enum declarations that do not construct an enumerated type.
- Inside functions, a variable's lifetime starts at the point of its declaration and ends with the lexical scope it is defined in. Variables are destroyed in reverse order of their creation. Therefore a variable defined in a scope has a strictly longer lifetime than those defined after it within the same scope. Variables in nested scopes have lifetimes correspondingly nested.
- An rvalue's lifetime is temporary; it lives till the end of the largest expression that it appears in.
- Rvalues are created in left-to-right lexical order and destroyed in order reverse of creation.
- Members of a struct, class, or statically-sized array have lifetimes enclosed within the lifetime of the aggregate itself and ordered lexically as well.
Given any two values v1 and v2, exactly one of these statements is true:
- v1 has a strictly longer lifetime than v2, meaning v1 is constructed before v2 and destroyed after v2;
- v2 has a strictly longer lifetime than v1, meaning v2 is constructed before v1 and destroyed after v1;
- Both v1 and v2 have infinite lifetime.
It is never the case that two lifetimes intersect, i.e. it is impossible to create values v1 and v2 such that v1 is constructed before v2 but v2 is destroyed after v1. (This concerns language rules; manual calls to constructors, destructors, and/or memory allocation functions may impose arbitrary orderings. Also, pointers in @system code may refer to data of arbitrary lifetime.)
We define a total ordering of lifetimes: either lifetime(v1) < lifetime(v2), lifetime(v1) > lifetime(v2), or both lifetime(v1) and lifetime(v2) are infinite (in which case we consider the lifetimes vacuously equal). We casually say "v1 is shorter-lived than v2" (or "v2 is longer-lived than v1") if lifetime(v1) <= lifetime(v2).
By consequence of the above, inside a function:
- Parameters passed by ref or out are conservatively assumed to have lifetime somewhere in the caller's scope;
- Parameters passed by value have shorter lifetime than those passed by passed by ref/out, but longer than any locals defined by the function. The lifetimes of by-value parameters are ordered lexically.
Algebra of Lifetimes
Certain expressions create values of which lifetime is in relationship with the participating value lifetimes, as follows:
expression | lifetime | notes |
---|---|---|
&e | lifetime(e) | |
&*e | lifetime(e) | |
e + integer | lifetime(e) | Applies only when e is a pointer type |
e - integer | lifetime(e) | Applies only when e is a pointer type |
*e | ∞ | lifetime is not transitive |
e1, e2 | lifetime(e2) | |
e1 = e2 | lifetime(e1) | |
e1 op= e2 | lifetime(e1) | |
e1 ? e2 : e3 | min(lifetime(e2), lifetime(e3)) | |
e++ | lifetime(e) | Applies only when e is a pointer type. This has academic value only because pointer increment is disallowed in @safe code. |
e-- | lifetime(e) | Applies only when e is a pointer type. This has academic value only because pointer decrement is disallowed in @safe code. |
cast(T) e | lifetime(e) | Applies only when both T and e have pointer type. |
new | ∞ | Allocates on the GC heap. |
e.field | lifetime(e) | |
e.func(args) | See section dedicated to discussing methods. | |
func(args) | See section dedicated to discussing functions. | |
e[] | lifetime(e) | |
e[i..j] | lifetime(e) | |
&e[i] | lifetime(e) | |
e[i] | ∞ | |
ArrayLiteral | ∞ | Array literals are allocated on the GC heap |
ArrayLiteral[constant] | ∞ |
Ownership
A variable owns the data it contains if, when the lifetime of the variable is ended, the data can be destroyed. There can be at most one owner for any piece of data.
Ownership may be transitive, meaning anything reachable through the owned data is also owned, or head where only the top level reference is owned.
View
Other references to owned data are called views. Views must not survive the end of the lifetime of the owner of the data. A view v2 may be taken of another view v1, and the lifetime of v2 must be subset of the lifetime of v1. Views may also be transitive or head.
Aggregates
The following sections define scope working on primitive types (such as int) and pointers thereof (such as int*). This is without loss of generality because aggregates can be handled by decomposition as follows:
- From a lifetime analysis viewpoint, a struct is considered a juxtaposition of its direct members. Passing a struct by value into a function is equivalent to passing each of its members by value. Passing a struct by ref is equivalent to passing each of its members by ref. Finally, passing a pointer to a struct is analyzed as passing a pointer to each of its members. Example:
struct A { int x; float y; }
void fun(A a); // analyzed similarly to fun(int x, float y);
void gun(ref A a); // analyzed similarly to gun(ref int x, ref float y);
void hun(A* a); // analyzed similarly to hun(int* x, float* y);
- Lifetimes of statically-sized arrays T[n] is analyzed as if the array were a struct with n fields, each of type T.
- Lifetimes of built-in dynamically-sized slices T[] are analyzed as structs with two fields, one of type T* and the other of type size_t.
- Analysis of lifetimes of class types is similar to analysis of pointers to struct types.
- For struct members of aggregate type, decomposition may continue transitively.
Fundamentals of scope
The scope storage class ensures that the lifetime of a pointer/reference is a shorter of the lifetime of the referred object. Dereferencing through a scope variable is guaranteed to be safe.
scope is a storage class, and affects declarations. It is not a type qualifier. There is no change to existing scope grammar. It fits in the grammar as a storage class.
scope affects:
- local variables allocated on the stack
- function parameters
- non-static member functions (applying to the this implicit parameter)
- delegates (applying to their implicit environment)
- return value of functions
It is ignored for other declarations. It is ignored for declarations that have no indirections.
scope enum e = 3; // ignored, no indirections
scope int i; // ignored no indirections
The scope storage class affects variables according to these rules:
- A scope variable can only be initialized and assigned from values that have lifetimes longer than the variable's lifetime. (As a consequence a scope variable can only be assigned to scope variables that have shorter lifetime.)
- A variable is inferred to be scope if it is initialized with a value that has a non-∞ lifetime.
- A scope variable cannot be initialized with the address of a scope variable.
- A scope ref parameter can be initialized with another scope ref parameter—scope ref is idempotent.
Examples for each rule:
int global_var;
int* global_ptr;
void bar(scope int* input);
void fun1() {
scope int* a = &global_var; // OK per rule 1, lifetime(&global_var) > lifetime(a)
a = &global_var; // OK per rule 1, lifetime(&global_var) > lifetime(a)
int b;
a = &b; // Disallowed per rule 1, lifetime(&b) < lifetime(a)
scope c = &b; // OK per rule 1, lifetime(&b) > lifetime(c)
int* b;
a = b; // Disallowed per rule 1, lifetime(b) < lifetime(a)
}
void fun2() {
auto a = &global_var; // OK, b is a regular int*
int b;
auto c = &b; // Per rule 2, c has scope storage class
}
void fun3(scope int * p1) {
scope int** p2 = &p1; // Disallowed per rule 3
scope int* p3;
scope int** p4 = &p3; // Disallowed per rule 3
}
void fun4(scope int * p1) {
bar(p1); // OK per rule 4
}
A few more examples combining the rules:
int global_var;
int* global_ptr;
void bar(scope int* input);
void foo() {
scope int* a;
a = &global_var; // Ok, `global_var` has a greater lifetime than `a`
scope b = &global_var; // Ok, type deduction
int c;
if(...) {
scope x = a; // Ok, copy of reference,`x` has shorter lifetime than `a`
scope y = &c; // Ok, lifetime(y) < lifetime(& c)
int z;
b = &z; // Error, `b` will outlive `z`
int* d = a; // Ok: d is inferred to be `scope`
}
bar(a); // Ok, scoped pointer is passed to scoped parameter
bar(&c); // Ok, lifetime(parameter input) < lifetime(c)
int* e;
e = &c; // Error, lifetime(e's view) is ∞ and is greater than lifetime(c)
a = e; // Ok, lifetime(a) < lifetime(e)
scope int** f = &a; // Error, rule 4
scope int** h = &e; // Ok
int* j = *h; // Ok, scope is not transitive
}
void abc() {
scope int* a;
int* b;
scope ref int* c = a; // Error, rule 5
scope ref int* d = b; // Ok
int* i = a; // Ok, scope is inferred for i
global_ptr = d; // Error, lifetime(d) < lifetime(global_ptr)
global_ptr = i; // Error, lifetime(i) < lifetime(global_ptr)
int* j;
global_ptr = j; // Ok, j is not scope
}
Interaction of scope with the return Statement
A value containing indirections and annotated with scope cannot be returned from a function.
class C { ... }
C fun1() {
scope C c;
...
return c; // Error
}
int fun2() {
scope int i;
...
return i; // Ok, i has no indirections
}
scope int* fun3() {
scope int* p;
return p; // Error
return p+1; // Error, nice try!
return &*p; // Error, won't work either
}
ref int func(scope ref int r, scope out int s)
{
return r; // Error
return s; // Error, 'out' is treated like 'ref'
}
Functions
Inference
scope is inferred for function parameters if not specified, under the same circumstances as pure, nothrow, @nogc and @safe are inferred. Scope is not inferred for virtual functions.
Overloading
scope does not affect overloading. If it did, then whether a variable was scope or not would affect the code path, making scope inference impractical. It also makes turning scope checking on/off impractical.
T func(scope ref T);
T func(ref T);
T t; func(t); // Error, ambiguous
scope T u; func(u); // Error, ambiguous
Implicit Conversion of Function Pointers and Delegates
scope can be added to parameters, but not removed.
alias int function(ref T) fp_t;
alias int function(scope ref T) fps_t;
int foo(ref T);
int bar(scope ref T);
fp_t fp = &bar; // Ok, scope behavior is subset of non-scope
fps_t fp = &foo; // Error, fps_t demands scope behavior
Inheritance
Overriding functions inherit any scope annotations from their antecedents. Scope is covariant, meaning it can be added to overriding functions.
class C
{
int foo(ref T);
int bar(scope ref T);
}
class D : C
{
override int foo(scope ref T); // Ok, can add scope
override int bar(ref T); // Error, cannot remove scope
}
Mangling
Scope will require additional mangling, as it affects the interface of the function. In cases where scope is ignored, it does not contribute to the mangling. Scope parameters will be mangled with ???.
Nested Functions
Nested functions have more objects available than just their arguments:
ref T foo() {
T t;
ref T func() { return t; }
return func(); // disallowed
}
Nested functions are analyzed as if each variable accessed outside of its scope was passed as a ref parameter. All parameters have scope inferred from how they are used in the function body.
Ref
Escaping via Return
The simple cases of this are disallowed:
T* func(T t) {
T u;
return &t; // Error: escaping reference to local t
return &u; // Error: escaping reference to local u
}
But are easily circumvented:
T* func(T t) {
T* p = &t;
return p; // no error detected
}
@safe currently deals with this by preventing taking the address of a local:
T* func(T t) @safe {
T* p = &t; // Error: cannot take address of parameter t in @safe function func
return p;
}
This is restrictive. The ref storage class was introduced which defines a special purpose pointer. ref can only appear in certain contexts, in particular function parameters and returns, only applies to declarations, cannot be stored, and cannot be incremented.
ref T func(T t) @safe {
return t; // Error: escaping reference to local variable t
}
Ref can be passed down to functions:
void func(ref T t) @safe;
void bar(ref T t) @safe {
func(t); // ok
}
But the following idiom is far too useful to be disallowed:
ref T func(ref T t) {
return t; // ok
}
And if it is misused it can result in stack corruption:
ref T foo() {
T t;
return func(t); // currently, no error detected, despite returning pointer to t
}
The:
return func(t);
case is detected by all of the following conditions being true:
- foo() returns by reference
- func() returns by reference
- func() has one or more parameters that are by reference
- 1 or more of the arguments to those parameters are stack objects local to foo()
- Those arguments can be @safe-ly converted from the parameter to the return type.
For example, if the return type is larger than the parameter type, the return type cannot be a reference to the argument. If the return type is a pointer, and the parameter type is a size_t, it cannot be a reference to the argument. The larger a list of these cases can be made, the more code will pass @safe checks without requiring further annotation.
Scope Ref
The above solution is correct, but a bit restrictive. After all, func(t, u) could be returning a reference to non-local u, not local t, and so should work. To fix this, introduce the concept of scope ref:
ref T func(scope ref T t, ref T u) {
return t; // Error: escaping scope ref t
return u; // ok
}
Scope means that the ref is guaranteed not to escape.
T u;
ref T foo() @safe {
T t;
return func(t, u); // Ok, u is not local
return func(u, t); // Error: escaping scope ref t
}
This minimizes the number of scope annotations required.
Scope Function Returns
scope can be applied to function return values (even though it is not a type qualifier). It must be applied to the left of the declaration, in the same way ref is:
int* foo() scope; // applies to 'this' reference
scope: int* foo(); // applies to 'this' reference
scope { int* foo(); } // applies to 'this' reference
scope int* foo(); // applies to return value
The lifetime of a scope return value is the lifetime of an rvalue. It may not be copied in a way that extends its life.
int* bar(scope int*);
scope int* foo();
...
return foo(); // Error, lifetime(return) > lifetime(foo())
int* p = foo(); // Error, lifetime(p) is ∞
bar(foo()); // Ok, lifetime(foo()) > lifetime(bar())
scope int* q = foo(); // error, lifetime(q) > lifetime(rvalue)
This enables scope return values to be safely chained from function to function; in particular it also allows a ref counted struct to safely expose a reference to its wrapped type.
Out Parameters
out parameters are treated like ref parameters when scope is applied.
Classes
Scope class semantics are equivalent to a pointer to a struct.
Static Arrays
Scope static array semantics are equivalent to a scope struct:
T[3] a;
struct A { T t0, t1, t2; } A a;
@safe
Errors for scope violations are only reported in @safe code.
Breaking Existing Code
Some code will no longer work. Although inference will take care of a lot of cases, there are still some that will fail.
int i,j;
int* p = &i; // Ok, scope is inferred for p
int* q;
q = &i; // Error: too late to infer scope for q
Currently, scope is ignored except that a new class use to initialize a scope variable allocates the class instance on the stack. Fortunately, this can work with this new proposal, with an optimization that recognizes that if a new class is unique, and assigned to a scope variable, then that instance can be placed on the stack.
Implementation Plan
Turning this on may cause significant breakage, and may also be found to be an unworkable design. Therefore, implementation stages will be:
- enable new behavior with a compiler switch -scope
- remove -scope, issue warning when errors are detected
- replace warnings with deprecation messages
- replace deprecations with errors