Difference between revisions of "User:Schuetzm/scope2"
Line 2: | Line 2: | ||
The current D language specification reserves the '''scope''' keyword in function signatures to specify that a parameter will not be escaped by the function, making it @safe to pass references to local variables or manually managed memory to it, among other things. This feature is currently unimplemented, apart from its use with lambdas where it guarantees the closure will be allocated on the stack instead of the GC. This proposal intends to change that. It will allow the safe and efficient implementation of various memory management strategies (including reference counting), as well as unified handling of references to GC, reference counted data, local variables, containers, and others. | The current D language specification reserves the '''scope''' keyword in function signatures to specify that a parameter will not be escaped by the function, making it @safe to pass references to local variables or manually managed memory to it, among other things. This feature is currently unimplemented, apart from its use with lambdas where it guarantees the closure will be allocated on the stack instead of the GC. This proposal intends to change that. It will allow the safe and efficient implementation of various memory management strategies (including reference counting), as well as unified handling of references to GC, reference counted data, local variables, containers, and others. | ||
+ | |||
+ | The proposal is mostly a superset of [[DIP25]], but is generalized to all types of references and adds inference to alleviate the need for explicit annotations. | ||
+ | |||
+ | === Basics === | ||
+ | |||
+ | '''scope''' is a storage class; it will only be applicable to parameters in function signatures (which include the implicit '''this''' parameter for methods, as well as the context pointer for delegates). It will have the semantics one expects: when a function with a scope parameter returns, the corresponding argument will not have been stored in a global variable or on the heap, etc: | ||
+ | |||
+ | <source lang="D"> | ||
+ | void sendData(scope ubyte[] data); | ||
+ | void someOtherFunction(ubyte[] data); | ||
+ | |||
+ | void main() { | ||
+ | ubyte[1024] chunkOfData = ...; | ||
+ | sendData(chunkOfData); | ||
+ | // this is @safe: no reference to the local have escaped | ||
+ | someOtherFunction(chunkOfData); | ||
+ | // this is @system: the callee gives no guarantees about `data` | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | As we can see, certain operations, like taking the address of a local (or slicing of a fixed-size array, which is equivalent), no longer need to be @system per se. Instead, it's what is done with the resulting reference that decides whether it's @system or @safe. | ||
+ | |||
+ | === Implicit '''scope''' and opt-out === | ||
+ | |||
+ | To reduce the need for manual annotations, @safe functions take all their reference parameters as '''scope'''. '''ref''' also implies '''scope''', and the '''this''' parameter is always passed as '''scope''' unless opted out. Because sometimes a @safe function may actually want to accept non-scope params, there is an opt-out in the form of '''static'''. Coupled with scope inference for templates, and an optional change like "@safe by default" (currently being discussed), this will get rid of most explicit scope annotations: | ||
+ | |||
+ | <source lang="D"> | ||
+ | void doSomething(int[] data) @safe; | ||
+ | // equivalent to: | ||
+ | void doSomething(scope int[] data) @safe; | ||
+ | |||
+ | void foo(int[] input, static int* output) @safe; | ||
+ | // `input` is scope, `output` isn't | ||
+ | |||
+ | void bar(ref MyStruct s) @safe; | ||
+ | // equivalent to: | ||
+ | void bar(scope ref MyStruct s) @safe; | ||
+ | </source> | ||
+ | |||
+ | === '''scope''' for value types & overloading === | ||
+ | |||
+ | '''scope''' applies to all types with indirections: pointers, slices, class references, '''ref''' parameters, delegates, and aggregates containing such. Functions and methods can be overloaded on '''scope'''. This allows efficient passing of RC wrappers for instance: | ||
+ | |||
+ | <source lang="D"> | ||
+ | struct RC(T) if(is(T == class)) { | ||
+ | // ... | ||
+ | this(scope this) { | ||
+ | // increment refcount | ||
+ | count++; | ||
+ | } | ||
+ | ~this() { | ||
+ | // decrement refcount | ||
+ | if(--count == 0) | ||
+ | destroy(payload); | ||
+ | } | ||
+ | this(scope this) scope { | ||
+ | // DON'T increment refcount | ||
+ | } | ||
+ | ~this() scope { | ||
+ | // DON'T decrement refcount | ||
+ | } | ||
+ | // magic, to be explained later | ||
+ | alias borrow this; | ||
+ | } | ||
+ | |||
+ | void foo(scope MyClass object); | ||
+ | |||
+ | RC!MyClass global; | ||
+ | void bar(scope RC!MyClass object) { | ||
+ | if(some_condition) | ||
+ | global = object; // make a copy, adjust refcount | ||
+ | } | ||
+ | |||
+ | void main() { | ||
+ | RC!MyClass x = ...; | ||
+ | // auto conversion to MyClass, no refcount update: | ||
+ | foo(x); | ||
+ | // no refcount update at call site, | ||
+ | // no needless double indirection with `ref`: | ||
+ | bar(x); | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | All of this can be implemented in user code or in the standard library. The compiler doesn't need to be aware of reference counting. | ||
+ | |||
+ | === Implicit conversions === | ||
+ | |||
+ | A '''scope''' parameter doesn't care how the data it refers to has been allocated. All it requires is that the reference stays valid for the duration of the function call. Therefore, it's a perfect fit for library functions. They don't need to be templated to support different resource management strategies of the library's user. It acts as a bridge between different types of strategies, just like '''const''' acts as a bridge between mutable and immutable data. | ||
+ | |||
+ | <source lang="D"> | ||
+ | // no template bloat, no knowledge about RC etc.: | ||
+ | double computeAverage(scope int[] data); | ||
+ | |||
+ | void main() { | ||
+ | int[20] local = [1,2,3,...]; | ||
+ | writeln(computeAverage(local)); // OK | ||
+ | int[] heap = ...; | ||
+ | writeln(computeAverage(heap)); // OK | ||
+ | RC!(int[]) rc = ...; | ||
+ | writeln(computeAverage(rc)); // OK | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | This is achieved by allowing non-scoped types to convert to '''scope''' implicitly. For builtin references, the language does this automatically. User-defined types must opt in by defining an appropriate '''alias this'''. | ||
+ | |||
+ | === Returning scoped parameters === | ||
+ | |||
+ | Some functions want to return a parameter that is passed in, or something reachable through one, e.g. a member of '''this'''. They can express this by annotating the parameter with the keyword '''return''', just as in [[DIP25]]. To specify that the value is returned through another parameter, the '''return!ident''' syntax can be used. These annotations can be used multiple times per parameter, when the reference can be returned through several other parameters: | ||
+ | |||
+ | <source lang="D"> | ||
+ | struct RC(T) if(is(T == class)) { | ||
+ | scope T payload; | ||
+ | T borrow() return { // `return` applies to `this` | ||
+ | return payload; | ||
+ | } | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | To prevent accidental access to a member (e.g. ''payload'' in the above example), the member can be annotated with '''scope'''. The compiler will then treat it as if it we're alway accessed through an appropriately annotated property that returns a (scoped) reference to it. | ||
+ | |||
+ | The compiler will make sure the returned value is not used in any way that is un-@safe. In particular, it will verify that the returned references' lifetimes won't exceed the lifetimes of the arguments they're coming from. | ||
+ | |||
+ | === '''scope''' inference === | ||
+ | |||
+ | For templates and nested functions, the compiler can infer the scope annotations, just as it infers purity and @safe-ty. Generic code therefore rarely needs any explicit annotations: | ||
+ | |||
+ | <source lang="D"> | ||
+ | T* foo(T)(T* a, T* b) { | ||
+ | static T* cache; | ||
+ | cache = b; | ||
+ | return a; | ||
+ | } | ||
+ | |||
+ | // `foo!int` will be inferred as: | ||
+ | int* foo_int(scope int* a return, static int* b); | ||
+ | // (`static` is the default anyway, only here for clarity) | ||
+ | </source> | ||
+ | |||
+ | === Multiple indirections === | ||
+ | |||
+ | Multiple indirections are also handled in a way that preserves the guarantees about lifetimes. Because '''scope''' is not a type modifier, it cannot encode information about the lifetimes of objects behinds multiple indirections. Therefore, the compiler must be conservative. For the left-hand side of assignments, it must assume that the destination has infinite lifetime, while for the right-hand side, it must assume that the destination will vanish as soon as the reference through which it is accessed goes out of scope. | ||
+ | |||
+ | === @safe-ty violations with borrowing === | ||
+ | |||
+ | When borrowing is combined with explicit, non lexical-scope based memory management (of which reference counting is one form), there will inevitably be problems as the one discussed in [http://forum.dlang.org/post/huspgmeupgobjubtsmfe@forum.dlang.org this forum thread]. To deal with them in a safe way requires some kind of data flow and aliasing analysis. Rust is an example of a language that uses very sophisticated analysis algorithms for that. This proposal will include a simplified algorithm to detect potentially unsafe uses at compile time, at the cost of detecting some false positives, for which there will however be workarounds. Instead of disallowing these operations, they will be treated as @system. It is therefore up to the end user to decide how to deal with them. | ||
== Details == | == Details == | ||
Line 96: | Line 241: | ||
private: | private: | ||
scope E[] array; // this is the only explicit annotation | scope E[] array; // this is the only explicit annotation | ||
+ | // enforces treatment as `scope`, to avoid | ||
+ | // accidental access as unscoped | ||
int* count; | int* count; | ||
} | } |
Revision as of 13:27, 15 March 2015
Contents
Overview
The current D language specification reserves the scope keyword in function signatures to specify that a parameter will not be escaped by the function, making it @safe to pass references to local variables or manually managed memory to it, among other things. This feature is currently unimplemented, apart from its use with lambdas where it guarantees the closure will be allocated on the stack instead of the GC. This proposal intends to change that. It will allow the safe and efficient implementation of various memory management strategies (including reference counting), as well as unified handling of references to GC, reference counted data, local variables, containers, and others.
The proposal is mostly a superset of DIP25, but is generalized to all types of references and adds inference to alleviate the need for explicit annotations.
Basics
scope is a storage class; it will only be applicable to parameters in function signatures (which include the implicit this parameter for methods, as well as the context pointer for delegates). It will have the semantics one expects: when a function with a scope parameter returns, the corresponding argument will not have been stored in a global variable or on the heap, etc:
void sendData(scope ubyte[] data);
void someOtherFunction(ubyte[] data);
void main() {
ubyte[1024] chunkOfData = ...;
sendData(chunkOfData);
// this is @safe: no reference to the local have escaped
someOtherFunction(chunkOfData);
// this is @system: the callee gives no guarantees about `data`
}
As we can see, certain operations, like taking the address of a local (or slicing of a fixed-size array, which is equivalent), no longer need to be @system per se. Instead, it's what is done with the resulting reference that decides whether it's @system or @safe.
Implicit scope and opt-out
To reduce the need for manual annotations, @safe functions take all their reference parameters as scope. ref also implies scope, and the this parameter is always passed as scope unless opted out. Because sometimes a @safe function may actually want to accept non-scope params, there is an opt-out in the form of static. Coupled with scope inference for templates, and an optional change like "@safe by default" (currently being discussed), this will get rid of most explicit scope annotations:
void doSomething(int[] data) @safe;
// equivalent to:
void doSomething(scope int[] data) @safe;
void foo(int[] input, static int* output) @safe;
// `input` is scope, `output` isn't
void bar(ref MyStruct s) @safe;
// equivalent to:
void bar(scope ref MyStruct s) @safe;
scope for value types & overloading
scope applies to all types with indirections: pointers, slices, class references, ref parameters, delegates, and aggregates containing such. Functions and methods can be overloaded on scope. This allows efficient passing of RC wrappers for instance:
struct RC(T) if(is(T == class)) {
// ...
this(scope this) {
// increment refcount
count++;
}
~this() {
// decrement refcount
if(--count == 0)
destroy(payload);
}
this(scope this) scope {
// DON'T increment refcount
}
~this() scope {
// DON'T decrement refcount
}
// magic, to be explained later
alias borrow this;
}
void foo(scope MyClass object);
RC!MyClass global;
void bar(scope RC!MyClass object) {
if(some_condition)
global = object; // make a copy, adjust refcount
}
void main() {
RC!MyClass x = ...;
// auto conversion to MyClass, no refcount update:
foo(x);
// no refcount update at call site,
// no needless double indirection with `ref`:
bar(x);
}
All of this can be implemented in user code or in the standard library. The compiler doesn't need to be aware of reference counting.
Implicit conversions
A scope parameter doesn't care how the data it refers to has been allocated. All it requires is that the reference stays valid for the duration of the function call. Therefore, it's a perfect fit for library functions. They don't need to be templated to support different resource management strategies of the library's user. It acts as a bridge between different types of strategies, just like const acts as a bridge between mutable and immutable data.
// no template bloat, no knowledge about RC etc.:
double computeAverage(scope int[] data);
void main() {
int[20] local = [1,2,3,...];
writeln(computeAverage(local)); // OK
int[] heap = ...;
writeln(computeAverage(heap)); // OK
RC!(int[]) rc = ...;
writeln(computeAverage(rc)); // OK
}
This is achieved by allowing non-scoped types to convert to scope implicitly. For builtin references, the language does this automatically. User-defined types must opt in by defining an appropriate alias this.
Returning scoped parameters
Some functions want to return a parameter that is passed in, or something reachable through one, e.g. a member of this. They can express this by annotating the parameter with the keyword return, just as in DIP25. To specify that the value is returned through another parameter, the return!ident syntax can be used. These annotations can be used multiple times per parameter, when the reference can be returned through several other parameters:
struct RC(T) if(is(T == class)) {
scope T payload;
T borrow() return { // `return` applies to `this`
return payload;
}
}
To prevent accidental access to a member (e.g. payload in the above example), the member can be annotated with scope. The compiler will then treat it as if it we're alway accessed through an appropriately annotated property that returns a (scoped) reference to it.
The compiler will make sure the returned value is not used in any way that is un-@safe. In particular, it will verify that the returned references' lifetimes won't exceed the lifetimes of the arguments they're coming from.
scope inference
For templates and nested functions, the compiler can infer the scope annotations, just as it infers purity and @safe-ty. Generic code therefore rarely needs any explicit annotations:
T* foo(T)(T* a, T* b) {
static T* cache;
cache = b;
return a;
}
// `foo!int` will be inferred as:
int* foo_int(scope int* a return, static int* b);
// (`static` is the default anyway, only here for clarity)
Multiple indirections
Multiple indirections are also handled in a way that preserves the guarantees about lifetimes. Because scope is not a type modifier, it cannot encode information about the lifetimes of objects behinds multiple indirections. Therefore, the compiler must be conservative. For the left-hand side of assignments, it must assume that the destination has infinite lifetime, while for the right-hand side, it must assume that the destination will vanish as soon as the reference through which it is accessed goes out of scope.
@safe-ty violations with borrowing
When borrowing is combined with explicit, non lexical-scope based memory management (of which reference counting is one form), there will inevitably be problems as the one discussed in this forum thread. To deal with them in a safe way requires some kind of data flow and aliasing analysis. Rust is an example of a language that uses very sophisticated analysis algorithms for that. This proposal will include a simplified algorithm to detect potentially unsafe uses at compile time, at the cost of detecting some false positives, for which there will however be workarounds. Instead of disallowing these operations, they will be treated as @system. It is therefore up to the end user to decide how to deal with them.
Details
scope
is a storage class. It applies to function parameters (including this
), local variables, the return value (treated as if it were an out
parameter), and member variables of aggregates. It participates in overloading.
With every variable, a particular lifetime (called scope) is associated.
It usually refers to a local variable or parameter; additionally, an infinite scope is defined that corresponds to global/static variables or GC managed memory.
Scopes and lifetimes are defined purely based on lexical scope and order of declaration of their corresponding variables. Therefore, for any two lifetimes, one is either completely contained in the other, or they are disjoint.
By annotating a variable with scope
, it's scope is defined to be equal to the variable's lifetime, instead of the default (infinity).
For any expression involving at least one scope value, two lifetimes (LHS lifetime and RHS lifetime) are computed in a way that ensures that the resulting RHS lifetime will not be greater than that of any of the expression's parts, and the LHS lifetime will not be smaller.
The exact rules will be defined in one of the following sections.
An assignment (i.e., =
operator, passing to a function, returning from a function, capturing in a closure, throwing, etc.) involving scope values is only permissible, if the destination's LHS lifetime is fully contained in the source's RHS lifetime.
Throwing is considered assignment to a variable with static lifetime.
The following invariant is enforced to always be true on every assignment: A location with scope a will never contain references pointing to values with a lifetime shorter than a.
To allow a function to return a value it received as a parameter, the parameter can be annotated with the return
keyword,
as in DIP25.
It's also possible to express that a parameter escapes through another parameter (including this
) by using the return!identifier
syntax.
Multiple such annotations can appear for each parameter.
When a function is called, the compiler checks (for each argument and the return value) that only expressions with a lifetime longer than those of all the corresponding return
annotations are passed in, and that the return value is used in a conforming way.
Because all relevant information about lifetimes is contained in the function signature, no explicit scope
annotations for local variables
are necessary; the compiler can figure them out by itself.
Additionally, inference of annotations is done in the usual situations, i.e. nested functions and templates.
In a subsequent section, an algorithm is presented that can be used for inference of scopes of local variables as well as annotations of parameters.
In parameters of @safe
functions, all reference types (class references, pointers, slices, ref
parameters) or aggregates containing references are implicitly treated as scope
unless the parameter is annotated with the keyword static
. This does however not apply to the return value.
ref
and out
parameters can also be treated as implicitly scoped, but this has the potential to break lots of code and needs to be considered carefully.
A scope
annotation on a member variable shall be equivalent to a @property
function returning a reference to that member, scoped to this
.
Borrowing is the term used for assignment of a value to a variable with narrower scope. This typically happens on function calls, when a value is passed as an argument to a parameter annotated with (or inferred as) scope
.
Examples
RCArray
Walter's RCArray
, adjusted to this proposal:
@safe:
struct RCArray(E) {
@disable this();
this(E[] a)
{
array = a.dup;
count = new int;
*count = 1;
}
~this() @trusted
{
if (--*count == 0)
{
delete count;
// either `delete` in `@system` code will accept `scope`:
delete array;
// or a helper needs to be used to remove `scope`:
delete assumeStatic(array);
}
}
this(this)
{
++*count;
}
@property size_t length()
{
return array.length;
}
ref E opIndex(size_t i)
{
return array[i];
}
E[] opSlice(size_t lwr, size_t upr)
{
return array[lwr .. upr];
}
E[] opSlice()
{
return array[];
}
private:
scope E[] array; // this is the only explicit annotation
// enforces treatment as `scope`, to avoid
// accidental access as unscoped
int* count;
}
Implementation
Scope inference
This algorithm works at the function level. Scope inference for variables and parameters in one function is independent from all other functions.
It takes as input a list of variables whose scope is already fixed (by explicit annotations) and another list whose scopes are to be inferred. It will choose the narrowest possible scopes for them for which the function will still compile. This is based on the observation that variables that are read from need to have a scope at least as large as their destination. Therefore, we can start with the smallest possible scope, the lifetime of the variable itself, and extend it to the scope of the destination, if it isn't already larger.
1. Let Q
be a list of all variables whose scopes are to be inferred. This includes template function parameters not otherwise annotated and all local variables.
2. Assign all elements of Q
an initial scope equivalent to their own lifetime:
foreach(var; Q) {
var.scope := [var];
}
3. For each ASSIGNMENT
whose RHS_SCOPE
depends on a variable in Q
, expand that variable's scope to at least the LHS_SCOPE
. For all variables the LHS_SCOPE
depends on and that are in Q
, record a dependency:
foreach(ass; ASSIGNMENTS) {
if(ass.rhs_scope.depends_on(Q)) {
foreach(rhs_var; ass.rhs_scope.vars) {
if(not rhs_var in Q)
continue;
foreach(lhs_var; ass.lhs_scope.vars) {
rhs_var.scope |= ass.lhs_scope;
if(lhs_var in Q)
rhs_var.deps ~= lhs_var;
}
}
}
}
4. Remove all variables from Q
that have no dependencies:
foreach(var; Q) {
if(var.deps.empty)
Q.remove(var);
}
5. If Q
is empty, terminate, else remember length of Q
:
if(Q.empty)
return;
old_Q_len := Q.length;
6. Expand all variables' scopes to at least that of their dependencies; if a dependency has no dependencies itself, remove it from the variable's dependencies:
foreach(var; Q) {
foreach(dep; var.deps) {
var.scope |= dep.scope;
if(dep.deps.empty)
var.deps.remove(dep);
}
}
7. If the length changed, we made progress. We can repeat from step 4. Otherwise we have a dependency loop. Find a cycle (for example using Tarjan's algorithm). Collect all elements in the cycle, remove their dependencies from DEPENDENCIES
, and assign them all the union of their scopes:
if(Q.length != old_Q_len)
goto step4;
cycle := tarjan(DEPENDENCIES);
new_scope := []
foreach(var; cycle) {
new_scope |= var.scope;
var.deps.remove_each(cycle);
}
foreach(var; cycle) {
var.scope := new_scope;
}
8. Go to step 4.
At this point, each variable will have a scope assigned. Now, all assignment can be checked to verify that they never place a reference in a location that outlives the reference's target.