User:Schuetzm/scope3

From D Wiki

Introduction

The current D language specification reserves the scope keyword in function signatures to specify that a parameter will not be escaped by the function, making it @safe to pass references to local variables or manually managed memory to it, among other things. This feature is currently unimplemented, apart from its use with lambdas, where it guarantees that the closure will be allocated on the stack instead of on the GC heap. This proposal intends to change that. It will allow the safe and efficient implementation of various memory management strategies (including reference counting), as well as unified handling of references to GC-managed data, reference-counted data, local variables, containers, and others.

The proposal is mostly a superset of DIP25, but it is generalized to all types of references and adds inference to alleviate the need for explicit annotations. Credit is due to the authors of that DIP, Andrei and Walter; to Zach the Mystic, who had the idea to generalize DIP25 and provided inspiration for the inference algorithm; to deadalnix for his many valuable arguments, for example pointing out the intricacies of handling multiple indirections safely; and to various other members of the community who provided useful contributions in past discussions in the newsgroups.

Overview

Basics

scope is a storage class; it will only be applicable to parameters in function signatures (which include the implicit this parameter for methods, as well as the context pointer for delegates). It will have the semantics one expects: when a function with a scope parameter returns, the corresponding argument will not have been stored in a global variable or on the heap, etc:

void sendData(scope ubyte[] data);
void someOtherFunction(ubyte[] data);

void main() {
    ubyte[1024] chunkOfData = ...;
    sendData(chunkOfData);
    // this is @safe: no reference to the local has escaped
    someOtherFunction(chunkOfData);
    // this is @system: the callee gives no guarantees about the param
}

As we can see, certain operations, like taking the address of a local (or slicing of a fixed-size array, which is equivalent), no longer need to be @system per se. Instead, it's what is done with the resulting reference that decides whether it's @system or @safe.

scope for value types & overloading

scope applies to all types with indirections: pointers, slices, class references, ref parameters, delegates, and aggregates containing such. scope has no influence on overloading. However, a value passed as scope will not have its postblit or destructor called; it will only be bitblitted on copy and abandoned after use. This allows efficient passing of RC wrappers, for instance:

struct RC(T) if(is(T == class)) {
    // ...
    this(this) {
        // increment refcount
        count++;
    }
    ~this() {
        // decrement refcount
        if(--count == 0)
            destroy(payload);
    }
    // magic, to be explained later
    T borrow() return {
        return payload;
    }
    alias borrow this;
}

void foo(scope MyClass object);

RC!MyClass global;
void bar(scope RC!MyClass object) {
    if(some_condition)
        global = object; // make a copy, postblit called
    // no destructor for `object` called
}

void main() {
    RC!MyClass x = ...;
    // auto conversion to MyClass, no postblit called:
    foo(x);
    // no refcount update at call site,
    // no needless double indirection with `ref`:
    bar(x);
}

Implicit conversions

A scope parameter doesn't care how the data it refers to has been allocated. All it requires is that the reference stays valid for the duration of the function call. Therefore, it's a perfect fit for library functions. They don't need to be templated to support different resource management strategies of the library's user. It acts as a bridge between different types of strategies, just like const acts as a bridge between mutable and immutable data.

// no template bloat, no knowledge about RC etc.:
double computeAverage(scope int[] data);

void main() {
    int[20] local = [1,2,3,...];
    writeln(computeAverage(local));    // OK
    int[] heap = ...;
    writeln(computeAverage(heap));     // OK
    RC!(int[]) rc = ...;
    writeln(computeAverage(rc));       // OK
}

This is achieved by allowing non-scoped types to convert to scope implicitly. For builtin references, the language does this automatically. User-defined types can define an appropriate alias this to convert implicitly to a member (e.g. payload).
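
As a rough sketch of the user-defined case, the following compiles with a current D compiler. The `Owned` wrapper is hypothetical and merely stands in for a real RC type, and the body of `computeAverage` is an assumption, since the text above only declares its signature:

```d
import std.stdio : writeln;

// Hypothetical stand-in for a resource-owning wrapper such as RC!(int[]).
struct Owned
{
    int[] payload;
    alias payload this;   // implicit conversion to the wrapped slice
}

// The body is an assumption; the proposal only gives the signature.
double computeAverage(scope int[] data)
{
    if (data.length == 0)
        return 0.0;
    double sum = 0.0;
    foreach (x; data)
        sum += x;
    return sum / data.length;
}

void main()
{
    int[3] local = [1, 2, 3];
    int[] heap = [4, 5, 6];
    auto owned = Owned([7, 8, 9]);

    writeln(computeAverage(local[]));  // slice of a stack array
    writeln(computeAverage(heap));     // GC-allocated slice
    writeln(computeAverage(owned));    // converted through `alias this`
}
```

The callee is written once and never learns how its argument was allocated; only the call sites differ.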

Returning scoped parameters

Some functions want to return a parameter that is passed in, or something reachable through one, e.g. a member of this. They can express this by annotating the parameter with the keyword return, just as in DIP25:

struct RC(T) if(is(T == class)) {
    scope T payload;
    T borrow() return {    // `return` applies to `this`
        return payload;
    }
}

To specify that the value is returned through another parameter, the return!ident syntax can be used. If necessary, these annotations can be used multiple times per parameter, when the reference can be returned through several other parameters:

int* foo(
    scope int* input return return!output return!output2,
    int** output,
    out int* output2
);

To prevent accidental non-scoped access to a member (e.g. payload in the above example), the member can be annotated with scope. The compiler will then treat it as if it were always accessed through an appropriately annotated property that returns a (scoped) reference to it.

The compiler will make sure the returned value is not used in any way that is un-@safe. In particular, it will verify that the returned references' lifetimes won't exceed the lifetimes of the arguments they're coming from.

scope inference

For templates, nested functions and lambdas, the compiler will infer the scope annotations, just as it infers purity and @safe-ty. Generic code therefore rarely needs any explicit annotations:

T* foo(T)(T* a, T* b) {
    static T* cache;
    cache = b;
    return a;
}

// `foo!int` will be inferred as:
int* foo_int(scope int* a return, int* b);

Also, to keep the semantics of DIP25, return and ref imply scope.

Multiple indirections

Multiple indirections are also handled in a way that preserves the guarantees about lifetimes. Because scope is not a type modifier, it cannot encode information about the lifetimes of objects behind more than one indirection. Therefore, the compiler must be conservative. For the left-hand side of assignments, it must assume that the destination has infinite lifetime, while for the right-hand side, it must assume that the source will vanish as soon as the reference through which it is accessed goes out of scope. This implies that only references with infinite scope can be assigned through an indirection, while references read through an indirection get the same scope as the indirection itself. Violating these restrictions will result in @system code.
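
The two conservative assumptions can be illustrated with a small sketch. It compiles with a current D compiler, which does not enforce the proposed rules; the comments describe how the proposal would presumably classify each statement. The names `demo`, `holder`, `unscoped` and `global` are made up for this illustration:

```d
import std.stdio : writeln;

int* global;   // a location with unbounded lifetime

// `holder` is scoped: the callee promises not to escape it.
void demo(scope int** holder, int* unscoped)
{
    // Left-hand side: the destination `*holder` is assumed to live
    // forever, so only a reference with infinite scope may be stored.
    *holder = unscoped;

    // Right-hand side: a reference read through `holder` gets the
    // same scope as `holder` itself.
    int* viaHolder = *holder;

    // global = viaHolder;  // would be @system under the proposal:
    //                      // `viaHolder` is scoped to `holder`
    writeln(*viaHolder);
}

void main()
{
    int* heap = new int;
    *heap = 42;
    int* slot;
    demo(&slot, heap);
    assert(slot is heap);
}
```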

@safe-ty violations with borrowing

When borrowing is combined with explicit memory management that is not based on lexical scope (of which reference counting is one form), there will inevitably be problems such as the one discussed in this forum thread. Dealing with them safely requires some kind of data flow and aliasing analysis. Rust is an example of a language that uses very sophisticated analysis algorithms for that. This proposal will include a simplified algorithm to detect potentially unsafe uses at compile time, at the cost of some false positives, for which there will however be workarounds. Instead of disallowing these operations, they will be treated as @system. It is therefore up to the end user to decide how to deal with them: they can mark their code @trusted if they have verified that it is indeed safe even though the compiler cannot know it, or they can rewrite it in a way that allows the compiler to prove its safety.

The operations that are potentially unsafe are:

  • borrowing from a mutable global variable
Global variables can be accessed and therefore be mutated from anywhere.
  • re-borrowing from a mutable variable to which another borrowed reference is currently accessible
A @safe function can then assume that its parameters don't alias in a dangerous way.

The rules are therefore:

  1. Borrowing from a global variable is always @system.
  2. Borrowing from a local variable marks that variable (the owner) as loaned.
    Taking the address of a ref parameter does not count as borrowing, because no new reference is created.
    Copying a borrowed value however increases the owner's loan count.
  3. When the last borrowed value to an owner has disappeared (goes out of scope), the owner no longer counts as loaned.
    Alternatively: after the last access to any of the borrowed values. (But destructors need to count as an access, too.)
  4. Any potential mutation of a loaned owner is @system.
    Passing a reference to a loaned owner to a function as a mutable parameter must be treated as potential mutation.
  5. A borrowed value can contain a reference to the owner. Therefore, if a borrowed value contains a mutable reference compatible with the owner's type, it must be treated as if it were the owner in the context of the above rules.
  6. Loops, gotos, and branches are treated conservatively; when a borrowing (or copy of a borrowed value) can potentially take place, it is assumed it does. To ease implementation, special restrictive rules can be added for goto.
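
The bookkeeping behind rules 2 to 4 can be modelled at run time as a sketch. This is not the proposed compiler analysis, only a hypothetical illustration of the loan counting; `LoanTracker` and its methods are invented names:

```d
import std.stdio : writeln;

/// Hypothetical model of the per-owner loan count.
struct LoanTracker
{
    int[string] loans;   // owner name -> number of outstanding loans

    // Rule 2: borrowing (or copying a borrowed value) adds a loan.
    void borrow(string owner)
    {
        loans[owner] = loans.get(owner, 0) + 1;
    }

    // Rule 3: a borrowed value going out of scope removes a loan.
    void release(string owner)
    {
        assert(loans.get(owner, 0) > 0, "no outstanding loan");
        loans[owner] -= 1;
    }

    // Rule 4: mutation is only @safe while nothing is loaned.
    bool mutationIsSafe(string owner)
    {
        return loans.get(owner, 0) == 0;
    }
}

void main()
{
    LoanTracker t;
    t.borrow("arr");                   // int* p = &arr[0];
    t.borrow("arr");                   // int* q = &arr[1];
    writeln(t.mutationIsSafe("arr"));  // false: two loans outstanding
    t.release("arr");                  // `q` goes out of scope
    t.release("arr");                  // `p` goes out of scope
    writeln(t.mutationIsSafe("arr"));  // true: no loans left
}
```

A real implementation would perform this counting statically and, per rule 6, count every borrow that might happen on any path.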

Examples

No mutation while borrowed values exist:

void foo() @safe {
    RCArray!int arr = [0,1,2];
    {
        int* p = &arr[0];  // `arr` is now loaned
                           // scope inference takes care of `p`
        RCArray!int other;
        arr = other;       // opAssign takes mutable `this` (= loaned `arr`),
                           // therefore the assignment is @system
        writeln(*p);
    }
}

Available strategies when it happens (user decides which one is suited best):

// defer mutation until it's ok
void foo() @safe {
    RCArray!int arr = [0,1,2];
    {
        int* p = &arr[0];
        writeln(*p);
    }
    RCArray!int other;
    arr = other;           // now ok, `arr` is not loaned
}

// operate on a copy
// (like DIP77, but explicit and obvious)
void foo() @safe {
    RCArray!int arr = [0,1,2];
    auto tmp = arr;
    {
        int* p = &tmp[0];  // `tmp` is now loaned
                           // scope inference takes care of `p`
        RCArray!int other;
        arr = other;       // now ok, `arr` is not loaned
        writeln(*p);
    }
}

// verify manually that it's @safe
void foo() @trusted { ... }

@safe borrowing of multiple values:

void bar(scope int* p);
void foo() @safe {
    RCArray!int arr = [0,1,2];
    {
        int* p = &arr[0];  // `arr` is now loaned once
        int* q = &arr[1];  // `arr` is now loaned twice
        int* r = q;        // copy: `arr` is loaned three times
        int* s;
        if(halting_problem())
            s = q;         // conservatively assume branch is taken
                           // => `arr` is now loaned four times
        bar(p);            // @safe, because `arr` cannot possibly
                           // be mutated through `p`
        // `p`, `q`, `r`, `s` go out of scope
    }
    // `arr`'s loan count is now 0
    RCArray!int other;
    arr = other;           // now @safe, not loaned anymore
}

Alternative

As an alternative to making certain operations @system, values with outstanding loans can instead be treated as const. The user can then use casts to mutate such values at their own risk.

Terminology

reference
An object reference, a ref parameter, a pointer, a slice, or a delegate. Also, any aggregate containing at least one of those.
lifetime
The entire duration during which a variable exists. A variable comes into existence through construction, and stops existing when it is destroyed. A variable's lifetime can be determined by the compiler (locals, temporaries), by the programmer through explicit manual memory management, or by some other mechanism (reference counting, tracing GC).
scope
The minimum guaranteed lifetime of a variable. The variable will exist at least as long as its scope indicates. Scopes are arranged in a hierarchy based on the lexical scope and order of declaration of their references. For any two scopes, one is either completely contained in the other, or they are disjoint.
borrowing
Storing a value that is scoped to an owner anywhere. This includes temporaries, and passing a reference to a function.
owner
The original variable a borrowed reference comes from.