User talk:Schuetzm/scope2

From D Wiki
Jump to: navigation, search

Example for the inference algorithm

Dereferencing yields static scope

Note that this is depending on whether we're on the LHS or RHS of an assignment. The inference algorithm only assigns scopes from LHS to RHS; therefore we always use static. On the RHS, a dereference gets the scope of the reference it's been reached through, because we at least known that it can't have a shorter lifetime than that.

Dereferencing includes the dereference operator, implicit dereferencing of struct/union pointers, of class references, as well as indexing and slicing.

Applied to the function deadalnix used to demonstrate the rvalue/lvalue problem:

(1) foo() is @safe, making its param scoped:

void foo(scope int** a) {
    // no inference for `a` => self-owned
    // ("mystery scope" in your terms)
    // => SCOPE(a) := [a] (final)
    int** b;
    // => SCOPE(b) := [b] (final)
    b = a;
    // scope of `a` is fixed => do nothing
    int d;
    int* c;
    // => SCOPE(c) := [c] (incomplete)
    c = &d;
    *b = c;
    // assignment from `c`:
    // => SCOPE(c) |= SCOPE(*b) = [c] | [static] (final)
    // (dereferencing loses information => static)
    // ---------------------------------------------------
    // we now have:
    // SCOPE(a) = [a]
    // SCOPE(b) = [b]
    // SCOPE(c) = [static]
    // => all resolved
    // ---------------------------------------------------
    // => satisfiable, now let's check the assignments:

    int** b = a;        // [b]      := [a]         => OK
    int d;
    int* c = &d;        // [static] := [d]         => BAD
    *b = c;             // [static] := [static]    => OK
    // => invalid assignment
    // note how it even traces where the bad value comes from
}

(2) foo() is in a template, param scope gets inferred

void foo(T)(T** a) {
    // start with empty scope for params
    // => SCOPE(a) := [a] (incomplete)
    T** b;
    // start with empty scope for locals
    // => SCOPE(b) := [b] (final)
    b = a;
    // only access to `a`:
    // => SCOPE(a) |= SCOPE(b) = [a] | [b] = [a]
    T d;
    T* c;
    // => SCOPE(c) = [c] (incomplete)
    c = &d;
    *b = c;
    // assignment from `c`:
    // => SCOPE(c) |= SCOPE(*b) = [c] | [static] = [static]
    // ---------------------------------------------------
    // we now have:
    // SCOPE(a) = [a]
    // SCOPE(b) = [b]
    // SCOPE(c) = [static]
    // => all resolved
    // ---------------------------------------------------
    // => satisfiable, now let's check the assignments:

    int** b = a;        // [b]      := [a]        => OK
    int d;
    int* c = &d;        // [static] := [d]        => BAD
    *b = c;             // [static] := [static]   => OK
    // => invalid assignment
    // again, the culprit can be detected
}

(3) foo() takes unscoped pointer

void foo(int** a) {
    // no inference for `a` => static
    // => SCOPE(a) := [static] (final)
    int** b;
    // => SCOPE(b) := [b] (final)
    b = a;
    // scope of `a` is fixed => do nothing
    int d;
    int* c;
    // => SCOPE(c) := [c] (incomplete)
    c = &d;
    *b = c;
    // assignment from `c`:
    // => SCOPE(c) |= SCOPE(*b) = [c] | [static] (final)
    // ---------------------------------------------------
    // we now have:
    // SCOPE(a) = [static]
    // SCOPE(b) = [b]
    // SCOPE(c) = [static]
    // => all resolved
    // ---------------------------------------------------
    // => satisfiable, now let's check the assignments:

    int** b = a;        // [b]      := [static]    => OK
    int d;
    int* c = &d;        // [static] := [d]         => BAD
    *b = c;             // [static] := [static]    => OK
    // => invalid assignment
    // escape detection for locals works even without scope params
    // and in @system code
}

return inference

T[] findSubstring(T)(T[] haystack, T[] needle) {
    // find first occurrence of substring
    // naive implementation

    // => SCOPE(return)   := [return] (fixed)
    // [return] is a scope higher than all params
    // => SCOPE(haystack) := [haystack] (incomplete)
    // => SCOPE(needle)   := [needle] (final)

    for(int i = 0; i < haystack.length - needle.length; i++) {
        T[] sub;
        // => SCOPE(sub) := [sub] (incomplete)
        sub = haystack[i .. i + needle.length];
        // haystack and needle are read again; ignoring from now on
        // => SCOPE(haystack) |= SCOPE(sub)
        // => DEFER because SCOPE(sub) is incomplete
        if(sub == needle) {
            return haystack[i .. $];
            // => SCOPE(haystack) |= SCOPE(return) = [haystack] | [return] = [return]
        }
    }
    return null;
    // nothing to do here, `null` has fixed static scope
    // ---------------------------------------------------
    // we now have:
    // SCOPE(haystack) = [return]
    // SCOPE(needle)   = [needle]
    // SCOPE(sub)      = [sub]
    // unresolved:
    // SCOPE(haystack) |= SCOPE(sub)
    // resolve:
    // SCOPE(haystack) := [return] | [sub] = [return]
    // => all resolved
    // ---------------------------------------------------
    // => satisfiable, now let's check the assignments:

    for(int i = 0; i < haystack.length - needle.length; i++) {
        T[] sub;
        sub = haystack[i .. i + needle.length];
        // [sub] := [return]    => OK
        if(sub == needle) {
            // equivalent to: if(opEquals(sub, needle))
            // let's assume `opEquals` takes both params by scope
            // sub:    [] := [sub]       => OK
            // needle: [] := [needle]    => OK
            return haystack[i .. $];
            // [return] := [return]    => OK
        }
    }
    return null;
    // [return] := [static]    => OK
}

// the inferred signature for T == immutable(char) is then
string findSubstring(scope string haystack return, scope string needle);

Multiple return params:

T chooseStringAtRandom(T)(T a, T b) {
    // => SCOPE(a) = [a] (incomplete)
    // => SCOPE(b) = [a] (incomplete)
    return random() % 2 == 0 ? a : b;
    // => SCOPE(a) |= [a] = [a]
    // => SCOPE(b) |= [b] = [b]
    // ?: operator affects both `a` and `b`
    // => SCOPE(a) |= SCOPE(return) = [a] | [return] = [return]
    // => SCOPE(b) |= SCOPE(return) = [b] | [return] = [return]
    // ---------------------------------------------------
    // we now have:
    // SCOPE(a) = [return]
    // SCOPE(b) = [return]
    // => all resolved
    // ---------------------------------------------------
    // => satisfiable, now let's check the assignments:

    return random() % 2 == 0 ? a : b;
    // RHS SCOPE(cond ? a : b) = MIN(SCOPE(a), SCOPE(b)) =
    //                         = MIN([return], [return]) = [return]
    // [return] := [return]    => OK
}

// the inferred signature for T == string is then
string chooseStringAtRandom(scope string a return, scope string b return);

Calling functions

string global_string;
void foo() {
    string[13] text = "Hello, world!";
    // => nothing, string[13] has no indirections
    string a;
    // => SCOPE(a) := [a] (incomplete)
    a = findSubstring(text, "world");
    // equivalent to: ... findSubstring(text[], ...)
    // text[] is returned, but SCOPE(text[]) is fixed
    // => no inference
    global_string = findSubstring(text, "world");
    // ditto
    string[5] s1 = "Hello";
    string b;
    // => SCOPE(b) := [b] (final)
    b = chooseStringAtRandom(s1, a);
    // SCOPE(s1[]) = [s1] (final)
    // `a` is returned and stored in `b`
    // => SCOPE(a) |= SCOPE(b) = [a] | [b] = [a] (final)
    // ---------------------------------------------------
    // we now have:
    // SCOPE(a) = [a]
    // SCOPE(b) = [b]
    // => all resolved
    // ---------------------------------------------------
    // => satisfiable, now let's check the assignments:

    string[13] text = "Hello, world!";
    string a;
    a = findSubstring(text, "world");
    // equivalent to: ... findSubstring(text[], ...)
    // findSubstring takes both params as scope
    // text[]:  []  := [text]      => OK
    // "world": []  := [static]    => OK
    // findSubstring returns haystack
    // => equivalent to `a = text[]`
    // return:  [a] := [text]      => OK
    global_string = findSubstring(text, "world");
    // [static] := [text]    => BAD
    string[5] s1 = "Hello";
    string b;
    b = chooseStringAtRandom(s1, a);
    // [b] := MIN([s1], [a]) = [s1]    => // OK
    // ---------------------------------------------------
    // => invalid assignment to global_string
}

Wrappers

string findSubstring(scope string haystack return, scope string needle);

auto trace(alias func, Args...)(Args args) {
    import std.conv : to;
    writeln("Calling " ~ func.stringof ~ "(" ~ args.to!string ~ ")");
    return func(args);
}
 
auto s = trace!findSubstring(haystack, needle);
 
// expands to (scope/return not yet in the signature):

string trace_findSubstring(string haystack, string needle) {
    // => SCOPE(haystack) = [haystack] (incomplete)
    // => SCOPE(needle)   = [needle] (final)
    import std.conv : to;
    // ignoring the writeln(), works as usual
    writeln("Calling findSubstring(...)");
    return findSubstring(haystack, needle);
    // `haystack` is assigned to the return value
    // => SCOPE(haystack) |= [return] = [haystack] | [return] = [return] (final)
    // ---------------------------------------------------
    // we now have:
    // SCOPE(haystack) = [return]
    // SCOPE(needle)   = [needle]
    // => all resolved
    // ---------------------------------------------------
    // => satisfiable, now let's check the assignments:

    return findSubstring(haystack, needle);
    // haystack: []       := [return]                      => OK
    // needle:   []       := [needle]                      => OK
    // return:   [return] := SCOPE(haystack) = [return]    => OK
}

// inferred annotations:
string trace_findSubstring(scope string haystack return, scope string needle);
auto filter(alias pred, T)(T input) {
    static struct FilterImpl {
        private scope T _input;
        private void skip() {
            while(!_input.empty && !pred(_input.front))
                _input.popFront();
        }
        @property ElementType!T front() {
            skip();
            return _input.front;
        }
        @property bool empty() {
            skip();
            return _input.empty;
        }
        void popFront() {
            _input.popFront();
        }
    }
    return FilterImpl(input);
}

alias odd = filter!(n => n % 2, int[]);

// becomes (without annotations):

auto odd_filter(int[] input) {
    static struct odd_FilterImpl {
        private scope int[] _input;
        // scope member is equivalent to:
        private @property ref prop_input() return { return _input; }
        private void skip() {
            // `empty`, `front` and `popFront` are template functions
            // => their scope/return annotations are inferred as needed
            // => they take _input as scope
            while(!_input.empty && !(_input.front % 2))
                _input.popFront();
        }
        @property int front() {
            // can be inferred to take `this` as scope
            skip();
            return _input.front;
        }
        @property bool empty() {
            // ditto
            skip();
            return _input.empty;
        }
        void popFront() {
            // ditto
            _input.popFront();
        }
    }
    return odd_FilterImpl(input);
    // equivalent to:
    odd_FilterImpl tmp;
    tmp._input = input;
    return tmp;
    // equivalent to:
    // => SCOPE(input) = [input]
    odd_FilterImpl tmp;
    // => SCOPE(tmp) = [tmp]
    ref tmp2 = tmp.prop_input;
    // => SCOPE(tmp2) = [tmp2]
    // call `tmp.prop_input`:
    // (`this` is returned and assigned to `tmp2`)
    // => SCOPE(tmp) |= SCOPE(tmp2)
    // => DEFER because tmp2 is incomplete
    // `tmp2` is a ref => assume it will be written to
    // => SCOPE(tmp2) |= SCOPE(tmp)
    // => DEFER because tmp is incomplete
    tmp2 = input;
    // `input` is assigned to `tmp2`
    // => SCOPE(input) |= SCOPE(tmp2)
    // => DEFER because tmp2 is incomplete
    return tmp;
    // => SCOPE(tmp) |= [return] = [return] (incomplete)
    //    doesn't need to be deferred, because `|` is commutative
    // ---------------------------------------------------
    // we now have:
    // SCOPE(input) = [input]
    // SCOPE(tmp)   = [return]
    // SCOPE(tmp2)  = [tmp2]
    // unresolved:
    // SCOPE(tmp)   |= SCOPE(tmp2)
    // SCOPE(tmp2)  |= SCOPE(tmp)
    // SCOPE(input) |= SCOPE(tmp2)
    // resolve loop using union:
    // SCOPE(tmp)   := [return] | [tmp2]   = [return]
    // SCOPE(tmp2)  := [return] | [tmp2]   = [return]
    // unresolved:
    // SCOPE(input) |= SCOPE(tmp2)
    // resolve:
    // SCOPE(input) := [input]  | [return] = [return]
    // => all resolved
    // ---------------------------------------------------
    // => satisfiable, now let's check the assignments:

    odd_FilterImpl tmp;
    ref tmp2 = tmp.prop_input;
    // call (`tmp` is the implicit `this` argument):
    // []       := [return]    => OK
    // `tmp` is returned and assigned to `tmp2`
    // [return] := [return]    => OK
    tmp2 = input;
    // [return] := [return]    => OK
    return tmp;
    // [return] := [return]    => OK
}

// inferred signature:
// - SCOPE(input) is not [static]
//   => `input` can be `scope`
// - SCOPE(input) refers to [return]
//   => `input` is annotated with `return`
auto odd_filter(scope int[] input return);

Details

scope is a storage class. It applies to function parameters (including this), local variables, the return value (treated as if it were an out parameter), and member variables of aggregates. It participates in overloading.

With every variable, a particular lifetime (called scope) is associated. It usually refers to a local variable or parameter; additionally, an infinite scope is defined that corresponds to global/static variables or GC managed memory. Scopes and lifetimes are defined purely based on lexical scope and order of declaration of their corresponding variables. Therefore, for any two lifetimes, one is either completely contained in the other, or they are disjoint. By annotating a variable with scope, it's scope is defined to be equal to the variable's lifetime, instead of the default (infinity).

For any expression involving at least one scope value, two lifetimes (LHS lifetime and RHS lifetime) are computed in a way that ensures that the resulting RHS lifetime will not be greater than that of any of the expression's parts, and the LHS lifetime will not be smaller. The exact rules will be defined in one of the following sections. An assignment (i.e., = operator, passing to a function, returning from a function, capturing in a closure, throwing, etc.) involving scope values is only permissible, if the destination's LHS lifetime is fully contained in the source's RHS lifetime. Throwing is considered assignment to a variable with static lifetime.

The following invariant is enforced to always be true on every assignment: A location with scope a will never contain references pointing to values with a lifetime shorter than a.

To allow a function to return a value it received as a parameter, the parameter can be annotated with the return keyword, as in DIP25. It's also possible to express that a parameter escapes through another parameter (including this) by using the return!identifier syntax. Multiple such annotations can appear for each parameter. When a function is called, the compiler checks (for each argument and the return value) that only expressions with a lifetime longer than those of all the corresponding return annotations are passed in, and that the return value is used in a conforming way.

Because all relevant information about lifetimes is contained in the function signature, no explicit scope annotations for local variables are necessary; the compiler can figure them out by itself. Additionally, inference of annotations is done in the usual situations, i.e. nested functions and templates. In a subsequent section, an algorithm is presented that can be used for inference of scopes of local variables as well as annotations of parameters.

In parameters of @safe functions, all reference types (class references, pointers, slices, ref parameters) or aggregates containing references are implicitly treated as scope unless the parameter is annotated with the keyword static. This does however not apply to the return value.

ref and out parameters can also be treated as implicitly scoped, but this has the potential to break lots of code and needs to be considered carefully.

A scope annotation on a member variable shall be equivalent to a @property function returning a reference to that member, scoped to this.

Borrowing is the term used for assignment of a value to a variable with narrower scope. This typically happens on function calls, when a value is passed as an argument to a parameter annotated with (or inferred as) scope.

Examples

RCArray

Walter's RCArray, adjusted to this proposal:

@safe:

struct RCArray(E) {
    @disable this();

    this(E[] a)
    {
        array = a.dup;
        count = new int;
        *count = 1;
    }

    ~this() @trusted
    {
        if (--*count == 0)
        {
            delete count;
            // either `delete` in `@system` code will accept `scope`:
            delete array;
            // or a helper needs to be used to remove `scope`:
            delete assumeStatic(array);
        }
    }

    this(this)
    {
        ++*count;
    }

    @property size_t length()
    {
        return array.length;
    }

    ref E opIndex(size_t i)
    {
        return array[i];
    }

    E[] opSlice(size_t lwr, size_t upr)
    {
        return array[lwr .. upr];
    }

    E[] opSlice()
    {
        return array[];
    }

private:
    scope E[] array;    // this is the only explicit annotation
                        // enforces treatment as `scope`, to avoid
                        // accidental access as unscoped
    int* count;
}

Implementation

Scope inference

This algorithm works at the function level. Scope inference for variables and parameters in one function is independent from all other functions.

It takes as input a list of variables whose scope is already fixed (by explicit annotations) and another list whose scopes are to be inferred. It will choose the narrowest possible scopes for them for which the function will still compile. This is based on the observation that variables that are read from need to have a scope at least as large as their destination. Therefore, we can start with the smallest possible scope, the lifetime of the variable itself, and extend it to the scope of the destination, if it isn't already larger.


1. Let Q be a list of all variables whose scopes are to be inferred. This includes template function parameters not otherwise annotated and all local variables.

2. Assign all elements of Q an initial scope equivalent to their own lifetime:

    foreach(var; Q) {
        var.scope := [var];
    }

3. For each ASSIGNMENT whose RHS_SCOPE depends on a variable in Q, expand that variable's scope to at least the LHS_SCOPE. For all variables the LHS_SCOPE depends on and that are in Q, record a dependency:

    foreach(ass; ASSIGNMENTS) {
        if(ass.rhs_scope.depends_on(Q)) {
            foreach(rhs_var; ass.rhs_scope.vars) {
                if(not rhs_var in Q)
                    continue;
                foreach(lhs_var; ass.lhs_scope.vars) {
                    rhs_var.scope |= ass.lhs_scope;
                    if(lhs_var in Q)
                        rhs_var.deps ~= lhs_var;
                }
            }
        }
    }

4. Remove all variables from Q that have no dependencies:

    foreach(var; Q) {
        if(var.deps.empty)
            Q.remove(var);
    }

5. If Q is empty, terminate, else remember length of Q:

    if(Q.empty)
        return;
    old_Q_len := Q.length;

6. Expand all variables' scopes to at least that of their dependencies; if a dependency has no dependencies itself, remove it from the variable's dependencies:

    foreach(var; Q) {
        foreach(dep; var.deps) {
            var.scope |= dep.scope;
            if(dep.deps.empty)
                var.deps.remove(dep);
        }
    }

7. If the length changed, we made progress. We can repeat from step 4. Otherwise we have a dependency loop. Find a cycle (for example using Tarjan's algorithm). Collect all elements in the cycle, remove their dependencies from DEPENDENCIES, and assign them all the union of their scopes:

    if(Q.length != old_Q_len)
        goto step4;
    cycle := tarjan(DEPENDENCIES);
    new_scope := []
    foreach(var; cycle) {
        new_scope |= var.scope;
        var.deps.remove_each(cycle);
    }
    foreach(var; cycle) {
        var.scope := new_scope;
    }

8. Go to step 4.


At this point, each variable will have a scope assigned. Now, all assignment can be checked to verify that they never place a reference in a location that outlives the reference's target.