Difference between revisions of "User:Schuetzm/scope"
Revision as of 13:31, 26 July 2014
What's this about?
Taking the address of a local variable is currently not allowed in @safe code. [1] This is, however, a very broad restriction that disallows many useful idioms. Examples include slicing of local arrays, passing local structures by reference for efficiency reasons, out parameters, and allocators with limited lifetimes. Additionally, GC avoidance techniques like reference counting and unique/owned objects need some kind of borrowing to work efficiently, to avoid the costs of reference incrementing/decrementing or move semantics, while at the same time still being provably memory safe.
The language already designates the scope concept for that purpose, but as of today it is unimplemented, and the original concept is generally seen as insufficient. This proposal intends to extend the design of scope and define its semantics so that it is usable for the above-mentioned purposes.
Lifetimes
An important concept is that of the lifetime. Any object exists for a specific period of time, with a well defined beginning and end point: from the point it is created (constructed), to the point it is released. A reference that points to an object whose lifetime has ended is a dangling reference. Use of such references can cause all kinds of errors, and must therefore be prevented.
Because the lifetimes of actual manually managed objects are complex and unpredictable, a different concept of lifetime is hereby introduced, that only applies to named variables and is based purely on their lexical scope and order of declaration. By the following rules, a hierarchy of lifetimes is defined:
- A variable's lifetime starts at the point of its declaration, and ends with the lexical scope it is defined in.
- An (rvalue) expression's lifetime is temporary; it lives till the end of the statement that it appears in. (FIXME: provide a reference)
- The lifetime of A is higher than that of B, if A appears in a higher scope than B, or if both appear in the same scope, but A comes lexically before B. This matches the order of destruction of local variables.
- The lifetime of a function parameter is higher than that of that function's local variables, but lower than any variables in higher scopes. (FIXME: relative lifetimes among function parameters == order of destruction)
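The statement that this hierarchy matches the order of destruction of local variables can be checked in today's D. The following is only a minimal sketch; `Tracer`, `destroyed` and `run` are hypothetical names introduced for illustration and are not part of the proposal.

```d
// Records the names of destroyed objects in the order their destructors run.
// (Hypothetical helper; module-level variables are thread-local in D.)
string[] destroyed;

struct Tracer {
    string name;
    ~this() { destroyed ~= name; }
}

void run() {
    auto a = Tracer("a");     // declared first: highest lifetime in this function
    {
        auto c = Tracer("c"); // nested scope: lowest lifetime, destroyed first
    }
    auto b = Tracer("b");     // same scope as `a`, declared later: destroyed before `a`
}

unittest {
    destroyed = null;
    run();
    // Lowest lifetime first, highest lifetime last.
    assert(destroyed == ["c", "b", "a"]);
}
```

Compiling with `dmd -unittest -main` runs the check.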
Ownership and borrowing
Taking a reference to a variable is called borrowing. The original variable is called the owner of the reference.
In @safe code, taking a reference to a local variable will be allowed, but the type of the resulting reference will contain information about the owner, which is used by the compiler to decide whether assigning the reference to a particular variable is permissible or not. Assignment in this context means copying the reference, be it by assignment to another variable, passing it to a function, returning it from a function, or by throwing it as an exception. In general, scoped references may only be assigned to variables with lower (= shorter) lifetimes than that of their owner.
These restrictions apply only to reference types, i.e. pointers, slices, classes, and ref or out parameters. Non-reference types may be freely copied, because no memory-safety issues can arise.
For this purpose, scope needs to be changed into a type modifier (currently it is a storage class[2]), and the appropriate changes to the syntax need to be made, as detailed in the following sections.
Bare scope
This is the "normal", usual syntax, which will be used in most cases. Examples:
@safe:
int global_var;
void bar(scope(int*) input);
void foo() {
    scope(int*) a;
    a = &global_var; // OK, `global_var` has higher lifetime than `a`
    scope b = &global_var; // OK, type deduction
    int c;
    if(...) {
        scope x = a; // OK, copy of reference, `x` has shorter lifetime than `a`
        scope y = &c; // OK, borrowing
        int z;
        b = &z; // ERROR: `b` will outlive `z`
        int* d = a; // ERROR: `d` is unscoped, but `a` is scoped
    }
    bar(a); // OK, scoped reference is passed to scoped parameter
    bar(&c); // OK, borrowing
    int* e;
    a = e; // OK, implicit conversion to `scope`
}
Because we don't know anything about the owner of a bare scoped reference, we have to assume its lifetime to be the same as the reference itself. It can effectively be treated as "self-owned".
scope with owner(s)
Bare scope is already quite powerful, but for certain things, it still cannot be used. Typical examples are haystack-needle type functions that are passed in slices as input (the haystack and the needle), and return a slice of the haystack:
string findSubstring(scope(string) haystack, scope(string) needle) {
    // ... do the actual searching ...
    return haystack[found .. $]; // ERROR: needs to return `string`, not `scope(string)`
}
Changing the return type to `scope(string)` doesn't help, because it is a temporary that could not be stored anywhere, as temporaries' lifetimes are restricted to the current statement. Therefore, returning a bare scope(T) from a function doesn't make sense, and is disallowed, in order to avoid having to specify its behaviour.
The solution is to specify the owner explicitly:
scope!haystack(string) findSubstring(scope(string) haystack, scope(string) needle) {
    // ... do the actual searching ...
    return haystack[found .. $]; // OK
}
The signature of findSubstring says: "This function will return a reference to the same object that is referenced by the parameter haystack."
Multiple owners can be specified, too:
scope!(a, b)(string) chooseStringAtRandom(scope(string) a, scope(string) b) {
    return random() % 2 == 0 ? a : b;
}
scope!(identifier1, identifier2, ...) means that assignment (with the meaning as above) from any of the listed identifiers is allowed, and also from anything that specifies all or a subset of the listed identifiers as its owners.
From the caller's point of view:
@safe:
string global_string;
void foo() {
    string[$] text = "Hello, world!";
    scope(string) a;
    a = findSubstring(text, "world"); // OK
    global_string = findSubstring(text, "world"); // ERROR
    a = findSubstring("Literal world", "world"); // ERROR
    string[$] s1 = "Hello", s2 = "world";
    scope(string) b = chooseStringAtRandom(s1, s2); // OK
}
What happens at the call site is that the compiler matches up the owners in the parameter list that refer to other parameters with the actual arguments. `scope!haystack` therefore gets turned into `scope!text`, as `text` is the variable that gets passed as the parameter `haystack`. When there are multiple owners in the return type, the compiler chooses the one whose matching argument has the shortest lifetime, because it needs to assume the worst case. In our example, this would be `scope!s2`, because `s2`, while living in the same lexical scope as `s1`, is declared later, and is therefore destroyed earlier. For this purpose, `this` is treated as a parameter of methods, and can be referred to by `scope!this`.
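The worst-case choice among several owners can be modelled in ordinary D. This is only a sketch of the selection rule, not compiler code: `Var`, `outlives` and `shortestLived` are hypothetical names, and the model assumes that all candidate owners are visible at the call site, so that their lexical scopes are nested within one another.

```d
// A variable's position in the lifetime hierarchy (hypothetical model):
// `depth` is the nesting level of its lexical scope (0 = outermost),
// `order` is its declaration index within that scope.
struct Var {
    string name;
    int depth;
    int order;
}

// True if `a` has a higher (longer) lifetime than `b`: outer scopes outlive
// inner ones, and within one scope earlier declarations outlive later ones.
bool outlives(Var a, Var b) {
    return a.depth < b.depth || (a.depth == b.depth && a.order < b.order);
}

// Picks the owner with the shortest lifetime among the arguments that were
// matched to the owners listed in a return type.
Var shortestLived(Var[] owners) {
    assert(owners.length > 0, "at least one owner is required");
    Var result = owners[0];
    foreach (v; owners[1 .. $])
        if (outlives(result, v))
            result = v;
    return result;
}

unittest {
    // `s1` and `s2` live in the same scope; `s2` is declared later.
    auto s1 = Var("s1", 1, 0);
    auto s2 = Var("s2", 1, 1);
    assert(shortestLived([s1, s2]).name == "s2");
}
```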
ref and out parameters work analogously to return values.
An owner can be anything with a higher lifetime. Inside function signatures, the return value as well as ref and out parameters are considered to have a higher lifetime than the other parameters, as they will outlive value parameters, and ref and out parameters in turn have a higher lifetime than the return value.
Implicit conversion
Non-scoped references are implicitly convertible to scope. Also, any scoped reference is implicitly convertible to a scoped reference with an owner with a shorter lifetime. As described above, these checks are done on assignment of a scoped reference.
Owner tracking
All reference expressions (not only scoped ones) internally have a set of owners. This set is propagated through the various operations that can be applied to a reference:
Operation | Owners |
---|---|
&a | [a] |
*a | owners specified in a's type, none if it's not a reference |
a + n | owners of a |
a[], a[n .. m] | owners of a |
x ? a : b | union of owners of a and b |
a | owners of a |
This effectively tracks the origin of a pointer in an expression. The internal owner set is, however, only used in type deduction (see next section). It does not mean that the expression is automatically treated as scoped, as this would be a breaking change. This could, however, be considered as an extension in @safe code.
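The propagation rules in the table can be modelled with plain data in current D. The following sketch represents an owner set as a sorted array of variable names; `Owners`, `addressOf`, `deref`, `passThrough` and `conditional` are hypothetical names chosen for illustration, and a real compiler would track declarations rather than strings.

```d
import std.algorithm.iteration : uniq;
import std.algorithm.sorting : sort;
import std.array : array;

// An owner set, modelled as a sorted, duplicate-free array of variable names.
alias Owners = string[];

// &a: the operand itself becomes the only owner.
Owners addressOf(string variable) { return [variable]; }

// *a: the owners named in a's type (empty if a is not a reference).
Owners deref(Owners ownersInType) { return ownersInType; }

// a + n, a[], a[n .. m] and plain a: the owners of a are passed through.
Owners passThrough(Owners a) { return a; }

// x ? a : b: the union of the owners of both branches.
Owners conditional(Owners a, Owners b) {
    return (a ~ b).sort().uniq.array;
}

unittest {
    auto p = addressOf("a");                 // &a
    auto q = addressOf("b");                 // &b
    assert(passThrough(p) == ["a"]);         // p + 1, p[0 .. 1]
    assert(conditional(q, p) == ["a", "b"]); // cond ? q : p
    assert(conditional(p, p) == ["a"]);      // duplicates collapse
}
```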
Type deduction
The compiler sometimes infers the type of a variable automatically. This happens when the type is not specified in a variable declaration, when a function's return type is unspecified, and for templates (IFTI). All three of them work by determining the type of an expression: the right-hand side of an initialization, the returned value(s) in a function, or a value that is passed to a function.
Type deduction never adds scope or an owner by itself. It always starts with the exact type of the expression. Only if scope is used in the declaration is it added to the type. (This works in the same way as const.) In this case, it will also add the owner of the source expression:
int a = 42;
/* type deduction */
auto b = &a; // int*
scope c = &a; // scope!a(int*)
scope!b d = &a; // scope!(a,b)(int*)
/* no type deduction */
scope(int*) e = &a; // scope(int*)
/* analogously for functions */
auto foo(int* a) { // int*
    return a;
}
scope foo(int* a) { // scope!a(int*)
    return a;
}
No further scope attribute and owner inference is done in templates. It might be considered as an extension, but it needs to be specified carefully, depending on how much control flow analysis we want to require compliant compilers to implement.
Transitivity
scope is not transitive. This means that references reachable through a scoped reference are not automatically scoped. There are multiple reasons for this:
- Ownership is usually defined by the implementer of a type. For example, a structure or class is written in a way that either assumes that it owns an embedded reference, and therefore needs to manage it by itself (be it via manual memory management, a unique type, reference counting, or the garbage collector), or it doesn't, and leaves management to its users. It is therefore not really possible for a type's user to change that decision. What the user can decide, however, is the ownership of instances of the type itself. Therefore, transitivity doesn't make any sense.
- Besides that, it would probably be really complicated to even define the exact semantics, let alone to implement them. For example, it might involve unions of disjoint lifetimes.
- There is also no known great advantage of making scope transitive. This is in contrast to const and shared, where we can prove many interesting properties about functions and types because of transitivity.
- Even if such an advantage should be discovered in the future, introspection will make it possible to detect whether a given type is actually transitively scoped, simply by following all the referenced types.
Optional enhancements
scope!(const ...)
An interesting extension of the concept of borrowing is constant borrowing: as long as there are borrowed references to a variable, that variable is treated as const. Informally, "while you've given something away, you cannot change it". The syntax would be the same as for scope with owners above, but with some of the identifiers prefixed by the keyword const.
To show why this can be useful, consider the as yet unsolved problem of the so-called transient ranges. [3][4] These are ranges which return references to data that can change when popFront() is called on them. The most well-known instance of those is std.stdio.File.byLine, which reuses the same internal buffer for each line that is read.
To my knowledge, only unsatisfactory ad-hoc workarounds like byLineCopy and manually duping the individual lines have been proposed so far. But consider this:
struct ByLineImpl(Char, Terminator) {
private:
    Char[] line;
    // ...
public:
    // - return value must not outlive `this` (i.e. the range)
    // - as long as the return value exists, `this` will be const
    @property scope!(const this)(Char[]) front() const {
        return line;
    }
    void popFront() { // not `const`, of course
        // ...
    }
    // ...
}
void main() {
    alias Line = const(char)[];
    auto byline = stdin.byLine();
    foreach(line; byline) {
        write(line); // OK, `write` takes its parameters as scope
        // (assuming the widespread use of scope throughout Phobos)
    }
    Line[] lines;
    foreach(line; byline) {
        lines ~= line;
        // ERROR: `line` has type scope!(const byline)(Line), not Line
    }
    // let's try to work around it:
    scope!(const byline)(Line)[] clines;
    foreach(line; byline) { // ERROR: `byline` is const
        clines ~= line;
    }
    // => nope, won't work
    // another example, to show how it works:
    auto tmp = byline.front; // OK
    // `byline` is const as long as `tmp` exists
    write(byline.front); // OK, `front` is const
    byline.popFront(); // ERROR: `byline` is const
}
Here, the compiler matches up `this` with `byline` as described above. `byline` is then treated as const for the lifetime of the variables that refer to the individual lines. In other words, as long as a variable with the type `scope!(const ident)` exists, `ident` will be const.
Automatic borrowing for pure functions
In many cases, bare scope suffices, and scope with owner is not needed. There is, however, an opportunity to allow borrowing without any explicit annotations at all, not even bare scope. When we're dealing with pure functions, we can sometimes guarantee that a borrowed reference cannot escape the function. Pure functions cannot access mutable global variables directly, so the only way a reference can come out of a pure function is by its return value or by one of its parameters. By looking at the function signature, we can in many cases prove that this cannot happen:
void foo(int[] p) pure; // the function has no opportunity to keep a reference to `p`
int bar(int[] p) pure; // returns an `int` but that's a value type, so it's ok
int[] baz(const(int)[] p) pure; // the return type is not `const` and thus cannot come from `p`
This is a nice opportunity to reduce the need for explicit annotations, which are seen as burdensome by the users.
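The three kinds of signature can be tried out in current D. The functions below are hypothetical examples (`fill`, `total` and `doubled` are names chosen for illustration); the sketch only shows that such signatures leave no way to smuggle out the argument, not how a compiler would exploit this.

```d
// No return value and only value parameters besides `p`:
// a pure function has no place to keep a reference to `p`.
void fill(int[] p, int value) pure {
    p[] = value;
}

// Returns a value type, so nothing referring to `p` can leave the function.
int total(int[] p) pure {
    int sum = 0;
    foreach (v; p)
        sum += v;
    return sum;
}

// Mutable return type, const parameter: the result cannot come from `p`,
// so the function has to allocate a fresh array.
int[] doubled(const(int)[] p) pure {
    auto result = new int[p.length];
    foreach (i, v; p)
        result[i] = 2 * v;
    // `return p;` would not compile: const(int)[] does not convert to int[].
    return result;
}

unittest {
    int[3] local = [1, 2, 3];    // stack buffer whose slice is "borrowed"
    assert(total(local[]) == 6);
    auto d = doubled(local[]);
    assert(d == [2, 4, 6]);
    assert(d.ptr !is local.ptr); // the result does not point into `local`
    fill(local[], 0);
    assert(local == [0, 0, 0]);
}
```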
Allow references to locals in @safe code
Currently, @safe code is not allowed to take the addresses of local variables or parameters. This restriction could be lifted. Instead, the address operator & would return a scoped expression with its operand as the owner. The scope attribute then guarantees that the resulting pointer will not escape or outlive its owner.
@safe functions can then use local buffers to avoid GC allocation (or manual memory management), without having to resort to @trusted code.
This change will not break existing code, because it only allows certain operations that were previously disallowed.
Implementation
Compiler
Implementation of this feature is possible without doing control flow or interprocedural analysis. All checks that need to be made can be done locally at the assignment or call site. The same is true for matching owners to parameters.
Phobos & libraries
For scope to be maximally useful, the functions in the standard library need to be annotated with scope wherever necessary. As much of Phobos consists of templates, this may however not be as much work as it seems at first glance, because attribute inference can do this automatically in many cases (especially because purity is already inferred for templates). For the remaining cases, type deduction can help, too.