DIP25

From D Wiki
Revision as of 00:10, 28 December 2014 by AndreiAlexandrescu (talk | contribs) (In a nutshell)

DIP25: Sealed references

Title: Sealed references
DIP: 25
Version: 1
Status: Draft
Created: 2013-02-05
Last Modified: 2013-02-05
Author: Walter Bright and Andrei Alexandrescu
Links:

Abstract

D offers a number of features aimed at systems-level coding, such as unrestricted pointers, casting between integers and pointers, and the @system attribute. These means, combined with the other features of D, make it a complete and expressive language for systems-level tasks. On the other hand, economy of means should be exercised in defining such powerful but dangerous features. Most other features should offer good safety guarantees with little or no loss in efficiency or expressiveness. This proposal makes ref provide such a guarantee: with the proposed rules, it is impossible in safe code to have ref refer to a destroyed object. The restrictions introduced are not entirely backward compatible, but they disallow code that is stylistically questionable and that can easily be replaced with equivalent and clearer code.

In a nutshell

This DIP proposes that any ref parameter that a function returns by reference, in whole or in part, must also be annotated with inout. Example:

ref int fun(ref int a) { return a; } // ERROR
ref int fun(ref inout int a) { return a; } // FINE

Description

Currently, D has some provisions for avoiding dangling references:

ref int fun(int x) {
  return x; // Error: escaping reference to local variable x 
}

ref int gun() {
  int x;
  return x; // Error: escaping reference to local variable x 
}

However, this enforcement is shallow. The following code compiles and allows reads and writes through defunct stack locations, bypassing scoping and lifetime rules:

ref int identity(ref int x) {
  return x; // pass-through function that does nothing 
}

ref int fun(int x) {
  return identity(x); // escape the address of a parameter 
}

ref int gun() {
  int x;
  return identity(x); // escape the address of a local
}

struct S {
    int x;
    ref int get() { return x; }
}

ref int hun(S x) {
  return x.get; // escape the address of a part of a parameter 
}

ref int iun() {
  S s;
  return s.x; // see https://issues.dlang.org/show_bug.cgi?id=13902
}

ref int jun() {
  S s;
  return s.get; // escape the address of part of a local
}

ref int kun() {
  return S().get; // worst contender: escape the address of a part of an rvalue
}

The escape patterns are obvious in these simple examples, which make all code available and use no recursion, and they may be found automatically. The problem is that in general the compiler cannot see the body of identity or S.get(). We need to devise a method that derives enough information for safety analysis given only the function signatures, not their bodies.

This DIP devises rules that allow passing objects by reference down into functions, and return references up from functions, whilst disallowing cases such as the above when a reference passed up ends up referring to a deallocated temporary.

Enhancing inout

The main issue is typechecking functions that return a ref T. Those that attempt to return locals or parts thereof are already addressed directly, contingent on Issue 13902. The one case remaining is allowing a function returning ref T to return a parameter of type ref T, or a part of such a parameter.

The key is to distinguish legal from illegal cases. One simple but overly conservative option would be to simply disallow returning a ref parameter or part thereof. That makes identity impossible to implement, and as a consequence accessing elements of a container by reference becomes difficult or impossible to typecheck properly. Also, heap-allocated structures with deterministic destruction (e.g. reference counted) must insert member copies for all accesses.

Cases that should work include:

@safe ref int identity(ref int x) { 
    return x; // should work
}

@safe ref int fun(ref int input) {
    static int[42] data;
    return data[input]; // should work
}

@safe struct S {
    private int x;
    ref int get() { return x; } // should work 
}

This proposal promotes enhancing the charter of the inout qualifier to propagate the lifetime of a parameter to the return value of a function. With the proposed semantics, a function is disallowed to return a ref parameter or a part thereof UNLESS the parameter is also annotated with inout. Under the proposed semantics identity will be spelled as follows:

@safe ref int wrongIdentity(ref int x) { 
    return x; // ERROR! Cannot return a ref, please use "ref inout"
}
@safe ref int identity(ref inout int x) { 
    return x; // fine
}

Just by seeing the signature ref int identity(ref inout int x) the compiler assumes that the result of identity must not outlive x and typechecks callers accordingly. Example (given the previous definition of identity):

@safe ref int fun(ref inout int x) { 
    int a;
    return a; // ERROR per current language rules
    static int b;
    return b; // fine per current language rules
    return identity(a); // ERROR, this may escape the address of a local
    return x; // fine, propagate x's lifetime to output
    return identity(x); // fine, propagate x's lifetime through identity to the output
    return identity(identity(x)); // fine, propagate x's lifetime twice through identity to the output
}
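The ref inout spelling above is this draft's proposed syntax, and a released compiler is not expected to accept it in this role. As a rough, runnable approximation, the sketch below uses the return ref parameter annotation found in current D compilers as a stand-in for ref inout. That substitution is an assumption made for illustration and is not part of this revision of the DIP. The function first is hypothetical.

```d
// Sketch only: `return ref` stands in for the draft's `ref inout`.
@safe ref int identity(return ref int x) {
    return x; // the signature announces that the result may refer to x
}

// Hypothetical helper: returns a part of a ref parameter.
@safe ref int first(return ref int[3] arr) {
    return arr[0];
}

// A caller like the following is the kind of code the signature-based
// check is meant to reject, because the result would outlive `local`:
//
//   @safe ref int escape() { int local; return identity(local); }

void main() {
    int a = 1;
    identity(a) = 5;   // write through the returned reference while a is alive
    assert(a == 5);

    int[3] data = [1, 2, 3];
    first(data) += 10; // the reference to data[0] is used within data's lifetime
    assert(data[0] == 11);
}
```

In both calls the returned reference is consumed while the argument is still in scope, which is the situation the lifetime propagation rule is designed to allow.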

Taking address

This proposal introduces a related restriction: taking the address of the following entities shall be disallowed, even in @system code.

  • Parameters (either value or ref)
  • Stack-allocated locals.
  • Member variables of a struct if the struct is a parameter (either value or ref) or stack-allocated.
    • Note that using a pointer to a struct does allow taking the address of a member.
    • Also note that a struct that is part of a class object also allows address taking.
  • The result of functions that return ref.

This is because escaping pointers away from expressions is too dangerous and should be more explicit. The capability must still be present, otherwise very simple uses are not possible anymore. Consider:

bool parse1(ref double v) {
    // Use C's scanf
    return scanf("%lf", &v) == 1; // Error: cannot take the address of v
}

double parse2() {
    // Use C's scanf, 2nd try
    double v;
    enforce(scanf("%lf", &v) == 1); // Error: cannot take the address of v
    return v;
}

double parse3() {
    // Use C's scanf, 3rd try
    auto pv = new double;
    enforce(scanf("%lf", pv) == 1); // Fine
    return *pv;
}

That would force many variables to exist on the heap even though it is easy to see that the code is safe, since the semantics of scanf is understood by the programmer. To address this issue, this proposal introduces a standard function with the signature:

@system T* addressOf(T)(ref T value);

The function returns the address of value and can only be used in @system or @trusted code. addressOf itself cannot use the & address-of operator because it's forbidden even in @system code. But there are many possible implementations, including escaping into C or assembler. One possible portable implementation is:

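@system T* addressOf(T)(ref T value) {
    // id takes a pointer and returns it unchanged
    static T* id(T* p) { return p; }
    // Reinterpret id as a function that takes its argument by ref
    auto pfun = cast(T* function(ref T)) &id;
    return pfun(value); // the ref argument arrives as a pointer
}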

This relies on the fact that at binary level a ref parameter is passed as a pointer.

With this function available as part of the standard library, efficient code can be written that forwards to scanf without the compiler knowing its semantics:

@trusted bool parse1(ref double v) {
    // Use C's scanf
    return scanf("%lf", addressOf(v)) == 1; // Fine
}

@trusted double parse2() {
    // Use C's scanf, 2nd try
    double v;
    enforce(scanf("%lf", addressOf(v)) == 1); // Fine
    return v;
}
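The forwarding pattern above can be tried without console input. The following is a minimal sketch, assuming sscanf from core.stdc.stdio in place of scanf so that the input comes from a string. The name parseDouble is hypothetical, and addressOf is written here as a template using the cast-based approach, which assumes that a ref parameter is passed as a pointer at the binary level.

```d
import core.stdc.stdio : sscanf;

// Cast-based sketch of addressOf; relies on ref being passed as a pointer.
@system T* addressOf(T)(ref T value) {
    static T* id(T* p) { return p; }
    auto pfun = cast(T* function(ref T)) &id;
    return pfun(value);
}

// Hypothetical wrapper: parses a double from a C string into v.
@trusted bool parseDouble(const(char)* text, ref double v) {
    // %lf is the conversion that matches a double* argument
    return sscanf(text, "%lf", addressOf(v)) == 1;
}

void main() {
    double v = 0;
    assert(parseDouble("2.5", v));
    assert(v == 2.5);
    assert(!parseDouble("abc", v)); // no conversion happens on malformed input
}
```

The wrapper is marked @trusted because the programmer, not the compiler, vouches that sscanf does not keep the pointer after it returns.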

Note: Isn't replacing & with addressOf just shuffling? How does it mark an improvement?

Forbidding use of & against specific objects has two positive effects. First, it eliminates by design some thorny syntactic ambiguities discussed in DIP23. In the expression &fun or &expression.method, the & may apply to either the function/method itself or to the value returned by the function/method (which doesn't compile if the result is an rvalue, but does and is unsafe if the result is a ref). Forbidding the unsafe case leaves only one meaning for & in this context: take the address of the function or delegate. To get the address of the result, one would write addressOf(fun) or addressOf(expr.method), which has unsurprising syntax and semantics.

The second beneficial effect is that addressOf is annotated appropriately with @system and as such integrates naturally with the rest of the type system without a need to ascribe special rules and exceptions to &.
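The two readings of & discussed above can be seen side by side in a short sketch. It is written against the language as it exists without this proposal, where both forms are accepted in @system code. The function counter is hypothetical and exists only for this illustration.

```d
import std.traits : isFunctionPointer;

// Hypothetical function that returns a reference to a static variable.
ref int counter() {
    static int c;
    return c;
}

void main() {
    auto fp = &counter;   // reading 1: the address of the function itself
    auto p  = &counter(); // reading 2: the address of the int the call returns

    static assert(isFunctionPointer!(typeof(fp)));
    static assert(is(typeof(p) == int*));

    *p = 7;
    assert(fp() == 7);    // both paths reach the same static variable
}
```

Under the proposed rules only the first reading would remain available to &, and the second would be written as addressOf(counter()).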

Copyright

This document has been placed in the Public Domain.