Revision as of 23:12, 27 December 2014
DIP25: Sealed references
| Title: | Sealed references |
|---|---|
| DIP: | 25 |
| Version: | 1 |
| Status: | Draft |
| Created: | 2013-02-05 |
| Last Modified: | 2013-02-05 |
| Author: | Walter Bright and Andrei Alexandrescu |
| Links: | |
Abstract
D offers a number of features aimed at systems-level coding, such as unrestricted pointers, casting between integers and pointers, and the @system attribute. These means, combined with the other features of D, make it a complete and expressive language for systems-level tasks. On the other hand, economy of means should be exercised in defining such powerful but dangerous features. Most other features should offer good safety guarantees with little or no loss in efficiency or expressiveness. This proposal makes ref provide such a guarantee: with the proposed rules, it is impossible in safe code to have ref refer to a destroyed object. The restrictions introduced are not entirely backward compatible, but they disallow only code that is stylistically questionable and that can easily be replaced with equivalent and clearer code.
In a nutshell
Description
Currently, D has some provisions for avoiding dangling references:
ref int fun(int x) {
    return x; // Error: escaping reference to local variable x
}

ref int gun() {
    int x;
    return x; // Error: escaping reference to local variable x
}
However, this enforcement is shallow. The following code compiles and allows reads and writes through defunct stack locations, bypassing scoping and lifetime rules:
ref int identity(ref int x) {
    return x; // pass-through function that does nothing
}

ref int fun(int x) {
    return identity(x); // escape the address of a parameter
}

ref int gun() {
    int x;
    return identity(x); // escape the address of a local
}

struct S {
    private int x;
    ref int get() { return x; }
}

ref int hun(S x) {
    return x.get; // escape the address of a part of a parameter
}

ref int iun() {
    S s;
    return s.get; // escape the address of a part of a local
}

ref int jun() {
    return S().get; // worst contender: escape the address of a part of an rvalue
}
The escape pattern is obvious in this simple example with all code in sight, and may be found automatically. The problem is that generally the compiler cannot see the body of identity or S.get. We need to devise a method for typechecking callers of such functions when the functions are compiled separately.
We want to devise rules that allow us to pass objects by reference down into functions, and return references up from functions, while disallowing cases such as the above, in which a reference passed up ends up referring to a deallocated temporary.
Typechecking rules
The rules below discuss under what circumstances functions receiving and/or returning ref T may be called, where T is some arbitrary type. Let us also denote with ST any struct that has a non-static member variable of type T.
1. An invocation of a function that takes a parameter of type ref T may pass one of the following:
    1.1. An lvalue of type T, including function arguments, array and struct members;
    1.2. An incoming ref T parameter, or a member of type T of ST received as ref ST;
    1.3. The result of a function returning ref T, or a member of ST returned as ref ST.
2. A function that returns a ref T may return one of the following:
    2.1. A static lvalue of type T, including members of static struct values;
    2.2. A member variable of type T belonging to a class object;
    2.3. A ref T parameter;
    2.4. A member of type T of ST that has been passed as ref ST into the function;
    2.5. The invocation of a function fun returning ref T IF fun does NOT take any parameters of type T or ST;
    2.6. The invocation of a function fun returning ref T IF none of fun's parameters of type ref T and ref ST are bound to local variables.
Discussion and Examples
The rules allow unrestricted pass-down and conservatively restrict pass-up to avoid escaping values. Itemized discussion follows.
1.1. Regular lvalues can be passed down:
void fun(ref T);
struct S { int a; T b; }

void caller(T v1, S v2) {
    static T v3;
    T v4;
    static S v5;
    S v6;
    // Fine: pass argument
    fun(v1);
    // Fine: pass member of argument
    fun(v2.b);
    // Fine: pass static lvalue
    fun(v3);
    // Fine: pass stack variable
    fun(v4);
    // Fine: pass member of static struct
    fun(v5.b);
    // Fine: pass member of local struct
    fun(v6.b);
}
1.2. This rule allows forwarding references transitively.
void fun(ref T);
struct S { int a; T b; }

void caller(ref T v1, ref S v2) {
    // Fine: pass ref argument
    fun(v1);
    // Fine: pass member of ref argument
    fun(v2.b);
}
1.3. This rule enables passing down references obtained from other function calls.
void fun(ref T);
ref T gun();
struct S { int a; T b; }
ref S hun();

void caller() {
    // Fine: pass ref result
    fun(gun());
    // Fine: pass member of ref result
    fun(hun().b);
}
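The three pass-down rules can be tried out with concrete types. The following is only a sketch: Pair, bump, forward, totalRef, storedRef and demo are illustrative names that do not appear in the proposal, and int stands in for T.

```d
struct Pair { int a; int b; }

int total;     // module-level variable: static storage
Pair stored;   // module-level struct: static storage

// Takes its argument by reference and modifies it in place.
void bump(ref int v) { ++v; }

// Both return references to static data, so the results never dangle.
ref int totalRef() { return total; }
ref Pair storedRef() { return stored; }

// Rule 1.2: incoming ref parameters (and their members) are forwarded.
void forward(ref int v, ref Pair p) {
    bump(v);
    bump(p.b);
}

// Rules 1.1 and 1.3: locals, members of locals, and ref results are passed down.
int demo() {
    int local;
    Pair pair;
    bump(local);          // 1.1: stack variable
    bump(pair.b);         // 1.1: member of a local struct
    forward(local, pair); // 1.2 applies inside forward
    bump(totalRef());     // 1.3: result of a function returning ref
    bump(storedRef().b);  // 1.3: member of a ref result
    return local + pair.b + total + stored.b;
}
```

None of these calls returns a reference upward from a stack frame, which is why pass-down can stay unrestricted.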
2.1. Static lvalues can be returned:
struct S { int a; T b; }
static T v1;
static S v2;

ref T caller(bool condition) {
    static T v3;
    static S v4;
    // Fine: static lvalues and members of static structs
    if (condition) return v1;
    if (condition) return v2.b;
    if (condition) return v3;
    return v4.b;
}
2.2. Member variables of classes can be returned because they live on the garbage-collected heap:
class C { int a; T b; }

ref T caller() {
    auto c = new C;
    return c.b;
}
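A compilable variant of rules 2.1 and 2.2, with T fixed to int, might look as follows. The names Node, counter, counterRef, localStaticRef and fieldOf are made up for this sketch.

```d
class Node { int a; int b; }

int counter;   // module-level variable: static storage

// Rule 2.1: a static lvalue outlives every call, so a reference to it stays valid.
ref int counterRef() { return counter; }

// Rule 2.1 again, this time with a function-local static.
ref int localStaticRef() {
    static int slot;
    return slot;
}

// Rule 2.2: class fields live on the garbage-collected heap,
// not in the stack frame of the function that returns them.
ref int fieldOf(Node n) { return n.b; }
```

Writes made through the returned references remain visible after the call returns, because none of the referenced storage belongs to a stack frame.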
2.3. This rule allows returning back an incoming parameter, which in turn allows implementing the identity function and idioms derived from it.
ref T fun(ref T v1) {
    return v1;
}
2.4. As above, but for members of structs.
struct S { int a; T b; }

ref T fun(ref S v1) {
    return v1.b;
}
2.5. This rule allows passing up the result of a function that cannot possibly return a reference to a local.
// Assume T is not double or string
ref T fun(double, ref string);
struct S { int a; T b; }
ref S gun(double, ref string);

ref T caller(bool condition, ref T v1) {
    string s = "asd";
    if (condition) return fun(1, s);
    return gun(1, s).b;
}
2.6. This is the most sophisticated rule. It allows passing up the result of a function while disallowing those cases in which the function may actually return a reference to a local.
ref T fun(T);
ref T gun(ref T);
struct S { int a; T b; }
ref S hun(S);
ref S iun(ref S);

ref T caller(bool condition, ref T v1, ref S v2, T v3, S v4) {
    T v5;
    S v6;
    // Fine, no ref parameters
    if (condition) return fun(v1);
    if (condition) return fun(v2.b);
    if (condition) return fun(v3);
    if (condition) return fun(v4.b);
    if (condition) return fun(v5);
    if (condition) return fun(v6.b);
    // Fine, bound to ref parameters
    if (condition) return gun(v1);
    if (condition) return gun(v2.b);
    // Not fine, bound to locals
    // if (condition) return gun(v3);
    // if (condition) return gun(v4.b);
    // if (condition) return gun(v5);
    // if (condition) return gun(v6.b);
    // Fine, no ref at all
    if (condition) return hun(v2).b;
    if (condition) return hun(v4).b;
    if (condition) return hun(v6).b;
    // Fine, ref bound to ref argument
    if (condition) return iun(v2).b;
    // Not fine, bound to locals
    // if (condition) return iun(v4).b;
    // if (condition) return iun(v6).b;
}
Member functions
The rules above apply to member functions as well, considering that the this special parameter in a method belonging to type A is passed as a ref A parameter. This may cause problems with rvalue structs. (Currently, D allows calling a method against a struct rvalue.) Special rules concerning struct rvalues may be necessary.
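The correspondence between a method and a function taking ref A, and the rvalue wrinkle mentioned above, can be sketched like this. Counter, bump, bumpFree and demo are hypothetical names used only for illustration.

```d
struct Counter {
    int n;
    // Inside a method, `this` acts like a hidden `ref Counter` parameter.
    void bump() { ++n; }
}

// Free-function counterpart with the receiver spelled out explicitly.
void bumpFree(ref Counter self) { ++self.n; }

int demo() {
    Counter c;
    c.bump();
    bumpFree(c);
    Counter().bump();        // accepted: a method may be called on a struct rvalue
    // bumpFree(Counter());  // rejected by default: an rvalue does not bind to ref
    return c.n;
}
```

The asymmetry in the last two calls is the reason the method case may need rules of its own: treating this as a plain ref A parameter would not, by itself, account for rvalue receivers.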
Taking address
This proposal introduces a related restriction: taking the address of the following entities shall be disallowed, even in @system code.
- Parameters (either value or ref).
- Stack-allocated locals.
- Member variables of a struct if the struct is a parameter (either value or ref) or stack-allocated.
    - Note that using a pointer to a struct does allow taking the address of a member.
    - Also note that a struct that is part of a class object also allows address taking.
- The result of functions that return ref.
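The two notes in the list describe cases that would remain legal. A small sketch, using the made-up names Point, Shape, xOf and yOf:

```d
struct Point { int x; int y; }
class Shape { Point origin; }

// The struct is reached through a pointer, so taking a member's address is allowed.
int* xOf(Point* p) { return &p.x; }

// The struct is embedded in a class object, which lives on the heap.
int* yOf(Shape s) { return &s.origin.y; }
```

In both functions the pointee's lifetime is not tied to the current stack frame, unlike the disallowed cases in the list.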
Taking these addresses is restricted because escaping pointers away from expressions is too dangerous and should be more explicit. The capability must still be present, however; otherwise very simple uses become impossible. Consider:
import core.stdc.stdio : scanf;
import std.exception : enforce;

bool parse1(ref double v) {
    // Use C's scanf; %lf is the conversion that matches double*
    return scanf("%lf", &v) == 1; // Error: cannot take the address of v
}

double parse2() {
    // Use C's scanf, 2nd try
    double v;
    enforce(scanf("%lf", &v) == 1); // Error: cannot take the address of v
    return v;
}

double parse3() {
    // Use C's scanf, 3rd try
    auto pv = new double;
    enforce(scanf("%lf", pv) == 1); // Fine
    return *pv;
}
That would force many variables to exist on the heap even though it's easy to figure that the code is safe since the semantics of scanf is understood by the programmer. To address this issue, this proposal suggests introducing a standard function template with the signature:
@system T* addressOf(T)(ref T value);
The function returns the address of value and can only be used in @system or @trusted code. addressOf itself cannot use the & address-of operator on value because that is forbidden even in @system code. But there are many possible implementations, including escaping into C or assembler. One possible portable implementation is:
@system T* addressOf(T)(ref T value) {
    static T* id(T* p) { return p; }
    // Reinterpret id as a function that takes its argument by ref
    auto pfun = cast(T* function(ref T)) &id;
    return pfun(value);
}
This relies on the fact that at binary level a ref parameter is passed as a pointer.
With this function available as part of the standard library, efficient code can be written that forwards to scanf without the compiler knowing its semantics:
@trusted bool parse1(ref double v) {
    // Use C's scanf
    return scanf("%lf", addressOf(v)) == 1; // Fine
}

@trusted double parse2() {
    // Use C's scanf, 2nd try
    double v;
    enforce(scanf("%lf", addressOf(v)) == 1); // Fine
    return v;
}
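To experiment with this without reading from standard input, the same pattern can be tried against sscanf. The sketch below is self-contained and makes several assumptions: addressOf is written as a template using the cast-based trick (it is not an existing library function), parseFrom is a made-up name, and the approach relies on ref parameters being passed as pointers at the binary level.

```d
import core.stdc.stdio : sscanf;

// Assumed helper mirroring the proposed addressOf; not part of any library.
@system T* addressOf(T)(ref T value) {
    static T* id(T* p) { return p; }
    // Reinterpret id so that it accepts its argument by ref. This depends on
    // ref parameters being passed as pointers at the binary level.
    auto pfun = cast(T* function(ref T)) &id;
    return pfun(value);
}

// Parses a double from a zero-terminated C string; returns true on success.
@trusted bool parseFrom(const(char)* text, ref double v) {
    return sscanf(text, "%lf", addressOf(v)) == 1;
}
```

The @trusted annotation on parseFrom is where the programmer vouches for the use of the raw pointer; callers in safe code never see it.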
Note: Isn't replacing & with addressOf just shuffling? How does it mark an improvement?
Forbidding use of & against specific objects has two positive effects. First, it eliminates by design some thorny syntactic ambiguities discussed in DIP23. In the expression &fun or &expression.method, the & may apply to either the function/method itself or to the value returned by the function/method (which doesn't compile if the result is an rvalue, but does and is unsafe if the result is a ref). Forbidding the unsafe case leaves only one meaning for & in this context: take the address of the function or delegate. To get the address of the result, one would write addressOf(fun) or addressOf(expr.method), which has unsurprising syntax and semantics.
The second beneficial effect is that addressOf is annotated appropriately with @system and as such integrates naturally with the rest of the type system without a need to ascribe special rules and exceptions to &.
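The two readings of & from the first point can be made concrete. This sketch shows how they can be spelled apart with explicit parentheses in D as it stands, without the proposed restriction; g, fun and demo are illustrative names.

```d
int g;
ref int fun() { return g; }

int demo() {
    auto fp = &fun;    // reading 1: address of the function itself
    fp() = 3;          // call through the pointer; the ref result is assignable
    int* p = &fun();   // reading 2: address of the ref result, the case the proposal forbids
    *p += 1;
    return g;
}
```

Under the proposal the second form would be written with addressOf, leaving & with the single meaning shown on the first line of demo.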
Copyright
This document has been placed in the Public Domain.