
From D Wiki
Revision as of 22:56, 12 May 2013

DIP 39: Safe rvalue references: backwards compatible, safe against ref/nonref code evolution, compatible with UFCS and DIP38

Title: Safe rvalue references: backwards compatible, safe against ref/nonref code evolution, compatible with UFCS and DIP38.
DIP: 39
Version: 1
Status: Draft
Created: 2013-05-10
Last Modified: 2013-05-10
Author: Timothee Cour
Links:

Abstract

We propose to introduce rvalue references that are:

  • safe: guarantees memory safety so that references will always point to valid memory.
  • compatible with DIP38: can use same inref/outref internal compiler annotation for input references that can be returned by ref by a function.
  • backwards compatible: current valid D code will continue to work without change. In addition, new code becomes valid with a call-site rvalue ref annotation.
  • safe against ref/nonref code evolution: the compulsory call-site annotation for rvalue refs turns ref/nonref changes into compile errors instead of silently changing code behavior.
  • flexible: both const ref and ref can be used with rvalue refs (more flexible than C++)
  • no call site ref annotation when input ref argument is already an lvalue (different from C#), for backwards compatibility (and making it less verbose)
  • compatible with UFCS

Details

Let us introduce some notation:

  • LV denotes an lvalue expression
  • RV denotes an rvalue expression
  • Expr denotes an expression (could be an LV or RV)

We introduce the following new notation:

  • fun(Expr^) creates a temporary variable from Expr and passes that temporary to a function that takes its argument by ref.

I propose the symbol '^' to denote this temporary creation (^ is also used for XOR in D but shouldn't create ambiguity as XOR is binary), although there are alternatives, see section: 'Alternative symbols for temporary creation'.

// Suppose we have a function:
T2 funRef(ref T a);
// We can use it as before with an LV (backwards compatible):
funRef(LV); 
// Our proposed new syntax also allows calling funRef by creating a temporary from an expression:
funRef(Expr^);//create a temporary before calling funRef: 'auto _tmp=Expr; funRef(_tmp);'

The rule is simple: funRef(ref T a) can still only take an LV, and 'Expr^' or '(Expr)^' creates a temporary LV from an Expr. With funRef(ref T a):

  • funRef(LV); //ok: LV passed by ref
  • funRef(RV); //error
  • funRef(Expr^); // ok: 'auto _tmp=Expr; funRef(_tmp);'

with funNonRef(T a):

  • funNonRef(Expr); // ok
  • funNonRef(Expr^); // error
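
The rules above can be approximated in today's D, since 'Expr^' is only shorthand for an explicit temporary. The following is a minimal sketch, not part of the proposal: the struct T with its 'value' field and the helpers makeT and viaTemporary are hypothetical names chosen for illustration, and the static asserts assume default compiler settings (no preview switch that lets rvalues bind to ref).

```d
// Hypothetical payload type, for illustration only.
struct T { int value; }

// Takes its argument by ref and mutates it.
void funRef(ref T a) { a.value += 1; }

// Takes its argument by value.
int funNonRef(T a) { return a.value; }

// Returns an rvalue.
T makeT() { return T(41); }

// An lvalue binds to ref; an rvalue does not (default settings assumed).
static assert( __traits(compiles, { T lv; funRef(lv); }));
static assert(!__traits(compiles, funRef(makeT())));

// Hand-written equivalent of the proposed 'funRef(makeT()^)'.
int viaTemporary()
{
    auto _tmp = makeT();
    funRef(_tmp);
    return _tmp.value;
}

void main()
{
    assert(viaTemporary() == 42);       // the temporary was modified
    assert(funNonRef(makeT()) == 41);   // by-value call needs no annotation
}
```

The last rule, funNonRef(Expr^) being an error, has no counterpart in current D and is therefore not shown.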

If passing an LV to a ref-taking function would require an implicit conversion, the binding is disallowed without the call-site annotation; this is the LRL (Lvalue-Rvalue-Lvalue) problem pointed out by Andrei:

void fix(ref double x) { if (isnan(x)) x = 0; }
float a;
fix(a); // error due to mismatched types
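
To make the danger concrete, here is a small sketch in current D of what would happen if the conversion were allowed silently. It spells out the hidden temporary by hand; the helper name originalStillNaN is hypothetical, and isNaN from std.math is used in place of the older isnan spelling.

```d
import std.math : isNaN;

void fix(ref double x) { if (isNaN(x)) x = 0; }

// A float lvalue does not bind to 'ref double'.
static assert(!__traits(compiles, { float a; fix(a); }));

// Hand-written form of 'fix(a^)': fix() repairs a double temporary,
// while the original float keeps its value.
bool originalStillNaN()
{
    float a;             // floating-point variables default to NaN in D
    double _tmp = a;     // the explicit temporary
    fix(_tmp);
    return isNaN(a) && _tmp == 0;
}

void main()
{
    assert(originalStillNaN());
}
```

With the call-site annotation the copy is at least visible in the source, so the reader knows that 'a' itself is not being fixed.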

With those rules, current valid code will stay valid, and new code becomes possible in a safe way.

Implementation details

The compiler will create a temporary whose lifetime spans the entire expression in which Expr^ occurs:

expr ( funRef(Expr^)  )
//rewritten by compiler as:
auto _tmp=Expr;
expr ( funRef(_tmp)  );
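
The lifetime requirement can be illustrated with a type that has a destructor. This is only a sketch under assumptions: the Tracked struct, the 'destroyed' counter, and the function 'lowered' are hypothetical, and an explicit block scope stands in for "the entire expression".

```d
int destroyed;   // counts destructor runs of Tracked

struct Tracked
{
    int value;
    ~this() { ++destroyed; }
}

int funRef(ref Tracked t) { t.value *= 2; return t.value; }
int expr(int x) { return x + 1; }

// Hand-written lowering of 'expr(funRef(Tracked(21)^))'.
int lowered()
{
    destroyed = 0;
    int result;
    {
        auto _tmp = Tracked(21);        // 'auto _tmp = Expr;'
        result = expr(funRef(_tmp));    // 'expr(funRef(_tmp));'
        assert(destroyed == 0);         // _tmp outlives the whole expression
    }                                   // _tmp is destroyed here
    return result;
}

void main()
{
    assert(lowered() == 43);
    assert(destroyed == 1);
}
```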

UFCS

The rule for UFCS is the same:

with 'funRef(ref T a)':

  • LV.funRef(); //ok
  • RV.funRef(); //error
  • Expr^.funRef(); //ok

with 'funNonRef(T a)':

  • Expr.funNonRef(); //ok
  • Expr^.funNonRef(); //error
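
The same behavior can be checked for UFCS calls in current D. The sketch below again writes the temporary by hand; makeInt and ufcsViaTemporary are hypothetical helper names, and default compiler settings are assumed for the static asserts.

```d
void funRef(ref int a) { a += 1; }
int funNonRef(int a) { return a * 2; }
int makeInt() { return 41; }   // returns an rvalue

static assert( __traits(compiles, { int lv; lv.funRef(); }));  // LV.funRef()
static assert(!__traits(compiles, makeInt().funRef()));        // RV.funRef()
static assert( __traits(compiles, makeInt().funNonRef()));     // Expr.funNonRef()

// Hand-written form of the proposed 'makeInt()^.funRef()'.
int ufcsViaTemporary()
{
    auto _tmp = makeInt();
    _tmp.funRef();
    return _tmp;
}

void main()
{
    assert(ufcsViaTemporary() == 42);
    assert(makeInt().funNonRef() == 82);
}
```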

Safety against return by ref/nonref code evolution

As Andrei pointed out recently, introducing RV references must take into account the dangers of ref/nonref code evolution. Our solution is safe in this aspect:

struct A{ref T opIndex(int i){...}}
void funRef(ref T x){if (isnan(x)) x = 0; }
A a;
funRef(a[0]);//ok since a[0] is an LV.
//later on the code changes: opIndex returns by value 'T opIndex(int i)'.
funRef(a[0]);//now becomes an error since a[0] becomes an RV! so we're prevented from accepting code that silently does the wrong thing.
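
The two stages of this evolution can be placed side by side in current D. In this sketch the names ARef and AVal are hypothetical stand-ins for the "before" and "after" versions of A, T is assumed to be double, and isNaN from std.math replaces isnan.

```d
import std.math : isNaN;

alias T = double;   // assumption: any floating-point type would do

// Before: opIndex returns by ref, so a[0] is an lvalue.
struct ARef { T[2] data; ref T opIndex(int i) { return data[i]; } }

// After: opIndex returns by value, so a[0] is an rvalue.
struct AVal { T[2] data; T opIndex(int i) { return data[i]; } }

void funRef(ref T x) { if (isNaN(x)) x = 0; }

// The same call compiles before the change and is rejected after it.
static assert( __traits(compiles, { ARef a; funRef(a[0]); }));
static assert(!__traits(compiles, { AVal a; funRef(a[0]); }));

bool refVersionFixesElement()
{
    ARef a;           // elements default to NaN
    funRef(a[0]);     // modifies the stored element in place
    return a.data[0] == 0;
}

void main()
{
    assert(refVersionFixesElement());
}
```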

Let us consider the reverse evolution from nonref to ref:

struct A{ T opIndex(int i){...}}
void funRef(ref T x){...} // some function which may modify x
funRef(a[0]^); //ok, a temporary is created from the RV a[0]. 
// later on, code changes: opIndex returns by ref 'ref T opIndex(int i)'.
funRef(a[0]^); //still ok, a temporary copy of a[0] (which is now an LV) is still created.

In this second example, the code behavior remains unchanged before and after the opIndex signature change, as in both cases a temporary copy is created. Note: thanks to Dmitry S for proposing to accept LV^, motivated by this example (the original proposal only accepted RV^).
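
A sketch of this reverse evolution in current D, with the temporary written out by hand. AVal and ARef are hypothetical names for the two versions of A, callOnCopy is a hypothetical helper playing the role of 'funRef(a[0]^)', and the body of funRef is made up for the example.

```d
alias T = double;   // assumption for illustration

struct AVal { T[2] data = [1, 2]; T opIndex(int i) { return data[i]; } }
struct ARef { T[2] data = [1, 2]; ref T opIndex(int i) { return data[i]; } }

void funRef(ref T x) { x += 10; }   // some function that modifies x

// Hand-written form of 'funRef(a[0]^)': copy first, then pass the copy.
T callOnCopy(A)(ref A a)
{
    auto _tmp = a[0];
    funRef(_tmp);
    return _tmp;
}

bool sameBehavior()
{
    AVal v;
    ARef r;
    auto fromVal = callOnCopy(v);
    auto fromRef = callOnCopy(r);
    // Only the copy is modified; the stored element is untouched either way.
    return fromVal == 11 && fromRef == 11
        && v.data[0] == 1 && r.data[0] == 1;
}

void main()
{
    assert(sameBehavior());
}
```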

Also: Safety against another type of code evolution: proposal to error on ignored nonref return value

This proposal is independent of DIP39, as it has nothing to do with rvalue refs, but I mention it here because it addresses a related code-evolution problem:

//suppose now the 'fix' code changes to :
T fix(T x){if (isnan(x)) x = 0; return x;}
fix(a[0]); //compiles but does nothing, regardless of opIndex returning value or ref.

I propose to make it an error to ignore a nonref return value, and to add a function 'ignore' for convenience that consumes and does nothing:

void ignore(T)(T a){}
// can be used as:
fix(a[0]); // error: nonref return value of array is ignored
fix(a[0]).ignore; // ok (although not very useful here; but would be for ignoring, say, error codes)

That would address this issue. The check could be enabled with a compiler flag such as -error_ignored_nonref_return.
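
The 'ignore' helper itself already works in current D; only the proposed error is missing. The sketch below shows the silent no-op and the explicit opt-out. The function ignoreDemo is a hypothetical name, T is assumed to be double, and isNaN from std.math replaces isnan.

```d
import std.math : isNaN;

// Generic sink from the proposal: consumes a value and does nothing.
void ignore(T)(T a) {}

// By-value 'fix': the caller only sees the result through the return value.
double fix(double x) { if (isNaN(x)) x = 0; return x; }

bool ignoreDemo()
{
    double a;                  // NaN by default
    fix(a);                    // accepted today, but the result is dropped
    bool silentlyUnchanged = isNaN(a);
    a = fix(a);                // the intended use
    fix(a).ignore;             // explicit opt-out via UFCS
    return silentlyUnchanged && a == 0;
}

void main()
{
    assert(ignoreDemo());
}
```

Under the proposed rule, the bare 'fix(a);' statement would be rejected, while the other two calls would remain valid.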

Note, ref returned values are safer to ignore as an object still persists on the stack, but we could also choose to error on those.

Safety

Memory safety would be the same as in current D: the existing pitfalls remain and no new ones are introduced. In conjunction with DIP 38, memory safety would be guaranteed at compile time. With the approach introduced at DConf 2013, it would be guaranteed by a runtime check.

Alternative symbols for temporary creation

Two things need to be decided: whether the annotation is prefix or postfix, and which annotation to use:

prefix vs postfix:

  • postfix fun(RV^): (proposed): compatible with left-to-right pipelines in D: [1,2].sort.map!fun.uniq
  • prefix fun(^RV): compatible with '&' location wrt RV argument

This can affect ease of disambiguation wrt existing symbols.

which annotation to use (regardless of prefix/postfix):

  • fun(RV^); // (proposed) ^ is used for XOR but should not be ambiguous; reminiscent of a C++ special reference extension
  • fun(RV@); // @ has a UDA meaning in D, but that could be made unambiguous
  • fun(auto(RV)); // suggested by Dmitry S; reminiscent of creating a temporary variable with auto-deduced type, analogous to int(x + 1)
  • fun(RV.auto); // UFCS version of auto(RV), makes UFCS much nicer
  • likewise with ref(RV) and RV.ref // reminiscent of the C# call-site annotation via 'ref', and of the function signature
  • fun(RV#); // # has a special meaning in D (the #line directive), but that could be made unambiguous
  • fun(RV?); // ? has a special (a?b:c) meaning in D, but that could be made unambiguous
  • fun(RV&); // probably a bad idea, since for a templated function fun(T)(ref T a) this could call fun!(typeof(RV*))(RV&)

This could look like this:

T fun1(ref T2);
T fun2(ref T2);
//with RV^:
[1,2]^.fun1^.fun2.writeln;
//with auto(RV):
auto(auto([1,2]).fun1).fun2.writeln; // a bit verbose
//with RV.auto:
[1,2].auto.fun1.auto.fun2.writeln; // better

Copyright

This document has been placed in the Public Domain.

Thanks to Dmitry S for corrections.