A proposal for reference counting based on fat pointers. Most use cases are considered. The scheme takes advantage of the D type system and its distinction between shared and unshared data to decide at compile time whether interlocked reference counting is needed. Unshared mutable data and shallowly shared data (like strings) are counted with the fastest possible non-atomic arithmetic. Cycles should be handled by the user; see the example at the end. The language interface is not considered; assess whether the approach itself is sound.

[http://forum.dlang.org/post/outhxagpohmodjnkzzol@forum.dlang.org Discussion].

TODO:

* add missing use cases
* destroy


== Fat pointer ==

A slim pointer is a traditional C-style pointer.

A fat pointer combines a slim pointer with a reference counter:

<syntaxhighlight lang="d">
import core.atomic : atomicOp;
import core.stdc.stdint : intptr_t;

struct ReferenceCounter
{
    intptr_t count;
    union
    {
        shared(ReferenceCounter)* sharedCounter;
        intptr_t flags; // unused
    }

    intptr_t atomicIncrement() shared
    {
        return atomicOp!"+="(count, 1);
    }

    intptr_t atomicDecrement() shared
    {
        return atomicOp!"-="(count, 1);
    }
}

struct FatPointer(TPayload)
{
    ReferenceCounter* __counter;
    TPayload __data;
    alias __data this;
}
</syntaxhighlight>

A data pointer has chained local and shared counters. In each thread the pointer has a distinct local counter dedicated to that thread, and all those counters are chained to a single shared counter. Unshared mutable data and shallowly shared data (like strings) are counted by the local counter with non-atomic arithmetic. When the local counter reaches zero, the shared counter is decremented atomically (see <code>_d_rc_free</code>). When a new local counter is created (for another thread), the shared counter is incremented atomically. Local and shared counters should probably be allocated from separate pools to improve cache efficiency (shared counters are accessed less frequently).

<syntaxhighlight lang="d">
// initialization
FatPointer!(T) ptr;
ptr.__counter = referenceCounterAllocator.allocate();
ptr.__counter.count = 1;
ptr.__counter.sharedCounter = referenceCounterAllocator.allocate();
ptr.__counter.sharedCounter.count = 1;
ptr.__data = data;

// increment
ptr.__counter.count++;
// decrement
ptr.__counter.count--;
if (ptr.__counter.count == 0)
    _d_rc_free(ptr.__counter, ptr.__data);
</syntaxhighlight>

<syntaxhighlight lang="d">
// handler, called when the local counter reaches zero
extern(C) void _d_rc_free(ReferenceCounter* counter, void* data)
{
    shared scounter = counter.sharedCounter;
    assert(scounter, "scoped counter should not be released");
    referenceCounterAllocator.deallocate(counter);
    // the shared counter is 1 only if this local counter was the last one
    if (scounter.count > 1 && scounter.atomicDecrement() > 0) return;
    referenceCounterAllocator.deallocate(scounter);
    GC.free(data);
}
</syntaxhighlight>

TODO: how to decrement counters of pointers in complex data? There should be a function that walks the data structure releasing references. It should work through interfaces, closures, structs and hierarchies. A possible solution is to associate typeinfo bytecode with the allocated heap block and provide it to the releasing function.

<syntaxhighlight lang="d">
// GC.free
void free(void* data)
{
    ...
    byte[] structCode = getStructCode(data);
    _d_rc_release_recurse(data, structCode);
    ...
}
</syntaxhighlight>

TODO: investigate explicit and implicit conversions between fat and slim pointers.

== Unshared data ==

A reference is unshared if it's located in mutable thread-local memory: the stack, the tls section, or a member of an unshared mutable object. Such references are counted by the local counter. When the local counter reaches zero, the shared counter is decremented atomically (see <code>_d_rc_free</code>). Unshared references include references to shallowly shared objects like strings.

Mutable data allocated in the tls section has chained ReferenceCounters allocated in the same section and initialized to <code>{1,1}</code>, so that the local counter never reaches zero even if all references to the data are lost. When a reference to the data is taken, the local counter is incremented non-atomically; this applies to references stored in the tls section too. This statically allocated pair of counters can be used for all objects in the tls section; the counter value is <code>number of pointers + 1</code>. It's invalid to cast an object in the tls section to shared: it's always available directly from the tls section, hence it can't be unique.
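
A minimal sketch of such a statically allocated pair; all names here are illustrative, not part of the proposal:

<syntaxhighlight lang="d">
// hypothetical counter pair serving every object in the tls section;
// the plain module-level variable is itself tls, its shared partner is not
shared ReferenceCounter __tlsSharedCounter = { count: 1 };
ReferenceCounter __tlsLocalCounter = { count: 1, sharedCounter: &__tlsSharedCounter };

// taking a reference to an object in the tls section
FatPointer!(T) ptr;
ptr.__counter = &__tlsLocalCounter;
ptr.__counter.count++; // non-atomic; the value stays at (number of pointers + 1)
ptr.__data = someTlsObject;
</syntaxhighlight>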

== Shared data ==

A reference is shared if it's located in shared memory, which includes immutable (because immutable is implicitly shared) and const (because const may be immutable). Such references are not directly usable and must be copied from shared memory into unshared memory before use. When such a reference is copied from shared memory into unshared memory, its shared counter is taken and atomically incremented, ignoring the original local counter; a new local counter is allocated for the unshared reference and initialized to <code>1</code>. The resulting unshared reference is counted like any other unshared reference.

<syntaxhighlight lang="d">
// copying between shared and unshared
shared FatPointer!(T)* src;
FatPointer!(T) ptr;
ptr.__counter = referenceCounterAllocator.allocate();
ptr.__counter.count = 1;
ptr.__counter.sharedCounter = src.__counter.sharedCounter;
ptr.__counter.sharedCounter.atomicIncrement();
ptr.__data = src.__data;
</syntaxhighlight>

The same applies to copying from unshared memory into shared memory. Usually the destination shared reference already exists and should be released before the new reference is copied over it. The shared reference is released the usual way: the local counter is decremented non-atomically (the operation is not thread-safe anyway), though the local counter is usually <code>1</code>, because it's allocated every time a reference is copied into shared memory.
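
A sketch of this reverse direction, mirroring the snippet above (casts between shared and unshared are elided for brevity):

<syntaxhighlight lang="d">
// copying from unshared memory into shared
shared FatPointer!(T)* dst; // destination in shared memory
FatPointer!(T) src;         // unshared source

// release the reference currently stored at the destination
dst.__counter.count--; // non-atomic; the write is not thread-safe anyway
if (dst.__counter.count == 0)
    _d_rc_free(dst.__counter, dst.__data);

// store the new reference with a fresh local counter
dst.__counter = referenceCounterAllocator.allocate();
dst.__counter.count = 1;
dst.__counter.sharedCounter = src.__counter.sharedCounter;
dst.__counter.sharedCounter.atomicIncrement();
dst.__data = src.__data;
</syntaxhighlight>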

A shared object statically allocated in a writable section has a shared ReferenceCounter allocated in a writable section and initialized to <code>1</code>. Additional references in static data to the static shared object share a local counter, initialized to <code>number of references + 1</code> and chained with the single shared counter, so that the counters never reach zero even if all references to the object are lost. Additional references in the tls section to the static shared object likewise share a local counter, initialized to <code>number of references + 1</code> and chained with the single shared counter. When a reference to the object is taken, the shared counter is incremented atomically, and a thread-local ReferenceCounter is allocated, initialized to <code>1</code> and chained with the shared counter.

An immutable object statically allocated in a readonly section has a shared ReferenceCounter allocated in a writable section and initialized to <code>1</code>. Additional references in readonly sections to the static immutable object share a local counter, initialized to <code>1</code> and chained with the single shared counter (nothing ever happens to this counter). Additional references in mutable shared data to the static immutable object have distinct local counters, initialized to <code>2</code> and chained with the single shared counter, so that the local counters never reach zero even if all mutable references to the object are lost. Additional references in the tls section to the static immutable object share a local counter, initialized to <code>number of references + 1</code> and chained with the single shared counter, so that the local counter never reaches zero even if all references to the object are lost. When a reference to the static immutable object is taken, the shared counter is incremented atomically, and a thread-local ReferenceCounter is allocated, initialized to <code>1</code> and chained with the shared counter.

Mutable data allocated (<code>__gshared</code>) in a non-tls section has a shared ReferenceCounter allocated in a writable section and initialized to <code>1</code>. Additional references in <code>__gshared</code> data to the static object have distinct local counters, initialized to <code>2</code> and chained with the single shared counter, so that the local counters never reach zero even if all mutable references to the object are lost. Additional references in the tls section to the static object share a local counter, initialized to <code>number of references + 1</code> and chained with the single shared counter, so that the local counter never reaches zero even if all references to the object are lost. When a reference to the object is taken, the shared counter is incremented atomically, and a local ReferenceCounter is allocated, initialized to <code>1</code> and chained with the shared counter. The same holds for copying pointers from the section. TODO: <code>__gshared</code> probably has a race condition; users should try to minimize indirections.

Writing to shared data is not thread-safe.

User code can accelerate access to shared data by caching it in the tls section; in that case only a non-atomic increment of a local counter is needed.

When a module links with static data in another module, it allocates another shared counter, adjusted similarly so that it never reaches zero.

== Casts ==

The cast mutable unshared ↔ mutable shared is mostly a no-op, but it should be done more carefully than in a GC environment. If the code still holds a reference to unshared data that was already put into a shared context, it may share the local counter with another thread. Safety is normally ensured by releasing the lock in a scope outer to the stack variables holding unshared references, as sketched below.
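
For example, the unshared view and its local counter can be confined to a block that ends before the lock is released; a sketch, where <code>Graph</code>, <code>Node</code>, <code>graph</code> and <code>graphLock</code> are illustrative:

<syntaxhighlight lang="d">
import core.sync.mutex : Mutex;

shared Graph graph;
__gshared Mutex graphLock;

void update(Node n)
{
    graphLock.lock();
    {
        auto g = cast(Graph) graph; // unshared view with its own local counter
        g.insert(n);
    } // g and its local counter die here, before the lock is released
    graphLock.unlock();
}
</syntaxhighlight>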

It is only mostly a no-op because immutable references must be rebound: immutable data cannot realistically be required to be unique, unlike unshared data, which is required to be unique when it's cast to shared. Local counters in immutable references are replaced with fresh ones. The rebinding procedure walks unshared mutable data until it finds immutable references; it doesn't dive deeper into shared, const or immutable data.

Since RC tracks all references, it's possible to check that the cast mutable data is unique and throw an exception if it's not (see the sketch after the rebinding code below). Unfortunately, only mutable data can be checked: const data may be immutable, which is not unique. Note for the language interface: code written for GC can benefit from this check too.

<syntaxhighlight lang="d">
// rebinding
FatPointer!(immutable T) ptr;
shared sc = ptr.__counter.sharedCounter;
ptr.__counter = referenceCounterAllocator.allocate();
ptr.__counter.count = 1;
ptr.__counter.sharedCounter = sc;
ptr.__counter.sharedCounter.atomicIncrement();
</syntaxhighlight>
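
The uniqueness check mentioned above could look like this; <code>_d_rc_assume_unique</code> is a hypothetical runtime helper, not part of the proposal:

<syntaxhighlight lang="d">
// throws if the mutable data being cast to immutable is not unique
void _d_rc_assume_unique(ReferenceCounter* counter)
{
    // unique: exactly one local reference, and no local counters in other threads
    if (counter.count != 1 || counter.sharedCounter.count != 1)
        throw new Exception("cast to immutable: data is not unique");
}
</syntaxhighlight>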

The cast mutable ↔ immutable is the same as unshared ↔ shared.

The cast mutable ↔ const is a no-op; <code>inout</code> works like <code>const</code>.

The cast shared ↔ const is a no-op.

The cast immutable ↔ const rebinds the reference to account for the case of a cast from unshared to shared. This also applies when passing immutable data to a <code>scope const</code> (<code>in</code>) parameter.

== Null pointer ==

A new null pointer is created by copying a statically defined null pointer, following the shared contract. The null pointer is then counted like any other pointer, with no special casing.

<syntaxhighlight lang="d">
// initialize a null pointer
FatPointer!(T) ptr;
ptr.__counter = referenceCounterAllocator.allocate();
ptr.__counter.count = 1;
ptr.__counter.sharedCounter = &_d_nullPointerCounter;
ptr.__counter.sharedCounter.atomicIncrement();
</syntaxhighlight>

== Interior pointer ==

A pointer to a field of an object and a slice of an array share the reference counter with the enclosing object. Counting is done transparently; there is no need to compute the beginning of the object.
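
For example, slicing an array could reuse the counter of the whole array; a minimal sketch:

<syntaxhighlight lang="d">
// a slice shares the reference counter of the enclosing array
FatPointer!(int[]) arr; // assume arr is initialized as in the Fat pointer section
FatPointer!(int[]) slice;
slice.__counter = arr.__counter;
slice.__counter.count++;           // counted as an ordinary new reference
slice.__data = arr.__data[2 .. 5]; // interior pointer into the same heap block
</syntaxhighlight>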

== Scoped data ==

A reference counter is passed with the scoped data, but the counter is not maintained, because the caller holds the reference. The reference count is incremented when the reference is escaped or copied to a fat pointer. Use scoped stack variables to save on reference counting. The compiler can infer scoping for stack variables if they are assigned only scoped values; variables not guaranteed to be scoped maintain the reference counter.

When a reference to a stack-allocated object is taken, a local ReferenceCounter is allocated on the stack, initialized to <code>1</code>, not chained with a shared counter, and passed with the reference. The first reference increments the counter so that it never reaches zero. This approach allows working with a scoped pointer just like with any other fat pointer, without any code specialization, and escaping can be controlled manually. References to the scoped object are maintained and released as usual. When the scoped object itself is released, it's possible to check that its counter is <code>1</code>; if it's greater, a reference to it escaped, and an error can be raised at run time.
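
A sketch of the stack-allocated counter and the run-time escape check described above (<code>T</code> and the function are illustrative):

<syntaxhighlight lang="d">
void useStackObject()
{
    T obj;                              // stack-allocated object
    ReferenceCounter rc = { count: 1 }; // stack counter, not chained to a shared one
    {
        FatPointer!(T*) ptr;
        ptr.__counter = &rc;
        ptr.__counter.count++; // first reference; rc can no longer reach zero
        ptr.__data = &obj;
        // ...ptr is copied around; copies are counted and released as usual...
        ptr.__counter.count--; // ptr itself is released at the end of this scope
    }
    // releasing the scoped object: a count above 1 means a reference escaped
    assert(rc.count == 1, "a reference to the scoped object escaped");
}
</syntaxhighlight>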

TODO: if [[User:Schuetzm/scope|strict scoping]] works, scoped data can be passed with slim pointers. It will be impossible to assign a scoped pointer to a fat pointer (to escape it). Data structures with slim pointers should be created to allow working on scoped data. With slim pointers there will also be no need to rebind an immutable reference when passing it to a <code>scope const</code> (<code>in</code>) parameter.

== Implicit this parameter ==

The implicit <code>this</code> parameter is scoped. The reference counter is incremented when it is escaped or an interior pointer is taken.

TODO: how to call the constructor when creating a new object with a slim pointer, and what to do with the reference counter?

== out, ref, lazy ==

The <code>out</code> and <code>ref</code> references themselves can be scoped. If they reference a pointer, that pointer must be fat.

A reference to a value type still needs a fat pointer, because it can escape (as per the description above, scoped pointers are compatible with non-scoped ones):

<syntaxhighlight lang="d">
int* a;
int* b;
void fun(ref int c)
{
    b = &c; // the reference escapes, so it must carry a counter
}
fun(*a);
</syntaxhighlight>


The <code>lazy</code> delegate is scoped.

== Function parameters ==

The caller doesn't increment the counter when passing a pointer as a function argument. The callee increments the counter when copying the argument to another non-scoped variable; use scoped stack variables to save on reference counting.

When an r-value with a fat pointer is moved directly to a function parameter, it should first be moved to the caller's stack and only then passed to the callee; the reference should be released after the callee returns. A sketch of both rules follows.
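
In the sketch, <code>makeT</code> is a hypothetical factory returning an r-value with its counter already incremented (see the next section):

<syntaxhighlight lang="d">
FatPointer!(T) g; // some non-scoped destination

void callee(FatPointer!(T) p) // the caller holds the reference: no increment on entry
{
    p.__counter.count++; // increment only because the argument escapes into g
    g = p;
}

void caller()
{
    // an r-value argument is pinned on the caller's stack first...
    auto tmp = makeT();
    callee(tmp);
    // ...and released after the callee returns
    tmp.__counter.count--;
    if (tmp.__counter.count == 0)
        _d_rc_free(tmp.__counter, tmp.__data);
}
</syntaxhighlight>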

== Return value ==

The callee returns a value with an incremented reference counter; the caller should decrement it if the value is not used. The same holds for the <code>new</code> operator.

When the caller must release arguments passed to the callee, or release its return value, tail-call optimization becomes impossible.
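
A sketch of the return convention, with the same hypothetical <code>makeT</code>:

<syntaxhighlight lang="d">
FatPointer!(T) makeT()
{
    FatPointer!(T) ptr;
    // ...allocate and initialize as in the Fat pointer section: count starts at 1...
    return ptr; // the single count is handed over to the caller
}

void caller()
{
    auto p = makeT(); // no extra increment: ownership of the count transfers

    // a discarded result must still be released, roughly:
    auto tmp = makeT();
    tmp.__counter.count--;
    if (tmp.__counter.count == 0)
        _d_rc_free(tmp.__counter, tmp.__data);
}
</syntaxhighlight>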

== Function pointer ==

A function pointer is slim.

TODO: what to do with functions constructed at run time (trampolines)?

== Delegate pointer ==

A delegate pointer is counted like other pointers.

A delegate is treated as immutable, with all the appropriate rebindings.

== Consuming pointers in generic code ==

TODO: how?

== Garbage collector ==

The GC can be used to collect cycles. Client code can disable it only if it is sure it has no cycles.

== Interfacing with C ==

C structs have slim pointers; in this respect they differ from D structs. Code that interfaces with C without declaring C structs as <code>extern(C)</code> is incorrect. The compiler should report an error if it detects fat pointers being passed to C code.

C function arguments are slim too.

TODO: what to do with return value?

== Interfacing with Objective-C ==

Objective-C reference counting can be sped up by [[#Custom reference counting|custom reference counting]]. An Objective-C object gets chained counters initialized to <code>{1,1}</code>. This way it's possible to share a thread-safe object. NSObject's release is called only after all references to it are lost, so the majority of counting is again done with non-atomic arithmetic.

<syntaxhighlight lang="d">
class NSObject : cocoa.id
{
    void release();
    alias opRCFree = NSObjectRCFree;
}

void NSObjectRCFree(@slim ReferenceCounter* counter, @slim NSObject obj)
{
    shared scounter = counter.sharedCounter;
    referenceCounterAllocator.deallocate(counter);
    // the shared counter is 1 only if this local counter was the last one
    if (scounter.count > 1 && scounter.atomicDecrement() > 0) return;
    referenceCounterAllocator.deallocate(scounter);
    obj.release();
}
</syntaxhighlight>

== Interfacing with COM ==

IUnknown's reference counting can be sped up by [[#Custom reference counting|custom reference counting]]. An IUnknown gets chained counters initialized to <code>{1,1}</code>. This way it's possible to share a thread-safe COM server. IUnknown's Release is called only after all references to it are lost, so the majority of counting is again done with non-atomic arithmetic.

<syntaxhighlight lang="d">
interface IUnknown
{
    HRESULT QueryInterface(const(IID)* riid, void** pvObject);
    ULONG AddRef();
    ULONG Release();
    alias opRCFree = IUnknownRCFree;
}

void IUnknownRCFree(@slim ReferenceCounter* counter, @slim IUnknown punk)
{
    shared scounter = counter.sharedCounter;
    referenceCounterAllocator.deallocate(counter);
    // the shared counter is 1 only if this local counter was the last one
    if (scounter.count > 1 && scounter.atomicDecrement() > 0) return;
    referenceCounterAllocator.deallocate(scounter);
    punk.Release();
}
</syntaxhighlight>

== Variadic arguments ==

== Custom reference counting ==

If a data structure needs custom reference counting, it declares an <code>opRCFree</code> alias to a function with the signature <code>void function(ReferenceCounter*, void*)</code>, which handles freeing.

<syntaxhighlight lang="d">
extern void NodeRCFree(@slim ReferenceCounter* counter, @slim void* data);

interface INode
{
    INode parent();
    INode previousSibling();
    INode nextSibling();
    INode child();
    alias opRCFree = NodeRCFree;
}
</syntaxhighlight>

Usage:

<syntaxhighlight lang="d">
FatPointer!(INode) ptr;
// increment
ptr.__counter.count++;
// decrement
ptr.__counter.count--;
if (ptr.__counter.count == 0)
    INode.opRCFree(ptr.__counter, ptr.__data);
</syntaxhighlight>

== Cyclic data structures ==

Cyclic data structures can be designed for reliable reference counting.

See the (incomplete) example below. The cyclic data structure (a graph) has a single reference counter shared among all nodes; the count equals the number of external references to the graph's nodes. Client code increments the counter as usual. When the counter reaches zero, the custom <code>opRCFree</code> is called, which frees the entire graph. This way the entire graph stays alive as long as there is a reference to any of its nodes, and code can navigate the entire graph from any node. The nodes are owned by the graph; to split or merge graphs, ownership should be changed.

<syntaxhighlight lang="d">
interface INode
{
    INode parent();
    INode previousSibling();
    INode nextSibling();
    INode child();
    void child(INode);
    alias opRCFree = NodeRCFree;
}

class Node : INode
{
    INode parent()
    {
        FatPointer!(INode) ptr;
        _counter.count++;
        ptr.__counter = _counter;
        ptr.__data = _parent;
        return ptr;
    }
    //INode previousSibling();
    //INode nextSibling();
    //INode child();
    alias opRCFree = NodeRCFree;

    Node newNode(string name)
    {
        @slim Node node = new Node(name);
        node._counter = _counter;
        node._parent = this;
        FatPointer!(Node) ptr;
        _counter.count++;
        ptr.__counter = _counter;
        ptr.__data = node;
        return ptr;
    }

    void child(INode node) // setter
    {
        Node newChild = cast(Node) node;
        enforce(newChild._counter == _counter, "node should belong to this graph");
        //TODO: how to assign? What to do with an orphaned child node?
    }

private:
    string _name; // fat pointers for non-cyclic data
    @slim ReferenceCounter* _counter;
    // slim pointers for cyclic data
    @slim Node _parent;
    @slim Node _previousSibling;
    @slim Node _nextSibling;
    @slim Node _child;

    void freeRecurse()
    {
        if (_previousSibling != null)
            _previousSibling.freeRecurse(), GC.free(_previousSibling);
        if (_nextSibling != null)
            _nextSibling.freeRecurse(), GC.free(_nextSibling);
        if (_child != null)
            _child.freeRecurse(), GC.free(_child);
    }
}

void NodeRCFree(@slim ReferenceCounter* counter, @slim void* data)
{
    @slim Node root = cast(Node) data;
    while (root._parent != null) root = root._parent;
    root.freeRecurse(); // free the entire graph
    GC.free(root);
    referenceCounterAllocator.deallocate(counter);
}
</syntaxhighlight>

[[Category:Proposals]]