FatPointer
Revision as of 12:37, 27 July 2014
A proposal for reference counting based on fat pointers. Most use cases are considered. The scheme takes advantage of the D type system and the distinction between shared and unshared data to decide on interlocked reference counting at compile time. Unshared mutable data and shallow shared data (like strings) are counted with the fastest possible non-atomic arithmetic. Cycles should be handled by the user; see an example at the end. Language interface is not considered; assess whether the approach itself is sound.
TODO:
- add missing use cases
- destroy
Contents
- 1 Fat pointer
- 2 Unshared data
- 3 Shared data
- 4 Casts
- 5 Null pointer
- 6 Interior pointer
- 7 Scoped data
- 8 Implicit this parameter
- 9 out, ref, lazy
- 10 Function parameters
- 11 Return value
- 12 Function pointer
- 13 Delegate pointer
- 14 Consuming pointers in generic code
- 15 Garbage collector
- 16 Interfacing with C
- 17 Interfacing with Objective-C
- 18 Interfacing with COM
- 19 Variadic arguments
- 20 Custom reference counting
- 21 Cyclic data structures
Fat pointer
A slim pointer is a traditional C-style pointer.
A fat pointer combines a slim pointer with a reference counter.
import core.atomic : atomicOp;
import core.stdc.stdint : intptr_t;

struct ReferenceCounter
{
    intptr_t count;
    union
    {
        shared(ReferenceCounter)* sharedCounter;
        intptr_t flags; // unused
    }

    intptr_t atomicIncrement() shared
    {
        return atomicOp!"+="(count, 1);
    }

    intptr_t atomicDecrement() shared
    {
        return atomicOp!"-="(count, 1);
    }
}

struct FatPointer(TPayload)
{
    ReferenceCounter* __counter;
    TPayload __data;
    alias __data this;
}
A data pointer has chained local and shared counters. In each thread the pointer has a distinct local counter dedicated to that thread; all those local counters are chained to a single shared counter. Unshared mutable data and shallow shared data (like strings) are counted by the local counter with non-atomic arithmetic. When the local counter reaches zero, the shared counter is decremented atomically (see _d_rc_free). When a new local counter is created (for another thread), the shared counter is incremented atomically. Local and shared counters should probably be allocated from separate pools to improve cache efficiency (shared counters are accessed less frequently).
// initialization
FatPointer!(T) ptr;
ptr.__counter = referenceCounterAllocator.allocate();
ptr.__counter.count = 1;
ptr.__counter.sharedCounter = referenceCounterAllocator.allocate();
ptr.__counter.sharedCounter.count = 1;
ptr.__data = data;
// increment
ptr.__counter.count++;
// decrement
ptr.__counter.count--;
if (ptr.__counter.count == 0)
_d_rc_free(ptr.__counter, ptr.__data);
// handler, called when the counter reaches zero
extern(C) void _d_rc_free(ReferenceCounter* counter, void* data)
{
    shared scounter = counter.sharedCounter;
    assert(scounter, "scoped counter should not be released");
    referenceCounterAllocator.deallocate(counter);
    // shared counter is 1 only if it's unique
    if (scounter.count > 1 && scounter.atomicDecrement() > 0) return;
    referenceCounterAllocator.deallocate(scounter);
    GC.free(data);
}
TODO: how to decrement the counters of pointers in complex data? There should be a function that walks the data structure releasing references. It should work through interfaces, closures, structs and hierarchies. A possible solution is to associate typeinfo bytecode with the allocated heap block and provide it to the releasing function.
// GC.free
void free(void* data)
{
    ...
    byte[] structCode = getStructCode(data);
    _d_rc_release_recurse(data, structCode);
    ...
}
TODO: investigate explicit and implicit conversions between fat and slim pointers.
Unshared data
A reference is unshared if it's located in mutable thread-local memory: the stack, tls, or a member of an unshared mutable object. Such references are counted by the local counter. When the local counter reaches zero, the shared counter is decremented atomically (see _d_rc_free). Unshared references include references to shallow shared objects like strings.
Mutable data allocated in the tls section has chained ReferenceCounters allocated in the same section and initialized to {1,1}, so that the local counter never reaches zero even if all references to the data are lost. When a reference to the data is taken, the local counter is incremented non-atomically; this applies to references stored in the tls section too. This statically allocated pair of counters can be used for all objects in the tls section; the counter value is number of pointers + 1. It's invalid to cast an object in the tls section to shared: it's always available directly from the tls section, hence it can't be unique.
Shared data
A reference is shared if it's located in shared memory, which includes immutable (because immutable is shared) and const (because const can be immutable) memory. Such references are not directly accessible and must be copied from shared memory into unshared memory before use. When such a reference is copied from shared memory into unshared memory, its shared counter is taken and atomically incremented, ignoring the original local counter; a new local counter is allocated for the unshared reference and initialized to 1. The resulting unshared reference is counted as any other unshared reference.
// copying between shared and unshared
shared FatPointer!(T)* src;
FatPointer!(T) ptr;
ptr.__counter = referenceCounterAllocator.allocate();
ptr.__counter.count = 1;
ptr.__counter.sharedCounter = src.__counter.sharedCounter;
ptr.__counter.sharedCounter.atomicIncrement();
ptr.__data = src.__data;
The same applies to copying from unshared memory into shared memory. Usually the destination shared reference already exists and should be released before the new reference is copied into it. The shared reference is released the usual way: the local counter is decremented non-atomically (it's not a thread-safe operation anyway), though the local counter is usually 1, because it's allocated every time a reference is copied into shared memory.
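The release-then-overwrite sequence for a shared destination slot can be sketched in C++ (a model of the scheme, not the D runtime; SCounter, LCounter, overwrite_shared_slot and `payloads_freed` are illustrative names):

```cpp
#include <atomic>

struct SCounter { std::atomic<long> count; };
struct LCounter { long count; SCounter* shared_; };

int payloads_freed = 0;  // stands in for GC.free

void rc_release(LCounter* lc) {
    if (--lc->count != 0) return;        // non-atomic fast path
    SCounter* sc = lc->shared_;
    delete lc;
    if (sc->count.fetch_sub(1) > 1) return;
    delete sc;
    ++payloads_freed;
}

// Overwriting a reference stored in shared memory: release the old
// reference first, then install the new one with a fresh local counter
// chained to the source's (atomically incremented) shared counter.
LCounter* overwrite_shared_slot(LCounter* old_ref, SCounter* src_shared) {
    rc_release(old_ref);                 // old local counter is usually 1
    src_shared->count.fetch_add(1);      // take the source's shared counter
    return new LCounter{1, src_shared};  // fresh local counter for the slot
}
```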
A shared object statically allocated in a writable section has a shared ReferenceCounter allocated in a writable section and initialized to 1. Additional references in static data to the static shared object share a local counter, initialized to number of references + 1 and chained with the single shared counter, so that the counters never reach zero even if all references to the object are lost. Additional references in the tls section to the static shared object share a local counter, initialized to number of references + 1 and chained with the single shared counter, so that the counters never reach zero even if all references to the object are lost. When a reference to the object is taken, the shared counter is incremented atomically, and a thread-local ReferenceCounter is allocated, initialized to 1 and chained with the shared counter.
An immutable object statically allocated in a readonly section has a shared ReferenceCounter allocated in a writable section and initialized to 1. Additional references in readonly sections to the static immutable object share a local counter, initialized to 1 and chained with the single shared counter (nothing happens with this counter). Additional references in mutable shared data to the static immutable object have distinct local counters, initialized to 2 and chained with the single shared counter, so that the local counters never reach zero even if all mutable references to the object are lost. Additional references in the tls section to the static immutable object share a local counter, initialized to number of references + 1 and chained with the single shared counter, so that the local counter never reaches zero even if all references to the object are lost. When a reference to the static immutable object is taken, the shared counter is incremented atomically, and a thread-local ReferenceCounter is allocated, initialized to 1 and chained with the shared counter.
Mutable data allocated (__gshared) in a non-tls section has a shared ReferenceCounter allocated in a writable section and initialized to 1. Additional references in __gshared data to the static object have distinct local counters, initialized to 2 and chained with the single shared counter, so that the local counters never reach zero even if all mutable references to the object are lost. Additional references in the tls section to the static object share a local counter, initialized to number of references + 1 and chained with the single shared counter, so that the local counter never reaches zero even if all references to the object are lost. When a reference to the object is taken, the shared counter is incremented atomically, and a local ReferenceCounter is allocated, initialized to 1 and chained with the shared counter. The same applies to copying pointers from the section. TODO: __gshared probably has a race condition; the user should try to minimize indirections.
Writing to shared data is not thread-safe.
User code can accelerate access to shared data by caching it in tls section, in that case only non-atomic increment of a local counter is needed.
When a module links with static data in another module, it allocates another shared counter, adjusted similarly so that it never reaches zero.
Casts
Cast between mutable unshared and mutable shared is mostly a no-op, but it should be done more carefully than in a GC environment. If the code still has a reference to unshared data that was already put into a shared context, it may share the local counter with another thread. Safety is normally achieved by releasing the lock in a scope outer to the stack variables holding unshared references.
It is only mostly a no-op because immutable references should be rebound: immutable data can't realistically be required to be unique, as opposed to unshared data, which is required to be unique when it's cast to shared. Local counters in immutable references should be replaced with fresh ones. The rebinding procedure walks unshared mutable data until it finds immutable references; it doesn't dive deeper into shared, const and immutable data.
Since RC tracks all references, it's possible to check that the cast mutable data is unique and throw an exception if it's not. Unfortunately, only mutable data can be checked: const data can be immutable, which is not unique. Note for the language interface: code written for GC can benefit from this check too.
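The run-time uniqueness check can be modeled in a few lines of C++ (illustrative; CounterPair, is_unique and checked_cast_to_shared are invented names for the sketch):

```cpp
#include <stdexcept>

// With RC tracking every reference, a cast to shared can verify at run
// time that the data is unique: both counters must be exactly 1.
struct CounterPair { long local_count; long shared_count; };

bool is_unique(const CounterPair& c) {
    return c.local_count == 1 && c.shared_count == 1;
}

void checked_cast_to_shared(const CounterPair& c) {
    if (!is_unique(c))
        throw std::runtime_error("cast to shared: data is not unique");
}
```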
// rebinding
FatPointer!(immutable T) ptr;
shared sc = ptr.__counter.sharedCounter;
ptr.__counter = referenceCounterAllocator.allocate();
ptr.__counter.count = 1;
ptr.__counter.sharedCounter = sc;
ptr.__counter.sharedCounter.atomicIncrement();
Cast mutable ↔ immutable is the same as unshared ↔ shared.
Cast mutable ↔ const is a no-op; inout works like const.
Cast shared ↔ const is a no-op.
Cast immutable ↔ const rebinds the reference to account for the case of a cast from unshared to shared. This also applies when passing immutable data to a scope const (in) parameter.
Null pointer
A new null pointer is created by copying a statically defined null pointer following the shared contract. The null pointer is then counted as any other pointer, without any special casing.
// initialize null pointer
FatPointer!(T) ptr;
ptr.__counter = referenceCounterAllocator.allocate();
ptr.__counter.count = 1;
ptr.__counter.sharedCounter = &_d_nullPointerCounter;
ptr.__counter.sharedCounter.atomicIncrement();
Interior pointer
A pointer to a field of an object and a slice of an array share the reference counter with the enclosing object. Counting is done transparently; there is no need to compute the beginning of the object.
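A slice, as the simplest interior pointer, can be modeled in C++ (illustrative names: RefCounter, FatSlice, subslice):

```cpp
// An interior pointer (here, an array slice) reuses the enclosing
// object's counter; only the data pointer and length change, so no
// computation of the object's beginning is ever needed.
struct RefCounter { long count; };

struct FatSlice {
    RefCounter* counter;  // shared with the enclosing array
    int* data;
    long length;
};

FatSlice subslice(FatSlice s, long lo, long hi) {
    ++s.counter->count;  // counted like any other reference
    return FatSlice{s.counter, s.data + lo, hi - lo};
}
```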
Scoped data
The reference counter is passed with the scoped data, but the counter is not maintained, because the caller holds the reference. The reference count is incremented when the data is escaped or copied to a fat pointer. Use scoped stack variables to save on reference counting. The compiler can infer scoping for stack variables if they are assigned only scoped values; variables not guaranteed to be scoped maintain the reference counter.
When a reference to a stack-allocated object is taken, a local ReferenceCounter is allocated on the stack, initialized to 1, not chained with a shared counter, and passed with the reference. The first reference increments the counter so that it never reaches zero. This approach allows working with a scoped pointer just like with any other fat pointer, without any specialization of code, and escaping can be controlled manually. References to the scoped object should be maintained and released as usual. When the scoped object itself is released, it's possible to check that its counter is 1; if it's greater, a reference to it has escaped and an error can be thrown at run time.
TODO: if strict scoping works, scoped data can be passed with slim pointers. It will be impossible to assign a scoped pointer to a fat pointer (escape). Data structures with slim pointers should be created to allow work on scoped data. With slim pointers there will also be no need to rebind an immutable reference when passing it to a scope const (in) parameter.
Implicit this parameter
The implicit this parameter is scoped. The reference counter is incremented when it is escaped or an interior pointer is taken.
TODO: how to call constructor when creating a new object with slim pointer, what to do with reference counter?
out, ref, lazy
The out and ref references themselves can be scoped. If they reference a pointer, that pointer must be fat.
A reference to a value type needs a fat pointer (as per the description above, scoped pointers are compatible with non-scoped ones):
int* a;
int* b;
void fun(ref int c)
{
    b = &c;
}
fun(*a);
The lazy delegate is scoped.
Function parameters
The caller doesn't increment the counter when passing a pointer as a function argument. The callee increments the counter when copying the argument to another non-scoped variable; use scoped stack variables to save on reference counting.
When an r-value with a fat pointer is moved directly to a function parameter, it should first be moved to the caller's stack and only then passed to the callee; the reference should be released after the callee returns.
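The argument convention can be modeled in C++ (illustrative; ArgCounter, callee and call_with_rvalue are invented names for the sketch):

```cpp
// The caller does not bump the counter for an argument; it keeps an
// r-value alive on its own stack for the duration of the call and
// releases it afterwards. The callee increments only when it stores a
// copy in a non-scoped variable.
struct ArgCounter { long count; bool freed; };

void arg_release(ArgCounter& c) {
    if (--c.count == 0) c.freed = true;
}

void callee(ArgCounter& arg, bool stores_copy) {
    if (stores_copy) ++arg.count;  // copy escapes into a non-scoped variable
}

void call_with_rvalue(ArgCounter& temp, bool callee_stores) {
    callee(temp, callee_stores);   // passed without an increment
    arg_release(temp);             // caller releases its stack temporary
}
```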
Return value
The callee returns a value with an incremented reference counter; the caller should decrement it if the value is not used. The same applies to the new operator.
When the caller must release arguments passed to the callee or its return value, tail call optimization is impossible.
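The return convention can be sketched in C++ (illustrative; RetCounter, callee_returns, caller_discard and `values_freed` are invented names):

```cpp
// The callee hands back a value whose counter is already incremented; a
// caller that discards the result must decrement it (same as for `new`).
struct RetCounter { long count; };

int values_freed = 0;

RetCounter* callee_returns() {
    return new RetCounter{1};  // +1 reference transferred to the caller
}

void caller_discard(RetCounter* r) {
    if (--r->count == 0) {     // caller decrements the unused result
        delete r;
        ++values_freed;
    }
}
```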
Function pointer
A function pointer is slim.
TODO: what to do with functions constructed at run time (trampolines)?
Delegate pointer
A delegate pointer is counted like other pointers.
TODO: should it be rebound during a shared ↔ unshared cast? Probably yes, as it's not required to be unique for the cast. The same applies to conversion to const.
Consuming pointers in generic code
TODO: how?
Garbage collector
The GC can be used to collect cycles. The client code can disable it only if it is certain there are no cycles.
Interfacing with C
C structs have slim pointers; in this respect they differ from D structs. Code that interfaces with C without declaring C structs as extern(C) is incorrect. The compiler should report an error if it detects fat pointers being passed to C code.
C function arguments are slim too.
TODO: what to do with the return value?
Interfacing with Objective-C
Objective-C reference counting can be sped up by custom reference counting. An Objective-C object gets chained counters initialized to {1,1}. This way it's possible to share a thread-safe object. NSObject's release is called only after all references to it are lost, so the majority of counting is again done with non-atomic arithmetic.
class NSObject : cocoa.id
{
    void release();
    alias opRCFree = NSObjectRCFree;
}

void NSObjectRCFree(@slim ReferenceCounter* counter, @slim NSObject obj)
{
    shared scounter = counter.sharedCounter;
    referenceCounterAllocator.deallocate(counter);
    // shared counter is 1 only if it's unique
    if (scounter.count > 1 && scounter.atomicDecrement() > 0) return;
    referenceCounterAllocator.deallocate(scounter);
    obj.release();
}
Interfacing with COM
IUnknown's reference counting can be sped up by custom reference counting. IUnknown gets chained counters initialized to {1,1}. This way it's possible to share a thread-safe COM server. IUnknown's Release is called only after all references to it are lost, so the majority of counting is again done with non-atomic arithmetic.
interface IUnknown
{
    HRESULT QueryInterface(const(IID)* riid, void** pvObject);
    ULONG AddRef();
    ULONG Release();
    alias opRCFree = IUnknownRCFree;
}

void IUnknownRCFree(@slim ReferenceCounter* counter, @slim IUnknown punk)
{
    shared scounter = counter.sharedCounter;
    referenceCounterAllocator.deallocate(counter);
    // shared counter is 1 only if it's unique
    if (scounter.count > 1 && scounter.atomicDecrement() > 0) return;
    referenceCounterAllocator.deallocate(scounter);
    punk.Release();
}
Variadic arguments
Custom reference counting
If a data structure needs custom reference counting, it declares an opRCFree alias to a function with signature void function(ReferenceCounter*, void*), which handles freeing.
extern void NodeRCFree(@slim ReferenceCounter* counter, @slim void* data);

interface INode
{
    INode parent();
    INode previousSibling();
    INode nextSibling();
    INode child();
    alias opRCFree = NodeRCFree;
}
Usage:
FatPointer!(INode) ptr;
// increment
ptr.__counter.count++;
// decrement
ptr.__counter.count--;
if (ptr.__counter.count == 0)
    INode.opRCFree(ptr.__counter, ptr.__data);
Cyclic data structures
Cyclic data structures can be designed for reliable reference counting.
See the (incomplete) example below. The cyclic data structure (graph) has a single reference counter shared among all nodes; the count equals the number of external references to the graph's nodes. The client code increments the counter as usual. When the counter reaches zero, the custom opRCFree is called, which frees the entire graph. This way the entire graph stays alive as long as there is a reference to any of its nodes, and the code can navigate the entire graph from any node. The nodes are owned by the graph; to split or merge graphs, ownership should be changed.
interface INode
{
    INode parent();
    INode previousSibling();
    INode nextSibling();
    INode child();
    void child(INode);
    alias opRCFree = NodeRCFree;
}

class Node : INode
{
    INode parent()
    {
        FatPointer!(INode) ptr;
        _counter.count++;
        ptr.__counter = _counter;
        ptr.__data = _parent;
        return ptr;
    }

    //INode previousSibling();
    //INode nextSibling();
    //INode child();

    alias opRCFree = NodeRCFree;

    Node newNode(string name)
    {
        @slim Node node = new Node(name);
        node._counter = _counter;
        node._parent = this;
        FatPointer!(Node) ptr;
        _counter.count++;
        ptr.__counter = _counter;
        ptr.__data = node;
        return ptr;
    }

    void child(INode node) // setter
    {
        Node newChild = cast(Node)node;
        enforce(newChild._counter == _counter, "node should belong to this graph");
        //TODO: how to assign? What to do with orphan child node?
    }

private:
    string _name; // fat pointers for non-cyclic data
    @slim ReferenceCounter* _counter;
    // slim pointers for cyclic data
    @slim Node _parent;
    @slim Node _previousSibling;
    @slim Node _nextSibling;
    @slim Node _child;

    void freeRecurse()
    {
        if (_previousSibling != null)
        {
            _previousSibling.freeRecurse();
            GC.free(_previousSibling);
        }
        if (_nextSibling != null)
        {
            _nextSibling.freeRecurse();
            GC.free(_nextSibling);
        }
        if (_child != null)
        {
            _child.freeRecurse();
            GC.free(_child);
        }
    }
}
void NodeRCFree(@slim ReferenceCounter* counter, @slim void* data)
{
    @slim Node root = cast(Node)data;
    while (root._parent != null)
        root = root._parent;
    root.freeRecurse(); // free entire graph
    GC.free(root);
    referenceCounterAllocator.deallocate(counter);
}