Difference between revisions of "DIP62"
Revision as of 16:09, 4 August 2014
Title: | Volatile type qualifier for unoptimizable variables in embedded programming |
---|---|
DIP: | 62 |
Version: | 1 |
Status: | Rejected |
Created: | 2014-06-01 |
Last Modified: | 2014-07-09 |
Author: | Johannes Pfau |
Links: | NG discussion |
Contents
- 1 Abstract
- 2 Prelude: Java/C# volatile vs. C/C++ volatile
- 3 Description
- 4 Rationale
- 5 Further reading
- 6 People involved
- 7 Copyright
Abstract
The volatile qualifier is often used for embedded programming in C/C++ [1]. At some point shared was meant to replace volatile, but as shared is slowly being defined it is becoming obvious that shared cannot replace volatile. D therefore currently has no similar mechanism, which means that programming for certain embedded systems has to rely on implementation-defined behavior [2] or is not possible at all. This DIP provides a clear specification of a volatile type qualifier based on the interpretation of the C/C++ volatile qualifier common in embedded programming.
Prelude: Java/C# volatile vs. C/C++ volatile
It is very important to realize that Java's volatile and C's volatile are completely different concepts [3]. In short: C/C++ volatile was meant to be used in embedded programming with unusual memory such as memory-mapped registers. Java/C# volatile provides more guarantees and is used for multi-threading (on the other hand, these additional guarantees ruin performance for the unusual-memory use case [4]). The equivalent of Java's volatile is shared in D and atomic<T> in C++. This DIP describes a volatile similar to C/C++ volatile, for unusual memory. Volatile and atomics are usually not used at the same time, but in some cases it is necessary to mix them [5].
Description
A new type qualifier, volatile, will be introduced.
Effects of volatile on code generation
Accesses to any variable marked as volatile may not be optimized in any way by the compiler. volatile affects compiler optimizations only: instruction scheduling by the CPU, atomic accesses and memory barriers are explicitly out of scope.
This document will provide detailed rules for code generation, but it's useful to understand the basic idea behind all these detailed rules:
- Accesses to volatile variables must always be executed as written literally in code because these accesses may have arbitrary side effects.
- Again, this only applies for the code generated by the compiler. If the CPU schedules instructions, etc, this is not the business of volatile!
- A compiler must not emit special read/write barriers or atomic instructions trying to enforce the actual execution on the CPU. (This is important! See Rationale)
For example, omitting accesses will omit unknown side effects which may affect program behavior. Reordering accesses may change the evaluation order of these side effects which may change program behavior. A more detailed description and examples are given in the Rationale.
Instruction scheduling / reordering
- Reads or writes of volatile variables may not be reordered with other reads or writes of the same or different volatile variables.
- Reads or writes of normal variables can be reordered with reads or writes of volatile variables.
- Existing rules for normal variables must of course still be obeyed for normal and volatile variables. (Not moving certain reads/writes across function calls etc)
volatile int x;
volatile int y;
void test()
{
x = 0;
y = 1;
}
//==> not allowed
void test()
{
y = 1;
x = 0;
}
volatile int x;
int y;
void test()
{
x = 0;
y = 1;
}
//==> allowed
void test()
{
y = 1;
x = 0;
}
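Because the qualifier in this DIP is only a proposal, the ordering rules can be tried out with C's volatile, on which the proposal is modelled. The following is a minimal sketch, not a definitive implementation: reg_x, reg_y and plain are hypothetical names, and ordinary globals stand in for memory-mapped registers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for two memory-mapped registers. */
static volatile uint32_t reg_x;
static volatile uint32_t reg_y;
/* An ordinary variable with no special access rules. */
static uint32_t plain;

uint32_t configure(void)
{
    /* The two volatile stores must be emitted in source order. */
    reg_x = 0;
    /* This ordinary store may be moved before or after the volatile ones. */
    plain = 7;
    reg_y = 1;

    /* Separate statements keep the order of the two volatile loads defined. */
    uint32_t x = reg_x;
    uint32_t y = reg_y;
    return x + y + plain;
}
```

The two loads are written as separate statements because C leaves the evaluation order of the operands of + unspecified, which would otherwise leave the order of the two volatile reads open.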
Caching reads
Caching or omitting reads is basically illegal. Every read may have a side effect and these effects must be executed.
volatile int x;
void test()
{
int y;
for(int i = 0; i < 5; i++)
y = x;
}
//==> not allowed
void test()
{
int y = x;
}
//==> allowed
//y is a normal variable. The side-effects of reading x must be always
//executed, but the result can be thrown away in such cases
void test()
{
for(int i = 0; i < 5; i++)
int __tmp = x;
}
Technically speaking there is one case where it is legal to cache the value, but the side effects must be executed nevertheless! In practice this case will be very rare and there's probably no benefit from caching the value, as it'll have to be re-read anyway to execute side effects.
//x is immutable as well! Reading x can have arbitrary side effects, but the value read
//is always the same
volatile immutable int x;
void test()
{
int y = x;
y = x;
y = x;
}
//==> allowed
void test()
{
    int y = x;
    int __tmp1 = x; //throw away result
    int __tmp2 = x; //throw away result
}
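The rule that every read must be performed, even when its result is discarded, matches what C compilers do for volatile objects. A small C sketch of the same idea follows; status_reg is a hypothetical register stand-in and the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical status register; a plain global stands in for hardware. */
static volatile uint32_t status_reg = 3;

/* Performs five loads of status_reg and keeps only the last value. */
uint32_t poll_five_times(void)
{
    uint32_t last = 0;
    for (int i = 0; i < 5; i++) {
        last = status_reg;  /* each iteration has to emit its own load */
    }
    return last;
}

/* The value is thrown away, but the load itself is still performed. */
void read_and_discard(void)
{
    (void)status_reg;
}
```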
Merging writes
Merging or omitting writes is illegal, as it changes the side effects.
volatile int x;
void test()
{
for(int i = 0; i < 5; i++)
x = i;
}
//==> not allowed
void test()
{
x = 4;
}
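A typical case where merging writes would be fatal is a transmit register that sends one byte per store. The C sketch below assumes a hypothetical register tx_reg (here just a global variable); on real hardware every store would push one byte onto the wire, so collapsing the loop into a single store of the last byte would lose data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transmit data register; a plain global stands in for hardware. */
static volatile uint8_t tx_reg;

/* Writes every byte of msg to tx_reg and returns the number of stores made. */
size_t transmit(const char *msg)
{
    size_t n = 0;
    while (msg[n] != '\0') {
        tx_reg = (uint8_t)msg[n];  /* one store per byte, none may be merged */
        n++;
    }
    return n;
}

/* Reads back the register; in this simulation it holds the last byte stored. */
uint8_t last_byte(void)
{
    return tx_reg;
}
```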
Transitivity
volatile is transitive just like shared or immutable.
Volatile on aggregate types
volatile on an aggregate type also applies to all members, similar to const. volatile also marks the this pointer as volatile, which means that only volatile methods can be called on volatile objects. The rules are:
- volatile methods can be called on volatile or nonvolatile objects
- nonvolatile methods can't be called on volatile objects
- It's possible to overload volatile and nonvolatile methods. The nonvolatile overload should be preferred when calling a function on a nonvolatile object.
Note the analogy to const.
struct Data
{
int x;
volatile int getX()
{
return x;
}
int getX2()
{
return x;
}
}
volatile(Data)* vdata;
Data* data;
void access()
{
vdata.x = 42; //vdata.x is volatile!
vdata.getX(); //valid
vdata.getX2(); //invalid
data.x = 42; //data.x is not volatile!
data.getX(); //valid
data.getX2(); //valid
}
Removing the volatile qualifier
Similar to const, the volatile qualifier can be removed when a value of a volatile type without indirections is copied. As the copy no longer refers to volatile memory, this is safe.
volatile int* x;
void readX()
{
    int value = *x; //value is a plain int copy, no longer volatile
}
Converting to / from volatile types
Wherever a volatile type is expected, a non-volatile object can be used as well. Volatile objects cannot be used where a non-volatile type is expected.
int readPTR(volatile int* ptr)
{
return *ptr;
}
int readPTR2(int* ptr)
{
return *ptr;
}
volatile int* x;
int* y;
void main()
{
readPTR(x); //valid
readPTR(y); //valid
readPTR2(x); //invalid
readPTR2(y); //valid
}
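C applies the same one-way conversion to pointers, so the rule can be checked with any C compiler. The following sketch uses hypothetical function names; it also shows that copying a value out of a volatile object yields an ordinary value.

```c
/* Accepts pointers to both volatile and non-volatile ints. */
static int read_any(volatile int *p)
{
    return *p;
}

/* Accepts only pointers to non-volatile ints. */
static int read_plain(int *p)
{
    return *p;
}

int conversion_demo(void)
{
    int normal = 5;
    volatile int special = 6;

    int a = read_any(&normal);   /* int* converts implicitly to volatile int* */
    int b = read_any(&special);  /* exact match */

    int copy = special;          /* the copy is an ordinary int */
    int c = read_plain(&copy);   /* fine: copy is not volatile */

    /* read_plain(&special) is rejected or diagnosed by C compilers,
       because the conversion would discard the volatile qualifier. */
    return a + b + c;
}
```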
Optional implementation recommendations
- Compilers should provide a switch to disable the code generation restrictions imposed by volatile. This means that by using a switch like '-foptimize-volatile' volatile is only used as a type qualifier but it does not affect code generation.
- CISC machines: CISC machines may allow other operations to memory than simple load/store operations. The question is then if a read-modify-write operation like 'x++' should translate to 'r = load x; r++; x = store r' or to a single add instruction 'add x;'. This DIP does not dictate any specific behaviour, but this is recommended: If it is known that instructions operating on memory do not work for memory mapped IO as expected, compilers should default to generating a load/modify/store sequence. Otherwise compilers should generate the same code sequence as for regular variables. It is recommended to provide a compiler switch on CISC architectures to allow the user to choose between these two different behaviours.
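The read-modify-write question can be illustrated in C, where the same ambiguity exists. In this sketch counter_reg is a hypothetical register stand-in. The first function leaves the instruction selection to the compiler, while the second spells out one load and one store in the source, which is the sequence the recommendation above calls load/modify/store.

```c
#include <stdint.h>

/* Hypothetical counter register; a plain global stands in for hardware. */
static volatile uint32_t counter_reg;

/* The compiler decides whether this becomes one memory-operand
   instruction or a separate load, add and store. */
void increment_compact(void)
{
    counter_reg++;
}

/* One volatile load and one volatile store, written out explicitly. */
void increment_explicit(void)
{
    uint32_t tmp = counter_reg;
    tmp += 1;
    counter_reg = tmp;
}

uint32_t read_counter(void)
{
    return counter_reg;
}
```

Both functions leave the same value in the register; they differ only in how much of the access pattern is fixed by the source code.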
Rationale
Motivation
Many of D's main features are not applicable to small embedded systems. Templates will be used sparingly as they cause code bloat. CTFE is often not necessary for simple (feedback) control systems. On the other hand, small embedded systems often do little more than read and write IO ports. This requires well-designed access to volatile memory. If D is worse or more cumbersome than C in this regard, it can't compete with C. [6] shows common volatile-related problems in C. We should solve most of them. This DIP and normal D syntax address problems 3, 4 and 5. Problems 6 and 8 are covered by shared, although documentation must make clear that shared must be used in these cases. Problems 1, 2 and 5 are user logic errors which can only be avoided by good documentation. Problems 5 and 8 need standardized fence/barrier functions.
volatile
History
At some point shared was meant to replace volatile. It should "imply volatile" [7] and additionally provide further guarantees. However, as explained in "Why volatile and shared can't be merged", this approach has serious issues. Further changes to shared, like disallowing RMW operations [8], make shared even less attractive for embedded programming and as a replacement for volatile. DIP17 [9] already tried to provide a replacement for volatile. Unlike this proposal, DIP17 proposed a volatile statement. This DIP makes some strong points why volatile should be a type qualifier and not a statement, see "Why a type qualifier". Some of the points DIP17 raised are now invalid with shared becoming well-defined, but the most important point still holds:
If ever implemented, it will result in memory fences and/or atomic operations, which is **not** what volatile memory operations are about. This will severely affect pipelining and performance in general.
DIP17 has also been discussed recently [10] as part of the preparations for this DIP. For reference: List of newsgroup discussions about volatile [11]
Why is volatile necessary
Volatility is a property of a memory location. For example, if you have a memory-mapped register at 0xABCD which represents the current time as a 32-bit uint, then every read from that memory will return a different result. The memory at address 0xABCD is volatile: it can change at any time and therefore does not behave like normal memory. The compiler must not optimize reads from this address, or any other access to 0xABCD.
For another example, consider 0xABCD to be the address of a memory-mapped 8-bit GPIO register. In this case the voltages on 8 IO pins of a microcontroller are directly controlled by writing to 0xABCD. If bit0 is set, the voltage on GPIO-PIN0 is high; if the bit is not set, the voltage is low. This time writing to 0xABCD has side effects. If we clear, set, and clear bit0 we'll output a low-high-low sequence on the IO pin. But if the compiler treats 0xABCD like normal memory, it could optimize clear, set, clear into a single clear and we only get a low signal on PIN0.
ubyte* GPIO1 = cast(ubyte*) 0xABCD;
void blinkLED()
{
    *GPIO1 = 0;
    *GPIO1 = 1;
    *GPIO1 = 0;
    //Output on PIN0: __|‾‾|__
}
//valid optimization without volatile:
//==>
void blinkLED()
{
    *GPIO1 = 0;
    //Output on PIN0: ________
}
For more examples how volatile is used in embedded programming, see [12].
Why a type qualifier
The example in the previous section should already suggest that volatility is always a property of the memory location, not of the memory access. All accesses to 0xABCD must consider the special accessing rules outlined in this DIP. D used to have volatile statements (deprecated in 2.013 [13]) which only marked a statement as volatile. There were also proposals to add peek / poke like primitives which allow reading/writing to/from a memory location avoiding compiler optimizations. Consider this example:
ubyte* GPIO1 = cast(ubyte*) 0xABCD;
void main()
{
blink(GPIO1); //OK
blink2(GPIO1); //OK
irandom(GPIO1); //wrong!
}
void blink(ubyte* memory)
{
volatile
{
*memory = 0;
*memory = 1;
*memory = 0;
}
}
void blink2(ubyte* memory)
{
poke(memory, 0);
poke(memory, 1);
poke(memory, 0);
}
//Write inverted random value to memory
void irandom(ubyte* memory)
{
*memory = genRand(); //store temporarily to memory. Caller will only see final result anyway
*memory = ~(*memory);
}
As you can see, without a type qualifier we can pass pointers to volatile memory to normal functions. This can lead to serious problems. In this artificial example the IO pins will first be set according to the value from genRand. Then they will be set to the inverted value. But there's more! We're also reading from 0xABCD in irandom, and we never defined what reading from 0xABCD does. Maybe it does not return the current pin output status, but some other value (e.g. whether the pin is connected). Then calling irandom will produce completely bogus results! Because of this we need to ensure that normal functions, which were not written with volatile memory in mind, cannot be called with volatile memory parameters. And we need to make sure that all accesses to 0xABCD respect the rules for volatile memory. The best way to achieve this is by adding a type qualifier, similar to immutable for read-only memory.
Unoptimizable variable vs. (memory) barrier
C's volatile qualifier is often deemed to be useless as it does not provide memory barriers or fences [14]. For example, the Linux kernel developers say "The use of volatile in kernel code is almost never correct" [15], and Java/C# added fences to their volatile implementation [16]. There have been discussions about adding visibility semantics to C as well [17]. As Herb Sutter points out, Java's and C#'s volatile have different purposes than C's volatile [18]. Linux developers consider volatile to be mostly useless because they use special memory barriers, locks and memory accessor functions instead, which provide the same functionality. Nevertheless they admit that volatile is useful in certain areas, and they actually use volatile to implement these accessor functions on some architectures [19].
This DIP describes volatile variables as unoptimizable variables. They are not even compiler barriers, which means we'll provide even fewer guarantees than some C implementations [20]. We should provide all kinds of barriers and fences (compiler, memory) as platform-agnostic functions similar to the functions the Linux kernel uses [21], but this is not part of this DIP.
So why are volatile variables still useful?
- They provide type-safety by introducing a new type qualifier. It's not possible to pass volatile variables to functions which can't deal with volatile variables
- Many embedded systems don't have out-of-order processors. For these architectures, preventing compiler optimizations is all that is needed. Other architectures provide out-of-order execution but are able to detect accesses to volatile memory by looking at the memory address and explicitly do not reorder these accesses without requiring any additional memory barriers [22]. Marking a variable as volatile is again good enough. Even worse: If the compiler added memory barriers on every access to a volatile variable on ARM this would actually hurt performance, as these barriers are often unnecessary.
- Even on systems where additional memory barriers are necessary, adding these automatically on every access could hurt performance. Consider this example: Here we have to make sure we start the timer after we have configured it. But it does not matter in which order STEP, START_VALUE and ALARM_VALUE are set. A compiler adding barriers automatically on every access to a volatile variable would reduce performance.
volatile(uint)* TIMERA_STEP;
volatile(uint)* TIMERA_START_VALUE;
volatile(uint)* TIMERA_ALARM_VALUE;
volatile(uint)* TIMERA_CONTROL;
void startTimer()
{
*TIMERA_STEP = 10;
*TIMERA_START_VALUE = 100;
*TIMERA_ALARM_VALUE = 1000;
barrier(); //Whatever barrier is necessary
*TIMERA_CONTROL = 1; //start timer
}
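The same pattern can be sketched in C11. The register block below is hypothetical and lives in ordinary memory, and atomic_thread_fence from <stdatomic.h> merely stands in for "whatever barrier is necessary": on real hardware the required instruction is architecture-specific and a C11 fence is not guaranteed to be sufficient for device memory.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical timer register block; a plain struct stands in for hardware. */
struct timer_regs {
    volatile uint32_t step;
    volatile uint32_t start_value;
    volatile uint32_t alarm_value;
    volatile uint32_t control;
};

void start_timer(struct timer_regs *t)
{
    /* The relative order of these three writes does not matter to the
       device, so no barrier is placed between them. */
    t->step = 10;
    t->start_value = 100;
    t->alarm_value = 1000;

    /* A single explicit fence before the one write that must come last.
       Placeholder for the platform's real barrier (assumption). */
    atomic_thread_fence(memory_order_seq_cst);

    t->control = 1;  /* start timer */
}
```

Placing the single fence by hand is the point of the example: only the programmer knows that the control write is the one that has to be ordered after the others.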
Volatile will prevent certain optimizations which may be valid on shared data. For example multiple atomic writes to shared variables can be merged [23], but this is not true for volatile variables.
volatile(uint)* vol;
shared(uint)* sha;
void access()
{
*vol = 0;
*vol = 1;
//--> invalid optimization: vol = 1;
atomicSet(*sha,0);
atomicSet(*sha,1);
//valid optimization --> atomicSet(*sha,1);
}
But the other way round is more problematic: accesses to volatile variables needn't be atomic. Reads and writes of volatile variables on embedded systems are usually atomic, as registers are usually word-sized, but this is not required. Volatile variables can span multiple registers, and reading or writing them might be non-atomic.
volatile(long)* GPIO1;
void access()
{
long a = *GPIO1;
//Even if reading occurs in two steps this is not a problem as 1 bit represents the state of an IO port
//Although first/last may be measured at a different time (e.g. interrupted by an interrupt) the results are still valid
}
Forcing all word-sized accesses through atomicSet would be quite annoying.
Effects of volatile on code generation
Reads or writes of volatile variables may not be reordered with other reads or writes of the same or different volatile variables.
As accessing a volatile variable can cause visible side-effects the order of these side effects must be preserved.
Reads or writes of normal variables can be reordered with reads or writes of volatile variables.
As already explained, volatile is not a barrier, not even a compiler barrier.
Transitivity
If data is in volatile memory and it refers to other memory (e.g. by using a pointer) it is very likely that the referred memory will also be volatile. The reasoning is basically the same as for shared: if an interrupt routine (or hardware side effects) can access the root object, it can also access all referenced objects. And everything an interrupt routine can access is potentially volatile.
struct Data
{
int x;
}
volatile(Data*) data;
void interrupt()
{
data.x = 42;
}
void func1()
{
while(data.x != 42)
{}
}
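The interrupt-and-poll pattern above has a close analogue in hosted C, where a signal handler plays the role of the interrupt routine. This is only a sketch under that assumption: the names are made up, raise() delivers the signal synchronously here, and C's volatile is not transitive, so the example shows the polling aspect only.

```c
#include <signal.h>

/* Flag shared between the handler and the polling loop. */
static volatile sig_atomic_t ready = 0;

/* Plays the role of the interrupt routine. */
static void on_signal(int signo)
{
    (void)signo;
    ready = 42;
}

/* Installs the handler, triggers the "interrupt" and polls the flag. */
int wait_for_event(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR) {
        return -1;
    }

    raise(SIGINT);  /* the handler has run by the time raise() returns */

    /* Without volatile the compiler could hoist the load out of the loop. */
    while (ready != 42) {
    }

    signal(SIGINT, SIG_DFL);  /* restore the default disposition */
    return (int)ready;
}
```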
Volatile on aggregate types
As volatile essentially marks a memory location it's clear that it has to apply to aggregate members as well: As these members are in the same volatile memory location accesses to them can't be optimized.
As member functions can manipulate data via the this pointer the this pointer needs to be qualified for volatile objects as well. We then follow the same rules used for const/immutable for consistency.
volatile methods can be called on volatile or nonvolatile objects
Calling volatile methods on volatile objects should of course work. The compiler makes sure accesses to the struct data via this respect the rules for volatile data, so there is no problem. Calling volatile methods on non-volatile objects is also valid. This may prevent some valid optimizations, but it's not a code-correctness problem.
It's possible to overload volatile and nonvolatile methods. The nonvolatile overload should be preferred when calling a function on a nonvolatile object.
Some functions may need to deal with volatile or nonvolatile data. Although the volatile version should work for both cases allowing a special version for non-volatile data can allow for some performance optimizations. As the non-volatile version might provide better performance it should be preferred for non-volatile data.
nonvolatile methods can't be called on volatile objects
As already explained nonvolatile methods can optimize access to their this pointer or member variables which may be invalid for volatile data.
Removing the volatile qualifier
The rationale is already given in the Description ("Removing the volatile qualifier"). Such situations occur often in embedded programming when a register value is read. Register values usually do not have indirections, and the resulting value should not impose any restrictions related to volatile.
Converting to / from volatile types
Everywhere where a volatile type is expected, a non-volatile object can be used as well.
The volatile qualifier only imposes more restrictions on variable access, so it can't break accessing normal variables. Although performance might be slightly reduced the code is valid.
Volatile objects can not be used where a non-volatile type is expected.
Accessing volatile objects has to follow special rules which may not be followed by a function expecting normal variables.
Volatile stack variables
Another special case that should be discussed is volatile stack variables. Although they seem mostly useless [24], there's actually one real-world case where qualifying a stack variable as volatile is necessary:
struct Data
{
int x;
}
volatile(int)* data;
void func1()
{
volatile int d;
data = &d; //Escaping the address of a local variable (e.g. to an interrupt)
while(d != 42)
{}
}
Optional implementation recommendations
- On some architectures preventing the backend from doing certain optimizations may not be enough and users will always use memory barriers anyway. Or users might want to restrict accesses to volatile memory to a small part of the code and use compiler barriers explicitly. In such cases there's no need to prevent optimizations on volatile variables, as the user will use other means to tell the compiler to avoid these optimizations. '-foptimize-volatile' then allows the compiler to produce slightly more optimized code. The volatile type qualifier is still useful, as it makes sure that addresses of volatile memory are only passed to functions which explicitly state they can handle volatile memory.
- CISC machines are not the main target for volatile, most small embedded processors are RISC machines. Nevertheless volatile should work well on CISC machines. The suggested rules should be the most intuitive for users and in most cases 'just work'. Providing a switch allows users to explicitly use the desired behaviour if they know what they're doing.
Further reading
References
- [1] barrgroup.com: C Volatile Keyword
- [2] GDC provides some non-standard extensions to shared
- [3] Herb Sutter: volatile vs. volatile
- [4] Hans Boehm: Should volatile Acquire Atomicity and Thread Visibility Semantics? (C)
- [5] stackoverflow.com: Why is the volatile qualifier used through out std::atomic?
- [6] regehr.org: Nine ways to break your systems code using volatile (C)
- [7] Walter Bright: shared should imply volatile (D)
- [8] github.com: Read-modify-write operations should not be allowed for shared variables (D)
- [9] wiki.dlang.org: DIP17 (D)
- [10] forum.dlang.org: GDC discussion leading to this DIP
- [11] wiki.dlang.org: NG discussions about volatile
- [12] mcuoneclipse.com: volatile can be harmful (C)
- [13] dlang.org: volatile deprecation
- [14] gcc.gnu.org: GCC volatile variables
- [15] kernel.org: volatile considered harmful
- [16] albahari.com: The volatile keyword (C#)
- [17] open-std.org: Should volatile Acquire Atomicity and Thread Visibility Semantics? (C)
- [18] Herb Sutter: volatile vs. volatile
- [19] kernel.org: volatile considered harmful
- [20] stackoverflow.com: volatile qualifier and compiler reorderings
- [21] kernel.org: memory barriers
- [22] infocenter.arm.com: ARM Cortex-M Programming Guide to Memory Barrier Instructions, Chapter 2, Memory Type and Memory Ordering (C)
- [23] stackoverflow.com: Combining loads/stores of consecutive atomic variables (C++)
- [24] stackoverflow.com: Volatile local variables (C)
Barriers, fences
This DIP explicitly doesn't deal with memory or 'compiler' barriers as differences between architectures and different use cases for volatile variables make it impractical to implement memory barriers for volatile variables automatically. Nevertheless readers of this DIP might want to read up on memory/compiler barriers to understand why they are usually not necessary for embedded programming and if they're necessary why a compiler can't automatically insert them in an efficient way for something as low-level as embedded programming. As an appetizer, consider this example:
size_t* TIMER_VALUE = cast(size_t*) 0xABCD;
size_t* TIMER_MODE = cast(size_t*) 0xABCE;
size_t* TIMER_CONTROL = cast(size_t*) 0xABCF;
void startCountdown()
{
    //We assume that the CPU can reorder read/write operations as it wants. For simplicity we assume a non-optimizing compiler, i.e. no compiler-reordering.
    *TIMER_VALUE = 10;
    *TIMER_MODE = TIMER_MODE_COUNTDOWN;
    *TIMER_CONTROL |= TIMER_START_BIT;
}
//Now the write to TIMER_CONTROL could occur before the write to TIMER_VALUE. We start a countdown without actually setting the countdown value first...
//So we need to add a barrier before TIMER_CONTROL access. However, note that the TIMER_VALUE/TIMER_MODE order does not matter.
//A compiler can't figure out when barriers are necessary. Adding memory barriers for all (volatile) variable accesses would be very wasteful,
//even on architectures which need memory barriers.
- Introduction to lock free programming
- Memory ordering at compile time
- Memory barriers are like source control operations
- Acquire and Release Semantics
- Weak vs strong memory models
- This is why they call it a weakly ordered cpu
- Atomic vs non-atomic operations
- The happens-before relation
- The synchronizes-with relation
- Acquire and release fences
- Acquire and release fences don't work the way you expect
People involved
Thanks to Iain Buclaw, Mike Franklin, Martin Nowak, Timo Sintonen, and Marc Schütz for early feedback and input for this DIP. This DIP has been discussed on the GDC bugtracker / in this forum post as well: http://forum.dlang.org/post/mailman.1081.1400818840.2907.d.gnu@puremagic.com
Copyright
This document has been placed in the Public Domain.