https://wiki.dlang.org/api.php?action=feedcontributions&user=Berni44&feedformat=atomD Wiki - User contributions [en]2022-11-26T23:07:52ZUser contributionsMediaWiki 1.31.2https://wiki.dlang.org/?title=User:Berni44/RealProblems&diff=9970User:Berni44/RealProblems2021-03-08T16:46:05Z<p>Berni44: Created page with "==Real Problems== Most people agree, that floating point numbers are weird. At least somewhat. D also features these numbers. It has got three floating point types: <code>fl..."</p>
<hr />
<div>==Real Problems==<br />
<br />
Most people agree that floating point numbers are weird. At least somewhat.<br />
<br />
D also features these numbers. It has got three floating point types: <code>float</code> (32 bit), <code>double</code> (64 bit) and <code>real</code> (the largest floating point size available). While the first two are well known, the size of <code>real</code> may differ between computers (and maybe even between compilers).<br />
<br />
This in itself is not a problem. If your code needs to produce identical results on different computers, you just stick to <code>double</code> or <code>float</code>. If the code only targets one computer, you may use <code>real</code> for greater precision.<br />
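The difference between the three types can be inspected directly with their built-in properties. A minimal sketch (the values printed for <code>real</code> depend on your platform):<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
    // .sizeof and .mant_dig show how the three types differ;<br />
    // the output for real is platform-dependent.<br />
    writefln("float:  %s bytes, %s mantissa bits", float.sizeof, float.mant_dig);<br />
    writefln("double: %s bytes, %s mantissa bits", double.sizeof, double.mant_dig);<br />
    writefln("real:   %s bytes, %s mantissa bits", real.sizeof, real.mant_dig);<br />
}<br />
</syntaxhighlight><br />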
<br />
Unfortunately D has got another feature...</div>Berni44https://wiki.dlang.org/?title=Guidelines_for_maintainers&diff=9967Guidelines for maintainers2021-02-24T14:20:09Z<p>Berni44: Warn about [SQUASH]</p>
<hr />
<div>This document is intended to be a guide maintainer of <code>dlang</code> repositories on GitHub. However, it might contain interesting details for contributors who are interested in knowing more about the inner workings of the D GitHub processes.<br />
<br />
== Quick checklist ==<br />
<br />
* Do all CIs pass?<br />
* Are there any changes to the documentation or coverage output?<br />
* Is it a regression fix? (check: it should be based on <code>stable</code>)<br />
* Is it a non-trivial change? (check: either a link to Bugzilla or a changelog entry)<br />
* Does it add new symbols? (check: have they been pre-approved by Andrei?)<br />
<br />
This is an initial list. Please help to extend it based on your own experiences.<br />
<br />
== Review workflow (squashed commits & write access to PRs) ==<br />
<br />
The ideal workflow is that a PR gets commits '''appended''' until its final approval, so that you only need to review the added changes.<br />
<br />
=== General ideas ===<br />
<br />
* PRs should only contain a small set of changes<br />
* Contributors can prefix their appended commit messages with e.g. "[SQUASH]" (Please note that this will squash ''all'' commits of a PR, not only the marked ones. Additionally, it is not a squash but a fixup, which means that the commit messages get lost.)<br />
<br />
GitHub has two features to help us here:<br />
<br />
=== Commit squashing ===<br />
<br />
* All commits get squashed into one commit before the merge<br />
* This is enabled for all DLang repos<br />
* "auto-merge-squash" does squashing as auto-merge behavior<br />
<br />
For more information please see the [https://github.com/blog/2141-squash-your-commits official article].<br />
<br />
=== Write access to PRs ===<br />
<br />
* This is an '''awesome''' feature that hasn't been used much so far<br />
* It allows maintainers to do those nitpicks themselves (squashing all commits, fixing typos, ...) instead of going through the usual ping-pong cycle<br />
* It's enabled by default for new PRs<br />
* If someone accidentally turned it off, it's okay to ask them to re-enable it, as this is a massive time saver<br />
<br />
<code>git</code> allows you to define aliases in your <code>~/.gitconfig</code>:<br />
<br />
<pre><br />
[alias]<br />
pr = "!f() { git fetch -fu ${2:-upstream} refs/pull/$1/head:pr/$1 && git checkout pr/$1; }; f"<br />
</pre><br />
<br />
With this <code>git alias</code> you can check out any PR:<br />
<br />
<pre><br />
> git pr 5150<br />
</pre><br />
<br />
In case you don't want to enter the branch to push to, you can use this small snippet:<br />
<br />
<syntaxhighlight lang='bash'><br />
#!/bin/bash<br />
tmpfile=$(mktemp)<br />
repoSlug=$(git remote -v | grep '^upstream' | head -n1 | perl -lne 's/github.com:?\/?(.*)\/([^.]*)([.]git| )// or next; print $1,"/",$2')<br />
prNumber=$(git rev-parse --abbrev-ref HEAD | cut -d/ -f 2)<br />
curl -s https://api.github.com/repos/${repoSlug}/pulls/${prNumber} > $tmpfile<br />
trap "{ rm -f $tmpfile; }" EXIT<br />
headRef=$(cat $tmpfile | jq -r '.head.ref')<br />
headSlug=$(cat $tmpfile | jq -r '.head.repo.full_name')<br />
git push -f git@github.com:${headSlug} HEAD:${headRef}<br />
</syntaxhighlight><br />
<br />
For more details on pushing changes to a PR, please see [https://help.github.com/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork the official article].<br />
<br />
== Auto-Merge ==<br />
<br />
There are "auto-merge" and "auto-merge-squash" labels, which are based on an auto-merge system that takes the status of all required CIs into account (the auto-tester tries the merge after its test passed, but doesn't look at other CIs).<br />
You can toggle a PR for auto-merge by simply adding this label, or for keyboard enthusiasts: press "l", press "a" and hit enter.<br />
<br />
Warning: this new auto-merge system is officially still WIP because:<br />
* "auto-merge"-labelled PRs [https://github.com/dlang-bots/dlang-bot/pull/50 aren't yet prioritized on the auto-tester]<br />
* We would like to set ''all'' CIs to enforced<br />
<br />
For more details, please see [https://github.com/dlang-bots/dlang-bot#auto-merge-wip its documentation].<br />
<br />
== CI ==<br />
<br />
* We are working on making the CIs more reliable. If you see a transient error, please let us know!<br />
* In any other case, the "red cross" on a CI has a meaning &mdash; surprise!<br />
<br />
Here's a small summary of the CIs and their tasks:<br />
<br />
=== auto-tester ===<br />
<br />
* runs the auto-tester, which automatically compiles and runs the dmd, druntime, and phobos unittests on all officially supported platforms<br />
* if there are sporadic failures, they can be deprecated by clicking on the "Deprecate" button of a PR<br />
* new contributors need to be approved (this can be done on the [https://auto-tester.puremagic.com main page] of the auto-tester)<br />
* Maintainer: [https://github.com/braddr @braddr]<br />
* [https://github.com/braddr/d-tester/issues Support repo]<br />
<br />
=== DAutoTest ===<br />
<br />
It builds the entire documentation and allows you to preview changes.<br />
In particular:<br />
* <code>/library-prerelease</code> is the DDox build of the documentation (one file per function)<br />
* <code>/phobos-prerelease</code> is the DDoc build of the documentation (one file per module)<br />
* for PRs to <code>master</code>: only <code>-prerelease</code> should appear in the list of changes (the documentation on <code>/library</code> or <code>/phobos</code> is built from the <code>stable</code> branch)<br />
* Maintainer: [https://github.com/CyberShadow @CyberShadow]<br />
<br />
<br />
Tip: install [https://chrome.google.com/webstore/detail/git-patch-viewer/hkoggakcdopbgnaeeidcmopfekipkleg Git Patch Viewer] and set it to automatically recognize DAutoTest's diffs:<br />
<br />
<syntaxhighlight lang=bash><br />
dtest.dlang.io/diff/.*<br />
</syntaxhighlight><br />
<br />
[[File:DAutoTest Git Patch Viewer.png|1200px]]<br />
<br />
=== ProjectTester ===<br />
<br />
* A couple of selected projects are run to ensure that no regressions are introduced<br />
* Maintainers: [https://github.com/MartinNowak @MartinNowak], [https://github.com/Dicebot Dicebot]<br />
<br />
=== CircleCi ===<br />
<br />
* Tests are run with code coverage enabled<br />
* Phobos only: Over the last weeks, we have tried to unify Phobos toward a more consistent coding style. The CI is in place to preserve this status quo in the future<br />
* Phobos only: All unittest blocks are separated from the main file and compiled separately. This helps to ensure that the examples on dlang.org are runnable and don't miss any imports.<br />
* Phobos only: runs [https://github.com/Hackerpilot/dscanner Dscanner]<br />
* It is [https://github.com/Dicebot/dlangci/issues/18 planned] to move these checks to the ProjectTester<br />
<br />
It should be fairly trivial to find the relevant error here. Just click on the "CircleCi" link, open the red tab that is marked as failing, and scroll down to the error message. On a POSIX system the CircleCi tests can also be executed locally with the <code>style</code> target:<br />
<br />
<pre><br />
make -f posix.mak style<br />
</pre><br />
<br />
== Code coverage ==<br />
<br />
If you are too lazy to click on the annotated coverage link, you can install the [https://github.com/codecov/browser-extension browser extension] which will enrich the PR with code coverage information.<br />
<br />
Warning: unfortunately <code>codecov/project</code> often shows "random" movement. Please see [https://github.com/dlang/phobos/pull/5202 #5202] for more information on this.<br />
<br />
== Phantom Zone ==<br />
<br />
[http://forum.dlang.org/post/ouuutodvhmnghzbeoqen@forum.dlang.org Phantom Zone] is a state assigned to PRs that have value, but would require too much effort from the maintainers to revive without providing enough benefit.<br />
When sending a PR to the "Phantom Zone", please consider the following:<br />
<br />
# Before sending a PR to the Phantom Zone make an effort to invest in the contribution and the contributor.<br />
# Justify why the PR is being placed in the Phantom Zone<br />
# Explain to the contributor how the PR can get out of the Phantom Zone<br />
<br />
== Saved replies ==<br />
<br />
GitHub allows you to [https://github.com/blog/2135-saved-replies save replies]. You can set them in [https://github.com/settings/replies your GitHub settings].<br />
Here are a couple of commonly used replies:<br />
<br />
=== Missing changelog ===<br />
<br />
<pre><br />
This still lacks a changelog entry. Please see [the changelog folder](../tree/master/changelog) for instructions.<br />
Hence, I added the "pending changelog" label.<br />
</pre><br />
<br />
=== Missing spec PR ===<br />
<br />
<pre><br />
This still needs a PR to the [specification](https://github.com/dlang/dlang.org/tree/master/spec) at [dlang.org](https://github.com/dlang/dlang.org). Hence, I added the label "missing spec PR".<br />
<br />
Please refer to the [dlang.org CONTRIBUTING guide](https://github.com/dlang/dlang.org/blob/master/CONTRIBUTING.md) for instructions to build dlang.org locally. If you use Windows, don't worry, you can do your changes "blindly" and preview them at DAutoTest.<br />
</pre><br />
<br />
=== Phantom Zone ===<br />
<br />
<pre><br />
This PR entered the Phantom Zone<br />
-----------------------------------------------<br />
<br />
<br />
This PR has entered the [Phantom Zone](http://forum.dlang.org/post/ouuutodvhmnghzbeoqen@forum.dlang.org) as it still needs to have the reviewers' concerns addressed and rebased.<br />
<br />
Reason for entering the Phantom Zone<br />
----------------------------------------------------<br />
<br />
This PR is nice, and normally I would revive such a PR if the author was no longer active. I would also revive it if it were an important bug fix or something of higher priority. This PR, however, is just a refactoring, so I'm going to put it in the [Phantom Zone](http://forum.dlang.org/post/ouuutodvhmnghzbeoqen@forum.dlang.org) and close it for now.<br />
<br />
How do I get this PR out of the Phantom Zone<br />
-------------------------------------------------------------<br />
<br />
Easy: Address the comments -> open a new PR (mention this one + short summary in the description).<br />
</pre><br />
<br />
== Milestones ==<br />
<br />
* They are intended to show which PRs are basically ready to be shipped OR should be shipped soon<br />
* Please use them whenever you see nothing blocking a PR (except for a final merge decision)<br />
* Ideally about one week before the close of the merge window (i.e. the end of the milestone), the focus on the remaining items of the current milestone should be increased<br />
<br />
So far this has worked well:<br />
<br />
* [https://github.com/dlang/phobos/milestone/9?closed=1 2.072]<br />
* [https://github.com/dlang/phobos/milestone/11?closed=1 2.073]<br />
* [https://github.com/dlang/phobos/milestone/12 2.074]<br />
<br />
See also the [https://wiki.dlang.org/DIP75 Release process].<br />
<br />
== Dlang-Bot ==<br />
<br />
The friendly [https://github.com/dlang-bots/dlang-bot Dlang-Bot] is trying to automate boring tasks.<br />
<br />
What it does at the moment:<br />
* Shows whether a PR will be part of the changelog ('X' means NO, '✔' means YES)<br />
* Auto-merges a PR once all required CI pass<br />
* Closes a Bugzilla issue if the respective PR has been merged<br />
* Moves a Trello card if the respective PR has been merged<br />
* Cancels stale Travis builds (this helps to free the Travis queue at dlang/dmd)<br />
<br />
What is planned for the future:<br />
* Automatically remove "needs work" / "needs rebase" on a push event<br />
* Recognize common labels in the title (e.g. "[WIP]")<br />
* Automatically tag inactive and unmergeable PRs<br />
* Add a "needs review" label to unreviewed PRs with passing CIs<br />
* Show auto-detectable warnings (e.g. regression PR that isn't targeted at 'stable')<br />
* [https://github.com/dlang-bots/dlang-bot/issues <add your issue to the wish list>]<br />
<br />
Please see also [https://github.com/dlang-bots/dlang-bot its documentation for a more up-to-date list].<br />
<br />
== Changelog ==<br />
<br />
There's a changelog folder for all three repos (DMD, Druntime, Phobos).<br />
<br />
* The key idea is to have a file per changelog entry, which has the advantage that a changelog can be added to a PR ''without'' creating merge conflicts.<br />
* Take advantage of this feature and don't merge a PR without a changelog entry ;-)<br />
* The separation in the DMD repo between "compiler changes" and "language changes" has been removed: the changelog folder at DMD now contains only compiler changes, while the [https://github.com/dlang/dlang.org/tree/master/language-changelog changelog at dlang.org] is supposed to contain the language changes of the upcoming release.<br />
<br />
<br />
== Use the formal "Approve" button ==<br />
<br />
* GH formalized the review process<br />
* Please use the "Approve" feature instead of an LGTM comment<br />
<br />
This is important because all PRs require an approval!<br />
So please approve before you auto-merge (it won't work otherwise).<br />
<br />
In the same way, GH also allows you to attach a "request for changes" to a PR.<br />
If you have a serious remark, please use this "request changes" feature instead of a plain comment, as these requests will be nicely shown as warnings at the end of a PR (GH will even block the merge of a PR until the criticized bits are fixed). Moreover, "changes requested" will also be shown in the summary of all PRs and helps others when they browse the PR list.<br />
<br />
== New name additions to Phobos ==<br />
<br />
* All new symbols in Phobos should be pre-approved by Andrei (<code>@andralex</code>). Please request a review from him.<br />
* As Andrei gets quite a load of emails, you should email him directly.<br />
<br />
== See Also ==<br />
<br />
* [[Starting as a Contributor]]<br />
* [[Contributing to Phobos]]<br />
* [[Contributing to dlang.org]]<br />
* [[Get involved]]<br />
<br />
[[Category: Contribution Guidelines]]</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9961User:Berni44/Floatingpoint2021-02-15T09:59:04Z<p>Berni44: /* Back to our example from the beginning */</p>
<hr />
<div>= An introduction to floating point numbers =<br />
<br />
You probably already know that strange things can happen when using floating point numbers.<br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
If we want to understand this strange behavior, we have to take a deeper look into floating point numbers. Doing so involves understanding the bit representation of the numbers involved. Unfortunately, floats already have 32 bits (the code for <code>a</code> above is <code>01000100011110100000000000000000</code>), and with that many 0s and 1s it can easily happen that one can't see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, which I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: a sign bit, which is always exactly one bit, an exponent and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example, <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: for example, the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign bit (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This already looks a little bit like our number <code>1 100 01</code>.<br />
<br />
What we still need to know is how the parts of that number are decoded. Let's start with the sign bit, which is easy: a <code>0</code> is <code>+</code> and a <code>1</code> is <code>&minus;</code>. We now know that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary representation of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so-called ''bias''. The bias can be calculated from the number of bits of the exponent: if ''r'' is the number of bits of the exponent, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here we've got ''r''=3, and therefore the bias is 2^^2&minus;1=3. Finally we get our exponent: it's 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above that the mantissa was <code>2.9979</code>. Note that in scientific notation there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Now, floating point numbers use binary code instead of decimal code. This implies that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to store this <code>1</code> in our number, therefore it's omitted. Adding it back to our mantissa, we've got <code>1.01</code>, which is a binary number. As a decimal number it is <code>1.25</code>. (I assume you are familiar with the conversion from binary numbers to decimal numbers. If not, take a look at the [https://en.wikipedia.org/wiki/Binary_number#Conversion_to_and_from_other_numeral_systems Wikipedia article].)<br />
<br />
Putting it all together we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
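The decoding steps above can be sketched in D. This is only an illustration &mdash; <code>decodeNano</code> and its bit layout are made up for this article:<br />
<br />
<syntaxhighlight lang=D><br />
import std.math;<br />
import std.stdio;<br />
<br />
// Decode a normalized 6-bit nano float: 1 sign bit, 3 exponent bits, 2 mantissa bits.<br />
double decodeNano(uint pattern)<br />
{<br />
    uint sign = (pattern >> 5) & 1;<br />
    uint expField = (pattern >> 2) & 0b111;  // raw exponent<br />
    uint mantField = pattern & 0b11;         // raw mantissa<br />
    enum bias = 2 ^^ (3 - 1) - 1;            // r = 3, so the bias is 3<br />
    double mantissa = 1.0 + mantField / 4.0; // add the implied 1 bit<br />
    double value = mantissa * 2.0 ^^ (cast(int) expField - bias);<br />
    return sign ? -value : value;<br />
}<br />
<br />
void main()<br />
{<br />
    writeln(decodeNano(0b1_100_01)); // -2.5, as derived above<br />
}<br />
</syntaxhighlight><br />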
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend doing them &mdash; you'll acquire a much better feeling for floating point numbers when you work them out on your own instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 bit patterns of nano floats in a table and calculate the value represented by each pattern:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
== Zero and denormalized numbers ==<br />
<br />
The table from the exercise above can be visualized on a number line &mdash; every number a nano float can hold is marked by a vertical stroke:<br />
<br />
[[File:Nano float1.png|600px]]<br />
<br />
One clearly sees that toward the outside the numbers are sparse, and that their density increases while approaching 0 from both sides. But then they suddenly stop and leave a gap at zero. Let's zoom in:<br />
<br />
[[File:Nano float2.png|600px]]<br />
<br />
Now we can clearly see the gap: there is no 0. The reason for this is that 0 is the only number where the integral bit of the mantissa would have to be 0, because there is no 1 bit available.<br />
<br />
To get around this, numbers with exponent <code>000</code>, which are called ''denormalized numbers'' or ''subnormal numbers'', are treated specially: these numbers are considered to have an integral 0 bit implied in the mantissa (instead of the usual 1 bit) and the exponent is increased by one (to get a uniform distribution of the numbers in the vicinity of 0, see the diagrams below). So <code>1 000 10</code> would be decoded as <code>&minus; 0.5 * 2^^(&minus;2) = &minus;0.125</code>. <br />
<br />
And when the mantissa is <code>00</code> we've got our zero: <code>0 000 00 = +0</code>. Unfortunately, there is a second zero: <code>1 000 00 = &minus;0</code>.<br />
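The denormal rule can be checked with a small sketch, repeating the hand calculation for <code>1 000 10</code>:<br />
<br />
<syntaxhighlight lang=D><br />
import std.math;<br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
    // 1 000 10: sign = 1, exponent field = 000 (denormalized), mantissa field = 10.<br />
    // The implied integral bit is 0, and the exponent is bumped from -3 to -2.<br />
    double mantissa = 0.0 + 0b10 / 4.0;  // 0.5<br />
    double value = mantissa * 2.0 ^^ -2; // 0.5 * 0.25 = 0.125<br />
    writeln(-value);                     // -0.125, matching the text<br />
}<br />
</syntaxhighlight><br />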
<br />
== Infinity and Not a Number ==<br />
<br />
There is another exponent that is treated specially: <code>111</code>. This time, if the mantissa is <code>00</code>, the value is considered to be infinity, and if it is <code>11</code>, it denotes a value that is not a number, a so-called ''NaN''. Other values of the mantissa are also considered to be NaNs, but with some special meanings attached, which is beyond the scope of this article. And yes, there are also the minus versions of all of these NaNs.<br />
<br />
With that, our number line looks like this:<br />
<br />
[[File:Nano float3.png|600px]]<br />
<br />
And the zoomed in version looks like this:<br />
<br />
[[File:Nano float4.png|600px]]<br />
<br />
It can now clearly be seen that the center is equispaced and the outsides have been truncated somewhat, with the benefit of having infinity and NaN (which obviously cannot be displayed on that number line).<br />
<br />
=== Exercise ===<br />
<br />
''Exercise 2: Add a third column to the table from exercise 1 and write the special values in that column.''<br />
<br />
== Back to our example from the beginning ==<br />
<br />
Now that we know what floating point numbers look like, we can go back to the example given at the beginning of this article. We start with a slightly changed version of the example, using nano floats this time. 1000 cannot be represented as a nano float; it would be infinity. But with 1.75 and nano floats we can get an effect similar to the one with 1000 and floats:<br />
<br />
<syntaxhighlight lang=D><br />
nanofloat a = 1.75;<br />
nanofloat b = 1/a;<br />
nanofloat c = 1/b;<br />
</syntaxhighlight><br />
<br />
This time we can use our tables from the exercises to look up the results:<br />
<br />
<code>1/1.75 = 0.57142857...</code>, which cannot be coded exactly with a nano float. We have to choose between <code>0.5</code> and <code>0.625</code> as an approximation. Normally, floating point units are supposed to round to the nearest representable value in such cases. (There are other rounding modes, but I do not want to go deeper into this here.) Here <code>0.625</code> is closer to <code>0.57142857...</code> than <code>0.5</code>; that is, we finally arrive at <code>b = 0.625</code>. <br />
<br />
<code>1/0.625 = 1.6</code>. Again a value that cannot be coded exactly as a nano float. We've got <code>1.5</code> and <code>1.75</code> as a choice. <code>1.5</code> is closer to <code>1.6</code> than <code>1.75</code>, and therefore <code>c = 1.5</code>. We end up with <code>1.75 == a != c == 1.5</code>.<br />
<br />
For the sake of understanding the real floating point numbers, we repeat this with 1000 and <code>float</code>: floats have an 8-bit exponent (and therefore a bias of 127) and a 23-bit mantissa (plus one implied bit).<br />
<br />
1000 can be exactly represented with a float: <code>a = 0 10001000 11110100000000000000000</code>. The exponent is 136, reduced by the bias we arrive at 9. The mantissa is <code>1.111101</code>, which is 1.953125 in decimal notation. And <code>1.953125 * 2^^9 = 1000</code>.<br />
<br />
The reciprocal of 1000 is 0.001. This is a number that cannot be represented exactly with a float. The best approximation is <code>b = 0 01110101 00000110001001001101111</code>, which is <code>0.001000000047497451305389404296875</code> in decimal notation.<br />
<br />
The reciprocal of that last number is <code>999.999952502550950618369056343...</code>, which can neither be represented with a finite number of fractional digits in decimal notation, nor exactly with a float. This time <code>c = 0 10001000 11110011111111111111111</code> is the best approximation, which is <code>999.99993896484375</code> in decimal notation.<br />
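These stored values can be made visible in D by printing more significant digits and the raw bit pattern (a sketch; the precision <code>%.12g</code> is just chosen large enough to show the difference):<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
    float a = 1000;<br />
    float b = 1 / a;<br />
    float c = 1 / b;<br />
    writefln("%.12g", b);               // ~0.001000000047, not exactly 0.001<br />
    writefln("%.12g", c);               // ~999.999938965, not exactly 1000<br />
    writefln("%032b", *cast(uint*) &c); // the 32 bits of c<br />
}<br />
</syntaxhighlight><br />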
<br />
Now it should be clear, why the program above answers the question, whether <code>a</code> and <code>c</code> are the same with "No!": They are not equal.<br />
<br />
== Printing floating point numbers ==<br />
<br />
But there is still one open question: if <code>a</code> and <code>c</code> are two different numbers, why does the program print <code>1000 == 1000</code>? Shouldn't it print two different numbers?<br />
<br />
To find an answer to this question, we have to peek into the inner workings of <code>writeln</code>. In <code>std.stdio</code> we find the following code for writing floats:<br />
<br />
<syntaxhighlight lang=D><br />
import std.format : formattedWrite;<br />
<br />
// Most general case<br />
formattedWrite(w, "%s", arg);<br />
</syntaxhighlight><br />
<br />
That means <code>writeln</code> just forwards the work to <code>formattedWrite</code> in <code>std.format</code> with the format specifier <code>%s</code>. Peeking into <code>std.format</code> we can see that <code>%s</code> is just treated like <code>%g</code>, and the final formatting is done by a call to a function from C's standard library: <code>snprintf</code>.<br />
<br />
Looking into the description of this function, we find out that <code>%g</code> prints the shorter of <code>%e</code> and <code>%f</code>. It turns out that this is not the complete truth &mdash; several additional changes are made. One of them is that the precision does not denote the number of fractional digits (like it does for <code>%e</code> and <code>%f</code>), but the number of significant digits. The default precision is 6, which means that <code>writeln(c)</code> only prints the first six 9s of <code>999.99993896484375</code>, which is rounded to <code>1000</code>.<br />
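The rounding done by <code>%g</code> can be observed directly (a short sketch):<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
    float c = 999.99993896484375f;<br />
    writefln("%s", c);   // prints "1000": %s acts like %g with 6 significant digits<br />
    writefln("%.9g", c); // prints "999.999939": enough digits to tell c from 1000<br />
}<br />
</syntaxhighlight><br />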
<br />
This is a clear design flaw of <code>snprintf</code> that D has inherited. In my opinion the default (at least for <code>%s</code>, if one wants to keep the behavior of <code>%g</code>) should be to print as many digits as necessary to distinguish this number from all other numbers that can be represented by a float. It would cause less trouble: the program above would print:<br />
<br />
<pre>Is 1000 == 999.99994? No!</pre><br />
<br />
And everything would be fine.<br />
<br />
== Solutions ==<br />
<br />
''Exercise 1 and 2:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
!Special value<br />
|-<br />
|0 000 00||0.125||0.0<br />
|-<br />
|0 000 01||0.15625||0.0625<br />
|-<br />
|0 000 10||0.1875||0.125<br />
|-<br />
|0 000 11||0.21875||0.1875<br />
|-<br />
|0 001 00||0.25||<br />
|-<br />
|0 001 01||0.3125||<br />
|-<br />
|0 001 10||0.375||<br />
|-<br />
|0 001 11||0.4375||<br />
|-<br />
|0 010 00||0.5||<br />
|-<br />
|0 010 01||0.625||<br />
|-<br />
|0 010 10||0.75||<br />
|-<br />
|0 010 11||0.875||<br />
|-<br />
|0 011 00||1||<br />
|-<br />
|0 011 01||1.25||<br />
|-<br />
|0 011 10||1.5||<br />
|-<br />
|0 011 11||1.75||<br />
|-<br />
|0 100 00||2||<br />
|-<br />
|0 100 01||2.5||<br />
|-<br />
|0 100 10||3||<br />
|-<br />
|0 100 11||3.5||<br />
|-<br />
|0 101 00||4||<br />
|-<br />
|0 101 01||5||<br />
|-<br />
|0 101 10||6||<br />
|-<br />
|0 101 11||7||<br />
|-<br />
|0 110 00||8||<br />
|-<br />
|0 110 01||10||<br />
|-<br />
|0 110 10||12||<br />
|-<br />
|0 110 11||14||<br />
|-<br />
|0 111 00||16||infinity<br />
|-<br />
|0 111 01||20||special NaN<br />
|-<br />
|0 111 10||24||special NaN<br />
|-<br />
|0 111 11||28||NaN<br />
|-<br />
|1 000 00||&minus;0.125||&minus;0.0<br />
|-<br />
|1 000 01||&minus;0.15625||&minus;0.0625<br />
|-<br />
|1 000 10||&minus;0.1875||&minus;0.125<br />
|-<br />
|1 000 11||&minus;0.21875||&minus;0.1875<br />
|-<br />
|1 001 00||&minus;0.25||<br />
|-<br />
|1 001 01||&minus;0.3125||<br />
|-<br />
|1 001 10||&minus;0.375||<br />
|-<br />
|1 001 11||&minus;0.4375||<br />
|-<br />
|1 010 00||&minus;0.5||<br />
|-<br />
|1 010 01||&minus;0.625||<br />
|-<br />
|1 010 10||&minus;0.75||<br />
|-<br />
|1 010 11||&minus;0.875||<br />
|-<br />
|1 011 00||&minus;1||<br />
|-<br />
|1 011 01||&minus;1.25||<br />
|-<br />
|1 011 10||&minus;1.5||<br />
|-<br />
|1 011 11||&minus;1.75||<br />
|-<br />
|1 100 00||&minus;2||<br />
|-<br />
|1 100 01||&minus;2.5||<br />
|-<br />
|1 100 10||&minus;3||<br />
|-<br />
|1 100 11||&minus;3.5||<br />
|-<br />
|1 101 00||&minus;4||<br />
|-<br />
|1 101 01||&minus;5||<br />
|-<br />
|1 101 10||&minus;6||<br />
|-<br />
|1 101 11||&minus;7||<br />
|-<br />
|1 110 00||&minus;8||<br />
|-<br />
|1 110 01||&minus;10||<br />
|-<br />
|1 110 10||&minus;12||<br />
|-<br />
|1 110 11||&minus;14||<br />
|-<br />
|1 111 00||&minus;16||&minus;infinity<br />
|-<br />
|1 111 01||&minus;20||&minus;special NaN<br />
|-<br />
|1 111 10||&minus;24||&minus;special NaN<br />
|-<br />
|1 111 11||&minus;28||&minus;NaN<br />
|}</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9960User:Berni44/Floatingpoint2021-02-15T09:30:07Z<p>Berni44: /* Back to our example from the beginning */</p>
<hr />
<div>= An introduction to floating point numbers =<br />
<br />
You probably already know that strange things can happen when using floating point numbers.<br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
If we want to understand this strange behavior, we have to take a deeper look into floating point numbers. Doing so involves understanding the bit representation of the numbers involved. Unfortunately, floats already have 32 bits (the code for <code>a</code> above is <code>01000100011110100000000000000000</code>), and with that many 0s and 1s it can easily happen that one can't see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, which I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: a sign bit, which is always exactly one bit, an exponent and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example, <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: for example, the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign bit (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This already looks a little bit like our number <code>1 100 01</code>.<br />
<br />
What we still need, is to know how the parts of that number are decoded. Let's start with the sign bit, which is easy. A <code>0</code> is <code>+</code> and a <code>1</code> is <code>&minus;</code>. We now know, that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary representation of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so called ''bias''. The bias can be calculated from the number of bits of the exponent. If ''r'' is the number of bits of the exponent, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here, we've got ''r''=3, and therefore the bias is 2^^2&minus;1=3, and finally we get our exponent, it's 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above, that the mantissa was <code>2.9979</code>. Note, that it is usual for scientific notation, that there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Now, floating point numbers use binary code instead of decimal code. This implies, that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to save this <code>1</code> in our number. Therefore it's omitted. Adding it to our mantissa, we've got <code>1.01</code>, which is a binary number. As decimal number it is <code>1.25</code>. (I assume you are familiar with the conversion from binary numbers to decimal numbers. If not, take a look at the [https://en.wikipedia.org/wiki/Binary_number#Conversion_to_and_from_other_numeral_systems article in wikipedia])<br />
<br />
Putting all together we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
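The decoding steps above can be sketched in D. The function <code>decodeNano</code> is a made-up name for illustration, and only normalized numbers are handled; zero, denormals, infinity and NaN follow in the next sections:<br />

```d
import std.stdio;

// Decode a 6-bit nano float: 1 sign bit, 3 exponent bits, 2 mantissa bits.
// Only normalized numbers are handled here; zero, denormals, infinity
// and NaN are treated in the following sections.
double decodeNano(uint bits)
{
    uint sign     = (bits >> 5) & 1;      // topmost bit
    int  exponent = (bits >> 2) & 0b111;  // next three bits
    uint mantissa = bits & 0b11;          // last two bits

    enum bias = 3;                        // 2^^(3-1) - 1
    int e = exponent - bias;

    double frac  = 1.0 + mantissa / 4.0;  // implied integral 1 bit
    double scale = e >= 0 ? 1u << e : 1.0 / (1u << -e);
    return sign ? -frac * scale : frac * scale;
}

void main()
{
    writeln(decodeNano(0b1_100_01)); // the example from the text: -2.5
}
```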
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend doing them &mdash; you'll acquire a much better feeling for floating point numbers when you work them out on your own instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 possible bit patterns of nano floats in a table and calculate the value that is represented by each pattern:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
== Zero and denormalized numbers ==<br />
<br />
The table from the exercise above can be visualized on a number line &mdash; all numbers a nano float can hold are marked by a vertical stroke:<br />
<br />
[[File:Nano float1.png|600px]]<br />
<br />
One clearly sees that toward the outside the numbers are sparse. Their density increases while approaching 0 from both sides. But then they suddenly stop and leave a gap at zero. Let's zoom in:<br />
<br />
[[File:Nano float2.png|600px]]<br />
<br />
Now we can clearly see the gap: there is no 0. The reason for this is that representing 0 would require the integral bit of the mantissa to be 0, but that bit is implied to be 1.<br />
<br />
To get around this, numbers with exponent <code>000</code>, which are called ''denormalized numbers'' or ''subnormal numbers'', are treated specially: these numbers are considered to have an integral 0 bit implied for the mantissa (instead of the usual 1 bit) and the exponent is increased by one (to get a uniform distribution of the numbers in the vicinity of 0, see the diagrams below). So <code>1 000 10</code> would be decoded as <code>&minus; 0.5 * 2^^(&minus;2) = &minus;0.125</code>.<br />
<br />
And when the mantissa is <code>00</code> we've got our zero: <code>0 000 00 = +0</code>. Unfortunately, there is a second zero: <code>1 000 00 = &minus;0</code>.<br />
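As a sketch (again with a made-up name), the rule for exponent <code>000</code> looks like this in code: the implied integral bit becomes 0 and the effective exponent is 1&minus;bias = &minus;2:<br />

```d
import std.stdio;

// Decode a nano float with exponent bits 000 (denormal or zero).
// The implied integral bit is 0 and the effective exponent is
// 1 - bias = -2, so the value is 0.mm (binary) * 2^^(-2).
double decodeNanoDenormal(uint bits)
{
    uint sign     = (bits >> 5) & 1;
    uint mantissa = bits & 0b11;

    double value = (mantissa / 4.0) / 4.0; // 0.mm * 2^^(-2)
    return sign ? -value : value;
}

void main()
{
    writeln(decodeNanoDenormal(0b1_000_10)); // -0.125, as in the text
    writeln(decodeNanoDenormal(0b0_000_00)); // +0
    writeln(decodeNanoDenormal(0b1_000_00)); // -0
}
```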
<br />
== Infinity and Not a Number ==<br />
<br />
There is another exponent that is treated specially: <code>111</code>. This time, if the mantissa is <code>00</code>, the value is considered to be infinity, and if it is <code>11</code>, it denotes a value that is not a number, a so-called ''NaN''. Other values of the mantissa are also considered to be NaNs, but with some special meanings attached, which is beyond the scope of this article. And yes, there are also the minus versions of all of these NaNs.<br />
<br />
With that, our number line looks like this:<br />
<br />
[[File:Nano float3.png|600px]]<br />
<br />
And the zoomed in version looks like this:<br />
<br />
[[File:Nano float4.png|600px]]<br />
<br />
It can now clearly be seen that the center is equispaced and the outsides have been truncated somewhat, with the benefit of having infinity and NaN (which obviously cannot be displayed on that number line).<br />
<br />
=== Exercise ===<br />
<br />
''Exercise 2: Add a third column to the table from exercise 1 and write the special values in that column.''<br />
<br />
== Back to our example from the beginning ==<br />
<br />
Now that we know what floating point numbers look like, we can go back to the example given at the beginning of this article. We start with a slightly changed version of the example, using nano floats this time. 1000 cannot be represented as a nano float; it would be infinity. But with 1.75 and nano floats we can get an effect similar to the 1000 with floats:<br />
<br />
<syntaxhighlight lang=D><br />
nanofloat a = 1.75;<br />
nanofloat b = 1/a;<br />
nanofloat c = 1/b;<br />
</syntaxhighlight><br />
<br />
This time we can use our tables from the exercises to look up the results:<br />
<br />
<code>1/1.75 = 0.57142857...</code>, which cannot be encoded exactly as a nanofloat. We have to choose between <code>0.5</code> and <code>0.625</code> as an approximation. Normally, floating point units are supposed to round to the nearest representable value in such cases. (There are other rounding modes, but I do not want to go deeper into this at this point.) Here <code>0.625</code> is closer to <code>0.57142857...</code> than <code>0.5</code>; that is, we finally arrive at <code>b = 0.625</code>.<br />
<br />
<code>1/0.625 = 1.6</code>. Again a value that cannot be encoded exactly as a nanofloat. We've got <code>1.5</code> and <code>1.75</code> as a choice. <code>1.5</code> is closer to <code>1.6</code> than <code>1.75</code>, and therefore <code>c = 1.5</code>. We end up with <code>1.75 == a != c == 1.5</code>.<br />
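The round-to-nearest step can be tried out in code, too. The following brute-force sketch (made-up name; ties are not broken the IEEE round-half-to-even way) picks the nearest positive normalized nano float:<br />

```d
import std.algorithm : minElement;
import std.math : abs;
import std.stdio;

// Round x to the nearest positive normalized nano float value.
// A brute-force sketch: enumerate all normalized values from the table
// and pick the closest one (ties are not resolved IEEE-style here).
double roundToNano(double x)
{
    double[] values;
    foreach (e; 1 .. 7)        // normalized exponent fields 001 .. 110
        foreach (m; 0 .. 4)    // the four mantissa patterns
        {
            double scale = e >= 3 ? 1 << (e - 3) : 1.0 / (1 << (3 - e));
            values ~= (1.0 + m / 4.0) * scale;
        }
    return values.minElement!(v => abs(v - x));
}

void main()
{
    writeln(roundToNano(1 / 1.75));  // 0.625, the value of b in the text
    writeln(roundToNano(1 / 0.625)); // 1.5, the value of c in the text
}
```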
<br />
For the sake of understanding real floating point numbers, we repeat this with 1000 and float: the exponents of floats have 8 bits (and therefore a bias of 127) and a mantissa of 23 bits (plus one implied bit).<br />
<br />
1000 can be represented exactly with a float: <code>a = 0 10001000 11110100000000000000000</code>. The exponent field is 136; subtracting the bias we arrive at 9. The mantissa is <code>1.111101</code>, which is 1.953125 in decimal notation. And <code>1.953125 * 2^^9 = 1000</code>.<br />
<br />
The reciprocal of 1000 is 0.001. This is a number that cannot be represented exactly with a float. The best approximation is <code>b = 0 01110101 00000110001001001101111</code>, which is <code>0.001000000047497451305389404296875</code> in decimal notation.<br />
<br />
The reciprocal of that last number is <code>999.999952502550950618369056343...</code>, which can neither be represented with a finite number of fractional digits in decimal notation, nor be represented exactly with a float. This time <code>c = 0 10001000 11110011111111111111111</code> is the best approximation, which is <code>999.99993896484375</code> in decimal notation.<br />
<br />
Now it should be clear why the program above answers the question whether <code>a</code> and <code>c</code> are the same with "No!": they are not equal.<br />
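If you want to check these bit patterns yourself, the bits of a <code>float</code> can be reinterpreted as a <code>uint</code> and printed in binary. This is a sketch, not part of the article's program; the pointer cast is one of several ways to get at the bits:<br />

```d
import std.stdio;

void main()
{
    float a = 1000;
    float b = 1 / a;
    float c = 1 / b;

    // Reinterpret the 32 float bits as an unsigned integer and print
    // them in binary: sign (1 bit), exponent (8 bits), mantissa (23 bits).
    writefln("a = %032b", *cast(uint*) &a);
    writefln("b = %032b", *cast(uint*) &b);
    writefln("c = %032b", *cast(uint*) &c);

    writefln("c = %.17g", c); // enough digits to tell c apart from 1000
}
```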
<br />
<br />
... to be continued<br />
<br />
== Solutions ==<br />
<br />
''Exercise 1 and 2:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
!Special value<br />
|-<br />
|0 000 00||0.125||0.0<br />
|-<br />
|0 000 01||0.15625||0.0625<br />
|-<br />
|0 000 10||0.1875||0.125<br />
|-<br />
|0 000 11||0.21875||0.1875<br />
|-<br />
|0 001 00||0.25||<br />
|-<br />
|0 001 01||0.3125||<br />
|-<br />
|0 001 10||0.375||<br />
|-<br />
|0 001 11||0.4375||<br />
|-<br />
|0 010 00||0.5||<br />
|-<br />
|0 010 01||0.625||<br />
|-<br />
|0 010 10||0.75||<br />
|-<br />
|0 010 11||0.875||<br />
|-<br />
|0 011 00||1||<br />
|-<br />
|0 011 01||1.25||<br />
|-<br />
|0 011 10||1.5||<br />
|-<br />
|0 011 11||1.75||<br />
|-<br />
|0 100 00||2||<br />
|-<br />
|0 100 01||2.5||<br />
|-<br />
|0 100 10||3||<br />
|-<br />
|0 100 11||3.5||<br />
|-<br />
|0 101 00||4||<br />
|-<br />
|0 101 01||5||<br />
|-<br />
|0 101 10||6||<br />
|-<br />
|0 101 11||7||<br />
|-<br />
|0 110 00||8||<br />
|-<br />
|0 110 01||10||<br />
|-<br />
|0 110 10||12||<br />
|-<br />
|0 110 11||14||<br />
|-<br />
|0 111 00||16||infinity<br />
|-<br />
|0 111 01||20||special NaN<br />
|-<br />
|0 111 10||24||special NaN<br />
|-<br />
|0 111 11||28||NaN<br />
|-<br />
|1 000 00||&minus;0.125||&minus;0.0<br />
|-<br />
|1 000 01||&minus;0.15625||&minus;0.0625<br />
|-<br />
|1 000 10||&minus;0.1875||&minus;0.125<br />
|-<br />
|1 000 11||&minus;0.21875||&minus;0.1875<br />
|-<br />
|1 001 00||&minus;0.25||<br />
|-<br />
|1 001 01||&minus;0.3125||<br />
|-<br />
|1 001 10||&minus;0.375||<br />
|-<br />
|1 001 11||&minus;0.4375||<br />
|-<br />
|1 010 00||&minus;0.5||<br />
|-<br />
|1 010 01||&minus;0.625||<br />
|-<br />
|1 010 10||&minus;0.75||<br />
|-<br />
|1 010 11||&minus;0.875||<br />
|-<br />
|1 011 00||&minus;1||<br />
|-<br />
|1 011 01||&minus;1.25||<br />
|-<br />
|1 011 10||&minus;1.5||<br />
|-<br />
|1 011 11||&minus;1.75||<br />
|-<br />
|1 100 00||&minus;2||<br />
|-<br />
|1 100 01||&minus;2.5||<br />
|-<br />
|1 100 10||&minus;3||<br />
|-<br />
|1 100 11||&minus;3.5||<br />
|-<br />
|1 101 00||&minus;4||<br />
|-<br />
|1 101 01||&minus;5||<br />
|-<br />
|1 101 10||&minus;6||<br />
|-<br />
|1 101 11||&minus;7||<br />
|-<br />
|1 110 00||&minus;8||<br />
|-<br />
|1 110 01||&minus;10||<br />
|-<br />
|1 110 10||&minus;12||<br />
|-<br />
|1 110 11||&minus;14||<br />
|-<br />
|1 111 00||&minus;16||&minus;infinity<br />
|-<br />
|1 111 01||&minus;20||&minus;special NaN<br />
|-<br />
|1 111 10||&minus;24||&minus;special NaN<br />
|-<br />
|1 111 11||&minus;28||&minus;NaN<br />
|}</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9959User:Berni44/Floatingpoint2021-02-15T09:27:39Z<p>Berni44: </p>
<hr />
<div>= An introduction to floating point numbers =<br />
<br />
You probably already know that strange things can happen when using floating point numbers.<br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
    float a = 1000;<br />
    float b = 1/a;<br />
    float c = 1/b;<br />
    writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
If we want to understand this strange behavior, we have to take a deeper look at floating point numbers. Doing so involves understanding the bit representation of the numbers involved. Unfortunately, floats already have 32 bits (the encoding of <code>a</code> above is <code>01000100011110100000000000000000</code>), and with that many 0s and 1s one can easily fail to see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, which I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: a sign bit, which is always exactly one bit, an exponent, and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example, <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: for example, the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This already looks a little bit like our number <code>1 100 01</code>.<br />
<br />
What we still need is to know how the parts of that number are decoded. Let's start with the sign bit, which is easy: a <code>0</code> is <code>+</code> and a <code>1</code> is <code>&minus;</code>. We now know that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary representation of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so-called ''bias''. The bias can be calculated from the number of bits of the exponent: if ''r'' is the number of bits of the exponent, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here we've got ''r''=3, therefore the bias is 2^^2&minus;1=3, and finally we get our exponent: it's 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above that the mantissa was <code>2.9979</code>. Note that in scientific notation there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Now, floating point numbers use binary code instead of decimal code. This implies that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to store this <code>1</code> in our number, therefore it's omitted. Adding it back to our mantissa, we've got <code>1.01</code>, which is a binary number; as a decimal number it is <code>1.25</code>. (I assume you are familiar with the conversion from binary numbers to decimal numbers. If not, take a look at the [https://en.wikipedia.org/wiki/Binary_number#Conversion_to_and_from_other_numeral_systems article on Wikipedia].)<br />
<br />
Putting it all together we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend doing them &mdash; you'll acquire a much better feeling for floating point numbers when you work them out on your own instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 possible bit patterns of nano floats in a table and calculate the value that is represented by each pattern:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
== Zero and denormalized numbers ==<br />
<br />
The table from the exercise above can be visualized on a number line &mdash; all numbers a nano float can hold are marked by a vertical stroke:<br />
<br />
[[File:Nano float1.png|600px]]<br />
<br />
One clearly sees that toward the outside the numbers are sparse. Their density increases while approaching 0 from both sides. But then they suddenly stop and leave a gap at zero. Let's zoom in:<br />
<br />
[[File:Nano float2.png|600px]]<br />
<br />
Now we can clearly see the gap: there is no 0. The reason for this is that representing 0 would require the integral bit of the mantissa to be 0, but that bit is implied to be 1.<br />
<br />
To get around this, numbers with exponent <code>000</code>, which are called ''denormalized numbers'' or ''subnormal numbers'', are treated specially: these numbers are considered to have an integral 0 bit implied for the mantissa (instead of the usual 1 bit) and the exponent is increased by one (to get a uniform distribution of the numbers in the vicinity of 0, see the diagrams below). So <code>1 000 10</code> would be decoded as <code>&minus; 0.5 * 2^^(&minus;2) = &minus;0.125</code>.<br />
<br />
And when the mantissa is <code>00</code> we've got our zero: <code>0 000 00 = +0</code>. Unfortunately, there is a second zero: <code>1 000 00 = &minus;0</code>.<br />
<br />
== Infinity and Not a Number ==<br />
<br />
There is another exponent that is treated specially: <code>111</code>. This time, if the mantissa is <code>00</code>, the value is considered to be infinity, and if it is <code>11</code>, it denotes a value that is not a number, a so-called ''NaN''. Other values of the mantissa are also considered to be NaNs, but with some special meanings attached, which is beyond the scope of this article. And yes, there are also the minus versions of all of these NaNs.<br />
<br />
With that, our number line looks like this:<br />
<br />
[[File:Nano float3.png|600px]]<br />
<br />
And the zoomed in version looks like this:<br />
<br />
[[File:Nano float4.png|600px]]<br />
<br />
It can now clearly be seen that the center is equispaced and the outsides have been truncated somewhat, with the benefit of having infinity and NaN (which obviously cannot be displayed on that number line).<br />
<br />
=== Exercise ===<br />
<br />
''Exercise 2: Add a third column to the table from exercise 1 and write the special values in that column.''<br />
<br />
== Back to our example from the beginning ==<br />
<br />
Now that we know what floating point numbers look like, we can go back to the example given at the beginning of this article. We start with a slightly changed version of the example, using nano floats this time. 1000 cannot be represented as a nano float; it would be infinity. But with 1.75 and nano floats we can get an effect similar to the 1000 with floats:<br />
<br />
<syntaxhighlight lang=D><br />
nanofloat a = 1.75;<br />
nanofloat b = 1/a;<br />
nanofloat c = 1/b;<br />
</syntaxhighlight><br />
<br />
This time we can use our tables from the exercises to look up the results:<br />
<br />
<code>1/1.75 = 0.57142857...</code>, which cannot be encoded exactly as a nanofloat. We have to choose between <code>0.5</code> and <code>0.625</code> as an approximation. Normally, floating point units are supposed to round to the nearest representable value in such cases. (There are other rounding modes, but I do not want to go deeper into this at this point.) Here <code>0.625</code> is closer to <code>0.57142857...</code> than <code>0.5</code>; that is, we finally arrive at <code>b = 0.625</code>.<br />
<br />
<code>1/0.625 = 1.6</code>. Again a value that cannot be encoded exactly as a nanofloat. We've got <code>1.5</code> and <code>1.75</code> as a choice. <code>1.5</code> is closer to <code>1.6</code> than <code>1.75</code>, and therefore <code>c = 1.5</code>. We end up with <code>1.75 == a != c == 1.5</code>.<br />
<br />
For the sake of understanding real floating point numbers, we repeat this with 1000 and float: the exponents of floats have 8 bits (and therefore a bias of 127) and a mantissa of 23 bits (plus one implied bit).<br />
<br />
1000 can be represented exactly with a float: <code>a = 0 10001000 11110100000000000000000</code>. The exponent field is 136; subtracting the bias we arrive at 9. The mantissa is <code>1.111101</code>, which is 1.953125 in decimal notation. And <code>1.953125 * 2^^9 = 1000</code>.<br />
<br />
The reciprocal of 1000 is 0.001. This is a number that cannot be represented exactly with a float. The best approximation is <code>b = 0 01110101 00000110001001001101111</code>, which is <code>0.001000000047497451305389404296875</code> in decimal notation.<br />
<br />
The reciprocal of that last number is <code>999.999952502550950618369056343...</code>, which can neither be represented with a finite number of fractional digits in decimal notation, nor be represented exactly with a float. This time <code>c = 0 10001000 11110011111111111111111</code> is the best approximation, which is <code>999.99993896484375</code> in decimal notation.<br />
<br />
Now it should be clear why the program above answers the question whether <code>a</code> and <code>c</code> are the same with "No!": they are not equal.<br />
<br />
<br />
... to be continued<br />
<br />
== Solutions ==<br />
<br />
''Exercise 1 and 2:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
!Special value<br />
|-<br />
|0 000 00||0.125||0.0<br />
|-<br />
|0 000 01||0.15625||0.0625<br />
|-<br />
|0 000 10||0.1875||0.125<br />
|-<br />
|0 000 11||0.21875||0.1875<br />
|-<br />
|0 001 00||0.25||<br />
|-<br />
|0 001 01||0.3125||<br />
|-<br />
|0 001 10||0.375||<br />
|-<br />
|0 001 11||0.4375||<br />
|-<br />
|0 010 00||0.5||<br />
|-<br />
|0 010 01||0.625||<br />
|-<br />
|0 010 10||0.75||<br />
|-<br />
|0 010 11||0.875||<br />
|-<br />
|0 011 00||1||<br />
|-<br />
|0 011 01||1.25||<br />
|-<br />
|0 011 10||1.5||<br />
|-<br />
|0 011 11||1.75||<br />
|-<br />
|0 100 00||2||<br />
|-<br />
|0 100 01||2.5||<br />
|-<br />
|0 100 10||3||<br />
|-<br />
|0 100 11||3.5||<br />
|-<br />
|0 101 00||4||<br />
|-<br />
|0 101 01||5||<br />
|-<br />
|0 101 10||6||<br />
|-<br />
|0 101 11||7||<br />
|-<br />
|0 110 00||8||<br />
|-<br />
|0 110 01||10||<br />
|-<br />
|0 110 10||12||<br />
|-<br />
|0 110 11||14||<br />
|-<br />
|0 111 00||16||infinity<br />
|-<br />
|0 111 01||20||special NaN<br />
|-<br />
|0 111 10||24||special NaN<br />
|-<br />
|0 111 11||28||NaN<br />
|-<br />
|1 000 00||&minus;0.125||&minus;0.0<br />
|-<br />
|1 000 01||&minus;0.15625||&minus;0.0625<br />
|-<br />
|1 000 10||&minus;0.1875||&minus;0.125<br />
|-<br />
|1 000 11||&minus;0.21875||&minus;0.1875<br />
|-<br />
|1 001 00||&minus;0.25||<br />
|-<br />
|1 001 01||&minus;0.3125||<br />
|-<br />
|1 001 10||&minus;0.375||<br />
|-<br />
|1 001 11||&minus;0.4375||<br />
|-<br />
|1 010 00||&minus;0.5||<br />
|-<br />
|1 010 01||&minus;0.625||<br />
|-<br />
|1 010 10||&minus;0.75||<br />
|-<br />
|1 010 11||&minus;0.875||<br />
|-<br />
|1 011 00||&minus;1||<br />
|-<br />
|1 011 01||&minus;1.25||<br />
|-<br />
|1 011 10||&minus;1.5||<br />
|-<br />
|1 011 11||&minus;1.75||<br />
|-<br />
|1 100 00||&minus;2||<br />
|-<br />
|1 100 01||&minus;2.5||<br />
|-<br />
|1 100 10||&minus;3||<br />
|-<br />
|1 100 11||&minus;3.5||<br />
|-<br />
|1 101 00||&minus;4||<br />
|-<br />
|1 101 01||&minus;5||<br />
|-<br />
|1 101 10||&minus;6||<br />
|-<br />
|1 101 11||&minus;7||<br />
|-<br />
|1 110 00||&minus;8||<br />
|-<br />
|1 110 01||&minus;10||<br />
|-<br />
|1 110 10||&minus;12||<br />
|-<br />
|1 110 11||&minus;14||<br />
|-<br />
|1 111 00||&minus;16||&minus;infinity<br />
|-<br />
|1 111 01||&minus;20||&minus;special NaN<br />
|-<br />
|1 111 10||&minus;24||&minus;special NaN<br />
|-<br />
|1 111 11||&minus;28||&minus;NaN<br />
|}</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9958User:Berni44/Floatingpoint2021-02-14T13:26:09Z<p>Berni44: /* Back to our example from the beginning */</p>
<hr />
<div>= An introduction to floating point numbers =<br />
<br />
You probably already know that strange things can happen when using floating point numbers.<br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
    float a = 1000;<br />
    float b = 1/a;<br />
    float c = 1/b;<br />
    writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior, we have to look at the bit representation of the numbers involved. Unfortunately, floats already have 32 bits, and with that many 0s and 1s one can easily fail to see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, which I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: a sign bit, which is always exactly one bit, an exponent, and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example, <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: for example, the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This already looks a little bit like our number <code>1 100 01</code>.<br />
<br />
What we still need is to know how the parts of that number are decoded. Let's start with the sign bit, which is easy: a <code>0</code> is <code>+</code> and a <code>1</code> is <code>&minus;</code>. We now know that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary representation of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so-called ''bias''. The bias can be calculated from the number of bits of the exponent: if ''r'' is the number of bits of the exponent, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here we've got ''r''=3, therefore the bias is 2^^2&minus;1=3, and finally we get our exponent: it's 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above that the mantissa was <code>2.9979</code>. Note that in scientific notation there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Now, floating point numbers use binary code instead of decimal code. This implies that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to store this <code>1</code> in our number, therefore it's omitted. Adding it back to our mantissa, we've got <code>1.01</code> in binary code, which is <code>1.25</code> in decimal code.<br />
<br />
Putting it all together we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend doing them &mdash; you'll acquire a much better feeling for floating point numbers when you work them out on your own instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 possible bit patterns of nano floats in a table and calculate the value that is represented by each pattern:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
== Zero and denormalized numbers ==<br />
<br />
The table from the exercise above can be visualized on a number line:<br />
<br />
[[File:Nano float1.png|600px]]<br />
<br />
One clearly sees that toward the outside the numbers are sparse. Their density increases while approaching 0 from both sides. But then they suddenly stop and leave a gap at zero. Let's zoom in:<br />
<br />
[[File:Nano float2.png|600px]]<br />
<br />
Now we can clearly see the gap: there is no 0. The reason for this is that representing 0 would require the integral bit of the mantissa to be 0, but that bit is implied to be 1.<br />
<br />
To get around this, numbers with exponent <code>000</code>, which are called ''denormalized numbers'' or ''subnormal numbers'', are treated specially: these numbers are considered to have an integral 0 bit implied for the mantissa and the exponent is increased by one. So <code>1 000 10</code> would be decoded as <code>&minus; 0.5 * 2^^(&minus;2) = &minus;0.125</code>.<br />
<br />
And when the mantissa is <code>00</code> we've got our zero: <code>0 000 00 = +0</code>. Unfortunately, there is a second zero: <code>1 000 00 = &minus;0</code>.<br />
<br />
== Infinity and Not a Number ==<br />
<br />
There is another exponent that is treated specially: <code>111</code>. This time, if the mantissa is <code>00</code>, the value is considered to be infinity, and if it is <code>11</code>, it denotes a value that is not a number, a so-called ''NaN''. Other values of the mantissa are also considered to be NaNs, but with some special meanings attached, which is beyond the scope of this article. And yes, there are also the minus versions of all of these NaNs.<br />
<br />
With that, our number line looks like this:<br />
<br />
[[File:Nano float3.png|600px]]<br />
<br />
And the zoomed in version looks like this:<br />
<br />
[[File:Nano float4.png|600px]]<br />
<br />
It can now clearly be seen that the center is equispaced and the outsides have been truncated somewhat.<br />
<br />
=== Exercise ===<br />
<br />
''Exercise 2: Add a third column to the table from exercise 1 and write the special values in that column.''<br />
<br />
== Back to our example from the beginning ==<br />
<br />
Now we can track down the problems in the example at the beginning. Nanofloats cannot represent 1000, but with 1.75, which can be represented as a nanofloat, we have a similar calculation:<br />
<br />
<syntaxhighlight lang=D><br />
nanofloat a = 1.75;<br />
nanofloat b = 1/a;<br />
nanofloat c = 1/b;<br />
</syntaxhighlight><br />
<br />
This time we can use our tables from the exercises to look up the results:<br />
<br />
<code>1/1.75 = 0.57142857...</code>, which cannot be encoded exactly as a nanofloat. We have to choose between <code>0.5</code> and <code>0.625</code>. Normally, floating point units are supposed to round to the nearest representable value in such cases. In this case <code>0.625</code> is closer to <code>0.57142857...</code> than <code>0.5</code>; that is, <code>b = 0.625</code>.<br />
<br />
<code>1/0.625 = 1.6</code>. Again a value that cannot be encoded exactly as a nanofloat. We've got <code>1.5</code> and <code>1.75</code> as a choice. <code>1.5</code> is closer to <code>1.6</code> than <code>1.75</code>, and therefore <code>c = 1.5</code>. We end up with <code>a != c</code>.<br />
<br />
But what would<br />
<syntaxhighlight lang=D><br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
</syntaxhighlight><br />
produce here?<br />
<br />
Well, it would be<br />
<br />
<pre>Is 1.75 == 1.5? No!</pre><br />
<br />
Now you may ask why the example with the floats produced <code>1000 == 1000</code> and did not display two different values.<br />
<br />
The answer to this question can be found by looking into the implementation of <code>writeln</code>: you'll find a call to <code>formattedWrite</code> (which lives in <code>std.format</code>) with a format specifier of <code>%s</code>. For floating point numbers, <code>%s</code> is treated identically to <code>%g</code>, which has some design flaws. In our case, <code>%g</code> rounds too eagerly: the correct value of <code>c</code> is <code>999.99993896484375</code>, but the default output of <code>%g</code> is limited to 6 significant digits, which means that, for the sake of printing, the number is rounded to <code>1000</code>.<br />
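This effect can be made visible with a format specifier that prints more digits (a small sketch; <code>%.10f</code> is just one of several specifiers that reveal them):<br />

```d
import std.stdio;

void main()
{
    float a = 1000;
    float b = 1 / a;
    float c = 1 / b;

    // %s (like %g) shows at most 6 significant digits, so both
    // sides of the comparison print as "1000" even though a != c.
    writeln("Is ", a, " == ", c, "? ", a == c ? "Yes!" : "No!");

    // A fixed-point specifier with more digits reveals the difference.
    writefln("%.10f", c);
}
```

The second line of output shows the digits of <code>c</code> that <code>%g</code> rounds away (the exact value is 999.99993896484375).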
<br />
... to be continued<br />
<br />
== Solutions ==<br />
<br />
''Exercise 1 and 2:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
!Special value<br />
|-<br />
|0 000 00||0.125||0.0<br />
|-<br />
|0 000 01||0.15625||0.0625<br />
|-<br />
|0 000 10||0.1875||0.125<br />
|-<br />
|0 000 11||0.21875||0.1875<br />
|-<br />
|0 001 00||0.25||<br />
|-<br />
|0 001 01||0.3125||<br />
|-<br />
|0 001 10||0.375||<br />
|-<br />
|0 001 11||0.4375||<br />
|-<br />
|0 010 00||0.5||<br />
|-<br />
|0 010 01||0.625||<br />
|-<br />
|0 010 10||0.75||<br />
|-<br />
|0 010 11||0.875||<br />
|-<br />
|0 011 00||1||<br />
|-<br />
|0 011 01||1.25||<br />
|-<br />
|0 011 10||1.5||<br />
|-<br />
|0 011 11||1.75||<br />
|-<br />
|0 100 00||2||<br />
|-<br />
|0 100 01||2.5||<br />
|-<br />
|0 100 10||3||<br />
|-<br />
|0 100 11||3.5||<br />
|-<br />
|0 101 00||4||<br />
|-<br />
|0 101 01||5||<br />
|-<br />
|0 101 10||6||<br />
|-<br />
|0 101 11||7||<br />
|-<br />
|0 110 00||8||<br />
|-<br />
|0 110 01||10||<br />
|-<br />
|0 110 10||12||<br />
|-<br />
|0 110 11||14||<br />
|-<br />
|0 111 00||16||infinity<br />
|-<br />
|0 111 01||20||special NaN<br />
|-<br />
|0 111 10||24||special NaN<br />
|-<br />
|0 111 11||28||NaN<br />
|-<br />
|1 000 00||&minus;0.125||&minus;0.0<br />
|-<br />
|1 000 01||&minus;0.15625||&minus;0.0625<br />
|-<br />
|1 000 10||&minus;0.1875||&minus;0.125<br />
|-<br />
|1 000 11||&minus;0.21875||&minus;0.1875<br />
|-<br />
|1 001 00||&minus;0.25||<br />
|-<br />
|1 001 01||&minus;0.3125||<br />
|-<br />
|1 001 10||&minus;0.375||<br />
|-<br />
|1 001 11||&minus;0.4375||<br />
|-<br />
|1 010 00||&minus;0.5||<br />
|-<br />
|1 010 01||&minus;0.625||<br />
|-<br />
|1 010 10||&minus;0.75||<br />
|-<br />
|1 010 11||&minus;0.875||<br />
|-<br />
|1 011 00||&minus;1||<br />
|-<br />
|1 011 01||&minus;1.25||<br />
|-<br />
|1 011 10||&minus;1.5||<br />
|-<br />
|1 011 11||&minus;1.75||<br />
|-<br />
|1 100 00||&minus;2||<br />
|-<br />
|1 100 01||&minus;2.5||<br />
|-<br />
|1 100 10||&minus;3||<br />
|-<br />
|1 100 11||&minus;3.5||<br />
|-<br />
|1 101 00||&minus;4||<br />
|-<br />
|1 101 01||&minus;5||<br />
|-<br />
|1 101 10||&minus;6||<br />
|-<br />
|1 101 11||&minus;7||<br />
|-<br />
|1 110 00||&minus;8||<br />
|-<br />
|1 110 01||&minus;10||<br />
|-<br />
|1 110 10||&minus;12||<br />
|-<br />
|1 110 11||&minus;14||<br />
|-<br />
|1 111 00||&minus;16||&minus;infinity<br />
|-<br />
|1 111 01||&minus;20||&minus;special NaN<br />
|-<br />
|1 111 10||&minus;24||&minus;special NaN<br />
|-<br />
|1 111 11||&minus;28||&minus;NaN<br />
|}</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9957User:Berni44/Floatingpoint2021-02-14T13:24:08Z<p>Berni44: </p>
<hr />
<div>= An introduction to floating point numbers =<br />
<br />
You probably already know that strange things can happen when using floating point numbers.<br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior, we have to look at the bit representation of the numbers involved. Unfortunately, floats already have 32 bits, and with that many 0s and 1s one can easily fail to see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, which I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: a sign bit, which is always exactly one bit, an exponent, and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example, <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: for example, the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This already looks a little like our number <code>1 100 01</code>.<br />
<br />
What we still need to know is how the parts of that number are decoded. Let's start with the sign bit, which is easy: a <code>0</code> means <code>+</code> and a <code>1</code> means <code>&minus;</code>. We now know that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary code of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so-called ''bias''. The bias can be calculated from the number of bits of the exponent: if ''r'' is the number of bits of the exponent, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here ''r''=3, so the bias is 2^^2&minus;1=3, and our exponent is 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above that the mantissa was <code>2.9979</code>. Note that in scientific notation there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Now, floating point numbers use binary code instead of decimal code. This implies that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to store this <code>1</code> in our number, so it's omitted. Adding it back to our mantissa, we've got <code>1.01</code> in binary code, which is <code>1.25</code> in decimal code.<br />
<br />
Putting it all together, we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
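The decoding steps above can be summarized in a short sketch (Python rather than D, purely for illustration &mdash; nano floats are not a built-in type, and <code>decode_nano</code> is a made-up helper):<br />

```python
# Decode a normalized 6-bit "nano float": 1 sign bit, 3 exponent bits, 2 mantissa bits.
# Zero, denormalized numbers, infinity and NaN need extra rules, covered later in the text.
def decode_nano(bits):
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - 3     # bias = 2^^(3-1) - 1 = 3
    mantissa = 1 + int(bits[4:], 2) / 4  # implicit leading 1 plus two fraction bits
    return sign * mantissa * 2.0 ** exponent

print(decode_nano("1 100 01"))  # -2.5
```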
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend doing them &mdash; you'll acquire a much better feeling for floating point numbers when you work them out on your own instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 bit patterns of nano floats in a table and calculate the value represented by each bit pattern:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
== Zero and denormalized numbers ==<br />
<br />
The table from the exercise above can be visualized on a number line:<br />
<br />
[[File:Nano float1.png|600px]]<br />
<br />
One clearly sees that toward the outside the numbers are sparse. The numbers become denser while approaching 0 from both sides. But then they suddenly stop and leave a gap at zero. Let's zoom in:<br />
<br />
[[File:Nano float2.png|600px]]<br />
<br />
Now we can clearly see the gap: there is no 0. The reason for this is that 0 would need the integral bit of the mantissa to be 0 &mdash; but that bit is implicitly always 1.<br />
<br />
To get around this, numbers with exponent <code>000</code>, which are called ''denormalized numbers'' or ''subnormal numbers'', are treated specially: these numbers are considered to have an implicit integral 0 bit in the mantissa, and the exponent is increased by one. So <code>1 000 10</code> would be decoded as <code>&minus;0.5 * 2^^(&minus;2) = &minus;0.125</code>.<br />
<br />
And when the mantissa is <code>00</code> we've got our zero: <code>0 000 00 = +0</code>. Unfortunately, there is a second zero: <code>1 000 00 = &minus;0</code>.<br />
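The rule for the exponent <code>000</code> can again be sketched in code (Python, purely for illustration; <code>decode_nano</code> is a made-up helper, not part of any library):<br />

```python
# Decode a 6-bit "nano float", including zero and the denormalized numbers:
# exponent field 000 means an implicit leading 0 and a fixed exponent of 1 - bias = -2.
def decode_nano(bits):
    bits = bits.replace(" ", "")
    sign = -1.0 if bits[0] == "1" else 1.0
    e = int(bits[1:4], 2)
    m = int(bits[4:], 2)
    if e == 0:                                  # denormalized (zero when m == 0)
        return sign * (m / 4) * 2.0 ** -2
    return sign * (1 + m / 4) * 2.0 ** (e - 3)  # normalized: implicit 1, bias 3

print(decode_nano("1 000 10"))  # -0.125
print(decode_nano("0 000 00"))  # 0.0
```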
<br />
== Infinity and Not a Number ==<br />
<br />
There is another exponent that is treated specially: <code>111</code>. This time, if the mantissa is <code>00</code>, it is considered to be infinity, and if it is <code>11</code>, it denotes a value that is not a number, a so-called ''NaN''. Other values of the mantissa are also considered to be NaNs, but with some special meanings attached, which is beyond the scope of this article. And yes, there are also the minus versions of all of these NaNs.<br />
<br />
With that, our number line looks like this:<br />
<br />
[[File:Nano float3.png|600px]]<br />
<br />
And the zoomed in version looks like this:<br />
<br />
[[File:Nano float4.png|600px]]<br />
<br />
It can now clearly be seen that the center is equispaced and the outsides have been truncated somewhat.<br />
<br />
=== Exercise ===<br />
<br />
''Exercise 2: Add a third column to the table from exercise 1 and write the special values in that column.''<br />
<br />
== Back to our example from the beginning ==<br />
<br />
Now we can track down the problems in the example at the beginning. Nano floats cannot represent 1000, but with 1.75, which can be represented as a nano float, we have a similar calculation:<br />
<br />
<syntaxhighlight lang=D><br />
nanofloat a = 1.75;<br />
nanofloat b = 1/a;<br />
nanofloat c = 1/b;<br />
</syntaxhighlight><br />
<br />
This time we can use our tables from the exercises to look up the results:<br />
<br />
<code>1/1.75 = 0.57142857...</code>, which cannot be coded exactly as a nanofloat. We have to choose between <code>0.5</code> and <code>0.625</code>. Normally, floating point units are supposed to round to the nearest possibility in such cases. Here <code>0.625</code> is closer to <code>0.57142857...</code> than <code>0.5</code>; that is, <code>b = 0.625</code>.<br />
<br />
<code>1/0.625 = 1.6</code>. Again a value that cannot be coded exactly as a nanofloat. We've got <code>1.5</code> and <code>1.75</code> as choices. <code>1.5</code> is closer to <code>1.6</code> than <code>1.75</code>, and therefore <code>c = 1.5</code>. We end up with <code>a != c</code>.<br />
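The two lookups can be double-checked mechanically. A small Python sketch (for illustration only; <code>nearest</code> is a made-up helper) that rounds a real number to the nearest positive normalized nano-float value:<br />

```python
# All positive normalized nano-float values: exponent field 001..110 (bias 3),
# mantissa 1.00, 1.01, 1.10 or 1.11 in binary.
VALUES = [(1 + m / 4) * 2.0 ** (e - 3) for e in range(1, 7) for m in range(4)]

def nearest(x):
    """Round x to the nearest representable value (round to nearest)."""
    return min(VALUES, key=lambda v: abs(v - x))

a = 1.75
b = nearest(1 / a)  # 0.625
c = nearest(1 / b)  # 1.5
print(a == c)       # False
```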
<br />
But what would<br />
<syntaxhighlight lang=D><br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
</syntaxhighlight><br />
produce here?<br />
<br />
Well, it would be<br />
<br />
<pre>Is 1.75 == 1.5? No!</pre><br />
<br />
Now you may ask why the example with the floats printed <code>1000 == 1000</code> rather than two different values.<br />
<br />
The answer to this question can be found by looking into the implementation of <code>writeln</code>: you'll find a call to <code>formattedWrite</code>, which lives in <code>std.format</code>, with a format specifier of <code>%s</code>. For floating point numbers, <code>%s</code> is treated identically to <code>%g</code>, which has some design flaws. In our case, <code>%g</code> rounds too eagerly: the correct value of <code>c</code> is <code>999.99993896484375</code>, but <code>%g</code> is limited to 6 significant digits, which means that, for the sake of writing the number, it is rounded to <code>1000</code>.<br />
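This can be reproduced outside of D as well &mdash; a Python sketch (for illustration only; it emulates D's 32-bit <code>float</code> by round-tripping through <code>struct</code>, and <code>".6g"</code> plays the role of the default <code>%g</code> output):<br />

```python
import struct

def f32(x):
    """Round a number to the nearest 32-bit float, like assigning to a D `float`."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = f32(1000.0)
b = f32(1 / a)  # the nearest 32-bit float to 0.001
c = f32(1 / b)  # 999.99993896484375, not 1000
print(c == a)            # False
print(format(c, ".6g"))  # 1000 -- six significant digits hide the difference
```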
<br />
... to be continued<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Solutions ==<br />
<br />
''Exercise 1 and 2:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
!Special value<br />
|-<br />
|0 000 00||0.125||0.0<br />
|-<br />
|0 000 01||0.15625||0.0625<br />
|-<br />
|0 000 10||0.1875||0.125<br />
|-<br />
|0 000 11||0.21875||0.1875<br />
|-<br />
|0 001 00||0.25||<br />
|-<br />
|0 001 01||0.3125||<br />
|-<br />
|0 001 10||0.375||<br />
|-<br />
|0 001 11||0.4375||<br />
|-<br />
|0 010 00||0.5||<br />
|-<br />
|0 010 01||0.625||<br />
|-<br />
|0 010 10||0.75||<br />
|-<br />
|0 010 11||0.875||<br />
|-<br />
|0 011 00||1||<br />
|-<br />
|0 011 01||1.25||<br />
|-<br />
|0 011 10||1.5||<br />
|-<br />
|0 011 11||1.75||<br />
|-<br />
|0 100 00||2||<br />
|-<br />
|0 100 01||2.5||<br />
|-<br />
|0 100 10||3||<br />
|-<br />
|0 100 11||3.5||<br />
|-<br />
|0 101 00||4||<br />
|-<br />
|0 101 01||5||<br />
|-<br />
|0 101 10||6||<br />
|-<br />
|0 101 11||7||<br />
|-<br />
|0 110 00||8||<br />
|-<br />
|0 110 01||10||<br />
|-<br />
|0 110 10||12||<br />
|-<br />
|0 110 11||14||<br />
|-<br />
|0 111 00||16||infinity<br />
|-<br />
|0 111 01||20||special NaN<br />
|-<br />
|0 111 10||24||special NaN<br />
|-<br />
|0 111 11||28||NaN<br />
|-<br />
|1 000 00||&minus;0.125||&minus;0.0<br />
|-<br />
|1 000 01||&minus;0.15625||&minus;0.0625<br />
|-<br />
|1 000 10||&minus;0.1875||&minus;0.125<br />
|-<br />
|1 000 11||&minus;0.21875||&minus;0.1875<br />
|-<br />
|1 001 00||&minus;0.25||<br />
|-<br />
|1 001 01||&minus;0.3125||<br />
|-<br />
|1 001 10||&minus;0.375||<br />
|-<br />
|1 001 11||&minus;0.4375||<br />
|-<br />
|1 010 00||&minus;0.5||<br />
|-<br />
|1 010 01||&minus;0.625||<br />
|-<br />
|1 010 10||&minus;0.75||<br />
|-<br />
|1 010 11||&minus;0.875||<br />
|-<br />
|1 011 00||&minus;1||<br />
|-<br />
|1 011 01||&minus;1.25||<br />
|-<br />
|1 011 10||&minus;1.5||<br />
|-<br />
|1 011 11||&minus;1.75||<br />
|-<br />
|1 100 00||&minus;2||<br />
|-<br />
|1 100 01||&minus;2.5||<br />
|-<br />
|1 100 10||&minus;3||<br />
|-<br />
|1 100 11||&minus;3.5||<br />
|-<br />
|1 101 00||&minus;4||<br />
|-<br />
|1 101 01||&minus;5||<br />
|-<br />
|1 101 10||&minus;6||<br />
|-<br />
|1 101 11||&minus;7||<br />
|-<br />
|1 110 00||&minus;8||<br />
|-<br />
|1 110 01||&minus;10||<br />
|-<br />
|1 110 10||&minus;12||<br />
|-<br />
|1 110 11||&minus;14||<br />
|-<br />
|1 111 00||&minus;16||&minus;infinity<br />
|-<br />
|1 111 01||&minus;20||&minus;special NaN<br />
|-<br />
|1 111 10||&minus;24||&minus;special NaN<br />
|-<br />
|1 111 11||&minus;28||&minus;NaN<br />
|}</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9956User:Berni44/Floatingpoint2021-02-14T12:57:09Z<p>Berni44: </p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know that strange things can happen when using floating point numbers.<br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior, we have to look at the bit representation of the numbers involved. Unfortunately, floats already have 32 bits, and with that many 0s and 1s one can easily fail to see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, which I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: a sign bit, which is always exactly one bit, an exponent, and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example, <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: for example, the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This already looks a little like our number <code>1 100 01</code>.<br />
<br />
What we still need to know is how the parts of that number are decoded. Let's start with the sign bit, which is easy: a <code>0</code> means <code>+</code> and a <code>1</code> means <code>&minus;</code>. We now know that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary code of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so-called ''bias''. The bias can be calculated from the number of bits of the exponent: if ''r'' is the number of bits of the exponent, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here ''r''=3, so the bias is 2^^2&minus;1=3, and our exponent is 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above that the mantissa was <code>2.9979</code>. Note that in scientific notation there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Now, floating point numbers use binary code instead of decimal code. This implies that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to store this <code>1</code> in our number, so it's omitted. Adding it back to our mantissa, we've got <code>1.01</code> in binary code, which is <code>1.25</code> in decimal code.<br />
<br />
Putting it all together, we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
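The decoding steps above can be summarized in a short sketch (Python rather than D, purely for illustration &mdash; nano floats are not a built-in type, and <code>decode_nano</code> is a made-up helper):<br />

```python
# Decode a normalized 6-bit "nano float": 1 sign bit, 3 exponent bits, 2 mantissa bits.
# Zero, denormalized numbers, infinity and NaN need extra rules, covered later in the text.
def decode_nano(bits):
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - 3     # bias = 2^^(3-1) - 1 = 3
    mantissa = 1 + int(bits[4:], 2) / 4  # implicit leading 1 plus two fraction bits
    return sign * mantissa * 2.0 ** exponent

print(decode_nano("1 100 01"))  # -2.5
```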
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend doing them &mdash; you'll acquire a much better feeling for floating point numbers when you work them out on your own instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 bit patterns of nano floats in a table and calculate the value represented by each bit pattern:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
== Zero and denormalized numbers ==<br />
<br />
The table from the exercise above can be visualized on a number line:<br />
<br />
[[File:Nano float1.png|600px]]<br />
<br />
One clearly sees that toward the outside the numbers are sparse. The numbers become denser while approaching 0 from both sides. But then they suddenly stop and leave a gap at zero. Let's zoom in:<br />
<br />
[[File:Nano float2.png|600px]]<br />
<br />
Now we can clearly see the gap: there is no 0. The reason for this is that 0 would need the integral bit of the mantissa to be 0 &mdash; but that bit is implicitly always 1.<br />
<br />
To get around this, numbers with exponent <code>000</code>, which are called ''denormalized numbers'' or ''subnormal numbers'', are treated specially: these numbers are considered to have an implicit integral 0 bit in the mantissa, and the exponent is increased by one. So <code>1 000 10</code> would be decoded as <code>&minus;0.5 * 2^^(&minus;2) = &minus;0.125</code>.<br />
<br />
And when the mantissa is <code>00</code> we've got our zero: <code>0 000 00 = +0</code>. Unfortunately, there is a second zero: <code>1 000 00 = &minus;0</code>.<br />
<br />
== Infinity and Not a Number ==<br />
<br />
There is another exponent that is treated specially: <code>111</code>. This time, if the mantissa is <code>00</code>, it is considered to be infinity, and if it is <code>11</code>, it denotes a value that is not a number, a so-called ''NaN''. Other values of the mantissa are also considered to be NaNs, but with some special meanings attached, which is beyond the scope of this article. And yes, there are also the minus versions of all of these NaNs.<br />
<br />
With that, our number line looks like this:<br />
<br />
[[File:Nano float3.png|600px]]<br />
<br />
And the zoomed in version looks like this:<br />
<br />
[[File:Nano float4.png|600px]]<br />
<br />
It can now clearly be seen that the center is equispaced and the outsides have been truncated somewhat.<br />
<br />
... to be continued<br />
<br />
=== Exercise ===<br />
<br />
''Exercise 2: Add a third column to the table from exercise 1 and write the special values in that column.''<br />
<br />
== Solutions ==<br />
<br />
''Exercise 1 and 2:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
!Special value<br />
|-<br />
|0 000 00||0.125||0.0<br />
|-<br />
|0 000 01||0.15625||0.0625<br />
|-<br />
|0 000 10||0.1875||0.125<br />
|-<br />
|0 000 11||0.21875||0.1875<br />
|-<br />
|0 001 00||0.25||<br />
|-<br />
|0 001 01||0.3125||<br />
|-<br />
|0 001 10||0.375||<br />
|-<br />
|0 001 11||0.4375||<br />
|-<br />
|0 010 00||0.5||<br />
|-<br />
|0 010 01||0.625||<br />
|-<br />
|0 010 10||0.75||<br />
|-<br />
|0 010 11||0.875||<br />
|-<br />
|0 011 00||1||<br />
|-<br />
|0 011 01||1.25||<br />
|-<br />
|0 011 10||1.5||<br />
|-<br />
|0 011 11||1.75||<br />
|-<br />
|0 100 00||2||<br />
|-<br />
|0 100 01||2.5||<br />
|-<br />
|0 100 10||3||<br />
|-<br />
|0 100 11||3.5||<br />
|-<br />
|0 101 00||4||<br />
|-<br />
|0 101 01||5||<br />
|-<br />
|0 101 10||6||<br />
|-<br />
|0 101 11||7||<br />
|-<br />
|0 110 00||8||<br />
|-<br />
|0 110 01||10||<br />
|-<br />
|0 110 10||12||<br />
|-<br />
|0 110 11||14||<br />
|-<br />
|0 111 00||16||infinity<br />
|-<br />
|0 111 01||20||special NaN<br />
|-<br />
|0 111 10||24||special NaN<br />
|-<br />
|0 111 11||28||NaN<br />
|-<br />
|1 000 00||&minus;0.125||&minus;0.0<br />
|-<br />
|1 000 01||&minus;0.15625||&minus;0.0625<br />
|-<br />
|1 000 10||&minus;0.1875||&minus;0.125<br />
|-<br />
|1 000 11||&minus;0.21875||&minus;0.1875<br />
|-<br />
|1 001 00||&minus;0.25||<br />
|-<br />
|1 001 01||&minus;0.3125||<br />
|-<br />
|1 001 10||&minus;0.375||<br />
|-<br />
|1 001 11||&minus;0.4375||<br />
|-<br />
|1 010 00||&minus;0.5||<br />
|-<br />
|1 010 01||&minus;0.625||<br />
|-<br />
|1 010 10||&minus;0.75||<br />
|-<br />
|1 010 11||&minus;0.875||<br />
|-<br />
|1 011 00||&minus;1||<br />
|-<br />
|1 011 01||&minus;1.25||<br />
|-<br />
|1 011 10||&minus;1.5||<br />
|-<br />
|1 011 11||&minus;1.75||<br />
|-<br />
|1 100 00||&minus;2||<br />
|-<br />
|1 100 01||&minus;2.5||<br />
|-<br />
|1 100 10||&minus;3||<br />
|-<br />
|1 100 11||&minus;3.5||<br />
|-<br />
|1 101 00||&minus;4||<br />
|-<br />
|1 101 01||&minus;5||<br />
|-<br />
|1 101 10||&minus;6||<br />
|-<br />
|1 101 11||&minus;7||<br />
|-<br />
|1 110 00||&minus;8||<br />
|-<br />
|1 110 01||&minus;10||<br />
|-<br />
|1 110 10||&minus;12||<br />
|-<br />
|1 110 11||&minus;14||<br />
|-<br />
|1 111 00||&minus;16||&minus;infinity<br />
|-<br />
|1 111 01||&minus;20||&minus;special NaN<br />
|-<br />
|1 111 10||&minus;24||&minus;special NaN<br />
|-<br />
|1 111 11||&minus;28||&minus;NaN<br />
|}</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9955User:Berni44/Floatingpoint2021-02-14T12:48:06Z<p>Berni44: /* Infinity and Not a Number */</p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know that strange things can happen when using floating point numbers.<br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior, we have to look at the bit representation of the numbers involved. Unfortunately, floats already have 32 bits, and with that many 0s and 1s one can easily fail to see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, which I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: a sign bit, which is always exactly one bit, an exponent, and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example, <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: for example, the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This already looks a little like our number <code>1 100 01</code>.<br />
<br />
What we still need to know is how the parts of that number are decoded. Let's start with the sign bit, which is easy: a <code>0</code> means <code>+</code> and a <code>1</code> means <code>&minus;</code>. We now know that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary code of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so-called ''bias''. The bias can be calculated from the number of bits of the exponent: if ''r'' is the number of bits of the exponent, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here ''r''=3, so the bias is 2^^2&minus;1=3, and our exponent is 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above that the mantissa was <code>2.9979</code>. Note that in scientific notation there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Now, floating point numbers use binary code instead of decimal code. This implies that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to store this <code>1</code> in our number, so it's omitted. Adding it back to our mantissa, we've got <code>1.01</code> in binary code, which is <code>1.25</code> in decimal code.<br />
<br />
Putting it all together, we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
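The decoding steps above can be summarized in a short sketch (Python rather than D, purely for illustration &mdash; nano floats are not a built-in type, and <code>decode_nano</code> is a made-up helper):<br />

```python
# Decode a normalized 6-bit "nano float": 1 sign bit, 3 exponent bits, 2 mantissa bits.
# Zero, denormalized numbers, infinity and NaN need extra rules, covered later in the text.
def decode_nano(bits):
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - 3     # bias = 2^^(3-1) - 1 = 3
    mantissa = 1 + int(bits[4:], 2) / 4  # implicit leading 1 plus two fraction bits
    return sign * mantissa * 2.0 ** exponent

print(decode_nano("1 100 01"))  # -2.5
```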
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend doing them &mdash; you'll acquire a much better feeling for floating point numbers when you work them out on your own instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 bit patterns of nano floats in a table and calculate the value represented by each bit pattern:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
== Zero and denormalized numbers ==<br />
<br />
The table from the exercise above can be visualized on a number line:<br />
<br />
[[File:Nano float1.png|600px]]<br />
<br />
One clearly sees that toward the outside the numbers are sparse. The numbers become denser while approaching 0 from both sides. But then they suddenly stop and leave a gap at zero. Let's zoom in:<br />
<br />
[[File:Nano float2.png|600px]]<br />
<br />
Now we can clearly see the gap: there is no 0. The reason for this is that 0 would need the integral bit of the mantissa to be 0 &mdash; but that bit is implicitly always 1.<br />
<br />
To get around this, numbers with exponent <code>000</code>, which are called ''denormalized numbers'' or ''subnormal numbers'', are treated specially: these numbers are considered to have an implicit integral 0 bit in the mantissa, and the exponent is increased by one. So <code>1 000 10</code> would be decoded as <code>&minus;0.5 * 2^^(&minus;2) = &minus;0.125</code>.<br />
<br />
And when the mantissa is <code>00</code> we've got our zero: <code>0 000 00 = +0</code>. Unfortunately, there is a second zero: <code>1 000 00 = &minus;0</code>.<br />
<br />
== Infinity and Not a Number ==<br />
<br />
There is another exponent that is treated specially: <code>111</code>. This time, if the mantissa is <code>00</code>, it is considered to be infinity, and if it is <code>11</code>, it denotes a value that is not a number, a so-called ''NaN''. Other values of the mantissa are also considered to be NaNs, but with some special meanings attached, which is beyond the scope of this article.<br />
<br />
With that, our number line looks like this:<br />
<br />
[[File:Nano float3.png|600px]]<br />
<br />
And the zoomed in version looks like this:<br />
<br />
[[File:Nano float4.png|600px]]<br />
<br />
It can now clearly be seen that the center is equispaced and the outsides have been truncated somewhat.<br />
<br />
... to be continued<br />
<br />
== Solutions ==<br />
<br />
''Exercise 1:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||0.125<br />
|-<br />
|0 000 01||0.15625<br />
|-<br />
|0 000 10||0.1875<br />
|-<br />
|0 000 11||0.21875<br />
|-<br />
|0 001 00||0.25<br />
|-<br />
|0 001 01||0.3125<br />
|-<br />
|0 001 10||0.375<br />
|-<br />
|0 001 11||0.4375<br />
|-<br />
|0 010 00||0.5<br />
|-<br />
|0 010 01||0.625<br />
|-<br />
|0 010 10||0.75<br />
|-<br />
|0 010 11||0.875<br />
|-<br />
|0 011 00||1<br />
|-<br />
|0 011 01||1.25<br />
|-<br />
|0 011 10||1.5<br />
|-<br />
|0 011 11||1.75<br />
|-<br />
|0 100 00||2<br />
|-<br />
|0 100 01||2.5<br />
|-<br />
|0 100 10||3<br />
|-<br />
|0 100 11||3.5<br />
|-<br />
|0 101 00||4<br />
|-<br />
|0 101 01||5<br />
|-<br />
|0 101 10||6<br />
|-<br />
|0 101 11||7<br />
|-<br />
|0 110 00||8<br />
|-<br />
|0 110 01||10<br />
|-<br />
|0 110 10||12<br />
|-<br />
|0 110 11||14<br />
|-<br />
|0 111 00||16<br />
|-<br />
|0 111 01||20<br />
|-<br />
|0 111 10||24<br />
|-<br />
|0 111 11||28<br />
|-<br />
|1 000 00||&minus;0.125<br />
|-<br />
|1 000 01||&minus;0.15625<br />
|-<br />
|1 000 10||&minus;0.1875<br />
|-<br />
|1 000 11||&minus;0.21875<br />
|-<br />
|1 001 00||&minus;0.25<br />
|-<br />
|1 001 01||&minus;0.3125<br />
|-<br />
|1 001 10||&minus;0.375<br />
|-<br />
|1 001 11||&minus;0.4375<br />
|-<br />
|1 010 00||&minus;0.5<br />
|-<br />
|1 010 01||&minus;0.625<br />
|-<br />
|1 010 10||&minus;0.75<br />
|-<br />
|1 010 11||&minus;0.875<br />
|-<br />
|1 011 00||&minus;1<br />
|-<br />
|1 011 01||&minus;1.25<br />
|-<br />
|1 011 10||&minus;1.5<br />
|-<br />
|1 011 11||&minus;1.75<br />
|-<br />
|1 100 00||&minus;2<br />
|-<br />
|1 100 01||&minus;2.5<br />
|-<br />
|1 100 10||&minus;3<br />
|-<br />
|1 100 11||&minus;3.5<br />
|-<br />
|1 101 00||&minus;4<br />
|-<br />
|1 101 01||&minus;5<br />
|-<br />
|1 101 10||&minus;6<br />
|-<br />
|1 101 11||&minus;7<br />
|-<br />
|1 110 00||&minus;8<br />
|-<br />
|1 110 01||&minus;10<br />
|-<br />
|1 110 10||&minus;12<br />
|-<br />
|1 110 11||&minus;14<br />
|-<br />
|1 111 00||&minus;16<br />
|-<br />
|1 111 01||&minus;20<br />
|-<br />
|1 111 10||&minus;24<br />
|-<br />
|1 111 11||&minus;28<br />
|}</div>Berni44https://wiki.dlang.org/?title=File:Nano_float4.png&diff=9954File:Nano float4.png2021-02-14T12:45:29Z<p>Berni44: </p>
<hr />
<div></div>Berni44https://wiki.dlang.org/?title=File:Nano_float3.png&diff=9953File:Nano float3.png2021-02-14T12:45:17Z<p>Berni44: </p>
<hr />
<div></div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9952User:Berni44/Floatingpoint2021-02-14T12:06:33Z<p>Berni44: </p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know that strange things can happen when using floating point numbers.<br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior we have to look at the bit representation of the numbers involved. Unfortunately, floats have already 32 bits and with that many 0s and 1s it can easily happen that one can't see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, which I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: a sign bit, which is always exactly one bit, an exponent and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example, <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, as known from physics: for example, the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting these together, we could write that number as <code>+ 8 2.9979</code>. This already looks a little like our number <code>1 100 01</code>.<br />
<br />
What we still need to know is how the parts of that number are decoded. Let's start with the sign bit, which is easy: a <code>0</code> means <code>+</code> and a <code>1</code> means <code>&minus;</code>. We now know that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary code of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so-called ''bias''. The bias can be calculated from the number of bits of the exponent: if ''r'' is the number of exponent bits, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here we've got ''r''=3, so the bias is 2^^2&minus;1=3, and our exponent is therefore 4&minus;3=1.<br />
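The bias formula can be checked against some familiar formats. A tiny D sketch (the function name <code>bias</code> is only for illustration):

```d
// Bias for an exponent field of r bits: 2^^(r-1) - 1.
int bias(int r)
{
    return (1 << (r - 1)) - 1;
}

void main()
{
    assert(bias(3) == 3);     // nano floats
    assert(bias(8) == 127);   // 32 bit float
    assert(bias(11) == 1023); // 64 bit double
}
```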
<br />
Now the mantissa. We've seen in the speed of light example above that the mantissa was <code>2.9979</code>. Note that in scientific notation there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Floating point numbers, however, use binary code instead of decimal code. This implies that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to store this <code>1</code> in our number, so it is omitted. Adding it back to our mantissa, we get <code>1.01</code> in binary code, which is <code>1.25</code> in decimal code.<br />
<br />
Putting it all together we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
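The decoding steps just described can be sketched as a small D function (a hypothetical helper, not part of any library; denormalized numbers and other special values are ignored for now):

```d
import std.math : ldexp;
import std.stdio;

/// Decode a 6-bit nano float: 1 sign bit, 3 exponent bits, 2 mantissa bits.
double decodeNano(uint bits)
{
    int sign     = (bits >> 5) & 1;      // first bit
    int exponent = (bits >> 2) & 0b111;  // next three bits
    int mantissa = bits & 0b11;          // last two bits

    enum bias = (1 << (3 - 1)) - 1;      // 2^^(r-1)-1 = 3 for r = 3

    double m = 1.0 + mantissa / 4.0;     // implied leading 1 plus two fractional bits
    double value = ldexp(m, exponent - bias); // m * 2^^(exponent - bias)
    return sign ? -value : value;
}

void main()
{
    writeln(decodeNano(0b1_100_01)); // -2.5, as in the worked example
}
```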
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend doing them &mdash; you'll acquire a much better feeling for floating point numbers when you work them out on your own instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 nano float bit patterns in a table and calculate the value represented by each pattern:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
== Zero and denormalized numbers ==<br />
<br />
The table from the exercise above can be visualized on a number line:<br />
<br />
[[File:Nano float1.png|600px]]<br />
<br />
One clearly sees that towards the outside the numbers become sparse. The density of numbers increases while approaching 0 from both sides. But then the numbers suddenly stop and leave a gap at zero. Let's zoom in:<br />
<br />
[[File:Nano float2.png|600px]]<br />
<br />
Now we can clearly see the gap: there is no 0. The reason for this is that 0 is the only number where the integral bit of the mantissa would have to be 0 &mdash; but that bit is implicitly fixed to 1.<br />
<br />
To get around this, numbers with exponent <code>000</code>, which are called ''denormalized numbers'' or ''subnormal numbers'', are treated specially: their mantissa gets an implied integral bit of 0 instead of 1, and the exponent is increased by one. So <code>1 000 10</code> is decoded as <code>&minus; 0.5 * 2^^(&minus;2) = &minus;0.125</code>. <br />
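The special treatment of exponent <code>000</code> can be captured in a short D sketch (a hypothetical helper for illustration):

```d
import std.math : ldexp;

/// Decode a 6-bit nano float, including denormalized numbers.
double decodeNanoDenormal(uint bits)
{
    int sign = (bits >> 5) & 1;
    int e    = (bits >> 2) & 0b111;
    int m    = bits & 0b11;
    enum bias = 3; // 2^^(3-1)-1 for the 3 exponent bits

    double value = (e == 0)
        ? ldexp(0.0 + m / 4.0, 1 - bias)  // denormal: implied 0, exponent raised by one
        : ldexp(1.0 + m / 4.0, e - bias); // normal: implied 1
    return sign ? -value : value;
}

void main()
{
    assert(decodeNanoDenormal(0b1_000_10) == -0.125); // the example from the text
    assert(decodeNanoDenormal(0b0_000_00) == 0.0);    // +0
}
```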
<br />
And when the mantissa is <code>00</code> we've got our zero: <code>0 000 00 = +0</code>. Unfortunately, there is a second zero: <code>1 000 00 = &minus;0</code>.<br />
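D can make the two zeros visible with <code>signbit</code> from <code>std.math</code> &mdash; comparison treats them as equal, but the sign bit still tells them apart:

```d
import std.math : signbit;
import std.stdio;

void main()
{
    double posZero = +0.0;
    double negZero = -0.0;

    writeln(posZero == negZero); // true: IEEE comparison treats the zeros as equal
    writeln(signbit(posZero));   // 0
    writeln(signbit(negZero));   // 1: the sign bit is still set
}
```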
<br />
== Infinity and Not a Number ==<br />
<br />
... to be continued<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Solutions ==<br />
<br />
''Exercise 1:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||0.125<br />
|-<br />
|0 000 01||0.15625<br />
|-<br />
|0 000 10||0.1875<br />
|-<br />
|0 000 11||0.21875<br />
|-<br />
|0 001 00||0.25<br />
|-<br />
|0 001 01||0.3125<br />
|-<br />
|0 001 10||0.375<br />
|-<br />
|0 001 11||0.4375<br />
|-<br />
|0 010 00||0.5<br />
|-<br />
|0 010 01||0.625<br />
|-<br />
|0 010 10||0.75<br />
|-<br />
|0 010 11||0.875<br />
|-<br />
|0 011 00||1<br />
|-<br />
|0 011 01||1.25<br />
|-<br />
|0 011 10||1.5<br />
|-<br />
|0 011 11||1.75<br />
|-<br />
|0 100 00||2<br />
|-<br />
|0 100 01||2.5<br />
|-<br />
|0 100 10||3<br />
|-<br />
|0 100 11||3.5<br />
|-<br />
|0 101 00||4<br />
|-<br />
|0 101 01||5<br />
|-<br />
|0 101 10||6<br />
|-<br />
|0 101 11||7<br />
|-<br />
|0 110 00||8<br />
|-<br />
|0 110 01||10<br />
|-<br />
|0 110 10||12<br />
|-<br />
|0 110 11||14<br />
|-<br />
|0 111 00||16<br />
|-<br />
|0 111 01||20<br />
|-<br />
|0 111 10||24<br />
|-<br />
|0 111 11||28<br />
|-<br />
|1 000 00||&minus;0.125<br />
|-<br />
|1 000 01||&minus;0.15625<br />
|-<br />
|1 000 10||&minus;0.1875<br />
|-<br />
|1 000 11||&minus;0.21875<br />
|-<br />
|1 001 00||&minus;0.25<br />
|-<br />
|1 001 01||&minus;0.3125<br />
|-<br />
|1 001 10||&minus;0.375<br />
|-<br />
|1 001 11||&minus;0.4375<br />
|-<br />
|1 010 00||&minus;0.5<br />
|-<br />
|1 010 01||&minus;0.625<br />
|-<br />
|1 010 10||&minus;0.75<br />
|-<br />
|1 010 11||&minus;0.875<br />
|-<br />
|1 011 00||&minus;1<br />
|-<br />
|1 011 01||&minus;1.25<br />
|-<br />
|1 011 10||&minus;1.5<br />
|-<br />
|1 011 11||&minus;1.75<br />
|-<br />
|1 100 00||&minus;2<br />
|-<br />
|1 100 01||&minus;2.5<br />
|-<br />
|1 100 10||&minus;3<br />
|-<br />
|1 100 11||&minus;3.5<br />
|-<br />
|1 101 00||&minus;4<br />
|-<br />
|1 101 01||&minus;5<br />
|-<br />
|1 101 10||&minus;6<br />
|-<br />
|1 101 11||&minus;7<br />
|-<br />
|1 110 00||&minus;8<br />
|-<br />
|1 110 01||&minus;10<br />
|-<br />
|1 110 10||&minus;12<br />
|-<br />
|1 110 11||&minus;14<br />
|-<br />
|1 111 00||&minus;16<br />
|-<br />
|1 111 01||&minus;20<br />
|-<br />
|1 111 10||&minus;24<br />
|-<br />
|1 111 11||&minus;28<br />
|}</div>Berni44https://wiki.dlang.org/?title=File:Nano_float2.png&diff=9951File:Nano float2.png2021-02-14T11:53:29Z<p>Berni44: </p>
<hr />
<div></div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9950User:Berni44/Floatingpoint2021-02-14T08:40:58Z<p>Berni44: </p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know, that strange things can happen, when using floating point numbers. <br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior we have to look at the bit representation of the numbers involved. Unfortunately, floats have already 32 bits and with that many 0s and 1s it can easily happen that one can't see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, that I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: A sign bit, which is always exactly one bit, an exponent and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: For example the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign bit (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This looks already a little bit like our number <code>1 100 01</code>.<br />
<br />
What we still need, is to know how the parts of that number are decoded. Let's start with the sign bit, which is easy. A <code>0</code> is <code>+</code> and a <code>1</code> is <code>&minus;</code>. We now know, that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary code of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so called ''bias''. The bias can be calculated from the number of bits of the exponent. If ''r'' is the number of bits of the exponent, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here, we've got ''r''=3, and therefore the bias is 2^^2&minus;1=3, and finally we get our exponent, it's 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above, that the mantissa was <code>2.9979</code>. Note, that it is usual for scientific notation, that there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Now, floating point numbers use binary code instead of decimal code. This implies, that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to save this <code>1</code> in our number. Therefore it's omitted. Adding it to our mantissa, we've got <code>1.01</code> in binary code, which is <code>1.25</code> in decimal code.<br />
<br />
Putting all together we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend to do them &mdash; you'll acquire a much better feeling for floating point numbers, when you do this on your own, instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 bit patterns of nano floats in a table and calculate the value, which is represented by that value:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
== Special values ==<br />
<br />
The table from the exercise above can be visualized on a number line:<br />
<br />
[[File:Nano float1.png|600px]]<br />
<br />
One can clearly see, that there are few numbers on the outsides. The count of numbers increases, while approaching 0. But then, there is a gap around 0.<br />
<br />
... to be continued<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Solutions ==<br />
<br />
``Exercise 1:``<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||0,125<br />
|-<br />
|0 000 01||0,15625<br />
|-<br />
|0 000 10||0,1875<br />
|-<br />
|0 000 11||0,21875<br />
|-<br />
|0 001 00||0,25<br />
|-<br />
|0 001 01||0,3125<br />
|-<br />
|0 001 10||0,375<br />
|-<br />
|0 001 11||0,4375<br />
|-<br />
|0 010 00||0,5<br />
|-<br />
|0 010 01||0,625<br />
|-<br />
|0 010 10||0,75<br />
|-<br />
|0 010 11||0,875<br />
|-<br />
|0 011 00||1<br />
|-<br />
|0 011 01||1,25<br />
|-<br />
|0 011 10||1,5<br />
|-<br />
|0 011 11||1,75<br />
|-<br />
|0 100 00||2<br />
|-<br />
|0 100 01||2,5<br />
|-<br />
|0 100 10||3<br />
|-<br />
|0 100 11||3,5<br />
|-<br />
|0 101 00||4<br />
|-<br />
|0 101 01||5<br />
|-<br />
|0 101 10||6<br />
|-<br />
|0 101 11||7<br />
|-<br />
|0 110 00||8<br />
|-<br />
|0 110 01||10<br />
|-<br />
|0 110 10||12<br />
|-<br />
|0 110 11||14<br />
|-<br />
|0 111 00||16<br />
|-<br />
|0 111 01||20<br />
|-<br />
|0 111 10||24<br />
|-<br />
|0 111 11||28<br />
|-<br />
|1 000 00||&minus;0,125<br />
|-<br />
|1 000 01||&minus;0,15625<br />
|-<br />
|1 000 10||&minus;0,1875<br />
|-<br />
|1 000 11||&minus;0,21875<br />
|-<br />
|1 001 00||&minus;0,25<br />
|-<br />
|1 001 01||&minus;0,3125<br />
|-<br />
|1 001 10||&minus;0,375<br />
|-<br />
|1 001 11||&minus;0,4375<br />
|-<br />
|1 010 00||&minus;0,5<br />
|-<br />
|1 010 01||&minus;0,625<br />
|-<br />
|1 010 10||&minus;0,75<br />
|-<br />
|1 010 11||&minus;0,875<br />
|-<br />
|1 011 00||&minus;1<br />
|-<br />
|1 011 01||&minus;1,25<br />
|-<br />
|1 011 10||&minus;1,5<br />
|-<br />
|1 011 11||&minus;1,75<br />
|-<br />
|1 100 00||&minus;2<br />
|-<br />
|1 100 01||&minus;2,5<br />
|-<br />
|1 100 10||&minus;3<br />
|-<br />
|1 100 11||&minus;3,5<br />
|-<br />
|1 101 00||&minus;4<br />
|-<br />
|1 101 01||&minus;5<br />
|-<br />
|1 101 10||&minus;6<br />
|-<br />
|1 101 11||&minus;7<br />
|-<br />
|1 110 00||&minus;8<br />
|-<br />
|1 110 01||&minus;10<br />
|-<br />
|1 110 10||&minus;12<br />
|-<br />
|1 110 11||&minus;14<br />
|-<br />
|1 111 00||&minus;16<br />
|-<br />
|1 111 01||&minus;20<br />
|-<br />
|1 111 10||&minus;24<br />
|-<br />
|1 111 11||&minus;28<br />
|}</div>Berni44https://wiki.dlang.org/?title=File:Nano_float1.png&diff=9949File:Nano float1.png2021-02-14T08:35:55Z<p>Berni44: </p>
<hr />
<div></div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9948User:Berni44/Floatingpoint2021-02-14T08:15:15Z<p>Berni44: /* Nano floats */</p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know, that strange things can happen, when using floating point numbers. <br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior we have to look at the bit representation of the numbers involved. Unfortunately, floats have already 32 bits and with that many 0s and 1s it can easily happen that one can't see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, that I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: A sign bit, which is always exactly one bit, an exponent and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: For example the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign bit (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This looks already a little bit like our number <code>1 100 01</code>.<br />
<br />
What we still need, is to know how the parts of that number are decoded. Let's start with the sign bit, which is easy. A <code>0</code> is <code>+</code> and a <code>1</code> is <code>&minus;</code>. We now know, that our number is negative.<br />
<br />
Next the exponent: <code>100</code> is the binary code of <code>4</code>. So our exponent is <code>4</code>? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so called ''bias''. The bias can be calculated from the number of bits of the exponent. If ''r'' is the number of bits of the exponent, the bias is <code>2^^(r&minus;1)&minus;1</code>. Here, we've got ''r''=3, and therefore the bias is 2^^2&minus;1=3, and finally we get our exponent, it's 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above, that the mantissa was <code>2.9979</code>. Note, that it is usual for scientific notation, that there is always exactly one integral digit in the mantissa, in this case <code>2</code>. Additionally there are four fractional digits: <code>9979</code>. Now, floating point numbers use binary code instead of decimal code. This implies, that the integral digit is (almost, see below) always <code>1</code>. It would be a waste to save this <code>1</code> in our number. Therefore it's omitted. Adding it to our mantissa, we've got <code>1.01</code> in binary code, which is <code>1.25</code> in decimal code.<br />
<br />
Putting all together we have: <code>1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5</code>.<br />
<br />
=== Exercise ===<br />
<br />
I'll add exercises throughout this document. I recommend to do them &mdash; you'll acquire a much better feeling for floating point numbers, when you do this on your own, instead of peeking at the answers. But of course, it's up to you.<br />
<br />
''Exercise 1: Write down all 64 bit patterns of nano floats in a table and calculate the value, which is represented by that value:''<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
<br />
... to be continued<br />
<br />
== Solutions ==<br />
<br />
``Exercise 1:``<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||0,125<br />
|-<br />
|0 000 01||0,15625<br />
|-<br />
|0 000 10||0,1875<br />
|-<br />
|0 000 11||0,21875<br />
|-<br />
|0 001 00||0,25<br />
|-<br />
|0 001 01||0,3125<br />
|-<br />
|0 001 10||0,375<br />
|-<br />
|0 001 11||0,4375<br />
|-<br />
|0 010 00||0,5<br />
|-<br />
|0 010 01||0,625<br />
|-<br />
|0 010 10||0,75<br />
|-<br />
|0 010 11||0,875<br />
|-<br />
|0 011 00||1<br />
|-<br />
|0 011 01||1,25<br />
|-<br />
|0 011 10||1,5<br />
|-<br />
|0 011 11||1,75<br />
|-<br />
|0 100 00||2<br />
|-<br />
|0 100 01||2,5<br />
|-<br />
|0 100 10||3<br />
|-<br />
|0 100 11||3,5<br />
|-<br />
|0 101 00||4<br />
|-<br />
|0 101 01||5<br />
|-<br />
|0 101 10||6<br />
|-<br />
|0 101 11||7<br />
|-<br />
|0 110 00||8<br />
|-<br />
|0 110 01||10<br />
|-<br />
|0 110 10||12<br />
|-<br />
|0 110 11||14<br />
|-<br />
|0 111 00||16<br />
|-<br />
|0 111 01||20<br />
|-<br />
|0 111 10||24<br />
|-<br />
|0 111 11||28<br />
|-<br />
|1 000 00||&minus;0,125<br />
|-<br />
|1 000 01||&minus;0,15625<br />
|-<br />
|1 000 10||&minus;0,1875<br />
|-<br />
|1 000 11||&minus;0,21875<br />
|-<br />
|1 001 00||&minus;0,25<br />
|-<br />
|1 001 01||&minus;0,3125<br />
|-<br />
|1 001 10||&minus;0,375<br />
|-<br />
|1 001 11||&minus;0,4375<br />
|-<br />
|1 010 00||&minus;0,5<br />
|-<br />
|1 010 01||&minus;0,625<br />
|-<br />
|1 010 10||&minus;0,75<br />
|-<br />
|1 010 11||&minus;0,875<br />
|-<br />
|1 011 00||&minus;1<br />
|-<br />
|1 011 01||&minus;1,25<br />
|-<br />
|1 011 10||&minus;1,5<br />
|-<br />
|1 011 11||&minus;1,75<br />
|-<br />
|1 100 00||&minus;2<br />
|-<br />
|1 100 01||&minus;2,5<br />
|-<br />
|1 100 10||&minus;3<br />
|-<br />
|1 100 11||&minus;3,5<br />
|-<br />
|1 101 00||&minus;4<br />
|-<br />
|1 101 01||&minus;5<br />
|-<br />
|1 101 10||&minus;6<br />
|-<br />
|1 101 11||&minus;7<br />
|-<br />
|1 110 00||&minus;8<br />
|-<br />
|1 110 01||&minus;10<br />
|-<br />
|1 110 10||&minus;12<br />
|-<br />
|1 110 11||&minus;14<br />
|-<br />
|1 111 00||&minus;16<br />
|-<br />
|1 111 01||&minus;20<br />
|-<br />
|1 111 10||&minus;24<br />
|-<br />
|1 111 11||&minus;28<br />
|}</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9947User:Berni44/Floatingpoint2021-02-13T19:48:53Z<p>Berni44: /* Nano floats */</p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know, that strange things can happen, when using floating point numbers. <br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior we have to look at the bit representation of the numbers involved. Unfortunately, floats have already 32 bits and with that many 0s and 1s it can easily happen that one can't see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, that I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: A sign bit, which is always exactly one bit, an exponent and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: For example the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign bit (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This looks already a little bit like our number <code>1 100 01</code>.<br />
<br />
What now misses, is to decode the parts of that number. Let's start with the sign bit: That is easy. A 0 is + and a 1 is &minus;. We now know, that our number is negative.<br />
<br />
Next the exponent: 100 is the binary code of 4. So our exponent is 4? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so called ''bias''. The bias can be calculated from the number of bits of the exponent. If r is the number of bits of the exponent, the bias is 2^^(r&minus;1)&minus;1. Here, we've got r=3, and therefore the bias is 2^^2&minus;1=3, and finally we get our exponent, it's 4&minus;3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above the mantissa 2.9979. That was exactly one integral digit (2) and four fractional digits (9979). In binary system, the integral digit is (almost, see below) always 1. It would be a waste to save this 1 in our number. Therefore it's omitted. Adding it, our mantissa is 1.01 in binary code, or 1.25 in decimal code.<br />
<br />
So putting all together we have: 1 100 01 = &minus; 1.25 * 2 ^^ 1 = &minus;2.5.<br />
<br />
If you like you can do an exercise now: Write down all 64 bit patterns of nano floats on a piece of paper and calculate the value, which is represented by that value:<br />
<br />
{|class="wikitable"<br />
!Bit pattern<br />
!Value<br />
|-<br />
|0 000 00||<br />
|-<br />
|0 000 01||<br />
|-<br />
|0 000 10||<br />
|-<br />
|0 000 11||<br />
|-<br />
|0 001 00||<br />
|-<br />
|...||<br />
|-<br />
|1 100 01|| &minus;2.5<br />
|-<br />
|...||<br />
|-<br />
|1 111 11||<br />
|}<br />
... to be continued</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9946User:Berni44/Floatingpoint2021-02-13T19:36:43Z<p>Berni44: /* Nano floats */</p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know, that strange things can happen, when using floating point numbers. <br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior we have to look at the bit representation of the numbers involved. Unfortunately, floats have already 32 bits and with that many 0s and 1s it can easily happen that one can't see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, that I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: A sign bit, which is always exactly one bit, an exponent and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: For example the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign bit (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This looks already a little bit like our number <code>1 100 01</code>.<br />
<br />
What now misses, is to decode the parts of that number. Let's start with the sign bit: That is easy. A 0 is + and a 1 is &minus;. We now know, that our number is negative.<br />
<br />
Next the exponent: 100 is the binary code of 4. So our exponent is 4? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so called ''bias''. The bias can be calculated from the number of bits of the exponent. If r is the number of bits of the exponent, the bias is 2^^(r-1)-1. Here, we've got r=3, and therefore the bias is 2^^2-1=3, and finally we get our exponent, it's 4-3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above the mantissa 2.9979. That was exactly one integral digit (2) and four fractional digits (9979). In binary system, the integral digit is (almost, see below) always 1. It would be a waste to save this 1 in our number. Therefore it's omitted. Adding it, our mantissa is 1.01 in binary code, or 1.25 in decimal code.<br />
<br />
So putting all together we have: 1 100 01 = - 1.25 * 2 ^^ 1 = -2.5.<br />
<br />
... to be continued</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9945User:Berni44/Floatingpoint2021-02-13T19:35:01Z<p>Berni44: </p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know, that strange things can happen, when using floating point numbers. <br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
To understand this strange behavior we have to look at the bit representation of the numbers involved. Unfortunately, floats have already 32 bits and with that many 0s and 1s it can easily happen that one can't see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, that I call ''nano floats''. Nano floats have only 6 bits.<br />
<br />
== Nano floats ==<br />
<br />
Floating point numbers consist of three parts: A sign bit, which is always exactly one bit, an exponent and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: For example the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign bit (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This looks already a little bit like our number <code>1 100 01</code>.<br />
<br />
What now misses, is to decode the parts of that number. Let's start with the sign bit: That is easy. A 0 is + and a 1 is &minus;. Next the exponent: 100 is the binary code of 4. So our exponent is 4? No, it's not that easy. Exponents can also be negative. To achieve this, we have to subtract the so called ''bias''. The bias can be calculated from the number of bits of the exponent. If r is the number of bits of the exponent, the bias is 2^^(r-1)-1. Here, we've got r=3, and therefore the bias is 2^^2-1=3, and finally we get our exponent, it's 4-3=1.<br />
<br />
Now the mantissa. We've seen in the speed of light example above the mantissa 2.9979. That was exactly one integral digit (2) and four fractional digits (9979). In binary system, the integral digit is (almost, see below) always 1. It would be a waste to save this 1 in our number. Therefore it's omitted. Adding it, our mantissa is 1.01 in binary code, or 1.25 in decimal code.<br />
<br />
So putting all together we have: 1 100 01 = - 1.25 * 2 ^^ 1 = -2.5.<br />
<br />
... to be continued</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9944User:Berni44/Floatingpoint2021-02-13T19:32:30Z<p>Berni44: </p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know, that strange things can happen, when using floating point numbers. <br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
float a = 1000;<br />
float b = 1/a;<br />
float c = 1/b;<br />
writeln("Is ",a," == ",c,"? ",a==c?"Yes!":"No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
== Nano floats ==<br />
<br />
To understand this strange behavior we have to look at the bit representation of the numbers involved. Unfortunately, floats have already 32 bits and with that many 0s and 1s it can easily happen that one can't see the forest for the trees.<br />
<br />
For that reason I'll start with smaller floating point numbers, that I call ''nano floats''. They have only 6 bits.<br />
<br />
Floating point numbers consist of three parts: A sign bit, which is always exactly one bit, an exponent and a mantissa. Nano floats use 3 bits for the exponent and 2 bits for the mantissa. For example <code>1 100 01</code> is the bit representation of a nano float. Which number does this bit pattern represent?<br />
<br />
You can think of floating point numbers as numbers written in scientific notation, known from physics: For example the speed of light is about <code>+2.9979 * 10^^8 m/s</code>. Here we've got the sign bit (<code>+</code>), the exponent (<code>8</code>) and the mantissa (<code>2.9979</code>). Putting this together, we could write that number as <code>+ 8 2.9979</code>. This looks already a little bit like our number <code>1 100 01</code>.<br />
<br />
What remains is to decode the parts of that number. Let's start with the sign bit: that's easy, a 0 means + and a 1 means &minus;. Next the exponent: 100 is the binary code of 4. So our exponent is 4? No, it's not that easy, because exponents can also be negative. To allow for this, we have to subtract the so-called ''bias''. The bias is calculated from the number of bits of the exponent: if r is the number of exponent bits, the bias is 2^^(r-1)-1. Here we've got r=3, therefore the bias is 2^^2-1=3, and finally our exponent is 4-3=1.<br />
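<br />
This bias arithmetic can be checked directly in code. A small sketch (r=3 and the raw exponent bits 100 are the values from our nano float):<br />
<br />
<syntaxhighlight lang=D><br />
import std.math;<br />
<br />
void main()<br />
{<br />
    int r = 3;                    // number of exponent bits<br />
    int bias = 2 ^^ (r - 1) - 1;  // 2^^2 - 1 = 3<br />
    int stored = 0b100;           // raw exponent bits, worth 4<br />
    int exponent = stored - bias; // 4 - 3 = 1<br />
    assert(bias == 3 && exponent == 1);<br />
}<br />
</syntaxhighlight><br />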
<br />
Now the mantissa. We've seen in the speed of light example above the mantissa 2.9979: exactly one integral digit (2) and four fractional digits (9979). In the binary system, the integral digit is (almost, see below) always 1, so storing it would be a waste; therefore it's omitted. Adding it back, our mantissa is 1.01 in binary, or 1.25 in decimal.<br />
<br />
So putting it all together, we have: 1 100 01 = -1.25 * 2 ^^ 1 = -2.5.<br />
<br />
... to be continued</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9943User:Berni44/Floatingpoint2021-02-13T18:38:10Z<p>Berni44: </p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
You probably already know that strange things can happen when using floating point numbers.<br />
<br />
An example:<br />
<br />
<syntaxhighlight lang=D><br />
import std.stdio;<br />
<br />
void main()<br />
{<br />
    float a = 1000;<br />
    float b = 1 / a;<br />
    float c = 1 / b;<br />
    writeln("Is ", a, " == ", c, "? ", a == c ? "Yes!" : "No!");<br />
}<br />
</syntaxhighlight><br />
<br />
Did you guess the answer?<br />
<br />
<pre>Is 1000 == 1000? No!</pre><br />
<br />
== Nano floats ==<br />
<br />
To understand this strange behavior, we have to look at the bit representation of the numbers involved.<br />
<br />
... to be continued</div>Berni44https://wiki.dlang.org/?title=User:Berni44/Floatingpoint&diff=9941User:Berni44/Floatingpoint2021-02-13T17:18:06Z<p>Berni44: Created page with "== An introduction to floating point numbers == ... to be continued"</p>
<hr />
<div>== An introduction to floating point numbers ==<br />
<br />
... to be continued</div>Berni44