Difference between revisions of "Element type of string ranges"
(→Comparison: Add correctness) |
|||
Line 10: | Line 10: | ||
| {{Yes}} <tt>s.canFind('é')</tt> | | {{Yes}} <tt>s.canFind('é')</tt> | ||
|rowspan=2| {{No}} Will result in a pragma warning in some places, will fail silently in others (when specified via predicate). | |rowspan=2| {{No}} Will result in a pragma warning in some places, will fail silently in others (when specified via predicate). | ||
+ | |||
+ | Note: this should not be recommended practice (not all languages have notions of characters, and not all characters (glyphs/graphemes) can be represented in one <tt>dchar</tt>). | ||
|- | |- | ||
| Searching for a particular <tt>dchar</tt> in a non-normalized string. || {{No}} Above fails for [http://forum.dlang.org/post/hxudajoutambsznfdydb@forum.dlang.org combining marks], as that requires normalization. | | Searching for a particular <tt>dchar</tt> in a non-normalized string. || {{No}} Above fails for [http://forum.dlang.org/post/hxudajoutambsznfdydb@forum.dlang.org combining marks], as that requires normalization. | ||
Line 16: | Line 18: | ||
| {{Yes}} <tt>s.count!((a, b) => std.uni.toLower(a) == std.uni.toLower(b))("é") </tt> | | {{Yes}} <tt>s.count!((a, b) => std.uni.toLower(a) == std.uni.toLower(b))("é") </tt> | ||
|rowspan=2| {{No}} Fails silently. | |rowspan=2| {{No}} Fails silently. | ||
+ | |||
+ | Note: this should not be recommended practice (correct case conversion and comparison for all languages is more complicated, and depends on locale - e.g. Turkish I / ı and İ / i). | ||
|- | |- | ||
| Case conversion, insensitive comparison in ranges for other languages | | Case conversion, insensitive comparison in ranges for other languages | ||
− | | {{No}} | + | | {{No}} Fails. |
|- | |- | ||
| Correctness || {{No}} Only works for certain languages and alphabets || {{No}} Only works for ASCII | | Correctness || {{No}} Only works for certain languages and alphabets || {{No}} Only works for ASCII | ||
Line 26: | Line 30: | ||
| Implementation difficulty || {{No}}<br><tt>phobos/std $ grep ElementEncodingType *.d | wc -l<br>80</tt> || {{Yes}} Strings are treated as any other arrays | | Implementation difficulty || {{No}}<br><tt>phobos/std $ grep ElementEncodingType *.d | wc -l<br>80</tt> || {{Yes}} Strings are treated as any other arrays | ||
|- | |- | ||
− | | Consistency || {{No}} [http://forum.dlang.org/post/jbxfkpxzuozcdbnddcuw@forum.dlang.org Range algorithms return values different from array algorithms] || {{Yes}} String ranges work like ranges of any other arrays | + | | Consistency || {{No}} [http://forum.dlang.org/post/lfdnk5$pt1$1@digitalmars.com Inconsistencies between array and range types]<br>{{No}} [http://forum.dlang.org/post/jbxfkpxzuozcdbnddcuw@forum.dlang.org Range algorithms return values different from array algorithms]|| {{Yes}} String ranges work like ranges of any other arrays |
|} | |} |
Revision as of 03:25, 9 March 2014
This article attempts to summarize the arguments in the thread "Major performance problem with std.array.front()".
Comparison
One of the proposals in the thread is to switch the iteration type of string ranges from dchar to the string's character type.
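To make the two iteration types concrete, here is a minimal sketch of how the same string looks through the range primitives (decoded to dchar) and as a plain array of code units. The helper names codePoints and codeUnits are hypothetical and exist only for this illustration. std.utf.byCodeUnit is used as one way in current Phobos to iterate without decoding; it is not the change proposed in the thread.

```d
import std.range : ElementType, ElementEncodingType, walkLength, front;
import std.utf : byCodeUnit;

// Hypothetical helpers, named only for this illustration.
size_t codePoints(string s) { return s.walkLength; }            // iterates decoded dchars
size_t codeUnits(string s)  { return s.byCodeUnit.walkLength; } // iterates raw chars

void main()
{
    // The range primitives report dchar, while the array stores char.
    static assert(is(ElementType!(char[]) == dchar));
    static assert(is(ElementEncodingType!(char[]) == char));

    string s = "\u00E9"; // 'é', stored as two UTF-8 code units (0xC3 0xA9)
    assert(s.length == 2);    // array view: two code units
    assert(s.front == 0xE9);  // range view: one decoded code point
    assert(s[0] == 0xC3);     // indexing returns a raw code unit
    assert(codePoints(s) == 1);
    assert(codeUnits(s) == 2);
}
```

The mismatch between s.length and the number of elements seen through front and popFront is the kind of array-versus-range difference the comparison is concerned with.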
Argument | Old (iterate by dchar) | New (iterate by the string's character type) |
---|---|---|
Searching for a particular dchar in a string. | s.canFind('é') | Will result in a pragma warning in some places, and will fail silently in others (when specified via predicate). Note: this should not be recommended practice (not all languages have notions of characters, and not all characters (glyphs/graphemes) can be represented in one dchar). |
Searching for a particular dchar in a non-normalized string. | The above fails for combining marks, as that requires normalization. | (same cell as the row above) |
Case conversion, insensitive comparison in ranges for certain languages | s.count!((a, b) => std.uni.toLower(a) == std.uni.toLower(b))("é") | Fails silently. Note: this should not be recommended practice (correct case conversion and comparison for all languages is more complicated, and depends on locale, e.g. Turkish I / ı and İ / i). |
Case conversion, insensitive comparison in ranges for other languages | Fails. | (same cell as the row above) |
Correctness | Only works for certain languages and alphabets | Only works for ASCII |
Performance | Implicit decoding everywhere, unless each algorithm is specialized not to | As fast as ubyte[] |
Implementation difficulty | phobos/std $ grep ElementEncodingType *.d \| wc -l (output: 80) | Strings are treated as any other arrays |
Consistency | Inconsistencies between array and range types; range algorithms return values different from array algorithms | String ranges work like ranges of any other arrays |
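The first four rows of the comparison can be sketched as follows, assuming the current auto-decoding behaviour (the "Old" column). The helper names containsAfterNormalizing and countIgnoringCase are hypothetical. The sketch only shows where a dchar search succeeds, where it needs normalization first, and how the case-insensitive count from the table behaves on one French example. It makes no claim about locale-dependent cases such as Turkish.

```d
import std.algorithm.searching : canFind, count;
import std.uni : normalize, toLower;

// Hypothetical helper: normalize first (NFC is normalize's default form), then search.
bool containsAfterNormalizing(string s, dchar needle)
{
    return normalize(s).canFind(needle);
}

// Hypothetical helper wrapping the case-insensitive count from the table.
size_t countIgnoringCase(string haystack, string needle)
{
    return haystack.count!((a, b) => toLower(a) == toLower(b))(needle);
}

void main()
{
    dchar eAcute = '\u00E9';           // precomposed 'é'
    string precomposed = "caf\u00E9";  // 'é' as a single code point
    string decomposed  = "cafe\u0301"; // 'e' followed by a combining acute accent

    assert(precomposed.canFind(eAcute));  // found directly
    assert(!decomposed.canFind(eAcute));  // missed: combining mark, not normalized
    assert(containsAfterNormalizing(decomposed, eAcute)); // found after NFC

    // "Été" contains 'É' and 'é'; both match "é" when compared via toLower.
    assert(countIgnoringCase("\u00C9t\u00E9", "\u00E9") == 2);
}
```

Normalizing allocates a new string when the input is not already in the requested form, so this is a correctness fix and not a performance one.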