Monday, May 16, 2011

String Equality and Performance in C#

Strings in .Net are one of the more used primitives but are also the most problematic performance wise.  In college and while learning through books, etc. , we’re told that to concatenate two strings, use the + operator.  The + operator seems to become habit fairly quickly and sticks with us, and we wonder why our programs are taking seconds between operations and eating megs of memory.  It’s only after running a performance analyzer we realize the culprit – the old standby + operator which is taking seconds to concat two strings.  We learn through that that we can increase performance and reduce memory usage by using StringBuilder and its Append method instead and bring seconds down to milliseconds and drop megs off the memory footprint.  While this may not be the fastest method out there, it sure beats the + operator that we get into the habit of using.
We’re also told to use the == operator (or .equals in Java) for string equality.  Is == the best operator to use when doing string equality operations?  And, is there a fast case-insensitive way to do string equality?
As usual, there’s more than one way to do a task in .Net and string equality is no exception.  I counted at least 24 different ways using the .Net built-in methods.  A lot of the different ways come from doing string comparisons and doing the comparisons based on culture (Windows-installed, current, etc) or on doing a binary comparison…speed is also determined based on whether you’re doing a culture-specific comparison or doing a simple binary comparison (i.e. via ASCII/Unicode-table).  Sting equality is not exactly the same as string comparisons so I’m ignoring culture in this post, with the exception of which performs the best given the circumstances.
Grouping the equality operations together without their culture options, there are 6.  The == operator, .equals, .CompareTo, the static methods string.equals, string.Compare, and string.CompareOrdinal.
Since, in general, string equality operations are very fast when done as a single operation, I’ve ran tests based on 6,250,000 (2,5002) string equality tests.  For these tests, I have a routine to create 2,500 randomly-generated strings.  I tested with string lengths between 1 and 1k (1024).
Ok, so the answer to which is faster – well, guess what, it depends.  [nothing is ever as easy as it seems, is it?].  The answer is different depending on if you’re dealing with same-length strings vs different length strings, and on whether you’re wanting to ignore case or not.  There are sure to be other reasons for the speed difference as well but here’re my results.  I’ll do the comparison based on different vs same length strings – just use the best function listed for whether to ignore the case or not.

DIFFERENT Length Strings

.equals and string.equals seem to be pretty much tied for 1st and 2nd place with time around .09 seconds (the .Net timer object I was using for the time calculation has an accuracy to .001 seconds).  Both of these are using ordinal ignore case comparison.
Here’s the other results for comparison:
.equals (ordinal ignore case) .097
string.equals (ordinal ignore case) .098
.equals (ordinal) .120
.equals .121
string.equals (ordinal) .127
== .135
string.CompareOrdinal .239
.equals (invariant culture) .627
.equals (invariant ignore case) .635
string.compare (ignore case) .658
.equals (current culture ignore case) .659
string.compare .660
string.equals (current culture ignore case) .673
string.equals (invariant ignore case) .676
string.equals (current culture) .682
.CompareTo .684
.equals (current culture) .689
== (w/ ToLowerInvariant inline) 24.153 (yes, seconds)
== (w/ ToUpper inline) 25.071
== (w/ ToUpper external) 25.029
== (w/ ToLower inline) 25.504
== (w/ ToLower external) 25.730

 

SAME Length Strings

For same length strings, things are slightly different – string.equals (ordinal) and .equals (ordinal) are at the top, not the ordinal ignore case.  Here’s the rundown for same length strings.  I’ve omitted the ToLower/ToUpper results as they were just horrible in performance.
string.Equals (ordinal) .124
.equals (ordinal) .128
string.compareOrdinal .132
.equals .133
== .146
.equals (ordinal ignore case) .213
string.Equals (ordinal ignore case) .230
string.Equals (invariant culture ignore case) .577
string.Equals (invariant culture) .578
.equals (invariant culture ignore case) .611
string.Equals (current culture ignore case) .620
.equals (current culture) .622
string.Equals (current culture) .630
.CompareTo .634
.equals (invariant culture) .636
string.compare (ignore case) .638
string.compare .686

 

Code

Ok, so you don’t want to look up how to do an ordinal ignore case equality against two strings…well, here’s the code snipets.  Variables str1 and str2 are strings.

==

This one is easy – not much explanation needed.
1: if (str1 == str2)


.Equals


String object method – similar to Java’s equals method.
1: if (str1.Equals(str2))


string.Compare


One of the static string compare functions.  Returns 0 if strings are equal.  Several options available.
1: if (string.Compare(str1, str2) == 0)


string.CompareOrdinal


Another static string compare function – compares using binary comparison instead of text so it uses the ASCII/Unicode table for its comparison.
1: if (string.CompareOrdinal(str1, str2) == 0)


string.Compare (ignore case)


An overload to string.Compare that ignores case of the characters.
1: if (string.Compare(str1, str2,true) == 0)


string.Equals, .Equals (current culture)


Compares strings using the current machine/user’s culture).  Overload of the equals method.
1: if (string.Equals(str1, str2,StringComparison.CurrentCulture))
1: if (str1.Equals(str2,StringComparison.CurrentCulture))
string.Equals, .Equals (current culture ignore case)
Same as equals current culture – just ignores case
1: if (string.Equals(str1, str2, StringComparison.CurrentCultureIgnoreCase))
1: if (str1.Equals(str2, StringComparison.CurrentCultureIgnoreCase))


string.Equals, .Equals (invariant culture)


Compares based on the installed Windows culture.
1: if (string.Equals(str1, str2, StringComparison.InvariantCulture))
1: if (str1.Equals(str2, StringComparison.InvariantCulture))


string.Equals, .Equals (invariant culture ignore case)


Ignores case but compares on the currently installed Windows culture.
1: if (string.Equals(str1, str2, StringComparison.InvariantCultureIgnoreCase))
1: if (str1.Equals(str2, StringComparison.InvariantCultureIgnoreCase))


string.Equals, .Equals (ordinal)


Performs binary (ASCII/Unicode) comparison.
1: if (string.Equals(str1, str2, StringComparison.Ordinal))
1: if (str1.Equals(str2, StringComparison.Ordinal))


string.Equals, .Equals (ordinal ignore case)


Binary comparison but ignores case.
1: if (string.Equals(str1, str2, StringComparison.OrdinalIgnoreCase))
1: if (str1.Equals(str2, StringComparison.OrdinalIgnoreCase))


.CompareTo


String object compare to method – similar to string compare method except no overloads other than comparing strings to objects.
1: if (str1.CompareTo(str2) == 0)


== (w/ ToLower), == (w/ ToUpper), == (ToLowerInvariant)


Sadly, this is how I’ve done case insensitive searches for years (even in .Net) thanks for my many years programming in BASIC and QuickBasic – neither of which had the many ways to do string equality that .Net has.  This is also, sadly, one of the slowest ways to do string case insensitive equality.  The true cost here is the ToLower case conversion, which is just slow.  This, just like the + operator for strings, is one of those “don’t do these” features of strings.  If you’re needing to do case insensitive equality, use something else…
1: if (str1.ToLower()==str2.ToLower())
1: if (str1.ToUpper() == str2.ToUpper())
1: if (str1.ToLowerInvariant() == str2.ToLowerInvariant())


== (w/ ToLower external), == (w/ ToUpper external)


Essentially the only difference here is ToLower is pulled outside of the equality – so essentially the case conversion is done elsewhere…still takes a LOT of time to do the conversion on top of the equality.
1: string a = str1.ToUpper();
2: string b = str2.ToUpper();
3: if (a == b)


Conclusion


Well, since most strings in .Net (and Java) are different length strings, it’s best to use .equals over the == operator.  If the program does not care about case nor culture comparisons, adding in the case insensitive ordinal comparison also helps as well.  Of course, the performance is minor in comparison to the + operator, but, if you’re dealing with millions (or billions) of string equalities, every millisecond gained helps.  Definitely avoid use of ToLower or ToUpper to do comparisons/equality.

No comments:

Post a Comment