I used to do small functions at the time, trying to find ways to refactor them (I recently read Martin Fowler's book Refactoring: Improving the Design of Existing Code ). I discovered the following MakeNiceString() function, updating another part of the code base next to it, and it looked like a good candidate to talk to. Be that as it may, there is no real reason to replace it, but it is small enough and does something small, so itβs easy to follow and still get a βgoodβ experience.
private static string MakeNiceString(string str) { char[] ca = str.ToCharArray(); string result = null; int i = 0; result += System.Convert.ToString(ca[0]); for (i = 1; i <= ca.Length - 1; i++) { if (!(char.IsLower(ca[i]))) { result += " "; } result += System.Convert.ToString(ca[i]); } return result; } static string SplitCamelCase(string str) { string[] temp = Regex.Split(str, @"(?<!^)(?=[AZ])"); string result = String.Join(" ", temp); return result; }
The first MakeNiceString() function is the function that I found in some code that I updated at work. The purpose of this function is to translate ThisIsAString to an It string . It was used in half a dozen places in the code, and it is pretty minor in the whole scheme of things.
I built the second function only as an academic exercise to see if using regular expressions will last longer or not.
Well, here are the results:
With 10 iterations:
MakeNiceString took 2649 ticks
SplitCamelCase took 2502 ticks
However, it varies greatly along the long course:
With 10,000 iterations:
MakeNiceString took 121625 ticks
SplitCamelCase took 443001 ticks
Refactoring MakeNiceString()
The refactoring process of MakeNiceString() began with a simple removal of the transformations that took place. This gave the following results:
MakeNiceString took 124716 ticks
ImprovedMakeNiceString took 118486
Here is the code after Refactor # 1:
private static string ImprovedMakeNiceString(string str) {
Reactor # 2 - Use StringBuilder
My second task was to use StringBuilder instead of String . Because String immutable, unnecessary copies are created during the loop. The test for using this is below, as well as the code:
static string RefactoredMakeNiceString(string str) { char[] ca = str.ToCharArray(); StringBuilder sb = new StringBuilder((str.Length * 5 / 4)); int i = 0; sb.Append(ca[0]); for (i = 1; i <= ca.Length - 1; i++) { if (!(char.IsLower(ca[i]))) { sb.Append(" "); } sb.Append(ca[i]); } return sb.ToString(); }
This leads to the following benchmark:
MakeNiceString Took: 124497 Ticks // Original
SplitCamelCase Took: 464459 Ticks // Regex
ImprovedMakeNiceString Took: 117369 Ticks // Remove Conversion
RefactoredMakeNiceString Took: 38542 Ticks // Using StringBuilder
Changing the for loop in the foreach resulted in the following test result:
static string RefactoredForEachMakeNiceString(string str) { char[] ca = str.ToCharArray(); StringBuilder sb1 = new StringBuilder((str.Length * 5 / 4)); sb1.Append(ca[0]); foreach (char c in ca) { if (!(char.IsLower(c))) { sb1.Append(" "); } sb1.Append(c); } return sb1.ToString(); }
RefactoredForEachMakeNiceString Took: 45163 Ticks
As you can see, maintenance, the foreach cycle will be the easiest to maintain and have the βcleanestβ look. It is slightly slower than the for loop, but infinitely easier to follow.
Alternative refactoring: use compiled Regex
I moved Regex right before the start of the loop, hoping that since it only compiles it once, it will run faster. What I found out (and I'm sure I have a mistake somewhere) is that this is not the way it should:
static void runTest5() { Regex rg = new Regex(@"(?<!^)(?=[AZ])", RegexOptions.Compiled); for (int i = 0; i < 10000; i++) { CompiledRegex(rg, myString); } } static string CompiledRegex(Regex regex, string str) { string result = null; Regex rg1 = regex; string[] temp = rg1.Split(str); result = String.Join(" ", temp); return result; }
The final test results:
MakeNiceString Took 139363 Ticks
SplitCamelCase Took 489174 Ticks
ImprovedMakeNiceString Took 115478 Ticks
RefactoredMakeNiceString Took 38819 Ticks
RefactoredForEachMakeNiceString Took 44700 Ticks
CompiledRegex Took 227021 Ticks
Or, if you prefer milliseconds:
MakeNiceString Took 38 ms
SplitCamelCase Took 123 ms
ImprovedMakeNiceString Took 33 ms
RefactoredMakeNiceString Took 11 ms
RefactoredForEachMakeNiceString Took 12 ms
CompiledRegex Took 63 ms
Thus, interest income:
MakeNiceString 38 ms Baseline
SplitCamelCase 123 ms 223% slower
ImprovedMakeNiceString 33 ms 13.15% faster
RefactoredMakeNiceString 11 ms 71.05% faster
RefactoredForEachMakeNiceString 12 ms 68.42% faster
CompiledRegex 63 ms 65.79% slower
(Please check my math)
In the end, I'm going to replace what is there with RefactoredForEachMakeNiceString() , and while I'm in it, I will rename it something useful, like SplitStringOnUpperCase .
Performance test:
For comparison, I just call a new Stopwatch for each method call:
string myString = "ThisIsAUpperCaseString"; Stopwatch sw = new Stopwatch(); sw.Start(); runTest(); sw.Stop(); static void runTest() { for (int i = 0; i < 10000; i++) { MakeNiceString(myString); } }
Questions
- What makes these functions be so "long" as well
- How can I improve this feature a) be more maintainable or b) work faster?
- How would I do memory tests on them to find out which one has less memory?
Thank you for your responses. I added all the suggestions made by @Jon Skeet, and would like to receive feedback on the updated questions that I asked as a result.
NB . This question is intended to learn how to refactor string processing functions in C #. I copied / pasted the first as is code. I know well that you can remove System.Convert.ToString() in the first method, and I did just that. If anyone is aware of any consequences of removing System.Convert.ToString() , this would also be helpful to know.