Improve your code: Regex creation is expensive
December 16, 2008 12:07 pm .Net, Improve Your Code
One more Improve your code, this time for an issue that I’ve found in every .Net project I’ve ever worked on that used Regex(es): people instantiating them too often.
I don’t remember a single project where I’ve seen them used properly (from the code-usage perspective, not from the Regular Expression perspective).
Before a recommendation, it’s worth noting this critical piece of information from the MSDN documentation:
Thread Safety: The Regex class is immutable (read-only) and is inherently thread safe. Regex objects can be created on any thread and shared between threads [...]
Yes, you can create one Regex and use it as many times as you want without issues.
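As a rough illustration of that thread-safety guarantee, here is a minimal sketch in which several threads call IsMatch on the same instance. The class name SharedRegexDemo, the method CountMatches and the sample inputs are made up for this example; Parallel.ForEach needs a newer framework than the one this post was written against.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of any library.
public static class SharedRegexDemo
{
    // One instance, created once and shared by every thread.
    private static readonly Regex Digits = new Regex(@"^\d{13,19}$");

    // Counts how many inputs match, calling IsMatch from several threads at once.
    public static int CountMatches(string[] inputs)
    {
        int matches = 0;
        Parallel.ForEach(inputs, input =>
        {
            if (Digits.IsMatch(input))
            {
                // The counter is shared state, so it needs an atomic increment;
                // the Regex itself needs no locking.
                Interlocked.Increment(ref matches);
            }
        });
        return matches;
    }

    public static void Main()
    {
        string[] inputs = { "4111111111111111", "not a number", "1234567890123" };
        Console.WriteLine(CountMatches(inputs));
    }
}
```

The only synchronisation in the sketch is around the counter; nothing guards the Regex.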
Issue: Creating Regex instances is very expensive
The pattern has to be parsed, a full execution tree has to be built and lots of code generated under the covers. Then you use it once and it’s left hanging in memory for a long time.
Thus, this is very expensive and wrong:
Regex regex = new Regex(@"^\d{13,19}$");
It’s even worse when it’s created inside a for-loop, for example, or multiple times in a page.
Recommendation
The proper way to initialise your Regexes for the best performance is to declare them at class level as static read-only, with the compiled flag set.
Like this:
private static readonly Regex valueFormatMatch = new Regex(@"(\[*\])", RegexOptions.Compiled);
Why:
- Make it static: so you always have access to it. It’s thread safe, so it’s OK to have it static.
- Make it read-only: so nobody can reassign it halfway through the run, plus you help the JIT optimizer.
- If the expression is complex, flag it as RegexOptions.Compiled: the parse tree is compiled to MSIL rather than interpreted, which should make matching faster.
- Note: from personal experience, I’ve noticed that this only works better if you have a complex expression. For simple expressions the version without Compiled seems to be slightly faster.
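To put the recommendation in context, here is a minimal sketch of a class that holds its Regex in a static read-only field and reuses it on every call. The names CardNumberValidator and IsValidFormat are invented for this example; the pattern is the one from the "wrong" snippet earlier in the post. The Compiled flag is left off here because the expression is simple.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical validator, used only to illustrate the static read-only pattern.
public static class CardNumberValidator
{
    // Parsed once, when the class is first used, then reused by every call.
    private static readonly Regex CardNumberFormat = new Regex(@"^\d{13,19}$");

    // Returns true when the value is made up of 13 to 19 digits.
    public static bool IsValidFormat(string value)
    {
        return value != null && CardNumberFormat.IsMatch(value);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidFormat("4111111111111111")); // 16 digits: matches
        Console.WriteLine(IsValidFormat("1234"));             // too short: no match
    }
}
```

However many times IsValidFormat is called, the pattern is only parsed once.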
Running some tests
For fun, I’ve put together a small performance test that runs a simple Regex over several strings:
Test 1: Static readonly Regex with compiled flag
// Number of loop iterations used by every test (10000 in the results below).
private const int _iterations = 10000;

private static readonly Regex valueFormatMatch = new Regex(@"(\[*\])", RegexOptions.Compiled);

private static void Test1()
{
    Stopwatch s1 = new Stopwatch();
    s1.Start();
    for (int i = 0; i < _iterations; i++)
    {
        // Reuse the single static instance on every iteration.
        valueFormatMatch.IsMatch("123:04");
        valueFormatMatch.IsMatch("23:34:56");
        valueFormatMatch.IsMatch("12345678");
    }
    s1.Stop();
    Console.WriteLine("Test1: " + s1.ElapsedMilliseconds);
}
Test 2: Creating the Regex inside the for-loop
private static void Test2()
{
    Stopwatch s1 = new Stopwatch();
    s1.Start();
    for (int i = 0; i < _iterations; i++)
    {
        Regex test = new Regex(@"(\[*\])");
        test.IsMatch("123:04");
        test.IsMatch("23:34:56");
        test.IsMatch("12345678");
    }
    s1.Stop();
    Console.WriteLine("Test2: " + s1.ElapsedMilliseconds);
    GC.Collect();
    GC.Collect();
}
Test 3: Creating the Regex inside the for-loop with the Compiled flag set
private static void Test3()
{
    Stopwatch s1 = new Stopwatch();
    s1.Start();
    for (int i = 0; i < _iterations; i++)
    {
        Regex test = new Regex(@"(\[*\])", RegexOptions.Compiled);
        test.IsMatch("123:04");
        test.IsMatch("23:34:56");
        test.IsMatch("12345678");
    }
    s1.Stop();
    Console.WriteLine("Test3: " + s1.ElapsedMilliseconds);
    GC.Collect();
    GC.Collect();
}
Performance results over 10000 iterations:
- Test 1: Static readonly Regex with compiled flag: 10ms
- Test 2: Creating the Regex inside the for-loop: 153ms
- Test 3: Creating the Regex inside the for-loop with the Compiled flag set: 13725ms
So, quite clearly the static-readonly Regex is your best option.
All Test 3 proves is that compiling a Regex is very expensive. I like the idea and I apply it, but the Compiled flag is not really required. Just make sure you don’t have Regexes created everywhere throughout your code and you’ll be OK.
December 17th, 2008 at 2:19 am
Blogged response: http://pentonizer.com/csharp/regex-precompilation/
December 17th, 2008 at 11:29 pm
You missed the obvious test case: Creating the Regex outside the for loop but not static. Of course creating an object 10000 times is going to be a bit more expensive than creating it once.
And besides, how often are you going to create a Regex 10000 times? It doesn’t seem to make a human-noticeable difference even if you mistakenly stick it in the loop, but the question remains.
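The missing test case described in this comment could be sketched roughly as follows: the Regex is created once, as a plain local, and reused inside the loop. The class name LocalRegexTest, the method Run and the "Test4" label are made up here; the pattern, inputs and iteration count are the ones from the post. No timings are quoted because none were measured for this variant.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical fourth test: one non-static instance created outside the loop.
public static class LocalRegexTest
{
    private const int Iterations = 10000; // same count as the post's results

    // Returns the elapsed milliseconds, including the single construction.
    public static long Run()
    {
        Stopwatch s1 = Stopwatch.StartNew();
        Regex test = new Regex(@"(\[*\])"); // constructed exactly once
        for (int i = 0; i < Iterations; i++)
        {
            test.IsMatch("123:04");
            test.IsMatch("23:34:56");
            test.IsMatch("12345678");
        }
        s1.Stop();
        return s1.ElapsedMilliseconds;
    }

    public static void Main()
    {
        Console.WriteLine("Test4: " + Run());
    }
}
```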
December 22nd, 2008 at 8:11 pm
You should repeat the tests using the Regex static functions in addition to testing versus precompiled regexes. Consider using the Regex static methods instead of using RegexOptions.Compiled.
I too was of your opinion until I read about the improvements since .NET 2.0: the static methods now cache the parsed regex and are just about as fast. The implication is that, once cached, a regex no longer has to be rebuilt (assuming it hasn’t been pushed out of the cache).
IIRC the current thinking is to use the static functions, because they’re efficient, generally have a lower memory overhead, and their memory is reclaimed if they’re no longer needed (as opposed to regexes loaded statically). If you do determine that your regexes need to be compiled, you should go the extra step and precompile them into an assembly. See “C# Regular Expression Recipes: Compiling Regular Expressions”.
Also, a second point: you should be measuring the time it takes to compile the regex, and the differing memory footprints. You make light of the fact that compiling the regex is slow, but at the same time try to show that Test 2 is a factor of 10 slower than Test 1. Additionally, if you take the creation of the regex outside of the for loop, how do the results differ? Further, you should compare simple regexes to complicated ones to get a better feel for how they really compare.
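A rough sketch of the two measurements suggested here: the static Regex.IsMatch overload (which relies on the internal cache, whose size is controlled by Regex.CacheSize) and the cost of a single construction with and without the Compiled flag. The class and method names are invented for this example, and no numbers are quoted because none were measured.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical harness for the static-method and construction-cost measurements.
public static class StaticMethodTest
{
    private const int Iterations = 10000;       // same count as the post's results
    private const string Pattern = @"(\[*\])";  // same pattern as the post's tests

    // Times the static overload; no Regex instance is created by the caller.
    public static long TimeStaticIsMatch()
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            Regex.IsMatch("123:04", Pattern);
            Regex.IsMatch("23:34:56", Pattern);
            Regex.IsMatch("12345678", Pattern);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Times one construction on its own, in Stopwatch ticks.
    public static long TimeConstruction(RegexOptions options)
    {
        Stopwatch watch = Stopwatch.StartNew();
        Regex regex = new Regex(Pattern, options);
        watch.Stop();
        GC.KeepAlive(regex); // keep the instance alive until timing is done
        return watch.ElapsedTicks;
    }

    public static void Main()
    {
        Console.WriteLine("Static IsMatch (ms): " + TimeStaticIsMatch());
        Console.WriteLine("Construct, interpreted (ticks): " + TimeConstruction(RegexOptions.None));
        Console.WriteLine("Construct, compiled (ticks): " + TimeConstruction(RegexOptions.Compiled));
    }
}
```

Memory footprint is not covered by this sketch; that would need a profiler or GC counters.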
January 6th, 2009 at 3:09 am
[...] Creating Regex objects is expensive, so it’s better to instantiate them once as static members. Using RegexOptions.Compiled is also recommended, as is reading Jeff Atwood’s post about it. [...]
February 17th, 2009 at 12:58 pm
very nice….