Improve your code: Regex creation is expensive
December 16, 2008 | .Net, Improve Your Code
One more Improve Your Code entry, for an issue that I’ve found in every .Net project I’ve ever worked on that used Regex(es): people instantiate them too often.
I don’t remember a single project where I’ve seen them used properly (from the code-usage perspective not from the Regular Expression perspective).
Before the recommendation, it’s worth noting this critical piece of information from the MSDN documentation:
Thread Safety: The Regex class is immutable (read-only) and is inherently thread safe. Regex objects can be created on any thread and shared between threads […]
Yes, you can create one Regex and use it as many times as you want without issues.
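To make the thread-safety point concrete, here is a minimal sketch in which several threads share a single Regex instance. The class name `SharedRegexDemo`, the `CountMatches` helper and the sample inputs are made up for illustration, and it assumes a framework version that ships `Parallel.For` (.Net 4 or later).

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class: one Regex instance shared by all worker threads.
public static class SharedRegexDemo
{
    // Created once; every thread below reads from this same instance.
    private static readonly Regex Digits = new Regex(@"^\d+$");

    public static int CountMatches(string[] inputs)
    {
        int matches = 0;

        // Iterations may run on different threads concurrently.
        Parallel.For(0, inputs.Length, i =>
        {
            if (Digits.IsMatch(inputs[i]))
            {
                // The counter needs synchronisation; the Regex itself does not.
                Interlocked.Increment(ref matches);
            }
        });

        return matches;
    }

    public static void Main()
    {
        string[] inputs = { "12345", "abc", "987", "12a" };
        Console.WriteLine(CountMatches(inputs)); // 2
    }
}
```

Only the shared counter is protected here; the matching calls need no lock because the Regex is never modified after construction.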
Issue: Creating Regex instances is very expensive
The pattern has to be parsed, a full execution tree has to be built and a lot of code is generated under the covers. Then you use it once and it’s left hanging in memory for a long time.
Thus, this is very expensive and wrong:
Regex regex = new Regex(@"^\d{13,19}$");
It’s even worse when this happens inside a for-loop, for example, or multiple times in a page.
Recommendation
The proper way to initialise a Regex for the best performance is to declare it at class level as static readonly, with the compiled flag set.
Like this:
private static readonly Regex valueFormatMatch = new Regex(@"(\[*\])", RegexOptions.Compiled);
Why:
- Make it static: so you always have access to it. It’s thread safe so it’s ok to have it static.
- Make it read-only: so you avoid someone changing it halfway through the run, plus you help the JIT optimizer.
- If the expression is complex, flag it as RegexOptions.Compiled: the expression is compiled to MSIL instead of being interpreted, which should make matching faster at the expense of a slower construction.
- Note: from personal experience I’ve noticed that this only works better if you have a complex expression. For simple expressions the version without Compiled seems to be slightly faster.
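Put together, the recommendation could look like the sketch below. It reuses the `^\d{13,19}$` pattern from the "wrong" example above; the class name `CardNumberValidator`, the `IsValid` method and the sample inputs are hypothetical. The Compiled flag is left off on the assumption that a pattern this simple does not benefit from it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical validator: the Regex is built once and reused by every call.
public static class CardNumberValidator
{
    // static: one instance for the whole application.
    // readonly: nobody can swap it for another instance at runtime.
    private static readonly Regex CardNumberFormat = new Regex(@"^\d{13,19}$");

    public static bool IsValid(string input)
    {
        // No Regex is constructed here, only matched.
        return input != null && CardNumberFormat.IsMatch(input);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(CardNumberValidator.IsValid("4111111111111111")); // True
        Console.WriteLine(CardNumberValidator.IsValid("1234"));             // False
    }
}
```

Callers pay the parsing cost once, the first time the class is used, no matter how often `IsValid` is called afterwards.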
Running some tests
For fun, I’ve put together a small performance test that runs a simple Regex over several strings:
Test 1: Static readonly Regex with compiled flag
private static readonly Regex valueFormatMatch = new Regex(@"(\[*\])", RegexOptions.Compiled);

private static void Test1()
{
    Stopwatch s1 = new Stopwatch();
    s1.Start();
    for (int i = 0; i < _iterations; i++)
    {
        // Reuse the single shared, compiled instance.
        valueFormatMatch.IsMatch("123:04");
        valueFormatMatch.IsMatch("23:34:56");
        valueFormatMatch.IsMatch("12345678");
    }
    s1.Stop();
    Console.WriteLine("Test1: " + s1.ElapsedMilliseconds);
}
Test 2: Creating the Regex inside the for-loop
private static void Test2()
{
    Stopwatch s1 = new Stopwatch();
    s1.Start();
    for (int i = 0; i < _iterations; i++)
    {
        // A new (interpreted) Regex is built on every iteration.
        Regex test = new Regex(@"(\[*\])");
        test.IsMatch("123:04");
        test.IsMatch("23:34:56");
        test.IsMatch("12345678");
    }
    s1.Stop();
    Console.WriteLine("Test2: " + s1.ElapsedMilliseconds);
    GC.Collect();
    GC.Collect();
}
Test 3: Creating the Regex inside the for-loop with the Compiled flag set
private static void Test3()
{
    Stopwatch s1 = new Stopwatch();
    s1.Start();
    for (int i = 0; i < _iterations; i++)
    {
        // A new Regex is built and compiled on every iteration.
        Regex test = new Regex(@"(\[*\])", RegexOptions.Compiled);
        test.IsMatch("123:04");
        test.IsMatch("23:34:56");
        test.IsMatch("12345678");
    }
    s1.Stop();
    Console.WriteLine("Test3: " + s1.ElapsedMilliseconds);
    GC.Collect();
    GC.Collect();
}
Performance results over 10000 iterations:
- Test 1: Static readonly Regex with compiled flag: 10ms
- Test 2: Creating the Regex inside the for-loop: 153ms
- Test 3: Creating the Regex inside the for-loop with the Compiled flag set: 13725ms
So, quite clearly the static-readonly Regex is your best option.
All that test 3 proves is that compiling a Regex is very expensive. I like the idea and I apply it, but the compiled flag is not really required. Just make sure you don’t have Regexes created everywhere throughout your code and you’ll be OK.