Python's re library lets you use regular expressions (regexp). With them you can write some string checks that would be a bit more complex in plain Python.

Let's check whether regexp is really a bottleneck that can underperform in some edge cases!

To The Point

Regexp example.

Let's say that we have a string that we know should contain exactly 16 letters or digits.

With regexp we can check whether the string matches our expectations using this code:

import re
string1 = "Test"
regexp = '^[a-zA-Z0-9]{16}$'
pattern = re.compile(regexp)
print(pattern.match(string1))

This prints "None", meaning that string1 does not match the pattern.
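To get a feel for what the pattern accepts, here is a minimal sketch with a few sample inputs. The samples and the comparison with re.fullmatch are illustrative additions, not part of the original script. One detail worth knowing: with match, '$' also matches just before a trailing newline, so a 16-character string followed by "\n" still passes, while re.fullmatch rejects it.

```python
import re

pattern = re.compile(r'^[a-zA-Z0-9]{16}$')

# Sample inputs chosen for illustration only.
samples = ["Test", "YetAnotherStr123", "YetAnotherStr12*", "YetAnotherStr123\n"]

for sample in samples:
    # match() anchors at the start; '$' anchors at the end (or before a final newline).
    matched = pattern.match(sample) is not None
    # fullmatch() requires the whole string to match, so the '$' quirk does not apply.
    strict = re.fullmatch(r'[a-zA-Z0-9]{16}', sample) is not None
    print(repr(sample), matched, strict)
```

If a trailing newline must be rejected, fullmatch (or '\Z' instead of '$') is probably the safer choice.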

Regexp performance

Let's now check how to measure its performance.

First we need to turn our script into a function:

import re
string1 = "Test"

def check_regexp(string_variable):
    regexp = '^[a-zA-Z0-9]{16}$'
    pattern = re.compile(regexp)
    return pattern.match(string_variable)

print(check_regexp(string1))

Now the timeit library comes into play.

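import timeit
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(8)))", setup="from __main__ import check_regexp", number=100000)

It returns the total time, in seconds, taken by 100000 runs of the statement (building the string and calling the function).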
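As a variation, here is a minimal sketch that builds the input once, outside the timed statement, so that only the function call is measured. It assumes Python 3.5+ for the globals parameter; the names sample, timings and best are illustrative.

```python
import re
import timeit

def check_regexp(string_variable):
    regexp = '^[a-zA-Z0-9]{16}$'
    pattern = re.compile(regexp)
    return pattern.match(string_variable)

# Build the input once so that the join is not part of the measurement.
sample = ''.join(str(n) for n in range(8))

# globals=globals() lets the timed statement see check_regexp and sample,
# replacing the "from __main__ import ..." setup string.
# repeat() returns one total time (in seconds) per run.
timings = timeit.repeat("check_regexp(sample)", globals=globals(), repeat=3, number=100000)

# The minimum is usually the least disturbed by other activity on the machine.
best = min(timings)
print(best)
```

Whether to time the string building or not depends on what you want to compare; the point is only to be aware of what the timed statement contains.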

Let's compare the performance of the plain-Python way and the regexp way:

import re
import timeit
import string
string1 = "Test"
regexp = '^[a-zA-Z0-9]{16}$'

def check_regexp(string_variable):
    pattern = re.compile(regexp)
    return pattern.match(string_variable)

def check_python(string_variable):
    if not len(str(string_variable)) == 16:
        return False
    alphabet = string.ascii_letters + string.digits
    for char in string_variable:
        if char not in alphabet:
            return False
    return True

time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(8)))", setup="from __main__ import check_regexp", number=100000)
time_perform2 = timeit.timeit("check_python(''.join(str(n) for n in range(8)))", setup="from __main__ import check_python", number=100000)
print(8 , time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(16)))", setup="from __main__ import check_regexp", number=100000)
time_perform2 = timeit.timeit("check_python(''.join(str(n) for n in range(16)))", setup="from __main__ import check_python", number=100000)
print(16 , time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(32)))", setup="from __main__ import check_regexp", number=100000)
time_perform2 = timeit.timeit("check_python(''.join(str(n) for n in range(32)))", setup="from __main__ import check_python", number=100000)
print(32 , time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(64)))", setup="from __main__ import check_regexp", number=100000)
time_perform2 = timeit.timeit("check_python(''.join(str(n) for n in range(64)))", setup="from __main__ import check_python", number=100000)
print(64 , time_perform, time_perform2)

And we see that the plain-Python function outperforms the regexp. The difference isn't huge, but check the results yourself (range size, regexp time, plain-Python time, both in seconds):

8 0.5771335330209695 0.38084229797823355
16 0.818426341022132 0.6164886929909699
32 1.226193142007105 1.0479984590201639
64 2.1883155060058925 1.9493357640167233
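One thing worth keeping in mind when reading these numbers: joining the numbers from range(n) does not give a string of n characters once n goes past 10, because two-digit numbers contribute two characters each. A quick sketch to check the actual input lengths (the name lengths is illustrative):

```python
# Length of each digit string used as benchmark input above.
lengths = {n: len(''.join(str(i) for i in range(n))) for n in (8, 16, 32, 64)}
print(lengths)  # {8: 8, 16: 22, 32: 54, 64: 118}
```

None of these inputs has exactly 16 characters, so check_python returns at the length test every time and the pattern never matches. The timed statement also includes building the string, which is likely a large part of why both columns grow with the range size.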

That was only on strings of digits. Let's now check how it performs on strings with letters and digits:

import re
import timeit
import string
string1 = "YetAnotherStr123" # has 16 chars
string2 = "YetAnotherStr123"*2 # has 32 chars
string3 = "YetAnotherStr123"*4 # has 64 chars
string4 = "YetAnoth*rStr123" # has 16 chars: 15 valid and one special in the middle
string5 = "YetAnotherStr12*" # has 16 chars: 15 valid and one special at the end
regexp = '^[a-zA-Z0-9]{16}$'

def check_regexp(string_variable):
    pattern = re.compile(regexp)
    return pattern.match(string_variable)

def check_python(string_variable):
    if not len(str(string_variable)) == 16:
        return False
    alphabet = string.ascii_letters + string.digits
    for char in string_variable:
        if char not in alphabet:
            return False
    return True

time_perform = timeit.timeit("check_regexp(string1)", setup="from __main__ import check_regexp, string1", number=100000)
time_perform2 = timeit.timeit("check_python(string1)", setup="from __main__ import check_python, string1", number=100000)
print(time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(string2)", setup="from __main__ import check_regexp, string2", number=100000)
time_perform2 = timeit.timeit("check_python(string2)", setup="from __main__ import check_python, string2", number=100000)
print(time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(string3)", setup="from __main__ import check_regexp, string3", number=100000)
time_perform2 = timeit.timeit("check_python(string3)", setup="from __main__ import check_python, string3", number=100000)
print(time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(string4)", setup="from __main__ import check_regexp, string4", number=100000)
time_perform2 = timeit.timeit("check_python(string4)", setup="from __main__ import check_python, string4", number=100000)
print(time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(string5)", setup="from __main__ import check_regexp, string5", number=100000)
time_perform2 = timeit.timeit("check_python(string5)", setup="from __main__ import check_python, string5", number=100000)
print(time_perform, time_perform2)

Well... now we can see what kind of problems our plain-Python code can have. The results are as follows (regexp time, then plain-Python time, in seconds, for string1 to string5):

0.1520436129940208 0.14696863299468532
0.12985941700753756 0.0332289119833149
0.13141735998215154 0.03479363300721161
0.1242920590157155 0.1090795070049353
0.12702590200933628 0.14172734299791045

It looks like for string5 the plain-Python code underperforms.
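A possible explanation is the amount of work the loop does before it can return. The sketch below counts how many characters check_python would examine for each input; chars_examined is a hypothetical helper written for this illustration and mirrors the logic of check_python.

```python
import string

ALPHABET = string.ascii_letters + string.digits

def chars_examined(string_variable):
    """Count the loop iterations check_python would perform (hypothetical helper)."""
    if len(string_variable) != 16:
        return 0  # rejected by the length test, the loop never runs
    count = 0
    for char in string_variable:
        count += 1
        if char not in ALPHABET:
            break  # first invalid character ends the scan
    return count

inputs = {
    "string1": "YetAnotherStr123",
    "string2": "YetAnotherStr123" * 2,
    "string3": "YetAnotherStr123" * 4,
    "string4": "YetAnoth*rStr123",
    "string5": "YetAnotherStr12*",
}
counts = {name: chars_examined(value) for name, value in inputs.items()}
print(counts)
# {'string1': 16, 'string2': 0, 'string3': 0, 'string4': 9, 'string5': 16}
```

This lines up with the timings: string2 and string3 are rejected immediately, string4 stops partway, while string1 and string5 walk all 16 characters, each with a lookup in a 62-character string.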

For now I don't see any valid solution to this problem.

I will come back to this if I find any valid solution.
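In the meantime, here are a few variants that might be worth timing. This is only a sketch under assumptions: the function names are hypothetical, str.isascii needs Python 3.7+, and no timings are claimed here, so measure them on your own machine before drawing conclusions.

```python
import re
import string

# Compiled once at import time instead of on every call.
PATTERN = re.compile(r'[a-zA-Z0-9]{16}')
ALLOWED = frozenset(string.ascii_letters + string.digits)

def check_regexp_precompiled(s):
    # fullmatch requires the whole string to match the pattern.
    return PATTERN.fullmatch(s) is not None

def check_set(s):
    # Set membership is a hash lookup instead of a scan over 62 characters.
    return len(s) == 16 and ALLOWED.issuperset(s)

def check_isalnum(s):
    # For ASCII-only text, isalnum() is true only for letters and digits.
    return len(s) == 16 and s.isascii() and s.isalnum()

samples = [
    "YetAnotherStr123",
    "YetAnotherStr123" * 2,
    "YetAnotherStr123" * 4,
    "YetAnoth*rStr123",
    "YetAnotherStr12*",
]
for s in samples:
    print(len(s), check_regexp_precompiled(s), check_set(s), check_isalnum(s))
```

All three return a plain bool rather than a match object, which is usually what a validity check needs.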

Acknowledgements

Thanks!

That's it :) Comment, share or don't - up to you.

Any suggestions for what I should blog about? Leave a comment in the box below or poke me on Twitter: @anselmos88.

See you in the next episode! Cheers!


