Python has the re module, which lets you use regular expressions (regexps). With it you can write some string checks that would be a bit more complex to implement in a clean, pythonic way.

Let's check whether regexp really is a bottleneck that can underperform in some edge cases!

To The Point

Regexp example.

Let's say we have a string that should contain exactly 16 characters, each of them an ASCII letter or a digit.

With a regexp we can check whether the string matches our expectations using this code:

import re
string1 = "Test"
# Exactly 16 ASCII letters or digits, anchored at both ends.
regexp = '^[a-zA-Z0-9]{16}$'
pattern = re.compile(regexp)
print(pattern.match(string1))

This prints None, meaning that string1 does not match the pattern: "Test" has only 4 characters instead of 16.
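To see the pattern accept a string as well, here is a small sketch reusing the same pattern; the 16-character sample is just an illustrative value. Because {16} means exactly 16, an "at most 16" rule would need a range quantifier such as {1,16}; the name max16 below is hypothetical, for illustration only.

```python
import re

# Same pattern as above: exactly 16 ASCII letters or digits.
pattern = re.compile(r'^[a-zA-Z0-9]{16}$')

# match() returns a Match object on success and None on failure.
print(pattern.match("Test"))                          # None (4 chars)
print(pattern.match("YetAnotherStr123") is not None)  # True (16 chars)

# Hypothetical variant: {1,16} accepts from 1 up to 16 characters.
max16 = re.compile(r'^[a-zA-Z0-9]{1,16}$')
print(max16.match("Test") is not None)                # True
```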

Performance regexp

Now let's see how to measure its performance.

First we need to turn our script into a function:

import re
string1 = "Test"

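def check_regexp(string_variable):
    regexp = '^[a-zA-Z0-9]{16}$'
    # re caches compiled patterns, so calling compile() on every call
    # mostly costs a cache lookup rather than a full recompilation.
    pattern = re.compile(regexp)
    return pattern.match(string_variable)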

print(check_regexp(string1))

Now the timeit module comes into play:

import timeit
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(8)))", setup="from __main__ import check_regexp", number=100000)
print(time_perform)

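timeit.timeit runs the statement 100,000 times (the number argument) and returns the total time in seconds. The setup string imports check_regexp because the statement runs in its own namespace.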
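If you prefer not to juggle statement strings and setup, timeit.timeit also accepts a callable. Below is a minimal sketch, assuming the same check_regexp function as above and a fixed 8-character input. Unlike the string statement, it times only the call, not the join() that builds the input. It also uses timeit.repeat and takes the minimum of several runs, which the timeit documentation describes as the figure most worth looking at.

```python
import re
import timeit

def check_regexp(string_variable):
    # Same function as above: compile the pattern and try to match it.
    pattern = re.compile(r'^[a-zA-Z0-9]{16}$')
    return pattern.match(string_variable)

# The same 8-character input as ''.join(str(n) for n in range(8)).
sample = "01234567"

# Passing a callable avoids the setup="from __main__ import ..." string.
total = timeit.timeit(lambda: check_regexp(sample), number=100000)
print(total)  # total seconds for 100000 calls

# repeat() returns one total per run; the minimum is the least noisy figure.
runs = timeit.repeat(lambda: check_regexp(sample), repeat=5, number=100000)
print(min(runs))
```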

Let's compare the performance of the clean-pythonic way and the regexp way:

import re
import timeit
import string
string1 = "Test"
regexp = '^[a-zA-Z0-9]{16}$'

def check_regexp(string_variable):
    pattern = re.compile(regexp)
    return pattern.match(string_variable)

def check_python(string_variable):
    if len(string_variable) != 16:
        return False
    alphabet = string.ascii_letters + string.digits
    for char in string_variable:
        if char not in alphabet:
            return False
    return True

# n is the range() argument: the generated inputs are 8, 22, 54 and 118
# characters long, and building them with join() is part of each measurement.
for n in (8, 16, 32, 64):
    stmt = "''.join(str(i) for i in range({}))".format(n)
    time_perform = timeit.timeit("check_regexp({})".format(stmt), setup="from __main__ import check_regexp", number=100000)
    time_perform2 = timeit.timeit("check_python({})".format(stmt), setup="from __main__ import check_python", number=100000)
    print(n, time_perform, time_perform2)

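The clean-pythonic function outperforms the regexp one. The difference is not huge, but check the results yourself. Keep in mind that 8, 16, 32 and 64 are the range() arguments: the generated strings are 8, 22, 54 and 118 characters long, so none of them has exactly 16 characters and check_python returns at its length check without ever entering the loop.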

range(n)  regexp [s]            pythonic [s]   (total for 100,000 calls)
8         0.5771335330209695    0.38084229797823355
16        0.818426341022132     0.6164886929909699
32        1.226193142007105     1.0479984590201639
64        2.1883155060058925    1.9493357640167233
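A quick sketch that rebuilds the benchmark inputs and prints the real length of each generated string:

```python
# Rebuild each benchmark input and record its real length.
lengths = {}
for n in (8, 16, 32, 64):
    generated = ''.join(str(i) for i in range(n))
    lengths[n] = len(generated)
    print(n, len(generated))
# 8 8
# 16 22
# 32 54
# 64 118
```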

So far the inputs consisted of digits only. Let's now check how both functions perform on strings that mix letters and digits:

import re
import timeit
import string
string1 = "YetAnotherStr123" # has 16 chars
string2 = "YetAnotherStr123"*2 # has 32 chars
string3 = "YetAnotherStr123"*4 # has 64 chars
string4 = "YetAnoth*rStr123" # 16 chars: one special char in the middle
string5 = "YetAnotherStr12*" # 16 chars: one special char at the end
regexp = '^[a-zA-Z0-9]{16}$'

def check_regexp(string_variable):
    pattern = re.compile(regexp)
    return pattern.match(string_variable)

def check_python(string_variable):
    if len(string_variable) != 16:
        return False
    alphabet = string.ascii_letters + string.digits
    for char in string_variable:
        if char not in alphabet:
            return False
    return True

for name in ("string1", "string2", "string3", "string4", "string5"):
    time_perform = timeit.timeit("check_regexp({})".format(name), setup="from __main__ import check_regexp, {}".format(name), number=100000)
    time_perform2 = timeit.timeit("check_python({})".format(name), setup="from __main__ import check_python, {}".format(name), number=100000)
    print(time_perform, time_perform2)

Well... now we can see what kind of problem our clean-pythonic code can run into:

input    regexp [s]             pythonic [s]
string1  0.1520436129940208     0.14696863299468532
string2  0.12985941700753756    0.0332289119833149
string3  0.13141735998215154    0.03479363300721161
string4  0.1242920590157155     0.1090795070049353
string5  0.12702590200933628    0.14172734299791045

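In the string5 case the pythonic code underperforms: the only invalid character is the last one, so the loop has to check all 16 characters before it can return False. For string2 and string3, on the other hand, it wins easily, because the length check rejects them before the loop starts.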
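A small sketch with a hypothetical helper, first_invalid_index (not part of the benchmark), shows where check_python's loop stops for the three 16-character test strings:

```python
import string

alphabet = string.ascii_letters + string.digits

def first_invalid_index(text):
    # Hypothetical helper: index of the first character outside the alphabet,
    # or None when every character is a letter or a digit.
    for index, char in enumerate(text):
        if char not in alphabet:
            return index
    return None

for name, text in (("string1", "YetAnotherStr123"),
                   ("string4", "YetAnoth*rStr123"),
                   ("string5", "YetAnotherStr12*")):
    print(name, first_invalid_index(text))
# string1 None
# string4 8
# string5 15
```

For string4 the loop stops at the ninth character; for string5 it walks the whole string, just like for the valid string1.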

For now I don't see a valid solution to this problem.

I will come back to this if I find one.

Acknowledgements

ToThePoint

Thanks!

That's it :) Comment, share or don't - up to you.

Any suggestions for what I should blog about? Post a comment in the box below or poke me on Twitter: @anselmos88.

See you in the next episode! Cheers!

Regexp vs clean python code performance check
