Python's re library lets you use regular expressions (regexp). With them you can write some string checks that would be a bit more complex in the clean-pythonic way.
Let's check whether regexp is really a bottleneck that can underperform in some edge cases!
To The Point
Regexp example.
Let's say that we have a string that we know should contain exactly 16 alphanumeric characters (ASCII letters and digits).
With Regexp we could check if the string matches our expectations using this code:
import re
string1 = "Test"
regexp = '^[a-zA-Z0-9]{16}$'
pattern = re.compile(regexp)
print(pattern.match(string1))
This prints "None", meaning that string1 does not match the pattern: "Test" has only 4 characters, while the pattern requires exactly 16.
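For contrast, here is a minimal sketch of the same check run against a string that really has 16 alphanumeric characters. The helper name is_valid and the sample strings are assumptions made for illustration. It uses fullmatch() instead of match(), because with match() the trailing $ also accepts a string that ends with a newline.

```python
import re

# Same character class as above; fullmatch() makes ^ and $ unnecessary.
PATTERN = re.compile(r'[a-zA-Z0-9]{16}')

def is_valid(text):
    """Return True only if text is exactly 16 ASCII letters or digits."""
    return PATTERN.fullmatch(text) is not None

print(is_valid("Test"))                # too short
print(is_valid("YetAnotherStr123"))    # 16 alphanumeric characters
print(is_valid("YetAnotherStr123\n"))  # trailing newline is rejected

# For comparison: the original pattern with match() accepts the newline case.
print(re.match('^[a-zA-Z0-9]{16}$', "YetAnotherStr123\n") is not None)
```

Whether the trailing-newline case matters depends on where the strings come from, so treat fullmatch() as an option rather than a required change.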
Performance regexp
Let's now check how to measure its performance.
First we need to turn our script into a function:
import re
string1 = "Test"
def check_regexp(string_variable):
    regexp = '^[a-zA-Z0-9]{16}$'
    pattern = re.compile(regexp)
    return pattern.match(string_variable)
print(check_regexp(string1))
Now the timeit library comes into play:
import timeit
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(8)))", setup="from __main__ import check_regexp", number=100000)
It runs the statement 100000 times and returns the total time in seconds.
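A slightly different way to set up the measurement is sketched below. It is only a variation, not the setup used for the numbers in this post: the pattern is compiled once at module level, the test string (the name sample is an assumption) is built outside the timed statement, and timeit.repeat is used so that the best of several runs can be taken.

```python
import re
import timeit

PATTERN = re.compile('^[a-zA-Z0-9]{16}$')  # compiled once, outside the timed call

def check_regexp(string_variable):
    return PATTERN.match(string_variable)

# Built once, so the join is not part of what gets timed.
sample = ''.join(str(n) for n in range(8))

# globals= passes the names straight to timeit, so no "from __main__ import" setup is needed.
runs = timeit.repeat(
    "check_regexp(sample)",
    globals={"check_regexp": check_regexp, "sample": sample},
    number=10000,
    repeat=3,
)
best = min(runs)  # the fastest run is usually the least disturbed by other processes
print(best)
```

The globals parameter is available from Python 3.5 onwards.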
Let's compare the performance of the clean-pythonic way and the regexp way:
import re
import timeit
import string
string1 = "Test"
regexp = '^[a-zA-Z0-9]{16}$'
def check_regexp(string_variable):
    pattern = re.compile(regexp)
    return pattern.match(string_variable)
def check_python(string_variable):
    if not len(str(string_variable)) == 16:
        return False
    alphabet = string.ascii_letters + string.digits
    for char in string_variable:
        if char not in alphabet:
            return False
    return True
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(8)))", setup="from __main__ import check_regexp", number=100000)
time_perform2 = timeit.timeit("check_python(''.join(str(n) for n in range(8)))", setup="from __main__ import check_python", number=100000)
print(8 , time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(16)))", setup="from __main__ import check_regexp", number=100000)
time_perform2 = timeit.timeit("check_python(''.join(str(n) for n in range(16)))", setup="from __main__ import check_python", number=100000)
print(16 , time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(32)))", setup="from __main__ import check_regexp", number=100000)
time_perform2 = timeit.timeit("check_python(''.join(str(n) for n in range(32)))", setup="from __main__ import check_python", number=100000)
print(32 , time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(''.join(str(n) for n in range(64)))", setup="from __main__ import check_regexp", number=100000)
time_perform2 = timeit.timeit("check_python(''.join(str(n) for n in range(64)))", setup="from __main__ import check_python", number=100000)
print(64 , time_perform, time_perform2)
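And we see that the clean-pythonic function outperforms regexp. The difference is not huge, and each timed statement also builds the test string with ''.join(...), so part of the growth comes from that. Check the results yourself (range size, regexp time, clean-pythonic time, in seconds):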
8 0.5771335330209695 0.38084229797823355
16 0.818426341022132 0.6164886929909699
32 1.226193142007105 1.0479984590201639
64 2.1883155060058925 1.9493357640167233
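One detail worth checking is how long the generated strings actually are, since two-digit numbers contribute two characters each. A quick sketch (the variable name lengths is an assumption):

```python
# Lengths of the strings built inside the timed statements above.
lengths = {n: len(''.join(str(i) for i in range(n))) for n in (8, 16, 32, 64)}
print(lengths)  # {8: 8, 16: 22, 32: 54, 64: 118}
```

None of these inputs has exactly 16 characters, so check_python always returns at the length check and the regexp can never match either. The numbers above therefore compare how quickly both versions reject a string of the wrong length.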
Those inputs contained only digits. Let's now check how both versions perform on strings that mix letters and digits:
import re
import timeit
import string
string1 = "YetAnotherStr123" # has 16 chars
string2 = "YetAnotherStr123"*2 # has 32 chars
string3 = "YetAnotherStr123"*4 # has 64 chars
string4 = "YetAnoth*rStr123" # has 15 chars and one special
string5 = "YetAnotherStr12*" # has 15 chars and one special
regexp = '^[a-zA-Z0-9]{16}$'
def check_regexp(string_variable):
    pattern = re.compile(regexp)
    return pattern.match(string_variable)
def check_python(string_variable):
    if not len(str(string_variable)) == 16:
        return False
    alphabet = string.ascii_letters + string.digits
    for char in string_variable:
        if char not in alphabet:
            return False
    return True
time_perform = timeit.timeit("check_regexp(string1)", setup="from __main__ import check_regexp, string1", number=100000)
time_perform2 = timeit.timeit("check_python(string1)", setup="from __main__ import check_python, string1", number=100000)
print(time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(string2)", setup="from __main__ import check_regexp, string2", number=100000)
time_perform2 = timeit.timeit("check_python(string2)", setup="from __main__ import check_python, string2", number=100000)
print(time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(string3)", setup="from __main__ import check_regexp, string3", number=100000)
time_perform2 = timeit.timeit("check_python(string3)", setup="from __main__ import check_python, string3", number=100000)
print(time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(string4)", setup="from __main__ import check_regexp, string4", number=100000)
time_perform2 = timeit.timeit("check_python(string4)", setup="from __main__ import check_python, string4", number=100000)
print(time_perform, time_perform2)
time_perform = timeit.timeit("check_regexp(string5)", setup="from __main__ import check_regexp, string5", number=100000)
time_perform2 = timeit.timeit("check_python(string5)", setup="from __main__ import check_python, string5", number=100000)
print(time_perform, time_perform2)
Well... now we can see what kind of problems our clean-pythonic code can have. The results (regexp time, then clean-pythonic time, one line each for string1 to string5) are as follows:
0.1520436129940208 0.14696863299468532
0.12985941700753756 0.0332289119833149
0.13141735998215154 0.03479363300721161
0.1242920590157155 0.1090795070049353
0.12702590200933628 0.14172734299791045
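It looks like for string5 the pythonic code underperforms: the only invalid character is at the very end, so the loop has to go through all 16 characters before it can reject the string. For string2 and string3 the length check rejects the input right away, which is why the pythonic version is much faster there.
For now I don't see a pure-Python solution that would beat regexp in this case.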
I will come back to this if I find any valid solution.
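As a starting point for that follow-up, here are two candidates that could be timed against the regexp version. Both are untested guesses as far as speed goes, and no results are claimed here. The names check_set and check_builtin are hypothetical, and str.isascii() needs Python 3.7 or newer.

```python
import string
import timeit

ALLOWED = frozenset(string.ascii_letters + string.digits)

def check_set(string_variable):
    # Length check first, then one set comparison instead of a Python-level loop.
    return len(string_variable) == 16 and ALLOWED.issuperset(string_variable)

def check_builtin(string_variable):
    # isascii() rules out non-ASCII letters and digits that isalnum() would accept.
    return (len(string_variable) == 16
            and string_variable.isascii()
            and string_variable.isalnum())

string5 = "YetAnotherStr12*"  # the same edge case as above
for func in (check_set, check_builtin):
    elapsed = timeit.timeit(lambda: func(string5), number=100000)
    print(func.__name__, elapsed)
```

Both variants keep the early length check from check_python and only replace the character loop, so any difference measured should come from that part alone.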
Acknowledgements
Thanks!
That's it :) Comment, share or don't - up to you.
Any suggestions on what I should blog about? Post a comment in the box below or poke me on Twitter: @anselmos88.
See you in the next episode! Cheers!