In the Making pelican-link-to-title plugin article I created a neat script that takes <ahref>link</ahref> and transforms it into <a href="link">TITLE OF LINK</a>. Today I'm going to analyze what I did wrong while creating this plugin. Check it out.
To The Point
Slowness of plugin
How can you analyze the slowness of a plugin?
First, let's measure how long it takes to generate the HTML content without the plugin:
git clone --recurse-submodules https://github.com/anselmos/debug_pelican_plugins # or --recursive if you have git < 2.13
cd debug_pelican_plugins
pipenv shell
make html
The last command shows how much time it took to generate pages.
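`make html` reports the time for the whole build. To time a single step, such as the plugin call on its own, a small helper along these lines could be used. This is only a sketch in Python 3: the names `timed` and `slow_step` are hypothetical, and `slow_step` just sleeps as a stand-in for a network fetch.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def slow_step():
    # Hypothetical stand-in for fetching a page; sleeps instead of using the network.
    time.sleep(0.05)


timings = {}
with timed("slow_step", timings):
    slow_step()

print("slow_step took %.3f s" % timings["slow_step"])
```

Wrapping the body of the plugin function in `with timed(...)` would show how much of the build time the plugin itself is responsible for.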
To check the plugin's performance I need more than just one link.
Let's create an article that has at least 5 different page links.
Title: first
Date: 2018-03-19 20:30
Status: published
Author: Witkowski Bartosz
Slug: first
# first
# some topic for first
<ahref>http://bbc.com/</ahref>
<ahref>https://stackoverflow.com/</ahref>
<ahref>https://github.com/</ahref>
<ahref>http://abc.go.com/</ahref>
<ahref>https://www.quora.com</ahref>
<ahref>https://www.thegoodwebguide.co.uk/lifestyle/website-reviews/best-sites-news-information/best-alternative-news-sites/13301</ahref>
<ahref>http://wakatime.com/</ahref>
<ahref>http://twitter.com/</ahref>
<ahref>https://www.facebook.com/AnselmosBlog/</ahref>
<ahref>https://tomato-timer.com/</ahref>
Now let's find out how long it takes to generate content for this article without the plugin and with it:
without: ~0.15 s. with: ~10-12 s ...
And that's after a bit of optimization of the loops.
The biggest bottleneck is urllib and fetching the page source with it. There is a solution with httplib2, which I'm going to cover in the next episode.
This time let's focus on adding some caching for these requests.
Let's create a Redis cache that maps each link to its title.
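The idea behind the cache can be sketched without Redis at all: look the URL up first, and only fetch the page when the title is missing. The sketch below is Python 3 and uses a plain dict as the store. The class `TitleCache` and the function `fake_fetch` are hypothetical names used for illustration, not part of the plugin.

```python
class TitleCache:
    """Look up a title by URL, calling `fetch_title` only on a cache miss."""

    def __init__(self, fetch_title, store=None):
        self.fetch_title = fetch_title
        # Any dict-like object works as the store; a plain dict stands in for Redis here.
        self.store = {} if store is None else store

    def get(self, url):
        title = self.store.get(url)
        if title is None:
            # Miss: do the slow fetch once and remember the result.
            title = self.fetch_title(url)
            self.store[url] = title
        return title


calls = []


def fake_fetch(url):
    # Hypothetical fetcher that records each call instead of hitting the network.
    calls.append(url)
    return "Title of " + url


cache = TitleCache(fake_fetch)
first = cache.get("https://github.com/")
second = cache.get("https://github.com/")
print(first, "| fetches:", len(calls))
```

The second lookup never reaches `fake_fetch`, which is exactly the effect we want from Redis, except that Redis keeps the entries between builds.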
Redis
Redis is a key-value database server (NoSQL). It's fast and reliable. Let's query it from our plugin:
Starting Redis from Docker
To start Redis you can install it on your system. I prefer to use Docker instead:
docker run --name pelican-redis -v $(pwd)/redis:/data -d redis redis-server --appendonly yes
and the plugin looks like this:
# -*- coding: utf-8 -*-
""" This is a main script for pelican_link_to_title """
# Note: written for Python 2 (urllib.urlopen, str.decode).
import urllib

import redis
from bs4 import BeautifulSoup
from pelican import signals


def link_to_title_plugin(generator):
    "Link_to_Title plugin "
    # First pass: collect only the articles that contain <ahref> tags.
    article_ahreftag = {}
    for article in generator.articles:
        soup = BeautifulSoup(article._content, 'html.parser')
        ahref_tag = soup.find_all('ahref')
        if ahref_tag:
            article_ahreftag[article] = (ahref_tag, soup)

    # Second pass: turn each <ahref>URL</ahref> into <a href="URL">TITLE</a>.
    for article, (p_tags, soup) in article_ahreftag.items():
        for tag in p_tags:
            url_page = tag.string
            if url_page:
                if "http://" in url_page or "https://" in url_page:
                    tag.name = "a"
                    tag.string = read_page(url_page)
                    tag.attrs = {"href": url_page}
            else:
                continue
        article._content = str(soup).decode("utf-8")


def read_page(url_page):
    # Return the cached title if Redis has one, otherwise fetch the page and cache it.
    redconn = redis.Redis(host='localhost', port=6379, db=0)
    found = redconn.get(url_page)
    if not found:
        r = urllib.urlopen(url_page).read()
        soup = BeautifulSoup(r, "html.parser")
        title = soup.find("title").string
        redconn.set(url_page, title)
        return title
    else:
        # Redis returns bytes; decode so non-ASCII titles survive.
        return found.decode("utf-8")


def register():
    """ Registers Plugin """
    signals.article_generator_finalized.connect(link_to_title_plugin)
With the cache warm, rendering is back at ~0.15 s. :)
This has a limitation: the first time, when the links are fetched with urllib, it will still take ~10-12 seconds.
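Two more things are worth keeping in mind. Entries cached this way never expire, so a page whose title changes keeps its old title, and under Python 3 redis-py returns bytes from `get` unless the client is created with `decode_responses=True`. Below is a hedged Python 3 sketch of a lookup that handles both. The function `read_page_cached`, the `FakeRedis` stand-in, the `fetch_title` parameter and the one-week expiry are my assumptions for illustration. Only `get` and `set(..., ex=...)` are real redis-py calls.

```python
ONE_WEEK = 7 * 24 * 3600  # assumed expiry; pick whatever suits your blog


def read_page_cached(client, url, fetch_title, ttl_seconds=ONE_WEEK):
    """Return the title for `url`, fetching it only when the cache has no entry."""
    cached = client.get(url)
    if cached is not None:
        # redis-py hands back bytes by default, so decode before using it as text.
        return cached.decode("utf-8") if isinstance(cached, bytes) else cached
    title = fetch_title(url)
    # `ex` sets the expiry in seconds, so stale titles eventually get refreshed.
    client.set(url, title, ex=ttl_seconds)
    return title


class FakeRedis:
    """In-memory stand-in with only get/set, to try the function without a server."""

    def __init__(self):
        self.data = {}
        self.ttl = None

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, ex=None):
        self.data[key] = value.encode("utf-8")  # mimic Redis storing bytes
        self.ttl = ex
        return True


client = FakeRedis()
title_1 = read_page_cached(client, "https://github.com/", lambda url: "GitHub")
title_2 = read_page_cached(client, "https://github.com/", lambda url: "never called")
print(title_1, title_2)
```

With a real server, `client` would be `redis.Redis(host='localhost', port=6379, db=0)` and `fetch_title` would be the urllib and BeautifulSoup part of `read_page`.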
Thanks!
That's it :) Comment, share or don't - up to you.
Any suggestions what I should blog about? Post me a comment in the box below or poke me at Twitter: @anselmos88.
See you in the next episode! Cheers!