In the article "Making pelican-link-to-title plugin" I created a neat script that takes <ahref>link</ahref> and transforms it into <a href="link">TITLE OF LINK</a>. Today I'm going to analyze what I did wrong while creating this plugin. Check it out.

To The Point

Slowness of plugin

How can you analyze the slowness of a plugin?

First, let's test how long it takes to generate the HTML content without the plugin:

git clone --recurse-submodules https://github.com/anselmos/debug_pelican_plugins # or --recursive if you have git < 2.13
cd debug_pelican_plugins
pipenv shell
make html

The last command shows how much time it took to generate pages.

To check the performance of the plugin I need more than just one link.

Let's create an article that has at least 5 different page links.

Title: first
Date: 2018-03-19 20:30
Status: published
Author: Witkowski Bartosz
Slug: first
# first

# some topic for first


<ahref>http://bbc.com/</ahref>
<ahref>https://stackoverflow.com/</ahref>
<ahref>https://github.com/</ahref>
<ahref>http://abc.go.com/</ahref>
<ahref>https://www.quora.com</ahref>
<ahref>https://www.thegoodwebguide.co.uk/lifestyle/website-reviews/best-sites-news-information/best-alternative-news-sites/13301</ahref>
<ahref>http://wakatime.com/</ahref>
<ahref>http://twitter.com/</ahref>
<ahref>https://www.facebook.com/AnselmosBlog/</ahref>
<ahref>https://tomato-timer.com/</ahref>

Now let's compare how long it takes to generate this article without the plugin and with it:

without: ~0.15 s
with: ~10-12 s

And that's after a bit of optimization of the loops.

The biggest bottleneck is urllib and fetching the page source with it. There is a solution based on httplib2, which I'm going to cover in the next episode.
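
One way to confirm where the time goes is Python's built-in `cProfile`. The sketch below is not the plugin itself: `fake_fetch` and `fake_plugin` are hypothetical stand-ins, with `time.sleep` simulating the network round trip, so it runs without network access.

```python
import cProfile
import io
import pstats
import time


def fake_fetch(url):
    """Stand-in for the urllib download; sleeps to mimic network latency."""
    time.sleep(0.05)
    return "<title>%s</title>" % url


def fake_plugin(urls):
    """Stand-in for the plugin loop: one fetch per link."""
    return [fake_fetch(url) for url in urls]


def profile_report(func, *args):
    """Profile func(*args) and return the stats sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_report(fake_plugin, ["http://example.com/"] * 3)
print(report)
```

In this toy run the sleep inside `fake_fetch` accounts for nearly all of the cumulative time. Profiling the real plugin the same way should show whether the urllib calls play that role.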

This time let's focus on adding some caching for these requests.

Let's create a Redis cache that maps each link to its title.

Redis

Redis is an in-memory key-value (NoSQL) database server. It's fast and reliable. Let's query it from our plugin.

Starting redis from Docker

To start Redis you can install it on your system. I prefer to use Docker instead:

docker run --name pelican-redis -p 6379:6379 -v "$(pwd)/redis:/data" -d redis redis-server --appendonly yes

The plugin with the Redis cache looks like this:

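# -*- coding: utf-8 -*-
""" This is a main script for pelican_link_to_title """
from bs4 import BeautifulSoup
from pelican import signals
import redis

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

# One shared client; decode_responses=True makes get() return text, not bytes.
REDIS_CONN = redis.Redis(host='localhost', port=6379, db=0,
                         decode_responses=True)


def link_to_title_plugin(generator):
    """ Link_to_Title plugin: <ahref>url</ahref> becomes <a href="url">title</a> """
    # First pass: collect only the articles that contain <ahref> tags.
    article_ahreftag = {}
    for article in generator.articles:
        soup = BeautifulSoup(article._content, 'html.parser')
        ahref_tags = soup.find_all('ahref')
        if ahref_tags:
            article_ahreftag[article] = (ahref_tags, soup)

    # Second pass: turn each tag into a regular link titled after the page.
    for article, (ahref_tags, soup) in article_ahreftag.items():
        for tag in ahref_tags:
            url_page = tag.string
            if url_page and ("http://" in url_page or "https://" in url_page):
                tag.name = "a"
                tag.string = read_page(url_page)
                tag.attrs = {"href": url_page}
        # decode() returns the document as text on both Python 2 and 3.
        article._content = soup.decode()


def read_page(url_page):
    """ Returns the title of url_page, using Redis as a cache """
    found = REDIS_CONN.get(url_page)
    if found:
        return found
    # Cache miss: download the page and extract its <title>.
    page_source = urlopen(url_page).read()
    title_tag = BeautifulSoup(page_source, "html.parser").find("title")
    # Fall back to the URL itself when the page has no usable <title>.
    title = title_tag.string if title_tag and title_tag.string else url_page
    REDIS_CONN.set(url_page, title)
    return title


def register():
    """ Registers Plugin """
    signals.article_generator_finalized.connect(link_to_title_plugin)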

With the cache warm, rendering is back at ~0.15 s. :)

This has a limitation: the first build, when the titles still have to be fetched with urllib, takes ~10-12 seconds.
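
The cold-versus-warm behaviour can be illustrated without a Redis server or network access. In this sketch `DictCache` is a hypothetical in-memory stand-in exposing the same `get`/`set` calls the plugin makes on the Redis client, and `slow_title_fetch` fakes the download with a short sleep. The timings it prints are for the toy only, not for the plugin.

```python
import time


class DictCache:
    """In-memory stand-in offering the get/set calls used on the Redis client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a miss

    def set(self, key, value):
        self._data[key] = value


def slow_title_fetch(url):
    """Fake download: sleeps to mimic the network, returns a made-up title."""
    time.sleep(0.1)
    return "Title of " + url


def cached_title(url, cache, fetch=slow_title_fetch):
    """Return the cached title, fetching and storing it on a miss."""
    found = cache.get(url)
    if found is None:
        found = fetch(url)
        cache.set(url, found)
    return found


def timed(func):
    """Return (result, elapsed_seconds) for func()."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


cache = DictCache()
urls = ["https://github.com/", "https://stackoverflow.com/"]
_, cold = timed(lambda: [cached_title(u, cache) for u in urls])
_, warm = timed(lambda: [cached_title(u, cache) for u in urls])
print("cold: %.3f s, warm: %.3f s" % (cold, warm))
```

The first pass pays for every fetch, the second is served entirely from the cache, which mirrors the first-build and later-build difference described above.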

Thanks!

That's it :) Comment, share or don't - up to you.

Any suggestions on what I should blog about? Post a comment in the box below or poke me on Twitter: @anselmos88.

See you in the next episode! Cheers!


