Yesterday I added an https link inside my usual <ahref> </ahref> html tag, and the build failed. I was surprised, but it was not my main focus at the time, so I decided to give it a shot today. Let's check out what is wrong with the plugin, or with the page I linked to.

To The Point

The failed https page link

The link pointed to a Travis CI page: <ahref>https://travis-ci.org/beyondgrep/ack2</ahref>

The build fails with this traceback:

plugins/pelican_link_to_title/pelican_link_to_title.py", line 34, in read_page
    r = urllib.urlopen(url_page).read()
File "/usr/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
File "/usr/lib/python2.7/urllib.py", line 208, in open
    return getattr(self, name)(url)
File "/usr/lib/python2.7/urllib.py", line 437, in open_https
    h.endheaders(data)
File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
File "/usr/lib/python2.7/httplib.py", line 1220, in connect
    self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
File "/usr/lib/python2.7/ssl.py", line 487, in wrap_socket
    ciphers=ciphers)
File "/usr/lib/python2.7/ssl.py", line 243, in __init__
    self.do_handshake()
File "/usr/lib/python2.7/ssl.py", line 405, in do_handshake
    self._sslobj.do_handshake()
CRITICAL: IOError: [Errno socket error] [Errno 1] _ssl.c:510: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error

The first relevant thing I found was a suggestion to replace the urllib library with requests, as shown in this PyTorch issue.

Let's check out if it will help.

Replacing urllib with requests

After replacing urllib with requests (adding the import and changing how the page data is fetched), pelican_link_to_title now looks like this:

# -*- coding: utf-8 -*-
""" This is a main script for pelican_link_to_title """
from pelican import signals
from bs4 import BeautifulSoup
import redis
import requests


def link_to_title_plugin(generator):
    """ Link_to_Title plugin: turns <ahref>URL</ahref> into a titled link """
    # Collect every article that contains at least one <ahref> tag.
    article_ahreftag = {}
    for article in generator.articles:
        soup = BeautifulSoup(article._content, 'html.parser')
        ahref_tag = soup.find_all('ahref')
        if ahref_tag:
            article_ahreftag[article] = (ahref_tag, soup)

    for article, (p_tags, soup) in article_ahreftag.items():
        for tag in p_tags:
            url_page = tag.string
            # Tags that are empty or do not hold an http(s) URL are left alone.
            if url_page and ("http://" in url_page or "https://" in url_page):
                # Rewrite <ahref>URL</ahref> as <a href="URL">page title</a>.
                tag.name = "a"
                tag.string = read_page(url_page)
                tag.attrs = {"href": url_page}
        # Python 2: str(soup) is a UTF-8 byte string, so decode it to unicode.
        article._content = str(soup).decode("utf-8")


def read_page(url_page):
    """ Returns the page title, using Redis as a cache """
    redconn = redis.Redis(host='localhost', port=6379, db=0)
    found = redconn.get(url_page)
    if found:
        return found
    # Cache miss: fetch the page with requests instead of urllib.
    r = requests.get(url_page).text
    soup = BeautifulSoup(r, "html.parser")
    title_tag = soup.find("title")
    # Fall back to the URL when the page has no usable <title>.
    if title_tag is not None and title_tag.string:
        title = title_tag.string
    else:
        title = url_page
    redconn.set(url_page, title)
    return title


def register():
    """ Registers Plugin """
    signals.article_generator_finalized.connect(link_to_title_plugin)
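
One thing worth noting about the plugin: the check "http://" in url_page matches the scheme anywhere in the tag text, not only at the start. A stricter variant could parse the text as a URL first. This is only a sketch and not part of the plugin; is_http_url is a hypothetical helper name I made up for illustration.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7, as used in this post


def is_http_url(text):
    """Return True only if text parses as an absolute http(s) URL."""
    if not text:
        return False
    parsed = urlparse(text.strip())
    # Require both a web scheme and a host part.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

With a helper like this, the condition in the loop could become if is_http_url(url_page), which would skip tags holding free text that merely mentions a link.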

After a bit of tweaking of my pipenv environment, I finally got it working.

At first I installed only requests, with pipenv install requests, but I hit this issue:

local/lib/python2.7/site-packages/urllib3/util/ssl_.py:339: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings                  
    SNIMissingWarning


local/lib/python2.7/site-packages/urllib3/util/ssl_.py:137: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
    InsecurePlatformWarning

CRITICAL: SSLError: HTTPSConnectionPool(host='travis-ci.org', port=443): Max retries exceeded with url: /beyondgrep/ack2 (Caused by SSLError(SSLError(1, '_ssl.c:510: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error'),))
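
Both warnings point at the ssl module of the interpreter itself. As far as I know, SNI support and ssl.SSLContext only arrived in Python 2.7.9, so an older 2.7 would lack both. Here is a minimal sketch to check what the local ssl module offers, using only the standard library; tls_capabilities is a hypothetical name, not something from the plugin.

```python
import ssl


def tls_capabilities():
    """Summarize what the local ssl module supports."""
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        # ssl.HAS_SNI does not exist on old Python 2.7 releases,
        # so treat a missing attribute as "no SNI".
        "has_sni": bool(getattr(ssl, "HAS_SNI", False)),
        # ssl.SSLContext is likewise absent on those releases.
        "has_ssl_context": hasattr(ssl, "SSLContext"),
    }


if __name__ == "__main__":
    for name, value in sorted(tls_capabilities().items()):
        print("%s: %s" % (name, value))
```

If has_sni comes back False, the server cannot tell which hostname the client wants during the handshake, which would fit the handshake failure in the traceback.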

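I found that installing the security extra fixes the problem above: pipenv install requests[security]. As far as I can tell, the extra pulls in pyOpenSSL, which gives requests the SNI support that this Python 2.7 ssl module lacks.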

Check out the latest master of the pelican_link_to_title plugin, which contains this fix and the Redis optimization.
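
The Redis lookup in read_page is a plain cache-aside pattern: ask the cache first, and only fetch the page on a miss. Here is a small sketch of the idea, with a dict standing in for Redis and the fetch function passed in so it can run without a network. The names cached_title and fetch_title are hypothetical and only for illustration.

```python
def cached_title(url, cache, fetch_title):
    """Return the title for url, calling fetch_title(url) only on a cache miss."""
    title = cache.get(url)
    if title is None:
        # Cache miss: do the expensive lookup once and remember the result.
        title = fetch_title(url)
        cache[url] = title
    return title


if __name__ == "__main__":
    calls = []

    def fake_fetch(url):
        # Stand-in for the real HTTP request; records each call.
        calls.append(url)
        return "Title of " + url

    cache = {}
    cached_title("https://example.com/a", cache, fake_fetch)
    cached_title("https://example.com/a", cache, fake_fetch)
    print("fetches made: %d" % len(calls))
```

The benefit for a static site generator is that every rebuild after the first one skips the HTTP request for links it has already seen.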

Thanks!

That's it :) Comment, share or don't - up to you.

Any suggestions for what I should blog about? Leave me a comment in the box below or poke me on Twitter: @anselmos88.

See you in the next episode! Cheers!


