Yesterday I added an https link to my usual <ahref> </ahref> html tag, and it failed. I was surprised, but since it was not my main focus at the time, I decided to give it a shot today. Let's check out what is wrong with the plugin or with the page I used.
To The Point
The failed https page link
The link was to the Travis CI page:
<ahref>https://travis-ci.org/beyondgrep/ack2</ahref>
It fails with this traceback:
  plugins/pelican_link_to_title/pelican_link_to_title.py", line 34, in read_page
    r = urllib.urlopen(url_page).read()
  File "/usr/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 437, in open_https
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1220, in connect
    self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
  File "/usr/lib/python2.7/ssl.py", line 487, in wrap_socket
    ciphers=ciphers)
  File "/usr/lib/python2.7/ssl.py", line 243, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 405, in do_handshake
    self._sslobj.do_handshake()
CRITICAL: IOError: [Errno socket error] [Errno 1] _ssl.c:510: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error
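One possible cause of a handshake alert like this is an ssl module that cannot send the server name (SNI) during the handshake. That is my assumption here, not something the traceback itself confirms. A minimal sketch for checking what the local interpreter supports, using only the standard library (the function name ssl_capabilities is made up for this example):

```python
import ssl
import sys


def ssl_capabilities():
    """Collect a few facts about the interpreter's ssl module."""
    return {
        "python": sys.version.split()[0],
        # OpenSSL build the interpreter was linked against.
        "openssl": ssl.OPENSSL_VERSION,
        # HAS_SNI is absent on old interpreters, so default to False.
        "has_sni": bool(getattr(ssl, "HAS_SNI", False)),
        # SSLContext is also missing from old Python 2.7 builds.
        "has_sslcontext": hasattr(ssl, "SSLContext"),
    }


if __name__ == "__main__":
    for name, value in sorted(ssl_capabilities().items()):
        print("%s: %s" % (name, value))
```

If has_sni or has_sslcontext comes back False, the interpreter itself is a likely suspect, and changing the HTTP library alone may not be enough.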
The first relevant thing I found was a suggestion to switch from the urllib library to requests, as shown in this pytorch issue.
Let's check whether it helps.
Replacing urllib with requests
After replacing urllib with requests (adding the import and changing how the page data is fetched), pelican_link_to_title now looks like this:
# -*- coding: utf-8 -*-
""" This is a main script for pelican_link_to_title """
from pelican import signals
from bs4 import BeautifulSoup
import requests


def link_to_title_plugin(generator):
    "Link_to_Title plugin "
    article_ahreftag = {}
    # Collect the <ahref> tags of every article that has any.
    for article in generator.articles:
        soup = BeautifulSoup(article._content, 'html.parser')
        ahref_tag = soup.find_all('ahref')
        if ahref_tag:
            article_ahreftag[article] = (ahref_tag, soup)

    # Turn each <ahref>URL</ahref> into <a href="URL">page title</a>.
    for article, (p_tags, soup) in article_ahreftag.items():
        for tag in p_tags:
            url_page = tag.string
            if url_page:
                if "http://" in url_page or "https://" in url_page:
                    tag.name = "a"
                    tag.string = read_page(url_page)
                    tag.attrs = {"href": url_page}
            else:
                continue
        article._content = str(soup).decode("utf-8")


def read_page(url_page):
    import redis
    redconn = redis.Redis(host='localhost', port=6379, db=0)
    # Use the title cached in Redis when there is one.
    found = redconn.get(url_page)
    if not found:
        r = requests.get(url_page).text
        soup = BeautifulSoup(r, "html.parser")
        title = soup.find("title").string
        redconn.set(url_page, title)
        return title
    else:
        return found


def register():
    """ Registers Plugin """
    signals.article_generator_finalized.connect(link_to_title_plugin)
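The caching logic in read_page can be tried out without a Redis server or network access. The following is only a sketch under a few assumptions of mine: it is written for Python 3 (the plugin itself runs on Python 2.7), it swaps BeautifulSoup for the standard-library HTMLParser, and the names extract_title, read_title_cached, DictCache and the fetch callable are made up for the example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the stripped page title, or None when there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() or None


def read_title_cached(url, cache, fetch):
    """Return the cached title for url, fetching and storing it on a miss.

    cache: any object with get(key) and set(key, value).
    fetch: hypothetical callable that returns the page HTML for a URL.
    """
    found = cache.get(url)
    if found is not None:
        return found
    # Fall back to the URL itself when the page has no usable title.
    title = extract_title(fetch(url)) or url
    cache.set(url, title)
    return title


class DictCache:
    """In-memory stand-in for Redis, for illustration only."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
```

Passing the fetch function in as an argument keeps the network call out of the caching logic, so a fake can be used while experimenting. With a real redis.Redis connection on Python 3, get returns bytes by default, so the cached value would need decoding.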
After a bit of tweaking my pipenv environment, I finally got it working.
At first I installed only requests, with pipenv install requests, but I ran into an issue:
local/lib/python2.7/site-packages/urllib3/util/ssl_.py:339: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
SNIMissingWarning
local/lib/python2.7/site-packages/urllib3/util/ssl_.py:137: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecurePlatformWarning
CRITICAL: SSLError: HTTPSConnectionPool(host='travis-ci.org', port=443): Max retries exceeded with url: /beyondgrep/ack2 (Caused by SSLError(SSLError(1, '_ssl.c:510: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error'),))
I found that installing the security extra fixes the problem above: pipenv install requests[security].
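To see whether the extra packages actually landed in the environment, a small check like the one below may help. It is a sketch for Python 3, and the module list is my assumption about what the security extra pulled in for the requests release of that time (pyOpenSSL, cryptography and idna); the function name missing_security_modules is made up.

```python
import importlib.util

# Assumption: import names of the packages installed by the "security"
# extra of the requests release used here.
SECURITY_MODULES = ("OpenSSL", "cryptography", "idna")


def missing_security_modules(modules=SECURITY_MODULES):
    """Return the names from modules that cannot be found for import."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_security_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All security modules are importable.")
```

An empty result only says the packages are importable, not that the handshake will succeed.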
Check out the latest master of the pelican_link_to_title plugin, which contains this fix and the Redis optimization.
Acknowledgements
Related links
- Quickstart — Requests 2.18.4 documentation
- python - InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately - Stack Overflow
- urllib & urllib2 may fail ssl handshaking under a variety of circumstances · Issue #3193 · pytorch/pytorch · GitHub
Thanks!
That's it :) Comment, share or don't - up to you.
Any suggestions for what I should blog about? Leave a comment or poke me on Twitter: @anselmos88.
See you in the next episode! Cheers!