Do you have a blog that lacks a search engine, one that could find where in your articles you wrote about this and that? Let's check whether Elasticsearch can help with it. Check it out!


S0-E22/E30 :)

Elasticsearch stores your Pelican posts

This blog uses the Pelican engine, which takes reStructuredText or Markdown files and transforms them into HTML with some theme.

Because of that, the published site is just static HTML files, and there is no search engine behind them. Fortunately you can use Elasticsearch: push your rst/md files into it and search for specific words in them :)

Let's check how to achieve this.

Searching within your own .md files

If you are lucky enough to have some text files, you can start right away. For my proof of concept we'll use the GitHub repo of the Pelican blog.

Let's make a script that puts the data from the md/rst files into Elasticsearch.

Of course, let's not repeat ourselves, and reuse what we learned yesterday about the Elasticsearch Python client.

Listing files with a given extension in Python

import os
from glob import glob


def list_files(path, fileextension):
    # Walk every directory under path and glob for *.<fileextension> in each one
    return [y for x in os.walk(path)
            for y in glob(os.path.join(x[0], '*.{}'.format(fileextension)))]
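
To see what list_files returns, here is a small sketch that runs it against a throwaway directory tree. The directory layout and file names are made up for the demonstration; they are not taken from the Pelican blog repo.

```python
import os
import tempfile
from glob import glob


def list_files(path, fileextension):
    # Same function as above: walk every directory and glob inside it
    return [y for x in os.walk(path)
            for y in glob(os.path.join(x[0], '*.{}'.format(fileextension)))]


# Hypothetical layout, created only for this demonstration
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "content", "drafts"))
demo_files = (
    "about.rst",
    os.path.join("content", "first.rst"),
    os.path.join("content", "drafts", "second.rst"),
    os.path.join("content", "notes.md"),
)
for name in demo_files:
    with open(os.path.join(root, name), "w") as f:
        f.write("demo")

# Paths relative to the root are easier to read than absolute ones
found = sorted(os.path.relpath(p, root) for p in list_files(root, "rst"))
print(found)
```

Only the three .rst files should show up, including the one in the nested directory, while notes.md is skipped.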

Putting file content into Elasticsearch

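from datetime import datetime
import io

from elasticsearch import Elasticsearch, helpers


def import_rst_files():
    # mypath must point at the directory that holds the blog sources
    all_rst_files = list_files(mypath, "rst")
    # One client is enough; there is no need to create it for every file
    client = Elasticsearch('localhost')
    docs = []
    for rst_file in all_rst_files:
        # Read file content:
        content = read_file_content(rst_file)
        doc = {
            # One index per day: blogpost-YYYY-MM-DD
            "_index": "blogpost-{}".format(datetime.now().strftime("%Y-%m-%d")),
            # Only for Elasticsearch versions that still support mapping types
            "_type": "blogpost",
            "_id": rst_file,
            "_source": {
                "author": "PelicanblogAuthors",
                "content": content,
            },
        }
        docs.append(doc)
    # Send all documents in a single bulk request
    helpers.bulk(client, docs)


def read_file_content(filename):
    # io.open decodes to unicode on both Python 2 and 3
    with io.open(filename, 'r', encoding='utf-8') as f:
        return f.read()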
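
The bulk helper accepts any iterable of actions, so the documents do not have to be collected in a list first. Below is a minimal sketch of that variation, not the code from the repository: generate_actions is a hypothetical helper name, the demo file is made up, and the sketch leaves out "_type" on the assumption that you run a recent Elasticsearch version that no longer uses mapping types.

```python
import io
import os
import tempfile
from datetime import datetime


def generate_actions(paths, index_prefix="blogpost"):
    """Yield one bulk action per file instead of building a full list."""
    index_name = "{}-{}".format(index_prefix, datetime.now().strftime("%Y-%m-%d"))
    for path in paths:
        with io.open(path, "r", encoding="utf-8") as f:
            content = f.read()
        yield {
            "_index": index_name,
            "_id": path,
            "_source": {"author": "PelicanblogAuthors", "content": content},
        }


# Demonstration with a throwaway file; no cluster is needed to inspect the actions
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "hello.rst")
with io.open(demo_path, "w", encoding="utf-8") as f:
    f.write(u"Hello blog")

actions = list(generate_actions([demo_path]))
print(actions[0]["_index"], actions[0]["_source"]["content"])

# With a running cluster the generator could be passed straight to the helper:
# helpers.bulk(Elasticsearch('localhost'), generate_actions(all_rst_files))
```

For a small blog the difference hardly matters, but with many large files a generator avoids holding every post in memory at once.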

Making a very simple search

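import pprint

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search


def search_in_elastic(phrase_to_search="blog"):
    client = Elasticsearch('localhost')
    # Full-text "match" query against the "content" field
    s = Search(using=client).query('match', content=phrase_to_search)
    response = s.execute()
    return response


def print_found(search_response):
    # Parenthesised print works on both Python 2 and 3
    print(pprint.pformat(search_response.hits.hits))


import_rst_files()
found = search_in_elastic()
print_found(found)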

This will list the files, read them, put them into Elasticsearch, and then print all the blog posts that match the word 'blog'.
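
Pretty-printing the raw hits dumps the whole content of every matching post, which gets noisy quickly. As a sketch of a more compact output, the hypothetical helper below prints only the score, the document id, and a short excerpt. It assumes each raw hit is a dictionary with "_id", "_score", and "_source" keys; the sample hits are made up so the example runs without a cluster.

```python
def summarize_hits(raw_hits, excerpt_length=40):
    """Turn raw hit dictionaries into short 'score  id  excerpt' lines."""
    lines = []
    for hit in raw_hits:
        content = hit["_source"].get("content", "")
        # Collapse newlines and repeated spaces, then cut to the excerpt length
        excerpt = " ".join(content.split())[:excerpt_length]
        lines.append("{:.2f}  {}  {}".format(hit["_score"], hit["_id"], excerpt))
    return lines


# Made-up hits for illustration; real ones would come from
# search_in_elastic().hits.hits
sample_hits = [
    {"_id": "content/first.rst", "_score": 1.25,
     "_source": {"content": "My first blog\npost about Pelican"}},
    {"_id": "content/second.rst", "_score": 0.5,
     "_source": {"content": "Another blog entry"}},
]

for line in summarize_hits(sample_hits):
    print(line)
```

Since the file path is used as the document id, each printed line points straight back to the source file of the post.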


You can find the code listed above in this repository.



That's it :) Comment, share or don't :)

If you have any suggestions about what I should cover in the next articles, please give me a hint :)

See you tomorrow! Cheers!

