Today, more adventures with Grammarly and scraping data from its checking results.
S0-E18/E30 :)
Grammarly results scraping continues
After yesterday's article about the possibility of scraping some results from Grammarly's checks, let's now do some more advanced scraping :)
Let's build an example with more issues, so we can experiment with BeautifulSoup and scrape more with it.
I've found that the "Demo document" presented on first use of Grammarly contains a lot of issues that showcase Grammarly's features. So let's use it :)
Scraping:
def tests_all_in_one(self):
    from selenium import webdriver
    self.driver = webdriver.Firefox()

    # Read the demo text we will paste into a fresh Grammarly document
    filename = "demo_document.txt"
    with open(filename, 'r', encoding='utf-8') as f:
        demo_data_text = f.read()

    page_login = GrammarlyLogin(self.driver)
    page_login.make_login('za2217279@mvrht.net', 'test123')

    page_new_doc = GrammarlyNewDocument(self.driver)
    page_new_doc.make_new_document("")

    page_doc = GrammarlyDocument(self.driver)
    page_doc.put_title("DEMO DATA TEXT")
    page_doc.put_text(demo_data_text)

    page_source = GrammarlyDocument(self.driver)
    actual_source = page_source.get_page_source()

    scraper = DocumentScraper(actual_source)
    found_issues = scraper.find_all_issues()
    assert len(found_issues) == 14

    issues_by_type = scraper.return_issues_by_type()
    assert len(issues_by_type) == 2
    assert '_ed4374-plainTextTitle' in issues_by_type
    assert '_ed4374-titleReplacement' in issues_by_type
    assert len(issues_by_type['_ed4374-plainTextTitle']) == 3
    assert len(issues_by_type['_ed4374-titleReplacement']) == 11
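The GrammarlyLogin, GrammarlyNewDocument and GrammarlyDocument classes are the page objects from the previous article. As a reminder of the shape the test assumes, here is a minimal, hypothetical sketch of that pattern — the names and methods below are illustrative only, not the real implementation:

```python
class BasePage:
    """Minimal page-object base: every page wraps the shared WebDriver."""
    def __init__(self, driver):
        self.driver = driver

class DocumentPageSketch(BasePage):
    """Hypothetical stand-in for GrammarlyDocument from the previous article."""
    def get_page_source(self):
        # The raw HTML that DocumentScraper later parses with BeautifulSoup.
        return self.driver.page_source
```

The test only needs each page object to hold the driver and expose small, named actions; get_page_source simply hands the rendered HTML over to the scraper.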
And the source of DocumentScraper:
from bs4 import BeautifulSoup

class DocumentScraper(object):
    def __init__(self, html_source):
        self.bs = BeautifulSoup(html_source, "html.parser")

    def get_issue_div(self):
        # DIV with class=_adbfa1e6-editor-page-cardsCol holds the issue cards
        return self.bs.find('div', {'class': '_adbfa1e6-editor-page-cardsCol'})

    def get_all_warnings(self):
        return self.get_issue_div().contents

    def get_all_warnings_texts(self):
        return [element.text for element in self.get_all_warnings()]

    def iterate_over_warnings(self):
        for innerelement in self.get_all_warnings():
            print(innerelement.text)

    def find_all_issues(self):
        return self.bs.find_all('div', {'class': '_ed4374-title'})

    def return_issues_by_type(self):
        # Group issue contents under the CSS class of their first child
        issues = self.find_all_issues()
        output = {}
        for issue in issues:
            key = issue.contents[0].attrs['class'][0]
            try:
                output[key].append(issue.contents[0].contents)
            except KeyError:
                output[key] = [issue.contents[0].contents]
        return output
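As a side note, the try/except KeyError grouping in return_issues_by_type can be written more compactly with collections.defaultdict. A small sketch on hypothetical (class, text) pairs standing in for the scraped nodes — the keys mimic Grammarly's generated CSS classes:

```python
from collections import defaultdict

# Hypothetical (css_class, text) pairs standing in for scraped issue nodes
issues = [
    ("_ed4374-plainTextTitle", "Misspelled word"),
    ("_ed4374-titleReplacement", "Change 'their' to 'there'"),
    ("_ed4374-plainTextTitle", "Missing comma"),
]

def group_by_type(pairs):
    """Group issue texts under their CSS-class key, like return_issues_by_type."""
    output = defaultdict(list)
    for key, text in pairs:
        output[key].append(text)
    return dict(output)

grouped = group_by_type(issues)
print(sorted(grouped))  # ['_ed4374-plainTextTitle', '_ed4374-titleReplacement']
print(len(grouped["_ed4374-plainTextTitle"]))  # 2
```

defaultdict creates the empty list on first access, so there is no KeyError to catch; converting back to a plain dict keeps the return type the same as before.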
Where demo_document.txt contains the "Demo Document" from Grammarly.
This gives you an example of how one can take our previous automation and transform it into something more.
Selenium screenshots for a better understanding of the issues Grammarly found
Taking into account that Selenium has no PDF generator, let's at least take a screenshot of the page so we know exactly where those issues are:
self.driver.save_screenshot('grammarly_checks.png')
self.driver.get_screenshot_as_file('grammarly_checks2.png')
The only problem I've found with this is that it does not take a full-page screenshot. I remember that in Java this trick worked and you could capture the full web page (depending on the browser).
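For reference, the usual Python workaround is to scroll through the page in viewport-sized steps, capture each step, and stitch the images together afterwards. The scroll positions themselves are plain arithmetic; here is a sketch of just that part (the Selenium execute_script / save_screenshot calls and the actual image stitching are left out):

```python
def scroll_offsets(page_height, viewport_height):
    """Y-offsets at which to capture viewport-sized shots covering the page.

    The last offset is clamped so the final shot ends flush with the page
    bottom (it overlaps the previous one; the stitcher must crop the overlap).
    """
    offsets = []
    y = 0
    while y + viewport_height < page_height:
        offsets.append(y)
        y += viewport_height
    offsets.append(max(0, page_height - viewport_height))
    return offsets

print(scroll_offsets(2500, 1000))  # [0, 1000, 1500]
print(scroll_offsets(800, 1000))   # [0] - page shorter than the viewport
```

In a real run you would get page_height from document.body.scrollHeight via execute_script, scroll to each offset, call save_screenshot there, and paste the crops together with an image library such as Pillow.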
I might come back to this :)
Acknowledgements
- Taking a Screenshot of Full Page with Selenium Python
- Taking a Screenshot Of a Page
- Take a screenshot with Selenium WebDriver
- Taking a whole page screenshot with Selenium Marionette in Python
- make a full screenshot with selenium in python
Thanks!
That's it :) Comment, share or don't :)
If you have any suggestions for what I should blog about in the next articles, please give me a hint :)
See you tomorrow! Cheers!