Introduction to webscraping in Python

8.11.2018

saara.hukka@gmail.com

What is webscraping?

  • downloading websites programmatically
  • extracting specific information out of the HTML
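
The two steps above can be sketched with only the Python standard library. This is a minimal illustration, not the method used in the talk: the function names `download` and `extract_title` and the class `TitleParser` are made up for this example, and the page title is just one arbitrary piece of "specific information".

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def download(url):
    """Step 1: fetch the raw HTML of a page as text."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """Step 2: collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>
        if self.in_title:
            self.title += data


def extract_title(html):
    """Return the page title found in an HTML string ('' if there is none)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```

With a network connection, `extract_title(download(url))` would combine both steps for any URL you are allowed to fetch. Keeping download and extraction in separate functions also means the extraction part can be tried on a plain HTML string first.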

Scraping in the real world

  • search engines
  • price comparison websites
  • price monitoring websites

Why do I like webscraping?

  • build something useful with only a small amount of code
  • automate the boring things!
  • you can learn more about the web/websites (HTTP, DOM, APIs)

Scraping in the data science pipeline

Figure (image not included). Source: http://veekaybee.github.io/2017/06/19/data-science-myths/

Ways to collect data

  • APIs
    • a service provides certain methods to interact with it
    • API finder
  • Scraping
    • download the website and extract the interesting information from the HTML code
    • We're going to scrape foodora.at!
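
To make the difference between the two approaches concrete, here is a small sketch that reads the same information once from an API-style JSON response and once from an HTML page. Everything in it is assumed for illustration: the sample data, the `restaurant` class name, and the helper names are hypothetical and do not reflect foodora.at's real API or markup.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: already structured, usable right after json.loads
API_RESPONSE = '{"restaurants": [{"name": "Pizza Place"}, {"name": "Noodle Bar"}]}'

# Hypothetical HTML page carrying the same information, mixed with other content
HTML_PAGE = """
<ul>
  <li class="restaurant">Pizza Place</li>
  <li class="restaurant">Noodle Bar</li>
  <li class="ad">Sponsored</li>
</ul>
"""


def names_from_api(text):
    """API route: parse the JSON and pick the field we need."""
    return [entry["name"] for entry in json.loads(text)["restaurants"]]


class RestaurantParser(HTMLParser):
    """Scraping route: collect the text of <li class="restaurant"> elements."""

    def __init__(self):
        super().__init__()
        self.capture = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and dict(attrs).get("class") == "restaurant":
            self.capture = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.capture = False

    def handle_data(self, data):
        # Ignore whitespace-only text between tags
        if self.capture and data.strip():
            self.names.append(data.strip())


def names_from_html(html):
    """Return the restaurant names found in an HTML string."""
    parser = RestaurantParser()
    parser.feed(html)
    return parser.names
```

Both functions return the same list for the sample data. The API version needs one line because the service already provides structure, while the scraping version has to locate the right elements in the markup and skip everything else.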