Introduction to webscraping in Python


What is webscraping?

  • downloading web pages programmatically
  • extracting specific information out of the HTML
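
The two steps above can be sketched with nothing but the standard library. This is a minimal illustration, not a recommended toolchain: the helper names (download, TitleParser, extract_title) and the sample HTML are made up for this example, and a real scraper would typically add error handling and a dedicated HTML library.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def download(url):
    """Step 1: fetch the raw HTML of a page as text (hypothetical helper)."""
    with urlopen(url) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """Step 2: pull one specific piece of information (the <title>) out of HTML."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only collect text that appears between <title> and </title>.
        if self.in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Made-up sample page, so the extraction step runs without network access.
SAMPLE_HTML = "<html><head><title>Example shop</title></head><body><p>Hi</p></body></html>"
print(extract_title(SAMPLE_HTML))  # prints "Example shop"
```

To scrape a live page, the two steps chain together as extract_title(download(url)) for a URL of your choice.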

Scraping in the real world

  • search engines
  • price comparison websites
  • price monitoring websites

Why do I like webscraping?

  • build something useful with only a small amount of code
  • automate the boring things!
  • you can learn more about the web/websites (HTTP, DOM, APIs)

Scraping in the data science pipeline


Ways to collect data

  • APIs
    • a service provides certain methods to interact with it
    • API finder
  • Scraping
    • download the website and extract the interesting information from the HTML code
    • We're going to scrape!
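
To make the difference between the two approaches concrete, here is a small sketch that reads the same value once from an API-style JSON response and once from HTML. Both payloads are invented for illustration (no real service or website is implied), and the names price_from_api, PriceParser, and price_from_html are hypothetical.

```python
import json
from html.parser import HTMLParser

# Invented payloads: what an API might return vs. what a web page might contain.
API_RESPONSE = '{"product": "Widget", "price": 9.99}'
PAGE_HTML = '<div class="product"><h2>Widget</h2><span class="price">9.99</span></div>'


def price_from_api(body):
    """API route: the data is already structured, so parsing is one call."""
    return json.loads(body)["price"]


class PriceParser(HTMLParser):
    """Scraping route: locate the element that holds the value in the HTML."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        # Keep only the first price found on the page.
        if self.in_price and self.price is None:
            self.price = float(data.strip())


def price_from_html(html):
    parser = PriceParser()
    parser.feed(html)
    return parser.price


print(price_from_api(API_RESPONSE))
print(price_from_html(PAGE_HTML))
```

The API route depends on the service's documented response format, while the scraping route depends on the page's markup, which can change without notice.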