Introduction to webscraping in Python
8.11.2018
saara.hukka@gmail.com
What is webscraping?
downloading websites programmatically
extracting specific information out of the HTML
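The two steps above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not a full scraper: `fetch_html`, `TitleParser` and `extract_title` are names made up for illustration, the sample HTML is invented, and https://example.com is only a placeholder URL.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url):
    """Step 1: download the page and return its HTML as a string."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """Step 2: pull one specific piece of information out of the HTML,
    here the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only collect text while we are inside <title>...</title>.
        if self.in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Invented sample page so the sketch runs without network access.
sample_html = "<html><head><title>My shop</title></head><body><p>Hi</p></body></html>"
print(extract_title(sample_html))

# For a live page (placeholder URL):
# print(extract_title(fetch_html("https://example.com")))
```

In practice many people reach for third-party libraries for both steps; the standard library is used here only to keep the sketch self-contained.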
Scraping in the real world
search engines
price comparison websites
price monitoring websites
Why do I like webscraping?
make something of use in only a small amount of code
automate the boring things!
you can learn more about the web/websites (HTTP, DOM, APIs)
Scraping in the data science pipeline
http://veekaybee.github.io/2017/06/19/data-science-myths/
Ways to collect data
Data sets
  Google's dataset search
  Open Data Vienna
APIs
  a service provides certain methods to interact with it
  API finder
Scraping
  download the website and extract the interesting information from the HTML code
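To make the difference between the last two options concrete, here is a hedged sketch that reads the same information once from an API-style JSON response and once from HTML. All data is invented for illustration: `api_response`, `page_html`, the `restaurant` class name and the helper names are assumptions, not the markup or API of any real site.

```python
import json
from html.parser import HTMLParser

# Invented sample data: a restaurant list as an API might return it...
api_response = '{"restaurants": [{"name": "Pizza Place"}, {"name": "Noodle Bar"}]}'

# ...and the same list as it might appear in a page's HTML.
page_html = """
<ul>
  <li class="restaurant">Pizza Place</li>
  <li class="restaurant">Noodle Bar</li>
</ul>
"""


def names_from_api(text):
    """API: the service hands over structured data, so parsing is one call."""
    return [r["name"] for r in json.loads(text)["restaurants"]]


class RestaurantParser(HTMLParser):
    """Scraping: we have to locate the interesting elements ourselves."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "restaurant") in attrs:
            self.in_item = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.names[-1] += data


def names_from_html(html):
    parser = RestaurantParser()
    parser.feed(html)
    return [name.strip() for name in parser.names]


print(names_from_api(api_response))
print(names_from_html(page_html))
```

Both functions return the same list, but the scraping version depends on the page's markup and breaks whenever that markup changes, which is why a data set or an API is usually the easier route when one exists.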
We're going to scrape foodora.at!