# An introduction to web scraping: locating Spanish schools

### Abstract

Whenever a new paper is released using some type of scraped data, most of my peers in the social science community are baffled by how the researchers collected it. In fact, many scientists can’t even think of research questions that could be addressed with this type of data, simply because they don’t know it’s possible. As the old saying goes, when you have a hammer, every problem looks like a nail. In this tutorial I’ll guide you through the basics of web scraping using R and the xml2 package. I’ll begin with a simple example using fake data and then build on it by scraping the location of a sample of schools in Spain. We will discuss the pros and cons of web scraping and how to scrape data ethically. I assume zero prior knowledge, so feel free to come if you’re completely new to the topic.

Location: Room 18.0.A06, Campus de Getafe, Calle Madrid, 126, 28903 Getafe, Madrid, Spain.

### Requirements

We’ll need these packages: xml2, httr, tidyverse, sf, rnaturalearth, scrapex. All of them can be installed from CRAN except scrapex, which can be installed from GitHub with remotes::install_github("cimentadaj/scrapex").
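
The setup can be sketched roughly as follows. This is a minimal example, not an official install script: it assumes a working internet connection and a configured CRAN mirror, and the helper name `missing_pkgs` is made up here for illustration. It installs only the packages that are not already available.

```r
# Packages listed above that are available on CRAN
cran_pkgs <- c("xml2", "httr", "tidyverse", "sf", "rnaturalearth")

# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only what is missing from CRAN
to_install <- missing_pkgs(cran_pkgs)
if (length(to_install) > 0) {
  install.packages(to_install)
}

# scrapex is not on CRAN, so it is installed from GitHub via remotes
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
if (!requireNamespace("scrapex", quietly = TRUE)) {
  remotes::install_github("cimentadaj/scrapex")
}
```

Running this once before the session should be enough. Note that sf may also need system libraries (such as GDAL, GEOS and PROJ) on Linux when it is built from source.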