A really quick tutorial
Prerequisites: This tutorial assumes a working knowledge of React.js.
Let’s say you want to pull data from the frontend of a website because there’s no API available. You inspect the page and see that the data is right there in the HTML, so how do you gather it for use in your app? It’s rather simple: we’re going to install two libraries and write fewer than 50 lines of code to scrape a website. To keep this tutorial simple, we’ll use https://pokedex.org/ as our example.
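1. In your terminal, install the two libraries (request-promise lists request as a peer dependency, so install it alongside):
npm i request request-promise
npm i cheerio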
2. We’re going to start by using request-promise to fetch the HTML from https://pokedex.org/ and log it to the console.
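A minimal sketch of this step, not a definitive implementation. It assumes request-promise is used under the name rp (as later in this tutorial) and that calling rp(url) returns a promise that resolves to the page body. The helper name logHtml is hypothetical, and the library is passed as a defaulted parameter only so the function can be tried out with a stand-in.

```javascript
// Hypothetical helper: fetch a page and log its HTML.
// Assumption: request-promise's export, called as rp(url), returns a
// promise that resolves to the response body as a string.
// The library is required lazily so a stand-in can be passed instead.
function logHtml(url, rp = require("request-promise")) {
  return rp(url)
    .then((html) => {
      console.log(html); // the raw HTML of the page
      return html;
    })
    .catch((err) => {
      console.error("Fetch failed:", err.message);
      throw err; // let the caller decide how to recover
    });
}

// Usage, for example inside componentDidMount:
// logHtml("https://pokedex.org/");
```

In a class component, componentDidMount is a natural place for the call, since the request then fires once after the component first renders.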
3. Sometimes a CORS error will block your fetch. For demonstration purposes, try fetching pokemon.com.
You should see an error in the console saying that the request was blocked by CORS policy because the response has no Access-Control-Allow-Origin header.
4. You can get around CORS by using https://cors-anywhere.herokuapp.com. Simply add that URL in front of the URL you want to fetch, for example https://cors-anywhere.herokuapp.com/https://www.pokemon.com.
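The prefixing can be sketched as a small helper. The names CORS_PROXY and withCorsProxy are hypothetical, chosen for illustration; the only thing taken from the tutorial is the proxy URL itself.

```javascript
// Proxy URL named in this tutorial.
const CORS_PROXY = "https://cors-anywhere.herokuapp.com/";

// Hypothetical helper: put the proxy URL in front of the target URL.
function withCorsProxy(targetUrl, proxy = CORS_PROXY) {
  // Make sure exactly one slash separates the proxy from the target.
  const base = proxy.endsWith("/") ? proxy : proxy + "/";
  return base + targetUrl;
}

console.log(withCorsProxy("https://www.pokemon.com"));
// https://cors-anywhere.herokuapp.com/https://www.pokemon.com
```

You would then pass the result to your fetch call, for example rp(withCorsProxy("https://www.pokemon.com")).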
Now you should be able to see the HTML from pokemon.com show in your console.
5. But we won’t have to use cors-anywhere for rp("https://pokedex.org/"), so let’s proceed.
6. Now that we have the HTML, let’s use the cheerio library to pick out exactly the data we want by selecting the elements that contain it. In this example, we’ll grab the names of all the Pokémon and then display them in a list.
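A minimal sketch of the extraction, under stated assumptions. The helper name extractNames is hypothetical, and the selector "#monsters-list li span" is an assumption about the markup of pokedex.org, so inspect the live page and adjust it if needed. cheerio is passed as a defaulted parameter only so the function can be tried out with a stand-in.

```javascript
// Hypothetical helper: collect the text of every element matching `selector`.
// Assumptions: cheerio.load(html) returns a jQuery-like `$` function, and
// "#monsters-list li span" matches the name elements on pokedex.org;
// check the selector against the live page before relying on it.
function extractNames(
  html,
  selector = "#monsters-list li span",
  cheerio = require("cheerio")
) {
  const $ = cheerio.load(html);
  const names = [];
  $(selector).each((index, element) => {
    names.push($(element).text().trim());
  });
  return names;
}

// Usage, once the HTML has been fetched:
// const names = extractNames(html);
```

In a React component, you could store the returned array in state and map over it in render to produce one list item per name.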
7. You should see a list of all the Pokémon names displayed on your screen.
It’s that simple! You scraped those names from the HTML without having to directly access any backend. Now try scraping the examples on http://toscrape.com/ for practice. Enjoy your new abilities!