Web Scraping with a Headless Browser: A Puppeteer Tutorial

Software Pundits

Toptal

In this article, we’ll see how easy it is to perform web scraping (web automation) with the somewhat non-traditional method of using a headless browser.

What Is a Headless Browser and Why Is It Needed?

The last few years have seen the web evolve from simple websites built with bare HTML and CSS. Now there are much more interactive web apps with beautiful UIs, which are often built with frameworks such as Angular or React. In other words, JavaScript now rules the web, including almost everything you interact with on websites.

For our purposes, JavaScript is a client-side language. The server returns JavaScript files or scripts injected into an HTML response, and the browser processes them. Now, this is a problem if we are doing some kind of web scraping or web automation because, more often than not, the content that we’d like to see or scrape is actually rendered
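
To make the problem concrete, here is a minimal Node.js sketch. The HTML string, the `products` list, and the `extractListItems` helper are all hypothetical, made up for illustration rather than taken from a real site. The sketch assumes a page that builds its list in the browser, and shows that a scraper reading only the raw response finds nothing to extract.

```javascript
// Hypothetical raw HTML, as a server might return it. The list is empty
// in the markup; the items are only added once a browser runs the script.
const rawHtml = `
<html>
  <body>
    <ul id="products"></ul>
    <script>
      for (const name of ["Laptop", "Phone"]) {
        const item = document.createElement("li");
        item.textContent = name;
        document.getElementById("products").appendChild(item);
      }
    </script>
  </body>
</html>`;

// Naive scraper (illustrative helper): pulls list item text straight out of
// the markup with a regular expression. It never executes any JavaScript.
function extractListItems(html) {
  return [...html.matchAll(/<li>([^<]*)<\/li>/g)].map((match) => match[1]);
}

// The raw response contains no list items, so nothing is scraped.
const scraped = extractListItems(rawHtml);
console.log(`Items found in raw HTML: ${scraped.length}`);
```

A headless browser closes this gap by loading the page and running its scripts first, so that the list items exist in the DOM before we try to read them.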
