Web Scraping with Real-Time ScrapeStack REST API using PHP

Web scraping (also known as data extraction, and sometimes loosely called data mining) is the practice of pulling data out of web pages. It can be done manually, but the term usually refers to automated processes run by bots or crawlers. Web scraping REST APIs exist to handle that work for you.

If you're looking for a way to scrape web pages through a web scraping API, you're in the right place. In this tutorial you will learn how to use the ScrapeStack web scraping REST API to fetch content from web pages. The API helps you scrape data such as SEO meta tags, body content, Amazon products, reviews and more.

ScrapeStack is a real-time REST API that scrapes data from web pages without you having to deal with geolocation restrictions, IP blocks or CAPTCHAs. It supports features essential to web scraping, such as JavaScript rendering, custom HTTP headers, various geo-targets, POST/PUT requests and an option to use premium residential proxies instead of data center proxies. The ScrapeStack web scraping API can be used from PHP, Python, Node.js, jQuery, Go and Ruby.

ScrapeStack API Features

ScrapeStack provides the following features:

  • Powerful web scraping engines.
  • An extensive pool of data center and residential IP addresses across dozens of global ISPs, supporting real devices, smart retries and IP rotation.
  • Fast web page scraping with advanced features like concurrent API requests, CAPTCHA solving, browser support and JS rendering.
  • Optional JavaScript rendering of the page before the final scraping result is delivered.
  • Proxy location handling with automatic IP rotation, so that the same IP address is never used twice in a row.
  • Support for different languages such as PHP, Python, Node.js, jQuery, Go and Ruby.

Integrate ScrapeStack API with PHP

Now we will integrate the ScrapeStack API with PHP to scrape web pages. The documentation is helpful, and the API is easy to integrate from any language. We will go step by step: first get an API access key, then call the API.

Step1: Get API Access Key
First we need to create an account on ScrapeStack to get a unique API access key. After creating the account, we can copy the access key from the dashboard; it is used to authenticate with the API. We pass the key in the access_key parameter, and the address of the website or web page we want to scrape in the url parameter. Both are passed as query string parameters, as shown below.

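<?php
// Build the query string; replace the placeholders with your own values.
$queryString = http_build_query([
    'access_key'   => 'YOUR_ACCESS_KEY', // key from the ScrapeStack dashboard
    'url'          => 'WEBSITE_URL',     // page to scrape
    'render_js'    => 1,                 // optional, explained below
    'keep_headers' => 1,                 // optional, explained below
]);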

We can set the render_js option to 1 if we want JavaScript to be rendered on the target web page. The default value of render_js is 0. JavaScript rendering is done using a headless Google Chrome browser.

We can also set the keep_headers option to 1 if we want to send the currently active HTTP headers to the target URL with our API request and have the API return these headers along with the API response. The default value of keep_headers is 0. Other options are available as well; they are listed in the ScrapeStack documentation.
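To make the effect of these parameters easier to see, here is a minimal sketch that assembles the full request URL. The function name buildScrapeUrl and the example target address are hypothetical and only serve as illustration; the endpoint, the parameter names and the default values of 0 are the ones described in this tutorial.

```php
<?php
// Hypothetical helper (not part of ScrapeStack): builds the request URL
// from an access key, a target URL and optional flags.
function buildScrapeUrl(string $accessKey, string $targetUrl, array $options = []): string
{
    // Defaults as described in the text: both options are off (0).
    $defaults = ['render_js' => 0, 'keep_headers' => 0];

    // Later arrays win, so caller-supplied options override the defaults.
    $params = array_merge(
        ['access_key' => $accessKey, 'url' => $targetUrl],
        $defaults,
        $options
    );

    // http_build_query() URL-encodes every value, so the target URL may
    // contain its own "?" and "&" characters without breaking the request.
    return 'http://api.scrapestack.com/scrape?' . http_build_query($params);
}

// Placeholder key and example target, for illustration only.
echo buildScrapeUrl('YOUR_ACCESS_KEY', 'https://example.com/?q=php&page=2', ['render_js' => 1]), PHP_EOL;
```

Keeping the defaults in one place means a caller only has to name the options that differ from 0, and the encoding step guarantees that the target page's own query string is not mistaken for API parameters.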


Step2: Make HTTP Request to ScrapeStack API with PHP
Now we will call the ScrapeStack API from PHP by sending an HTTP request to http://api.scrapestack.com/scrape with PHP cURL. We pass the query string containing access_key and the url of the web page, and the API returns the scraped data.

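<?php
// Build the query string; replace the placeholders with your own values.
$queryString = http_build_query([
    'access_key'   => 'YOUR_ACCESS_KEY',
    'url'          => 'WEBSITE_URL',
    'render_js'    => 1,
    'keep_headers' => 1,
]);

$ch = curl_init(sprintf('%s?%s', 'http://api.scrapestack.com/scrape', $queryString));
// Return the response as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$websiteContent = curl_exec($ch);

// curl_exec() returns false on a transport error (DNS failure, timeout, ...).
if ($websiteContent === false) {
    $websiteContent = 'cURL error: ' . curl_error($ch);
}
curl_close($ch);

echo $websiteContent;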
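The string returned by curl_exec() still has to be interpreted before it is useful, for example to read the SEO meta tags mentioned earlier. The sketch below is one possible way to do that and rests on an assumption that should be checked against the ScrapeStack documentation: that a failed request comes back as a JSON object with "success": false and an "error" object containing an "info" message, while a successful request returns the raw HTML of the page. The function name interpretScrapeResponse is hypothetical, and the code needs PHP's standard json and dom extensions.

```php
<?php
// Sketch: interpret the string returned by curl_exec().
// ASSUMPTION: failures arrive as JSON such as
// {"success": false, "error": {"info": "..."}};
// anything else is treated as the HTML of the scraped page.
function interpretScrapeResponse(string $body): array
{
    if (trim($body) === '') {
        return ['ok' => false, 'error' => 'Empty response'];
    }

    $decoded = json_decode($body, true);
    if (is_array($decoded) && ($decoded['success'] ?? null) === false) {
        return [
            'ok'    => false,
            'error' => $decoded['error']['info'] ?? 'Unknown API error',
        ];
    }

    // Real-world markup is rarely valid, so collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect <meta name="..." content="..."> pairs, keyed by lowercase name.
    $meta = [];
    foreach ($dom->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if ($name !== '') {
            $meta[$name] = $tag->getAttribute('content');
        }
    }

    $titleNode = $dom->getElementsByTagName('title')->item(0);

    return [
        'ok'    => true,
        'title' => $titleNode ? trim($titleNode->textContent) : null,
        'meta'  => $meta,
    ];
}

// Usage with a small hand-written page instead of a live API call.
$sample = '<html><head><title>Example</title>'
    . '<meta name="description" content="A demo page"></head><body></body></html>';
print_r(interpretScrapeResponse($sample));
```

In the script above, the same function could be called on $websiteContent once the cURL error check has passed.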

Step3: Conclusion
In this tutorial you have learned how to integrate the ScrapeStack web scraping API using PHP. You can also go through the documentation to integrate it in other programming languages such as Python, Node.js, jQuery, Go and Ruby.
