Scraping website data with PHP

As PHP programmers, we often need to get data from other websites. Getting data from another website is known as web scraping. Scraping is not always easy: pages are built for browsers rather than programs, and their markup can change without notice.

So if you’re looking for a way to scrape data, you’re in the right place. In this tutorial you will learn how to scrape data from a website using PHP. We will use the PHP Simple HTML DOM Parser library to parse the downloaded HTML.

The tutorial is explained in easy steps with a live demo, and you can also download the demo source code.


So let’s start coding. We will use the following file structure for this tutorial:


  • index.php
  • function.php
  • simple_html_dom.php (the PHP Simple HTML DOM Parser library)

Step 1: Create a Form to Enter the Website URL
Since this tutorial comes with a demo, we first create a form in index.php with a field for the website URL and a submit button.

<form method="post" name="scrap_form" id="scrap_form" action="">
	<label for="website_url">Enter Website URL To Scrape Data</label>
	<input type="url" name="website_url" id="website_url" required>
	<input type="submit" name="submit" value="Submit">
</form>

Step 2: Create a PHP Function to Get Website Data
Now we will create a PHP function scrapWebsite() in function.php. It downloads the page with the cURL extension and loads the HTML into the PHP Simple HTML DOM Parser so that we can query it.

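// function.php
// The simple_html_dom class comes from the PHP Simple HTML DOM Parser
// library; this path assumes simple_html_dom.php sits next to function.php.
require_once 'simple_html_dom.php';

function scrapWebsite($url) {
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
	curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // give up after 30 seconds
	$response = curl_exec($ch);
	curl_close($ch);
	if ($response === false) {
		$response = ''; // request failed: parse an empty page instead
	}
	// Load the downloaded HTML into a DOM object that can be queried with find()
	$html = new simple_html_dom();
	$html->load($response);
	return $html;
}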
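scrapWebsite() does not check whether the request succeeded: a network error or an HTTP error page simply yields no posts. If you want failures reported, here is a sketch of a stricter fetch step that throws an exception instead. fetchPage() is a hypothetical name, and the user-agent string is a placeholder you should replace with your own.

```php
<?php
// Hypothetical stricter fetch step: returns the page body or throws.
function fetchPage(string $url, int $timeout = 30): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects...
        CURLOPT_MAXREDIRS      => 5,     // ...but not forever
        CURLOPT_TIMEOUT        => $timeout,
        // Placeholder user agent; some sites reject requests without one
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)',
    ]);
    $body = curl_exec($ch);
    $error = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    if ($body === false) {
        throw new RuntimeException("Request failed: $error");
    }
    // 4xx/5xx responses still have a body, so check the status explicitly
    if ($status >= 400) {
        throw new RuntimeException("HTTP $status returned for $url");
    }
    return $body;
}
```

Inside scrapWebsite(), you could call fetchPage($url) in place of the curl_* calls and pass its result to $html->load(), catching RuntimeException where you handle the form submission.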

Step 3: Scrape Particular Data from the Website
Finally, we will scrape a particular section of the page. Usually we don’t need all the data on a page, only one section of it. In this example we will collect the latest posts from PHPZAG.COM, so we target the part of the page that lists the posts.

We have created a function getPostDetails() that collects each post’s title, link and image.

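function getPostDetails($html) {
	$posts = array();
	// Each post title is a link inside <h2 class="entry-title">
	foreach ($html->find('h2[class=entry-title] a') as $link) {
		$posts[] = array(
			'title' => trim($link->plaintext),
			'link'  => $link->href,
			'img'   => '', // filled in below if a matching image exists
		);
	}
	// Images are matched to posts by position, which assumes the n-th
	// image inside div.entry-content belongs to the n-th post
	$i = 0;
	foreach ($html->find('div[class=entry-content] img') as $img) {
		if (!isset($posts[$i])) {
			break; // more images than posts
		}
		$posts[$i]['img'] = $img->src;
		$i++;
	}
	return $posts;
}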

The function above lists the latest posts from PHPZAG.COM. Its selectors, h2[class=entry-title] and div[class=entry-content], match that site’s post markup, so for other sites you will need to inspect the page and adjust them. This is a simple example of extracting one section of a page, and you can go further to collect whatever data you need. For example, you can scrape an eCommerce website for product details and prices. Once the page is parsed, you can process the extracted data however your application requires.
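getPostDetails() matches images to titles by position, so a post without an image (or with several) shifts every image that follows. Here is a sketch that reads the title, link and image from the same post container, using PHP’s bundled DOM extension instead of simple_html_dom. It assumes each post is wrapped in an <article> element, which is hypothetical markup: inspect the target page and adjust the XPath expressions. The name getPostDetailsXPath() is also hypothetical.

```php
<?php
// Hypothetical alternative to getPostDetails(): assumes each post is an
// <article> containing <h2 class="entry-title"><a>...</a></h2> and an
// optional image inside <div class="entry-content">.
function getPostDetailsXPath(string $page): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate messy real-world HTML
    $doc->loadHTML($page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    $posts = [];
    foreach ($xpath->query('//article') as $article) {
        // Relative queries (".//") search only inside this article
        $link = $xpath->query('.//h2[@class="entry-title"]/a', $article)->item(0);
        if ($link === null) {
            continue; // not a post
        }
        $img = $xpath->query('.//div[@class="entry-content"]//img', $article)->item(0);
        $posts[] = [
            'title' => trim($link->textContent),
            'link'  => $link->getAttribute('href'),
            'img'   => $img ? $img->getAttribute('src') : '', // empty if no image
        ];
    }
    return $posts;
}

$page = '<article><h2 class="entry-title"><a href="/one">One</a></h2>'
      . '<div class="entry-content"><img src="/1.png"></div></article>'
      . '<article><h2 class="entry-title"><a href="/two">Two</a></h2>'
      . '<div class="entry-content"><p>No image</p></div></article>';
print_r(getPostDetailsXPath($page));
```

Because each lookup is scoped to one article, a missing image only leaves that post’s img empty instead of misaligning the rest.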


You can also write the scraped data to a CSV file. For this, we have created a function writeToCSV() that writes the data in CSV format and sends it to the browser as a file download.

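function writeToCSV($postDetail) {
	// Discard anything already buffered (e.g. page HTML) so the download
	// contains only CSV. The headers below only work if no output has
	// reached the browser yet.
	while (ob_get_level() > 0) {
		ob_end_clean();
	}
	header('Content-Type: text/csv; charset=utf-8');
	header('Content-Disposition: attachment; filename="output.csv"');
	// Stream the CSV straight into the response body
	$f = fopen('php://output', 'w');
	fputcsv($f, array('Title', 'Link', 'Image')); // header row
	foreach ($postDetail as $post) {
		fputcsv($f, $post); // one row per post
	}
	fclose($f);
	exit();
}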
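writeToCSV() sends the CSV straight to the browser. If you would rather save the file on the server or reuse the text elsewhere, here is a sketch that builds the same CSV in memory. postsToCsv() is a hypothetical name, and it assumes each post is an array with title, link and img keys, as returned by getPostDetails().

```php
<?php
// Hypothetical helper: build the CSV text for the scraped posts in memory.
function postsToCsv(array $posts): string
{
    $f = fopen('php://temp', 'r+'); // in-memory stream, spills to disk if large
    // ',' delimiter, '"' enclosure, '\\' escape (PHP's defaults, passed explicitly)
    fputcsv($f, ['Title', 'Link', 'Image'], ',', '"', '\\');
    foreach ($posts as $post) {
        // Read fields by key so columns always match the header row;
        // a post without an image gets an empty cell.
        fputcsv($f, [$post['title'] ?? '', $post['link'] ?? '', $post['img'] ?? ''], ',', '"', '\\');
    }
    rewind($f);
    $csv = stream_get_contents($f);
    fclose($f);
    return $csv;
}

$csv = postsToCsv([['title' => 'Hello, world', 'link' => '/hello', 'img' => '']]);
echo $csv;
// To keep a copy on the server instead: file_put_contents('output.csv', $csv);
```

fputcsv() takes care of quoting, so a title containing a comma, such as "Hello, world", stays in a single column.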

Now we check whether the form was submitted and, if so, call scrapWebsite() and getPostDetails() to scrape the post data and display it. You can also call writeToCSV() to download the data as a CSV file.

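require_once 'function.php';

if (isset($_POST['submit']) && !empty($_POST['website_url'])) {
	$url = trim($_POST['website_url']);
	// Accept only well-formed http(s) URLs; cURL would otherwise also
	// open other schemes such as file://
	$scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
	if (filter_var($url, FILTER_VALIDATE_URL) && in_array($scheme, array('http', 'https'), true)) {
		$html = scrapWebsite($url);
		$postDetail = getPostDetails($html);

		//writeToCSV($postDetail); // uncomment to download the data as CSV instead

		// Escape the scraped text before echoing it into the page
		echo "<pre>" . htmlspecialchars(print_r($postDetail, true)) . "</pre>";
	}
}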


You can view the live demo from the Demo link and can download the script from the Download link below.
Demo Download

