Web page scraping

The regular expression metacharacters we will be using are (.*)

The dot (.) matches any single character (except a newline, by default), while the asterisk (*) means zero or more repetitions of whatever comes before it. When both are combined (.*) you are telling the regex engine to match any run of characters with a length of 0 or more.
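To make the "0 or more" part concrete, here is a minimal sketch. The sample strings and the variable names $m1 and $m2 are made up for illustration; the ! characters are just pattern delimiters.

```php
<?php
// Hypothetical sample strings, used only for illustration.
$pattern = '!<p>(.*)</p>!';

// (.*) captures the characters between the tags.
preg_match($pattern, '<p>Hello</p>', $m1);

// (.*) also matches when there is nothing between the tags (length 0).
preg_match($pattern, '<p></p>', $m2);

echo $m1[1], "\n";   // Hello
var_dump($m2[1]);    // string(0) ""
```

Index 0 of the result holds the whole matched text, and index 1 holds what the first pair of parentheses captured.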

We will be using 3 functions to extract our data. The first is file_get_contents(), which fetches the desired page and returns all of its contents, HTML included, as a single string. The second is preg_match(), which takes a regular expression and stores the first match it finds in an array. The last is preg_match_all(), which works the same way as preg_match() except that it stores every match rather than stopping after the first one.
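The difference between the two preg functions is easiest to see side by side. This is a small sketch on a made-up HTML snippet (the $content string stands in for a downloaded page, and the variable names are only for illustration):

```php
<?php
// Hypothetical HTML snippet standing in for a downloaded page.
$content = "<li>apple</li>\n<li>banana</li>\n<li>cherry</li>";
$pattern = '!<li>(.*)</li>!';

// preg_match() stops after the first match.
preg_match($pattern, $content, $first);

// preg_match_all() collects every match and returns how many it found.
$count = preg_match_all($pattern, $content, $all);

echo $first[1], "\n"; // apple
echo $count, "\n";    // 3
print_r($all[1]);     // apple, banana, cherry
```

Because the dot does not match a newline by default, each list item on its own line is matched separately.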

 

[code]

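// Page to scrape: a local file, or a full URL if allow_url_fopen is enabled.
$url = "index.html";
$content = file_get_contents($url);
// The ! characters are the pattern delimiters; (.*) captures any run of characters on a line.
$pattern = '!(.*)!';
preg_match_all($pattern, $content, $data);
// $data is an array, so use print_r() rather than print().
print_r($data);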

[/code]
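In practice you would usually put some literal HTML around (.*) so that only the parts you care about are captured. The following is a rough, self-contained sketch of that idea, not a definitive implementation: the page content, the temporary file, and the link pattern are all made up for illustration.

```php
<?php
// Write a small, made-up page to a temporary file so the example runs on its own.
$url = tempnam(sys_get_temp_dir(), 'scrape');
file_put_contents(
    $url,
    "<h1>Fruit</h1>\n<a href=\"a.html\">Apple</a>\n<a href=\"b.html\">Banana</a>\n"
);

// file_get_contents() returns false on failure, so check before matching.
$content = file_get_contents($url);
if ($content === false) {
    exit("Could not read $url\n");
}

// Two capture groups: the link target and the link text.
$pattern = '!<a href="(.*)">(.*)</a>!';
preg_match_all($pattern, $content, $data);

print_r($data[1]); // link targets: a.html, b.html
print_r($data[2]); // link texts: Apple, Banana

// Clean up the temporary file.
unlink($url);
```

This works here because each link sits on its own line; for messier real-world HTML, a simple (.*) pattern can capture more than intended.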

Example:

Download: scrape
