Generally there are 3 ways to access a site's data:
1. Download the site's web content through HTML
2. Send query parameters to a website and read the result
3. Use an XML-based web service to access the data and an XML parser to read the result. There are different protocols for this, such as SOAP (Simple Object Access Protocol) and REST (Representational State Transfer).
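To give a rough idea of the third approach, here is a minimal sketch of parsing an XML response with SimpleXML. The XML payload, its element names, and the parse_items() helper are all made up for illustration; a real web service will return its own structure.

```php
<?php
// Hypothetical XML payload standing in for a web service response.
$xml = '<results><item id="a">first</item><item id="b">second</item></results>';

// Hypothetical helper: returns an array of item id => item text,
// or an empty array when the XML cannot be parsed.
function parse_items($xml)
{
    // Collect parse errors internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return array();
    }
    $items = array();
    foreach ($doc->item as $item) {
        $items[(string) $item['id']] = (string) $item;
    }
    return $items;
}

print_r(parse_items($xml));
```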
And you will eventually also need:
1. cURL
2. OpenSSL, to access secure sites
3. XML, to parse data from web services
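Before going further, it may help to check that these extensions are actually available in your PHP build. This is only a sketch: check_required_extensions() is a hypothetical helper, and the choice of 'simplexml' as the XML extension is an assumption (your setup may use a different XML parser).

```php
<?php
// Hypothetical helper: reports whether each named extension is loaded.
// The default names are assumptions about a typical PHP build.
function check_required_extensions($names = array('curl', 'openssl', 'simplexml'))
{
    $status = array();
    foreach ($names as $name) {
        $status[$name] = extension_loaded($name);
    }
    return $status;
}

foreach (check_required_extensions() as $name => $loaded) {
    echo $name, ': ', $loaded ? 'loaded' : 'missing', PHP_EOL;
}
```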
First, create a cURL connection handle to a web page
$cLink = curl_init();
Then use curl_setopt() to set the connection options
curl_setopt($cLink, CURLOPT_URL, 'http://www.google.com');
By default, cURL prints the accessed page instead of returning it as a string. If you need to get the data and analyze it first, use the CURLOPT_RETURNTRANSFER option.
curl_setopt($cLink, CURLOPT_RETURNTRANSFER, true);
Now, access the page
$page_data = curl_exec($cLink);
And don't forget to clean up the connection
curl_close($cLink);
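The steps above can be put together roughly like this. It is a sketch rather than the one right way to do it: fetch_page() is a hypothetical name, and the error handling is an addition, based on the fact that curl_exec() returns false when the request fails.

```php
<?php
// Hypothetical helper: fetches a page via GET.
// Returns array(page data, null) on success or array(null, error message) on failure.
function fetch_page($url)
{
    $cLink = curl_init();
    curl_setopt($cLink, CURLOPT_URL, $url);
    // Return the page as a string instead of printing it.
    curl_setopt($cLink, CURLOPT_RETURNTRANSFER, true);

    $page_data = curl_exec($cLink);
    if ($page_data === false) {
        // Read the error message before the handle is closed.
        $error = curl_error($cLink);
        curl_close($cLink);
        return array(null, $error);
    }

    curl_close($cLink);
    return array($page_data, null);
}
```

You would call it like list($page, $error) = fetch_page('http://www.google.com'); and check $error before using $page.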
Those steps enable you to access web data via the GET method. If you want to use POST to submit form data, you'll need to set more options with curl_setopt(). Now, this is an example of a function that can access pages via the GET or POST method
function get_page_content($url, $posts = null)
{
    $query = null;
    if (!is_null($posts)) {
        if (!is_array($posts)) {
            die('POST parameters must be in array format');
        }
        // Encode the array as a URL-encoded query string
        $query = http_build_query($posts);
    }
And here's the cURL connection handle. We know that it's a POST request when there's a query string, so we configure the connection accordingly and use the query string as the POST data
    $cLink = curl_init();
    if ($query) {
        curl_setopt($cLink, CURLOPT_POST, true);
        curl_setopt($cLink, CURLOPT_POSTFIELDS, $query);
    }
Then we configure the connection, execute the query, and return the data
    curl_setopt($cLink, CURLOPT_URL, $url);
    curl_setopt($cLink, CURLOPT_HEADER, false);
    curl_setopt($cLink, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($cLink);
    curl_close($cLink);
    return $data;
}
And here's the usage example:
echo get_page_content('http://search.yahoo.com/search', array('p' => 'codeaway'));
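If you are curious what actually gets sent as POST data, the sketch below shows what http_build_query() produces for a parameter array. build_post_body() is a hypothetical wrapper, and the 'page' and 'q' parameters are made up for illustration. The '&' separator is passed explicitly so the result does not depend on the arg_separator.output setting.

```php
<?php
// Hypothetical wrapper: turns an array of form fields into a
// URL-encoded string suitable for CURLOPT_POSTFIELDS.
function build_post_body($posts)
{
    return http_build_query($posts, '', '&');
}

echo build_post_body(array('p' => 'codeaway', 'page' => 2)), PHP_EOL; // prints p=codeaway&page=2
echo build_post_body(array('q' => 'php curl')), PHP_EOL;              // prints q=php+curl
```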
Hope it helps.