Web Scraping in Laravel using Goutte
Web scraping means extracting information from the HTML of a web page. It is an easy way to display information from another website on your own. There are many libraries for scraping data; in this post you will learn to scrape data with Goutte in Laravel. Goutte is a web scraping and web crawling library for PHP. With Goutte you can scrape data from particular elements, selected by class, id, tag name, and so on. Goutte was written by Fabien Potencier, the creator of Symfony, and is built on Symfony components (older versions also used Guzzle for HTTP requests). In this post I will show you how to scrape a post by its URL `https://usingphp.com/post/get-next-and-previous-post-link-in-post-page`
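To see how tag, id, and class selectors behave without requesting a live site, here is a minimal sketch that feeds an inline HTML string straight into Symfony's DomCrawler `Crawler`, the same class that `Goutte::request()` returns. The markup, the `post` id, and the `author` class are made up for illustration. The script assumes symfony/dom-crawler and symfony/css-selector are installed with Composer (Goutte depends on both) and that it runs next to the project's `vendor/` directory.

```php
<?php

// Assumes Composer installed symfony/dom-crawler and symfony/css-selector
// (both are dependencies of Goutte) into this directory's vendor/ folder.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical page markup, used instead of a live HTTP request.
$html = '<html><body>'
    . '<h1>Sample post</h1>'
    . '<div id="post"><span class="author">Jane</span><p>First paragraph.</p></div>'
    . '</body></html>';

$crawler = new Crawler($html);

$byTag   = $crawler->filter('h1')->text();      // select by tag name
$byId    = $crawler->filter('#post p')->text(); // select by id, then a child tag
$byClass = $crawler->filter('.author')->text(); // select by class

echo $byTag, PHP_EOL, $byId, PHP_EOL, $byClass, PHP_EOL;
```

Replacing `new Crawler($html)` with `Goutte::request('GET', $url)` runs the same selectors against a real page.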
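First, install Goutte with the Composer package manager. The Weidner\Goutte facade used below is provided by the weidner/goutte Laravel package, which installs fabpot/goutte as a dependency:
composer require weidner/goutte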
After the package is installed, register the service provider in the providers array and the facade in the aliases array of config/app.php:
Providers
Weidner\Goutte\GoutteServiceProvider::class,
Aliases (facades)
'Goutte' => Weidner\Goutte\GoutteFacade::class,
Create a controller by running the following command in the terminal:
php artisan make:controller TestController
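Add a route in routes/web.php:
use App\Http\Controllers\TestController;

Route::get('test', [TestController::class, 'index']);
(On Laravel 7 and earlier, the string form 'TestController@index' also works.)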
Add the following code to TestController:
<?php

namespace App\Http\Controllers;

use Goutte;

class TestController extends Controller
{
    public function index()
    {
        $url = 'https://ecode-learn.com/post/get-next-and-previous-post-link-in-post-page';
        $crawler = Goutte::request('GET', $url);

        // The page title is in the <h1> tag
        echo $crawler->filter('h1')->first()->text();

        // The post content is in the element with the "single-page-content" class
        echo $crawler->filter('.single-page-content')->text();
    }
}
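One thing the controller above does not handle: `text()` throws an `InvalidArgumentException` when a selector matches nothing, for example after the site changes its markup. Below is a minimal sketch of a defensive helper. The function name `extractPost()` and its return shape are hypothetical, and it is exercised on an inline HTML string so it runs without network access (same Composer assumptions as before).

```php
<?php

// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: returns the post title and content, using null for any
 * part whose selector matched nothing instead of letting text() throw.
 */
function extractPost(Crawler $crawler): array
{
    $title   = $crawler->filter('h1');
    $content = $crawler->filter('.single-page-content');

    return [
        'title'   => $title->count() > 0 ? $title->first()->text() : null,
        'content' => $content->count() > 0 ? $content->first()->text() : null,
    ];
}

// A page that has a title but no ".single-page-content" element.
$post = extractPost(new Crawler('<html><body><h1>Next and previous links</h1></body></html>'));

var_dump($post);
```

Inside the controller, `extractPost(Goutte::request('GET', $url))` could replace the two echo calls; returning that array from `index()` lets Laravel send it as a JSON response.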
You can also scrape all the child elements of a parent with Goutte's each() method. For example, to get every <li> inside a <ul>, loop over them with each():
<?php
$crawler->filter('ul li')->each(function ($node) { echo $node->text(); });
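`each()` also collects whatever the closure returns into an array, which is usually more useful than echoing inside the loop. A small sketch on a made-up list, with the same Composer assumptions as above:

```php
<?php

// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup standing in for a scraped page.
$crawler = new Crawler('<html><body><ul><li>PHP</li><li>Laravel</li><li>Goutte</li></ul></body></html>');

// each() calls the closure once per matched node (wrapped in its own Crawler,
// with its zero-based index) and returns the closure's results in document order.
$items = $crawler->filter('ul li')->each(function (Crawler $node, int $i) {
    return $node->text();
});

print_r($items);
```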
Similarly, to scrape all the images on a page, use the code below in your controller. It returns an array of image sources:
<?php
$images = $crawler->filter('img')->each(function ($node) {
    // $node is a Crawler, so read the attribute with attr(); it returns null when src is missing
    return $node->attr('src');
});
$images = array_filter($images); // drop images that had no src
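A self-contained variant of the image example: the `img[src]` attribute selector matches only images that carry a `src`, so no null filtering is needed afterwards. The markup below is invented for illustration; note that relative paths come back exactly as written in the HTML.

```php
<?php

// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup: one image has no src attribute.
$crawler = new Crawler(
    '<html><body>'
    . '<img src="/images/logo.png" alt="Logo">'
    . '<img alt="No source">'
    . '<img src="https://example.com/banner.jpg">'
    . '</body></html>'
);

// Only <img> elements that have a src attribute are matched.
$sources = $crawler->filter('img[src]')->each(function (Crawler $node) {
    return $node->attr('src');
});

print_r($sources);
```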
You can also pick a single element by its position among the matches, for example the first or last paragraph of a page. The eq() index is zero-based, and first() and last() are shortcuts:
<?php
$crawler->filter('p')->eq(0)->text(); // first paragraph
$crawler->filter('p')->eq(1)->text(); // second paragraph
$crawler->filter('p')->eq(2)->text(); // third paragraph
// ...and so on
$crawler->filter('p')->eq($n)->text(); // paragraph at zero-based index $n
$crawler->filter('p')->last()->text(); // last paragraph
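The positional helpers can be checked offline with a sketch like this one (made-up markup, same Composer assumptions as above). It also shows that an out-of-range `eq()` returns an empty crawler rather than throwing, so check `count()` before calling `text()` on it.

```php
<?php

// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><p>One</p><p>Two</p><p>Three</p></body></html>');
$paragraphs = $crawler->filter('p');

$first  = $paragraphs->first()->text(); // same as eq(0)
$second = $paragraphs->eq(1)->text();   // zero-based index
$last   = $paragraphs->last()->text();  // same as eq(count - 1)
$total  = $paragraphs->count();

// Out-of-range index: an empty Crawler, not an exception.
$missing = $paragraphs->eq(10);

echo "$first / $second / $last ($total paragraphs)", PHP_EOL;
```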
Features of Goutte:
- Suitable for large projects
- Better parsing speed
- Object-oriented API
- Scrapes data by selecting HTML elements with CSS selectors