Web Scraping in Laravel using Goutte

Web scraping means extracting HTML content from a web page. It is an easy way to display information from another website on your own site. There are many libraries for scraping data; in this post you will learn how to scrape data with Goutte in Laravel. Goutte is a web scraping and web crawling library for PHP. With Goutte you can select data by a particular element, class, or id, and count or index the matches. Goutte was built by the creator of Symfony on top of Guzzle and Symfony components. In this post I will show you how to scrape a post by its URL `https://usingphp.com/post/get-next-and-previous-post-link-in-post-page`

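First, install Goutte via the Composer package manager. The facade and service provider used below come from the weidner/goutte Laravel wrapper, which pulls in fabpot/goutte as a dependency:

composer require weidner/goutte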

After the package is installed, add the service provider to the providers array and the facade to the aliases array in config/app.php.

Service provider

Weidner\Goutte\GoutteServiceProvider::class,

Facade

'Goutte' => Weidner\Goutte\GoutteFacade::class,

Create a controller by running the command below in the terminal:

php artisan make:controller TestController

Add a route in routes/web.php:

Route::get('test', 'TestController@index');
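
On Laravel 8 and later, the string form above usually fails with a "Target class [TestController] does not exist" error unless a controller namespace has been configured for the routes. A minimal sketch of the equivalent route in the newer syntax, assuming the controller sits in the default App\Http\Controllers namespace:

```php
<?php

use App\Http\Controllers\TestController;
use Illuminate\Support\Facades\Route;

// Laravel 8+ style: reference the controller class and method explicitly.
Route::get('test', [TestController::class, 'index']);
```

Either form maps the /test URL to the index method of TestController; use whichever matches your Laravel version.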

Add the code below to TestController:

<?php

namespace App\Http\Controllers;

use Goutte;

class TestController extends Controller
{
    public function index()
    {
        $url = 'https://ecode-learn.com/post/get-next-and-previous-post-link-in-post-page';

        // Fetch the page; the returned crawler is used to query the HTML
        $crawler = Goutte::request('GET', $url);

        // The page title is in the h1 tag
        echo $crawler->filter('h1')->first()->text();

        // Get the content of the page
        echo $crawler->filter('.single-page-content')->text();
    }
}
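
One thing to watch for: text() throws an exception when the selector matches nothing, which happens easily if the target page changes its markup. The sketch below shows one way to guard against that by checking count() first. The helper name textOrDefault() and the inline sample HTML are assumptions made for illustration, not part of Goutte. It builds a Symfony DomCrawler instance directly (the same kind of object Goutte::request() returns) so it can run without a network call.

```php
<?php

// Assumes Composer's autoloader is available; inside a Laravel app it is already loaded.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Return the text of the first node matching $selector, or $fallback when nothing matches.
 * textOrDefault() is a hypothetical helper name, not part of Goutte.
 */
function textOrDefault(Crawler $crawler, string $selector, string $fallback = ''): string
{
    $nodes = $crawler->filter($selector);

    // text() throws on an empty node list, so check count() first.
    return $nodes->count() > 0 ? trim($nodes->first()->text()) : $fallback;
}

// Inline HTML stands in for a fetched page (sample markup, not the real post).
$html = '<html><body><h1>Sample title</h1><div class="single-page-content">Body text</div></body></html>';
$crawler = new Crawler($html);

echo textOrDefault($crawler, 'h1'), PHP_EOL;
echo textOrDefault($crawler, '.missing', '(not found)'), PHP_EOL;
```

In the controller, the same helper could be called with the crawler returned by Goutte::request().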

You can also get all the child elements of a parent using each(). For example, to get every <li> in a <ul>, loop over them with each():

<?php
$crawler->filter('ul li')->each(function ($node) { echo $node->text(); });

Similarly, to scrape all the images on a page, use the code below in your controller. It returns an array of all image sources.

<?php
// each() collects the value returned for every matched node
$sources = $crawler->filter('img')->each(function ($node) {
    return $node->attr('src'); // null when the image has no src attribute
});
$sources = array_filter($sources); // drop images without a src
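
The src values found on a page are often relative (for example /img/a.png), so they may need to be resolved against the page URL before they can be used elsewhere. Below is a rough sketch of such a step in plain PHP. The function name absoluteUrl() and the example.com URLs are made up for illustration, and the helper is deliberately simple: it does not normalise "../" segments.

```php
<?php

/**
 * Resolve an image src against the URL of the page it was found on.
 * absoluteUrl() is a hypothetical helper; it does not normalise "../" segments.
 */
function absoluteUrl(string $pageUrl, string $src): string
{
    // Already absolute (has a scheme such as https:).
    if (parse_url($src, PHP_URL_SCHEME) !== null) {
        return $src;
    }

    $parts  = parse_url($pageUrl);
    $scheme = $parts['scheme'] ?? 'https';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin = $scheme . '://' . ($parts['host'] ?? '') . $port;

    // Protocol-relative: //cdn.example.com/a.png
    if (strncmp($src, '//', 2) === 0) {
        return $scheme . ':' . $src;
    }

    // Root-relative: /img/a.png
    if (strncmp($src, '/', 1) === 0) {
        return $origin . $src;
    }

    // Path-relative: resolve against the directory of the page path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $src;
}

// Example with placeholder values standing in for scraped src attributes.
$pageUrl = 'https://example.com/post/some-page';
$resolved = array_map(function ($src) use ($pageUrl) {
    return absoluteUrl($pageUrl, $src);
}, ['/img/a.png', 'b.png', '//cdn.example.com/c.png']);

print_r($resolved);
```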

You can also select an element by its position among the matches, such as the first or the last. For example, to get the first paragraph of a page, use the code below.

<?php
$crawler->filter('p')->eq(0); // first paragraph
$crawler->filter('p')->eq(1); // second paragraph
$crawler->filter('p')->eq(2); // third paragraph
// ...and so on
$crawler->filter('p')->eq($n); // paragraph at zero-based index $n
$crawler->filter('p')->first(); // same as eq(0)
$crawler->filter('p')->last(); // last paragraph
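
To see positional selection in action without fetching a real page, here is a small sketch that runs the same calls against inline sample HTML. The three-paragraph markup is invented for the example, and the crawler is built directly from Symfony DomCrawler, the same kind of object Goutte returns.

```php
<?php

// Assumes Composer's autoloader is available; inside a Laravel app it is already loaded.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Sample markup standing in for a scraped page.
$html = '<html><body><p>One</p><p>Two</p><p>Three</p></body></html>';
$paragraphs = (new Crawler($html))->filter('p');

echo $paragraphs->first()->text(), PHP_EOL; // One
echo $paragraphs->eq(1)->text(), PHP_EOL;   // Two
echo $paragraphs->last()->text(), PHP_EOL;  // Three

// eq() with an out-of-range index returns an empty crawler rather than throwing.
var_dump($paragraphs->eq(10)->count()); // int(0)
```

Because eq() indexes from zero, eq(1) is the second match, and an index past the end gives an empty result that should be checked before calling text().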

Features of Goutte:

  • Suitable for large projects
  • Fast parsing
  • Object-oriented API
  • Simple selection of data by HTML element

Please let me know your thoughts or comments on this article. If you have a suggestion or find a mistake, please let me know.

Latest Comments

Riya
21 Dec 2020

Working on first attempt. very simple and easy. Thanks

Goutam
22 Dec 2020

Goutte is best way to scrap a website

JOSE ARTURO
12 Sep 2021

doesnt work for sites that render the DOM using javascript

Rabia khan
27 Dec 2022

i try first time this pakge. its worked first attempt no error occuer its very good
