Documentation for crwlr / crawler-ext-browser (v2.2)

Initialize a Session

Use the InitSession step when you need to start a session with a (headless) browser, so that subsequent requests can be made with the more resource-efficient Guzzle HTTP client.

When executed, InitSession loads the URL with the (headless) Chrome browser, adds the received cookies to the cookie jar, and outputs the same URL it received as input. A subsequent HTTP loading step can then load that URL, typically with the Guzzle HTTP client, reusing the session cookies. If the loader is not configured to use the browser, InitSession switches it to the browser for this step only and switches it back afterwards.

use Crwlr\Crawler\HttpCrawler;
use Crwlr\Crawler\Steps\Html;
use Crwlr\Crawler\Steps\Loading\Http;
use Crwlr\CrawlerExtBrowser\Steps\InitSession;

$crawler = HttpCrawler::make()->withBotUserAgent('MyCrawler');

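$crawler
    ->input('https://www.example.com')
    // Load the URL with the headless browser and store the session cookies.
    ->addStep(new InitSession())
    // Load the same URL again, now with the regular HTTP client.
    ->addStep(Http::get())
    ->addStep(Html::metaData()->only(['title', 'description']));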

$crawler->runAndDump();
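To make the mechanism described above more concrete (switch to the browser, store the received cookies, pass the input URL through, switch back), here is a minimal, self-contained plain-PHP sketch. It is not the library's implementation: `SketchLoader` and `initSessionSketch()` are hypothetical names, and `load()` returns fixed cookies instead of making a real request.

```php
<?php

// Hypothetical stand-in for a loader that can switch between a plain HTTP
// client and a headless browser. Not the crwlr API; illustration only.
final class SketchLoader
{
    /** @var array<string, string> cookie name => value */
    public array $cookieJar = [];

    private bool $usesBrowser = false;

    public function usesBrowser(): bool
    {
        return $this->usesBrowser;
    }

    public function setUsesBrowser(bool $usesBrowser): void
    {
        $this->usesBrowser = $usesBrowser;
    }

    /**
     * Pretend to load $url and return the cookies the server set.
     * A real loader would perform a network request here.
     *
     * @return array<string, string>
     */
    public function load(string $url): array
    {
        return $this->usesBrowser
            ? ['session' => 'from-browser']
            : ['session' => 'from-http-client'];
    }
}

/**
 * Hypothetical sketch of the InitSession idea: load the URL with the browser,
 * store the received cookies, restore the previous loader mode, and return
 * the input URL unchanged.
 */
function initSessionSketch(SketchLoader $loader, string $url): string
{
    $wasUsingBrowser = $loader->usesBrowser();

    // Switch to the browser for this step only.
    $loader->setUsesBrowser(true);

    try {
        foreach ($loader->load($url) as $name => $value) {
            // Add each received cookie to the shared cookie jar.
            $loader->cookieJar[$name] = $value;
        }
    } finally {
        // Restore whatever mode the loader was in before this step.
        $loader->setUsesBrowser($wasUsingBrowser);
    }

    // Output the same URL that was received as input.
    return $url;
}

$loader = new SketchLoader();
$output = initSessionSketch($loader, 'https://www.example.com');

echo $output, PHP_EOL;                       // https://www.example.com
echo $loader->cookieJar['session'], PHP_EOL; // from-browser
var_dump($loader->usesBrowser());            // bool(false)
```

Restoring the previous mode in a `finally` block means the loader goes back to the HTTP client even if loading throws, so later steps keep using the cheaper client.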