Documentation for crwlr / crawler-ext-browser (v2.0)

Custom Browser Steps

If you want to create your own custom step utilizing a headless browser, this extension provides the abstract BrowserBaseStep class, which you can extend for this purpose.

use Crwlr\CrawlerExtBrowser\Steps\BrowserBaseStep;

class Click extends BrowserBaseStep
{
    protected function invoke(mixed $input): Generator
    {
        $this->_switchLoaderBefore();

        // Load the input URI, using the loader.
        $response = $this->loader->load($input);

        if ($response) {
            // If loading was successful, get the opened page from the browser...
            $page = $this->loader->browser()->getOpenPage();

            // ...find an element matching the CSS selector a#foo and click it...
            $page->mouse()->find('a#foo')->click();

            // ...and after that, get the HTML of the page.
            yield $page->getHtml();
        }

        $this->_switchLoaderAfterwards();
    }
}

The _switchLoaderBefore() method ensures that the loader uses the headless browser in this step. If the loader was previously configured to use the (Guzzle) HTTP client, calling _switchLoaderAfterwards() restores that configuration.

Through the browser helper, accessible via the loader ($this->loader->browser()), you can interact with a loaded page. The getOpenPage() method returns a HeadlessChromium\Page object. To see what you can do with it, check out the chrome-php readme.
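As a rough illustration of working with the returned page object, the following sketch clicks the same placeholder link as above and then reads the document title via JavaScript. The class name TitleAfterClick is hypothetical, the null check on the page and the try/finally around the loader switch are defensive choices made for this example, and it does not wait for any navigation the click might trigger. The find(), evaluate() and getReturnValue() calls and the ElementNotFoundException are taken from the chrome-php readme.

```php
<?php

use Crwlr\CrawlerExtBrowser\Steps\BrowserBaseStep;
use HeadlessChromium\Exception\ElementNotFoundException;

// Hypothetical step name, chosen for this sketch only.
class TitleAfterClick extends BrowserBaseStep
{
    protected function invoke(mixed $input): Generator
    {
        $this->_switchLoaderBefore();

        try {
            if ($this->loader->load($input)) {
                $page = $this->loader->browser()->getOpenPage();

                // Defensive assumption: treat a missing page as "nothing to yield".
                if ($page === null) {
                    return;
                }

                try {
                    // a#foo is the placeholder selector from the example above.
                    $page->mouse()->find('a#foo')->click();
                } catch (ElementNotFoundException $exception) {
                    // chrome-php throws this when no element matches the selector.
                    // This sketch simply yields nothing for such an input.
                    return;
                }

                // Run JavaScript in the page and yield its return value.
                yield $page->evaluate('document.title')->getReturnValue();
            }
        } finally {
            // Runs on early return and on exceptions, so the loader is
            // always switched back to its previous configuration.
            $this->_switchLoaderAfterwards();
        }
    }
}
```

The finally block is a design choice of this sketch: it restores the loader configuration even when the step returns early or an exception is thrown.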

If you don't want to use the loader to load the page, you are free to use the chrome-php library directly within the step's invoke() method. In this case, you also don't need to extend the BrowserBaseStep class.
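A minimal sketch of that approach could look like the following. It assumes the step extends the crawler package's generic Crwlr\Crawler\Steps\Step class, that the input can be cast to a URL string, and that chrome-php can locate a Chrome binary on its own. The class name DirectBrowserHtml is hypothetical. The BrowserFactory, createPage(), navigate() and waitForNavigation() calls follow the chrome-php readme.

```php
<?php

use Crwlr\Crawler\Steps\Step;
use HeadlessChromium\BrowserFactory;

// Hypothetical step name, chosen for this sketch only.
class DirectBrowserHtml extends Step
{
    protected function invoke(mixed $input): Generator
    {
        // Start a browser instance directly with chrome-php,
        // without going through the crawler's loader.
        $browser = (new BrowserFactory())->createBrowser();

        try {
            $page = $browser->createPage();

            // Assumption: the input is a URL or can be cast to one.
            $page->navigate((string) $input)->waitForNavigation();

            yield $page->getHtml();
        } finally {
            // Close the browser even if navigation fails.
            $browser->close();
        }
    }
}
```

Because this step bypasses the loader, anything configured on the loader does not apply to the request, and the step starts and closes its own browser for every input.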