User Agents
User agents are very simple. The basic UserAgentInterface only requires implementations to have a __toString() method. The HttpLoader sends that string as the User-Agent HTTP header with every request.
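Because the interface only asks for a __toString() method, a custom user agent can be a very small class. The following is a minimal sketch, not code from the library: the interface is re-declared locally as a stand-in so the snippet runs on its own (in a real project you would import Crwlr\Crawler\UserAgents\UserAgentInterface instead), and the class name ProductUserAgent with its constructor parameters is hypothetical.

```php
<?php

// Local stand-in mirroring what the text describes: only __toString() is required.
// In a real project, import Crwlr\Crawler\UserAgents\UserAgentInterface instead.
interface UserAgentInterface
{
    public function __toString(): string;
}

// Hypothetical implementation that builds a "product/version" style string.
final class ProductUserAgent implements UserAgentInterface
{
    public function __construct(
        private string $product,
        private string $version,
    ) {}

    public function __toString(): string
    {
        return $this->product . '/' . $this->version;
    }
}

$userAgent = new ProductUserAgent('MyApp', '2.0');

// Concatenating the object with a string calls __toString();
// this is the value that would end up in the header.
echo 'User-Agent: ' . $userAgent . PHP_EOL;
```

An object like this could then be returned from the crawler's userAgent() method in place of one of the built-in classes.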
If you just want to use a specific browser user agent, you can do so in your crawler class like this:
use Crwlr\Crawler\HttpCrawler;
use Crwlr\Crawler\UserAgents\UserAgent;
use Crwlr\Crawler\UserAgents\UserAgentInterface;

class MyCrawler extends HttpCrawler
{
    protected function userAgent(): UserAgentInterface
    {
        return new UserAgent(
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:99.0) Gecko/20100101 Firefox/99.0'
        );
    }
}
Bot User Agent
If you want to be polite and identify as a bot, you can use the BotUserAgent to do so. It requires at least the name of your bot. Optionally, you can also pass a URL where you provide information about your crawler, and a version number.
use Crwlr\Crawler\HttpCrawler;
use Crwlr\Crawler\UserAgents\BotUserAgent;
use Crwlr\Crawler\UserAgents\UserAgentInterface;

class MyCrawler extends HttpCrawler
{
    protected function userAgent(): UserAgentInterface
    {
        return new BotUserAgent('MyBot', 'https://www.example.com/my-bot', '1.2');
    }
}
The __toString() method of the BotUserAgent will return this user-agent string:
Mozilla/5.0 (compatible; MyBot/1.2; +https://www.example.com/my-bot)
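To illustrate how a string in this format can be put together from the three values, here is a rough sketch. It is not the library's implementation: the function name buildBotUserAgent is hypothetical, and dropping the version or URL part when it is not provided is an assumption about sensible behavior.

```php
<?php

// Hypothetical helper, not part of the library: composes a bot user-agent
// string in the format shown above. Omitting missing parts is an assumption.
function buildBotUserAgent(string $name, ?string $infoUrl = null, ?string $version = null): string
{
    $userAgent = 'Mozilla/5.0 (compatible; ' . $name;

    // Append "/<version>" directly after the bot name, if a version was given.
    if ($version !== null) {
        $userAgent .= '/' . $version;
    }

    // Append "; +<url>" so site owners can find information about the bot.
    if ($infoUrl !== null) {
        $userAgent .= '; +' . $infoUrl;
    }

    return $userAgent . ')';
}

echo buildBotUserAgent('MyBot', 'https://www.example.com/my-bot', '1.2') . PHP_EOL;
// Mozilla/5.0 (compatible; MyBot/1.2; +https://www.example.com/my-bot)
```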