Step Output Filters
Any step that extends the abstract Step class shipped with the package has the where() and orWhere() methods to filter its outputs. Here's an example of how to use them:
use Crwlr\Crawler\Steps\Filters\Filter;
use Crwlr\Crawler\Steps\Json;

$json = <<<JSON
{
    "queenAlbums": [
        { "title": "Queen", "year": 1973, "charts": { "uk": 24, "us": 83 } },
        { "title": "Queen II", "year": 1974, "charts": { "uk": 5, "us": 49 } },
        { "title": "A Night at the Opera", "year": 1975, "charts": { "uk": 1, "us": 4 } },
        { "title": "A Day at the Races", "year": 1976, "charts": { "uk": 1, "us": 5 } },
        { "title": "The Game", "year": 1980, "charts": { "uk": 1, "us": 1 } },
        { "title": "A Kind of Magic", "year": 1986, "charts": { "uk": 1, "us": 46 } }
    ]
}
JSON;

$crawler = new MyCrawler();

$crawler->input($json);

$crawler->addStep(
    Json::each('queenAlbums', ['title', 'year', 'chartsUK' => 'charts.uk', 'chartsUS' => 'charts.us'])
        ->where('year', Filter::greaterThan(1979))
        ->where('chartsUS', Filter::equal(1))
);
As you can see, you always need to provide a Filter object. That shouldn't be complicated, as the Filter class has a static method for every available filter.
In the example, the result is only the album "The Game", as it is the only one in the list that was released after 1979 and reached #1 in the US charts.
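To make the effect of the two chained where() calls concrete, here is a plain-PHP sketch of the same selection. It does not use the library: the $albums array is a hypothetical stand-in for the step's outputs, and array_filter() replaces the step filtering. It assumes, as the result described above implies, that every chained where() condition has to match.

```php
<?php

// Hypothetical stand-in for the outputs that Json::each() yields in the example.
$albums = [
    ['title' => 'Queen', 'year' => 1973, 'chartsUK' => 24, 'chartsUS' => 83],
    ['title' => 'Queen II', 'year' => 1974, 'chartsUK' => 5, 'chartsUS' => 49],
    ['title' => 'A Night at the Opera', 'year' => 1975, 'chartsUK' => 1, 'chartsUS' => 4],
    ['title' => 'A Day at the Races', 'year' => 1976, 'chartsUK' => 1, 'chartsUS' => 5],
    ['title' => 'The Game', 'year' => 1980, 'chartsUK' => 1, 'chartsUS' => 1],
    ['title' => 'A Kind of Magic', 'year' => 1986, 'chartsUK' => 1, 'chartsUS' => 46],
];

// Both conditions must hold, mirroring the two chained where() calls.
$filtered = array_filter(
    $albums,
    fn (array $album): bool => $album['year'] > 1979 && $album['chartsUS'] === 1
);

// array_column() re-indexes the result, so $titles is a plain list of titles.
$titles = array_column($filtered, 'title');

print_r($titles);
```

Only "The Game" passes both conditions, which matches the result described for the crawler example.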
The first parameter is the key in the step's output array (or object). If the step outputs only a single value that is neither an array nor an object, you can pass just the Filter:
use Crwlr\Crawler\Steps\Filters\Filter;
use Crwlr\Crawler\Steps\Html;
$crawler->addStep(
    Html::getLink('.linkClass')
        ->where(Filter::urlDomain('crwlr.software'))
);
As mentioned, there is also orWhere(). So, in the same example as above, you can also do:
use Crwlr\Crawler\Steps\Filters\Filter;
use Crwlr\Crawler\Steps\Json;
$crawler->addStep(
    Json::each('queenAlbums', ['title', 'year', 'chartsUK' => 'charts.uk', 'chartsUS' => 'charts.us'])
        ->where('year', Filter::greaterThan(1979))
        ->where('chartsUS', Filter::equal(1))
        ->orWhere('chartsUK', Filter::equal(1))
);
This will also return "A Kind of Magic", as it reached #1 in the UK.
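The stated result suggests how the conditions combine, and the following plain-PHP sketch spells out one reading of it. It is an assumption, not a description of the library's internals: the orWhere() condition is treated as an alternative to the where() call directly before it, while the earlier year condition still applies. An unconditional "or" would also let through the 1975 and 1976 albums, which the text above does not mention. The $albums array is again a hypothetical stand-in for the step outputs.

```php
<?php

// Hypothetical stand-in for the outputs that Json::each() yields in the example.
$albums = [
    ['title' => 'Queen', 'year' => 1973, 'chartsUK' => 24, 'chartsUS' => 83],
    ['title' => 'Queen II', 'year' => 1974, 'chartsUK' => 5, 'chartsUS' => 49],
    ['title' => 'A Night at the Opera', 'year' => 1975, 'chartsUK' => 1, 'chartsUS' => 4],
    ['title' => 'A Day at the Races', 'year' => 1976, 'chartsUK' => 1, 'chartsUS' => 5],
    ['title' => 'The Game', 'year' => 1980, 'chartsUK' => 1, 'chartsUS' => 1],
    ['title' => 'A Kind of Magic', 'year' => 1986, 'chartsUK' => 1, 'chartsUS' => 46],
];

// Assumed grouping: year > 1979 AND (chartsUS == 1 OR chartsUK == 1).
$filtered = array_filter(
    $albums,
    fn (array $album): bool =>
        $album['year'] > 1979
        && ($album['chartsUS'] === 1 || $album['chartsUK'] === 1)
);

// array_column() re-indexes the result, so $titles is a plain list of titles.
$titles = array_column($filtered, 'title');

print_r($titles);
```

Under this reading the sketch yields "The Game" and "A Kind of Magic", in line with the result described above.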
Available Filters
Comparison Filters
use Crwlr\Crawler\Steps\Filters\Filter;
Filter::equal(mixed $toValue);
Filter::notEqual(mixed $value);
Filter::greaterThan(mixed $value);
Filter::greaterThanOrEqual(mixed $value);
Filter::lessThan(mixed $value);
Filter::lessThanOrEqual(mixed $value);
String Filters
use Crwlr\Crawler\Steps\Filters\Filter;
Filter::stringContains(string $string); // uses PHP's str_contains()
Filter::stringStartsWith(string $string); // uses PHP's str_starts_with()
Filter::stringEndsWith(string $string); // uses PHP's str_ends_with()
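Since the comments above name the PHP functions behind the string filters, a short plain-PHP sketch can show what each check lets through. It calls the PHP 8 functions directly rather than the Filter class, and the $outputs list is a hypothetical set of string outputs. All three functions are case-sensitive.

```php
<?php

// Hypothetical step outputs: plain strings.
$outputs = ['Queen II', 'A Night at the Opera', 'A Day at the Races', 'The Game'];

// Small helper (illustrative only): keep matching values and re-index the list.
$keep = fn (callable $check): array => array_values(array_filter($outputs, $check));

// Comparable to stringContains('at the')
$contains = $keep(fn (string $value): bool => str_contains($value, 'at the'));

// Comparable to stringStartsWith('Queen')
$startsWith = $keep(fn (string $value): bool => str_starts_with($value, 'Queen'));

// Comparable to stringEndsWith('Game')
$endsWith = $keep(fn (string $value): bool => str_ends_with($value, 'Game'));

// Case matters: 'queen' does not match 'Queen II'.
$lowercase = $keep(fn (string $value): bool => str_starts_with($value, 'queen'));

print_r([$contains, $startsWith, $endsWith, $lowercase]);
```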
URL Filters
use Crwlr\Crawler\Steps\Filters\Filter;
Filter::urlScheme(string $scheme); // e.g. http, https, ftp,...
Filter::urlHost(string $host); // www.crwlr.software
Filter::urlDomain(string $domain); // crwlr.software
Filter::urlPath(string $path); // /exact/path
Filter::urlPathStartsWith(string $pathStart); // /foo
Filter::urlPathMatches(string $regex); // Regex (without delimiters) that the path has to match.
// Like: ^/\d{1,5}/
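To illustrate which part of a URL each of these filters looks at, here is a plain-PHP sketch based on parse_url() and preg_match(). It is not the library's implementation: the example URL is hypothetical, and wrapping the regex in ~ delimiters is an assumption made for this sketch. The registrable domain (crwlr.software for the host www.crwlr.software) is left out, because deriving it reliably requires knowledge of public suffixes.

```php
<?php

// Hypothetical URL, as a step like Html::getLink() might output it.
$url = 'https://www.crwlr.software/2023/some-article?page=2';

$scheme = parse_url($url, PHP_URL_SCHEME); // the part urlScheme() compares
$host = parse_url($url, PHP_URL_HOST);     // the part urlHost() compares
$path = parse_url($url, PHP_URL_PATH);     // the part the urlPath*() filters look at

// Comparable to urlPath('/2023/some-article'): the whole path has to match.
$exactPath = $path === '/2023/some-article';

// Comparable to urlPathStartsWith('/2023')
$pathStartsWith = str_starts_with($path, '/2023');

// Comparable to urlPathMatches('^/\d{1,5}/'): the regex is given without
// delimiters, so this sketch adds ~ delimiters itself (assumption). That
// wrapping would break for a pattern that itself contains a ~ character.
$regex = '^/\d{1,5}/';
$pathMatches = preg_match('~' . $regex . '~', $path) === 1;

var_dump($scheme, $host, $path, $exactPath, $pathStartsWith, $pathMatches);
```

Note that the query string (?page=2) is not part of the path, so it plays no role in the path checks of this sketch.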