Step Output Filters
Any step that extends the abstract Step class shipped with the package has the where() and orWhere() methods to filter its outputs. Here's an example of how to use them:
$json = <<<JSON
    {
        "queenAlbums": [
            { "title": "Queen", "year": 1973, "charts": { "uk": 24, "us": 83 } },
            { "title": "Queen II", "year": 1974, "charts": { "uk": 5, "us": 49 } },
            { "title": "A Night at the Opera", "year": 1975, "charts": { "uk": 1, "us": 4 } },
            { "title": "A Day at the Races", "year": 1976, "charts": { "uk": 1, "us": 5 } },
            { "title": "The Game", "year": 1980, "charts": { "uk": 1, "us": 1 } },
            { "title": "A Kind of Magic", "year": 1986, "charts": { "uk": 1, "us": 46 } }
        ]
    }
JSON;

$crawler = new MyCrawler();

$crawler->input($json);

$crawler->addStep(
    Json::each('queenAlbums', ['title', 'year', 'chartsUK' => 'charts.uk', 'chartsUS' => 'charts.us'])
        ->where('year', Filter::greaterThan(1979))
        ->where('chartsUS', Filter::equal(1))
);
As you can see, you always need to provide a Filter object. That shouldn't be complicated, as the Filter class has a static method for every available filter.
In the example, the result is only the album "The Game", as it is the only one in the list that was released after 1979 and reached #1 in the US charts.
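To make the filtering logic concrete, here is a minimal plain PHP sketch of what the two chained where() calls amount to. It does not use the library and is not its implementation: the `$albums` array is a hand-written stand-in for the rows the Json::each() step would output, and array_filter() plays the role of the filters. It assumes PHP 7.4 or later for the arrow function.

```php
<?php
// Hand-written stand-in for the step's outputs (an assumption for illustration).
$albums = [
    ['title' => 'Queen', 'year' => 1973, 'chartsUK' => 24, 'chartsUS' => 83],
    ['title' => 'Queen II', 'year' => 1974, 'chartsUK' => 5, 'chartsUS' => 49],
    ['title' => 'A Night at the Opera', 'year' => 1975, 'chartsUK' => 1, 'chartsUS' => 4],
    ['title' => 'A Day at the Races', 'year' => 1976, 'chartsUK' => 1, 'chartsUS' => 5],
    ['title' => 'The Game', 'year' => 1980, 'chartsUK' => 1, 'chartsUS' => 1],
    ['title' => 'A Kind of Magic', 'year' => 1986, 'chartsUK' => 1, 'chartsUS' => 46],
];

// Both conditions must hold, mirroring the two chained where() calls.
$filtered = array_filter(
    $albums,
    fn (array $album): bool => $album['year'] > 1979 && $album['chartsUS'] === 1
);

// array_column() returns a re-indexed list of the remaining titles.
$titles = array_column($filtered, 'title');

print_r($titles);
```

Only "The Game" passes both conditions, which matches the result described above.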
The first parameter is the key in the step's output array (or object). If the step outputs only a single, non-array/object value, you can pass just the Filter:
$crawler->addStep(
    Html::getLink('.linkClass')
        ->where(Filter::urlDomain('crwlr.software'))
);
As mentioned, there is also orWhere(). So, in the same example as above, you can also do:
$crawler->addStep(
    Json::each('queenAlbums', ['title', 'year', 'chartsUK' => 'charts.uk', 'chartsUS' => 'charts.us'])
        ->where('year', Filter::greaterThan(1979))
        ->where('chartsUS', Filter::equal(1))
        ->orWhere('chartsUK', Filter::equal(1))
);
This also returns "A Kind of Magic", as it reached #1 in the UK charts.
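The stated result (only "A Kind of Magic" is added, not the earlier albums that also reached #1 in the UK) suggests that orWhere() is linked to the where() call directly before it rather than to the whole chain. That reading is an inference from the example, not something confirmed here. The plain PHP sketch below, which again uses a hand-written `$albums` array instead of the library, compares the two possible groupings.

```php
<?php
// Hand-written stand-in for the step's outputs (an assumption for illustration).
$albums = [
    ['title' => 'Queen', 'year' => 1973, 'chartsUK' => 24, 'chartsUS' => 83],
    ['title' => 'Queen II', 'year' => 1974, 'chartsUK' => 5, 'chartsUS' => 49],
    ['title' => 'A Night at the Opera', 'year' => 1975, 'chartsUK' => 1, 'chartsUS' => 4],
    ['title' => 'A Day at the Races', 'year' => 1976, 'chartsUK' => 1, 'chartsUS' => 5],
    ['title' => 'The Game', 'year' => 1980, 'chartsUK' => 1, 'chartsUS' => 1],
    ['title' => 'A Kind of Magic', 'year' => 1986, 'chartsUK' => 1, 'chartsUS' => 46],
];

// Reading 1 (assumed): year > 1979 AND (chartsUS = 1 OR chartsUK = 1).
$linkedToLastWhere = array_column(array_filter(
    $albums,
    fn (array $a): bool => $a['year'] > 1979 && ($a['chartsUS'] === 1 || $a['chartsUK'] === 1)
), 'title');

// Reading 2 (for contrast): (year > 1979 AND chartsUS = 1) OR chartsUK = 1.
$orOverWholeChain = array_column(array_filter(
    $albums,
    fn (array $a): bool => ($a['year'] > 1979 && $a['chartsUS'] === 1) || $a['chartsUK'] === 1
), 'title');

print_r($linkedToLastWhere);
print_r($orOverWholeChain);
```

Only the first reading yields exactly "The Game" and "A Kind of Magic". The second would also let through "A Night at the Opera" and "A Day at the Races".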
Available Filters
Comparison Filters
Filter::equal(mixed $toValue);
Filter::notEqual(mixed $value);
Filter::greaterThan(mixed $value);
Filter::greaterThanOrEqual(mixed $value);
Filter::lessThan(mixed $value);
Filter::lessThanOrEqual(mixed $value);
String Filters
Filter::stringContains(string $string); // uses PHP's str_contains()
Filter::stringStartsWith(string $string); // uses PHP's str_starts_with()
Filter::stringEndsWith(string $string); // uses PHP's str_ends_with()
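Since the string filters are described as wrappers around PHP's own string functions, a quick look at those functions shows what to expect, in particular that the matching is case-sensitive. This sketch calls the PHP functions directly on a hand-written list of titles, not through the Filter class, and requires PHP 8.0 or later.

```php
<?php
$titles = [
    'Queen',
    'Queen II',
    'A Night at the Opera',
    'A Day at the Races',
    'The Game',
    'A Kind of Magic',
];

// array_values() re-indexes the result, as array_filter() preserves keys.
$contains = array_values(array_filter($titles, fn (string $t): bool => str_contains($t, 'Queen')));
$startsWith = array_values(array_filter($titles, fn (string $t): bool => str_starts_with($t, 'A ')));
$endsWith = array_values(array_filter($titles, fn (string $t): bool => str_ends_with($t, 'Magic')));

// All three functions compare case-sensitively.
$lowercaseMatch = str_contains('Queen II', 'queen');

print_r($contains);
print_r($startsWith);
print_r($endsWith);
var_dump($lowercaseMatch);
```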
URL Filters
Filter::urlScheme(string $scheme); // e.g. http, https, ftp, ...
Filter::urlHost(string $host); // e.g. www.crwlr.software
Filter::urlDomain(string $domain); // e.g. crwlr.software
Filter::urlPath(string $path); // e.g. /exact/path
Filter::urlPathStartsWith(string $pathStart); // e.g. /foo
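To show which part of a URL each of these filters refers to, the following sketch splits a URL with PHP's built-in parse_url(). The URL is a made-up example and the library may parse URLs differently, so treat this as an illustration of the terms only. The registrable domain (crwlr.software) is left out, because deriving it reliably needs public suffix data that parse_url() does not provide.

```php
<?php
// Hypothetical example URL, chosen only for illustration.
$url = 'https://www.crwlr.software/foo/bar?page=2';

$parts = parse_url($url);

$scheme = $parts['scheme']; // the part urlScheme() refers to
$host = $parts['host'];     // the part urlHost() refers to
$path = $parts['path'];     // the part urlPath() refers to

// Exact path match versus prefix match.
$pathIsExactlyFoo = $path === '/foo';
$pathStartsWithFoo = str_starts_with($path, '/foo'); // PHP 8.0+

var_dump($scheme, $host, $path, $pathIsExactlyFoo, $pathStartsWithFoo);
```

The path /foo/bar is not an exact match for /foo but does start with it, which is the difference between the last two filters in the list.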