Json
The Json step has three static methods:
Json::all() to extract the whole JSON object,
Json::get() to cherry-pick properties from the JSON object, and
Json::each() to extract multiple items from the JSON object.
Json::all()
use Crwlr\Crawler\HttpCrawler;
use Crwlr\Crawler\Steps\Json;
use Crwlr\Crawler\Steps\Loading\Http;

$crawler = HttpCrawler::make()->withUserAgent('MyCrawler');

// Load the URL, then extract the whole JSON object from the response.
$crawler
    ->input('https://www.example.com/json')
    ->addStep(Http::get())
    ->addStep(Json::all());
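For orientation, here is a plain-PHP sketch of what "the whole JSON object" amounts to for a small response body. It rests on the assumption that the step's output is comparable to the body decoded into an associative array. The $body string is made up for illustration, and the snippet does not use the crawler library itself.

```php
<?php

// Hypothetical response body, standing in for what Http::get() would load.
$body = '{"data": {"something": "yolo", "count": 3}}';

// Decode into nested associative arrays; throw on invalid JSON.
$all = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

var_dump($all['data']['something']); // string(4) "yolo"
```

Nothing is filtered or renamed here, which is the difference from the property mapping used by the other two methods.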
Json::get()
The Json::get() method works much like the extract method of the Html and Xml steps. Thanks to adbario/php-dot-notation, extracting data from JSON documents is simple: properties are addressed with dot-separated paths. Assume the URL https://www.example.com/json responds with the following JSON:
{
    "data": {
        "something": "yolo",
        "target": {
            "foo": "Lorem ipsum",
            "bar": "dolor sit",
            "array": [
                { "baz": "zero" },
                { "baz": "one" },
                { "baz": "two" }
            ]
        }
    }
}
Cherry-pick your desired properties like this:
use Crwlr\Crawler\HttpCrawler;
use Crwlr\Crawler\Steps\Json;
use Crwlr\Crawler\Steps\Loading\Http;

$crawler = HttpCrawler::make()->withUserAgent('MyCrawler');

$crawler
    ->input('https://www.example.com/json')
    ->addStep(Http::get())
    ->addStep(
        // Map output keys to dot-notation paths in the JSON document.
        Json::get([
            'foo' => 'data.target.foo',
            'bar' => 'data.target.array.1.baz', // second element of "array"
        ])
    );
The output of the Json step is then:
array(2) {
  ["foo"]=>
  string(11) "Lorem ipsum"
  ["bar"]=>
  string(3) "one"
}
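To make the dot-notation paths concrete, the following plain-PHP sketch resolves the same two paths against the example JSON. The dotGet() helper is hypothetical and written only for illustration; it is neither the crawler's code nor the php-dot-notation implementation, and it assumes PHP 8 for the mixed type.

```php
<?php

/**
 * Hypothetical helper: resolve a dot-separated path in a nested array.
 * Returns $default as soon as a path segment is missing.
 */
function dotGet(array $data, string $path, mixed $default = null): mixed
{
    $current = $data;

    foreach (explode('.', $path) as $segment) {
        // Numeric segments such as "1" address list elements by index.
        if (!is_array($current) || !array_key_exists($segment, $current)) {
            return $default;
        }

        $current = $current[$segment];
    }

    return $current;
}

$json = '{"data":{"something":"yolo","target":{"foo":"Lorem ipsum",'
    . '"bar":"dolor sit","array":[{"baz":"zero"},{"baz":"one"},{"baz":"two"}]}}}';

$decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// Same mapping as passed to Json::get(): output key => path.
$output = [
    'foo' => dotGet($decoded, 'data.target.foo'),
    'bar' => dotGet($decoded, 'data.target.array.1.baz'),
];

var_dump($output);
```

The path segment 1 selects the second element of the array, which is why bar resolves to "one" rather than "zero".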
Json::each()
You can also extract multiple items from an array in the JSON object by using the each() method. Let's say the JSON looks like this:
{
    "list": {
        "people": [
            { "name": "Hans Zimmer", "age": { "years": 66 }, "home": "US" },
            { "name": "John Williams", "age": { "years": 92 }, "home": "US" },
            { "name": "Alan Silvestri", "age": { "years": 73 }, "home": "US" }
        ]
    }
}
You can get the names and ages like this:
use Crwlr\Crawler\HttpCrawler;
use Crwlr\Crawler\Steps\Json;
use Crwlr\Crawler\Steps\Loading\Http;

$crawler = HttpCrawler::make()->withUserAgent('MyCrawler');

$crawler
    ->input('https://www.example.com/json')
    ->addStep(Http::get())
    ->addStep(
        Json::each(
            'list.people',
            [ // provide the data mapping as second argument to the each() method.
                'name' => 'name',
                'age' => 'age.years',
            ]
        )
    );
This yields three separate outputs:
array(2) {
  ["name"]=>
  string(11) "Hans Zimmer"
  ["age"]=>
  int(66)
}
array(2) {
  ["name"]=>
  string(13) "John Williams"
  ["age"]=>
  int(92)
}
array(2) {
  ["name"]=>
  string(14) "Alan Silvestri"
  ["age"]=>
  int(73)
}
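The "one output per list item" behaviour can be sketched in plain PHP as a generator that resolves the list path once and then applies the mapping to every item. Both resolvePath() and eachItem() are hypothetical names used only for this illustration; the sketch assumes PHP 8 and does not claim to mirror how the library implements Json::each().

```php
<?php

/** Hypothetical helper: follow a dot-separated path through nested arrays. */
function resolvePath(mixed $data, string $path): mixed
{
    foreach (explode('.', $path) as $segment) {
        if (!is_array($data) || !array_key_exists($segment, $data)) {
            return null;
        }

        $data = $data[$segment];
    }

    return $data;
}

/** Hypothetical generator: yield one mapped array per item found at $listPath. */
function eachItem(array $data, string $listPath, array $mapping): Generator
{
    $items = resolvePath($data, $listPath);

    if (!is_array($items)) {
        return; // Nothing to iterate over, so no outputs.
    }

    foreach ($items as $item) {
        $output = [];

        // Paths in the mapping are relative to the current item.
        foreach ($mapping as $key => $path) {
            $output[$key] = resolvePath($item, $path);
        }

        yield $output;
    }
}

$json = '{"list":{"people":[
    {"name":"Hans Zimmer","age":{"years":66},"home":"US"},
    {"name":"John Williams","age":{"years":92},"home":"US"},
    {"name":"Alan Silvestri","age":{"years":73},"home":"US"}
]}}';

$decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
$mapping = ['name' => 'name', 'age' => 'age.years'];

// Collect the yielded outputs into a plain list for inspection.
$outputs = iterator_to_array(eachItem($decoded, 'list.people', $mapping), false);

foreach ($outputs as $output) {
    var_dump($output);
}
```

Because the mapping paths are resolved relative to each item, 'age.years' reaches into the nested age object of every person, and the unmapped home property is simply left out.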