XML Steps
The Xml step extends the same base class (Dom) as the Html step, but uses XPath queries by default instead of CSS selectors. So selecting data from an XML document looks pretty much the same as selecting from HTML:
Xml::root()->extract(['title' => '//title', 'author' => '//author']);
Xml::each('bookstore/book')->extract(['title' => '//title', 'author' => '//author']);
Xml::first('bookstore/book')->extract(['title' => '//title', 'author' => '//author']);
Xml::last('bookstore/book')->extract(['title' => '//title', 'author' => '//author']);
Use root to extract a set of properties from the root of the document. The each, first and last methods all extract a set of properties from a list of similar items. Of these, each is the only one that yields multiple outputs.
The extract method takes an array with the property names that you want in the output as keys and the corresponding XPath queries as values.
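To make the key/query mapping more concrete, here is a rough sketch of the same idea using only PHP's built-in DOM extension, not the library itself. The sample XML and the helper name extractFromEach() are hypothetical and exist only for illustration. Note that plain DOMXPath needs a leading dot (.//title) to stay inside the current item; how the library resolves a query like //title for each item is not shown here.

```php
<?php
// Sketch only: built-in DOM extension, not the crawler library.
// The sample XML and extractFromEach() are hypothetical.

$xml = <<<XML
<bookstore>
    <book><title>Foo</title><author>Jane Doe</author></book>
    <book><title>Bar</title><author>John Doe</author></book>
</bookstore>
XML;

function extractFromEach(string $xml, string $itemQuery, array $mapping): array
{
    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);

    $results = [];

    // One result per node matched by the item query, similar to each().
    foreach ($xpath->query($itemQuery) as $item) {
        $row = [];

        foreach ($mapping as $property => $query) {
            // Evaluate the property query relative to the current item.
            $node = $xpath->query($query, $item)->item(0);
            $row[$property] = $node === null ? null : trim($node->textContent);
        }

        $results[] = $row;
    }

    return $results;
}

$books = extractFromEach($xml, '/bookstore/book', [
    'title'  => './/title',
    'author' => './/author',
]);

print_r($books);
```

In this sketch, taking only the first or the last element of $books would roughly correspond to what first and last are described as doing above.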
Accessing other Node Values
By default, the XPath queries return the text of the selected node. But of course you can also get other values:
Xml::first('listing/item')->extract([
    'default' => Dom::xPath('//default')->text(),
    'foo' => Dom::xPath('//foo')->innerText(),
    'bar' => Dom::xPath('//bar')->html(),
    'baz' => Dom::xPath('//baz')->outerHtml(),
    'test' => Dom::xPath('//test')->attribute('test'),
]);
text
You don't have to call this explicitly; it is the default when you provide the selector only as a string. It gets the text inside the node, including the text of its child nodes.
innerText
Gets only the text directly inside the node. Excludes text
from child nodes.
html
Gets the XML source inside the selected element.
outerHtml
Gets the XML source of the selected element, including the element itself.
attribute(x)
Gets the value of attribute x of the selected element.
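The difference between these values is easiest to see side by side. The following is a minimal sketch with PHP's built-in DOM extension that reproduces each kind of value as described above. It is not how the library implements them, and the sample XML is made up for illustration.

```php
<?php
// Sketch only: built-in DOM extension; the sample XML is hypothetical.

$document = new DOMDocument();
$document->loadXML('<item><foo test="yes">outer <child>inner</child></foo></item>');

$xpath = new DOMXPath($document);
$foo = $xpath->query('//foo')->item(0);

// Text of the node including its children (the behavior described for text).
$text = $foo->textContent;

// Only the text directly inside the node (described for innerText):
// concatenate the node's own text nodes and skip child elements.
$ownText = '';
foreach ($foo->childNodes as $child) {
    if ($child instanceof DOMText) {
        $ownText .= $child->nodeValue;
    }
}

// XML source inside the element (described for html): serialize each child.
$innerXml = '';
foreach ($foo->childNodes as $child) {
    $innerXml .= $document->saveXML($child);
}

// XML source including the element itself (described for outerHtml).
$outerXml = $document->saveXML($foo);

// Value of an attribute (described for attribute(x)).
$attribute = $foo->getAttribute('test');
```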
Using CSS selectors instead of XPath queries
By default, Xml steps use XPath queries, but if you prefer, you can also use CSS selectors with Xml steps:
Xml::each(Dom::cssSelector('bookstore book'))
    ->extract([
        'title' => Dom::cssSelector('title'),
        'author' => Dom::cssSelector('author'),
        'year' => Dom::cssSelector('year'),
    ]);
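For readers more familiar with one notation than the other: the CSS descendant selector bookstore book matches the same elements as the XPath query //bookstore//book. The sketch below shows the XPath side with PHP's built-in DOM extension, using made-up sample XML. It only illustrates that correspondence and makes no claim about how the library translates selectors internally.

```php
<?php
// Sketch only: built-in DOM extension; the sample XML is hypothetical.

$document = new DOMDocument();
$document->loadXML(
    '<bookstore>'
    . '<book><title>Foo</title><year>2001</year></book>'
    . '<book><title>Bar</title><year>2012</year></book>'
    . '</bookstore>'
);

$xpath = new DOMXPath($document);

$years = [];

// CSS "bookstore book" (descendant combinator) corresponds to this query.
foreach ($xpath->query('//bookstore//book') as $book) {
    // CSS "year" inside the current book corresponds to ".//year".
    $years[] = $xpath->query('.//year', $book)->item(0)->textContent;
}

print_r($years);
```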