CSV Steps
The Csv step comes with two options for the kind of input it expects.
use Crwlr\Crawler\Steps\Csv;
// Either
Csv::parseString();
// or
Csv::parseFile();
When using Csv::parseString() it just expects to get a string (or a RespondedRequest from an Http step).
When using Csv::parseFile() it expects a file path on your local filesystem, which it reads line by line. This way it should be possible to read very large CSV files without using too much memory.
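To illustrate why reading line by line keeps memory usage low, here is a minimal sketch in plain PHP. It is not the step's actual implementation; the function name readCsvRows and the sample data are made up for this example. It uses a generator around PHP's built-in fgetcsv(), so only the current row is held in memory at any time.

```php
<?php

/**
 * Hypothetical helper (not part of the library): lazily yields one
 * parsed CSV row at a time instead of loading the whole file.
 *
 * @return Generator<int, array<int, string|null>>
 */
function readCsvRows(string $path): Generator
{
    $handle = fopen($path, 'r');

    if ($handle === false) {
        throw new RuntimeException("Could not open file: {$path}");
    }

    try {
        // fgetcsv() reads and parses a single line per call.
        while (($row = fgetcsv($handle, null, ',', '"', '\\')) !== false) {
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}

// Usage: write a small temporary file (made-up sample data) and iterate over it.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "1,foo,https://www.example.com/foo\n2,bar,https://www.example.com/bar\n");

$rows = [];

foreach (readCsvRows($path) as $row) {
    $rows[] = $row;
}

unlink($path);
```

Collecting the rows into $rows is only done here to inspect the result. With a really large file you would process each row inside the loop and let it go out of scope.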
Both methods have the same optional arguments. The first one is an array where you can provide a column mapping (explained further below).
use Crwlr\Crawler\Steps\Csv;
Csv::parseString(['id', 'name', 'homepage']);
Csv::parseFile(['id', 'name', 'homepage']);
The second optional parameter tells the step to skip the first line (when it contains column headlines).
use Crwlr\Crawler\Steps\Csv;
Csv::parseString(['id', 'name', 'homepage'], true);
There is also a method that achieves the same thing and is more readable:
use Crwlr\Crawler\Steps\Csv;
Csv::parseString(['id', 'name', 'homepage'])->skipFirstLine();
Column mapping
The column mapping is an array of property names in the order of the columns in the CSV. If you don't provide a mapping, the step takes the values from the first CSV line as keys, so omitting it only makes sense when the CSV has column headlines in the first line.
In the example above it gets the first 3 columns, and in the output they'll have the keys id, name and homepage.
If you want to skip columns, you can either use numerical keys in the array, matching the CSV column indexes starting at 0:
use Crwlr\Crawler\Steps\Csv;
// 123,Christian,Olear,"https://www.otsch.codes",m
Csv::parseFile([1 => 'firstname', 3 => 'website', 4 => 'gender']);
Or use null values to skip columns. So for the same example as above:
use Crwlr\Crawler\Steps\Csv;
Csv::parseFile([null, 'firstname', null, 'website', 'gender']);
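To make the effect of the two mapping styles concrete, here is a small sketch in plain PHP. The helper mapColumns is hypothetical and only mimics the behavior described above; it is not the library's code. It parses the example line with PHP's built-in str_getcsv() and applies both mappings.

```php
<?php

/**
 * Hypothetical helper (not part of the library): applies a column mapping
 * to one parsed CSV row. Columns that have no mapping entry, or that are
 * mapped to null, are skipped.
 */
function mapColumns(array $row, array $mapping): array
{
    $mapped = [];

    foreach ($mapping as $index => $key) {
        if ($key !== null && array_key_exists($index, $row)) {
            $mapped[$key] = $row[$index];
        }
    }

    return $mapped;
}

$row = str_getcsv('123,Christian,Olear,"https://www.otsch.codes",m', ',', '"', '\\');

// Numerical keys pick columns 1, 3 and 4.
$withNumericKeys = mapColumns($row, [1 => 'firstname', 3 => 'website', 4 => 'gender']);

// Null values skip columns 0 and 2.
$withNulls = mapColumns($row, [null, 'firstname', null, 'website', 'gender']);

// Both mappings select the same columns under the same keys.
var_dump($withNumericKeys === $withNulls);
```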
Separator, Enclosure and Escape Characters
There are also methods to change the separator, enclosure and escape characters that the step should use.
use Crwlr\Crawler\Steps\Csv;
Csv::parseFile(['username', 'firstname', 'surname'])
->separator('|')
->enclosure('/')
->escape('%');
As you can see, the methods can be chained, because they all return the step instance.
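As a rough illustration of what these three characters mean, the following sketch parses a single line with PHP's built-in str_getcsv() using the same characters as above. The sample line is made up, and whether the step uses this function internally is an assumption, not something stated here.

```php
<?php

// Made-up sample line: "|" separates fields and "/" encloses them.
// The separator inside the enclosed third field is kept as part of the value.
$line = 'jdoe|/John/|/Doe|Smith/';

// Arguments: input string, separator, enclosure, escape character.
$fields = str_getcsv($line, '|', '/', '%');

print_r($fields);
```

The third field shows why an enclosure character matters: without it, the value would be split at the separator it contains.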