Documentation for crwlr / crawler (v2.1)

CSV Steps

The Csv step can be created in two ways, depending on the kind of input it should expect.

use Crwlr\Crawler\Steps\Csv;

// Either
Csv::parseString();

// or
Csv::parseFile();

When using Csv::parseString() it expects a string (or a RespondedRequest from an Http step) as input.

When using Csv::parseFile() it expects the path to a file on your local filesystem, which it reads line by line. This way it should be possible to read very large CSV files without using too much memory.
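To illustrate why reading line by line keeps memory usage low, here is a minimal sketch in plain PHP (8.0 or later). It is not the library's implementation: readCsvRows() is a hypothetical helper, and the sketch only assumes that rows are parsed one at a time, here with PHP's built-in fgetcsv().

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of crwlr/crawler): yields one parsed CSV row at a time,
 * so only the current line has to be held in memory.
 *
 * @return Generator<int, array<int, string|null>>
 */
function readCsvRows(string $path): Generator
{
    $handle = fopen($path, 'r');

    if ($handle === false) {
        throw new RuntimeException("Could not open file: {$path}");
    }

    try {
        // fgetcsv() reads and parses a single line per call and returns false at the end.
        while (($row = fgetcsv($handle, null, ',', '"', '\\')) !== false) {
            // A blank line comes back as an array containing a single null; skip it.
            if ($row === [null]) {
                continue;
            }

            yield $row;
        }
    } finally {
        fclose($handle);
    }
}
```

Iterating with foreach (readCsvRows($path) as $row) then processes the file row by row, regardless of its size.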

Both methods have the same optional arguments. The first one is an array where you can provide a column mapping (explained further below).

use Crwlr\Crawler\Steps\Csv;

Csv::parseString(['id', 'name', 'homepage']);

Csv::parseFile(['id', 'name', 'homepage']);

The second optional argument tells the step to skip the first line (for example, when it contains column headlines).

use Crwlr\Crawler\Steps\Csv;

Csv::parseString(['id', 'name', 'homepage'], true);

There is also a method that achieves the same and is more readable:

use Crwlr\Crawler\Steps\Csv;

Csv::parseString(['id', 'name', 'homepage'])->skipFirstLine();

Column mapping

The column mapping is an array of property names in the order of the columns in the CSV. If you don't provide a mapping, the step takes the values from the first CSV line as keys, so omitting the mapping only makes sense when the first line of the CSV contains column headlines.

In the example above it gets the first 3 columns and in the output they'll have the keys id, name and homepage.
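For the case without a mapping, the idea of using the first line as keys can be sketched in plain PHP as follows. This is only an illustration and not the library's code; the sample data and variable names are made up for the example.

```php
<?php

declare(strict_types=1);

// Made-up sample data: the first line contains the column headlines.
$lines = [
    'id,name,homepage',
    '1,Example,https://www.example.com',
];

// Parse every line into a list of column values.
$rows = array_map(
    fn (string $line): array => str_getcsv($line, ',', '"', '\\'),
    $lines,
);

// Take the first row off the list and use its values as keys for all remaining rows.
$keys = array_shift($rows);

// Note: array_combine() throws a ValueError when a row has a different number of columns.
$records = array_map(
    fn (array $row): array => array_combine($keys, $row),
    $rows,
);
```

Each element of $records is then an associative array with the keys id, name and homepage.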

If you want to skip columns, you can use numerical keys in the mapping array that match the CSV column indexes, starting at 0.

use Crwlr\Crawler\Steps\Csv;

// 123,Christian,Olear,"https://www.otsch.codes",m

Csv::parseFile([1 => 'firstname', 3 => 'website', 4 => 'gender']);

Alternatively, you can use null values to skip columns. For the same example as above:

use Crwlr\Crawler\Steps\Csv;

Csv::parseFile([null, 'firstname', null, 'website', 'gender']);
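To make the two mapping styles concrete, the following sketch applies both of them to the example line in plain PHP. mapRow() is a hypothetical helper, not the library's implementation, and returning null for a column that is missing in the row is an assumption of this sketch.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the library's code): applies a column mapping to one CSV row.
 *
 * @param array<int, string|null> $mapping column index => property name (null skips the column)
 * @param array<int, string|null> $row     the parsed column values of one line
 * @return array<string, string|null>
 */
function mapRow(array $mapping, array $row): array
{
    $mapped = [];

    foreach ($mapping as $index => $property) {
        // A null entry means: ignore the column at this index.
        if ($property === null) {
            continue;
        }

        // Assumption of this sketch: a column missing in the row becomes null.
        $mapped[$property] = $row[$index] ?? null;
    }

    return $mapped;
}

$row = str_getcsv('123,Christian,Olear,"https://www.otsch.codes",m', ',', '"', '\\');

// Numerical keys: only the columns at index 1, 3 and 4 are mapped.
$withKeys = mapRow([1 => 'firstname', 3 => 'website', 4 => 'gender'], $row);

// Null values: the columns at index 0 and 2 are skipped.
$withNulls = mapRow([null, 'firstname', null, 'website', 'gender'], $row);
```

Both calls produce the same array with the keys firstname, website and gender.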

Separator, Enclosure and Escape Characters

There are also methods to change the separator, enclosure and escape characters that it should use.

use Crwlr\Crawler\Steps\Csv;

Csv::parseFile(['username', 'firstname', 'surname'])
    ->separator('|')
    ->enclosure('/')
    ->escape('%');

As you can see, the methods can be chained because they all return the step instance.
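What the separator and enclosure characters mean for a single line can be tried out with PHP's built-in str_getcsv(). This sketch assumes that the step treats the characters the same way as PHP's native CSV functions do; the sample line is made up.

```php
<?php

declare(strict_types=1);

// Made-up sample line: '|' separates the columns and '/' encloses a value
// that itself contains the separator character.
$line = 'otsch|/Christian|Chris/|Olear';

// Arguments: input string, separator, enclosure, escape character.
$columns = str_getcsv($line, '|', '/', '%');

// $columns is now ['otsch', 'Christian|Chris', 'Olear']
```

Without the enclosure characters around the second value, the line would be split into four columns instead of three.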