Documentation for crwlr / html-2-text (v0.1)

Getting Started

This easy-to-use package helps you convert HTML to well-formatted plain text.


Requires PHP version 8.1 or above.


Install the latest version with:

composer require crwlr/html-2-text


$html = <<<HTML
<!DOCTYPE html>
<html lang="en">
<head><title>Example Website Title</title></head>
<body>
    <style>#app { background-color: #fff; }</style>
    <h1>Article Headline</h1>
    <h2>A Subheading</h2>

    <p>
        Some text containing <a href="">a link</a> <br>
        and <strong>bold text</strong>.
    </p>

    <ul>
        <li>list item</li>
        <li>another list item</li>
        <li>and one more
            <ul>
                <li>second level
                    <ul>
                        <li>third level</li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>

    <ol>
        <li>an ordered list</li>
        <li>list item</li>
        <li>another list item
            <ol>
                <li>second level
                    <ol>
                        <li>third level</li>
                    </ol>
                </li>
            </ol>
        </li>
    </ol>

    <table>
        <tr><th>column 1</th><th>column 2</th><th>column 3</th></tr>
        <tr><td>value 1</td><td>value 2</td><td>value 3</td></tr>
        <tr><td>value 1</td><td colspan="2">value 2 + 3</td></tr>
        <tr><td colspan="2">value 1 and 2</td><td>value 3</td></tr>
        <tr><td>value 1</td><td>value 2</td><td>value 3</td></tr>
    </table>

    <pre>
            // here we have some code inside a pre tag.
            \$foo = 'bar';

            \$bar = new Foo();
    </pre>
</body>
</html>
HTML;

$text = Html2Text::convert($html);

The resulting $text is:

# Article Headline

## A Subheading

Some text containing [a link](

* list item
* another list item
* and one more
  * second level
    * third level

1. an ordered list
2. list item
3. another list item
  1. second level
    1. third level

| column 1 | column 2 | column 3 |
| -------- | -------- | -------- |
| value 1  | value 2  | value 3  |
| value 1  | value 2 + 3         |
| value 1 and 2       | value 3  |
| value 1  | value 2  | value 3  |

            // here we have some code inside a pre tag.
            $foo = 'bar';

            $bar = new Foo();


The converted text resembles Markdown but is not identical to it. I've borrowed elements of Markdown syntax to make the plain text more readable. If you don't like the result, you can configure some of the behavior and even build custom node converters to tailor the output to your preferences.
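
To try the conversion end to end, a minimal script might look like the sketch below. It relies on two assumptions that this section does not state: that the Composer autoloader sits at vendor/autoload.php next to the script, and that the Html2Text class lives in the Crwlr\Html2Text namespace. Only the static Html2Text::convert() call is taken from the example above.

```php
<?php

// Assumption: the class namespace is Crwlr\Html2Text (not stated in this section).
use Crwlr\Html2Text\Html2Text;

// Assumption: the package was installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

// A small but complete HTML document with two headings.
$html = <<<HTML
<!DOCTYPE html>
<html lang="en">
<head><title>Example Website Title</title></head>
<body>
    <h1>Article Headline</h1>
    <h2>A Subheading</h2>
</body>
</html>
HTML;

// Same static call as in the example above.
$text = Html2Text::convert($html);

echo $text, PHP_EOL;
```

Going by the example output above, the two headings should come out prefixed with # and ##, and the title in the head should not appear in the text.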