Getting Started
This easy-to-use package helps you convert HTML to well-formatted plain text.
Requirements
Requires PHP version 8.1 or above.
Installation
Install the latest version with:
composer require crwlr/html-2-text
Usage
$html = <<<HTML
<!DOCTYPE html>
<html lang="en">
<head><title>Example Website Title</title></head>
<body>
<script>console.log('test');</script>
<style>#app { background-color: #fff; }</style>
<article>
<h1>Article Headline</h1>
<h2>A Subheading</h2>
<p>
Some text containing <a href="https://www.crwl.io">a link</a> <br>
and <strong>bold text</strong>.
</p>
<ul>
<li>list item</li>
<li>another list item</li>
<li>and one more
<ul>
<li>second level
<ul>
<li>third level</li>
</ul>
</li>
</ul>
</li>
</ul>
<ol>
<li>an ordered list</li>
<li>list item</li>
<li>
another list item
<ol>
<li>
second level
<ol>
<li>third level</li>
</ol>
</li>
</ol>
</li>
</ol>
<table>
<thead>
<tr><th>column 1</th><th>column 2</th><th>column 3</th></tr>
</thead>
<tbody>
<tr><td>value 1</td><td>value 2</td><td>value 3</td></tr>
<tr><td>value 1</td><td colspan="2">value 2 + 3</td></tr>
<tr><td colspan="2">value 1 and 2</td><td>value 3</td></tr>
<tr><td>value 1</td><td>value 2</td><td>value 3</td></tr>
</tbody>
</table>
<pre>
// here we have some code inside a pre tag.
\$foo = 'bar';
\$bar = new Foo();
\$bar->baz(\$foo);
</pre>
</article>
</body>
</html>
HTML;
$text = Html2Text::convert($html);
The resulting $text is:
# Article Headline
## A Subheading
Some text containing [a link](https://www.crwl.io)
and BOLD TEXT.
* list item
* another list item
* and one more
* second level
* third level
1. an ordered list
2. list item
3. another list item
1. second level
1. third level
| column 1 | column 2 | column 3 |
| -------- | -------- | -------- |
| value 1 | value 2 | value 3 |
| value 1 | value 2 + 3 |
| value 1 and 2 | value 3 |
| value 1 | value 2 | value 3 |
// here we have some code inside a pre tag.
$foo = 'bar';
$bar = new Foo();
$bar->baz($foo);
The converted text resembles Markdown, although it's not identical. I've borrowed elements of Markdown syntax to make the plain text more readable. If you don't like the result, you can configure a few things and even build custom node converters to tailor the output to your preference.
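The usage snippet above leaves out the bootstrapping a standalone script needs. The following is a minimal sketch of a complete script, not the package's documented setup. It assumes the class lives in the Crwlr\Html2Text namespace (inferred from the package name, not stated above) and that Composer's autoloader sits at vendor/autoload.php next to the script. The only library call used is Html2Text::convert(), as shown in the usage example.

```php
<?php

declare(strict_types=1);

// Assumption: the namespace is inferred from the package name crwlr/html-2-text.
use Crwlr\Html2Text\Html2Text;

// Assumption: this script is in the project root, next to Composer's vendor directory.
require __DIR__ . '/vendor/autoload.php';

// A small HTML fragment using elements from the larger example above.
$html = '<h1>Hello</h1><p>Some <strong>bold</strong> text with <a href="https://www.crwl.io">a link</a>.</p>';

// Convert the HTML string to plain text.
$text = Html2Text::convert($html);

echo $text, PHP_EOL;
```

Saved as, say, convert.php (a hypothetical file name), it can be run with `php convert.php` after the Composer install step.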