[Data Liberation] Add HTML to Blocks converter (#2095)
Adds a basic `WP_HTML_To_Blocks` class that accepts HTML and outputs block markup. It's a very basic converter: it only considers the markup and won't account for any visual changes introduced via CSS or JavaScript. Only a few core blocks are supported in this initial PR, but the API can easily support more HTML elements and blocks.

To preserve visual fidelity between the original HTML page and the produced block markup, we'll need an annotated HTML input produced by the [Try WordPress](https://github.com/WordPress/try-wordpress/) browser extension. It would contain each element's colors, sizes, etc. We cannot possibly get all of that from just analyzing the HTML on the server without building a full-blown, browser-like HTML renderer in PHP, and I know I'm not building one.

A part of #1894

## Example

```php
$html = <<<HTML
<meta name="post_title" content="My first post">
<p>Hello <b>world</b>!</p>
HTML;

$converter = new WP_HTML_To_Blocks( $html );
$converter->convert();

var_dump( $converter->get_all_metadata() );
/*
 * array( 'post_title' => array( 'My first post' ) )
 */

var_dump( $converter->get_block_markup() );
/*
 * <!-- wp:paragraph -->
 * <p>Hello <b>world</b>!</p>
 * <!-- /wp:paragraph -->
 */
```

## Caveats

I had to patch `WP_HTML_Processor` to stop bailing out on `<meta>` tags referencing the document charset. Ideally we'd patch WordPress core to stop bailing out when the charset is UTF-8.

## Testing instructions

This PR mostly adds new code. Just confirm the unit tests pass in CI.

cc @brandonpayton @zaerl @sirreal @dmsnell @ellatrix
Showing 16 changed files with 4,143 additions and 11 deletions.
51 changes: 50 additions & 1 deletion
packages/playground/data-liberation/src/block-markup/WP_Block_Markup_Converter.php
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,57 @@ | ||
<?php | ||
|
||
/** | ||
* Represents a {Data Format} -> Block Markup + Metadata converter. | ||
* | ||
* Used by the Data Liberation importers to accept data formatted as HTML, Markdown, etc. | ||
* and convert them to WordPress posts. | ||
*/ | ||
interface WP_Block_Markup_Converter { | ||
/** | ||
* Converts the input document specified in the constructor to block markup. | ||
* | ||
* @return bool Whether the conversion was successful. | ||
*/ | ||
public function convert(); | ||
|
||
/** | ||
* Gets the block markup generated by the convert() method. | ||
* | ||
* @return string The block markup. | ||
*/ | ||
public function get_block_markup(); | ||
|
||
/** | ||
* Gets all the metadata sourced from the input document by the convert() method. | ||
* The data format is: | ||
* | ||
* array( | ||
* 'post_title' => array( 'The Name of the Wind' ), | ||
* 'post_author' => array( 'Patrick Rothfuss', 'Betsy Wollheim' ) | ||
* ) | ||
* | ||
* Note each meta key may have multiple values. The consumer of this interface | ||
* must account for this. | ||
* | ||
* @return array The metadata sourced from the input document. | ||
*/ | ||
public function get_all_metadata(); | ||
public function get_meta_value( $key ); | ||
|
||
/** | ||
* Gets the first metadata value for a given key. | ||
* | ||
* Example: | ||
* | ||
* Metadata: | ||
* array( | ||
* 'post_title' => array( 'The Name of the Wind' ), | ||
* 'post_author' => array( 'Patrick Rothfuss', 'Betsy Wollheim' ) | ||
* ) | ||
* | ||
* get_first_meta_value( 'post_author' ) returns 'Patrick Rothfuss'. | ||
* | ||
* @param string $key The metadata key. | ||
* @return mixed The metadata value. | ||
*/ | ||
public function get_first_meta_value( $key ); | ||
} |
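
To make the contract above concrete, here is a minimal sketch of a class implementing the interface, plus a consumer that handles multi-valued metadata. Everything in it other than the interface's method names is an assumption for illustration: the `Example_Plain_Text_To_Blocks` class, its `key: value` header input format, and returning `null` for a missing key are all hypothetical and are not part of this PR. The sketch also assumes `get_meta_value()` is the one line this diff removes (the extracted table lost its +/- markers), so it is left out of the re-declared interface.

```php
<?php
// Re-declared only so this sketch runs standalone; the guard skips it when
// the real interface from this PR is already loaded.
if ( ! interface_exists( 'WP_Block_Markup_Converter' ) ) {
	interface WP_Block_Markup_Converter {
		public function convert();
		public function get_block_markup();
		public function get_all_metadata();
		public function get_first_meta_value( $key );
	}
}

/**
 * Hypothetical converter (not part of the PR). Input format, invented for
 * this example: "key: value" header lines, one blank line, then paragraphs
 * separated by blank lines.
 */
class Example_Plain_Text_To_Blocks implements WP_Block_Markup_Converter {
	private $text;
	private $block_markup = '';
	private $metadata     = array();

	public function __construct( $text ) {
		$this->text = $text;
	}

	public function convert() {
		// Start from a clean slate so repeated calls do not duplicate values.
		$this->block_markup = '';
		$this->metadata     = array();

		// Split the header section from the body at the first blank line.
		$text  = str_replace( "\r\n", "\n", $this->text );
		$parts = explode( "\n\n", $text, 2 );
		$body  = isset( $parts[1] ) ? $parts[1] : '';

		foreach ( explode( "\n", $parts[0] ) as $line ) {
			$pair = explode( ':', $line, 2 );
			if ( 2 !== count( $pair ) ) {
				return false; // Every header line must look like "key: value".
			}
			// Each key collects a list of values, as the interface documents.
			$this->metadata[ trim( $pair[0] ) ][] = trim( $pair[1] );
		}

		$blocks = array();
		foreach ( preg_split( '/\n{2,}/', trim( $body ) ) as $paragraph ) {
			if ( '' === $paragraph ) {
				continue;
			}
			$escaped  = htmlspecialchars( $paragraph, ENT_NOQUOTES, 'UTF-8' );
			$blocks[] = "<!-- wp:paragraph -->\n<p>" . $escaped . "</p>\n<!-- /wp:paragraph -->";
		}
		$this->block_markup = implode( "\n\n", $blocks );
		return true;
	}

	public function get_block_markup() {
		return $this->block_markup;
	}

	public function get_all_metadata() {
		return $this->metadata;
	}

	public function get_first_meta_value( $key ) {
		// Assumption: a missing key yields null; the interface does not say.
		return isset( $this->metadata[ $key ][0] ) ? $this->metadata[ $key ][0] : null;
	}
}

// Consumer side: a single-valued field reads the first value, while a
// multi-valued field reads the whole list.
$converter = new Example_Plain_Text_To_Blocks(
	"post_title: The Name of the Wind\n" .
	"post_author: Patrick Rothfuss\n" .
	"post_author: Betsy Wollheim\n\n" .
	"First paragraph.\n\nSecond paragraph."
);

if ( $converter->convert() ) {
	$metadata = $converter->get_all_metadata();
	echo $converter->get_first_meta_value( 'post_title' ), "\n";
	echo implode( ', ', $metadata['post_author'] ), "\n";
	echo $converter->get_block_markup(), "\n";
}
```

Because the interface never exposes the input format, an importer written against it can treat an HTML converter, a Markdown converter, or a toy one like this interchangeably.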