
Metadata is missing from the Facebook preview when sharing a page or post which is cached #248

markhowellsmead opened this issue Nov 22, 2021 · 15 comments

@markhowellsmead

markhowellsmead commented Nov 22, 2021

The metadata is missing from the Facebook preview when sharing a page or post that is cached. The Facebook Debugger shows the error message Curl error: 61 (BAD_CONTENT_ENCODING).

I copied the HTML content out to a static file in the webroot and shared that URL; it was interpreted correctly. Therefore, the content delivery from the cache is causing the problem. I traced it back to the following rules in the .htaccess file. If I comment these out, the Facebook Debugger can load the content correctly.

  # gzip
  RewriteRule .* - [E=CACHIFY_SUFFIX:]
  <IfModule mod_mime.c>
    RewriteCond %{HTTP:Accept-Encoding} gzip
    RewriteRule .* - [E=CACHIFY_SUFFIX:.gz]
    AddType text/html .gz
    AddEncoding gzip .gz
  </IfModule>
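
For reference, the mismatch can be inspected from the command line. This is a quick sketch (example.com stands in for the affected site); it announces gzip support the way the Facebook crawler does and filters the encoding-related response headers:

# Request the cached page with gzip announced and show the
# relevant response headers. example.com is a placeholder.
curl -sI -H 'Accept-Encoding: gzip' https://example.com/cached-page/ \
  | grep -iE 'content-type|content-encoding|vary'

If Content-Encoding: gzip is missing here even though the pre-gzipped .html.gz file is served, strict clients such as the Facebook crawler will treat the gzipped bytes as plain HTML.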

To Reproduce

  1. Publish a page or post and allow it to be cached as a static file on the server.
  2. Test the URL using https://developers.facebook.com/tools/debug/
  3. See the error Curl error: 61 (BAD_CONTENT_ENCODING)

Expected behavior
The usual Facebook preview of the metadata (including the post thumbnail) should be shown. The preview in the Facebook Debugger should not return any errors.

System (please complete the following information):

  • OS: all
  • Version: all
@stklcode stklcode added the bug label Nov 22, 2021
@krafit krafit added this to the 2.4.0 milestone Dec 30, 2021
@angcl
Contributor

angcl commented Jan 11, 2022

Hi there! I tried to reproduce it and didn't get the curl error.

Reproduction steps:

  1. Create a new WordPress 5.8.3 instance
  2. Install and setup Cachify for static file caching
  3. Publish a blog post and open it as non-admin
  4. Verify that cache files were created on the webserver
  5. Test the URL using https://developers.facebook.com/tools/debug/

The URL to the created page is:
https://cachify.acali.de/2022/01/11/first-blog-post/

The Sharing Debugger results look like this:
[Screenshot of the Sharing Debugger results]

@krafit
Member

krafit commented Jan 11, 2022

Thanks for checking, @angelocali. Did you enable the „Cache minify“ option?

@angcl
Contributor

angcl commented Jan 11, 2022

@krafit You’re welcome. My first attempt was without the „Cache minify“ option. I tried again after setting it to "HTML + Inline-JavaScript" and added a thumbnail to my post. I get slightly different results, but still no curl error.

@Zodiac1978
Member

This is coming from this support thread, where more information was shared. Just FYI:
https://wordpress.org/support/topic/cachify-plugin-breaks-facebook-sharing/

@Zodiac1978
Member

In the last reply, @stklcode recommends this potential fix:

<FilesMatch "\.html\.gz$">
    Header append Content-Encoding gzip
    Header append Vary Accept-Encoding
</FilesMatch>

If verified, this should be added to the documentation.

@angelocali Maybe you can verify that those headers are provided in your case and that this is why it is working for you?
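
A quick way to check is a HEAD request with gzip announced, assuming the test post above is still cached:

# Show whether Content-Encoding and Vary are sent for the cached page.
curl -sI -H 'Accept-Encoding: gzip' https://cachify.acali.de/2022/01/11/first-blog-post/ \
  | grep -iE 'content-encoding|vary'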

@stklcode
Contributor

stklcode commented Jan 11, 2022

The screenshot by @angelocali looks fine to my eyes. The debugger accepts chunked content, so servers likely serve it partially, i.e. with several 206 responses.

The encoding error by @markhowellsmead is reproducible if pre-gzipped content is served from the .html.gz files without setting a proper Content-Encoding header. So gzip-compressed HTML is served, but the client behaves strictly and does not proactively try to decompress when the server indicates uncompressed HTML content (most browsers do, for convenience).

According to the headers, @angelocali's site is served by nginx and correctly provides the Content-Encoding: gzip header.

There is no way the plugin itself can solve this problem programmatically. The documentation for the webserver config should be extended to ensure the correct behavior if pre-compressed content is used.
If the webserver only points to the .html file and does compression on the fly, the headers are typically correct.
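
A minimal sketch to demonstrate the mismatch on an affected site (example.com is a placeholder): fetch the body with gzip announced and check for the gzip magic bytes (1f 8b). Plain curl does not decode content encodings unless asked to, so it shows the raw bytes:

# Dump the first two bytes of the response body. 1f 8b means the body
# is gzip-compressed; if the response carried no Content-Encoding: gzip
# header at the same time, strict clients fail.
curl -s -H 'Accept-Encoding: gzip' https://example.com/cached-page/ | head -c 2 | xxd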


Sidenote:
The debate about whether pre-compression is really beneficial is a whole different story. It strongly depends on the scenario, but on-the-fly compression of static files is pretty efficient in most cases. The initial overhead of generating the GZ files could be avoided if they are not used (see #197)

@markhowellsmead
Author

I've checked this over again and tried the suggestion from @Zodiac1978 (adding the Content-Encoding definition). After clearing the entire cache and running “scrape again“, the Facebook Debugger returns the correct result, i.e. the request which generates the cache entry receives the correct response. Subsequent requests, which receive the cached version, still produce the same cURL error.

@Zodiac1978
Member

Subsequent requests, which receive the cached version, still produce the same cURL error.

@markhowellsmead I tried this on https://developers.facebook.com/tools/debug/ with the URL provided on WordPress.org, clicked the "Scrape Again" button multiple times, and always get the same correct answer. No curl error.

In the headers I don't see the gzip encoding; it shows as content-encoding: br for me, together with vary: Accept-Encoding,User-Agent.

Not sure why this does not work for you. Without reproducing the error, it is not easy to solve this.

(The code snippet is from @stklcode - I just copied it from the WordPress.org thread.)

@markhowellsmead
Author

@Zodiac1978 I'd commented out the problematic .htaccess code so that sharing wasn't broken. If you try it now, you'll see the error. (I've temporarily set it back to the standard .htaccess code from the documentation.)

@markhowellsmead
Author

Additional: if I call the cached page (link in the WordPress.org thread) with the following .htaccess rules active, the page doesn't render in the browser.

<FilesMatch "\.html\.gz$">
    Header append Content-Encoding gzip
    Header append Vary Accept-Encoding
</FilesMatch>

@Zodiac1978
Member

Zodiac1978 commented Jan 12, 2022

@markhowellsmead The gzip part is not necessary. As @stklcode wrote:

If the webserver only points to the .html file and does compression on the fly, the headers are typically correct.

The server is using the Brotli algorithm for compression, which is a successor to gzip.

See:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding#directives
https://en.wikipedia.org/wiki/Brotli

If the GZIP lines are not used, you are serving plain HTML, which is compressed on the fly by your server with Brotli. That is faster and smaller than gzip. Everything is fine; there is no need to fix anything, because nothing is broken.

If you want to enable the GZIP part (I don't know why, because nothing is broken ...), you need to set the encoding to gzip when you serve pre-gzipped files. The .htaccess lines say: if the browser accepts gzip, then append .gz to the file names and add the gzip encoding for those files.

Obviously your server doesn't let you change the content-encoding via a header directive. It stays at "br" for Brotli. Maybe you are not allowed to change headers in general.

You can try this:

<IfModule mod_headers.c>
  <FilesMatch "\.html\.gz$">
    Header append Content-Encoding gzip
    Header append Vary Accept-Encoding
  </FilesMatch>
</IfModule>

If the module for changing headers is not available, the code is not executed, so nothing should break anymore. But the content-encoding value is not changed either ...
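
(Whether mod_headers is available can be checked on the server, if you have shell access; the control command name varies by distribution:)

# Lists loaded modules; "headers_module" means mod_headers is active.
# The binary may be called apachectl, apache2ctl, or httpd.
apachectl -M 2>/dev/null | grep -i headers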

In this case there is nothing for us to solve. If we can't change it, it will not work. You can ask your hoster, but as said above: Brotli is better and is applied on the fly. Why change that?

The only thing to "fix" is that you don't need those .gz files. We should document all of this and make the generation of those .gz files (i.e. the overhead) optional, via UI or filter.

Hopefully this explanation now clarifies what happened. :)

@stklcode Please correct me if I am wrong here in something.

@stklcode
Contributor

stklcode commented Jan 13, 2022

The server is using the Brotli algorithm for compression, which is a successor to gzip.

It serves the “best“ encoding that is acceptable to both sides (the client sends an indicating header Accept-Encoding: br, gzip and the server replies with its actual choice, Content-Encoding: br). Not all clients accept all (or any) encodings. Communication can always fall back to plain content.
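
The negotiation is easy to observe with curl (example.com is a placeholder; each request announces a different set of acceptable encodings):

# Announce Brotli and gzip - the server picks one (e.g. br).
curl -sI -H 'Accept-Encoding: br, gzip' https://example.com/ | grep -i content-encoding

# Announce gzip only - the server may reply with gzip.
curl -sI -H 'Accept-Encoding: gzip' https://example.com/ | grep -i content-encoding

# Announce nothing - the server falls back to plain content,
# so no Content-Encoding header should appear at all.
curl -sI https://example.com/ | grep -i content-encoding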

This is how we currently document the behavior for Apache httpd:

<IfModule mod_mime.c>
  RewriteCond %{HTTP:Accept-Encoding} gzip  # if client announces GZIP support
  RewriteRule .* - [E=CACHIFY_SUFFIX:.gz]   # rewrite to the .html.gz file
  AddType text/html .gz   # override content type to HTML
  AddEncoding gzip .gz    # override encoding

  # the following is NOT in the docs, but worth a try
  <IfModule mod_headers.c>  # nested IF as of Apache 2.4.26
    Header set Content-Encoding gzip
    Header set Vary Accept-Encoding
  </IfModule>
</IfModule>

There are several reasons why this might not work, depending on the server configuration. Not all flags are allowed to be overridden in .htaccess, and that is configurable. In the worst case the content might be double-compressed, if the server does not detect the type and applies gzip/br/deflate again.
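
Double compression can be spotted with curl's --compressed flag, which decodes the negotiated Content-Encoding once (a sketch; placeholder URL):

# After one round of decoding, the payload should be plain HTML.
# If it still starts with the gzip magic bytes 1f 8b, the content
# was compressed twice.
curl -s --compressed https://example.com/cached-page/ | head -c 2 | xxd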

We should document all of this and make the generation (aka overhead) of generating those .gz files optional. Via UI or filter.

+1
I am personally no fan of pre-compression. In most cases the webserver implementation is very efficient, and static compressed content is often served from memory buffers. There might be other scenarios, though, so a config/filter solution would be fine.

@markhowellsmead
Author

This is how we currently document the behavior for Apache httpd:

The quoted code doesn't appear in the documentation at https://cachify.pluginkollektiv.org/documentation/, and it also produces the same cURL error.

The following code in htaccess seems to work both for the browser version and for the Facebook debugger:

# BEGIN CACHIFY
<IfModule mod_rewrite.c>
    # ENGINE ON
    RewriteEngine On

    # GZIP FILE
    <IfModule mod_mime.c>
        RewriteCond %{REQUEST_URI} /$
        RewriteCond %{REQUEST_URI} !^/wp-admin/.*
        RewriteCond %{REQUEST_METHOD} !=POST
        RewriteCond %{QUERY_STRING} =""
        RewriteCond %{HTTP_COOKIE} !(wp-postpass|wordpress_logged_in|comment_author)_
        RewriteCond %{HTTP:Accept-Encoding} gzip
        RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/cachify/%{HTTP_HOST}%{REQUEST_URI}index.html.gz -f
        RewriteRule ^(.*) /wp-content/cache/cachify/%{HTTP_HOST}%{REQUEST_URI}index.html.gz [L]

        AddType text/html .gz
        AddEncoding gzip .gz
    </IfModule>

    <IfModule mod_headers.c>
        <FilesMatch "\.html\.gz$">
            Header append Content-Encoding gzip
            Header append Vary Accept-Encoding
        </FilesMatch>
    </IfModule>

    # HTML FILE
    RewriteCond %{REQUEST_URI} /$
    RewriteCond %{REQUEST_URI} !^/wp-admin/.*
    RewriteCond %{REQUEST_METHOD} !=POST
    RewriteCond %{QUERY_STRING} =""
    RewriteCond %{HTTP_COOKIE} !(wp-postpass|wordpress_logged_in|comment_author)_
    RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/cachify/%{HTTP_HOST}%{REQUEST_URI}index.html -f
    RewriteRule ^(.*) /wp-content/cache/cachify/%{HTTP_HOST}%{REQUEST_URI}index.html [L]
</IfModule>
# END CACHIFY
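
One way to verify both delivery paths of this configuration (placeholder URL): with gzip announced, the request should hit the pre-gzipped cache file and carry Content-Encoding: gzip; without it, the plain .html file should be served with no Content-Encoding header.

# Path 1: client announces gzip -> expect the .html.gz cache file.
curl -sI -H 'Accept-Encoding: gzip' https://example.com/some-post/ \
  | grep -iE 'content-encoding|vary'

# Path 2: no Accept-Encoding -> expect the plain .html cache file.
curl -sI https://example.com/some-post/ | grep -iE 'content-encoding|vary'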

@stklcode
Contributor

The quoted code doesn't appear in the documentation at https://cachify.pluginkollektiv.org/documentation/, and it also produces the same cURL error.

Copied from the “setup“ tab content in the plugin config.
https://github.com/pluginkollektiv/cachify/blob/2.3.2/inc/setup/cachify.hdd.htaccess.php

Apparently that’s what we “recommend“ there. Should be unified with the more elaborate web documentation.

@markhowellsmead
Author

Apparently that’s what we “recommend“ there. Should be unified with the more elaborate web documentation.

Agreed, although the code in the setup tab leads to the cURL error, so it is currently unusable for all of our projects.
