Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML5 SafeListSanitizer removes 'viewbox' even when allowed_attribute #169

Closed
jorg-vr opened this issue Oct 10, 2023 · 2 comments
Closed

Comments

@jorg-vr
Copy link

jorg-vr commented Oct 10, 2023

We use the following sanitize function:

def sanitize(html)
    @tags ||= Rails::HTML5::SafeListSanitizer.allowed_tags.to_a +
              %w[table thead tbody tr td th colgroup col style summary details] +
              %w[svg g style circle line rect path polygon text]
    @attributes ||= Rails::HTML5::SafeListSanitizer.allowed_attributes.to_a +
                    %w[style target data-bs-toggle data-parent data-tab data-line data-element id] +
                    %w[viewbox width height version style class transform id x y rx ry x1 y1 x2 y2 d points fill stroke stroke-width cx cy r font-size font-family font-weight font-variant]

    # Filters allowed tags and attributes
    sanitized = Rails::HTML5::SafeListSanitizer.new.sanitize html,
                                                             tags: @tags,
                                                             attributes: @attributes
    sanitized.html_safe
  end

The following test fails:

test 'sanitize helper should allow a selection of svg tags' do
    dirty_html = <<~HTML
      <svg viewbox="0 0 100 100" width="300" height="100" version="1.1">
        <style>line,circle{stroke-width:3px;stroke:black;stroke-linecap:round}</style>
        <style>test{stroke-width:3px;stroke:black;stroke-linecap:round}</style>
        <g id="group1" transform="translate(50,50)">
          <circle cx="0" cy="0" r="40" fill="none"></circle>
          <line class="test" x1="0" y1="0" x2="0" y2="-40"></line>
        </g>
        <rect x="0" y="0" rx="1" ry="1" width="100" height="100" fill="none" stroke="black" stroke-width="3px"></rect>
        <polygon points="0,0 100,0 100,100 0,100"></polygon>
        <path d="M0,0 L100,0 L100,100 L0,100 Z"></path>
        <text x="0" y="0" font-size="14px" font-weight="bold" font-variant="normal" font-family="serif">Hello</text>
      </svg>
    HTML
    clean_html = sanitize dirty_html

    assert_equal dirty_html, clean_html
  end

with output:

1) sanitize helper should allow a selection of svg tags
      --- expected
      +++ actual
      @@ -1,4 +1,4 @@
      -"<svg viewbox=\"0 0 100 100\" width=\"300\" height=\"100\" version=\"1.1\">
      +"<svg width=\"300\" height=\"100\" version=\"1.1\">
         <style>line,circle{stroke-width:3px;stroke:black;stroke-linecap:round}</style>
         <style>test{stroke-width:3px;stroke:black;stroke-linecap:round}</style>
         <g id=\"group1\" transform=\"translate(50,50)\">

As you can see viewbox is removed even though it is specified as a safe attribute.

We previously used the HTML4 sanitizer. Replacing HTML5 with HTML4 in the above code makes the test run perfectly.
I do not see why the behavior is different between both sanitizers.

We are currently using rails-html-sanitizer (1.6.0)

Context: I am currently working on updating to rails 7.1 which replaces HTML4 with HTML5 as default if supported. see dodona-edu/dodona#5031

@flavorjones
Copy link
Member

@jorg-vr Thanks for opening this issue! I'll take a look.

@flavorjones
Copy link
Member

Fascinating!

  it "parses the same" do
    input = %(<svg viewbox="0 0 100 100" width="300" height="100" version="1.1"></svg>)
    output = Nokogiri::HTML5.fragment(input).to_html

    assert_equal(input, output)
  end

yields

  1) Failure:
TestSanitize#test_0002_parses the same [./issues/169-svg-viewbox.rb:54]:
--- expected
+++ actual
@@ -1 +1 @@
-"<svg viewbox=\"0 0 100 100\" width=\"300\" height=\"100\" version=\"1.1\"></svg>"
+"<svg viewBox=\"0 0 100 100\" width=\"300\" height=\"100\" version=\"1.1\"></svg>"

You'll note that the HTML5 parser emits viewBox where the HTML4 parser emits viewbox. This is the correct thing to do! The HTML4 parser incorrectly lowercased element names and attribute names in the SVG foreign context, so we should consider this to be an improvement.

If you update your allowlist to use viewBox instead of viewbox then the attribute (re-cased to viewBox) will be allowed by the sanitizer!

I hope all this makes sense -- if you've got followup questions, I'll try to answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants