Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEO: rewrite certain xml:id values and validate against Geekodoc #364

Merged
merged 19 commits into from
Jul 23, 2024

Conversation

smithfarm
Copy link
Member

@smithfarm smithfarm commented Jul 10, 2024

For SEO optimization, we need to avoid the use of dots ('.') and underscores ('_') in our xml:id values.

To be merged together with companion PR: openSUSE/obs-landing#466

@hennevogel
Copy link
Member

Phew this will break a lot of external links right?

@smithfarm
Copy link
Member Author

smithfarm commented Jul 11, 2024

Did we ever say that our links live forever? All the other SUSE documentation had to take this poison pill, so why not ours?

@smithfarm smithfarm force-pushed the wip-seo-link-rewrite branch from fc28462 to c012ad1 Compare July 11, 2024 10:05
@hennevogel
Copy link
Member

Did we ever say that our links live forever?

No but that doesn't mean we should not give a 💩 either. obs-landing knows redirects. At least for the sections etc. (everything but anchors) we should do those. Like

/help/manuals/obs-user-guide/par.first_steps /help/manuals/obs-user-guide/par-first-steps

@smithfarm
Copy link
Member Author

@hennevogel OK, I'll create a companion PR at obs-landing.

@smithfarm smithfarm changed the title SEO: rewrite certain xml:id values SEO: rewrite certain xml:id values and validate against Geekodoc Jul 11, 2024
@smithfarm smithfarm force-pushed the wip-seo-link-rewrite branch 2 times, most recently from 86e8c96 to 55dbf51 Compare July 21, 2024 13:13
smithfarm added a commit to openSUSE/obs-landing that referenced this pull request Jul 21, 2024
Otherwise we break an unknown number of external hyperlinks.

Refers-to: openSUSE/obs-docu#364
Signed-off-by: Nathan Cutler <[email protected]>
smithfarm added a commit to openSUSE/obs-landing that referenced this pull request Jul 21, 2024
Otherwise we break an unknown number of external hyperlinks.

Refers-to: openSUSE/obs-docu#364
Signed-off-by: Nathan Cutler <[email protected]>
@smithfarm smithfarm force-pushed the wip-seo-link-rewrite branch from bb27e05 to 03f4eff Compare July 21, 2024 14:27
@tomschr
Copy link
Contributor

tomschr commented Jul 22, 2024

Did we ever say that our links live forever? All the other SUSE documentation had to take this poison pill, so why not ours?

That's true, we faced this problem in the past. We mitigated this by adding redirect rules in our Apache .htaccess file. For the future, we introduced a "don't change your ID!" rule.

Still we face this problem from time to time when some clever managers want to change the product names. 😉

If IDs are part of URLs, there is not much you can do to avoid this problem. Sometimes it's inevitable. You can make people aware of that to mitigate the issue.

For the record, our styleguide contains a section about Identifiers. Just to give you an idea how we use and create IDs.

@smithfarm
Copy link
Member Author

That's true, we faced this problem in the past. We mitigated this by adding redirect rules in our Apache .htaccess file.

As already suggested by @hennevogel I have opened a companion PR with corresponding redirects: openSUSE/obs-landing#466

@tomschr
Copy link
Contributor

tomschr commented Jul 22, 2024

I have opened a companion PR with corresponding redirects: openSUSE/obs-landing#466

Yes, saw it. Just wanted to give you some context from our side. 🙂

@smithfarm
Copy link
Member Author

We seem to have reached a consensus that migrating to Geekodoc is the way to go, so as long as there are no objections to the additional changes I have added, this PR seems ready to go in (together with its counterpart PR over at obs-landing).

For SEO optimization, we need to avoid the use of dots ('.') and underscores
('_') in our xml:id values.

Signed-off-by: Nathan Cutler <[email protected]>
For SEO optimization, we need to avoid the use of dots ('.') and underscores
('_') in our xml:id values.

Signed-off-by: Nathan Cutler <[email protected]>
The xml:id values in the XML file were rewritten, but not in the
correct way.

Signed-off-by: Nathan Cutler <[email protected]>
The term "open build service" (without initial caps) is wrong, anyway.

Signed-off-by: Nathan Cutler <[email protected]>
It doesn't make sense to hyperlink to a figure that follows immediately after
the link.

Signed-off-by: Nathan Cutler <[email protected]>
This comment wasn't adding any value.

Signed-off-by: Nathan Cutler <[email protected]>
This FIXME was confused by the presence of "&gitproject;" in the referencespec.
It doesn't actually link to GitHub.

Signed-off-by: Nathan Cutler <[email protected]>
I don't know what the original intention was here, but these days
it's just confusing.

Signed-off-by: Nathan Cutler <[email protected]>
The art-obs-quickstart tag has been here since the beginning, but
it doesn't mean anything in the given context and hence is just
decreasing the overall signal-to-noise ratio.

Signed-off-by: Nathan Cutler <[email protected]>
There was some confusion over whether "Source Services" should be plural or
singular, and in some places we were referring to Source Services merely as
"services" in contexts where it might not be abundantly clear to the reader
what is meant by "services".

Signed-off-by: Nathan Cutler <[email protected]>
As a first-time reader I was confused by the current text, so I quickly came up
with some edits to make the chapter more intelligible.

Signed-off-by: Nathan Cutler <[email protected]>
@smithfarm smithfarm force-pushed the wip-seo-link-rewrite branch from 03f4eff to 8c0a735 Compare July 22, 2024 10:40
@smithfarm smithfarm force-pushed the wip-seo-link-rewrite branch from 7754e7b to 6fc1afa Compare July 22, 2024 14:24
@tomschr
Copy link
Contributor

tomschr commented Jul 22, 2024

I've did a quick validation check and there is more work ahead. 😉

I forgot to mention that the xml:id is only one part. You also need to modify the value of the linkend attribute. It appears inside a <xref linkend="..."/> element. These are internal cross references to other parts of the document. I'm sorry for the additional work.

@smithfarm Nathan, I've experimented a bit to make this work more efficient. After I was unsuccessful with sed and xmlstarlet, I wrote a small Python script:

# modifydocbook.py
import re
import sys

pattern = re.compile(r'(xml:id|linkend)="([^"]+)"')  # Regular expression pattern


def modify_docbook_xml(file_path):
  """
  Read a DocBook XML file by replacing dots and underscores with dashes
  in xml:id and linkend attributes.
  Prints the modified content to standard output.

  Args:
    file_path: Path to the original DocBook XML file.
  """

  with open(file_path, 'r') as fh:
    for line in fh:
        match = pattern.search(line)
        if match:
            # line = line.rstrip()
            attr_value = match.group(2).replace('.', '-').replace('_', '-')
            replacement = rf'\1="{attr_value}"'
            line = pattern.sub(replacement, line)

        print(line, end="")


if __name__ == "__main__":
    modify_docbook_xml(sys.argv[1])

If you install sponge (package name is the same) you can do it in one go:

$ for xml in xml/*.xml; do 
  python3 modifydocbook.py $xml | sponge $xml
done

After this process, there is one ID (co-obs-sserv-struct-name) which occurs twice. If I fix that as well and correct the ROOTID in the DC files, both admin and user guide are valid.

smithfarm added a commit to openSUSE/obs-landing that referenced this pull request Jul 22, 2024
Otherwise we break an unknown number of external hyperlinks.

Refers-to: openSUSE/obs-docu#364
Signed-off-by: Nathan Cutler <[email protected]>
Signed-off-by: Nathan Cutler <[email protected]>
@smithfarm
Copy link
Member Author

smithfarm commented Jul 22, 2024

You also need to modify the value of the linkend attribute.

@tomschr Thanks! But I believe I did that already... Do you see any linkend attribute value that I missed?

@tomschr
Copy link
Contributor

tomschr commented Jul 22, 2024

Thanks! But I believe I did that already...

daps thinks otherwise. 😉 It reports some xml:id and linkend errors. I count 14 files that are modified after I apply my script. 🙂

@smithfarm
Copy link
Member Author

@tomschr Instead of me duplicating your work, could you just push the changes right into the PR? Or just push a branch with the commit and I'll cherry-pick it in...

@smithfarm
Copy link
Member Author

smithfarm commented Jul 22, 2024

@tomschr The odd thing is, Geekodoc no longer complains about the . and _ characters in the xml:id attribute values.

I mean, I'm trying to get to them all, but I found I missed some - yet Geekodoc (daps) did not complain.

@smithfarm
Copy link
Member Author

smithfarm commented Jul 22, 2024

daps thinks otherwise. 😉 It reports some xml:id and linkend errors.

Where? Here in the PR it says this:

[INFO] Building DC-obs-all
[DEBUG] Validate inside container docker exec fd9b38640981f186599fbbaca8e9a91df877a2104d61d86db9027b55b2ae5468 daps -d /daps_temp-dkjdVP1OK7/source/DC-obs-all validate 
[DEBUG] Validation for DC-obs-all result was 0
[DEBUG] Checking validation result...
daps -d /daps_temp-dkjdVP1OK7/source/DC-obs-all html  
Your output documents are:
/tmp/daps2docker-2DhNs4mF/obs-all/html/obs-all/

I thought this meant that daps thinks the XML is valid.

@tomschr
Copy link
Contributor

tomschr commented Jul 23, 2024

daps thinks otherwise. 😉 It reports some xml:id and linkend errors.

Where? Here in the PR it says this:
...
I thought this meant that daps thinks the XML is valid.

Use the option -vv and look at the beginning. You will see probably something like this:

     DOCBOOK_VERSION: 5
        DOCBOOK5_RNG: /usr/share/xml/docbook/schema/rng/5.2/docbookxi.rng

If that's the case, you still validate against DocBook 5. However, DocBook doesn't have any restriction regarding IDs.

To really validate against Geekdoc, you need daps to tell what schema to use. In your ~/.config/daps/dapsrc file, add the following line:

DOCBOOK5_RNG_URI="urn:x-suse:rng:v2:geekodoc-flat"

Then validate again and you should hopefully see some messages. For me, it looks like this:

[ERROR]: The document contains XML errors:
/home/toms/repos/GH/opensuse/obs-docu/build/.profiled/bogus_opensuse_novell/book-obs-user-guide.xml:6:217: error: value of attribute "xml:id" is invalid; must be an XML name without colons matching the regular expression "[\-0-9a-zA-Z]+"

GeekoDoc restricts identifiers in xml:id/linkend attributes
and allows only specific characters (no dots, underscores).

* Replace dots/underscores with "-"
* Correct ROOTID in DC files
* Set validation schema in DC files to GeekoDoc
@tomschr
Copy link
Contributor

tomschr commented Jul 23, 2024

@tomschr Instead of me duplicating your work, could you just push the changes right into the PR? Or just push a branch with the commit and I'll cherry-pick it in...

@smithfarm Absolutely! I just didn't want to intervene with your work so my next questions would be exactly that. 👍 I've committed the changes in commit 05a8958, see the commit message for details.

@smithfarm
Copy link
Member Author

@tomschr Thanks! Much appreciated.

smithfarm added a commit to openSUSE/obs-landing that referenced this pull request Jul 23, 2024
Otherwise we break an unknown number of external hyperlinks.

Refers-to: openSUSE/obs-docu#364
Signed-off-by: Nathan Cutler <[email protected]>
@smithfarm smithfarm force-pushed the wip-seo-link-rewrite branch from c687115 to 05a8958 Compare July 23, 2024 09:47
@@ -3,15 +3,15 @@
<!ENTITY % entities SYSTEM "entity-decl.ent">
%entities;
]>
<chapter version="5.1" xml:id="cha-obs-source-service"
<chapter version="5.1" xml:id="cha-obs-source-services"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear we are going to need redirects for this too...

@@ -4,25 +4,36 @@
<!ENTITY % entities SYSTEM "entity-decl.ent">
%entities;
]>
<chapter version="5.1" xml:id="cha-obs-building"
<chapter version="5.1" xml:id="cha-obs-local-building"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a redirect to obs-landing

@hennevogel hennevogel merged commit bfafe8c into master Jul 23, 2024
4 checks passed
@smithfarm smithfarm deleted the wip-seo-link-rewrite branch July 23, 2024 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants