Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check links on CI #902

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

gromakovsky
Copy link

@gromakovsky gromakovsky commented Jan 23, 2024

Description

Markdown files contain a plenty of links which tend to get outdated occasionally. It's hard to notice when a certain link (local or global) becomes broken.

In the past, the xrefcheck tool was used to find broken links:

This PR automates xrefcheck running by adding it as a new GitHub Action job. It uses xrefcheck-action under the hood.

Resolves #4

In the second commit I tried to fix/ignore broken links detected by xrefcheck using my best judgement. AFAIU docs/website contains sources for https://ouroboros-consensus.cardano.intersectmbo.org/, so I used that site to check some links.

  1. Links in References.md, index.md and TechnicalReports.md don't work on GitHub because they are invalid in the sense of Markdown, but they work fine on the website, so I didn't touch them and added a comment to make xrefcheck ignore them.
  2. Glossary.md contains many links to anchors in the same file. These anchors work differently in Markdown and Docusarus as can be seen here. Ideally, they should be fixed to work on the resulting website, in which case all of them would be broken in the sense of Markdown. Fixing them for Docusaurus is out of scope of this PR, but since they will probably become broken in the sense of Markdown, I've instructed xrefcheck to ignore all links in that file.
  3. There is also an anchor link in VersioningSchemeDecision.md, but this file is not used in the resulting website, so I've fixed it according to Markdown rules.

There are also some global/absolute links that return 404 and I couldn't find how to fix them.


  ➥  In file docs/website/contents/for-developers/AbstractProtocol.md
     bad reference (external) at src:7:3-178:
       - text: "Ouroboros.Consensus.Tutorial.Simple"
       - link: https://github.com/IntersectMBO/ouroboros-consensus/blob/master/ouroboros-consensus/src/tutorials/Ouroboros/Consensus/Tutorial/Simple.lhs
       - anchor: -
     
     ⛂  Resource unavailable (404 Not Found)
     
     
  ➥  In file docs/website/contents/for-developers/AbstractProtocol.md
     bad reference (external) at src:9:3-184:
       - text: "Ouroboros.Consensus.Tutorial.WithEpoch"
       - link: https://github.com/IntersectMBO/ouroboros-consensus/blob/master/ouroboros-consensus/src/tutorials/Ouroboros/Consensus/Tutorial/WithEpoch.lhs
       - anchor: -
     
     ⛂  Resource unavailable (404 Not Found)

@gromakovsky gromakovsky requested a review from a team as a code owner January 23, 2024 17:45
@gromakovsky gromakovsky force-pushed the gromak/add-xrefcheck-to-ci branch from 7e39b77 to 89329ea Compare January 23, 2024 17:48
@jorisdral
Copy link
Contributor

jorisdral commented Jan 24, 2024

There are also some global/absolute links that return 404 and I couldn't find how to fix them.

Good catch! These are literate Haskell files that aren't rendered and uploaded to the website, which we actually might want to do. I'm creating a separate issue for this

EDIT: #903

@amesgen
Copy link
Member

amesgen commented Jan 24, 2024

Drive-by comment: some broken links detected here are (also) fixed in #884

@jorisdral
Copy link
Contributor

@gromakovsky as discussed in #4 (comment), would it be possible to demote the output to a warning?

@dnadales dnadales self-assigned this Jan 29, 2024
@gromakovsky
Copy link
Author

as discussed in #4 (comment), would it be possible to demote the output to a warning?

AFAIK there is no way to report "warning" status for a check, I see a feature request for it, but it's unresolved: https://github.com/orgs/community/discussions/11592
According to https://stackoverflow.com/a/75849954, it should be possible to report some warnings during a job run and finish the job as successful. I think we can do it: update xrefcheck to accept a custom format for reporting broken links and pass the format recognizable by GitHub Actions. I've created an issue for it in xrefcheck (serokell/xrefcheck#297), but I can't promise how soon it will be done. One problem with this approach is that the job status will be green, so it will be easier to miss. But that's a trade-off.

Two other options have been proposed in the issue comment:

  1. Make the check not required for merging. This is already the case, this check is not required for merging.
  2. Make this a nightly (or weekly) check. I can do that, should be trivial.

I'm new to this repo, so please let me know how you would like to proceed:

  1. Do nothing since this check is not required for merging.
  2. Run this check on schedule. In this case do you have any preferences how often to run it? Once a day, week, etc.?
  3. Update it to always succeed, but report broken links as warnings recognizable by GitHub Actions. It will take some time to make the desired change in xrefcheck and release a new version.

@gromakovsky gromakovsky force-pushed the gromak/add-xrefcheck-to-ci branch from 89329ea to f85bc43 Compare February 4, 2024 17:13
@gromakovsky gromakovsky requested a review from a team as a code owner February 4, 2024 17:13
@gromakovsky
Copy link
Author

Drive-by comment: some broken links detected here are (also) fixed in #884

Right, after rebasing onto latest main I don't have any errors running xrefcheck locally. I've pushed the rebased branch.

@jasagredo
Copy link
Contributor

First of all, thanks for your PR. I think having an automated link checker is a good thing to have, and xrefcheck could very well be that checker. However, xrefcheck-0.2.2 reports no errors on this branch, but xrefcheck@master does. I do have some question on that regard:

  1. Is it expected that a newer version will be released from master? Many things seem to have changed since the latest release (link).
  2. What is the fundamental reason for considering permanent redirects as errors? other tools like markdown-link-check seem to just check if the response status is correct. Why does xrefcheck go that extra step even for links that just add / at the end of the url?
  3. How is it that [email protected] does not find many permanent redirects? I'm running both versions on the same code and there are many redirects found by the master version that are not found by the 0.2.2 version.

With 0.2.2:

❯ /root/.local/bin/xrefcheck
Configuration file not found, using default config for GitHub repositories

All repository links are valid.

With master:

❯ /root/.local/bin/xrefcheck
Configuration file not found, using default config for GitHub repositories

=== Scan errors found ===

  ➥  In file docs/website/contents/for-developers/Glossary.md
     scan error at src:1:1-31:

     Unrecognised option "file" perhaps you meant <"ignore link"|"ignore paragraph"|"ignore all">

Scan errors dumped, 1 in total.

=== Invalid references found ===

  ➥  In file CONTRIBUTING.md
     bad reference (external) at src:31:4-109:
       - text: "Preflight guide"
       - link: https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/PreflightGuide

     Permanent redirect found:
       -| https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/PreflightGuide
       -> https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/PreflightGuide/
          ^-- stopped before this one

  ➥  In file CONTRIBUTING.md
     bad reference (external) at src:32:4-96:
       - text: "Glossary"
       - link: https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary

     Permanent redirect found:
       -| https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary
       -> https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary/
          ^-- stopped before this one

  ➥  In file CONTRIBUTING.md
     bad reference (external) at src:32:4-96:
       - text: "Glossary"
       - link: https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary

     Permanent redirect found:
       -| https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary
       -> https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary/
          ^-- stopped before this one

  ➥  In file docs/website/contents/for-developers/Glossary.md
     bad reference (file-local) at src:82:24-74:
       - text: "active slot coefficient"
       - anchor: active-slot-coefficient

     Anchor 'active-slot-coefficient' is not present, did you mean:
         - active-slot-coefficient-f (header II) at src:9:1-31

  ➥  In file docs/website/contents/for-developers/Glossary.md
     bad reference (file-local) at src:164:48-76:
       - text: "Chain Growth"
       - anchor: chain-growth

     Anchor 'chain-growth' is not present, did you mean:
         - chain-growth-property (header II) at src:78:1-25

  ➥  In file docs/website/contents/for-developers/Glossary.md
     bad reference (file-local) at src:253:4-22:
       - text: "Genesis"
       - anchor: genesis

     Anchor 'genesis' is not present, did you mean:
         - genesis-block (header II) at src:167:1-17
         - genesis-window (header II) at src:202:1-18

  ➥  In file docs/website/contents/for-developers/PreflightGuide.md
     bad reference (external) at src:77:108-125:
       - text: "Mithril client"
       - link: https://mithril.network/doc

     Permanent redirect found:
       -| https://mithril.network/doc
       -> https://mithril.network/doc/
          ^-- stopped before this one

  ➥  In file docs/website/contents/for-developers/PreflightGuide.md
     bad reference (external) at src:78:8-35:
       - text: "here"
       - link: https://mithril.network/doc/manual/getting-started/bootstrap-cardano-node

     Permanent redirect found:
       -| https://mithril.network/doc/manual/getting-started/bootstrap-cardano-node
       -> https://mithril.network/doc/manual/getting-started/bootstrap-cardano-node/
          ^-- stopped before this one

  ➥  In file docs/website/contents/for-developers/PreflightGuide.md
     bad reference (external) at src:144:378-468:
       - text: "Solana"
       - link: https://docs.solana.com/running-validator/validator-reqs#hardware-recommendations

     Permanent redirect found:
       -| https://docs.solana.com/running-validator/validator-reqs#hardware-recommendations
       -> https://docs.solanalabs.com/operations/requirements
          ^-- stopped before this one

  ➥  In file CONTRIBUTING.md
     bad reference (external) at src:32:4-96:
       - text: "Glossary"
       - link: https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary

     Permanent redirect found:
       -| https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary
       -> https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary/
          ^-- stopped before this one

Invalid references dumped, 10 in total.

@dnadales dnadales removed their assignment Jun 25, 2024
@int-index int-index force-pushed the gromak/add-xrefcheck-to-ci branch from f85bc43 to 3de3464 Compare December 31, 2024 12:46
@int-index
Copy link

int-index requested review from nfrisby, jasagredo, amesgen, fraser-iohk, dnadales and geo2a as code owners

For what it's worth, this review request was generated automatically by GitHub due to a rebase. I didn't mean to request another review just yet—not until I respond to the questions from the previous review.

@int-index
Copy link

int-index commented Jan 1, 2025

Is it expected that a newer version will be released from master? Many things seem to have changed since the latest release.

Yes. We wanted to include a few other improvements before cutting a release, and now xrefcheck-0.3.0 is available.

What is the fundamental reason for considering permanent redirects as errors?

Two reasons:

  1. For readers of a document, opening a link is more efficient if it is direct. Perhaps that alone isn't sufficient grounds to consider redirects as errors, but it makes them undesirable.

  2. There are websites (e.g. GitLab) that don't follow the usual HTTP conventions and do redirects instead of responding with 404 if a document is unavailable, e.g. try https://gitlab.com/gitlab-org/gitlab/-/blob/master/asdf.

    One could argue that GitLab shouldn't do that. However, since redirects are undesirable anyway, the pragmatic choice is to report them as errors.

How is it that [email protected] does not find many permanent redirects?

The old version follows those permanent redirects instead of reporting them. Our experience of using xrefcheck with GitLab links made us reconsider that design choice, so xrefcheck-0.3.0 reports those redirects as errors.

Copy link
Contributor

@jasagredo jasagredo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation above.

The PR looks good, however we require commits to be signed. You will need to force-push the branch with signed commits.

@gromakovsky gromakovsky force-pushed the gromak/add-xrefcheck-to-ci branch from 3de3464 to a22feb3 Compare January 6, 2025 10:14
Problem: Markdown files contain a plenty of links which
tend to get outdated occasionally. It's hard to notice
when a certain link (local or global) becomes broken.

Solution: in the past, the `xrefcheck` tool was used
to find broken links:
* IntersectMBO#3
* IntersectMBO#454

See https://github.com/serokell/xrefcheck
This PR automates `xrefcheck` running by adding it as a new
GitHub Action job.
It uses [xrefcheck-action](https://github.com/serokell/xrefcheck-action)
under the hood.

Resolves IntersectMBO#4
Problem: several links in Markdown files are broken.

Solution:

1. Links in References.md, index.md and TechnicalReports.md
   are invalid in the sense of Markdown and this repo, but
   they work on the website, so they are explicitly ignored.

2. Glossary.md contains many links to anchors in the same file.
   These anchors work differently in Markdown and Docusarus as can
   be seen here:
   https://ouroboros-consensus.cardano.intersectmbo.org/docs/for-developers/Glossary/
   Ideally, they should be fixed to work on the resulting website,
   in which case all of them would be broken in the sense of Markdown.
   Fixing them for Docusaurus is out of scope of this commit,
   but we force xrefcheck to ignore all links in that file in
   preparation for that fix.

3. There is also an anchor link in VersioningSchemeDecision.md,
   but this file is not used in the resulting website, so we fix
   it according to Markdown rules.

4. Redirects have been replaced with direct links to make opening links
   more efficient for readers.

5. Fixed outdated links and anchors in CONTRIBUTING.md and
   ouroboros-consensus-cardano/README.md
@int-index int-index force-pushed the gromak/add-xrefcheck-to-ci branch from a22feb3 to 7cf3a36 Compare January 16, 2025 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🚫 Help needed
Development

Successfully merging this pull request may close these issues.

Check links on CI
6 participants