Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clap derive should parse markdown doc comment into normal text #2389

Open
2 tasks done
ducaale opened this issue Mar 6, 2021 · 29 comments · May be fixed by #4444 or #5891
Open
2 tasks done

Clap derive should parse markdown doc comment into normal text #2389

ducaale opened this issue Mar 6, 2021 · 29 comments · May be fixed by #4444 or #5891
Labels
A-derive Area: #[derive]` macro API C-bug Category: Updating dependencies E-medium Call for participation: Experience needed to fix: Medium / intermediate S-blocked Status: Blocked on something else such as an RFC or other implementation work.

Comments

@ducaale
Copy link
Contributor

ducaale commented Mar 6, 2021

Maintainer's notes:

  • Workaround: verbatim_doc_comment attribute

Blocked on finalizing details of StyledStr, including


Please complete the following tasks

  • I have searched the discussions
  • I have searched the existing issues

Rust Version

rustc 1.48.0 (7eac88abb 2020-11-16)

Clap Version

3.0.0-beta.2

Minimal reproducible code

use clap::Clap;

#[derive(Clap, Debug)]
#[clap(name = "xh")]
struct Cli {
    /// Optional key-value pairs to be included in the request.
    ///
    /// - key:=value to add a complex JSON value (e.g. `numbers:=[1,2,3]`)
    /// - key@filename to upload a file from filename (with --form)
    /// - header:value to add a header
    /// - header: to unset a header
    /// - header; to add a header with an empty value
    ///
    /// A backslash can be used to escape special characters (e.g. weird\:key=value).
    #[clap(value_name = "REQUEST_ITEM")]
    raw_rest_args: Vec<String>
}

fn main() {
      let _ = Cli::parse();
}

Steps to reproduce the bug with the above code

cargo run -- --help

Actual Behaviour

ARGS:
    <REQUEST_ITEM>...
            Optional key-value pairs to be included in the request.

            * key==value to add a parameter to the URL \n * key=value to add a JSON field (--json)
            or form field (--form) * key:=value to add a complex JSON value (e.g.
            `numbers:=[1,2,3]`) * key@filename to upload a file from filename (with --form) *
            header:value to add a header * header: to unset a header * header; to add a header with
            an empty value

            A backslash can be used to escape special characters (e.g. weird\:key=value).

Expected Behaviour

ARGS:
    <REQUEST_ITEM>...
            Optional key-value pairs to be included in the request.

            * key==value to add a parameter to the URL
            * key=value to add a JSON field (--json) or form field (--form)
            * key:=value to add a complex JSON value (e.g. `numbers:=[1,2,3]`)
            * key@filename to upload a file from filename (with --form)
            * header:value to add a header
            * header: to unset a header
            * header; to add a header with an empty value

            A backslash can be used to escape special characters (e.g. weird\:key=value).

Additional Context

In StructOpt, it possible to use the long_help attribute to preserve newlines but I couldn't find something similar in clap-derive.

Just found that it is called long_about in clap-derive.

Debug Output

Will add if requested

@ducaale ducaale added the C-bug Category: Updating dependencies label Mar 6, 2021
@pksunkara pksunkara added A-derive Area: #[derive]` macro API D: easy 💸 $5 labels Mar 6, 2021
@pksunkara pksunkara added this to the 3.0 milestone Mar 6, 2021
asteding pushed a commit to asteding/clap that referenced this issue Mar 9, 2021
@pksunkara
Copy link
Member

There is quite a bit of discussion in #2401

@pksunkara pksunkara modified the milestones: 3.0, 3.1 Apr 23, 2021
@pksunkara pksunkara changed the title Preserve newlines in Doc comments Clap derive should parse markdown doc comment into normal text Nov 14, 2021
@epage epage removed this from the 3.1 milestone Dec 13, 2021
@epage epage added the S-waiting-on-design Status: Waiting on user-facing design to be resolved before implementing label Dec 13, 2021
@epage
Copy link
Member

epage commented Dec 13, 2021

One of the challenges with this is we'd basically need to pull in pulldown-cmark which is quite a large dependency. This would need to be put behind a feature flag.

@ducaale
Copy link
Contributor Author

ducaale commented Dec 13, 2021

My main goal was to preserve newlines in the doc comment and that should be achievable with the verbatim_doc_comment attribute.

Example

// clap = { version = "3.0.0-rc.4", features = ["derive"] }

use clap::Parser;

#[derive(Parser, Debug)]
#[clap(name = "xh")]
struct Cli {
    /// Optional key-value pairs to be included in the request.
    ///
    ///   • key:=value to add a complex JSON value (e.g. `numbers:=[1,2,3]`)
    ///   • key@filename to upload a file from filename (with --form)
    ///   • header:value to add a header
    ///   • header: to unset a header
    ///   • header; to add a header with an empty value
    ///
    /// A backslash can be used to escape special characters (e.g. weird\:key=value).
    #[clap(value_name = "REQUEST_ITEM", verbatim_doc_comment)]
    raw_rest_args: Vec<String>,
}

fn main() {
    let _ = Cli::parse();
}

Output

USAGE:
    temp-clap [REQUEST_ITEM]...

ARGS:
    <REQUEST_ITEM>...
            Optional key-value pairs to be included in the request.

              • key:=value to add a complex JSON value (e.g. `numbers:=[1,2,3]`)
              • key@filename to upload a file from filename (with --form)
              • header:value to add a header
              • header: to unset a header
              • header; to add a header with an empty value

            A backslash can be used to escape special characters (e.g. weird\:key=value).

OPTIONS:
    -h, --help
            Print help information

I wouldn't mind if this issue was to be closed.

@epage
Copy link
Member

epage commented Dec 13, 2021

I think its still an interesting topic for us to weigh out. I did call out the workaround in the issue so its more discoverable.

@I60R
Copy link

I60R commented Dec 29, 2021

We may need only a subset of markdown features in CLI parser:

  • newline escaping (implemented in Allow to escape newline with \ like in markdown #3228)
  • lists as suggested here (should be relatively easy to implement, even with nesting)
  • bold, italic, bold-italic, inline code - if colors enabled (not sure whether it's worth effort, but why not?)
  • fenced code blocks (without syntax highlighting, of course)

There also could be some mechanism to cut off help message from doc comment as @epage suggested here — IMO markdown headers are a perfect mechanism for that e.g.:

  • without any header everything goes to -h --help output
    • that's the same as if everything was placed under # Help header
  • if # Long help is specified that section will be appended to --help output (only)
  • we may also have # Manpage section (if clap will provide some mechanism for manpage generation)
  • text under all other headers will be ignored by clap

I think this subset should satisfy 99.9% of all use cases of clap, everyone should like it, and it doesn't seem to be so complicated to require pulldown-cmark or any other heavy dependency.

@epage
Copy link
Member

epage commented Dec 29, 2021

My experience writing one off parsers has made me a bit cautious of doing so. I threw together some md parser benchmarks to see what might be small enough for being an optional dependency. minimad looks tempting.

@epage
Copy link
Member

epage commented Dec 30, 2021

Another thought on markdown support, should we support it at runtime, compile time, or both?

The core of this Issue is our line breaks which is a problem exclusive to the derive API. For that, we could build markdown parsing directly into clap_derive.

Another step up is markdown formatting. For that, we'd want to expose it to both APIs. Say we generalized our Colorizer struct and accepted it for all our user-facing inputs (for #1790, #1433), users can do whatever styling they want, though it might be a bit arduous. We then provided a markdown-parsing macro that would code-generate Colorizer. clap_derive would then have this built-in for all doc comments.

The main benefit of this is that this would remove the binary-size overhead of the markdown parser. We'd still need to compile it (though it'd be optional). Though this would mean we wouldn't regress in --help and error performance by introducing markdown, the performance would have to be pretty bad for us to care.

This has the added benefit that we wouldn't need yet-another Setting in the API for doing controlling this at runtime.

@epage
Copy link
Member

epage commented Dec 30, 2021

Another question for us to answer in this issue is how disruptive would markdown parsing be. While --helps output is not considered "stable", we shouldn't dramatically alter users crafting of their help output. To what degree of markdown parsing can we add in a patch, a feature release, or a breaking release?

epage added a commit to epage/clap that referenced this issue Dec 30, 2021
This reverts commit 333b993.

PR clap-rs#1810's motivation was effectively "this is redundant with `\n`".
That is a fine motivation.  Unfortunately, we don't have a way to force
a hard break in `clap_derive`.  clap-rs#2389 should help with this eventually
but we shouldn't hold up 3.0 to get that ready.  So in the mean time, we
are restoring `{n}`.

We have clap-rs#3230 for tracking the re-removal of it.
@pronebird
Copy link

I'd only ask for italic and bold for basic formatting.

@nrdxp
Copy link

nrdxp commented Oct 20, 2024

In my case I just want to add a space in a list without having to add a newline between the bullets and without having to use verbatim doc comments (so wrap_help is respected properly)

@ModProg
Copy link
Contributor

ModProg commented Jan 19, 2025

Is there some specification on how this should work? Should we fully parse markdown, or should we just implement the parts that make sense, i.e., lists, bold, italic? I'm currently quite annoyed by lists not working so I'd be willing to invest some time into implementing some solution here.

@alerque
Copy link
Contributor

alerque commented Jan 19, 2025

How established is CommonMark in Rust docs contexts? I would suggest pulldown-cmark or, if another Markdown parser is established in Rust context whatever that is, as a full flow Markdown parser makes more sense than a "whatever makes sense/minimalist" approach. Markdown is notoriously hard to actually get right even if small things like emphasis or list parsing.

@epage
Copy link
Member

epage commented Jan 20, 2025

Subsets of markdown were covered earlier in the thread
#2389 (comment)

@ModProg
Copy link
Contributor

ModProg commented Jan 20, 2025

#2389 (comment) you also cautioned against writing a custom parser should we just use pulldown-cmark and ignore the ones we don't support?

Also should this be configurable, for some the mapping to ANSI is pretty clear e.g. bold or italic while for others it's unclear e.g. code_block should that set a backgroun? or keep the quotes ` ?

Another thing to consider would be should support be implemented for html tags like in color_print: https://docs.rs/color-print/latest/color_print/index.html#list-of-accepted-tags

@epage
Copy link
Member

epage commented Jan 20, 2025

#2389 (comment) you also cautioned against writing a custom parser should we just use pulldown-cmark and ignore the ones we don't support?

Looking at md parser benchmarks, I would be curious about how well we could get by with minimad.

Details like code_lock would need to be figured out. In a way, it is another form of literal.

It would be good to have a way for people to set colors but I would be worried about full HTML support.

@ModProg
Copy link
Contributor

ModProg commented Jan 20, 2025

It would be good to have a way for people to set colors but I would be worried about full HTML support.

The only reason I mentioned color_print is that it is referenced in the clap docs, and it could be nice to also be able to use it inside of markdown.

I would agree that any block elements would be annoying to deal with, so only inline ones could work. Maybe adding some tags for all the styles definable could be useful, i.e., <header>, <error>, usage, ...

Details like code_block would need to be figured out. In a way, it is another form of literal.

Makes sense, though one could also add the different markdown elements to Styles, e.g., if someone wants to color their bold text red.

The other question is should mark down parsing be enabled at runtime as well? If not, the compile time parsed markdown needs to be converted like you suggested here:

One design problem I've been running into is how to render markdown at compile time when the colors aren't known until runtime.
I think the way to do this is to allow a fn(fmt, palette) -> fmt::Result to be passed into StyledStr and the derive would generate this, parsing the markdown at compile time but rendering the styles at runtime.

Even if we support markdown parsing at runtime, one might like to move that effort to compilation where possible, but we could also just implement it for runtime first and then check if there is a relevant performance penalty end disable it if so.

The whole markdown feature should probably be behind a feature flag anyway.

Would you be in favor of building a PoC for markdown parsing to test it out?

@epage
Copy link
Member

epage commented Jan 20, 2025

I suspect we should do this all at compile time though either direction comes with its problems.

One combination of challenges

  • I intentionally switched StyledStr to only hold ANSI escape codes
  • If we want to provide style classes like "header", the conversion of that to ANSI is only really available at runtime

iirc there were more. Feel free to play around and come back with more ideas. I feel like I'd want things settled a little more before adding an unstable feature for this.

@ModProg
Copy link
Contributor

ModProg commented Jan 24, 2025

I found two issues with minimad, doesn't support lists using - and doesn't support multilevel list, because it thinks they are codeblocks Canop/minimad#2.

While multilevel lists probably aren't as important, the list style could be quite confusing and result in unnecessary issues :D

- Not recognised as list
* Is a list
    * is a code block

results in:

Text {
    lines: [
        Normal(
            Composite {
                style: Paragraph,
                compounds: [],
            },
        ),
        Normal(
            Composite {
                style: Paragraph,
                compounds: [
                    "- Not recognised as list",
                ],
            },
        ),
        Normal(
            Composite {
                style: ListItem(
                    0,
                ),
                compounds: [
                    "Is a list",
                ],
            },
        ),
        Normal(
            Composite {
                style: Code,
                compounds: [
                    "* is a code block",
                ],
            },
        ),
    ],
}

I'm gonna experiment with pulldown, but shouldn't be too bad to switch that later on

@epage
Copy link
Member

epage commented Jan 24, 2025

I wonder if they'd be open to adding more list styles.

@ModProg
Copy link
Contributor

ModProg commented Jan 24, 2025

I wonder if they'd be open to adding more list styles.

Might be, but the issue was open for 4 years, and I was more interested in the other parts of this, but I think the main considerations are independent of parser and should be therefor easy enough to replace.

@passcod
Copy link

passcod commented Jan 24, 2025

Just because I looked it up, I'll note for context that rustdoc uses pulldown-cmark, so that would be the most directly compatible.

@ModProg ModProg linked a pull request Jan 24, 2025 that will close this issue
1 task
@ModProg
Copy link
Contributor

ModProg commented Jan 24, 2025

Created an initial PoC #5891

Image

Still some things to figure out, like should we use Unicode characters for list items? And which styles should we apply as we don't have access to the actual styles.

One thing I also need to fix is that anstyle fully resets the style when ending e.g. bold, so I need to restart italic manually.

@epage
Copy link
Member

epage commented Jan 24, 2025

I know at least Cargo is cautious on the use of unicode, auto-detecting when terminals should support it with a config to override it.

@ModProg
Copy link
Contributor

ModProg commented Jan 25, 2025

I know at least Cargo is cautious on the use of unicode, auto-detecting when terminals should support it with a config to override it.

As that'd be hard to do at compile time, should we only output ASCII for now? The only alternative would be to generate 2 help messages and then in the generated builder switch between them depending on Unicode capability.

@ModProg
Copy link
Contributor

ModProg commented Jan 26, 2025

Short works as expected, i.e., only taking the first paragraph:

Image

Long takes everything

Image

@epage Should we also investigate using headings for separating sections and if so what would be the names? # Help and # Long help (probably matching any casing)?

Following #2389 (comment) the implementation would be as follows:

  1. if the doc comment contains no top-level heading with the content help (any casing) then follow regular procedure, i.e. take first paragraph as help and everything as long_help.
  2. if the doc comment contains a top-level help heading take everything below it and use it for help if there is also a long help heading take everything below help and long help and use it for the long help.

This leaves the following questions for me:

  1. What if the doc comment only contains # Long Help?

And the following alternative implementations

  1. We could instead of # Long Help use a sub heading of # Help called ## Long.
  2. We could (if there is no # Long Help) apply the default parsing behavior to the content of # Help i.e. use the first paragraph for help and everything for long_help.
  3. Should we, if # Long Help is specified only output this section as the complete text for long_help? This could be helpful if the short help is redundant when the long_help is also printed. In this case we could use # Long Help to contain the complete long_help and use the ## Long (Help) below # Help to have the behavior of everything before ## Long is only contained in help while long_help would contain everything in the # Help section including ## Long.

@ModProg
Copy link
Contributor

ModProg commented Jan 26, 2025

btw there was one test where the output changed because of this, as now ` is dropped in the output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-derive Area: #[derive]` macro API C-bug Category: Updating dependencies E-medium Call for participation: Experience needed to fix: Medium / intermediate S-blocked Status: Blocked on something else such as an RFC or other implementation work.
Projects
None yet
10 participants