Create a MakeCSL tool #244

Open
bdarcus opened this issue Jun 12, 2020 · 7 comments

@bdarcus
Member

bdarcus commented Jun 12, 2020

This is still needed; a 21st century analog to makebst.

The idea: the user feeds the tool basic metadata about the needed output style, including example formatted citations and references. The tool parses those examples with a machine-learning-based reference parsing library like anystyle and spits out a dependent or independent style.
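A minimal sketch of the step after parsing, assuming the parser has already labelled the fields of one example reference. The `parsed` dict only imitates the kind of labelled output a reference parser might produce; its field names and the `infer_template` helper are hypothetical and are not anystyle's API.

```python
def infer_template(formatted: str, parsed: dict[str, str]) -> list[tuple[str, str]]:
    """Return (prefix, field) pairs: each field with the literal text before it."""
    # Locate every labelled value inside the formatted example.
    spans = []
    for field, value in parsed.items():
        start = formatted.find(value)
        if start >= 0:
            spans.append((start, start + len(value), field))
    spans.sort()

    # Whatever sits between two values is punctuation the style must emit.
    template, cursor = [], 0
    for start, end, field in spans:
        template.append((formatted[cursor:start], field))
        cursor = end
    template.append((formatted[cursor:], ""))  # trailing punctuation, no field
    return template


example = "Doe, J. (2020). A study of styles. Journal of Examples, 12, 3-14."
parsed = {  # hypothetical parser output
    "author": "Doe, J.",
    "issued": "2020",
    "title": "A study of styles",
    "container-title": "Journal of Examples",
    "volume": "12",
    "page": "3-14",
}
template = infer_template(example, parsed)
```

The resulting field order and delimiters are the raw material a generator could map onto existing macros. A real tool would need many examples per item type to separate fixed punctuation from accidents of one reference.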

Should be much easier for users, and more effective, than a visual editor, particularly given the massive corpus of styles and included macros this project now has.

If anyone is interested in tackling this, please see the below linked issue, where the author of anystyle and I bat around what I think are some promising ideas on implementation details:

inukshuk/anystyle#146

@denismaier
Member

Or you'll use it together with the visual editor? Like, after a couple of questions you'll have a basic skeleton you can then tweak.

@bdarcus
Member Author

bdarcus commented Jun 12, 2020

Sure. But that could be a last step.

Like, if the tool isn't sure about a few decisions, it could give users a choice among a few output results.

Wonder if anycite itself could be extended along these lines.

@customcommander

Have you considered Blockly?

You can use it to build a Scratch-like visual editor and let people build CSL styles by putting blocks together. The actual CSL style can be generated in real-time on the side.
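A rough sketch of the "blocks in, CSL out" half of that idea. It does not use Blockly (which is a JavaScript library); nested dicts stand in for a block workspace, and `block_to_xml` is a hypothetical helper. The element and attribute names (`layout`, `group`, `names`, `text`, `variable`, `delimiter`) are ordinary CSL.

```python
import xml.etree.ElementTree as ET


def block_to_xml(block: dict) -> ET.Element:
    """Turn one block (a dict with 'tag', 'attrs', 'children') into an XML element."""
    element = ET.Element(block["tag"], block.get("attrs", {}))
    for child in block.get("children", []):
        element.append(block_to_xml(child))
    return element


# Hypothetical workspace: a layout holding one group with two rendering blocks.
blocks = {
    "tag": "layout",
    "attrs": {"suffix": "."},
    "children": [
        {
            "tag": "group",
            "attrs": {"delimiter": ". "},
            "children": [
                {"tag": "names", "attrs": {"variable": "author"}},
                {"tag": "text", "attrs": {"variable": "title", "font-style": "italic"}},
            ],
        },
    ],
}

# Regenerating this string on every workspace change gives the "real-time" view.
csl_fragment = ET.tostring(block_to_xml(blocks), encoding="unicode")
```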

@bdarcus
Member Author

bdarcus commented Sep 9, 2021

> You can use it to build a Scratch-like visual editor and let people build CSL styles by putting blocks together. The actual CSL style can be generated in real-time on the side.

Something like that could be a good piece of this.

But I still hold out hope someone can come up with a solution that does most of this automatically, from formatted output.

@customcommander

Another thing I once contemplated was to convert a CSL style into a grammar and use that grammar to generate a parser. By feeding citations or bibliographies to a parser we can tell whether these were complying with a particular CSL style. So this could be a tool to promote the reuse of existing CSL styles.

Not exactly what you're aiming for (I think) but this may or may not be a useful tool too?
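To make the compliance-checking idea concrete, here is a toy sketch. The "style" is just a list of (prefix, variable) pairs rather than real CSL, and `FIELD_PATTERNS` and `style_to_regex` are hypothetical; a real converter would have to walk the style's macros and conditionals.

```python
import re

# Hypothetical per-variable patterns; a real style would need far richer ones.
FIELD_PATTERNS = {
    "author": r"[^()]+?",
    "issued": r"\d{4}",
    "title": r"[^.]+",
}


def style_to_regex(parts: list[tuple[str, str]]) -> re.Pattern:
    """Compile a toy style into a regex with one named group per variable."""
    pieces = [
        re.escape(prefix) + f"(?P<{var}>{FIELD_PATTERNS[var]})"
        for prefix, var in parts
    ]
    return re.compile("".join(pieces) + re.escape("."))


toy_style = [("", "author"), (" (", "issued"), ("). ", "title")]
pattern = style_to_regex(toy_style)

compliant = pattern.fullmatch("Doe, J. (2020). A study of styles.")
non_compliant = pattern.fullmatch("Doe, J. 2020. A study of styles.")
```

Here `compliant` is a match object whose named groups recover the fields, and `non_compliant` is `None` because the parentheses around the year are missing.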

@bdarcus
Member Author

bdarcus commented Sep 9, 2021

Sounds like it; yes.

I go into the idea a bit more in the linked discussion; particularly here:

inukshuk/anystyle#146 (comment)

It's pretty speculative at this point, but the basic idea is that all the style logic has already been written; we just need to automate reassembling it all.

But the same code could be used for finding existing styles too.
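One simple way to picture the "finding existing styles" use: render the same reference with each candidate style and rank the candidates by how close the output is to the target. In this sketch, Python format strings are a hypothetical stand-in for rendering with a CSL processor, and string similarity from `difflib` is an assumption, not the approach proposed in the linked discussion.

```python
from difflib import SequenceMatcher

# Hypothetical stand-ins for real styles rendered by a CSL processor.
STYLES = {
    "author-date": "{author} ({year}). {title}.",
    "numeric": "[1] {author}, {title}, {year}.",
    "title-first": "{title}. {author}, {year}.",
}


def closest_styles(target: str, reference: dict[str, str], styles=STYLES) -> list[str]:
    """Rank style names by similarity between their rendering and the target."""
    scored = []
    for name, fmt in styles.items():
        rendered = fmt.format(**reference)
        scored.append((SequenceMatcher(None, rendered, target).ratio(), name))
    return [name for _, name in sorted(scored, reverse=True)]


reference = {"author": "Doe, J.", "year": "2020", "title": "A study of styles"}
ranking = closest_styles("Doe, J. (2020). A study of styles.", reference)
```

The top few entries of `ranking` could be shown to the user as choices when no style matches exactly.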

@cormacrelf

cormacrelf commented Sep 9, 2021

@customcommander that's basically how I do disambiguation in citeproc-rs, except it's stamped with the actual values of each bibliography entry to match rendered cites against. For example, the type of the entry is known, so you can throw out large swathes of style code when you stamp, and where the title is rendered, it is the actual title. You can reuse that code with less information discarded to produce a grammar of the style in general. It already handles things like conditionals activated by disambiguation, all possible name expansions, etc., and does a minimisation step.
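A toy illustration of what "stamping" means here, under heavy assumptions: the tuple-based style tree and the `stamp` function are hypothetical and bear no relation to the actual citeproc-rs code. Because the entry's type and values are known, the conditional collapses to one branch and every variable becomes literal text.

```python
def stamp(node: tuple, reference: dict[str, str]) -> list[str]:
    """Specialise one toy style node to a reference, returning literal text parts."""
    kind = node[0]
    if kind == "text":  # ("text", "literal")
        return [node[1]]
    if kind == "var":  # ("var", "title") becomes the actual value
        return [reference[node[1]]] if node[1] in reference else []
    if kind == "if-type":  # ("if-type", "book", then_nodes, else_nodes)
        _, wanted, then_nodes, else_nodes = node
        # The type is known, so the other branch is thrown out entirely.
        branch = then_nodes if reference["type"] == wanted else else_nodes
        return [part for child in branch for part in stamp(child, reference)]
    raise ValueError(f"unknown node kind: {kind}")


style = [
    ("var", "title"),
    ("if-type", "book",
        [("text", ". "), ("var", "publisher")],
        [("text", ", "), ("var", "container-title")]),
    ("text", "."),
]
book = {"type": "book", "title": "Styles", "publisher": "Example Press"}
stamped = "".join(part for node in style for part in stamp(node, book))
```

Keeping the variables and both branches instead of resolving them is the "less information discarded" variant that would describe the style in general.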

Even better, the resulting grammar has some helpful qualities: it is a regular grammar. No recursion or any form of internal state other than which state of the DFA it's currently on, due to macro recursion being non-terminating and hence disallowed entirely. It already spits out a regular grammar in graph form. So you can theoretically get it to spit out regular expressions too.
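For readers less familiar with the terminology, a small hand-written DFA shows what "no internal state other than the current DFA state" looks like. This one is an illustrative assumption, not output from citeproc-rs: it recognises a page range such as `3-14`, which is the same language as the regular expression `[0-9]+-[0-9]+`.

```python
import re


def char_class(ch: str) -> str:
    """Collapse all ASCII digits into one input symbol; other characters stand for themselves."""
    return "digit" if ch in "0123456789" else ch


# (state, symbol) -> next state. State 0 is the start state.
TRANSITIONS = {
    (0, "digit"): 1,
    (1, "digit"): 1,
    (1, "-"): 2,
    (2, "digit"): 3,
    (3, "digit"): 3,
}
ACCEPTING = {3}


def dfa_accepts(text: str) -> bool:
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition: reject
            return False
    return state in ACCEPTING


# The same language written as a regular expression.
PAGE_RANGE = re.compile(r"[0-9]+-[0-9]+")
```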

It would need a bit of reworking for this purpose, because the base assumptions of disambiguation are rather baked in. It only has capacity for 64 or so "free variables" which are not (and can not be) stamped by a particular reference. If you open that up you'd have to add a hundred more and also add some shadow ones, like an "is Latin/Cyrillic" + presence of each individual name component for each name in each name variable, a variable name count, an is-numeric for each number variable, has day for each date variable, ... etc. You would also be dealing with much, much bigger DFA graphs. The regexes would be enormous. And it is at the moment quite closely tied to the implementation. Overall I rate this quite difficult. Code here
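To give a feel for how quickly the shadow variables add up, here is a back-of-the-envelope enumeration. Everything in it is an assumption for illustration: the variable lists are deliberately short, `MAX_NAMES` is a made-up cap, and the one-bit-per-variable mapping is only a guess at why the current limit is around 64.

```python
NAME_VARIABLES = ["author", "editor", "translator"]
NAME_PARTS = ["family", "given", "suffix"]
NUMBER_VARIABLES = ["volume", "issue", "edition"]
DATE_VARIABLES = ["issued", "accessed"]
MAX_NAMES = 5  # hypothetical cap on names tracked per name variable


def shadow_variables() -> list[str]:
    """List the extra boolean/count variables described above for a small variable set."""
    names = []
    for var in NAME_VARIABLES:
        names.append(f"{var}:count")
        for i in range(MAX_NAMES):
            names.append(f"{var}[{i}]:is-latin-cyrillic")
            names += [f"{var}[{i}]:has-{part}" for part in NAME_PARTS]
    names += [f"{var}:is-numeric" for var in NUMBER_VARIABLES]
    names += [f"{var}:has-day" for var in DATE_VARIABLES]
    return names


# One bit per variable, as in a bitset (assumed representation).
BIT = {name: 1 << i for i, name in enumerate(shadow_variables())}
fits_in_64_bits = len(BIT) <= 64
```

Even this reduced list comes to 68 variables, so `fits_in_64_bits` is `False` before any of the ordinary variables are counted.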
