parser #28
Comments
Hmm, I've been modding expatbuilder and it seems to have worked. A decent parseString could be coming quite soon. Can you feel the excitement? |
This looks exciting... given what I just did with expat, I may be able to mod that to generate domonic from huge sites? |
I managed to mod the file. It was easier than I thought... so that appears to work, even with lots of websites. It seems to build trees with domonic.
|
So the options are to patch that file after each install, or to pip install git+https://path to my patched version. I need to figure out that path and test again. But it's very promising. It's so fast. |
It is a cool toolkit, but is there a way to quickly convert an HTML page to Python code? |
Hi @ipfans, thanks for the feedback. There is not yet a perfect way, as I originally only set out to generate HTML. But it IS on the roadmap. Some more complete parsers for HTML/Python will hopefully be ready by v1, which I'd love to get done within 12 months. We can already get about 75% or more of the way (but it is dangerous and uses eval); see codemirror.py in this folder... or use the command line util... Also all tags recently had a so if you do:
it might work, if you have an existing DOM. A preliminary option was added to the renderer.
However, for this to work we need a DOM that is already parsed. As people who use minidom know (some may be coming here), it can only parse very strict XML, not HTML. So it seems to work sometimes but very easily doesn't. Hence the domonic parsers fail, as they leverage the same code. They usually fail because of content, not node structure. The default parsers often work fine for HTML strings without content, for example.

I then tried to get around this with a simple parser of my own, but found I wanted to keep expanding it, and that is at the heart of domonic: an unfinished, regex-based, in-place HTML to Python converter. However, it still has errors, and the main issue is that Python wants keyword args last. Therefore you have to not only parse but also swap the nodes around, to put 'content' before _classes for example (the only real crux of learning domonic).

Anyway, during investigation I found several ways to parse. Python has a builtin HTML parser too, but you have to use it like a lexer and I've not gotten round to it yet. There are also PEG parsers and some off-the-shelf ones. I also found an html5 C++ one, referenced above. So my long-term goal would be to have a good default one out of the box, with the option of picking some others.

For now, if you are brave domonic By using these tools you can get 75% of the way there for some huge files, then manually modify and edit them to work, by rendering them and fixing the syntax issues pointed out when trying to compile (there's a guide on the readme for common errors that can help speed this up). My biggest success was using the hacked html5 C++ parser mentioned above and then calling pyml() on the DOM it produces. However, there are still issues compared to my existing parser (which isn't too bad in some cases), i.e. the C++ one does not yet convert data-attributes to the keyword argument syntax format. It doesn't do this... automatically for you. So I haven't released any further documentation until I come back to investigate parsing, or get help.
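The point about minidom accepting only strict XML can be reproduced with the standard library alone. This is a minimal sketch, not domonic code: `try_parse` and the three sample strings are made up for illustration. It shows a well-formed snippet parsing, while an unclosed `<br>` and an HTML-only entity in the content both raise `ExpatError`.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def try_parse(markup):
    """Return the root tag name, or the parser's error text on failure."""
    try:
        return minidom.parseString(markup).documentElement.tagName
    except ExpatError as err:
        return f"ExpatError: {err}"


# Hypothetical sample inputs, chosen only to trigger each case.
well_formed = "<div><p>hello</p><br/></div>"        # valid XML
unclosed_br = "<div><p>hello</p><br></div>"         # fine in HTML, not in XML
html_entity = "<div><p>hello&nbsp;world</p></div>"  # entity undefined in XML

for sample in (well_formed, unclosed_br, html_entity):
    print(sample, "->", try_parse(sample))
```

The third case is the "content, not node structure" kind of failure: the tags are balanced, but the text inside them is enough to stop the parser.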
Anyway, I hope these tips assist you while I'm still figuring it all out, and maybe you might like the codemirror.py example. Once done, you may also enjoy this plugin, which will format it for you. It's a useful plugin for formatting flat .pyml in VS Code: https://marketplace.visualstudio.com/items?itemName=mgesbert.indent-nested-dictionary

Also, as a final note: if you don't want it ALL in domonic because templating parts is laborious, you can mix in your own f-strings. See the DocumentFragment example here... |
To explain a little deeper, and cover future progress, as the parser stuff is undocumented: domonic originally had a simple regex parser, for tags only and no content, which grew. domonic currently uses that... (which you then need to eval if you want to auto fix it up), but it can also use a copy of the builtin minidom parseString. This auto-fails with single char replacement, so it could take forever to generate a working doc if the XML is not perfect. :/

I achieved that by hacking the builtin expat parser to use domonic rather than minidom. However, that needs replacing with an html5 parser. So I knocked up the C++ one to prove the concept and check compatibility, but it is not ideal, as it is not pure Python and needs extra steps to set up on Windows, so it will be a later 'option'. I need to write a pure Python one, using the builtin if possible.

There's a new window class that will eventually let you do window.location = x, which I, on my own fork, swapped out the parseString method for, to get the C++ one working. So if you need a quick fix you can do something like that. To help with this I've been moving some of the parse methods I discovered to a new utility parse package. So if you want to play, you can try to hook the data-attribute fixer up to the hacked C++ parser and bingo.

However, I'm probably at least several months away from the full solution, as I need to start a whole new one or find a compatible lib that can build with my DOM as an option, rather than hacking it like I did with expat, before I can get back to my regex curiosity. Also, for compatibility, 'html' must not BE the document, so a slight re-architecture of the DOM is needed without breaking current usage. I'm also in the process of considering that, which should help with other DOM builders. To understand what I'm talking about, diff the native expat parser against my 'borrowed' one and you will see. |
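As a rough illustration of the pure-Python route discussed in this thread, here is a sketch that drives the builtin `html.parser.HTMLParser` like a lexer, builds a small tree, and prints pyml-style source with content first and keyword args last. It is not domonic's parser. `Node`, `TreeBuilder`, `to_source` and `html_to_source` are hypothetical names, and the output conventions (a leading underscore on attribute names, and a `**{...}` dict for names such as `data-id` that are not valid Python identifiers) are assumptions about the target syntax rather than something this thread confirms.

```python
from html.parser import HTMLParser

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs      # list of (name, value) pairs
        self.children = []      # Node instances or plain strings


class TreeBuilder(HTMLParser):
    """Turns the parser's token callbacks into a tree of Node objects."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())


def to_source(node):
    """Emit 'tag(children..., _attr=..., **{...})' with kwargs last."""
    args = [to_source(c) if isinstance(c, Node) else repr(c)
            for c in node.children]
    kwargs, odd = [], {}
    for name, value in node.attrs:
        value = "" if value is None else value
        if name.isidentifier():
            kwargs.append(f"_{name}={value!r}")
        else:
            odd[f"_{name}"] = value   # e.g. data-* attributes (assumed form)
    if odd:
        kwargs.append("**" + repr(odd))
    return f"{node.tag}({', '.join(args + kwargs)})"


def html_to_source(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return ", ".join(to_source(c) if isinstance(c, Node) else repr(c)
                     for c in builder.root.children)


print(html_to_source('<div class="box" data-id="7"><p>hello</p><br></div>'))
# → div(p('hello'), br(), _class='box', **{'_data-id': '7'})
```

Because the attributes are collected separately from the children, the reordering needed to satisfy Python's "keyword args last" rule happens at emit time, with no node swapping after the fact. The sketch does not handle comments, doctypes, or script and style content.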
Thanks for your replies. I made a just-works version of the transcription :) But official support is good news. |
html5lib now has an integration point. An example exists in /examples/parsers/html5libtest... and there are notes on the release: https://github.com/byteface/domonic/releases/tag/0.6.5 |
I've included html5lib, and an integration point for the C++ one, though that one is still experimental and needs testing. |
I think I'm going to need a few types of parser: a normal one, one that uses the Python builtin one, PEG ones, ones that can do xml/svg/html/..., as well as my evolving one. I'd consider importing something light if it could easily output domonic-style pyml. But this could do with conversations with others who are good at that kind of thing, for suggestions to improve, etc.