Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the code generator #75

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Refactor the code generator #75

wants to merge 5 commits into from

Conversation

pmaupin
Copy link
Collaborator

@pmaupin pmaupin commented Apr 26, 2017

  • Make it prettier
  • Make it faster
  • Make the interface to the pretty printer better

Currently, I really only care about to_source and rtrip (and tests for those). It's been years since I touched any of the other stuff.

So, I've been looking at refactoring all the kludgey pretty printing stuff I put in.

I think we should do this before the release, so that we don't release a kludgey pretty-printing API and then a better one.

I have code_gen refactored really well (I think) and the API over to the pretty printer is solid.

I would at least like to have it printing as well as it does now, preferably more PEP8 like. Right now, it doesn't do as many string changes as it used to, but it will be much easier to make the pretty printer better (and/or to replace it from external code), because all the string munging happens after the code generator is finished. (Before, the literal string fixups happened during the middle of code generation.)

Also, the code generator creates a list of lists, so the pretty printer doesn't have to separate the lines out -- they are already separated.

I'm tossing this out there to start the conversation. It needs PEP8 cosmetic changes as well, but I need to go to bed.

- Make it prettier
- Make it faster
- Make the interface to the pretty printer better
@@ -132,6 +132,7 @@ class ExplicitNodeVisitor(ast.NodeVisitor):

"""

@staticmethod
def abort_visit(node): # XXX: self?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess the addition of staticmethod decorator would make the # XXX: self? comment obsolete.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhhh. Yeah. (Mumbles to himself about why he did something based on a comment then didn't remove the comment.)


def to_source(node, indent_with=' ' * 4, add_line_information=False,
pretty_string=pretty_string, pretty_source=pretty_source):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would potentially break the public API. We need a at least a short deprecation period (maybe remove them in 0.7?) to remove them from the to_source signature.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting case. These weren't there in 0.5 (pretty printer added right after that), and I unthinkingly already changed the call signature to pretty_source a couple of commits ago, and the internals are now way different.

Having said that, I think we can actually do a reasonable job of supporting that interim API with the current code, so I'll go back and look at that. I'm at (real) work today, so that will have to wait until Thursday.

Also, I think that it might be useful to make a ToSource class to make things more easily configurable, and make to_source a convenience method of that class.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good catch! You're right that these weren't in 0.5.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, unless you know of a major user using them, we should probably not support them, since anybody using them was pulling from git rather than pypi anyway.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only major user I know of is Hylang and I doubt they are doing anything with these parameters :) So let's get rid of them!


def else_body(node,
# Constants
If=ast.If, set_precedence=set_precedence,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, are these optimizations still produce big speedups in a modern Python? We don't use this technique in the Python stdlib anymore because some core developers said that the performance gain is no longer huge, but I don't remember seeing any benchmark.

Copy link
Collaborator Author

@pmaupin pmaupin Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gains do drop off. I tend to apply them to functions that are called often, and/or reference the attributes in loops. I'd have to go back and look, but I think I'm getting about an 8% speedup in to_unparse (around 5% overall in rtrip) this way.

There were additional gains to be had by adding a proliferation of dispatchers, e.g. adding a write method that handled a single item with no for loop, but that was too much stuff to remember, and made the API too big.

I actually went the other way, decomposing conditional_write into condition and write, and coalescing the write and visit methods into a single write, so in one very real sense, these parameter constant micro-optimizations (which I don't mind looking at and thinking about or ignoring, as the case may be) offset the speed losses from making the API more regular and orthogonal and reducing the number of dispatchers.

I also reverted some changes there was a big discussion about last year about whether we should pass a string or a boolean for async to node handlers that can handle two things. Two things decided it for me: (1) since async is now a keyword, my editor screams at me everywhere it appears, and I think the editor is right -- there's no real reason to use it as a variable name even where you can; and (2) to me this is still more natural (and less code). Having said that, if anybody still really feels strongly the other way, we could change it back, but they have to pick a different variable name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah, overall, I'd say that the gains from doing this are much smaller. Attribute lookups must be much faster than they used to be, because one of the things that really surprised me was that the gains from using closures are much smaller now, and non-existent or negative in a few corner, presumably for the same reason. You have to have a lot of attribute accesses to offset the cost of building the closure.

pmaupin added 3 commits April 29, 2017 16:27
 - Make interface to code_gen better
 - Support old 0.5 API better
 - Shorten more lines by splitting strings
@pmaupin
Copy link
Collaborator Author

pmaupin commented Apr 29, 2017

OK, @berkerpeksag I took your hint and removed the micro-optimizations.

The code looks a lot better, and now I think it performs a lot better, too. Stdlib roundtrips aren't nearly so ugly, because long strings are wrapped.

Anyway, this is all I have time for for now. I believe the code is in pretty good shape:

  • It's slower than before the pretty printer, but much faster than a month ago
  • All the stdlibs roundtrip fine
  • Many more unittests than we had before
  • There are no current bugs that I know of except non-ASCII source files under Python 2.0.

Still needs more doc and whatever new things we want to do to rtrip, but I don't know that we actually have anything gating a 0.6 release, given that the doc is better than it used to be.

 - Add a test for subclassing that shows how to add custom
   nodes for comments, as requested in issue #50.
@pmaupin
Copy link
Collaborator Author

pmaupin commented Apr 30, 2017

In my local copy, I had made to_source a class method, and then backed it out, because I didn't think I had a use-case.

Then I remembered the use-case that was in the back of my mind -- issue #50 doesn't really need changes in astor itself, if the code is structured well enough, so I went ahead and checked in changes to support that.

Maybe later, after we have some experience and a lot of demand, we could make this a standard library feature, but I think the new test file showing how to add comment nodes should be more than good enough for anybody who wants to do that for now.

@radomirbosak
Copy link
Contributor

What's the state of this Pull Request? Is anything blocking us from merging it?

@pmaupin mentions writing more docs - is this still needed?

As this PR sits open, it will become harder and harder to rebase it because we're making changes to astor.code_gen.py.

Since this PR was opened, we've merged 26 commits

$ git log --oneline 28b081a..master | wc -l
26

and changed these files:

$ git diff --stat 28b081a..master
 .travis.yml            |   5 +++
 AUTHORS                |   2 ++
 Makefile               |   6 ++--
 README.rst             |   6 ++--
 astor/__init__.py      |  11 ++----
 astor/code_gen.py      |  18 +++++-----
 astor/codegen.py       |  11 ++++++
 astor/rtrip.py         |  67 ++----------------------------------
 astor/tree_walk.py     |   2 +-
 docs/changelog.rst     | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
 docs/index.rst         | 240 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------------
 setup.cfg              |  42 +++++++++++++++++++++++
 setup.py               |  56 ++++++++----------------------
 tests/support.py       |  44 ++++++++++++++++++++++++
 tests/test_code_gen.py |  18 +++++++---
 tests/test_misc.py     |  50 +++++++++++++++------------
 tox.ini                |   4 +--
 17 files changed, 453 insertions(+), 288 deletions(-)

However, the code_gen.py file changed minimally, so there shouldn't be any problem rebasing this PR.

$ git diff 28b081a..master -- astor/code_gen.py | xsel -b
diff --git a/astor/code_gen.py b/astor/code_gen.py
index 7c27f70..47d6acc 100644
--- a/astor/code_gen.py
+++ b/astor/code_gen.py
@@ -308,8 +308,8 @@ class SourceGenerator(ExplicitNodeVisitor):
         self.statement(node)
         self.generic_visit(node)
 
-    def visit_FunctionDef(self, node, async=False):
-        prefix = 'async ' if async else ''
+    def visit_FunctionDef(self, node, is_async=False):
+        prefix = 'async ' if is_async else ''
         self.decorators(node, 1 if self.indentation else 2)
         self.statement(node, '%sdef %s' % (prefix, node.name), '(')
         self.visit_arguments(node.args)
@@ -322,7 +322,7 @@ class SourceGenerator(ExplicitNodeVisitor):
 
     # introduced in Python 3.5
     def visit_AsyncFunctionDef(self, node):
-        self.visit_FunctionDef(node, async=True)
+        self.visit_FunctionDef(node, is_async=True)
 
     def visit_ClassDef(self, node):
         have_args = []
@@ -364,24 +364,24 @@ class SourceGenerator(ExplicitNodeVisitor):
                 self.else_body(else_)
                 break
 
-    def visit_For(self, node, async=False):
+    def visit_For(self, node, is_async=False):
         set_precedence(node, node.target)
-        prefix = 'async ' if async else ''
+        prefix = 'async ' if is_async else ''
         self.statement(node, '%sfor ' % prefix,
                        node.target, ' in ', node.iter, ':')
         self.body_or_else(node)
 
     # introduced in Python 3.5
     def visit_AsyncFor(self, node):
-        self.visit_For(node, async=True)
+        self.visit_For(node, is_async=True)
 
     def visit_While(self, node):
         set_precedence(node, node.test)
         self.statement(node, 'while ', node.test, ':')
         self.body_or_else(node)
 
-    def visit_With(self, node, async=False):
-        prefix = 'async ' if async else ''
+    def visit_With(self, node, is_async=False):
+        prefix = 'async ' if is_async else ''
         self.statement(node, '%swith ' % prefix)
         if hasattr(node, "context_expr"):  # Python < 3.3
             self.visit_withitem(node)
@@ -392,7 +392,7 @@ class SourceGenerator(ExplicitNodeVisitor):
 
     # new for Python 3.5
     def visit_AsyncWith(self, node):
-        self.visit_With(node, async=True)
+        self.visit_With(node, is_async=True)
 
     # new for Python 3.3
     def visit_withitem(self, node):

@pmaupin
Copy link
Collaborator Author

pmaupin commented May 4, 2018 via email

@berkerpeksag
Copy link
Owner

I think I'm the blocker here :) I was going to take a look Patrick's PRs (this one and #78) but unfortunately never had a chance to create some time for a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants