Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix how we handle the Names attribute #11

Open
r0mainK opened this issue Sep 26, 2019 · 5 comments
Open

Fix how we handle the Names attribute #11

r0mainK opened this issue Sep 26, 2019 · 5 comments

Comments

@r0mainK
Copy link
Contributor

r0mainK commented Sep 26, 2019

Currently, when a node is being traversed, we check multiple attributes to find its value. To do so, we rely on the assumption that there will be only one value at best, and check, successively:

  1. The Name/name field.
  2. The Text/text field
  3. The Value/value field
  4. Finally, the Names attribute. We handle this one differently: if the field is an array of nodes, which it always is, then we join the value of each of these node's Name attribute, if they have one.

In order to avoid duplication, we then ignore the Names attribute in the goDeeper function. While this provides utility, it has 2 risks, one of which I am certain exists:

  1. If a node has a value held in one of the 3 first attributes and a non-empty Names attribute, we will not traverse the nodes in it. I have not (yet) found an example for this, but it is a possibility.
    2 . If Names contain proper nodes that are nested, thus not having a Name attribute. This is the case in the following example:
from foo import bar as baz

Here, the uast:RuntimeImport node has the following structure:

  • a Path attribute with a single uast:Identifier node, with value foo
  • a Names attribute with a single uast:Alias node, with a Node attribute containing the uast:Identifier node with value bar, and a Name attribute containing uast:Identifier node with value baz

Now, unfortunately this is not a Babelfish bug, as this structure for aliases is always the same: we replace a uast:Identifier node with a uast:Alias node that has a Name and a Node attribute. Also, this makes sense, as in the following snippet the Names attribute would have
2 alias nodes instead of one:

from foo import bar as baz, bar2 as baz2

Anyway, this is a problem, because currently we loose this information. In the first snippet, only the foo identifier is kept. I am going to check out what we can do and will push a PR when a proper solution is found. I think we should start dealing with import nodes in a specific way, and just go deeper on the Names attribute in all other cases, but I've got to look more into this before.

@vmarkovtsev
Copy link
Collaborator

Handling different node types is something that I really want to avoid, because it is the road to hellish complexity and big hacks.

But anyway, let's ignore this problem for now because our time is very little. Let's use what we have in the current DB. It will not be so precise, but 100% enough for the library embeddings task.

@r0mainK
Copy link
Contributor Author

r0mainK commented Sep 26, 2019

I agree - although doing so only for imports may be useful in the future. Anyway yeah, I just reported the bug because I saw it, as we won't be rerunning this any time soon I'd rather reference them fixing later.

@r0mainK
Copy link
Contributor Author

r0mainK commented Sep 27, 2019

Found another related issue, the sorting of nodes get's messed up because of this. For instance, if we have the Java import:

import foo.Bar

Then we get a uast:Import node with Bar the fist and only uast:Identifier node in it's Names attribute, and foo the first and only uast:Identifier node in the Names attribute of the Path. The starting position of the import node is col=1, the position of the Path node is col=8, the position of the foo Identifier node is also col=8, and the position of the Bar Identifier node is col=12. However, due to how we handle Names, we get Bar before foo, because we think it's position is 1, while the position of foo is seen as 8.

@r0mainK
Copy link
Contributor Author

r0mainK commented Sep 27, 2019

Found another related issue. Basically, all import from Ruby get removed due to this. That is because the uast:RuntimeImport node are contained in a Names attribute of node holding the require token.

@r0mainK
Copy link
Contributor Author

r0mainK commented Sep 27, 2019

Found another related issue, basically part of JavaScript imports are getting purged like Python. It is not impeding progress but thought I'd mention it, as there were some strange stuff as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants