Fix how we handle the Names attribute #11

r0mainK · 2019-09-26T15:18:42Z

Currently, when a node is being traversed, we check multiple attributes to find its value. To do so, we rely on the assumption that there will be only one value at best, and check, successively:

The Name/name field.
The Text/text field
The Value/value field
Finally, the Names attribute. We handle this one differently: if the field is an array of nodes, which it always is, then we join the value of each of these node's Name attribute, if they have one.

In order to avoid duplication, we then ignore the Names attribute in the goDeeper function. While this provides utility, it has 2 risks, one of which I am certain exists:

If a node has a value held in one of the 3 first attributes and a non-empty Names attribute, we will not traverse the nodes in it. I have not (yet) found an example for this, but it is a possibility.
2 . If Names contain proper nodes that are nested, thus not having a Name attribute. This is the case in the following example:

from foo import bar as baz

Here, the uast:RuntimeImport node has the following structure:

a Path attribute with a single uast:Identifier node, with value foo
a Names attribute with a single uast:Alias node, with a Node attribute containing the uast:Identifier node with value bar, and a Name attribute containing uast:Identifier node with value baz

Now, unfortunately this is not a Babelfish bug, as this structure for aliases is always the same: we replace a uast:Identifier node with a uast:Alias node that has a Name and a Node attribute. Also, this makes sense, as in the following snippet the Names attribute would have
2 alias nodes instead of one:

from foo import bar as baz, bar2 as baz2

Anyway, this is a problem, because currently we loose this information. In the first snippet, only the foo identifier is kept. I am going to check out what we can do and will push a PR when a proper solution is found. I think we should start dealing with import nodes in a specific way, and just go deeper on the Names attribute in all other cases, but I've got to look more into this before.

The text was updated successfully, but these errors were encountered:

vmarkovtsev · 2019-09-26T15:32:14Z

Handling different node types is something that I really want to avoid, because it is the road to hellish complexity and big hacks.

But anyway, let's ignore this problem for now because our time is very little. Let's use what we have in the current DB. It will not be so precise, but 100% enough for the library embeddings task.

r0mainK · 2019-09-26T15:49:14Z

I agree - although doing so only for imports may be useful in the future. Anyway yeah, I just reported the bug because I saw it, as we won't be rerunning this any time soon I'd rather reference them fixing later.

r0mainK · 2019-09-27T08:16:50Z

Found another related issue, the sorting of nodes get's messed up because of this. For instance, if we have the Java import:

import foo.Bar

Then we get a uast:Import node with Bar the fist and only uast:Identifier node in it's Names attribute, and foo the first and only uast:Identifier node in the Names attribute of the Path. The starting position of the import node is col=1, the position of the Path node is col=8, the position of the foo Identifier node is also col=8, and the position of the Bar Identifier node is col=12. However, due to how we handle Names, we get Bar before foo, because we think it's position is 1, while the position of foo is seen as 8.

r0mainK · 2019-09-27T11:16:50Z

Found another related issue. Basically, all import from Ruby get removed due to this. That is because the uast:RuntimeImport node are contained in a Names attribute of node holding the require token.

r0mainK · 2019-09-27T11:22:16Z

Found another related issue, basically part of JavaScript imports are getting purged like Python. It is not impeding progress but thought I'd mention it, as there were some strange stuff as well.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix how we handle the Names attribute #11

Fix how we handle the Names attribute #11

r0mainK commented Sep 26, 2019

vmarkovtsev commented Sep 26, 2019

r0mainK commented Sep 26, 2019

r0mainK commented Sep 27, 2019

r0mainK commented Sep 27, 2019

r0mainK commented Sep 27, 2019

Fix how we handle the Names attribute #11

Fix how we handle the Names attribute #11

Comments

r0mainK commented Sep 26, 2019

vmarkovtsev commented Sep 26, 2019

r0mainK commented Sep 26, 2019

r0mainK commented Sep 27, 2019

r0mainK commented Sep 27, 2019

r0mainK commented Sep 27, 2019