Questions about create_frustum and voxel_pooling #36
I am also confused by the function get_geometry. It says "Determine the (x,y,z) locations (in the ego frame)"; however, the output dimension is still B x N x D x H/downsample x W/downsample x 3. I assume X, Y and Z correspond to H/downsample, W/downsample and D in this case? Again, what does this 3 stand for?
It is the tuple (x, y, z) that gives the 3D coordinates of the point. You can see this because all the Z values for any given D are the same: LSS tries to learn where the objects are using depth planes.
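To make the depth-plane point concrete, here is a minimal NumPy sketch of a frustum grid. It is only loosely modelled on create_frustum: the function name `make_frustum`, its arguments and the sizes used are assumptions for illustration, not the repository's code. Note that the sketch takes only sizes and depth values as input, neither an image nor features.

```python
import numpy as np

def make_frustum(depths, feat_h, feat_w, img_h, img_w):
    """Return a (D, H, W, 3) array holding (u, v, depth) for every cell.

    Hypothetical helper: u and v are pixel coordinates in the input image,
    sampled once per feature-map cell; depth is the candidate depth plane.
    """
    d = np.asarray(depths, dtype=np.float32)
    us = np.linspace(0, img_w - 1, feat_w, dtype=np.float32)
    vs = np.linspace(0, img_h - 1, feat_h, dtype=np.float32)
    # Broadcast the three 1D axes onto the full (D, H, W) grid.
    dd, vv, uu = np.meshgrid(d, vs, us, indexing="ij")
    # The trailing axis of size 3 is a coordinate triple, not RGB.
    return np.stack([uu, vv, dd], axis=-1)

# Assumed sizes: 41 depth planes, an 8 x 22 feature map, a 128 x 352 image.
frustum = make_frustum(np.arange(4.0, 45.0, 1.0),
                       feat_h=8, feat_w=22, img_h=128, img_w=352)
print(frustum.shape)
```

Every entry of `frustum[k, :, :, 2]` holds the same depth value, which is what "depth plane" means here.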
For
Because they are used to find the proper xyz coordinates for each pixel in each depth plane D. They aren't used for anything else, if I am not mistaken.
You get rid of Z by performing the sum pooling: it takes all points that fall in a voxel (a discretization of 3D space) of infinite height and adds their features together. It therefore sums all features that land in the same cell of the BEV, where their Z components can no longer be distinguished.
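A small sketch of that pooling step, assuming a single Z bin that spans the whole height range. It uses `np.add.at` as a plain scatter-add rather than the cumsum trick, and `sum_pool_to_bev`, `lower`, `cell` and `nx` are hypothetical names chosen for illustration.

```python
import numpy as np

def sum_pool_to_bev(points, feats, lower, cell, nx):
    """Sum features of all points that share an (x, y) cell.

    points: (N, 3) ego-frame xyz, feats: (N, C).
    lower / cell / nx: grid origin, cell size and cell count per axis.
    Returns a (C, X, Y) BEV tensor.
    """
    idx = np.floor((points - lower) / cell).astype(np.int64)
    # Drop points that fall outside the grid.
    keep = np.all((idx >= 0) & (idx < nx), axis=1)
    idx, feats = idx[keep], feats[keep]
    bev = np.zeros((nx[0], nx[1], feats.shape[1]), dtype=feats.dtype)
    # Scatter-add: repeated (x, y) indices accumulate instead of overwriting.
    np.add.at(bev, (idx[:, 0], idx[:, 1]), feats)
    return bev.transpose(2, 0, 1)

points = np.array([[0.2, 0.2, -1.0],    # same pillar as the next point,
                   [0.3, 0.1,  2.0],    # but at a different height
                   [1.5, 0.5,  0.0],
                   [100., 0.0, 0.0]])   # outside the grid, ignored
feats = np.array([[1., 10.], [2., 20.], [4., 40.], [8., 80.]])
bev = sum_pool_to_bev(points, feats,
                      lower=np.array([0.0, 0.0, -10.0]),
                      cell=np.array([1.0, 1.0, 20.0]),
                      nx=np.array([2, 2, 1]))
print(bev.shape, bev[:, 0, 0])
```

The first two points differ only in height, so their features end up summed in the same BEV cell.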
No, XYZ is just a 3D vector that is assigned to each pixel (which is indexed by D, H, W). This is how each pixel in the features is associated with its projection in 3D.
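As a rough illustration of that association, the sketch below lifts (u, v, depth) triples to ego-frame xyz with a pinhole camera model. It assumes an intrinsic matrix `K` and a camera-to-ego rotation `R` and translation `t`, and it leaves out any undoing of image augmentation; `frustum_to_ego` is a hypothetical name, not the repository's get_geometry.

```python
import numpy as np

def frustum_to_ego(frustum, K, R, t):
    """Map a (D, H, W, 3) grid of (u, v, depth) to ego-frame (x, y, z).

    The output keeps the (D, H, W, 3) shape: D, H, W say which sample it is,
    the last axis holds that sample's 3D coordinates.
    """
    u, v, d = frustum[..., 0], frustum[..., 1], frustum[..., 2]
    # Homogeneous pixel coordinates scaled by depth: (u*d, v*d, d).
    pix = np.stack([u * d, v * d, d], axis=-1)
    # Camera frame: K^-1 applied to each row vector.
    cam = pix @ np.linalg.inv(K).T
    # Ego frame: rotate, then translate.
    return cam @ R.T + t

K = np.array([[100., 0., 50.],
              [0., 100., 40.],
              [0., 0., 1.]])
# Assumed axes: camera z (forward) -> ego x, camera x (right) -> ego -y,
# camera y (down) -> ego -z.
R = np.array([[0., 0., 1.],
              [-1., 0., 0.],
              [0., -1., 0.]])
t = np.array([1.0, 0.0, 1.5])

# One sample: the principal point (50, 40) on the 10 m depth plane.
frustum = np.array([[[[50., 40., 10.]]]])
ego = frustum_to_ego(frustum, K, R, t)
print(ego.shape, ego[0, 0, 0])
```

The sample on the optical axis at 10 m depth ends up 10 m ahead of the camera along the ego x axis, offset by the camera position `t`.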
Hi, thanks for your excellent work! I am a little confused about the functions create_frustum and voxel_pooling. It would be great if you could give some further explanation.
In create_frustum, the code indicates that the output dimension is D x H x W x 3. What does this 3 represent? Is it an RGB value, or is it the coordinate position of a point in the frustum? I am also wondering whether the input to this function is the raw image or the extracted features.
For voxel_pooling, my understanding is that it sums up the features of all the points in the same voxel (pillar) using the cumsum trick. The output dimension of this function is B x C x Z x X x Y, where X, Y and Z are the coordinates in the BEV field (which are not the same as H, W and D). However, the paper says "perform sum pooling to create a CxHxW tensor", which really confused me. Why do we still want H and W here? Besides, how do you get rid of Z?