Details about the code in model.py #14
Let's say we have some features and do the cumulative sum:
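Here is a minimal sketch of that step, with made-up scalar feature values (in the real model each entry is a full feature vector, but the idea is the same), already sorted by their rank:

```python
import torch

# eight made-up feature values, sorted so that points falling in the
# same output cell are adjacent
feats = torch.tensor([1., 2., 3., 4., 5., 6., 7., 8.])

# running total along the flattened dimension
csum = feats.cumsum(0)
# csum = [1, 3, 6, 10, 15, 21, 28, 36]
```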
Now, the rank array (which tells us which features correspond to which cell in the flattened indexing) is filtered using the `kept` mask by checking for repeated ranks. When ranks are repeated, only the right-most entry is kept, since that one holds the cumulative sum of all the features that fall into that cell:
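Continuing the sketch, with a hypothetical rank array chosen to match the cells described below:

```python
# which output cell each sorted feature falls into (hypothetical values)
ranks = torch.tensor([0, 0, 1, 3, 3, 4, 5, 5])

# an entry is kept only if the next entry has a different rank,
# i.e. we keep the right-most element of every run of equal ranks
kept = torch.ones(ranks.shape[0], dtype=torch.bool)
kept[:-1] = ranks[1:] != ranks[:-1]
# kept = [F, T, T, F, T, T, F, T]

csum_kept = csum[kept]    # [3, 6, 15, 21, 36]
ranks_kept = ranks[kept]  # cells [0, 1, 3, 4, 5]
```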
So as you can see, since the features at indices 0 and 1, 3 and 4, and 6 and 7 fall into the same cells (cells 0, 3 and 5 respectively), we only keep the position in the array that holds the sum of all the features falling within each cell; this is why it is called cumulative sum pooling. With the features sum-pooled, we then take the difference between the kept tensor without its first entry and the same tensor without its last entry (the first entry stays as-is, since there is nothing before it), which recovers the real per-cell sums easily:
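Finishing the sketch, the subtraction step:

```python
# keep the first kept value as-is, then subtract each previous kept value
pooled = torch.cat((csum_kept[:1], csum_kept[1:] - csum_kept[:-1]))
# pooled = [3, 3, 9, 6, 15]
# check against the raw features: cell 0 = 1+2 = 3, cell 1 = 3,
# cell 3 = 4+5 = 9, cell 4 = 6, cell 5 = 7+8 = 15
```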
If you sum by hand the features that fall into the same cell, you will find that they match our result. I know this answer comes a bit late for the question, but I hope it helps others!
@manueldiaz96 Well done!
@manueldiaz96, wondering if the get_geometry implementation is based on some formula that you can point me to? Thanks!
Take a look at this paper, where in equation 2 they describe how the projection is done from camera to 3D. In their case, they get the depth from a stereo depth estimation. For Lift Splat Shoot, as I responded to you on issue #31, we are predicting the certainty the network has that the pixel is located at depth plane D. In LSS we do not have a single depth value; we have a set of depths from 4m to 45m, separated by 1m, and for each depth we scale the context vector by that depth's classification score. If you want to see the effect, modify the code in this line to multiply by a ones vector instead of the depth scores. To understand this better, I would recommend you look at how an image is formed by a camera, to see the geometry that takes something in 3D and projects it onto a 2D image. I also linked, on issue #31, a series of blog posts which explain this using the camera matrices.
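Purely as an illustration (not the repo's exact code, and with made-up intrinsics), here is a pinhole-style sketch of how a single pixel is lifted to one 3D point per depth plane; get_geometry does the analogous thing for every pixel of every camera, then applies the camera-to-ego rotation and translation:

```python
import torch

# made-up pinhole intrinsics: focal lengths 500 px, principal point (320, 240)
K = torch.tensor([[500.,   0., 320.],
                  [  0., 500., 240.],
                  [  0.,   0.,   1.]])

depths = torch.arange(4., 45., 1.)    # LSS depth bins: 4 m to 44 m, 1 m apart
pix = torch.tensor([100., 200., 1.])  # homogeneous pixel coordinates (u, v, 1)

# back-project: X = d * K^-1 [u, v, 1]^T for every candidate depth d
ray = torch.linalg.inv(K) @ pix
points_cam = depths[:, None] * ray[None, :]  # shape (41, 3)
# get_geometry then rotates/translates these camera-frame points into the
# ego frame so all cameras' points share one grid before voxel pooling
```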
Thanks a lot for sharing the code. You have done great work!
I have some questions about your code: In the model.py file, can you provide more details about the get_geometry function and the voxel_pooling function? I'm so confused about how they actually work.
Thanks a lot!