We also invite you to our platform, Auto Driving Heart, for paper interpretation and sharing. If you would like to promote your work, please feel free to contact me.
GenAD: Generative End-to-End Autonomous Driving
CarFormer: Self-Driving with Learned Object-Centric Representations
DriveLM: Driving with Graph Visual Question Answering
ELM: Embodied Understanding of Driving Scenarios
Controllable Navigation Instruction Generation with Chain of Thought Prompting
- paper: coming soon
- code: https://github.com/refkxh/C-Instructor
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
- paper: coming soon
- code: coming soon
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection
- paper: coming soon
- code: https://github.com/GradiusTwinbee/GLIS
Dolphins: Multimodal Language Model for Driving
Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
Pyramid Diffusion for Fine 3D Large Scene Generation
- paper: https://arxiv.org/pdf/2311.12085
- code: https://github.com/yuhengliu02/pyramid-discrete-diffusion
Fully Sparse 3D Occupancy Prediction
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
Occupancy as Set of Points
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers
nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00730.pdf
- code: coming soon
OccWorld: 3D World Model for Autonomous Driving
Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model
- paper: coming soon
- code: coming soon
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
ADMap: Anti-disturbance framework for reconstructing online vectorized HD map
- paper: coming soon
- code: https://github.com/hht1996ok/ADMap
Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
- paper: coming soon
- code: https://github.com/alfredgu001324/MapBEVPrediction
Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
- paper: coming soon
- code: coming soon
Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
- paper: https://arxiv.org/pdf/2407.02286
- code: https://github.com/engineerJPark/LiDAR-DataAug4Weather
R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
- paper: coming soon
- code: https://github.com/lxa9867/r2bench
Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
RecurrentBEV: A Long-term Temporal Fusion Framework for Multi-view 3D Detection
- paper: coming soon
- code: https://github.com/lucifer443/RecurrentBEV
Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection
MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
- paper: coming soon
- code: https://github.com/VisualAIKHU/MonoWAD
DualBEV: CNN is All You Need in View Transformation
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
- paper: coming soon
- code: https://github.com/AlmoonYsl/OPEN
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
- paper: coming soon
- code: coming soon
SEED: A Simple and Effective 3D DETR in Point Clouds
- paper: coming soon
- code: coming soon
Towards Stable 3D Object Detection
FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection
- paper: coming soon
- code: https://github.com/CocoBoom/fsd-bev
HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras
RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception
SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras
Interactive 3D Object Detection with Prompts
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02556.pdf
- code: coming soon
CSOT: Cross-Scan Object Transfer for Semi-Supervised LiDAR Object Detection
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02623.pdf
- code: https://github.com/JinglinZhan/CSOT
Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
- paper: https://arxiv.org/pdf/2407.08569
- code: coming soon
Enhancing Source-Free Domain Adaptive Object Detection with Low-Confidence Pseudo-Label Distillation
- paper: coming soon
- code: https://github.com/junia3/LPLD
Fully Test-Time Adaptation for Monocular 3D Object Detection
- paper: coming soon
- code: https://github.com/Hongbin98/MonoTTA
Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds
CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection
- paper: coming soon
- code: https://github.com/im-djh/CMD
Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception
- paper: coming soon
- code: https://github.com/luotianyou349/PnPDA
Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00560.pdf
- code: coming soon
Neural Volumetric World Models for Autonomous Driving
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02571.pdf
- code: coming soon
4D Contrastive Superflows are Dense 3D Representation Learners
SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
I Can't Believe It's Not Scene Flow!
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02757.pdf
- code: https://github.com/kylevedder/BucketedSceneFlowEval
T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy
- paper: coming soon
- code: https://github.com/df-boy/T-CorresNet
Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation
SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds
- paper: https://arxiv.org/pdf/2407.11569
- code: coming soon
RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
- paper: https://arxiv.org/pdf/2407.10159
- code: coming soon
ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation
- paper: https://arxiv.org/pdf/2407.07171
- code: coming soon
Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02480.pdf
- code: https://github.com/engineerJPark/LiDARWeather
MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty
Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02968.pdf
- code: https://github.com/valeoai/TTYD
T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
- paper: https://arxiv.org/pdf/2312.10217
- code: coming soon
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering
SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream
- paper: coming soon
- code: https://github.com/WU-CVGL/BeNeRF
PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization
- paper: coming soon
- code: https://github.com/Iris-cyy/SG-NeRF
Disentangled Generation and Aggregation for Robust Radiance Fields
- paper: coming soon
- code: https://github.com/GaoHchen/Robust-Triplane
RPBG: Towards Robust Neural Point-based Graphics in the Wild
- paper: https://arxiv.org/pdf/2405.05663
- code: coming soon
Beyond MOT: Semantic Multi-Object Tracking
- paper: coming soon
- code: https://github.com/HengLan/SMOT
3D Single-object Tracking in Point Clouds with High Temporal Variation
- paper: https://arxiv.org/pdf/2408.02049
- code: coming soon
OneTrack: Demystifying the Conflict Between Detection and Tracking in End-to-End 3D Trackers
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01174.pdf
- code: coming soon
Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Object Appearance Graphs
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01205.pdf
- code: https://github.com/mattiasegu/walker
Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01900.pdf
- code: coming soon
OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection
RoadPainter: Points Are Ideal Navigators for Topology transformER
- paper: https://arxiv.org/pdf/2407.15349
- code: coming soon
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Risk-Aware Self-Consistent Imitation Learning for Trajectory Planning in Autonomous Driving
- paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02087.pdf
- code: coming soon
Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation
- paper: coming soon
- code: https://github.com/zhyever/PatchRefiner
ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion
Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
M2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
- paper: coming soon
- code: https://github.com/IRMVLab/DVLO
OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
This list of papers is primarily curated by Rujia Wang.
If you have any questions about the paper list, please do not hesitate to email me and the Auto Driving Heart Team, or open an issue on GitHub.