| Date | Title | Authors | Paper | Code | Comments |
|---|---|---|---|---|---|
| 2026-3-28 | Unleashing the Power of Chain-of-Prediction for Monocular 3D Object Detection | Zhihao Zhang et al. | paper | - | Journal ref: CVPR 2026 |
| 2026-3-27 | Towards Intrinsic-Aware Monocular 3D Object Detection | Zhihao Zhang et al. | paper | - | Accepted by CVPR 2026 |
| 2026-3-10 | SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection | Yifan Wang et al. | paper | - | Accepted by CVPR 2026 |
| 2026-3-10 | SpikeSMOKE: Spiking Neural Networks for Monocular 3D Object Detection with Cross-Scale Gated Coding | Xuemei Chen et al. | paper | - | - |
| 2026-3-8 | Selective Transfer Learning of Cross-Modality Distillation for Monocular 3D Object Detection | Rui Ding et al. | paper | - | - |
| 2026-2-24 | Object-Scene-Camera Decomposition and Recomposition for Data-Efficient Monocular 3D Object Detection | Zhaonian Kuang et al. | paper | - | IJCV |
| 2026-1-2 | Mono3DV: Monocular 3D Object Detection with 3D-Aware Bipartite Matching and Variational Query DeNoising | Kiet Dang Vu et al. | paper | - | - |
| 2025-11-25 | Open Vocabulary Monocular 3D Object Detection | Jin Yao et al. | paper | code | 3DV 2026 |
| 2025-11-17 | Difficulty-Aware Label-Guided Denoising for Monocular 3D Object Detection | Soyul Lee et al. | paper | - | Accepted by AAAI 2026 |
| 2025-11-14 | Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection | Yifan Wang et al. | paper | - | - |
| 2025-11-11 | MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection | Sunghun Yang et al. | paper | - | AAAI 2026 |
| 2025-11-8 | RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection | Xiaokai Bai et al. | paper | - | - |
| 2025-9-7 | S-LAM3D: Segmentation-Guided Monocular 3D Object Detection via Feature Space Fusion | Diana-Alexandra Sas et al. | paper | - | - |
| 2025-9-5 | 3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection | Yung-Hsu Yang et al. | paper | - | ICCV 2025 |
| 2025-8-28 | Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts | Zixuan Hu et al. | paper | - | Accepted by ICCV 2025 (Highlight) |
| 2025-8-27 | Generalizing Monocular 3D Object Detection | Abhinav Kumar et al. | paper | - | PhD Thesis submitted to MSU |
| 2025-7-3 | PLOT: Pseudo-Labeling via Video Object Tracking for Scalable Monocular 3D Object Detection | Seokyeong Lee et al. | paper | - | - |
| 2025-6-14 | MonoVQD: Monocular 3D Object Detection with Variational Query Denoising and Self-Distillation | Kiet Dang Vu et al. | paper | - | - |
| 2025-4-25 | LiDAR-Guided Monocular 3D Object Detection for Long-Range Railway Monitoring | Raul David Dominguez Sanchez et al. | paper | - | Accepted for the Data-Driven Learning for Intelligent Vehicle Applications Workshop at the 36th IEEE Intelligent Vehicles Symposium (IV) 2025 |
| 2025-4-10 | MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection | Rishubh Parihar et al. | paper | code | CVPR 2025 Camera Ready |

| Date | Title | Authors | Paper | Code | Comments |
|---|---|---|---|---|---|
| 2026-3-31 | MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation | Changli Wu et al. | paper | code | CVPR 2026 |
| 2026-3-18 | OmniVLN: Omnidirectional 3D Perception and Token-Efficient LLM Reasoning for Visual-Language Navigation across Air and Ground Platforms | Zhongyuang Liu et al. | paper | - | - |
| 2026-3-9 | UniGround: Universal 3D Visual Grounding via Training-Free Scene Parsing | Jiaxi Zhang et al. | paper | - | - |
| 2026-2-19 | JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments | Zhan Liu et al. | paper | - | - |
| 2026-2-3 | Z3D: Zero-Shot 3D Visual Grounding from Images | Nikita Drozdov et al. | paper | code | - |
| 2026-1-30 | Learning Geometrically-Grounded 3D Visual Representations for View-Generalizable Robotic Manipulation | Di Zhang et al. | paper | - | - |
| 2026-1-13 | Reasoning Matters for 3D Visual Grounding | Hsiang-Wei Huang et al. | paper | - | 2025 CVPR Workshop on 3D-LLM/VLA: Bridging Language |
| 2025-12-31 | OpenGround: Active Cognition-based Reasoning for Open-World 3D Visual Grounding | Wenyuan Huang et al. | paper | code | - |
| 2025-12-30 | MoniRefer: A Real-world Large-scale Multi-modal Dataset based on Roadside Infrastructure for 3D Visual Grounding | Panquan Yang et al. | paper | - | - |
| 2025-12-28 | UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer | Tianchen Deng et al. | paper | code | - |
| 2025-12-23 | PanoGrounder: Bridging 2D and 3D with Panoramic Scene Representations for VLM-based 3D Visual Grounding | Seongmin Jung et al. | paper | - | - |
| 2025-12-9 | View-on-Graph: Zero-shot 3D Visual Grounding via Vision-Language Reasoning on Scene Graphs | Yuanyuan Liu et al. | paper | - | - |
| 2025-11-30 | S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance | Beining Xu et al. | paper | - | - |
| 2025-11-10 | Mono3DVG-EnSD: Enhanced Spatial-aware and Dimension-decoupled Text Encoding for Monocular 3D Visual Grounding | Yuzhen Li et al. | paper | - | - |
| 2025-10-27 | From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes | Tianxu Wang et al. | paper | code | Update v3 of the NeurIPS 2025 Datasets and Benchmarks paper (v2) |
| 2025-10-16 | ChangingGrounding: 3D Visual Grounding in Changing Scenes | Miao Hu et al. | paper | code | - |
| 2025-10-13 | DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding | Qinghongbing Xie et al. | paper | code | - |
| 2025-9-19 | Zero-Shot Visual Grounding in 3D Gaussians via View Retrieval | Liwei Liao et al. | paper | code | - |
| 2025-9-12 | Talk2PC: Enhancing 3D Visual Grounding through LiDAR and Radar Point Clouds Fusion for Autonomous Driving | Runwei Guan et al. | paper | code | - |
| 2025-9-4 | TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP | Fan Li et al. | paper | - | - |