- id: shi2025sketch
title: 'Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes'
authors: Yuang Shi, Simone Gasparini, Géraldine Morin, Chenggang Yang, Wei Tsang
Ooi
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has emerged as a promising representation
for photorealistic rendering of 3D scenes. However, its high storage requirements
pose significant challenges for practical applications. We observe that Gaussians
exhibit distinct roles and characteristics that are analogous to traditional artistic
techniques -- like how artists first sketch outlines before filling in broader
areas with color, some Gaussians capture high-frequency features like edges and
contours, while other Gaussians represent broader, smoother regions that are
analogous to broader brush strokes that add volume and depth to a painting. Based
on this observation, we propose a novel hybrid representation that categorizes
Gaussians into (i) Sketch Gaussians, which define scene boundaries, and (ii) Patch
Gaussians, which cover smooth regions. Sketch Gaussians are efficiently encoded
using parametric models, leveraging their geometric coherence, while Patch Gaussians
undergo optimized pruning, retraining, and vector quantization to maintain volumetric
consistency and storage efficiency. Our comprehensive evaluation across diverse
indoor and outdoor scenes demonstrates that this structure-aware approach achieves
up to 32.62% improvement in PSNR, 19.12% in SSIM, and 45.41% in LPIPS at equivalent
model sizes, and correspondingly, for an indoor scene, our model maintains the
visual quality with 2.3% of the original model size.
'
project_page: null
paper: https://arxiv.org/pdf/2501.13045.pdf
code: null
video: null
tags:
- Densification
thumbnail: assets/thumbnails/shi2025sketch.jpg
publication_date: '2025-01-22T17:52:45+00:00'
date_source: arxiv
- id: arunan2025darbsplatting
title: 'DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial
Basis Functions'
authors: Vishagar Arunan, Saeedha Nazar, Hashiru Pramuditha, Vinasirajan Viruthshaan,
Sameera Ramasinghe, Simon Lucey, Ranga Rodrigo
year: '2025'
abstract: 'Splatting-based 3D reconstruction methods have gained popularity with
the advent of 3D Gaussian Splatting, efficiently synthesizing high-quality novel
views. These methods commonly resort to using exponential family functions, such
as the Gaussian function, as reconstruction kernels due to their anisotropic nature,
ease of projection, and differentiability in rasterization. However, the field
remains restricted to variations within the exponential family, leaving generalized
reconstruction kernels largely underexplored, partly due to the lack of easy integrability
in 3D to 2D projections. In this light, we show that a class of decaying anisotropic
radial basis functions (DARBFs), which are non-negative functions of the Mahalanobis
distance, supports splatting by approximating the Gaussian function''s closed-form
integration advantage. With this fresh perspective, we demonstrate up to 34% faster
convergence during training and a 15% reduction in memory consumption across various
DARB reconstruction kernels, while maintaining comparable PSNR, SSIM, and LPIPS
results. We will make the code available.
'
project_page: https://randomnerds.github.io/darbs.github.io/
paper: https://arxiv.org/pdf/2501.12369.pdf
code: null
video: null
tags:
- Project
- Rendering
thumbnail: assets/thumbnails/arunan2025darbsplatting.jpg
publication_date: '2025-01-21T18:49:06+00:00'
date_source: arxiv
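# Illustrative sketch (not from the paper above): the DARB-Splatting entry describes
# reconstruction kernels as non-negative, decaying functions of the Mahalanobis
# distance. A minimal NumPy example evaluating the standard Gaussian kernel next to
# one hypothetical alternative; the cosine-decay kernel is an assumption for
# illustration, not necessarily one of the DARBFs studied in the paper.
#
#   import numpy as np
#
#   def mahalanobis(x, mu, cov):
#       """Mahalanobis distance of point x to a primitive with mean mu, covariance cov."""
#       d = np.asarray(x) - np.asarray(mu)
#       return float(np.sqrt(d @ np.linalg.solve(cov, d)))
#
#   def gaussian_kernel(x, mu, cov):
#       """Standard 3DGS reconstruction kernel: exp(-d^2 / 2)."""
#       return np.exp(-0.5 * mahalanobis(x, mu, cov) ** 2)
#
#   def cosine_decay_kernel(x, mu, cov, support=3.0):
#       """Hypothetical decaying anisotropic RBF: non-negative, monotonically
#       decaying in the Mahalanobis distance, with compact support."""
#       d = mahalanobis(x, mu, cov)
#       return 0.5 * (1.0 + np.cos(np.pi * min(d / support, 1.0)))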
- id: chen2025hac
title: 'HAC++: Towards 100X Compression of 3D Gaussian Splatting'
authors: Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has emerged as a promising framework for
novel view synthesis, boasting rapid rendering speed with high fidelity. However,
the substantial Gaussians and their associated attributes necessitate effective
compression techniques. Nevertheless, the sparse and unorganized nature of the
point cloud of Gaussians (or anchors in our paper) presents challenges for compression.
To achieve a compact size, we propose HAC++, which leverages the relationships
between unorganized anchors and a structured hash grid, utilizing their mutual
information for context modeling. Additionally, HAC++ captures intra-anchor contextual
relationships to further enhance compression performance. To facilitate entropy
coding, we utilize Gaussian distributions to precisely estimate the probability
of each quantized attribute, where an adaptive quantization module is proposed
to enable high-precision quantization of these attributes for improved fidelity
restoration. Moreover, we incorporate an adaptive masking strategy to eliminate
invalid Gaussians and anchors. Overall, HAC++ achieves a remarkable size reduction
of over 100X compared to vanilla 3DGS when averaged on all datasets, while simultaneously
improving fidelity. It also delivers more than 20X size reduction compared to
Scaffold-GS. Our code is available at https://github.com/YihangChen-ee/HAC-plus.
'
project_page: https://yihangchen-ee.github.io/project_hac++/
paper: https://arxiv.org/pdf/2501.12255.pdf
code: https://github.com/YihangChen-ee/HAC-plus
video: null
tags:
- Code
- Compression
- Project
thumbnail: assets/thumbnails/chen2025hac.jpg
publication_date: '2025-01-21T16:23:05+00:00'
date_source: arxiv
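# Illustrative sketch (not from the paper above): HAC++ estimates the probability of
# each quantized attribute with a Gaussian distribution for entropy coding. A common
# way to do this is to integrate the Gaussian over the quantization bin via CDF
# differences; the function name and uniform bin width are assumptions.
#
#   import math
#
#   def quantized_prob(x_q, mu, sigma, step=1.0):
#       """Probability mass that N(mu, sigma^2) assigns to the quantization bin
#       of width `step` centred at the quantized value x_q."""
#       cdf = lambda v: 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
#       return cdf(x_q + 0.5 * step) - cdf(x_q - 0.5 * step)
#
#   # The ideal code length for entropy coding is then -log2 of this probability:
#   #   bits = -math.log2(max(quantized_prob(x_q, mu, sigma), 1e-12))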
- id: li2025cargs
title: 'Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car
Reconstruction'
authors: Congcong Li, Jin Wang, Xiaomeng Wang, Xingchen Zhou, Wei Wu, Yuzhi Zhang,
Tongyi Cao
year: '2025'
abstract: '3D car modeling is crucial for applications in autonomous driving systems,
virtual and augmented reality, and gaming. However, due to the distinctive properties
of cars, such as highly reflective and transparent surface materials, existing
methods often struggle to achieve accurate 3D car reconstruction. To address these
limitations, we propose Car-GS, a novel approach designed to mitigate the effects
of specular highlights and the coupling of RGB and geometry in 3D geometric and
shading reconstruction (3DGS). Our method incorporates three key innovations:
First, we introduce view-dependent Gaussian primitives to effectively model surface
reflections. Second, we identify the limitations of using a shared opacity parameter
for both image rendering and geometric attributes when modeling transparent objects.
To overcome this, we assign a learnable geometry-specific opacity to each 2D Gaussian
primitive, dedicated solely to rendering depth and normals. Third, we observe
that reconstruction errors are most prominent when the camera view is nearly orthogonal
to glass surfaces. To address this issue, we develop a quality-aware supervision
module that adaptively leverages normal priors from a pre-trained large-scale
normal model. Experimental results demonstrate that Car-GS achieves precise reconstruction
of car surfaces and significantly outperforms prior methods. The project page
is available at https://lcc815.github.io/Car-GS.
'
project_page: null
paper: https://arxiv.org/pdf/2501.11020.pdf
code: https://lcc815.github.io/Car-GS/
video: null
tags:
- Code
- Meshing
- Rendering
thumbnail: assets/thumbnails/li2025cargs.jpg
publication_date: '2025-01-19T11:49:35+00:00'
date_source: arxiv
- id: zheng2025gstar
title: 'GSTAR: Gaussian Surface Tracking and Reconstruction'
authors: Chengwei Zheng, Lixin Xue, Juan Zarate, Jie Song
year: '2025'
abstract: '3D Gaussian Splatting techniques have enabled efficient photo-realistic
rendering of static scenes. Recent works have extended these approaches to support
surface reconstruction and tracking. However, tracking dynamic surfaces with 3D
Gaussians remains challenging due to complex topology changes, such as surfaces
appearing, disappearing, or splitting. To address these challenges, we propose
GSTAR, a novel method that achieves photo-realistic rendering, accurate surface
reconstruction, and reliable 3D tracking for general dynamic scenes with changing
topology. Given multi-view captures as input, GSTAR binds Gaussians to mesh faces
to represent dynamic objects. For surfaces with consistent topology, GSTAR maintains
the mesh topology and tracks the meshes using Gaussians. In regions where topology
changes, GSTAR adaptively unbinds Gaussians from the mesh, enabling accurate registration
and the generation of new surfaces based on these optimized Gaussians. Additionally,
we introduce a surface-based scene flow method that provides robust initialization
for tracking between frames. Experiments demonstrate that our method effectively
tracks and reconstructs dynamic surfaces, enabling a range of applications. Our
project page with the code release is available at https://eth-ait.github.io/GSTAR/.
'
project_page: https://chengwei-zheng.github.io/GSTAR/
paper: https://arxiv.org/pdf/2501.10283.pdf
code: null
video: https://www.youtube.com/watch?v=Fwby4PrjFeM
tags:
- Avatar
- Dynamic
- Meshing
- Project
- Video
thumbnail: assets/thumbnails/zheng2025gstar.jpg
publication_date: '2025-01-17T16:26:24+00:00'
date_source: arxiv
- id: ma2025cityloc
title: 'CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with
Gaussian Representation'
authors: Qi Ma, Runyi Yang, Bin Ren, Ender Konukoglu, Luc Van Gool, Danda Pani Paudel
year: '2025'
abstract: 'Localizing text descriptions in large-scale 3D scenes is inherently an
ambiguous task. This nonetheless arises while describing general concepts, e.g.
all traffic lights in a city. To facilitate reasoning based on such concepts,
text localization in the form of distribution is required. In this paper, we generate
the distribution of the camera poses conditioned upon the textual description.
To facilitate such generation, we propose a diffusion-based architecture that
conditionally diffuses the noisy 6DoF camera poses to their plausible locations.
The conditional signals are derived from the text descriptions, using the pre-trained
text encoders. The connection between text descriptions and pose distribution
is established through a pretrained Vision-Language Model, i.e., CLIP. Furthermore,
we demonstrate that the candidate poses for the distribution can be further refined
by rendering potential poses using 3D Gaussian splatting, guiding incorrectly
posed samples towards locations that better align with the textual description,
through visual reasoning. We demonstrate the effectiveness of our method by
comparing it with both standard retrieval methods and learning-based approaches.
Our proposed method consistently outperforms these baselines across all five large-scale
datasets. Our source code and dataset will be made publicly available.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08982.pdf
code: null
video: null
tags:
- Language Embedding
- Large-Scale
thumbnail: assets/thumbnails/ma2025cityloc.jpg
publication_date: '2025-01-15T17:59:32+00:00'
date_source: arxiv
- id: hong2025gslivo
title: 'GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry
with Gaussian Mapping'
authors: Sheng Hong, Chunran Zheng, Yishu Shen, Changze Li, Fu Zhang, Tong Qin,
Shaojie Shen
year: '2025'
abstract: 'In recent years, 3D Gaussian splatting (3D-GS) has emerged as a novel
scene representation approach. However, existing vision-only 3D-GS methods often
rely on hand-crafted heuristics for point-cloud densification and face challenges
in handling occlusions and high GPU memory and computation consumption. LiDAR-Inertial-Visual
(LIV) sensor configuration has demonstrated superior performance in localization
and dense mapping by leveraging complementary sensing characteristics: rich texture
information from cameras, precise geometric measurements from LiDAR, and high-frequency
motion data from IMU. Inspired by this, we propose a novel real-time Gaussian-based
simultaneous localization and mapping (SLAM) system. Our map system comprises
a global Gaussian map and a sliding window of Gaussians, along with an IESKF-based
odometry. The global Gaussian map consists of hash-indexed voxels organized in
a recursive octree, effectively covering sparse spatial volumes while adapting
to different levels of detail and scales. The Gaussian map is initialized through
multi-sensor fusion and optimized with photometric gradients. Our system incrementally
maintains a sliding window of Gaussians, significantly reducing GPU computation
and memory consumption by only optimizing the map within the sliding window. Moreover,
we implement a tightly coupled multi-sensor fusion odometry with an iterative
error state Kalman filter (IESKF), leveraging real-time updating and rendering
of the Gaussian map. Our system represents the first real-time Gaussian-based
SLAM framework deployable on resource-constrained embedded systems, demonstrated
on the NVIDIA Jetson Orin NX platform. The framework achieves real-time performance
while maintaining robust multi-sensor fusion capabilities. All implementation
algorithms, hardware designs, and CAD models will be publicly available.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08672.pdf
code: null
video: null
tags:
- Large-Scale
- Lidar
thumbnail: assets/thumbnails/hong2025gslivo.jpg
publication_date: '2025-01-15T09:04:56+00:00'
date_source: arxiv
- id: wu2025vingsmono
title: 'VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes'
authors: Ke Wu, Zicheng Zhang, Muer Tie, Ziqing Ai, Zhongxue Gan, Wenchao Ding
year: '2025'
abstract: 'VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework
designed for large scenes. The framework comprises four main components: VIO Front
End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser. In the VIO Front End,
RGB frames are processed through dense bundle adjustment and uncertainty estimation
to extract scene geometry and poses. Based on this output, the mapping module
incrementally constructs and maintains a 2D Gaussian map. Key components of the
2D Gaussian Map include a Sample-based Rasterizer, Score Manager, and Pose Refinement,
which collectively improve mapping speed and localization accuracy. This enables
the SLAM system to handle large-scale urban environments with up to 50 million
Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design
a Loop Closure module, which innovatively leverages the Novel View Synthesis (NVS)
capabilities of Gaussian Splatting for loop closure detection and correction of
the Gaussian map. Additionally, we propose a Dynamic Eraser to address the inevitable
presence of dynamic objects in real-world outdoor scenes. Extensive evaluations
in indoor and outdoor environments demonstrate that our approach achieves localization
performance on par with Visual-Inertial Odometry while surpassing recent GS/NeRF
SLAM methods. It also significantly outperforms all existing methods in terms
of mapping and rendering quality. Furthermore, we developed a mobile app and verified
that our framework can generate high-quality Gaussian maps in real time using
only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge,
VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in
outdoor environments and supporting kilometer-scale large scenes.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08286.pdf
code: null
video: null
tags:
- Large-Scale
- Meshing
- SLAM
thumbnail: assets/thumbnails/wu2025vingsmono.jpg
publication_date: '2025-01-14T18:01:15+00:00'
date_source: arxiv
- id: rogge2025objectcentric
title: 'Object-Centric 2D Gaussian Splatting: Background Removal and Occlusion-Aware
Pruning for Compact Object Models'
authors: Marcel Rogge, Didier Stricker
year: '2025'
abstract: 'Current Gaussian Splatting approaches are effective for reconstructing
entire scenes but lack the option to target specific objects, making them computationally
expensive and unsuitable for object-specific applications. We propose a novel
approach that leverages object masks to enable targeted reconstruction, resulting
in object-centric models. Additionally, we introduce an occlusion-aware pruning
strategy to minimize the number of Gaussians without compromising quality. Our
method reconstructs compact object models, yielding object-centric Gaussian and
mesh representations that are up to 96\% smaller and up to 71\% faster to train
compared to the baseline while retaining competitive quality. These representations
are immediately usable for downstream applications such as appearance editing
and physics simulation without additional processing.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08174.pdf
code: null
video: null
tags:
- Compression
- Densification
- Editing
thumbnail: assets/thumbnails/rogge2025objectcentric.jpg
publication_date: '2025-01-14T14:56:31+00:00'
date_source: arxiv
- id: liu2025uncommon
title: UnCommon Objects in 3D
authors: Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos
Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea
Vedaldi, Roman Shapovalov, David Novotny
year: '2025'
abstract: 'We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset
for 3D deep learning and 3D generative AI. uCO3D is the largest publicly-available
collection of high-resolution videos of objects with 3D annotations that ensures
full-360$^{\circ}$ coverage. uCO3D is significantly more diverse than MVImgNet
and CO3Dv2, covering more than 1,000 object categories. It is also of higher quality,
due to extensive quality checks of both the collected videos and the 3D annotations.
Similar to analogous datasets, uCO3D contains annotations for 3D camera poses,
depth maps and sparse point clouds. In addition, each object is equipped with
a caption and a 3D Gaussian Splat reconstruction. We train several large 3D models
on MVImgNet, CO3Dv2, and uCO3D and obtain superior results using the latter, showing
that uCO3D is better for learning applications.
'
project_page: https://uco3d.github.io/
paper: https://arxiv.org/pdf/2501.07574.pdf
code: https://github.com/facebookresearch/uco3d
video: null
tags:
- Code
- Project
thumbnail: assets/thumbnails/liu2025uncommon.jpg
publication_date: '2025-01-13T18:59:20+00:00'
date_source: arxiv
- id: stuart20253dgstopc
title: '3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud
or Mesh'
authors: Lewis A G Stuart, Michael P Pound
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) excels at producing highly detailed 3D reconstructions,
but these scenes often require specialised renderers for effective visualisation.
In contrast, point clouds are a widely used 3D representation and are compatible
with most popular 3D processing software, yet converting 3DGS scenes into point
clouds is a complex challenge. In this work we introduce 3DGS-to-PC, a flexible
and highly customisable framework that is capable of transforming 3DGS scenes
into dense, high-accuracy point clouds. We sample points probabilistically from
each Gaussian as a 3D density function. We additionally threshold new points using
the Mahalanobis distance to the Gaussian centre, preventing extreme outliers.
The result is a point cloud that closely represents the shape encoded into the
3D Gaussian scene. Individual Gaussians use spherical harmonics to adapt colours
depending on view, and each point may contribute only subtle colour hints to the
resulting rendered scene. To avoid spurious or incorrect colours that do not fit
with the final point cloud, we recalculate Gaussian colours via a customised image
rendering approach, assigning each Gaussian the colour of the pixel to which it
contributes most across all views. 3DGS-to-PC also supports mesh generation through
Poisson Surface Reconstruction, applied to points sampled from predicted surface
Gaussians. This allows coloured meshes to be generated from 3DGS scenes without
the need for re-training. This package is highly customisable and capable of
simple integration into existing 3DGS pipelines. 3DGS-to-PC provides a powerful
tool for converting 3DGS data into point cloud and surface-based formats.
'
project_page: null
paper: https://arxiv.org/pdf/2501.07478.pdf
code: https://github.com/Lewis-Stuart-11/3DGS-to-PC
video: null
tags:
- Code
- Point Cloud
thumbnail: assets/thumbnails/stuart20253dgstopc.jpg
publication_date: '2025-01-13T16:52:28+00:00'
date_source: arxiv
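# Illustrative sketch (not the released 3DGS-to-PC code): the entry above samples
# points probabilistically from each Gaussian and thresholds them by Mahalanobis
# distance to the Gaussian centre to avoid extreme outliers. A minimal NumPy version;
# the sample count and threshold value are assumptions.
#
#   import numpy as np
#
#   def sample_points_from_gaussian(mu, cov, n=1000, max_mahalanobis=2.0, rng=None):
#       """Draw points from N(mu, cov) and keep only those within a Mahalanobis
#       radius of the centre."""
#       rng = np.random.default_rng() if rng is None else rng
#       pts = rng.multivariate_normal(mu, cov, size=n)
#       d = pts - np.asarray(mu)
#       m2 = np.einsum('ij,ij->i', d @ np.linalg.inv(cov), d)   # squared Mahalanobis
#       return pts[m2 <= max_mahalanobis ** 2]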
- id: zhang2025evaluating
title: 'Evaluating Human Perception of Novel View Synthesis: Subjective Quality
Assessment of Gaussian Splatting and NeRF in Dynamic Scenes'
authors: Yuhang Zhang, Joshua Maraval, Zhengyu Zhang, Nicolas Ramin, Shishun Tian,
Lu Zhang
year: '2025'
abstract: 'Gaussian Splatting (GS) and Neural Radiance Fields (NeRF) are two groundbreaking
technologies that have revolutionized the field of Novel View Synthesis (NVS),
enabling immersive photorealistic rendering and user experiences by synthesizing
multiple viewpoints from a set of images of sparse views. The potential applications
of NVS, such as high-quality virtual and augmented reality, detailed 3D modeling,
and realistic medical organ imaging, underscore the importance of quality assessment
of NVS methods from the perspective of human perception. Although some previous
studies have explored subjective quality assessments for NVS technology, they
still face several challenges, especially in NVS methods selection, scenario coverage,
and evaluation methodology. To address these challenges, we conducted two subjective
experiments for the quality assessment of NVS technologies containing both GS-based
and NeRF-based methods, focusing on dynamic and real-world scenes. This study
covers 360°, front-facing, and single-viewpoint videos while providing a
richer and greater number of real scenes. Meanwhile, it is the first study to explore
the impact of NVS methods in dynamic scenes with moving objects. The two types
of subjective experiments help to fully comprehend the influences of different
viewing paths from a human perception perspective and pave the way for future
development of full-reference and no-reference quality metrics. In addition, we
established a comprehensive benchmark of various state-of-the-art objective metrics
on the proposed database, highlighting that existing methods still struggle to
accurately capture subjective quality. The results give us some insights into
the limitations of existing NVS methods and may promote the development of new
NVS methods.
'
project_page: null
paper: https://arxiv.org/pdf/2501.08072.pdf
code: null
video: null
tags:
- Dynamic
thumbnail: assets/thumbnails/zhang2025evaluating.jpg
publication_date: '2025-01-13T10:01:27+00:00'
date_source: arxiv
- id: peng2025rmavatar
title: 'RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video
Based on Rectified Mesh-embedded Gaussians'
authors: Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong
Yang, Xiao Dong
year: '2025'
abstract: 'We introduce RMAvatar, a novel human avatar representation with Gaussian
splatting embedded on mesh to learn clothed avatar from a monocular video. We
utilize the explicit mesh geometry to represent motion and shape of a virtual
human and implicit appearance rendering with Gaussian Splatting. Our method consists
of two main modules: Gaussian initialization module and Gaussian rectification
module. We embed Gaussians into triangular faces and control their motion through
the mesh, which ensures low-frequency motion and surface deformation of the avatar.
Due to the limitations of the LBS formula, it is hard for the human skeleton to control complex
non-rigid transformations. We then design a pose-related Gaussian rectification
module to learn fine-detailed non-rigid deformations, further improving the realism
and expressiveness of the avatar. We conduct extensive experiments on public datasets;
RMAvatar shows state-of-the-art performance on both rendering quality and quantitative
evaluations. Please see our project page at https://rm-avatar.github.io.
'
project_page: https://rm-avatar.github.io/
paper: https://arxiv.org/pdf/2501.07104.pdf
code: https://github.com/RMAvatar/RMAvatar
video: null
tags:
- Avatar
- Code
- Dynamic
- Meshing
- Monocular
- Project
thumbnail: assets/thumbnails/peng2025rmavatar.jpg
publication_date: '2025-01-13T07:32:44+00:00'
date_source: arxiv
- id: zielonka2025synthetic
title: Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
authors: Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas, George Kopanas,
Paulo Gotardo, Thabo Beeler, Justus Thies, Timo Bolkart
year: '2025'
abstract: 'We present SynShot, a novel method for the few-shot inversion of a drivable
head avatar based on a synthetic prior. We tackle two major challenges. First,
training a controllable 3D generative network requires a large number of diverse
sequences, for which pairs of images and high-quality tracked meshes are not always
available. Second, state-of-the-art monocular avatar models struggle to generalize
to new views and expressions, lacking a strong prior and often overfitting to
a specific viewpoint distribution. Inspired by machine learning models trained
solely on synthetic data, we propose a method that learns a prior model from a
large dataset of synthetic heads with diverse identities, expressions, and viewpoints.
With few input images, SynShot fine-tunes the pretrained synthetic prior to bridge
the domain gap, modeling a photorealistic head avatar that generalizes to novel
expressions and viewpoints. We model the head avatar using 3D Gaussian splatting
and a convolutional encoder-decoder that outputs Gaussian parameters in UV texture
space. To account for the different modeling complexities over parts of the head
(e.g., skin vs hair), we embed the prior with explicit control for upsampling
the number of per-part primitives. Compared to state-of-the-art monocular methods
that require thousands of real training images, SynShot significantly improves
novel view and expression synthesis.
'
project_page: https://zielon.github.io/synshot/
paper: https://arxiv.org/pdf/2501.06903.pdf
code: null
video: https://www.youtube.com/watch?v=4KQQatkaSgc
tags:
- Avatar
- Dynamic
- Project
- Sparse
- Video
thumbnail: assets/thumbnails/zielonka2025synthetic.jpg
publication_date: '2025-01-12T19:01:05+00:00'
date_source: arxiv
- id: chen2025generalized
title: Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
authors: Du Chen, Liyi Chen, Zhengqiang Zhang, Lei Zhang
year: '2025'
abstract: 'Equipped with the continuous representation capability of Multi-Layer
Perceptron (MLP), Implicit Neural Representation (INR) has been successfully employed
for Arbitrary-scale Super-Resolution (ASR). However, the limited receptive field
of the linear layers in MLP restricts the representation capability of INR, while
it is computationally expensive to query the MLP numerous times to render each
pixel. Recently, Gaussian Splatting (GS) has shown its advantages over INR in
both visual quality and rendering speed in 3D tasks, which motivates us to explore
whether GS can be employed for the ASR task. However, directly applying GS to
ASR is exceptionally challenging because the original GS is an optimization-based
method through overfitting each single scene, while in ASR we aim to learn a single
model that can generalize to different images and scaling factors. We overcome
these challenges by developing two novel techniques. Firstly, to generalize GS
for ASR, we elaborately design an architecture to predict the corresponding image-conditioned
Gaussians of the input low-resolution image in a feed-forward manner. Secondly,
we implement an efficient differentiable 2D GPU/CUDA-based scale-aware rasterization
to render super-resolved images by sampling discrete RGB values from the predicted
contiguous Gaussians. Via end-to-end training, our optimized network, namely GSASR,
can perform ASR for any image and unseen scaling factors. Extensive experiments
validate the effectiveness of our proposed method. The project page can be found
at https://mt-cly.github.io/GSASR.github.io/.
'
project_page: https://mt-cly.github.io/GSASR.github.io/
paper: https://arxiv.org/pdf/2501.06838.pdf
code: null
video: null
tags:
- Project
- Super Resolution
thumbnail: assets/thumbnails/chen2025generalized.jpg
publication_date: '2025-01-12T15:14:58+00:00'
date_source: arxiv
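# Illustrative sketch (not the paper's scale-aware CUDA rasterizer): GSASR above
# renders super-resolved images by rasterizing predicted 2D Gaussians onto a pixel
# grid. A naive dense NumPy version of accumulating anisotropic 2D Gaussians into an
# RGB image; real rasterizers tile, sort, and cull for efficiency.
#
#   import numpy as np
#
#   def splat_2d_gaussians(H, W, means, covs, colors):
#       """Accumulate N 2D Gaussians (pixel-space means (N,2), covariances (N,2,2),
#       colours (N,3)) into an H x W RGB image."""
#       ys, xs = np.mgrid[0:H, 0:W]
#       grid = np.stack([xs, ys], axis=-1).astype(float)          # (H, W, 2)
#       img = np.zeros((H, W, 3))
#       for mu, cov, c in zip(means, covs, colors):
#           d = grid - np.asarray(mu)                             # (H, W, 2)
#           m2 = np.einsum('hwi,ij,hwj->hw', d, np.linalg.inv(cov), d)
#           img += np.exp(-0.5 * m2)[..., None] * np.asarray(c)
#       return np.clip(img, 0.0, 1.0)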
- id: wang2025f3dgaus
title: 'F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent
Gaussian Splatting'
authors: Yuxin Wang, Qianyi Wu, Dan Xu
year: '2025'
abstract: 'This paper tackles the problem of generalizable 3D-aware generation from
monocular datasets, e.g., ImageNet. The key challenge of this task is learning
a robust 3D-aware representation without multi-view or dynamic data, while ensuring
consistent texture and geometry across different viewpoints. Although some baseline
methods are capable of 3D-aware generation, the quality of the generated images
still lags behind state-of-the-art 2D generation approaches, which excel in producing
high-quality, detailed images. To address this severe limitation, we propose a
novel feed-forward pipeline based on pixel-aligned Gaussian Splatting, coined
as F3D-Gaus, which can produce more realistic and reliable 3D renderings from
monocular inputs. In addition, we introduce a self-supervised cycle-consistent
constraint to enforce cross-view consistency in the learned 3D representation.
This training strategy naturally allows aggregation of multiple aligned Gaussian
primitives and significantly alleviates the interpolation limitations inherent
in single-view pixel-aligned Gaussian Splatting. Furthermore, we incorporate video
model priors to perform geometry-aware refinement, enhancing the generation of
fine details in wide-viewpoint scenarios and improving the model''s capability
to capture intricate 3D textures. Extensive experiments demonstrate that our approach
not only achieves high-quality, multi-view consistent 3D-aware generation from
monocular datasets, but also significantly improves training and inference efficiency.
'
project_page: https://arxiv.org/abs/2501.06714
paper: https://arxiv.org/pdf/2501.06714.pdf
code: https://github.com/W-Ted/F3D-Gaus
video: null
tags:
- Code
- Feed-Forward
- Monocular
- Project
thumbnail: assets/thumbnails/wang2025f3dgaus.jpg
publication_date: '2025-01-12T04:44:44+00:00'
date_source: arxiv
- id: asim2025met3r
title: 'MEt3R: Measuring Multi-View Consistency in Generated Images'
authors: Mohammad Asim, Christopher Wewer, Thomas Wimmer, Bernt Schiele, Jan Eric
Lenssen
year: '2025'
abstract: 'We introduce MEt3R, a metric for multi-view consistency in generated
images. Large-scale generative models for multi-view image generation are rapidly
advancing the field of 3D inference from sparse observations. However, due to
the nature of generative modeling, traditional reconstruction metrics are not
suitable to measure the quality of generated outputs and metrics that are independent
of the sampling procedure are desperately needed. In this work, we specifically
address the aspect of consistency between generated multi-view images, which can
be evaluated independently of the specific scene. Our approach uses DUSt3R to
obtain dense 3D reconstructions from image pairs in a feed-forward manner, which
are used to warp image contents from one view into the other. Then, feature maps
of these images are compared to obtain a similarity score that is invariant to
view-dependent effects. Using MEt3R, we evaluate the consistency of a large set
of previous methods for novel view and video generation, including our open, multi-view
latent diffusion model.
'
project_page: https://geometric-rl.mpi-inf.mpg.de/met3r/
paper: https://arxiv.org/pdf/2501.06336.pdf
code: https://github.com/mohammadasim98/MEt3R
video: https://geometric-rl.mpi-inf.mpg.de/met3r/static/videos/teaser.mp4
tags:
- 3ster-based
- Code
- Diffusion
- Project
- Video
thumbnail: assets/thumbnails/asim2025met3r.jpg
publication_date: '2025-01-10T20:43:33+00:00'
date_source: arxiv
- id: shin2025localityaware
title: Locality-aware Gaussian Compression for Fast and High-quality Rendering
authors: Seungjoo Shin, Jaesik Park, Sunghyun Cho
year: '2025'
abstract: 'We present LocoGS, a locality-aware 3D Gaussian Splatting (3DGS) framework
that exploits the spatial coherence of 3D Gaussians for compact modeling of volumetric
scenes. To this end, we first analyze the local coherence of 3D Gaussian attributes,
and propose a novel locality-aware 3D Gaussian representation that effectively
encodes locally-coherent Gaussian attributes using a neural field representation
with a minimal storage requirement. On top of the novel representation, LocoGS
is carefully designed with additional components such as dense initialization,
an adaptive spherical harmonics bandwidth scheme and different encoding schemes
for different Gaussian attributes to maximize compression performance. Experimental
results demonstrate that our approach outperforms the rendering quality of existing
compact Gaussian representations for representative real-world 3D datasets while
achieving 54.6$\times$ to 96.6$\times$ smaller storage and 2.1$\times$
to 2.4$\times$ faster rendering than 3DGS. Our approach also demonstrates
an average 2.4$\times$ higher rendering speed than the state-of-the-art compression
method with comparable compression performance.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05757.pdf
code: null
video: null
tags:
- Compression
thumbnail: assets/thumbnails/shin2025localityaware.jpg
publication_date: '2025-01-10T07:19:41+00:00'
date_source: arxiv
- id: yan2025consistent
title: Consistent Flow Distillation for Text-to-3D Generation
authors: Runjie Yan, Yinbo Chen, Xiaolong Wang
year: '2025'
abstract: 'Score Distillation Sampling (SDS) has made significant strides in distilling
image-generative models for 3D generation. However, its maximum-likelihood-seeking
behavior often leads to degraded visual quality and diversity, limiting its effectiveness
in 3D applications. In this work, we propose Consistent Flow Distillation (CFD),
which addresses these limitations. We begin by leveraging the gradient of the
diffusion ODE or SDE sampling process to guide the 3D generation. From the gradient-based
sampling perspective, we find that the consistency of 2D image flows across different
viewpoints is important for high-quality 3D generation. To achieve this, we introduce
multi-view consistent Gaussian noise on the 3D object, which can be rendered from
various viewpoints to compute the flow gradient. Our experiments demonstrate that
CFD, through consistent flows, significantly outperforms previous methods in text-to-3D
generation.
'
project_page: https://runjie-yan.github.io/cfd/
paper: https://arxiv.org/pdf/2501.05445.pdf
code: https://github.com/runjie-yan/ConsistentFlowDistillation
video: null
tags:
- Code
- Diffusion
- Project
thumbnail: assets/thumbnails/yan2025consistent.jpg
publication_date: '2025-01-09T18:56:05+00:00'
date_source: arxiv
- id: meng2025zero1tog
title: 'Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation'
authors: Xuyi Meng, Chen Wang, Jiahui Lei, Kostas Daniilidis, Jiatao Gu, Lingjie
Liu
year: '2025'
abstract: 'Recent advances in 2D image generation have achieved remarkable quality, largely
driven by the capacity of diffusion models and the availability of large-scale
datasets. However, direct 3D generation is still constrained by the scarcity and
lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel
approach that addresses this problem by enabling direct single-view generation
on Gaussian splats using pretrained 2D diffusion models. Our key insight is that
Gaussian splats, a 3D representation, can be decomposed into multi-view images
encoding different attributes. This reframes the challenging task of direct 3D
generation within a 2D diffusion framework, allowing us to leverage the rich priors
of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view
and cross-attribute attention layers, which capture complex correlations and enforce
3D consistency across generated splats. This makes Zero-1-to-G the first direct
image-to-3D generative model to effectively utilize pretrained 2D diffusion priors,
enabling efficient training and improved generalization to unseen objects. Extensive
experiments on both synthetic and in-the-wild datasets demonstrate superior performance
in 3D object generation, offering a new approach to high-quality 3D generation.
'
project_page: https://mengxuyigit.github.io/projects/zero-1-to-G/
paper: https://arxiv.org/pdf/2501.05427.pdf
code: null
video: null
tags:
- Diffusion
- Project
thumbnail: assets/thumbnails/meng2025zero1tog.jpg
publication_date: '2025-01-09T18:37:35+00:00'
date_source: arxiv
- id: gerogiannis2025arc2avatar
title: 'Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID
Guidance'
authors: Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros
Potamias, Alexandros Lattas, Stefanos Zafeiriou
year: '2025'
abstract: 'Inspired by the effectiveness of 3D Gaussian Splatting (3DGS) in reconstructing
detailed 3D scenes within multi-view setups and the emergence of large 2D human
foundation models, we introduce Arc2Avatar, the first SDS-based method utilizing
a human face foundation model as guidance with just a single image as input. To
achieve that, we extend such a model for diverse-view human head generation by
fine-tuning on synthetic data and modifying its conditioning. Our avatars maintain
a dense correspondence with a human face mesh template, allowing blendshape-based
expression generation. This is achieved through a modified 3DGS approach, connectivity
regularizers, and a strategic initialization tailored for our task. Additionally,
we propose an optional efficient SDS-based correction step to refine the blendshape
expressions, enhancing realism and diversity. Experiments demonstrate that Arc2Avatar
achieves state-of-the-art realism and identity preservation, effectively addressing
color issues by allowing the use of very low guidance, enabled by our strong identity
prior and initialization strategy, without compromising detail.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05379.pdf
code: null
video: null
tags:
- Avatar
- Diffusion
thumbnail: assets/thumbnails/gerogiannis2025arc2avatar.jpg
publication_date: '2025-01-09T17:04:33+00:00'
date_source: arxiv
- id: tianci2025scaffoldslam
title: 'Scaffold-SLAM: Structured 3D Gaussians for Simultaneous Localization and
Photorealistic Mapping'
authors: Wen Tianci, Liu Zhiang, Lu Biao, Fang Yongchun
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has recently revolutionized novel view synthesis
in Simultaneous Localization and Mapping (SLAM). However, existing SLAM methods
utilizing 3DGS have failed to provide high-quality novel view rendering for monocular,
stereo, and RGB-D cameras simultaneously. Notably, some methods perform well for
RGB-D cameras but suffer significant degradation in rendering quality for monocular
cameras. In this paper, we present Scaffold-SLAM, which delivers simultaneous
localization and high-quality photorealistic mapping across monocular, stereo,
and RGB-D cameras. We introduce two key innovations to achieve this state-of-the-art
visual quality. First, we propose Appearance-from-Motion embedding, enabling 3D
Gaussians to better model image appearance variations across different camera
poses. Second, we introduce a frequency regularization pyramid to guide the distribution
of Gaussians, allowing the model to effectively capture finer details in the scene.
Extensive experiments on monocular, stereo, and RGB-D datasets demonstrate that
Scaffold-SLAM significantly outperforms state-of-the-art methods in photorealistic
mapping quality, e.g., PSNR is 16.76% higher in the TUM RGB-D datasets for monocular
cameras.
'
project_page: null
paper: https://arxiv.org/pdf/2501.05242.pdf
code: null
video: null
tags:
- SLAM
thumbnail: assets/thumbnails/tianci2025scaffoldslam.jpg
publication_date: '2025-01-09T13:50:26+00:00'
date_source: arxiv
- id: bond2025gaussianvideo
title: 'GaussianVideo: Efficient Video Representation via Hierarchical Gaussian
Splatting'
authors: Andrew Bond, Jui-Hsien Wang, Long Mai, Erkut Erdem, Aykut Erdem
year: '2025'
abstract: 'Efficient neural representations for dynamic video scenes are critical
for applications ranging from video compression to interactive simulations. Yet,
existing methods often face challenges related to high memory usage, lengthy training
times, and temporal consistency. To address these issues, we introduce a novel
neural video representation that combines 3D Gaussian splatting with continuous
camera motion modeling. By leveraging Neural ODEs, our approach learns smooth
camera trajectories while maintaining an explicit 3D scene representation through
Gaussians. Additionally, we introduce a spatiotemporal hierarchical learning strategy,
progressively refining spatial and temporal features to enhance reconstruction
quality and accelerate convergence. This memory-efficient approach achieves high-quality
rendering at impressive speeds. Experimental results show that our hierarchical
learning, combined with robust camera motion modeling, captures complex dynamic
scenes with strong temporal consistency, achieving state-of-the-art performance
across diverse video datasets in both high- and low-motion scenarios.
'
project_page: https://cyberiada.github.io/GaussianVideo/
paper: https://arxiv.org/pdf/2501.04782.pdf
code: null
video: null
tags:
- Gaussian Video
- Project
- Video
thumbnail: assets/thumbnails/bond2025gaussianvideo.jpg
publication_date: '2025-01-08T19:01:12+00:00'
date_source: arxiv
- id: huang2025fatesgs
title: 'FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian
Splatting with Depth-Feature Consistency'
authors: Han Huang, Yulun Wu, Chao Deng, Ge Gao, Ming Gu, Yu-Shen Liu
year: '2025'
abstract: 'Recently, Gaussian Splatting has sparked a new trend in the field of
computer vision. Apart from novel view synthesis, it has also been extended to
the area of multi-view reconstruction. The latest methods facilitate complete,
detailed surface reconstruction while ensuring fast training speed. However, these
methods still require dense input views, and their output quality significantly
degrades with sparse views. We observed that the Gaussian primitives tend to overfit
the few training views, leading to noisy floaters and incomplete reconstruction
surfaces. In this paper, we present an innovative sparse-view reconstruction framework
that leverages intra-view depth and multi-view feature consistency to achieve
remarkably accurate surface reconstruction. Specifically, we utilize monocular
depth ranking information to supervise the consistency of depth distribution within
patches and employ a smoothness loss to enhance the continuity of the distribution.
To achieve finer surface reconstruction, we optimize the absolute position of
depth through multi-view projection features. Extensive experiments on DTU and
BlendedMVS demonstrate that our method outperforms state-of-the-art methods with
a speedup of 60x to 200x, achieving swift and fine-grained mesh reconstruction
without the need for costly pre-training.
'
project_page: https://alvin528.github.io/FatesGS/
paper: https://arxiv.org/pdf/2501.04628.pdf
code: null
video: null
tags:
- Meshing
- Project
- Sparse
thumbnail: assets/thumbnails/huang2025fatesgs.jpg
publication_date: '2025-01-08T17:19:35+00:00'
date_source: arxiv
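# Illustrative sketch (not the paper's actual loss): the FatesGS entry above
# supervises depth with monocular depth *ranking* information inside patches. A
# common form of such supervision is a pairwise ranking hinge: for pixel pairs
# ordered by the monocular prior, penalise rendered depths that violate the
# ordering. The margin, pair sampling, and function name are assumptions.
#
#   import numpy as np
#
#   def depth_ranking_loss(rendered, prior, idx_a, idx_b, margin=1e-4):
#       """rendered, prior: flat per-pixel depths; idx_a/idx_b: indices of sampled
#       pixel pairs. Penalise pairs whose rendered ordering contradicts the prior."""
#       sign = np.sign(prior[idx_a] - prior[idx_b])               # ordering from the prior
#       diff = sign * (rendered[idx_a] - rendered[idx_b])         # > 0 if consistent
#       return float(np.mean(np.maximum(0.0, margin - diff)))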
- id: kwak2025modecgs
title: 'MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment
for Compact Dynamic 3D Gaussian Splatting'
authors: Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh,
Munchurl Kim
year: '2025'
abstract: '3D Gaussian Splatting (3DGS) has made significant strides in scene representation
and neural rendering, with intense efforts focused on adapting it for dynamic
scenes. Despite delivering remarkable rendering quality and speed, existing methods
struggle with storage demands and representing complex real-world motions. To
tackle these issues, we propose MoDec-GS, a memory-efficient Gaussian splatting
framework designed for reconstructing novel views in challenging scenarios with
complex motions. We introduce Global-to-Local Motion Decomposition (GLMD) to effectively
capture dynamic motions in a coarse-to-fine manner. This approach leverages Global
Canonical Scaffolds (Global CS) and Local Canonical Scaffolds (Local CS), extending
static Scaffold representation to dynamic video reconstruction. For Global CS,
we propose Global Anchor Deformation (GAD) to efficiently represent global dynamics
along complex motions, by directly deforming the implicit Scaffold attributes
which are anchor position, offset, and local context features. Next, we finely
adjust local motions via the Local Gaussian Deformation (LGD) of Local CS explicitly.
Additionally, we introduce Temporal Interval Adjustment (TIA) to automatically
control the temporal coverage of each Local CS during training, allowing MoDec-GS
to find optimal interval assignments based on the specified number of temporal
segments. Extensive evaluations demonstrate that MoDec-GS achieves an average 70%
reduction in model size over state-of-the-art methods for dynamic 3D Gaussians from
real-world dynamic videos while maintaining or even improving rendering quality.
'
project_page: null
paper: https://arxiv.org/pdf/2501.03714.pdf
code: null
video: https://youtu.be/5L6gzc5-cw8?si=L6v6XLZFQrYK50iV
tags:
- Compression
- Dynamic
- Project
- Video
thumbnail: assets/thumbnails/kwak2025modecgs.jpg
publication_date: '2025-01-07T11:43:13+00:00'
date_source: arxiv
- id: yu2025dehazegs
title: 'DehazeGS: Seeing Through Fog with 3D Gaussian Splatting'
authors: Jinze Yu, Yiqun Wang, Zhengda Lu, Jianwei Guo, Yong Li, Hongxing Qin, Xiaopeng
Zhang
year: '2025'
abstract: 'Current novel view synthesis tasks primarily rely on high-quality and
clear images. However, in foggy scenes, scattering and attenuation can significantly
degrade the reconstruction and rendering quality. Although NeRF-based dehazing
reconstruction algorithms have been developed, their use of deep fully connected
neural networks and per-ray sampling strategies leads to high computational costs.
Moreover, NeRF''s implicit representation struggles to recover fine details from
hazy scenes. In contrast, recent advancements in 3D Gaussian Splatting achieve
high-quality 3D scene reconstruction by explicitly modeling point clouds into
3D Gaussians. In this paper, we propose leveraging the explicit Gaussian representation
to explain the foggy image formation process through a physically accurate forward
rendering process. We introduce DehazeGS, a method capable of decomposing and
rendering a fog-free background from participating media using only multi-view
foggy images as input. We model the transmission within each Gaussian distribution
to simulate the formation of fog. During this process, we jointly learn the atmospheric
light and scattering coefficient while optimizing the Gaussian representation
of the hazy scene. In the inference stage, we eliminate the effects of scattering
and attenuation on the Gaussians and directly project them onto a 2D plane to
obtain a clear view. Experiments on both synthetic and real-world foggy datasets
demonstrate that DehazeGS achieves state-of-the-art performance in terms of both
rendering quality and computational efficiency.
'
project_page: null
paper: https://arxiv.org/pdf/2501.03659.pdf
code: null
video: null
tags:
- In the Wild
- Rendering
thumbnail: assets/thumbnails/yu2025dehazegs.jpg
publication_date: '2025-01-07T09:47:46+00:00'
date_source: arxiv
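# Illustrative sketch (background, not the paper's code): DehazeGS above jointly
# learns the atmospheric light and scattering coefficient while modelling
# transmission. The standard atmospheric scattering model it builds on is
# I = J * t + A * (1 - t) with t = exp(-beta * depth); the shapes below are
# assumptions.
#
#   import numpy as np
#
#   def hazy_from_clear(J, depth, A, beta):
#       """Compose a foggy image from a clear image J (H,W,3), per-pixel depth (H,W),
#       airlight A (3,), and scattering coefficient beta (scalar)."""
#       t = np.exp(-beta * depth)[..., None]      # transmission, (H, W, 1)
#       return J * t + np.asarray(A) * (1.0 - t)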
- id: lee2025compression
title: Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard
Video Codecs
authors: Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge
year: '2025'
abstract: '3D Gaussian Splatting is a recognized method for 3D scene representation,
known for its high rendering quality and speed. However, its substantial data
requirements present challenges for practical applications. In this paper, we
introduce an efficient compression technique that significantly reduces storage
overhead by using compact representation. We propose a unified architecture that
combines point cloud data and feature planes through a progressive tri-plane structure.
Our method utilizes 2D feature planes, enabling continuous spatial representation.
To further optimize these representations, we incorporate entropy modeling in
the frequency domain, specifically designed for standard video codecs. We also
propose channel-wise bit allocation to achieve a better trade-off between bitrate
consumption and feature plane representation. Consequently, our model effectively
leverages spatial correlations within the feature planes to enhance rate-distortion
performance using standard, non-differentiable video codecs. Experimental results
demonstrate that our method outperforms existing methods in data compactness while
maintaining high rendering quality. Our project page is available at https://fraunhoferhhi.github.io/CodecGS
'
project_page: https://fraunhoferhhi.github.io/CodecGS
paper: https://arxiv.org/pdf/2501.03399.pdf
code: null
video: null
tags:
- Compression
thumbnail: assets/thumbnails/lee2025compression.jpg
publication_date: '2025-01-06T21:37:30+00:00'
date_source: arxiv
- id: rajasegaran2025gaussian
title: Gaussian Masked Autoencoders
authors: Jathushan Rajasegaran, Xinlei Chen, Rulilong Li, Christoph Feichtenhofer,
Jitendra Malik, Shiry Ginosar
year: '2025'
abstract: 'This paper explores Masked Autoencoders (MAE) with Gaussian Splatting.
While reconstructive self-supervised learning frameworks such as MAE learn good
semantic abstractions, they are not trained for explicit spatial awareness. Our approach,
named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions
and spatial understanding jointly. Like MAE, it reconstructs the image end-to-end
in the pixel space, but beyond MAE, it also introduces an intermediate, 3D Gaussian-based
representation and renders images via splatting. We show that GMAE can enable
various zero-shot learning capabilities of spatial understanding (e.g., figure-ground
segmentation, image layering, edge detection, etc.) while preserving the high-level
semantics of self-supervised representation quality from MAE. To our knowledge,
we are the first to employ Gaussian primitives in an image representation learning
framework beyond optimization-based single-scene reconstructions. We believe GMAE
will inspire further research in this direction and contribute to developing next-generation
techniques for modeling high-fidelity visual data. More details at https://brjathu.github.io/gmae
'
project_page: https://brjathu.github.io/gmae
paper: https://arxiv.org/pdf/2501.03229.pdf
code: null
video: null
tags:
- Transformer
thumbnail: assets/thumbnails/rajasegaran2025gaussian.jpg
publication_date: '2025-01-06T18:59:57+00:00'
date_source: arxiv
- id: nguyen2025pointmapconditioned
title: Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis
authors: Thang-Anh-Quan Nguyen, Nathan Piasco, Luis Roldão, Moussab Bennehar, Dzmitry
Tsishkou, Laurent Caraffa, Jean-Philippe Tarel, Roland Brémond
year: '2025'
abstract: 'In this paper, we present PointmapDiffusion, a novel framework for single-image
novel view synthesis (NVS) that utilizes pre-trained 2D diffusion models. Our
method is the first to leverage pointmaps (i.e. rasterized 3D scene coordinates)
as a conditioning signal, capturing geometric prior from the reference images
to guide the diffusion process. By embedding reference attention blocks and a
ControlNet for pointmap features, our model balances between generative capability
and geometric consistency, enabling accurate view synthesis across varying viewpoints.
Extensive experiments on diverse real-world datasets demonstrate that PointmapDiffusion
achieves high-quality, multi-view consistent results with significantly fewer
trainable parameters compared to other baselines for single-image NVS tasks.
'
project_page: null
paper: https://arxiv.org/pdf/2501.02913.pdf
code: null
video: null
tags:
- Diffusion
thumbnail: assets/thumbnails/nguyen2025pointmapconditioned.jpg
publication_date: '2025-01-06T10:48:31+00:00'
date_source: arxiv
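# Illustrative sketch (background, not the paper's code): the entry above conditions
# diffusion on pointmaps, i.e. per-pixel rasterized 3D scene coordinates. One
# standard way to obtain a pointmap is to unproject a depth map with the camera
# intrinsics; the camera-frame convention below is an assumption.
#
#   import numpy as np
#
#   def pointmap_from_depth(depth, K):
#       """depth: (H, W) metric depth; K: (3, 3) intrinsics. Returns an (H, W, 3)
#       map of camera-frame 3D coordinates (one 3D point per pixel)."""
#       H, W = depth.shape
#       ys, xs = np.mgrid[0:H, 0:W]
#       pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)  # (H, W, 3)
#       rays = pix @ np.linalg.inv(K).T                                    # K^-1 [u, v, 1]^T
#       return rays * depth[..., None]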
- id: bian2025gsdit
title: 'GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through
Efficient Dense 3D Point Tracking'
authors: Weikang Bian, Zhaoyang Huang, Xiaoyu Shi, Yijin Li, Fu-Yun Wang, Hongsheng
Li
year: '2025'
abstract: '4D video control is essential in video generation as it enables the use
of sophisticated lens techniques, such as multi-camera shooting and dolly zoom,
which are currently unsupported by existing methods. Training a video Diffusion