01 Perception: Object Detection
1. BEV Perception
BEV Camera
view transformation
- 2D -> 3D via depth estimation
- 3D -> 2D (starts from points in 3D space)
- purely network-based (implicit)
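The two explicit directions above can be illustrated with a plain pinhole camera model. This is a minimal sketch, not taken from any of the listed papers: the function names and the intrinsics (`fx`, `fy`, `cx`, `cy`) are assumptions chosen for illustration, and lens distortion and camera extrinsics are ignored.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """2D -> 3D: lift pixel (u, v) with an estimated depth to a camera-frame point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def project(point, fx, fy, cx, cy):
    """3D -> 2D: project a camera-frame point onto the image plane."""
    x, y, z = point
    if z <= 0:
        return None  # point is behind the camera, so it has no valid pixel
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics for a 1280x720 image.
fx = fy = 1000.0
cx, cy = 640.0, 360.0

point = unproject(740.0, 360.0, 20.0, fx, fy, cx, cy)  # (2.0, 0.0, 20.0)
pixel = project(point, fx, fy, cx, cy)                 # (740.0, 360.0)
```

The round trip only recovers the pixel because the depth is known; in the 2D -> 3D family the depth has to be estimated, while the 3D -> 2D family starts from 3D locations and only needs `project`.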
BEV Lidar
voxelization and 3d convs
- VoxelNet 2018 (voxelization -> 3D convs -> flatten height dim -> RPN)
- SECOND 2018 (sparse 3D convs)
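The voxelization step that opens this pipeline can be sketched in a few lines. This is a simplified illustration under stated assumptions: `voxelize` and `flatten_height` are hypothetical helper names, and the height-flattening step is stood in for by a per-cell point count, whereas real networks collapse learned features rather than counts.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size, range_min, range_max):
    """Group (x, y, z) points by integer voxel index; drop points outside the range."""
    voxels = defaultdict(list)
    for p in points:
        if any(c < lo or c >= hi for c, lo, hi in zip(p, range_min, range_max)):
            continue
        idx = tuple(math.floor((c - lo) / s)
                    for c, lo, s in zip(p, range_min, voxel_size))
        voxels[idx].append(p)
    return dict(voxels)


def flatten_height(voxels):
    """Collapse the z index: count points per (ix, iy) BEV cell."""
    bev = defaultdict(int)
    for (ix, iy, _iz), pts in voxels.items():
        bev[(ix, iy)] += len(pts)
    return dict(bev)


points = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.9), (1.5, 0.2, 0.4), (9.0, 9.0, 9.0)]
voxels = voxelize(points, (0.5, 0.5, 0.5), (0.0, 0.0, 0.0), (2.0, 2.0, 1.0))
bev = flatten_height(voxels)  # {(0, 0): 2, (3, 0): 1}
```

Only occupied voxels are stored, which is the property sparse 3D convolutions exploit. Setting the z voxel size to the full height range yields one voxel per (x, y) column, i.e. pillars.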
No 3d convs (faster)
- PointNet 2016 (no voxels; MLPs encode per-point features)
- PointPillars 2019 (voxelization into vertical pillars that form the BEV grid)
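The PointNet-style encoding mentioned above, a shared MLP applied to each point followed by a symmetric pooling, can be sketched as follows. This is an illustrative toy rather than the published architecture: it uses a single linear layer with hand-picked weights (an assumption; real weights are learned), and the function names are hypothetical.

```python
def shared_linear_relu(points, weights, bias):
    """Apply the same linear layer + ReLU to every point independently."""
    features = []
    for p in points:
        feat = [max(0.0, sum(w * x for w, x in zip(row, p)) + b)
                for row, b in zip(weights, bias)]
        features.append(feat)
    return features


def max_pool(features):
    """Symmetric aggregation: channel-wise max over all points."""
    return [max(channel) for channel in zip(*features)]


# Hand-picked weights mapping 3 input coordinates to 4 feature channels.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 1.0, 1.0]]
bias = [0.0, 0.0, 0.0, 0.0]

points = [(1.0, 2.0, 0.5), (-1.0, 0.0, 3.0), (0.5, 0.5, 0.5)]
global_feature = max_pool(shared_linear_relu(points, weights, bias))
# [1.0, 2.0, 3.0, 3.5]
```

Because the max is taken over the set of points, reordering the points leaves the result unchanged, which is why no voxel grid is needed to impose an order.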
BEV Fusion
- BEVFusion, MIT (efficient perspective-view -> BEV transformation)
temporal fusion
- BEVDet4D 2022 (spatial alignment; concatenation of multiple feature maps)
- BEVFormer 2022 (fuses temporal information in a soft way) ★
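The "spatial alignment, then concatenation" idea can be sketched on a tiny grid. This is a heavily simplified illustration under loud assumptions: the ego motion is a pure translation by a whole number of cells, each cell holds a single scalar feature, and `shift_bev` and `concat_channels` are hypothetical names. Real systems also handle rotation and sub-cell interpolation.

```python
def shift_bev(bev, dx_cells, dy_cells, fill=0.0):
    """Move grid content by whole cells (columns: dx, rows: dy), padding with `fill`."""
    h, w = len(bev), len(bev[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            src_r, src_c = r - dy_cells, c - dx_cells
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = bev[src_r][src_c]
    return out


def concat_channels(curr, prev_aligned):
    """Stack current and aligned previous features per cell."""
    return [[(a, b) for a, b in zip(row_c, row_p)]
            for row_c, row_p in zip(curr, prev_aligned)]


# A static object seen at the grid center in the previous frame ...
prev = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
# ... appears one row lower in the current frame because the ego vehicle moved.
curr = [[0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]

aligned = shift_bev(prev, dx_cells=0, dy_cells=1)
fused = concat_channels(curr, aligned)  # fused[2][1] == (1.0, 1.0)
```

Without the alignment step, the two frames would place the same static object in different cells and the concatenated features would disagree.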
1.1 dense bev feature
- LSS 2020 (first use of a depth distribution)
- BEVDet 2022 (BEV-space data augmentation)
- BEVDepth 2022 (depth correction)
- BEVFusion 2022 (fusion of camera and LiDAR features in BEV)
- Cam2BEV 2020 (homographic projection to BEV)
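The depth-distribution idea behind the LSS-style lift can be sketched for a single pixel: the network predicts a categorical distribution over discrete depth bins, and the pixel's context feature is spread along the camera ray in proportion to those probabilities. The sketch below assumes the logits and context vector are already given (in practice both are network outputs), and `lift_pixel` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift_pixel(depth_logits, context):
    """Outer product: one copy of the context vector per depth bin, weighted by its probability."""
    probs = softmax(depth_logits)
    return [[p * c for c in context] for p in probs]


# 4 depth bins, 2 context channels; the third bin is the most likely depth.
frustum = lift_pixel([0.0, 1.0, 3.0, 1.0], [2.0, 4.0])
# frustum has shape (4 bins, 2 channels); summing over bins recovers the context vector.
```

Repeating this for every pixel of every camera gives a frustum of features, which is then pooled into BEV cells using the camera geometry.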
1.2 Attention Mechanism
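- DETR
- DAB-DETR (slow convergence because the learnable queries provide no positional prior)
- DN-DETR (slow convergence because the discreteness of Hungarian matching, combined with the randomness of model training, makes the query-to-GT matching a dynamic, unstable process)
- Deformable DETR
- DETR3D
- PETR 2022 (implicit BEV positional embedding)
- BEVFormer 2022 (Transformer for BEV features)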
Sparse3D
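The instability of bipartite matching noted for DN-DETR can be made concrete with a toy example. The sketch below finds the minimum-cost one-to-one assignment by brute force over permutations, which gives the same result as the Hungarian algorithm on tiny inputs but is not that algorithm; the cost values are made up for illustration, and `match` is a hypothetical name.

```python
from itertools import permutations


def match(cost):
    """Return ({query: gt}, total_cost) for the minimum-cost one-to-one assignment.

    cost[q][g] is the matching cost between query q and ground truth g.
    Assumes there are at least as many queries as ground truths.
    """
    n_queries, n_gt = len(cost), len(cost[0])
    best, best_cost = None, float("inf")
    for queries in permutations(range(n_queries), n_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_cost:
            best, best_cost = queries, total
    return {q: g for g, q in enumerate(best)}, best_cost


# Two queries, two ground-truth boxes, nearly tied costs.
before = [[0.50, 0.52],
          [0.51, 0.50]]
# The same costs after a small change in one prediction.
after = [[0.54, 0.52],
         [0.51, 0.50]]

assignment_before, _ = match(before)  # {0: 0, 1: 1}
assignment_after, _ = match(after)    # {1: 0, 0: 1}
```

A cost change of 0.04 swaps which query is supervised by which ground truth, so each query's regression target can jump between training steps; this is the discrete, unstable behaviour the DN-DETR note describes.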
