Datasets for Autonomous Driving


KITTI dataset

The KITTI dataset was created jointly by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago. It is a public dataset collected by driving a fully instrumented vehicle through real traffic scenes. The dataset offers rich and diverse sensor data (stereo cameras, a 64-beam LiDAR, and a GPS/IMU integrated navigation system, covering most needs for image, point cloud, and localization data), a large amount of ground-truth annotation (including 2D and 3D detection bounding boxes and tracking tracklets), and official development tools.

[Figure: the KITTI data collection platform and its sensors]

The KITTI data collection platform is equipped with 2 grayscale cameras, 2 color cameras, 1 Velodyne 64-beam 3D LiDAR, 4 optical lenses, and 1 GPS navigation system. To simplify sensor calibration, the coordinate systems are fixed as follows (a sketch of the nominal axis remapping follows the list):

  • Camera: x = right, y = down, z = forward
  • Velodyne: x = forward, y = left, z = up
  • GPS/IMU: x = forward, y = left, z = up
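As a quick illustration of these conventions, here is a minimal sketch (assuming NumPy; it ignores the real extrinsic calibration and uses only the nominal axis definitions above) of how a point in Velodyne coordinates maps onto the camera axes:

```python
import numpy as np

# Nominal axis remapping from Velodyne (x=forward, y=left, z=up)
# to camera (x=right, y=down, z=forward), ignoring calibration offsets.
VELO_TO_CAM_AXES = np.array([
    [0, -1,  0],   # camera x (right)   = -velodyne y (left)
    [0,  0, -1],   # camera y (down)    = -velodyne z (up)
    [1,  0,  0],   # camera z (forward) =  velodyne x (forward)
])

p_velo = np.array([5.0, 1.0, 0.2])  # 5 m ahead, 1 m to the left, 0.2 m up
p_cam = VELO_TO_CAM_AXES @ p_velo   # -> [-1.0, -0.2, 5.0]
```

The real Velodyne-to-camera transform additionally has small rotation and translation offsets; those live in the calib files described below.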

Academia is fond of this dataset, but by industry standards the amount of data is small.

The raw data is grouped into five categories: Road, City, Residential, Campus, and Person.


The raw data was collected on 5 days in 2011 and totals 180 GB.

Dataset composition:

  • Stereo image and optical flow pairs: 389
  • Visual odometry sequences: 39.2 km
  • Images with 3D-annotated objects: over 200k
  • Sampling frequency: 10 Hz

3D object detection classes: Car, Van, Truck, Pedestrian, Person_sitting, Cyclist, Tram, Misc

calib: calibration data

The calib files hold the calibration data for the cameras, LiDAR, and IMU/GPS sensors. Taking 000001.txt as an example, the content is as follows (a parsing and projection sketch follows the field descriptions below):

```
P0: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 0.000000000000e+00 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.875744000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 4.485728000000e+01 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.163791000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.745884000000e-03
P3: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.395242000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.199936000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.729905000000e-03
R0_rect: 9.999239000000e-01 9.837760000000e-03 -7.445048000000e-03 -9.869795000000e-03 9.999421000000e-01 -4.278459000000e-03 7.402527000000e-03 4.351614000000e-03 9.999631000000e-01
Tr_velo_to_cam: 7.533745000000e-03 -9.999714000000e-01 -6.166020000000e-04 -4.069766000000e-03 1.480249000000e-02 7.280733000000e-04 -9.998902000000e-01 -7.631618000000e-02 9.998621000000e-01 7.523790000000e-03 1.480755000000e-02 -2.717806000000e-01
Tr_imu_to_velo: 9.999976000000e-01 7.553071000000e-04 -2.035826000000e-03 -8.086759000000e-01 -7.854027000000e-04 9.998898000000e-01 -1.482298000000e-02 3.195559000000e-01 2.024406000000e-03 1.482454000000e-02 9.998881000000e-01 -7.997231000000e-01
```
  • P0, P1, P2, P3 are numbered by camera: 0 and 1 are the left and right grayscale cameras; 2 and 3 are the left and right color cameras. The values after each colon form the camera's projection matrix, of size $3 \times 4$.
  • R0_rect is the rectification matrix of camera 0. In practice the $3 \times 3$ matrix has to be expanded to $4 \times 4$ by appending a fourth row and column of zeros and setting element (4,4) to 1.
  • Tr_velo_to_cam is the rigid transformation from the LiDAR to the camera, of size $3 \times 4$, containing the rotation matrix R and the translation vector t.
  • Tr_imu_to_velo is the rigid transformation from the IMU/GPS unit to the LiDAR.
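Putting these fields together, here is a minimal sketch (assuming NumPy; the function names are mine) that parses a calib file and projects LiDAR points into the left color image via $P2 \cdot R0\_rect \cdot Tr\_velo\_to\_cam \cdot x_{velo}$:

```python
import numpy as np

def read_calib(path):
    """Parse a KITTI calib file into a dict of flat numpy arrays."""
    calib = {}
    with open(path) as f:
        for line in f:
            if ':' not in line:
                continue
            key, values = line.split(':', 1)
            calib[key.strip()] = np.array([float(v) for v in values.split()])
    return calib

def velo_to_image(points_velo, calib):
    """Project Nx3 LiDAR points into camera 2 (left color) pixel coordinates."""
    P2 = calib['P2'].reshape(3, 4)

    # Expand R0_rect from 3x3 to 4x4 as described above.
    R0 = np.eye(4)
    R0[:3, :3] = calib['R0_rect'].reshape(3, 3)

    # Expand Tr_velo_to_cam from 3x4 to 4x4.
    Tr = np.eye(4)
    Tr[:3, :4] = calib['Tr_velo_to_cam'].reshape(3, 4)

    # Homogeneous coordinates, then chain the transforms.
    pts = np.hstack([points_velo, np.ones((points_velo.shape[0], 1))])
    cam = R0 @ Tr @ pts.T        # 4xN points in rectified camera coordinates
    img = P2 @ cam               # 3xN homogeneous pixel coordinates
    return (img[:2] / img[2]).T  # Nx2 (u, v); only meaningful where cam z > 0
```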

label files

The label files contain KITTI's object annotations and evaluation data. Taking "000001.txt" as an example, the content looks like this:

```
Truck 0.00 0 -1.57 599.41 156.40 629.75 189.25 2.85 2.63 12.34 0.47 1.49 69.44 -1.56
Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57
Cyclist 0.00 3 -1.65 676.60 163.95 688.98 193.93 1.86 0.60 2.02 4.59 1.32 45.84 -1.55
DontCare -1 -1 -10 503.89 169.71 590.61 190.13 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 511.35 174.96 527.81 187.45 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 532.37 176.35 542.68 185.27 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 559.62 175.83 575.40 183.15 -1 -1 -1 -1000 -1000 -1000 -10
```

There are 16 columns in total (a parsing sketch follows this list):

  • Column 1 (string): the object class (type). There are 9 classes: Car, Van, Truck, Pedestrian, Person_sitting, Cyclist, Tram, Misc, and DontCare. The DontCare label marks regions that were not annotated, for example because the object is too far from the LiDAR. To keep regions that contain real objects but were left unannotated from being counted as false positives during evaluation (mainly when computing precision), the evaluation script automatically ignores predictions inside DontCare regions.

  • Column 2 (float): truncation (truncated). A value between 0 (not truncated) and 1 (truncated) indicating how far the object extends beyond the image boundary.

  • Column 3 (integer): occlusion (occluded). 0: fully visible; 1: partly occluded; 2: largely occluded; 3: unknown.

  • Column 4 (radians): the observation angle of the object (alpha), in the range $[-\pi, \pi]$. In the camera coordinate system, rotate the object about the camera $y$-axis, around the camera origin, until the ray from the origin to the object center aligns with the camera $z$-axis; alpha is then the angle between the object's heading and the camera $x$-axis, as shown in the figure below.

[Figure: observation angle alpha and azimuth of a 3D object]

From the figure, $r_y + \pi/2 - \theta = \alpha + \pi/2$ (the two purple angles are equal), hence $\alpha = r_y - \theta$, where $\theta$ is the azimuth of the object center.
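A quick numeric check against the Car row of the example label file above (location x = -16.53, z = 58.49, rotation_y = 1.57), which reproduces the listed alpha of 1.85:

```python
import numpy as np

# theta is the azimuth of the object center as seen from the camera.
theta = np.arctan2(-16.53, 58.49)  # about -0.276 rad
alpha = 1.57 - theta               # r_y - theta = about 1.846, matching the alpha column (1.85)
```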

  • Columns 5-8 (float): the 2D bounding box (bbox). The four values are xmin, ymin, xmax, ymax (in pixels): the top-left and bottom-right corners of the 2D box.
  • Columns 9-11 (float): the 3D object dimensions (dimensions): height, width, length (in meters).
  • Columns 12-14 (float): the 3D object location (location): x, y, z (in meters). Note that these are the coordinates of the 3D object's center in the camera coordinate system.
  • Column 15 (radians): the global orientation of the object (rotation_y), in the range $[-\pi, \pi]$: the angle between the object's heading and the camera $x$-axis in the camera coordinate system, i.e. the angle $r_y$ in the figure above.
  • Column 16 (float): the detection confidence (score). Used only in result files: a float required for the precision/recall curve, higher is better. Note that this column exists only in detections on the test set, not in the ground-truth labels.


KITTI stereo camera

KITTI depth dataset

This benchmark is related to our work published in Sparsity Invariant CNNs (THREEDV 2017). It contains over 93 thousand depth maps with corresponding raw LiDaR scans and RGB images, aligned with the “raw data” of the KITTI dataset. Given the large amount of training data, this dataset shall allow a training of complex deep learning models for the tasks of depth completion and single image depth prediction. Also, we provide manually selected images with unpublished depth maps to serve as a benchmark for those two challenging tasks.

This dataset lives alongside the main dataset, under the raw data directory.

This is our single image depth prediction evaluation, where dense depth maps have to be predicted from a single RGB image input. It consists of 93k training and 1k eval as well as 500 test images.

So the train, val, and test splits contain 93k, 1k, and 500 images respectively.
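For reference, a minimal loader sketch: the depth maps are 16-bit PNGs where depth in meters = pixel value / 256 and 0 marks pixels with no measurement (PIL and the function name are my choices):

```python
import numpy as np
from PIL import Image

def load_depth(png_path):
    """Load a KITTI depth map as a float array in meters; invalid pixels become NaN."""
    depth_png = np.array(Image.open(png_path), dtype=np.uint16)
    assert depth_png.max() > 255, "expected a 16-bit depth PNG, not an 8-bit image"
    depth = depth_png.astype(np.float32) / 256.0
    depth[depth_png == 0] = np.nan  # 0 means no LiDAR measurement at this pixel
    return depth
```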

Odometry (visual odometry)

This is a separate task; we need to check whether the vehicle carries the corresponding sensors.

nuScenes dataset


nuScenes publisher: Motional

Motional is an autonomous driving company, a joint venture between Hyundai Motor Group and Aptiv, dedicated to making driverless vehicles safe, reliable, and accessible.

nuScenes is a large public dataset for autonomous driving developed by the Motional team. To support public research in computer vision and autonomous driving, Motional released part of the nuScenes data. The dataset includes not only camera and LiDAR data but also radar recordings (it may be the only public dataset that includes radar data).

nuScenes was collected in Boston and Singapore, two cities with dense traffic and highly challenging driving conditions. The dataset consists of 1000 scenes, each 20 seconds long and covering a diverse range of situations. Each scene has 40 keyframes, i.e. 2 keyframes per second; all other frames are sweeps. Keyframes are manually annotated, each with a number of annotations in the form of bounding boxes.

v1.0-mini contains the following folders:

```
maps
sweeps
v1.0-mini
samples
```

http://47.94.35.231:8090/notebooks/notebook/nuscenes-devkit/python-sdk/tutorials/nuscenes_tutorial.ipynb#

This is a good tool for working with the nuScenes dataset.

A gentle introduction to nuScenes

  1. log - Log information from which the data was extracted.
  2. scene - 20 second snippet of a car’s journey.
  3. sample - An annotated snapshot of a scene at a particular timestamp.
  4. sample_data - Data collected from a particular sensor.
  5. ego_pose - Ego vehicle poses at a particular timestamp.
  6. sensor - A specific sensor type.
  7. calibrated_sensor - Definition of a particular sensor as calibrated on a particular vehicle.
  8. instance - Enumeration of all object instances we observed.
  9. category - Taxonomy of object categories (e.g. vehicle, human).
  10. attribute - Property of an instance that can change while the category remains the same.
  11. visibility - Fraction of pixels visible in all the images collected from 6 different cameras.
  12. sample_annotation - An annotated instance of an object within our interest.
  13. map - Map data that is stored as binary semantic masks from a top-down view.

nuScenes provides a mini version of the dataset (https://www.nuscenes.org/data/v1.0-mini.tgz) for testing and learning.
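A minimal sketch of loading the mini split with the nuscenes-devkit (`pip install nuscenes-devkit`); the dataroot path is a placeholder for wherever the tgz was extracted:

```python
from nuscenes.nuscenes import NuScenes

# dataroot is an assumption: point it at the extracted v1.0-mini.tgz.
nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

scene = nusc.scene[0]                                     # one of the mini scenes
sample = nusc.get('sample', scene['first_sample_token'])  # first annotated keyframe

# Follow the schema: sample -> sample_data -> calibrated_sensor.
cam_front = nusc.get('sample_data', sample['data']['CAM_FRONT'])
calib = nusc.get('calibrated_sensor', cam_front['calibrated_sensor_token'])
print(cam_front['filename'], calib['translation'])
```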

http://47.94.35.231:8090/notebooks/notebook/nuscenes-devkit/python-sdk/tutorials/nuscenes_tutorial.ipynb

Work through this notebook (TODO).

Study notes for https://github.com/nutonomy/nuscenes-devkit

https://github.com/nutonomy/nuscenes-devkit/tree/master/python-sdk/nuscenes/can_bus

This page describes the Controller Area Network (CAN) bus expansion for the nuScenes dataset. I don't understand this yet, but it isn't important here.

http://47.94.35.231:8090/notebooks/notebook/nuscenes-devkit/python-sdk/tutorials/map_expansion_tutorial.ipynb

In database terms, layers are basically tables of the map database in which we assign arbitrary parts of the maps with informative labels such as traffic_light, stop_line, walkway, etc. Refer to the discussion on layers for more details.

This mentions traffic_light, so it looks useful. Worth trying in Jupyter.

http://47.94.35.231:8090/notebooks/notebook/nuscenes-devkit/python-sdk/tutorials/nuscenes_lidarseg_panoptic_tutorial.ipynb

This covers LiDAR-related tasks: segmentation and panoptic segmentation.

http://47.94.35.231:8090/notebooks/notebook/nuscenes-devkit/python-sdk/tutorials/prediction_tutorial.ipynb

This notebook doesn't seem very useful.

How can I participate in the nuScenes challenges?

SOTA methods can be found here.

Can I use nuScenes for 2d object detection?

  • Objects in nuScenes are annotated in 3d.
  • You can use this script to project them to 2d, but note that such 2d boxes are not generally tight.

Note that nuScenes is primarily a 3D dataset.
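As a sketch of that projection idea (not the official export script itself), continuing from the devkit snippet above and assuming the first annotation is visible in CAM_FRONT:

```python
import numpy as np
from nuscenes.utils.geometry_utils import view_points

ann_token = sample['anns'][0]  # first annotation of the keyframe
data_path, boxes, cam_intrinsic = nusc.get_sample_data(
    cam_front['token'], selected_anntokens=[ann_token])

for box in boxes:              # boxes come back already in the camera frame
    corners = view_points(box.corners(), cam_intrinsic, normalize=True)[:2]  # 2x8 pixels
    xmin, ymin = corners.min(axis=1)
    xmax, ymax = corners.max(axis=1)  # axis-aligned hull of the 8 corners: not tight
    print(box.name, xmin, ymin, xmax, ymax)
```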

All labels in the annotations

  1. Car or Van or SUV: Vehicle designed primarily for personal use, e.g. sedans, hatch-backs, wagons, vans, mini-vans, SUVs and jeeps.

  2. Truck: Vehicles primarily designed to haul cargo including pick-ups, lorrys, trucks and semi-tractors. Trailers hauled after a semi-tractor should be labeled as “Trailer”.

    • Pickup Truck: A pickup truck is a light duty truck with an enclosed cab and an open or closed cargo area. A pickup truck can be intended primarily for hauling cargo or for personal use.

    • Front Of Semi Truck: Tractor part of a semi trailer truck. Trailers hauled after a semi-tractor should be labeled as a trailer.

  3. Bendy Bus: Buses and shuttles designed to carry more than 10 people and comprises two or more rigid sections linked by a pivoting joint. Annotate each section of the bendy bus individually.

  4. Rigid Bus: Rigid buses and shuttles designed to carry more than 10 people.

  5. Construction Vehicle: Vehicles primarily designed for construction. Typically very slow moving or stationary. Cranes and extremities of construction vehicles are only included in annotations if they interfere with traffic. Trucks used to haul rocks or building materials are considered trucks rather than construction vehicles.

  6. Motorcycle: Gasoline or electric powered 2-wheeled vehicle designed to move rapidly (at the speed of standard cars) on the road surface. This category includes all motorcycles, vespas and scooters. It also includes light 3-wheel vehicles, often with a light plastic roof and open on the sides, that tend to be common in Asia. If there is a rider and/or passenger, include them in the box.

  7. Bicycle: Human or electric powered 2-wheeled vehicle designed to travel at lower speeds either on road surface, sidewalks or bicycle paths. If there is a rider and/or passenger, include them in the box.

  8. Bicycle Rack: Area or device intended to park or secure the bicycles in a row. It includes all the bicycles parked in it and any empty slots that are intended for parking bicycles. Bicycles that are not part of the rack should not be included. Instead they should be annotated as bicycles separately.

  9. Trailer: Any vehicle trailer, both for trucks, cars and motorcycles (regardless of whether currently being towed or not). For semi-trailers (containers) label the truck itself as “Truck”.

  10. Police Vehicle: All types of police vehicles including police bicycles and motorcycles.

  11. Ambulance: All types of ambulances.

  12. Adult Pedestrian: An adult pedestrian moving around the cityscape. Mannequins should also be annotated as Adult Pedestrian.

  13. Child Pedestrian: A child pedestrian moving around the cityscape.

  14. Construction Worker: A human in the scene whose main purpose is construction work.

  15. Stroller: Any stroller. If a person is in the stroller, include in the annotation. If a pedestrian pushing the stroller, then they should be labeled separately.

  16. Wheelchair: Any type of wheelchair. If a pedestrian is pushing the wheelchair then they should be labeled separately.

  17. Portable Personal Mobility Vehicle: A small electric or self-propelled vehicle, e.g. skateboard, segway, or scooters, on which the person typically travels in an upright position. Driver and (if applicable) rider should be included in the bounding box along with the vehicle.

  18. Police Officer: Any type of police officer, regardless whether directing the traffic or not.

  19. Animal: All animals, e.g. cats, rats, dogs, deer, birds.

  20. Traffic Cone: All types of traffic cones.

  21. Temporary Traffic Barrier: Any metal, concrete or water barrier temporarily placed in the scene in order to re-direct vehicle or pedestrian traffic. In particular, includes barriers used at construction zones. If there are multiple barriers either connected or just placed next to each other, they should be annotated separately.

  22. Pushable Pullable Object: Objects that a pedestrian may push or pull. For example dolleys, wheel barrows, garbage-bins with wheels, or shopping carts. Typically not designed to carry humans.

  23. Debris: Debris or movable object that is too large to be driven over safely. Includes misc. things like trash bags, temporary road-signs, objects around construction zones, and trash cans.

Attributes in the annotations

  • visibility
  • vehicle
  • bicycle

Very detailed annotation guidelines are available there.

The nuScenes dataset consists of data collected from our full sensor suite which consists of:

  • 1 x LIDAR,
  • 5 x RADAR,
  • 6 x cameras,

References

nuScenes dataset annotation format

Dataset annotation format; this link explains it in detail.

Autonomous driving datasets | nuScenes dataset introduction and download

nuScenes introduction and download.

Waymo

The Waymo dataset contains 3000 driving segments totaling 16.78 hours, each about 20 seconds long. It covers different weather conditions, daytime and nighttime, downtown and suburban locations, and different road users such as pedestrians and cyclists.

For the key tasks of 3D object detection and tracking, Waymo is the heavyweight dataset, best suited to autonomous-driving teams with ample compute; by contrast, nuScenes' size makes it a better fit for academic research.

Lyft

Over 55,000 human-annotated 3D frames.

Data collected from 7 cameras and 3 LiDARs.

Waymo Open Dataset

Contains 3000 driving scenes totaling 16.7 hours: three times as many scenes as nuScenes, with 25 million 3D bounding boxes and 22 million 2D bounding boxes in total.

Argoverse

The Argoverse dataset was released by Argo AI, Carnegie Mellon University, and Georgia Tech to support research on 3D tracking and motion forecasting for autonomous vehicles. It has two parts: Argoverse 3D Tracking and Argoverse Motion Forecasting.

Argoverse contains LiDAR data, RGB video, forward-facing stereo data, 6-DOF localization data, and HD maps; all data are registered against the HD maps.

Argoverse was the first dataset to include HD maps: 290 km of mapped lanes with geometric and semantic information.

SODA10M

Previously, Waymo had the largest 2D autonomous-driving dataset. Now Huawei's Noah's Ark Lab, together with Sun Yat-sen University, has released SODA10M, a next-generation 2D autonomous-driving dataset 10 times larger than Waymo's, with 10 million unlabeled images and 20,000 labeled images.

SODA10M covers scenes from different cities under varying weather conditions, times of day, and locations. The 10 million unlabeled images come from 32 cities, covering most regions of China.


The 20,000 labeled images are annotated with 6 main categories of people and vehicles: Pedestrian, Cyclist, Car, Truck, Tram, and Tricycle.

Privacy-sensitive content such as faces and license plates is blurred.

A competition has been run on this dataset.

Models trained with full supervision alone do not perform well: fully supervised training shows a large accuracy gap between night and day. With the MoCo series (Cityscapes semantic segmentation) and with the pixel-level and intermediate-feature self-supervised methods DetCo and DenseCL, self-supervised pretraining on SODA10M performs on par with ImageNet pretraining.

(This dataset is important and worth studying carefully.)