Dataset: sayakpaul/nyu_depth_v2
Tasks: depth-estimation
Languages: English
Multilinguality: monolingual
Size: 10K<n<100K
ArXiv: arxiv:1903.03273
License: MIT
As per the dataset homepage:

The NYU-Depth V2 data set is comprised of video sequences from a variety of indoor scenes as recorded by both the RGB and Depth cameras from the Microsoft Kinect. It features 1449 densely labeled pairs of aligned RGB and depth images drawn from 464 distinct indoor scenes, along with a large number of raw, unlabeled video frames.

The dataset has several components: the labeled subset (video frames accompanied by dense multi-class labels), the raw RGB, depth, and accelerometer data as provided by the Kinect, and a toolbox of functions for manipulating the data and labels.

There are other tasks supported by this dataset as well. You can find more about them by referring to this resource.
English.
A data point comprises an image and its annotation depth map for both the train and validation splits.
{
  'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB at 0x1FF32A3EDA0>,
  'depth_map': <PIL.PngImagePlugin.PngImageFile image mode=L at 0x1FF32E5B978>,
}
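For a quick look at a single example, a minimal sketch along the following lines can be used (the dataset id comes from this card; the index 0 and the printed attributes are arbitrary choices for illustration):

```python
from datasets import load_dataset

ds = load_dataset("sayakpaul/nyu_depth_v2")

# Both fields are decoded into PIL images by the `datasets` library.
example = ds["train"][0]
print(example["image"].mode, example["image"].size)        # e.g. RGB, (width, height)
print(example["depth_map"].mode, example["depth_map"].size)
```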
The data is split into training and validation splits. The training split contains 47,584 images, and the validation split contains 654 images.
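To verify the split sizes programmatically, a short check such as the one below should suffice (it reuses the `ds` object from the previous snippet; exact counts may differ if the dataset is updated):

```python
# Number of examples per split; expect roughly 47,584 for train and 654 for validation.
print({split: ds[split].num_rows for split in ds})
```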
You can use the following code snippet to visualize samples from the dataset:
from datasets import load_dataset
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.cm.viridis

ds = load_dataset("sayakpaul/nyu_depth_v2")


def colored_depthmap(depth, d_min=None, d_max=None):
    # Map a single-channel depth map to an RGB image using the viridis colormap.
    if d_min is None:
        d_min = np.min(depth)
    if d_max is None:
        d_max = np.max(depth)
    depth_relative = (depth - d_min) / (d_max - d_min)
    return 255 * cmap(depth_relative)[:, :, :3]  # H, W, C


def merge_into_row(input, depth_target):
    # Place the RGB image and its colorized depth map side by side.
    input = np.array(input)
    depth_target = np.squeeze(np.array(depth_target))

    d_min = np.min(depth_target)
    d_max = np.max(depth_target)
    depth_target_col = colored_depthmap(depth_target, d_min, d_max)
    img_merge = np.hstack([input, depth_target_col])

    return img_merge


# Plot a 3 x 3 grid of randomly chosen training samples.
random_indices = np.random.choice(len(ds["train"]), 9).tolist()
train_set = ds["train"]

plt.figure(figsize=(15, 6))
for i, idx in enumerate(random_indices):
    ax = plt.subplot(3, 3, i + 1)
    image_viz = merge_into_row(
        train_set[idx]["image"], train_set[idx]["depth_map"]
    )
    plt.imshow(image_viz.astype("uint8"))
    plt.axis("off")
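Depending on your environment, you may need a final `plt.show()` to display the grid. As a follow-up sanity check, the sketch below inspects the raw value range of one depth map before it is colorized; it reuses the `train_set` object defined above, and the exact dtype and value scale of the stored depth PNGs should be verified rather than assumed:

```python
import numpy as np

sample = train_set[0]
depth = np.asarray(sample["depth_map"])
print(depth.dtype, depth.shape, depth.min(), depth.max())
```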
The rationale from the paper that introduced the NYU Depth V2 dataset:
We present an approach to interpret the major surfaces, objects, and support relations of an indoor scene from an RGBD image. Most existing work ignores physical interactions or is applied only to tidy rooms and hallways. Our goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships. One of our main interests is to better understand how 3D cues can best inform a structured 3D interpretation.
The dataset consists of 1449 RGBD images, gathered from a wide range of commercial and residential buildings in three different US cities, comprising 464 different indoor scenes across 26 scene classes. A dense per-pixel labeling was obtained for each image using Amazon Mechanical Turk.

This is an involved process. Interested readers are referred to Sections 2, 3, and 4 of the original paper.

Who are the annotators?

Amazon Mechanical Turk (AMT) annotators.
The preprocessed NYU Depth V2 dataset is licensed under the MIT License.
@inproceedings{Silberman:ECCV12,
  author    = {Nathan Silberman, Derek Hoiem, Pushmeet Kohli and Rob Fergus},
  title     = {Indoor Segmentation and Support Inference from RGBD Images},
  booktitle = {ECCV},
  year      = {2012}
}

@inproceedings{icra_2019_fastdepth,
  author    = {{Wofk, Diana and Ma, Fangchang and Yang, Tien-Ju and Karaman, Sertac and Sze, Vivienne}},
  title     = {{FastDepth: Fast Monocular Depth Estimation on Embedded Systems}},
  booktitle = {{IEEE International Conference on Robotics and Automation (ICRA)}},
  year      = {{2019}}
}
Thanks to @sayakpaul for adding this dataset.