TY - JOUR
T1 - Ego4D
T2 - Around the World in 3,600 Hours of Egocentric Video
AU - Collaboration
AU - Grauman, Kristen
AU - Westbury, Andrew
AU - Byrne, Eugene
AU - Cartillier, Vincent
AU - Chavis, Zachary
AU - Furnari, Antonino
AU - Girdhar, Rohit
AU - Hamburger, Jackson
AU - Jiang, Hao
AU - Kukreja, Devansh
AU - Liu, Miao
AU - Liu, Xingyu
AU - Martin, Miguel
AU - Nagarajan, Tushar
AU - Radosavovic, Ilija
AU - Ramakrishnan, Santhosh Kumar
AU - Ryan, Fiona
AU - Sharma, Jayant
AU - Wray, Michael
AU - Xu, Mengmeng
AU - Xu, Eric Zhongcong
AU - Zhao, Chen
AU - Bansal, Siddhant
AU - Batra, Dhruv
AU - Crane, Sean
AU - Do, Tien
AU - Doulaty, Morrie
AU - Erapalli, Akshay
AU - Feichtenhofer, Christoph
AU - Fragomeni, Adriano
AU - Fu, Qichen
AU - Gebreselasie, Abrham
AU - González, Cristina
AU - Hillis, James
AU - Huang, Xuhua
AU - Huang, Yifei
AU - Jia, Wenqi
AU - Khoo, Weslie
AU - Kolář, Jáchym
AU - Kottur, Satwik
AU - Kumar, Anurag
AU - Landini, Federico
AU - Li, Chao
AU - Li, Yanghao
AU - Li, Zhenqiang
AU - Mangalam, Karttikeya
AU - Modhugu, Raghava
AU - Munro, Jonathan
AU - Murrell, Tullie
AU - Nishiyasu, Takumi
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2025
Y1 - 2025
AB - We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception.
KW - datasets and benchmarks
KW - egocentric video
KW - first-person vision
KW - Video understanding
UR - https://www.scopus.com/pages/publications/85199560331
U2 - 10.1109/TPAMI.2024.3381075
DO - 10.1109/TPAMI.2024.3381075
M3 - Article
C2 - 39058617
AN - SCOPUS:85199560331
SN - 0162-8828
VL - 47
SP - 9468
EP - 9509
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 11
ER -