dc.contributor.author	Yang, Jingkang
dc.contributor.author	Liu, Shuai
dc.contributor.author	Guo, Hongming
dc.contributor.author	Dong, Yuhao
dc.contributor.author	Zhang, Xiamengwei
dc.contributor.author	Zhang, Sicheng
dc.contributor.author	Wang, Pengyun
dc.contributor.author	Zhou, Zitang
dc.contributor.author	Xie, Binzhu
dc.contributor.author	Wang, Ziyue
dc.contributor.author	Ouyang, Bei
dc.contributor.author	Lin, Zhengyu
dc.contributor.author	Cominelli, Marco
dc.contributor.author	Cai, Zhongang
dc.contributor.author	Li, Bo
dc.contributor.author	Zhang, Yuanhan
dc.contributor.author	Zhang, Peiyuan
dc.contributor.author	Hong, Fangzhou
dc.contributor.author	Widmer, Joerg
dc.contributor.author	Gringoli, Francesco
dc.contributor.author	Yang, Lei
dc.contributor.author	Liu, Ziwei
dc.date.accessioned	2026-06-15T10:36:22Z
dc.date.available	2026-06-15T10:36:22Z
dc.date.issued	2025-06-15
dc.identifier.uri	https://hdl.handle.net/20.500.12761/2041
dc.description.abstract	We introduce EgoLife, a project to develop an egocentric life assistant that accompanies and enhances personal efficiency through AI-powered wearable glasses. To lay the foundation for this assistant, we conducted a comprehensive data collection study where six participants lived together for one week, continuously recording their daily activities - including discussions, shopping, cooking, socializing, and entertainment - using AI glasses for multimodal egocentric video capture, along with synchronized third-person-view video references. This effort resulted in the EgoLife Dataset, a comprehensive 300-hour egocentric, interpersonal, multiview, and multimodal daily life dataset with intensive annotation. Leveraging this dataset, we introduce EgoLifeQA, a suite of long-context, life-oriented question-answering tasks designed to provide meaningful assistance in daily life by addressing practical questions such as recalling past relevant events, monitoring health habits, and offering personalized recommendations. To address the key technical challenges of (1) developing robust visual-audio models for egocentric data, (2) enabling identity recognition, and (3) facilitating long-context question answering over extensive temporal information, we introduce EgoButler, an integrated system comprising EgoGPT and EgoRAG. EgoGPT is an omni-modal model trained on egocentric datasets, achieving state-of-the-art performance on egocentric video understanding. EgoRAG is a retrieval-based component that supports answering ultra-long-context questions. Our experimental studies verify their working mechanisms and reveal critical factors and bottlenecks, guiding future improvements. By releasing our datasets, models, and benchmarks, we aim to stimulate further research in egocentric AI assistants.	es
dc.language.iso	eng	es
dc.title	EgoLife: Towards Egocentric Life Assistant	es
dc.type	conference object	es
dc.conference.date	11-15 June 2025	es
dc.conference.place	Music City Center in Nashville, Tennessee, USA	es
dc.conference.title	IEEE/CVF Conference on Computer Vision and Pattern Recognition	*
dc.event.type	conference	es
dc.pres.type	poster	es
dc.type.hasVersion	VoR	es
dc.rights.accessRights	open access	es
dc.description.refereed	TRUE	es
dc.description.status	pub	es

