EDUB-SegDesc is a dataset that can be used either for egocentric event segmentation or for egocentric sequence description. It was acquired with the Narrative Clip wearable camera, which takes a picture every 30 seconds (2 fpm). It consists of 55 days recorded by 9 people. Each day was manually segmented into events or sequences according to the following criterion:
An event is a semantically perceptual unit that can be inferred from visual features, without any prior knowledge of what the camera wearer is actually doing.
The EDUB-SegDesc dataset contains a total of 48,717 images, divided into 1,339 events (or image sequences) and 3,991 captions, with an average of 3 captions per event. It was divided into training, validation and test splits, ensuring that all the sequences from the same day belong to the same data split.
For data anonymization, each day set is named ‘SubjectX-SetY’: ‘SubjectX’ identifies the person, where X is a number between 1 and 9, and ‘SetY’ identifies one of that person’s recorded days.
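For instance, a day-set identifier can be parsed with a few lines of Python (a minimal sketch; the helper name is ours):

    import re

    # Minimal sketch (the helper name is ours): parse a 'SubjectX-SetY'
    # day-set identifier into its subject and set numbers.
    DAY_SET_RE = re.compile(r"Subject(\d+)-Set(\d+)")

    def parse_day_set(name):
        match = DAY_SET_RE.fullmatch(name)
        if match is None:
            raise ValueError("not a day-set name: %r" % name)
        return int(match.group(1)), int(match.group(2))

    print(parse_day_set("Subject3-Set2"))  # -> (3, 2)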
The files available in each of the folders are described below:
- Images/SubjectX-SetY – one folder per day, containing all of that day’s images.
- GT/segmentations/SubjectX-SetY.xls – ground truth segmentation of each day into events. The GT segmentation used during the experiments is the one in the second column of the .xls files. Each row represents an event and has the format ‘name_x-name_y’, where ‘name_x’ is the name of the first image in the event and ‘name_y’ is the name of the last image of the same event (see the loading sketch after this list).
- GT/descriptions/SubjectX-SetY.txt – list of descriptions for each day. Each line of the .txt files holds a description belonging to one of the segments, with the format ‘SegmentZ,description…’, where Z identifies the row (i.e. event) in the corresponding file in GT/segmentations. Note that some events have the description ‘ERROR’, which indicates that all the images of that event are dark and/or blurry; such events are valid neither for segmentation nor for sequence description and should not be considered.
- Annotations/captions_final.id.en – contains all the descriptions (i.e. captions) in the dataset, with the format ‘SubjectX-SetY_Segment_Z#N—-description’, where N identifies each of the different descriptions of the same segment (i.e. event).
- Annotations/S_list_final.txt – ‘S’ stands for one of the dataset splits: ‘train’, ‘val’ or ‘test’. Each file contains the list of segments (i.e. events), in chronological order, that belong to the corresponding split.
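As an illustration of how the ground truth files can be loaded, here is a minimal Python sketch (the function names are ours, and reading the legacy .xls files with pandas may additionally require the xlrd package):

    import pandas as pd  # reading legacy .xls files may also require xlrd

    def load_segmentation(xls_path):
        # The GT used in the experiments is the second column (index 1);
        # each cell holds 'name_x-name_y'. We assume image names do not
        # themselves contain '-'.
        df = pd.read_excel(xls_path, header=None)
        events = []
        for cell in df.iloc[:, 1].dropna():
            first, last = str(cell).split("-", 1)
            events.append((first, last))
        return events

    def load_descriptions(txt_path):
        # Each line is 'SegmentZ,description'; events marked 'ERROR'
        # (dark/blurry images) are skipped, as recommended above.
        captions = {}
        with open(txt_path, encoding="utf-8") as f:
            for line in f:
                segment, description = line.rstrip("\n").split(",", 1)
                if description.strip() == "ERROR":
                    continue
                captions.setdefault(segment, []).append(description)
        return captions

The files in Annotations/ follow analogous line-based formats, so a data split can be assembled by reading the segment identifiers from the corresponding split list file and looking up their events and captions.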
If you use this dataset for any purpose, please do not forget to cite the following paper:
Marc Bolaños, Álvaro Peris, Francisco Casacuberta, Sergi Soler and Petia Radeva. “Egocentric Video Description based on Temporally-Linked Sequences.” Special Issue on Egocentric Vision and Lifelogging Tools, Journal of Visual Communication and Image Representation (VCIR), (SUBMITTED), 2017.
If you have any questions or suggestions, please do not hesitate to contact the first author: