Video Description using Bidirectional Recurrent Neural Networks
Members: Marc Bolaños, Petia Radeva, Álvaro Peris (UPV), Francisco Casacuberta (UPV)
Although traditionally used in the machine translation field, the encoder-decoder framework has recently been applied to the generation of video and image descriptions. The combination of Convolutional and Recurrent Neural Networks in these models has proven to outperform the previous state of the art, producing more accurate video descriptions. In this work, we propose to push this model further by introducing two contributions into the encoding stage. First, we produce richer image representations by combining object and location information from Convolutional Neural Networks; second, we introduce Bidirectional Recurrent Neural Networks to capture both forward and backward temporal relationships among the input frames.
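The encoding stage described above can be sketched as follows. This is a toy numpy illustration, not the actual implementation: it uses plain tanh recurrent cells in place of the LSTM units of the real model, and all dimensions and parameter names are assumptions for illustration. Per-frame object and location CNN features are concatenated, then passed through a forward and a backward recurrent pass whose hidden states are joined.

```python
import numpy as np

def rnn_pass(frames, Wx, Wh, b):
    # Run a simple tanh recurrence over a sequence of frame features.
    # frames: (T, d_in) -> hidden states (T, d_h)
    h = np.zeros(Wh.shape[0])
    states = []
    for x in frames:
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return np.stack(states)

def bidirectional_encode(obj_feats, loc_feats, params_fwd, params_bwd):
    # Richer representation: concatenate object and location CNN features per frame.
    frames = np.concatenate([obj_feats, loc_feats], axis=1)  # (T, d_obj + d_loc)
    h_fwd = rnn_pass(frames, *params_fwd)                    # forward temporal pass
    h_bwd = rnn_pass(frames[::-1], *params_bwd)[::-1]        # backward pass, re-aligned
    # Each time step sees both past and future context.
    return np.concatenate([h_fwd, h_bwd], axis=1)            # (T, 2 * d_h)
```

The decoder (not shown) would attend over or be initialized from these bidirectional states to generate the description word by word.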
References:
- Álvaro Peris, Marc Bolaños, Petia Radeva, Francisco Casacuberta: Video Description Using Bidirectional Recurrent Neural Networks. ICANN (2) 2016: 3-11
- Álvaro Peris, Marc Bolaños, Petia Radeva, Francisco Casacuberta: Video Description using Bidirectional Recurrent Neural Networks. CoRR abs/1604.03390 (2016)
- M. Dimiccoli, M. Bolaños, E. Talavera, M. Aghaei, S. Nikolov, P. Radeva: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation. To appear in Computer Vision and Image Understanding, 2016.
Egocentric Image Retrieval with Convolutional Neural Networks
Members: Gabriel de Oliveira, Mariella Dimiccoli, Petia Radeva
Recent advances in lifelogging technologies, and in particular in the field of wearable cameras, have made it possible to capture our daily life continuously, from a first-person point of view and in a free-hand fashion. However, given the large amount of images captured and the rate at which they accumulate (up to 2,000 images per day), there is a strong need for efficient and scalable indexing and retrieval systems over egocentric images. To cope with these requirements, we developed a full Content-Based Image Retrieval system based on Convolutional Neural Network (CNN) features. In our approach, we use egocentric images to create a Lucene index with off-the-shelf features extracted from a pre-trained CNN. The extracted features are integrated into Solr, an open-source, state-of-the-art inverted-index search platform. Finally, we provide a web-based prototype for egocentric image search and retrieval and tested its performance on the EDUB egocentric dataset.
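The retrieval step can be illustrated with a minimal local stand-in for the index. This sketch assumes the CNN features have already been extracted; it replaces the Lucene/Solr inverted index of the actual system with an in-memory matrix, ranking images by cosine similarity to the query. Function names are illustrative, not part of the system.

```python
import numpy as np

def build_index(features):
    # L2-normalize off-the-shelf CNN features so that a dot product
    # against a normalized query equals cosine similarity.
    # features: (n_images, d) matrix of extracted descriptors.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1e-12)

def retrieve(index, query_feat, top_k=5):
    # Rank all indexed images by cosine similarity to the query image.
    q = query_feat / max(np.linalg.norm(query_feat), 1e-12)
    scores = index @ q
    ranked = np.argsort(-scores)[:top_k]   # indices of the best matches
    return ranked, scores[ranked]
```

In the deployed system this nearest-neighbor step is delegated to Solr, which scales the same idea to large egocentric collections.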
References:
- Gabriel Oliveira-Barra, Mariella Dimiccoli, Petia Radeva: Egocentric Image Retrieval with Convolutional Neural Networks. CCIA 2016: 71-76
- G. de Oliveira, A. Cartas, M. Bolaños, M. Dimiccoli, M. Aghaei, M. Carné, X. Giró-i-Nieto, P. Radeva: LEMoRe: A Lifelog Engine for Moments Retrieval at the NTCIR-Lifelog LSAT Task. NTCIR-12 Conference & EVIA, NII, Tokyo, Japan, June 2016
- Aniol Lidon, Marc Bolaños, Markus Seidl, Xavier Giró-i-Nieto, Petia Radeva, Matthias Zeppelzauer: UPC-UB-STP @ MediaEval 2015 Diversity Task: Iterative Reranking of Relevant Images. MediaEval 2015