
Active speakers in context

  • Juan León Alcázar
  • Fabian Caba Heilbron
  • Long Mai
  • Federico Perazzi
  • Joon Young Lee
  • Pablo Arbeláez
  • Bernard Ghanem

Research output: Contribution to a journal › Conference article › peer review

64 Citations (Scopus)

Abstract

Current methods for active speaker detection focus on modeling audiovisual information from a single speaker. This strategy can be adequate for addressing single-speaker scenarios, but it prevents accurate detection when the task is to identify which of many candidate speakers is talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our new model learns pairwise and temporal relations from a structured ensemble of audiovisual observations. Our experiments show that a structured feature ensemble already benefits active speaker detection performance. We also find that the proposed Active Speaker Context improves the state of the art on the AVA-ActiveSpeaker dataset, achieving an mAP of 87.1%. Moreover, ablation studies verify that this result is a direct consequence of our long-term multi-speaker analysis.
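The abstract describes two ingredients: a structured ensemble that stacks audiovisual embeddings from several candidate speakers over time, and a model that learns pairwise (cross-speaker) and temporal relations over that ensemble. The toy sketch below illustrates only the data flow, not the paper's actual architecture: the `attend` helper, the tensor shapes, and the final mean-pooled scores are all illustrative assumptions, using plain dot-product self-attention in NumPy in place of the learned relation modules.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(x, axis):
    # Toy dot-product self-attention along one axis of the ensemble:
    # each element is re-expressed as a weighted mix of its peers.
    x = np.moveaxis(x, axis, -2)                       # (..., N, D)
    w = softmax(x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1]))
    return np.moveaxis(w @ x, -2, axis)

rng = np.random.default_rng(0)
S, T, D = 3, 8, 16                                     # speakers, timesteps, feature dim
ensemble = rng.normal(size=(S, T, D))                  # stacked audiovisual embeddings

ctx = attend(ensemble, axis=0)                         # pairwise relations across speakers
ctx = attend(ctx, axis=1)                              # temporal relations over the horizon
scores = ctx.mean(axis=-1)                             # (S, T) per-speaker activity scores
print(scores.shape)
```

The key point the sketch mirrors is that every speaker's score depends on all candidates and on a long temporal window, rather than on a single speaker's isolated clip.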

Original language: English
Article number: 9157027
Pages (from-to): 12462-12471
Number of pages: 10
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOI
State: Published - 2020
Event: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States
Duration: 14 Jun 2020 – 19 Jun 2020

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

