    • Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

      Stefanov, Kalin; Beskow, Jonas; Salvi, Giampiero (Journal article; Peer reviewed, 2019)
      This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive ...
    • Semantically Meaningful Metrics for Norwegian ASR Systems 

      Rugayan, Janine Lizbeth Cabrera; Svendsen, Torbjørn Karl; Salvi, Giampiero (Peer reviewed; Journal article, 2022)
      Evaluation metrics are important for quantifying the performance of Automatic Speech Recognition (ASR) systems. However, the widely used word error rate (WER) captures errors at the word level only and weighs each error ...
    • Semi-supervised learning for Automatic Speech Recognition 

      Rahim, Felicia (Master thesis, 2020)
      This master's thesis investigates a speech recognition system trained on a partially annotated database within the field of automatic speech recognition (ASR). A deep neural network (DNN) classified states belonging to individual ...
    • Sequence-to-sequence articulatory inversion through time convolution of sub-band frequency signals 

      Sabzi Shahrebabaki, Abdolreza; Siniscalchi, Sabato Marco; Salvi, Giampiero; Svendsen, Torbjørn Karl (Peer reviewed; Journal article, 2020)
      We propose a new acoustic-to-articulatory inversion (AAI) sequence-to-sequence neural architecture, where spectral sub-bands are independently processed in time by 1-dimensional (1-D) convolutional filters of different ...
    • Silent Speech Communication Using Facial Electromyography 

      Backsæther, Mathias Gullikstad (Master thesis, 2021)
      Language is invaluable to humans as a species, and speech as a means of communication enables cooperation between people every day. Nevertheless, there are various situations in which vocalized speech is not an option. The interest in ...
    • Spatial Bias in Vision-Based Voice Activity Detection 

      Stefanov, Kalin; Adiban, Mohammad; Salvi, Giampiero (Peer reviewed; Journal article, 2021)
      We develop and evaluate models for automatic vision-based voice activity detection (VAD) in multiparty human-human interactions that are aimed at complementing acoustic VAD methods. We provide evidence that this type of ...
    • A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity 

      Adiban, Mohammad; Siniscalchi, Sabato Marco; Salvi, Giampiero (Peer reviewed; Journal article, 2023)
      Cyber attacks and anomaly detection are problems where the data is often highly unbalanced towards normal observations. Furthermore, the anomalies observed in real applications may be significantly different from the ones ...
    • Transfer learning of articulatory information through phone information. 

      Sabzi Shahrebabaki, Abdolreza; Olfati, Negar; Siniscalchi, Sabato Marco; Salvi, Giampiero; Svendsen, Torbjørn Karl (Journal article; Peer reviewed, 2020)
      Articulatory information has been argued to be useful for several speech tasks. However, in most practical scenarios this information is not readily available. We propose a novel transfer learning framework to obtain ...
    • Using Modified Adult Speech as Data Augmentation for Child Speech Recognition 

      Fan, Zijian; Cao, Xinwei; Salvi, Giampiero; Svendsen, Torbjørn Karl (Peer reviewed; Journal article, 2023)
    • wav2vec2-based Speech Rating System for Children with Speech Sound Disorder 

      Gertman, Yaroslav; Al-Ghezi, Ragheb; Voskoboinik, Ekaterina; Grósz, Tamás; Kurimo, Mikko; Salvi, Giampiero; Svendsen, Torbjørn Karl; Strömbergsson, Sofia (Peer reviewed; Journal article, 2022)
      Speaking is a fundamental way of communication, developed at a young age. Unfortunately, some children with speech sound disorder struggle to acquire this skill, hindering their ability to communicate efficiently. Speech ...
    • Word Discovery from Unsegmented Speech 

      Aune, Astrid (Master thesis, 2020)
      The purpose of this thesis is to find words in continuous speech using unsupervised machine learning. Two latent factor analysis methods were tested: Non-Negative Matrix Factorization (NNMF) and Beta Process ...