Workshop on AI for Speech & Conversation

28th March 2025 @ University of Glasgow


The Social AI group and the Social AI Centre for Doctoral Training (CDT) are organising a Workshop on AI for Speech & Conversation on 28th March 2025 at the Advanced Research Centre, University of Glasgow, United Kingdom. 


Social AI is a domain of AI aimed at endowing artificial agents with social intelligence: the ability to deal appropriately with users’ attitudes, intentions, feelings, personality and expectations. Understanding speech and conversation is a key component. This full-day workshop will host a series of invited talks by renowned experts, followed by a roundtable discussion with the audience. Our goal is to bring together academic experts, students and industry professionals to encourage dialogue around the progress, challenges and opportunities in this important area as AI continues to permeate all aspects of our social presence.

Invited Speakers

Senior Researcher (Directeur de Recherche), Inria, France | Professor, Avignon University, France

Talk: Explainability in speaker recognition (and more generally in speech processing)

Explainability has become a mandatory topic in AI in general. This is largely due to the need for greater trust on the part of experts and the general public, in the face of AI's limitations, manipulations, biases, errors and hallucinations. New AI regulations, such as those of the EU, also play an important role, as explainability aspects are now required for certain applications. Speech processing applications are particularly concerned, as they are often linked to critical human domains such as HR, healthcare or forensics. This talk will briefly present some of the main approaches in AI explainability (XAI), as well as their limitations. Using speaker recognition as an example, a new explainable-by-design approach will be presented. By representing speech in terms of the presence or absence of speech attributes taken from a small and bounded set, it enables simple explanations that can be interpreted by anyone. Some potential extensions, such as a more general scheme capable of mixing knowledge-based and automatically discovered attributes, or the application of this principle to pre-trained encoders, will be discussed.

Assistant Professor, Utrecht University, The Netherlands

Talk: Towards Fair and Interpretable Speech-based Depression Severity Modeling

Recently, and with increasing momentum, many state-of-the-art deep learning models have been shown to detect depression successfully based on multimodal cues. However, such efforts and models turn out to be of little use in clinical applications, for both legal reasons (such as the new EU AI law) and practical reasons. We therefore aim to make such critical machine learning tasks, employed in high-risk applications, responsible and trustworthy. By responsibility in ML we mean transparency/interpretability, algorithmic fairness and privacy. Since speech is less prone to automatic subject identification via public tools and search engines than vision (i.e., the face), and hence more privacy-preserving, we work on the speech modality for critical tasks such as depression. This talk will focus on our recent and ongoing efforts in speech-based depression prediction with responsible AI considerations.

Associate Professor, University of Twente, The Netherlands

Talk: From speech technology to spoken conversational interaction technology

Nowadays, speech technology is at our fingertips. Automatic speech recognition (ASR) and speech synthesis have evolved drastically, to the point where ASR performance has reached human parity and an artificial voice is no longer discernible from a natural human voice. However, as soon as you start talking to machines, you will notice that speech technology still faces many challenges. To move towards technology that really understands you, it also needs to be able to process non-speech or paralinguistic information. In this talk, I will highlight some of the research we are carrying out on spoken conversational interaction technology. I will talk about how current open-source ASR systems deal with non-speech elements and different speaker groups, and I will present some of our work on designing robot communication (which does not always need to involve speech).

Program


TBD

Registration

Registration is free but space is limited. Please register only if you intend to attend the workshop for the entire day.

Link: https://events.bookitbee.com/social-ai-cdt/social-ai-cdt-workshop-social-ai-for-speech-and-co/