Abstract: Multi-modal intent detection aims to utilize various modalities to understand the user’s intentions, which is essential for the deployment of dialogue systems in real-world scenarios. The ...