Abstract: As a newly emerging task, audio-visual question answering (AVQA) has attracted research attention. Compared with traditional single-modality (e.g., audio or visual) QA tasks, it poses new ...
Outperforms Qwen2.5-Omni-7B and Kimi-Audio-Instruct-7B on multiple key audio understanding tasks. Although MiDashengLM demonstrates superior audio understanding performance and efficiency compared to ...