Blender Tutorial Audio Visual

Question-Aware Global-Local Video Understanding Network for Audio-Visual Question Answering

Abstract: As a newly emerging task, audio-visual question answering (AVQA) has attracted research attention. Compared with traditional single-modality (e.g., audio or visual) QA tasks, it poses new ...

GitHub

Efficient audio understanding with general audio captions

Outperforms Qwen2.5-Omni-7B, Kimi-Audio-Instruct-7B on multiple key audio understanding tasks. Although MiDashengLM demonstrates superior audio understanding performance and efficiency compared to ...
