Paper-Conference

EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors

Vision-Language Models (VLMs) excel at multimodal tasks, but they remain vulnerable to hallucinations that are factually incorrect or ungrounded in the input image. Recent work …

Ryuhei Miyazato

BookAsSumQA: An Evaluation Framework for Aspect-Based Book Summarization via Question Answering

Aspect-based summarization aims to generate summaries that highlight specific aspects of a text, enabling more personalized and targeted summaries. However, its application to …

Ryuhei Miyazato

Ensembling Multiple Hallucination Detectors Trained on VLLM Internal Representations

This paper presents the 5th place solution by our team, y3h2, for the Meta CRAG-MM Challenge at KDD Cup 2025. The CRAG-MM benchmark is a visual question answering (VQA) dataset …

yuto-nakamizo