Detecting Hallucinations In LLM Responses Using Token-level Log-probability Signals

Eliseev, V.; Maksimova, A.

doi:10.66693/mathai.1025

Article #1025

Issue MathAI 2026 Selected Papers Special Issue

Received 05 May 2026

Accepted 28 May 2026

Published 28 May 2026

Vadim Eliseev *

Aleksandra Yurievna Maksimova

Mathematics & AI 2026, 1, 22

Keywords: LLM; text classification; RAG; NLP; dataset construction; AI agents; machine learning

Abstract

Large language models (LLMs) have proven to be powerful tools for many natural language tasks, from serving as high-quality text classifiers to acting as agents in complex retrieval-augmented generation (RAG) systems. However, from the very beginning they have suffered from a major limitation: hallucinations, i.e. confidently generated incorrect or misleading information that may nonetheless be loosely related to the given task. This issue is critical in error-sensitive domains such as finance, medicine, and law, where even small inaccuracies can cause significant harm. In this study we address the early detection of hallucinated answers based on the user input (prompt), the answer produced by the LLM, and, most importantly, token-level probability signals that can be extracted from the LLM at inference time. We constructed a dataset that combines textual information with sequences of token log-probabilities and their statistics (mean, min, variance, percentiles, etc.), and labeled each answer as a hallucination or not. We trained a lightweight classifier that outputs the probability that a given response is a hallucination. We evaluate the classifier and perform ablation studies to quantify the contribution of token-level signals versus text-only features. The trained model is intended to serve as a standalone output guard agent in a multi-agent system: it rejects the answer of the LLM generator if its hallucination probability is above the acceptance threshold, protecting users from incorrect or misleading answers by making the system either regenerate the answer or state that it cannot give a faithful reply.
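
The token-level statistics and the threshold-based guard described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `logprob_features` and `should_reject`, the chosen percentiles (10, 50, 90), and the default threshold of 0.5 are all assumptions made for illustration, and the classifier that would map these features to a hallucination probability is not shown.

```python
import math
import statistics


def logprob_features(token_logprobs):
    """Summarize a sequence of token log-probabilities as fixed-size features.

    Hypothetical helper: the exact feature set used in the paper is not
    specified beyond "mean, min, variance, percentiles, etc.".
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    ordered = sorted(token_logprobs)
    n = len(ordered)

    def percentile(q):
        # Linear interpolation between the two closest ranks, q in [0, 100].
        pos = (n - 1) * q / 100.0
        lo, hi = math.floor(pos), math.ceil(pos)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return {
        "mean": statistics.fmean(ordered),
        "min": ordered[0],
        "variance": statistics.pvariance(ordered),  # population variance
        "p10": percentile(10),
        "p50": percentile(50),
        "p90": percentile(90),
    }


def should_reject(hallucination_probability, threshold=0.5):
    """Guard decision: reject only when the probability is above the threshold.

    The threshold value is an assumption; the abstract only states that an
    acceptance threshold exists.
    """
    return hallucination_probability > threshold


# Example with made-up log-probabilities for a five-token answer.
features = logprob_features([-0.1, -0.2, -3.0, -0.05, -0.4])
```

In a setup like this, the feature dictionary would be concatenated with text-derived features and passed to the lightweight classifier, whose output probability would then be fed to `should_reject`.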

Cite this article

Eliseev, V.; Maksimova, A. Detecting Hallucinations In LLM Responses Using Token-level Log-probability Signals. Mathematics & AI 2026, 1, 22. https://doi.org/10.66693/mathai.1025
