LLM-KGMQA: large language model-augmented multi-hop question-answering system based on knowledge graph in medical field

Wang, FeiLong; Shi, Donghui; Aguilar, Jose; Cui, Xinyi; Jiang, Jinsong; Longjian, Shen; Li, Mengya

doi:10.1007/s10115-025-02399-1

Files

Main article (881.0Kb)

Identifiers

URI: https://hdl.handle.net/20.500.12761/1950

ISSN: 0219-3116

DOI: 10.1007/s10115-025-02399-1

Metadata

Author(s)

Wang, FeiLong; Shi, Donghui; Aguilar, Jose; Cui, Xinyi; Jiang, Jinsong; Longjian, Shen; Li, Mengya

Date

2025-06

Abstract

In response to the poor performance of large language models in specific domains and the limited research on knowledge graphs and question-answering systems that incorporate large language models, this paper proposed a multi-hop question-answering framework based on a knowledge graph in the medical field and fully augmented by large language models (LLM-KGMQA). The method primarily addressed two problems: entity linking and multi-hop knowledge path reasoning. For entity linking, an entity fast-linking algorithm was proposed that categorized entities by multiple attributes. It then used the user's mention to obtain the target attribute set and narrowed the entity search scope through attribute intersection operations. Finally, when too many entities remained after the intersection, the method used a pre-trained model to compute and rank similarities, and determined the final entity by constructing instructions. For multi-hop knowledge path reasoning, the paper proposed a three-step reasoning framework comprising an n-hop subgraph construction algorithm, a knowledge fusion algorithm, and a semantics-based knowledge pruning algorithm. In the entity fast-linking experiments, the intersection operations reduced the maximum computational complexity by 99.90%. An evaluation metric called CRA@n was used alongside the classic nDCG metric. When the RoBERTa model was used for similarity calculation, the CRA@n score reached a maximum of 96.40, the nDCG score reached a maximum of 99.80, and the entity fast-linking accuracy was 96.60%. For multi-hop knowledge path reasoning, the paper first validated the need for knowledge fusion by constructing three different forms of instructions. Subsequent experiments with several large language models showed that the GLM4 model performed best in Chinese semantic reasoning. The accuracy rates for GLM4 after pruning were 99.90%, 83.30%, and 86.60% for 1-hop, 2-hop, and 3-hop questions, respectively, compared with 95.00%, 6.60%, and 5.00% before pruning. Pruning also reduced the average response time by 1.36 s, 6.21 s, and 27.07 s.
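
The entity fast-linking idea described in the abstract (narrow the candidates by attribute intersection, then rank the survivors by similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the entity names, the attributes (`type`, `first_char`), and the helper functions are hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for the pre-trained model (RoBERTa in the paper's experiments) purely to keep the example self-contained.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_attribute_index(entities):
    """Map each (attribute, value) pair to the set of entity names that have it."""
    index = defaultdict(set)
    for name, attrs in entities.items():
        for key, value in attrs.items():
            index[(key, value)].add(name)
    return index


def narrow_candidates(index, all_names, target_attributes):
    """Intersect the entity sets of every target attribute to shrink the search scope."""
    candidates = set(all_names)
    for pair in target_attributes:
        candidates &= index.get(pair, set())
    return candidates


def rank_candidates(mention, candidates):
    """Rank surviving candidates by string similarity to the mention.

    Assumption: SequenceMatcher is only a stand-in for a pre-trained
    similarity model; it is not what the paper uses.
    """
    def score(name):
        return SequenceMatcher(None, mention, name).ratio()

    # Sort by descending similarity, then by name for a deterministic order.
    return sorted(candidates, key=lambda name: (-score(name), name))


# Hypothetical toy knowledge-graph entities with categorical attributes.
entities = {
    "aspirin": {"type": "drug", "first_char": "a"},
    "amoxicillin": {"type": "drug", "first_char": "a"},
    "asthma": {"type": "disease", "first_char": "a"},
    "ibuprofen": {"type": "drug", "first_char": "i"},
}

index = build_attribute_index(entities)
# Target attribute set derived (by assumption) from the user mention "asprin".
targets = [("type", "drug"), ("first_char", "a")]
candidates = narrow_candidates(index, entities.keys(), targets)
ranking = rank_candidates("asprin", candidates)
```

In this toy example the intersection leaves only the two drug entities starting with "a", so the comparatively expensive similarity step runs on two candidates instead of four. The same effect, at knowledge-graph scale, is what makes a large reduction in similarity computations plausible.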