Chinese Physical Society Journals

Evaluation of the application of large language models across the entire battery research process and construction of a comprehensive database for inorganic solid electrolytes

CSTR:32037.14.aps.74.20250572

  • The emergence of large language models (LLMs) has significantly advanced scientific research. Language models typified by ChatGPT and reasoning models typified by DeepSeek R1 have brought notable changes to the research paradigm. Although these models are general-purpose, they generalize well to the battery field, particularly to solid-state battery research. In this study, we systematically screen 5,309,268 articles published in key journals up to and including 2024 and extract 124,021 battery-related papers. We also search 17,559,750 patent applications and granted patents from the European Patent Office and the United States Patent and Trademark Office up to and including 2024, identifying 125,716 battery-related patents. Using this literature and these patents, we conduct extensive experiments to evaluate the language models' knowledge base, contextual learning, instruction adherence, and structured output capabilities.

    Multi-dimensional evaluation and analysis yield the following findings. First, the model screens literature on inorganic solid-state electrolytes with high accuracy, comparable to that of a doctoral student in the field. Tested on 10,604 data entries, the model also recognizes literature on in-situ polymerization/solidification technology well, although its accuracy for this emerging technology is slightly lower than for solid-state electrolytes and further fine-tuning is needed. Second, tested on 10,604 data entries, the model extracts ionic conductivity data for inorganic solid electrolytes with reliable accuracy. Third, using solid-state lithium battery patents filed over the past 20 years by four companies in South Korea and Japan, the model proves effective at analyzing historical patent trends and at comparative analysis. Personalized literature reports generated by the model from the latest publications are also highly accurate. Fourth, by applying an iterative strategy, we enable DeepSeek to reflect on its own answers and thereby give more comprehensive responses.

    Overall, the accuracy of current LLMs in information classification and data extraction is broadly at graduate-student level, and the models show strong capabilities in content summarization and trend analysis. However, we also observe that in rare cases the model produces numerical hallucinations, and when processing the massive volume of battery-related data there is still room for optimization in engineering applications. Based on the characteristics of the model and the test results above, we use the DeepSeek V3-0324 model to extract data on inorganic solid electrolyte materials: 5,970 ionic conductivity entries, 387 diffusion coefficient entries, and 3,094 migration barrier entries, plus more than 1,000 entries on chemical, electrochemical, mechanical, and other properties. Together these cover nearly all physical, chemical, and electrochemical properties relevant to inorganic solid electrolytes. This indicates that the role of LLMs in scientific research has shifted from assisting research to actively promoting it. The dataset is available from the Condensed Matter Physics Data Center of the Chinese Academy of Sciences at https://cmpdc.iphy.ac.cn/literature/SSE.html (DOI: https://doi.org/10.57760/sciencedb.j00213.00172).
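
The abstract mentions structured output and occasional numerical hallucinations. As a rough illustration only, the sketch below shows one way such a hallucination could be caught: parse a model's JSON answer and check that the extracted conductivity value actually appears in the source sentence. This is not the authors' pipeline; the record fields (`material`, `value`, `unit`), the function names, and the example sentence are all hypothetical assumptions.

```python
import json
import math
import re
from dataclasses import dataclass

# Hypothetical record layout; the paper's actual schema is not given in the abstract.
@dataclass
class ConductivityRecord:
    material: str
    value: float  # ionic conductivity, in the unit below
    unit: str


def parse_record(raw_json: str) -> ConductivityRecord:
    """Parse a model's JSON answer into a typed record (raises on missing keys)."""
    data = json.loads(raw_json)
    return ConductivityRecord(
        material=str(data["material"]),
        value=float(data["value"]),
        unit=str(data["unit"]),
    )


# Matches forms such as "1.2 × 10^-3", "1.2 x 10-3", and "1.2e-3".
_SCI = re.compile(r"(\d+(?:\.\d+)?)\s*(?:[×x]\s*10\s*\^?\s*|[eE])([+\-−]?\d+)")
_PLAIN = re.compile(r"\d+(?:\.\d+)?")


def numbers_in_text(text: str) -> set:
    """Collect every number written in the text, in plain or scientific form."""
    found = set()
    for mantissa, exponent in _SCI.findall(text):
        exp = int(exponent.replace("−", "-"))  # normalise the Unicode minus sign
        found.add(float(mantissa) * 10.0 ** exp)
    for plain in _PLAIN.findall(text):
        found.add(float(plain))
    return found


def is_grounded(record: ConductivityRecord, source: str, rel_tol: float = 1e-6) -> bool:
    """True if the extracted value matches some number present in the source text."""
    return any(
        math.isclose(record.value, n, rel_tol=rel_tol)
        for n in numbers_in_text(source)
    )
```

A record whose value cannot be found in the source text would be flagged for manual review rather than written to the database. This check only catches invented numbers; it does not verify that the number belongs to the right material or property.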
