MedMCQA

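MedMCQA is a large-scale medical multiple-choice question answering dataset designed to address real-world medical entrance exam questions. It collects more than 194,000 curated questions from two Indian medical entrance exams: the All India Institute of Medical Sciences (AIIMS) exam and the National Eligibility cum Entrance Test for postgraduate admission (NEET PG). The questions cover more than 2,400 healthcare topics and 21 medical subjects, with an average question length of 12.77 tokens and high topical diversity across both medical theory and clinical knowledge.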
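The average-length figure above is a simple mean over per-question token counts. A minimal sketch of that arithmetic follows, assuming whitespace tokenization; this page does not say which tokenizer produced the 12.77 figure, so the result on the real data may differ. The function name `average_token_length` is hypothetical.

```python
def average_token_length(questions):
    """Mean number of whitespace-separated tokens per question.

    Assumption: tokens are split on whitespace. The tokenizer behind the
    reported 12.77 average is not stated on this page.
    """
    if not questions:
        return 0.0
    total_tokens = sum(len(q.split()) for q in questions)
    return total_tokens / len(questions)
```

Passing the question strings of any split to this function gives a rough check of how close a given tokenization comes to the reported average.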

Visualization images
MedMCQA_0.png
MedMCQA_1.webp
MedMCQA_2.webp
Dataset metadata
Modality: other
Task type: other
Anatomical structure: whole body
Anatomical region: whole body
Data volume: 193,155
File format: .json
File structure
MedMCQA
│
└── data
    ├── dev.json
    ├── test.json
    └── train.json
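The three split files in this layout can be read with the Python standard library. The sketch below assumes each file is either a single JSON array or JSON Lines (one JSON object per line); the on-disk layout is not stated on this page, so the loader accepts both. The function names `load_records` and `load_splits` are hypothetical, and no record field names are assumed.

```python
import json
from pathlib import Path


def load_records(path):
    """Load records from a file holding a JSON array or JSON Lines.

    Assumption: the file is one of those two layouts. A single
    pretty-printed JSON object spanning several lines is not handled.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    if text.startswith("["):
        # Whole file is one JSON array of records.
        return json.loads(text)
    # Otherwise treat every non-empty line as one JSON record.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_splits(root):
    """Return {"train": [...], "dev": [...], "test": [...]} from <root>/data."""
    data_dir = Path(root) / "data"
    return {
        name: load_records(data_dir / f"{name}.json")
        for name in ("train", "dev", "test")
    }


# Example usage (path is a placeholder for the local dataset directory):
# splits = load_splits("MedMCQA")
# print({name: len(records) for name, records in splits.items()})
```

Summing the split sizes gives a quick comparison against the data volume listed in the metadata.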
Image size statistics
Statistic  Spacing (mm)  Size
Minimum    -             -
Median     -             -
Maximum    -             -
Citation
@InProceedings{pmlr-v174-pal22a,
  title = {MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering},
  author = {Pal, Ankit and Umapathi, Logesh Kumar and Sankarasubbu, Malaikannan},
  booktitle = {Proceedings of the Conference on Health, Inference, and Learning},
  pages = {248--260},
  year = {2022},
  editor = {Flores, Gerardo and Chen, George H and Pollard, Tom and Ho, Joyce C and Naumann, Tristan},
  volume = {174},
  series = {Proceedings of Machine Learning Research},
  month = {07--08 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v174/pal22a/pal22a.pdf},
  url = {https://proceedings.mlr.press/v174/pal22a.html},
  abstract = {This paper introduces MedMCQA, a new large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. More than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects are collected with an average token length of 12.77 and high topical diversity. Each sample contains a question, correct answer(s), and other options which requires a deeper language understanding as it tests the 10+ reasoning abilities of a model across a wide range of medical subjects & topics. A detailed explanation of the solution, along with the above information, is provided in this study.}
}
Source information


Related paper: https://proceedings.mlr.press/v174/pal22a.html

Release date: 2022-03

Record information

Created: 2025-09-10 10:21

Updated: 2025-09-12 17:57