General/
LLM
EquityMedQA dataset for evaluating harms and biases in LLMs
A collection of seven newly released datasets comprising both manually curated and LLM-generated questions enriched for adversarial queries. Both the human assessment framework and the dataset design process are grounded in an iterative participatory approach and a review of possible biases in Med-PaLM 2 answers to adversarial queries.
Related publication: A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models. Stephen R. Pfohl, Heather Cole-Lewis, Ivor Horn, Karan Singhal, et al. arXiv:2403.12025v1 [cs.CY]