Twentieth Meeting of the Yale NLP/LLM Interest Group

Speaker: Anran Li, PhD, Postdoctoral Fellow in Biomedical Informatics and Data Science at Yale University

Title of Talk: Exploring Memorization in Medical Large Language Models

When: Wednesday, October 31, 4:30pm-5:30pm

Location: 100 College Street, 11th Floor, Workshop 1167

Recording Link: https://www.youtube.com/watch?v=ezm5GTQNDAY

Abstract:

Medical large language models (MedLLMs) have demonstrated advanced capabilities across various medical tasks, such as medical question answering and text summarization. Despite these advances, MedLLMs face several privacy and copyright challenges. For example, they memorize and regenerate large segments of their training data at inference time, potentially leaking patients' protected health information (PHI) to third parties. These concerns underscore the importance of studying the capacity of MedLLMs to memorize their training data, especially given that medical training data are highly sensitive and copyrighted. In this project, we demonstrate that MedLLMs memorize and leak individual training samples. We propose an effective method for extracting memorized content and conduct a comprehensive evaluation of memorization in MedLLMs. The project has four objectives. First, we aim to systematically investigate and quantify memorization in large medical models, highlighting its potential risks. Second, we aim to analyze the factors that influence memorization in MedLLMs. Third, we aim to explore whether memorization can be used to identify copyrighted data. Finally, we aim to mitigate memorization issues in the model development and release process.

Get Involved!

We invite all members to actively participate in the activities of the Yale NLP/LLM Interest Group. Whether you're a seasoned NLP practitioner or just starting to explore the field, there's a place for you in our community. Stay tuned for updates on upcoming events and initiatives! Join our mailing list to stay informed about future meetings and events.