Privacy-Aware Clinical NLP with Differentially Private Fine-Tuning of Large Language Models
DOI:
https://doi.org/10.71465/
Keywords:
Differential Privacy, Clinical NLP, Large Language Models, Parameter-Efficient Fine-Tuning
Abstract
The integration of Large Language Models (LLMs) into clinical workflows promises to revolutionize medical informatics by automating tasks such as clinical note summarization, diagnostic coding, and patient triage. However, the deployment of these models is severely constrained by the sensitivity of clinical data and by stringent regulatory frameworks governing Protected Health Information (PHI). Standard de-identification techniques often fail to prevent the memorization of training data, leaving models vulnerable to membership inference and reconstruction attacks. This paper presents a comprehensive framework for privacy-aware clinical NLP based on Differentially Private Fine-Tuning (DP-FT) of transformer architectures. We propose a hybrid approach that integrates Low-Rank Adaptation (LoRA) with Differentially Private Stochastic Gradient Descent (DP-SGD) to mitigate the computational overhead and utility degradation typically associated with private training. By injecting calibrated Gaussian noise into the gradient updates of the low-rank adapters while keeping the pre-trained backbone frozen, we achieve a rigorous (ε, δ)-differential privacy guarantee without catastrophic forgetting. Our experimental results on the MIMIC-III and MIMIC-IV datasets demonstrate that our method retains high clinical utility on Named Entity Recognition (NER) and summarization tasks while satisfying strict differential privacy budgets (ε < 3). This work bridges the gap between theoretical privacy guarantees and practical clinical utility, offering a viable path for the secure deployment of LLMs in healthcare environments.
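The core mechanism described in the abstract can be illustrated with a short sketch. The following is a minimal, illustrative PyTorch example, not the authors' released implementation: a linear layer whose pre-trained weights are frozen and extended with a trainable low-rank adapter, plus a DP-SGD step that clips per-example gradients and adds calibrated Gaussian noise only to the adapter parameters. All names and hyperparameter values here (LoRALinear, dp_sgd_step, rank, clip_norm, noise_multiplier) are hypothetical placeholders for exposition.

```python
# Minimal sketch of DP-SGD restricted to LoRA adapter weights.
# Hyperparameters are illustrative, not the paper's reported values.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer with a trainable low-rank adapter."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pre-trained backbone stays frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Base projection plus the low-rank update B @ A, scaled by alpha/rank.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm: float = 1.0, noise_multiplier: float = 1.1):
    """One DP-SGD step: per-example clipping + Gaussian noise on adapter grads."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    # Micro-batching: compute and clip each example's gradient individually.
    for x, y in zip(batch_x, batch_y):
        optimizer.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() for p in params]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        factor = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * factor)                # accumulate clipped gradients
    # Add Gaussian noise calibrated to the clipping norm, then average.
    optimizer.zero_grad()
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * (noise_multiplier * clip_norm)
        p.grad = (s + noise) / len(batch_x)
    optimizer.step()

# Illustrative usage on dummy data (shapes only; not clinical inputs).
model = LoRALinear(nn.Linear(32, 4))
opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
xs, ys = torch.randn(16, 32), torch.randint(0, 4, (16,))
dp_sgd_step(model, nn.functional.cross_entropy, xs, ys, opt)
```

In practice, the per-example loop would be vectorized (e.g., via per-sample gradient hooks in a library such as Opacus), and the cumulative (ε, δ) budget would be tracked with a privacy accountant across training steps; the micro-batch loop above trades speed for clarity.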
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.