Sensitivity Evaluation of Healthcare Data Using Biomedical NLP Using Weighted Features Analysis

Brajesh Chaturvedi

Published: May 19, 2025

Keywords:

Natural language processing; transfer learning; Biomedical NLP; BioALBERT.

Brajesh Chaturvedi, Harish Patidar

Abstract

This paper demonstrates the use of transfer learning in biomedical NLP to identify sensitive data in electronic healthcare records. The research aims to improve the efficiency of multiclass classification of biomedical texts for sensitivity evaluation by combining two distinct feature representation methodologies. To unify the two feature representations, several statistical weighting techniques, including class probability (CP), inverse document frequency (IDF), and term frequency (TF), were considered for use with each component of the word embedding (WE) vectors. Applying transfer learning in biomedical NLP opens up an opportunity to extract insights from electronic medical records (EMR). This investigation used BioALBERT, a variant of A Lite Bidirectional Encoder Representations from Transformers (ALBERT) that was trained on medical and biological databases. To evaluate the feature vector combinations we considered, we developed a BioALBERT-based multiclass classification model. Experimental results support the theoretical analysis of the proposed system. The usefulness and practicability of the proposed task were analyzed using the MIMIC-III database, and the language model was built using the MIMIC-III and PubMed datasets. We tested the efficacy of our weighted feature representation strategies for multiclass classification with our deep neural network model and other state-of-the-art machine learning methods.
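As a rough illustration of the weighting idea described in the abstract, the sketch below scales each word embedding vector by a TF-IDF weight and averages the results into a single document vector. It is not the authors' implementation: the function names `idf_weights` and `weighted_doc_vector`, the smoothed IDF formula, the toy tokens, and the 2-dimensional vectors are all assumptions made for this example. Class probability (CP) weighting is left out because the abstract does not define it.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed IDF per token: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears at least once.
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + count)) + 1.0
            for tok, count in df.items()}


def weighted_doc_vector(doc, embeddings, idf):
    """TF-IDF-weighted average of the word vectors of one tokenized document."""
    tf = Counter(doc)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        # Weight = relative term frequency times inverse document frequency.
        w = (count / len(doc)) * idf.get(tok, 1.0)
        for i, x in enumerate(embeddings[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0:
        return total  # no known tokens: return the zero vector
    return [x / weight_sum for x in total]


# Toy data: hypothetical tokens and 2-dimensional vectors, for illustration only.
docs = [["patient", "diabetes", "patient"], ["patient", "discharge"]]
embeddings = {
    "patient": [1.0, 0.0],
    "diabetes": [0.0, 1.0],
    "discharge": [1.0, 1.0],
}
idf = idf_weights(docs)
vec = weighted_doc_vector(docs[0], embeddings, idf)
```

In this toy setup, a token that appears in every document ("patient") receives the minimum IDF, so rarer tokens contribute relatively more to the document vector than their raw frequency alone would suggest. A vector like `vec` could then be combined with a second feature representation before classification, which is the kind of unification the abstract refers to.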

Issue

Vol. 46 No. 1 (2025): May 2025

Section

Articles
