We have been working with BERT[1], a natural language processing (NLP) AI model that Google released a few years ago. BERT can be used for a number of NLP tasks, including multi-label classification. I have been using BERT to determine the tissue type from the gross pathology report (part of determining case complexity for scheduling), and I am getting good results.
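For readers who want a feel for the setup, here is a minimal sketch of fine-tuning BERT for multi-label classification with the Hugging Face transformers library. This is not our exact pipeline; the `bert-base-uncased` checkpoint, the sample report text, and the label indexing are all placeholder assumptions.

```python
# Sketch: multi-label classification with BERT (Hugging Face transformers).
# Checkpoint, report text, and label map are hypothetical placeholders.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_LABELS = 38  # one per tissue-type label

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # uses BCEWithLogitsLoss
)

# Hypothetical example: one gross pathology report and its label vector.
report = "Received in formalin is a 3.0 cm portion of colonic mucosa ..."
labels = torch.zeros(NUM_LABELS)
labels[4] = 1.0  # index of the "colon" label in this made-up label map

inputs = tokenizer(report, truncation=True, return_tensors="pt")
outputs = model(**inputs, labels=labels.unsqueeze(0))
outputs.loss.backward()  # a real run would use Trainer or an optimizer loop

# At inference time, sigmoid each logit and threshold (0.5 here) to pick labels.
probs = torch.sigmoid(outputs.logits)
predicted = (probs > 0.5).nonzero()
```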
We are able to achieve 95% accuracy on 38 labels using just 200 reports per label. People are starting to augment BERT models with clinical text[2] (that is, continuing to train them on unlabeled domain text), as we have done with surgical pathology reports, which in my case seems to improve results by 2-5%. It would be reasonable for us to train models on specific types of reports from other areas (radiology, neurology, etc.), which might further improve accuracy. This is not fine-tuning; rather, it adapts the underlying transformer model itself to better handle the domain vocabulary. Hypothetically (and in my limited experience), a Pathology-BERT fine-tuned on pathology tasks performs better than a fine-tuned base BERT model.
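As I read it, this augmentation step amounts to continued masked-language-model pre-training on unlabeled reports. Here is a sketch of what that looks like, again assuming Hugging Face transformers; the file path, epoch count, and output directory are illustrative, not the values we used.

```python
# Sketch: continued MLM pre-training on unlabeled domain text.
# File path and hyperparameters are hypothetical.
from transformers import (
    BertForMaskedLM,
    BertTokenizer,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Unlabeled surgical pathology reports, one report per line (placeholder path).
dataset = LineByLineTextDataset(
    tokenizer=tokenizer, file_path="path_reports.txt", block_size=128
)

# Randomly mask 15% of tokens, the standard BERT pre-training objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pathology-bert", num_train_epochs=3),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("pathology-bert")  # fine-tune this checkpoint instead of base BERT
```

The resulting checkpoint is then used as the starting point for the classification fine-tuning shown earlier, which is where the 2-5% improvement would show up.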