We have been working with BERT[1], a natural language processing (NLP) model that Google released a few years ago. BERT can be used for a number of NLP tasks, including multi-label classification. I have been using it to determine the tissue type (one input to case complexity for scheduling) from the gross pathology report, and I am getting good results.

We are able to achieve 95% accuracy on 38 labels using just 200 reports per label. People are starting to augment BERT models by pre-training on unlabeled clinical text[2], as we have done with surgical pathology reports; in my case this seems to improve results by 2-5%. It would be reasonable for us to pre-train models on specific types of reports from other areas (radiology, neurology, etc.), which might further improve accuracy. This is not fine-tuning; rather, it adapts the underlying transformer model to better work with the domain vocabulary before any task-specific training. Hypothetically (and in my limited experience), a model fine-tuned from a Pathology-BERT performs better on pathology tasks than one fine-tuned from a base BERT model.
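As a sketch of how the multi-label setup works (the label names here are hypothetical, not the actual 38 tissue types): the classifier head on top of BERT produces one logit per label, and a sigmoid plus a threshold turns those logits into a set of predicted labels. The thresholding step can be illustrated in plain Python:

```python
import math

# Hypothetical subset of tissue-type labels (the real model uses 38).
LABELS = ["breast", "colon", "skin"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Convert per-label logits into the set of predicted tissue types.

    Unlike single-label softmax classification, each label is scored
    independently, so a report can receive zero, one, or several labels.
    """
    return [label for label, z in zip(LABELS, logits)
            if sigmoid(z) >= threshold]

# Strong evidence for "breast", weak negative for "colon",
# borderline positive for "skin":
print(predict_labels([2.3, -1.1, 0.1]))  # ['breast', 'skin']
```

The independent sigmoid per label (rather than a softmax across labels) is what makes this multi-label rather than multi-class: the model can flag a specimen containing more than one tissue type, or abstain entirely.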

[1] https://en.wikipedia.org/wiki/BERT_(language_model)

[2] https://www.aclweb.org/anthology/W19-1909.pdf


