A Web-based Tool for Machine Learning

Clinicians often produce large amounts of data, from patient metrics to drug component analyses. Classical statistical analysis can offer a glimpse into how variables interact, but in many cases machine learning can reveal additional patterns in the data. With the recent boom in artificial intelligence models, clinicians have become increasingly interested in applying machine learning to their own data. However, many may not have the knowledge and skills needed to train a model effectively and use it for inference. By combining automated ML techniques with a user-friendly web interface, we can give these clinicians a way to automatically train many different machine learning models on their tabular data and find which one produces the best results. We therefore present CLASSify as a way for clinicians to bridge the gap to artificial intelligence.

Even with a web interface and clear results and visualizations for each model, it can be difficult to interpret how a model reached its results or what they mean for the data itself. CLASSify therefore also provides an explainability score for each feature that indicates its contribution to the model’s predictions. With these scores, users can see how each column of the data affects the model and may gain new insights into the data itself.
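CLASSify computes these scores with SHAP, which is built on Shapley values from game theory: a feature’s score is its average marginal contribution to the prediction over all subsets of the other features. The minimal sketch below computes exact Shapley values for a single row in plain Python, assuming that one baseline row stands in for “absent” features. The function `shapley_values` and its arguments are hypothetical illustrations, not part of CLASSify or the `shap` library. Exact computation is exponential in the number of features, which is why the `shap` library relies on approximations and model-specific algorithms such as KernelSHAP and TreeSHAP.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row `x`.

    Hypothetical helper: features outside a coalition are replaced by the
    corresponding value from a single `baseline` row (a simplification).
    """
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        # Features in the coalition keep their real values; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# A linear "model": each feature's score is its weight times its deviation from the baseline.
scores = shapley_values(lambda z: 2 * z[0] + 3 * z[1] - z[2], [1, 2, 3], [0, 0, 0])
print([round(s, 6) for s in scores])
```

The scores always sum to the difference between the prediction for `x` and the prediction for the baseline, so they can be read as splitting that difference among the features.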

Finally, CLASSify also provides tools for synthetic data generation. Clinical datasets frequently have imbalanced class labels or contain protected information, either of which can call for synthetically generated data that follows the same patterns and trends as the real data. With this interface, users can generate entirely new datasets, augment existing data with synthetic examples to balance class labels, or fill missing values with appropriate data.
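CLASSify generates synthetic rows with models from the SDV library. As a much simpler illustration of the class-balancing idea, the sketch below uses SMOTE-style interpolation, a different technique from SDV’s generative models: for each minority class it creates new rows on the line segment between an existing row and one of its nearest same-class neighbors, until every class matches the size of the largest one. It assumes purely numeric features, and `smote_oversample` is a hypothetical helper, not a CLASSify or SDV function.

```python
import random
from collections import Counter


def smote_oversample(rows, labels, k=3, seed=0):
    """Hypothetical SMOTE-style balancer for numeric rows.

    Returns the original rows and labels followed by synthetic ones, so that
    every class ends up with as many rows as the largest class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_rows, new_labels = [], []

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    for cls, count in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            i = rng.randrange(len(members))
            base = members[i]
            candidates = members[:i] + members[i + 1:]
            # The k nearest same-class neighbors of `base`.
            neighbors = sorted(candidates, key=lambda m: sq_dist(base, m))[:k]
            if not neighbors:
                # A class with a single row can only be duplicated.
                new_rows.append(list(base))
            else:
                other = rng.choice(neighbors)
                gap = rng.random()  # position along the segment base -> other
                new_rows.append([a + gap * (b - a) for a, b in zip(base, other)])
            new_labels.append(cls)

    return list(rows) + new_rows, list(labels) + new_labels


rows = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [6, 5]]
labels = [0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = smote_oversample(rows, labels)
print(Counter(balanced_labels))
```

Interpolating only within a class keeps synthetic rows inside the region that class already occupies; generative models such as those in SDV instead learn the joint distribution of all columns, which also handles categorical data.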

CLASSify Specifics

CLASSify currently provides ten machine learning models to train and evaluate (random forest, gradient boosting, histogram-based gradient boosting, XGBoost, bagging, logistic regression, SGD classifier, K-nearest neighbors, multi-layer perceptron, and TabPFN), four of which support multiclass classification (random forest, logistic regression, K-nearest neighbors, and multi-layer perceptron). Each model has customizable parameters that you can modify when submitting a job, leave at their defaults, or tune automatically to find the best-performing parameter combination for that model.
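CLASSify performs parameter tuning with Optuna, whose default sampler is a Tree-structured Parzen Estimator. The sketch below shows the underlying idea with a plain grid search instead, a simpler stand-in: it tunes `k` for a K-nearest neighbors classifier (one of the listed models) by scoring each candidate with leave-one-out accuracy and keeping the best. It uses only the Python standard library; the function names, toy data, and candidate grid are hypothetical and do not reflect CLASSify’s actual search space.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )[:k]
    # Ties in the vote go to the label that appears first among the neighbors.
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loo_accuracy(xs, ys, k):
    """Leave-one-out accuracy: predict each row from all of the other rows."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        correct += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return correct / len(xs)


def tune_k(xs, ys, candidates=(1, 3, 5, 7)):
    """Grid search: score every candidate k and keep the best (ties go to the earlier one)."""
    scores = {k: loo_accuracy(xs, ys, k) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores


# Two well-separated clusters of four points each.
xs = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
best_k, scores = tune_k(xs, ys)
print(best_k, scores)
```

With Optuna, the same search would wrap `loo_accuracy` in an objective function that draws `k` with `trial.suggest_int` and pass it to `study.optimize`, letting the sampler decide which values to try rather than enumerating the whole grid.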

CLASSify also relies on several other tools and services: ClearML for job queueing, Optuna for parameter tuning, and S3 for secure storage. All training and evaluation run on our DGX cluster, which keeps processing fast and efficient. The Synthetic Data Vault (SDV) library provides the models used for synthetic data generation, and explainability scores are calculated with the SHAP algorithm to identify feature importances for each model.

CLASSify is not HIPAA compliant, but private HIPAA-compliant instances can be created on request. Please reach out to ai@uky.edu to learn more.

Accessing CLASSify

CLASSify is available on an individual basis on CAAI’s self-service tool website. Before you can get started, a CAAI administrator must grant you the necessary permissions. Please contact us for access or submit our collaboration intake form here.

User Guide Documentation

The User Guide provides a detailed overview of the system’s capabilities and functions; check it out here.

Collaborative Projects using CLASSify

Below are a few examples of projects in which researchers from a variety of fields have used CLASSify.

– Hepatitis A diagnosis using a variety of demographic and medical data
– Predicting adherence to a remote alcohol monitoring program
– Identifying key predictors of adipose miR-1 levels after exercise
– Word domain clustering using LLM-generated vector embeddings
– Classifying osteoporosis/osteopenia with patient data and measurements


Instructional Video

Below is a ~10-minute tutorial video that walks through the process of using CLASSify.


Citation

A paper detailing the development and usage of this tool was submitted to and accepted by the American Medical Informatics Association (AMIA) in 2023. The paper is available at https://arxiv.org/abs/2310.03618