CLASSify is a tool developed by the Center for Applied Artificial Intelligence (CAAI) to give researchers access to a powerful, automated machine learning platform for tabular data. CLASSify takes in your rows and columns and, with minimal configuration, lets you train a multitude of models on your data, create a variety of useful visualizations of the quality of your data, generate synthetic data for class balancing, and much more.
REDCap data can be exported in CSV or Excel format, which makes it a good candidate for use with CLASSify. Without the module, a user has to generate a data export, manually rename a column to match the formatting CLASSify requires, and finally upload the file at the CLASSify website. With the CLASSify Connect External Module, you can add the module to a project, configure the dataset in REDCap, and then immediately upload the data to the CLASSify system. From there, you can go to CLASSify’s interface, select your dataset, and configure which models you want to run and which features you want to use.
CLASSify Connect is not yet ready for wide publication to the REDCap External Modules Repository, as it is still undergoing heavy testing. However, if you are interested in testing the module, you can request access by filling out our Collaboration Request Form.
(Note: The following is subject to change as the CLASSify Connect module for REDCap is still in active development.)
CLASSify Connect Requirements
- The REDCap administrator for your institution must enable the module for use on your REDCap instance. At present, the module has not been published to the REDCap External Modules Repository, as it is in beta and is still undergoing changes and testing.
- You must have access to a CLASSify account. Accounts are granted on an application basis. To apply, submit CAAI’s Collaboration Request Form and request CLASSify access.
Walkthrough
The module must be enabled for REDCap by your system administrator. At present, it is only available on our development instance, as it is being tested before being submitted for approval. Once the module is enabled for your institution’s REDCap instance, it can be added to your individual project from the external modules tab at the bottom left of the main dashboard. Selecting manage, as seen in the image below, will take you to the modules page.
From here, you’ll select “Enable a module” which allows you to import any of the modules enabled for your instance of REDCap to your project.
A window should pop up showing modules that are available to you. If it is available, you should select CLASSify Connect. There should only be one version to select from, but in the event there are multiple, you should select the highest version number.
After selecting enable on the module, the popup window should close automatically. From here you should see that CLASSify Connect is on your Project Module Manager page.
Select the Configure option to open the window where you will configure settings to allow you to connect to CLASSify. Note that you MUST save the settings before you hit the Upload to CLASSify button or the process WILL fail. Below you will see an example configuration of the module’s settings.
The CLASSify Email field is used to look up your account ID on the CLASSify servers, which ensures that your data is accessible to a valid user once it is uploaded. To check whether you have an account, you may use the “Check CLASSify Account Status” button.
CLASSify uses a column titled “class” as the classifier, that is, the target label the models learn to predict. The classifier field here is a dropdown of the fields in your project. Selecting a field renames that column to “class” in the uploaded dataset. This does NOT affect your data in REDCap.
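To make the renaming step concrete, here is a minimal sketch of the equivalent operation on a CSV export, using only the Python standard library. This is not the module’s actual code; the function name and the sample field names (record_id, age, diagnosis) are hypothetical and chosen only for illustration.

```python
import csv
import io


def rename_classifier_column(csv_text: str, classifier_field: str) -> str:
    """Return a copy of csv_text whose classifier_field header is renamed to 'class'."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV has no header row")
    header = rows[0]
    if classifier_field not in header:
        raise ValueError(f"field {classifier_field!r} not found in header")
    if "class" in header and classifier_field != "class":
        raise ValueError("a different column is already named 'class'")
    # Only the header row changes; the data rows are copied as-is.
    rows[0] = ["class" if name == classifier_field else name for name in header]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical export; the field names are made up for this example.
export = "record_id,age,diagnosis\n1,54,positive\n2,61,negative\n"
renamed = rename_classifier_column(export, "diagnosis")
```

Because the function builds a new string, the original export is left unchanged, which mirrors how the module leaves your REDCap data untouched.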
The filename field lets you provide a custom name under which the data is uploaded to CLASSify. Keep in mind that you should use a different name for each subsequent upload, or the upload will fail.
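One simple way to avoid reusing a name is to append a timestamp to a base name. The sketch below is only a suggestion for generating such names before typing them into the filename field; the function and the base name diabetes_survey are hypothetical, and whether CLASSify places any restrictions on name length or characters is not covered here.

```python
from datetime import datetime, timezone
from typing import Optional


def unique_dataset_name(base: str, now: Optional[datetime] = None) -> str:
    """Append a UTC timestamp (to the second) so repeated uploads get distinct names."""
    now = now or datetime.now(timezone.utc)
    return f"{base}_{now.strftime('%Y%m%d_%H%M%S')}"


# Fixed timestamp so the example is reproducible; omit `now` in normal use.
name = unique_dataset_name(
    "diabetes_survey", datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
```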
A sample completion of the configuration may look like the image below.
Now you can hit the “save” button, which will close the popup and return you to the modules page. Open the configuration menu for CLASSify Connect again and hit the “Upload Form Data to CLASSify” button. At present, the upload takes all data in a project indiscriminately. This means that projects with multiple surveys will have their survey data aggregated into a single dataset. We intend to refine this process in subsequent versions.
Once you have hit the upload button, you will see another popup. This one allows you to exclude fields from the uploaded dataset. We recommend excluding fields like form completion status, record ID, and other fields that may influence the model without being directly related to the data. Additionally, CLASSify is not currently HIPAA compliant, so you should not upload data containing protected health information (PHI).
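For readers who want to see what this kind of exclusion amounts to, the following sketch drops such fields from a CSV export. It assumes the common REDCap conventions that the record identifier is named record_id and that form completion status fields end in _complete; your project may use different names. The function name and sample data are hypothetical, and the module performs this step through its own popup rather than through code like this.

```python
import csv
import io


def drop_fields(csv_text, exclude_names=("record_id",), exclude_suffixes=("_complete",)):
    """Return csv_text without the named columns or columns ending in the given suffixes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [
        name
        for name in (reader.fieldnames or [])
        if name not in exclude_names and not name.endswith(tuple(exclude_suffixes))
    ]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the columns that were filtered out.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical export; the field names are made up for this example.
export = "record_id,age,class,demographics_complete\n1,54,positive,2\n2,61,negative,2\n"
cleaned = drop_fields(export)
```

Note that a filter like this does nothing to detect PHI; identifying and removing PHI still has to be done deliberately before any upload.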
This page also lets you select the type of data that each column represents. The types are determined automatically, but you should check each one to ensure that your dataset will be treated the way you expect. Note that categorical columns will be one-hot encoded, which creates a separate column for each individual option.
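If one-hot encoding is unfamiliar, this small sketch shows the general technique on a hypothetical smoker column. The column naming scheme used here (original name, underscore, option) is an assumption made for illustration; CLASSify’s own naming and implementation may differ.

```python
def one_hot(rows, column):
    """Replace `column` with one 0/1 column per distinct option found in the data."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being expanded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical records; the field names and options are made up for this example.
records = [
    {"age": 54, "smoker": "never"},
    {"age": 61, "smoker": "current"},
    {"age": 47, "smoker": "former"},
]
encoded = one_hot(records, "smoker")
```

Here the single smoker column becomes three columns (smoker_current, smoker_former, smoker_never), so a categorical field with many options can widen your dataset considerably.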
Once you have completed the configuration of your dataset, you may select the “Upload Dataset” button. After a moment, you should see a “View Uploaded Data” button appear. This will take you to the CLASSify website and allow you to select the remaining options for which models you would like to run or which additional features you would like to apply to your dataset.
More information on CLASSify can be found here.