Summary

Advances in AI in medical domains are driven by large volumes of multi-modal data, often derived from clinical pipelines at different institutions. When establishing multi-institutional data repositories, one approach is to centralize the collection of data within a single institution or cloud provider, which simplifies operations but can be limited by computational capacity or, in the case of public cloud providers, financial considerations. An alternative is to establish a distributed system where data is maintained by host institutions but made available as part of a global repository. However, especially in the medical domain, stringent cyber security measures designed to protect data and maintain service reliability can restrict the ability of participating systems to communicate across distributed repositories. In support of the Federated digital pathology platform for AD/ADRD research and diagnostics [NIH project 1U24NS133945-01], we are developing distributed repositories that allow for data access, localized processing, and federated training across all participating sites. The data repositories store case information, image metadata, pixel-level image annotations, and whole slide images (WSIs). Users might need access to the entire slide (transfer) or some small part of the slide (localized patch extraction). A custom S3-compatible server was developed, which provides a global namespace of files published from distributed participants. A central clearing house tracks the location of assets across sites and directs programmatic access to resources across the system. As part of the ongoing development process, we leverage the FABRIC Testbed to deploy resources across the US, allowing us to perform at-scale testing with real-world infrastructure.
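As a rough illustration of the clearing-house idea described above, the following Python sketch maps keys in a global namespace to the sites that published them. It is not the project's implementation: the class names (`AssetLocation`, `ClearingHouse`), the methods (`publish`, `resolve`, `list_keys`), the site identifiers, and the example endpoint URLs are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AssetLocation:
    """Where one copy of an asset lives (all fields are hypothetical)."""
    site: str       # illustrative site identifier, e.g. "site-a"
    endpoint: str   # illustrative S3-compatible endpoint URL at that site


class ClearingHouse:
    """Toy in-memory index from global object keys to publishing sites."""

    def __init__(self) -> None:
        self._index: dict[str, list[AssetLocation]] = {}

    def publish(self, key: str, location: AssetLocation) -> None:
        """Record that `key` is available at `location` (idempotent)."""
        locations = self._index.setdefault(key, [])
        if location not in locations:
            locations.append(location)

    def resolve(self, key: str) -> list[AssetLocation]:
        """Return every known location for `key`, or raise KeyError."""
        if key not in self._index:
            raise KeyError(f"no site has published {key!r}")
        return list(self._index[key])

    def list_keys(self, prefix: str = "") -> list[str]:
        """List keys in the global namespace that start with `prefix`."""
        return sorted(k for k in self._index if k.startswith(prefix))


if __name__ == "__main__":
    house = ClearingHouse()
    house.publish(
        "cases/0001/slide.svs",
        AssetLocation("site-a", "https://s3.site-a.example"),
    )
    # A client asks the clearing house where to fetch the slide from.
    for loc in house.resolve("cases/0001/slide.svs"):
        print(loc.site, loc.endpoint)
```

In this sketch a client sees one flat key space and the index decides which site's endpoint to contact; a real deployment would also need authentication, persistence, and handling for sites that are temporarily unreachable.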
Common problems such as network latency, firewalls, and protocol translation are tested as part of the FABRIC infrastructure. Using computational agents, we establish secure mesh communications between sites, even where direct communication is not possible because of protocol, firewall, or other restrictions. The described computational network allows for unified data and resource access across sites. The project described was supported by the NIH National Institute of Neurological Disorders and Stroke through grant 1U24NS133945-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Resources

On March 12, 2025, Vaiden Logan and Mitchell Klusty presented their work on creating a unified slide repository across distributed FABRIC nodes, where sharing would generally be prevented by firewall rules and network restrictions. Trusted nodes can be registered in the network, allowing slides to be shared across nodes. Check out the poster here:
