Advances in AI in medical domains are driven by large volumes of multi-modal data, often derived from clinical pipelines at different institutions. When establishing multi-institutional data repositories, one approach is to centralize data collection within a single institution or cloud provider, which simplifies operations but can be limited by computational capacity or, in the case of public cloud providers, by financial considerations. An alternative is to establish a distributed system in which data is maintained by host institutions but made available as part of a global repository. However, especially in the medical domain, stringent cybersecurity measures designed to protect data and maintain service reliability can restrict the ability of participating systems to communicate across distributed repositories. In support of the Federated digital pathology platform for AD/ADRD research and diagnostics [NIH project 1U24NS133945-01], we are developing distributed repositories that allow for data access, localized processing, and federated training across all participating sites. The data repositories store case information, image metadata, pixel-level image annotations, and whole slide images (WSIs). Users might need access to an entire slide (transfer) or to a small part of a slide (localized patch extraction). A custom S3-compatible server was developed that provides a global namespace of files published by distributed participants. A central clearinghouse tracks the location of assets across sites and directs programmatic access to resources across the system. As part of the ongoing development process, we leverage the FABRIC Testbed to deploy resources across the US, allowing us to perform at-scale testing with real-world infrastructure. Common problems such as network latency, firewalls, and protocol translation are tested on the FABRIC infrastructure.
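To make the role of the central clearinghouse more concrete, the following is a minimal sketch, not the project's implementation. It assumes the clearinghouse keeps two lookup tables: one mapping each site to the base URL of its S3-compatible endpoint, and one mapping each globally named object key to the site that hosts it. The class name `ClearingHouse`, its methods, the site names, and the URL layout (`<endpoint>/<key>`) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClearingHouse:
    """Hypothetical central index mapping global object keys to hosting sites."""

    # site name -> base URL of that site's S3-compatible endpoint (assumed layout)
    sites: dict[str, str] = field(default_factory=dict)
    # global key (e.g. "wsi/case-001/slide-01.svs") -> name of the hosting site
    locations: dict[str, str] = field(default_factory=dict)

    def register_site(self, site: str, endpoint: str) -> None:
        # Strip a trailing slash so resolved URLs never contain "//".
        self.sites[site] = endpoint.rstrip("/")

    def publish(self, site: str, key: str) -> None:
        # Only registered sites may publish into the global namespace.
        if site not in self.sites:
            raise KeyError(f"unknown site: {site}")
        self.locations[key] = site

    def resolve(self, key: str) -> str:
        """Return the URL a client should use to fetch `key`."""
        site = self.locations[key]  # raises KeyError if the asset is unknown
        return f"{self.sites[site]}/{key}"


if __name__ == "__main__":
    hub = ClearingHouse()
    hub.register_site("site-a", "https://s3.site-a.example.org/")
    hub.publish("site-a", "wsi/case-001/slide-01.svs")
    print(hub.resolve("wsi/case-001/slide-01.svs"))
    # → https://s3.site-a.example.org/wsi/case-001/slide-01.svs
```

Keeping location lookup separate from data transfer means the clearinghouse only handles small metadata requests, while slide bytes flow directly from (or via) the hosting site.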
Using computational agents, we establish secure mesh communications between sites, even where direct communication is not possible due to protocol, firewall, or other restrictions. The described computational network allows for unified data and resource access across sites. The project described was supported by the NIH National Institute of Neurological Disorders and Stroke through grant 1U24NS133945-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
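One way to picture agent-based mesh communication is as path finding over the connections agents are actually able to open. The sketch below is an illustration under stated assumptions, not the authors' agent protocol: it assumes that once any agent connection is established (for example, an outbound connection from behind a firewall), traffic can flow both ways over it, and it uses plain breadth-first search to find the shortest relay chain between two sites. The functions `build_mesh` and `relay_path` and all site names are hypothetical.

```python
from collections import deque
from typing import Iterable, Optional


def build_mesh(connections: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Treat each established agent connection as a bidirectional channel (assumption)."""
    mesh: dict[str, set[str]] = {}
    for a, b in connections:
        mesh.setdefault(a, set()).add(b)
        mesh.setdefault(b, set()).add(a)
    return mesh


def relay_path(mesh: dict[str, set[str]], src: str, dst: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest chain of agents from src to dst."""
    if src not in mesh or dst not in mesh:
        return None
    parent: dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk parent links back to src, then reverse into src -> dst order.
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in sorted(mesh[node]):  # sorted for deterministic output
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no chain of agents connects src and dst


if __name__ == "__main__":
    # site-a and site-b cannot reach each other directly, but both reach a hub agent.
    mesh = build_mesh([("site-a", "hub"), ("site-b", "hub"), ("site-c", "site-b")])
    print(relay_path(mesh, "site-a", "site-c"))
    # → ['site-a', 'hub', 'site-b', 'site-c']
```

A real deployment would also need authentication and encryption on each hop; the sketch only shows how reachability through intermediate agents can stand in for a direct link.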
Resources
On March 12, 2025, Vaiden Logan and Mitchell Klusty presented their work on creating a unified slide repository across distributed FABRIC nodes whose direct communication would generally be prevented by firewall rules and network restrictions. Trusted nodes can be registered in the network, allowing slides to be shared across nodes. Check out the poster here: