Mission
Patient data generated from different clinical departments across health systems is often heavily siloed in different repositories and hard to access for research purposes. To address this challenge, HPI·MS and its partner Data4Life are developing AI Ready Mount Sinai (AIR·MS). AIR·MS is a cloud-based, multi-modal health data platform that integrates this data.
By creating an innovative environment in which highly skilled data scientists can access clinical data such as EHR, imaging, -omics, and sensor data, AIR·MS is facilitating easy access to a revolutionary unified data source that will accelerate the advancement of health-care-driven, AI-based solutions.
This platform is equipped with state-of-the-art computational frameworks and architectures to enable rapid scientific discovery alongside translational applications into real-world clinical settings. Facilitating multi-modal data access within a large academic medical institution such as Mount Sinai is an enormous challenge but promises to greatly accelerate the application of clinical data in developing the next generation of health care systems.
Vision
Through our work in building AIR·MS, we are filling an urgent need within the Mount Sinai ecosystem. AIR·MS will serve as a vital resource for researchers, providing seamless access to clinical data that can yield pathbreaking insights underlying our goal of improved patient care. Ultimately, AIR·MS will accelerate the use of real-world clinical data as we aim to tailor medicine for individuals across a spectrum of disease areas.
Background
Gaining knowledge and actionable insights from complex, high-dimensional, and heterogeneous biomedical data remains a key challenge in transforming medicine. Innovations in health care and medical discovery have emerged in modern biomedical research through the utilization of unique data sources such as EHRs, biomedical images, and clinical text notes. Large health care systems, such as Mount Sinai, generate these diverse data independently throughout various departments and often through very different methodologies. Data captured in electronic health records may be recorded during patient-doctor interactions within outpatient clinics, or during inpatient stays at the eight hospitals across the health system. At Mount Sinai, these data are collected and managed for research purposes by a non-clinical entity, the Mount Sinai Data Warehouse (MSDW). Another example of a unique research data source is the BioMe™ biobank, managed by The Charles Bronfman Institute for Personalized Medicine, which enrolls participants during outpatient clinical visits and collects genetic data for research purposes.
In the current state, securing access to disconnected datasets poses a significant challenge, as researchers are required to interact with different stakeholders to obtain and unify the data they need. This can be a complicated endeavor and is often time- and effort-intensive, which further delays innovative research. Even when data can be linked, researchers often struggle to identify suitable storage and architecture platforms for analyses, given patient privacy concerns and the complexities of storing patient data ethically. Moreover, while traditional research in medicine is defined by the preemptive formulation of a hypothesis to test, the advent of big data has led to a shift in which the data itself is used to formulate research hypotheses to be investigated. Consequently, data scientists often require large amounts of data without a specific research question, which creates regulatory issues that often lead to additional delays and challenges. Given that each unique data source at Mount Sinai is siloed and there is no standard method of access, researchers often generate bespoke datasets contingent on specific study requests, which yield varying levels of completeness. Analyzing fractured datasets can limit the reproducibility of models, and reproducibility is imperative for the development of robust AI applications and for maintaining integrity in research.
Objectives
The AIR·MS initiative is dedicated to improving access to the different modalities of available data scattered across the health system. Partnering with the Mount Sinai Data Warehouse (MSDW), the Mount Sinai Data Imaging Research Warehouse (IRW), the Mount Sinai BioMe™ biobank, the Digital Discovery Program, and the software development collaborator Data4Life gGmbH, we will develop, maintain, and operate AIR·MS to provide unprecedented data access and analytics capabilities to data science researchers throughout Mount Sinai. To serve this strategy, AIR·MS aims to:
Unify the disjointed data within Mount Sinai to create a single point of access where users can explore the linked database without having to interact with different data stakeholders.
Allow researchers and data scientists to explore, query, and visualize multimodal data, create cohorts of patients, and evaluate predictive models and AI techniques for personalized medicine and clinical decision support.
Leverage advanced data analytics tools and high-performance in-memory database technologies such as SAP HANA to create a computationally efficient framework that promises to be more efficient than any system currently offered by Mount Sinai.
Our initial objective is to launch and validate the AIR·MS platform for affiliates of HPI·MS. Expansion to a broader research community will be evaluated following the release of the AIR·MS MVP. Our ultimate goal is to provide an unparalleled data science resource for all researchers at Mount Sinai.