Gali Health is developing an advanced personalized healthcare assistant, called Gali, to provide tailored advice to people suffering from chronic disease. It’s a slick mobile app backed by a combination of advanced AI technology and the power to continuously process large and growing volumes of chronic disease-related data. Translating these sources and insights into relevant, timely data for global patient communities, medical communities, and scientific researchers alike can make collaboration faster and more productive. It’s a fortuitous bargain: each of these three communities can benefit from the value of the others’ data, each in the way best suited to its unique needs.

ElevationData built and deployed a coherent data architecture that brings together data across sources to be harvested by machine learning models. Assuring the flow of data to the models helps ensure every patient gets the best data to better manage her disease.

The Challenge

Gali uses blockchain to ensure confidentiality, but its most important benefit is reliable data aggregation. In practice, turning the broad and rich mix of data into more effective collaboration requires converging three domains of data: patient-generated health data (PGHD), data repositories from clinical records (EHRs), and research data. Implementing this approach to data infrastructure and data processing meant an agile setup suited to continuous change. Key requirements were to:

  • Pull data updates from research and clinical records and integrate with streaming data sources
  • Stage the integrated data from both relational and streaming sources with well-specified APIs
  • Provision well-structured data resources for both interactive analytics and machine learning ingestion
  • Readily integrate new custom data formats and feeds from all three domains
  • Provide for the integration of new applications of the data across all participants: clinical patient management, research organizations, and end-user patients
  • Accommodate changes to streaming data ingestion logic so data scientists can expand the scope of machine learning experimentation as applications change

The focus was to ensure a virtuous cycle between the intake of data from changing sources and changes in the consumption of that data, to provide better outcomes for researchers, clinicians, and patients.

The ElevationData Solution

Data Ingest from medical research partners

The recent rapid evolution of standardized electronic health records (EHRs) anticipated the radical improvements in data storage costs, often viewed through the lens of “big data”. It would be a mistake to diminish the value of uniform, well-structured information about each patient’s condition and demographics, across a huge spectrum of clinical processes. Across different institutions, those transactions are read and written in that stalwart workhorse, the relational database.

Research organizations working with clinical data take the same approach. Data shared via the Fast Healthcare Interoperability Resources (FHIR, pronounced “fire”) standard, created by the Health Level Seven International (HL7) health-care standards organization, levels the playing field for data management.

Standards notwithstanding, managing separate databases still imposes a lot of overhead, from backup to admin and all steps in between. Amazon Web Services provides a compelling alternative to the traditional standalone RDBMS without compromising the power of semantic compatibility. AWS-native data services, including Aurora, RDS-Postgres and RDS-MySQL, preserve semantic transparency while reducing the cost and performance disadvantages of stand-alone databases.

In a number of cases, Gali’s research partners maintained multiple database instances that were functionally identical, albeit with different data. For example, different research trials ran the same business processes for different patients, but the differences between fact tables and schemas were immaterial. For Gali, this presented an excellent opportunity for database consolidation. ElevationData created a single, more cost-effective Aurora instance that faithfully organized and managed data that originated from different sources.

Using the AWS Database Migration Service simplified setting up and running a consolidated target instance on Aurora. It let Gali benefit from the data in many database instances, each run by a different independent third party, at a fraction of the cost of maintaining an instance for each respective partner. And because the AWS Database Migration Service features continuous data replication, Gali’s Aurora instance stays up to date, synchronizing with source data wherever the third-party instances run.
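The consolidation idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for Aurora, the copy loop stands in for what a migration service would do, and the table, column, and partner names are all hypothetical. The point it shows is that functionally identical partner databases can share one table once each row carries a tag for its source.

```python
import sqlite3

# Hypothetical partner databases: same schema, different rows.
PARTNER_ROWS = {
    "trial_a": [(1, "enrolled"), (2, "screened")],
    "trial_b": [(1, "enrolled")],
}


def consolidate(partner_rows):
    """Load rows from every partner into one table, tagged by source."""
    conn = sqlite3.connect(":memory:")  # stand-in for the consolidated instance
    conn.execute(
        """
        CREATE TABLE patient_status (
            source_id  TEXT NOT NULL,     -- which partner the row came from
            patient_id INTEGER NOT NULL,  -- id as assigned by that partner
            status     TEXT NOT NULL,
            PRIMARY KEY (source_id, patient_id)
        )
        """
    )
    for source_id, rows in partner_rows.items():
        # INSERT OR REPLACE keeps a repeated sync idempotent per (source, patient).
        conn.executemany(
            "INSERT OR REPLACE INTO patient_status VALUES (?, ?, ?)",
            [(source_id, pid, status) for pid, status in rows],
        )
    conn.commit()
    return conn


conn = consolidate(PARTNER_ROWS)
total = conn.execute("SELECT COUNT(*) FROM patient_status").fetchone()[0]
```

Keying the table on the source tag plus the partner's own patient id means that identical ids from different partners never collide, which is what makes a single shared instance safe in this sketch.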

Redshift powered data warehouse and data lake

The big change in the era of big data is that not all data lives in a single database for a single well-bounded business process. In addition to collecting and consolidating data from multiple third-party SQL databases on AWS Aurora, Gali can benefit from many other data formats. These include sources originating with a partnered research institution, as well as data generated by Gali Health apps and end users. The ability to quickly merge this data into queryable data sets is at the core of Gali Health’s business.

The possibilities for querying integrated data across multiple streams are practically endless. How many people with a given combination of height, weight and age reported a change in health when they gained five pounds during the winter holiday season? How much did they exercise? What should Gali recommend as a change in diet for people who are lactose-intolerant in this situation? Do users respond more consistently to prompts through the chat interface, or is it more effective to provide scheduled reminders in the smartphone calendar?

With such a broad landscape of experimentation, the challenge was to create a semantically enriched data warehouse object model. We chose AWS-native Redshift for a number of reasons:

  • Powerful and versatile query interface for fast results for complex queries on any scale
  • Unique combination of cost-effectiveness, scalability, and performance
  • Extensibility to add new sources and grow their data footprint with minimal friction

Extensibility is powerful. For example, streaming data from end users and application logs is stored in S3. Redshift allows the system to query both the data warehouse and S3 objects through one query interface; structured data stored in S3 objects can also be accessed through ad-hoc SQL queries with Amazon Athena.

Another data source implemented by ElevationData generates conformed dimensions from streamed data using StreamSets open source pipelining logic. Differences across data sources were addressed with a continuous process that preserved field- and record-level compatibility. For example, research partners whose clinical trials use different nomenclature for steps of the process can be aligned to a single uniform field name, creating a global fact table across the range of data sources.
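As a rough illustration of the field alignment described above, the sketch below renames partner-specific fields to shared names. It is plain Python rather than StreamSets pipeline logic, and every partner name and field name in it is a hypothetical placeholder.

```python
# Hypothetical per-partner mappings from local field names to shared names.
FIELD_ALIASES = {
    "partner_a": {"visit_stage": "trial_step", "subj_id": "patient_id"},
    "partner_b": {"phase_name": "trial_step", "participant": "patient_id"},
}


def conform(source, record):
    """Rename a record's fields to the shared names; pass others through."""
    aliases = FIELD_ALIASES[source]
    return {aliases.get(name, name): value for name, value in record.items()}


# Two partners describe the same step with different field names.
rows = [
    conform("partner_a", {"subj_id": 7, "visit_stage": "screening"}),
    conform("partner_b", {"participant": 12, "phase_name": "screening"}),
]
```

After conforming, both records expose the same `patient_id` and `trial_step` fields, so they could be loaded into one fact table without per-partner handling downstream.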

The continuous expansion of data volume and variety, including the creation of brand new data sets, requires continuous data engineering. This data is stored on Amazon S3 in the data lake and is available for direct load into downstream systems or for interactive exploratory analytics. ElevationData established a process using Amazon EMR to process the data and create new data sets. The continuous flow of new data and new data sets into a unified model powers the Gali Health decision engine and the applications that rely on it.

Machine Learning

New data is not restricted to acquisition and ingestion of external sources. In addition to the wealth of possibilities for query exploration, ElevationData helped the Gali Health engineering team and its research partners continually bring in new data sets and use them to run new experiments. Together, we established a well-managed process for creating, deploying, training and optimizing new Machine Learning models.

The AI at the heart of Gali’s product is known as The Gali Brain. It is made up of three key components:

  • Gali Health and Disease Models, which enable her to understand the context of specific health conditions and support users in their health journey;
  • a Behavioral Model that helps her interact intelligently in response to various events, provide advice or connect with a relevant coaching program or service;
  • and the Deep Learning Layer, which ensures that Gali constantly learns from new users

Each of these three components has its own data source flows and integrations, but the Deep Learning Layer is where the challenge of chronic disease management is most acute. This is what lets Gali get “smarter” over time, using ML.

By definition, ML is a continuous process. ElevationData used Amazon’s SageMaker platform to give Gali Health a robust process for developing, training and running ML models. Its framework-independent structure means a consistent workflow for the steps needed to train, tune, and deploy various combinations of data and algorithms. And because Amazon SageMaker manages and automates a wide range of sophisticated training and tuning techniques, it is easier to keep ML models continually trained on new client data and research data.

A critical step is that when machine learning is tied to clinical trial data, the recommendations by ML models are reviewed by doctors and scientists before being approved for production use. ElevationData has built this into the workflow for the release of new capabilities and features so that no critical functions are implemented without expert human oversight.
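The review step described above can be pictured as a simple release gate. The sketch below is a hypothetical illustration, not ElevationData's actual workflow: the class, the role names, and the rule that both roles must sign off are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumed reviewer roles; the real sign-off rules are not given in the text.
REQUIRED_ROLES = {"doctor", "scientist"}


@dataclass
class ModelCandidate:
    name: str
    uses_clinical_trial_data: bool
    approvals: set = field(default_factory=set)  # roles that have signed off

    def approve(self, role):
        self.approvals.add(role)


def can_release(candidate):
    """Models tied to clinical trial data need every required role to sign off."""
    if not candidate.uses_clinical_trial_data:
        return True
    return REQUIRED_ROLES <= candidate.approvals


candidate = ModelCandidate("diet-recommender", uses_clinical_trial_data=True)
blocked = can_release(candidate)   # no approvals recorded yet
candidate.approve("doctor")
candidate.approve("scientist")
released = can_release(candidate)  # both assumed roles have signed off
```

Placing a check like this in the release path, rather than relying on convention, is one way to make the human review a hard precondition for production use.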

Consumer-facing mobile assistant

Gali Health is creating a new frontier in patient-centered health through its innovative data-driven mobile app. This approach means patients can benefit from the data in new ways without having to become data scientists.

This personalized approach is front and center throughout the platform: onboarding, ongoing health monitoring, interactive chat, reminders, and health tips. For example, for individuals with Crohn’s disease, it is essential to understand the stages of the disease and its subtypes, the known treatments and procedures or tests, and the types of symptoms that Gali needs to monitor. Gali tracks these essential attributes and manages each user’s experience accordingly.

Figure: Building the app with blockchain technology helps make it both decentralized and highly secure. Source: Gali Health

Building the app with blockchain technology helps make it both decentralized and highly secure. Daily health and lifestyle information, combined with medical history, lab, and genetic data, creates an enormous and valuable cloud with billions of data points from each person. Through the exchange of health data for tokens, patients get more than additional tools to manage their disease. Unlike in the current healthcare economy, there is a balance between providers and consumers of the data. Partners can benefit from access to the community data or the ability to offer additional services to community members with specific health backgrounds.

Continuous Data Integration

The data integration platform that drives Gali’s multiple application engines relies on a complex, integrated distributed infrastructure. The infrastructure relies on a process of continuous integration of changes to the software logic that drives the data processing at every layer of the stack.

ElevationData built Gali Health a software development environment using the concept of “infrastructure as code.” It applies equally well to the underlying IaaS cloud infrastructure services as it does to data repositories and the data processing logic. All artifacts reside in a single consistent repository infrastructure. Each change lives exclusively in the source code, rather than in standard operating procedures and manual processes. It automates the infrastructure deployment process, be it data configuration or machine learning logic, in a repeatable, consistent manner.

The same approach applies to every step of development, deployment, test automation, release management, and production operations. With the rate of change across all the moving parts, the ElevationData DevOps team implemented a CI/CD process, architected on the basis of the CloudGeometry.io solution.

The most immediate advantage of CI/CD is in eliminating delays in coding and testing improvements by the platform development team. With a transparent, predictable release process, developers could readily push new software to manage the ongoing changes to all data and application interfaces.

Another key benefit of the CI/CD process was to eliminate disconnects and test escapes that software test automation can encounter in a system of such complexity. The built-in continuous testing approach is about controllable consistency. This guarantees that the same changes tested in development are applied in production: the same DB changes, same app changes, in the same order, byte for byte, query by query.
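One common way to get the "same changes, same order" property described above is an ordered, checksummed migration list that every environment replays. The sketch below is an assumption-laden illustration and not the actual CI/CD tooling: it uses Python's built-in sqlite3 in place of the real databases, and the migration names and SQL are hypothetical.

```python
import hashlib
import sqlite3

# Hypothetical ordered change set; in practice this would live in source control.
MIGRATIONS = [
    ("001_create_events", "CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)"),
    ("002_add_source", "ALTER TABLE events ADD COLUMN source TEXT"),
]


def apply_migrations(conn, migrations):
    """Apply unapplied migrations in list order; return the applied log."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_log (name TEXT PRIMARY KEY, sha256 TEXT)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM schema_log")}
    for name, sql in migrations:
        if name in done:
            continue  # already applied in this environment
        conn.execute(sql)
        # Record a checksum so environments can be compared byte for byte.
        digest = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        conn.execute("INSERT INTO schema_log VALUES (?, ?)", (name, digest))
    conn.commit()
    return conn.execute(
        "SELECT name, sha256 FROM schema_log ORDER BY name"
    ).fetchall()


# Two separate in-memory databases stand in for development and production.
dev_log = apply_migrations(sqlite3.connect(":memory:"), MIGRATIONS)
prod_log = apply_migrations(sqlite3.connect(":memory:"), MIGRATIONS)
```

Comparing `dev_log` with `prod_log` gives a cheap check that both environments received identical statements in identical order.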

The Benefits

Data engineering and cloud expertise delivered by ElevationData was a key success factor in introducing the revolutionary patient-centered approach that Gali Health and its AI brought to the challenges of chronic disease management, starting with Inflammatory Bowel Disease (IBD), including Crohn’s disease and ulcerative colitis. In realizing its vision of converging data from patients, clinicians, and researchers, Gali was able to:

  • Deliver on the promise of a virtuous cycle of data that attracted new research institutions, hospitals, domain experts and leading healthcare organizations to collaborate within the Gali Health Ecosystem
  • Leverage the Amazon native data technology stack for data at any volume, velocity, variety, and value
  • Raise the bar for data leverage by providing a rich, productive dataflow of consistent data for query access in ad-hoc research, as well as for ML models that enable both advanced medical data science and innovative new applications driven by that data
  • Give people who suffer from chronic diseases a new way to improve their health with direct control of their data and a more effective way to collaborate with the professionals who treat them

Alex Ulyanov, CTO

Alex is CTO of ElevationData, a battle-hardened data infrastructure architect, and an AWS Certified Professional Solution Architect. For the past 5 years, he’s led consulting efforts for the company’s top clients, including GE Digital, Zypmedia, Origami Logic and ThinFilm Electronics, on data architecture and system performance solutions. He leads an extended team of data engineering and solutions practitioners across the US and Europe.
