Part 2 – Data Warehouse Automation is the next logical step for stability and growth

This article on Covid-19 Data Warehouse Automation was written by Douglas Textor with Susan Pessemier (https://omnidata.com/about#team).

In Part One, you read about a rapid prototype built in Washington State for rapid response to COVID-19. Government officials, right up to Governor Inslee, use this information to make agile decisions and to inform the public. The dashboards are available to the general public here. The dashboards update at varying intervals, as the data becomes available. Of note, the Data Architecture is ad hoc, a result of rapid response. This article makes the case that Data Warehouse Automation is the next logical deployment.

Washington, with The HealthTech Community Response Council, has produced nothing short of an Information Technology miracle with their current prototype. It is “best in class” as far as state and even national governments go. However, if Washington continues down an ad hoc path, their best in class response may slow dramatically due to a lack of supportability and scalability. The recommendation is to deploy Data Warehouse Automation (DWA). DWA is a required next step to stabilize and grow the response platform.

Current State COVID-19 Response – Before Data Warehouse Automation

Illustration of Data Architecture, including challenges in current Washington State COVID-19 information system prototype.
Current COVID-19 Data Response Environment in Washington State. Best of breed, but lots of potential points of failure going forward.

After Data Warehouse Automation

Data Warehouse Automation applied to Washington State pandemic information system.
A DWA Tool addresses the needs for speed, reliability, scalability, supportability and simplicity. Nothing could highlight the usefulness better than a public crisis.

I have made a choice to keep this simple. If I had ranged beyond my own subject, the scope of this article would have been too broad. Consequently, it would have been far too technical to be helpful, traversing subjects like epidemiology, data science, and massive logistics. So, the subject is only the basics of information technology development and adoption. That means it involves people, data, systems, and new developments. The goals are speed, reliability, agility, and simplicity.

Here are some deep dive links to the experts in the related areas. The needs brought on by a crisis change daily. Later, in the aftermath, new requirements arise with the same regularity. For a deep dive into the stages of crisis management, read here, from The Tohoku Journal of Experimental Medicine. This picture, from the article, summarizes the complexity and need for information and speed in disaster recovery.

Framework for phases of Health Crisis Management
Framework by Frederick M. Burkle Jr., from The Tohoku Journal of Experimental Medicine

Need for Speed

The virus is changing and our understanding of how to deal with it changes daily, too. Mistakes cost a lot of lives. There isn’t much more to say on the subject of speed.

Unstructured Data

Unstructured data includes any data stored in a computer for one purpose that needs to be applied to a new purpose. It also includes people trying to gather data into scrawled lists and spreadsheets. In order to prepare the data for consumption, typically a patchwork of scripts (code) emerges to structure, organize, and translate the data for a new purpose. All the while the data is making its way to a data warehouse that allows the dashboards to actually work. The first graphic above, “before”, illustrates the current state of the unstructured COVID-19 response data.

My first impulse was to map all of the data sources in the state that flow into the state dashboards, using the well documented descriptions on the state web site that lend credibility to the graphs. This is a normal exercise when stepping into a Data Warehouse Automation project. Had I continued, it might have taken me days simply to get a draft, with no verification and certainly no real details. Instead, I counted over twenty-five different sources of the data when I surveyed the Washington State site dashboards. Each source has varying levels of speed, availability, human intervention, and staffing, and thus of reliability.

Partial list of Pandemic Response Data Sources

  1. All Hospitals – New Covid Hospitalizations, Deaths, Releases, Status at release, etc.
  2. Local Health Jurisdiction Websites – The data can possibly be automatically “scraped”, but is more likely currently read and manually input by a person.
  3. Tracked Illnesses – This data is very incomplete, as testing has been slow in the U.S. to get to appropriate levels. However, as the data becomes more complete, this measure will become useful to help those who may have been exposed.
  4. Demographics – Requires that the basic data become richer and more enhanced, tagged with age, sex, other medical complications, etc.

The list goes on, and the more the list goes on, the greater the need for a Data Warehouse Automation tool, assuming the intention is to provide decision makers and the public with better information.
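As a minimal sketch of what cataloging these sources might look like, the Python below records each source with its refresh interval and whether a person has to key the data in. The source names, the intervals (other than the twice-a-day hospital feed), and the manual-entry flags are illustrative assumptions, not details taken from the Washington State system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    update_hours: int    # assumed refresh interval, in hours
    manual_entry: bool   # True if a person re-keys the data by hand


# Illustrative entries only; intervals and flags are assumptions.
CATALOG = [
    DataSource("hospital_admissions", update_hours=12, manual_entry=False),
    DataSource("local_health_jurisdiction_sites", update_hours=24, manual_entry=True),
    DataSource("tracked_illnesses", update_hours=24, manual_entry=True),
    DataSource("demographics", update_hours=168, manual_entry=False),
]


def manual_sources(catalog):
    """Return the names of sources that depend on human data entry."""
    return [s.name for s in catalog if s.manual_entry]


def slowest_source(catalog):
    """Return the source with the longest refresh interval."""
    return max(catalog, key=lambda s: s.update_hours)
```

Even a simple catalog like this makes it possible to ask which sources are most likely to lag or break, which is the kind of question that gets harder to answer as the list grows.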

Re-purposed data

An example of re-purposed data might be hospital admissions. Each hospital may keep admissions data for everything from tracking patient progress and movement, to providing insurance information. So, yes, the data exists in a system. New demands require this data be provided twice a day to the state system. There is suddenly a different purpose for the data. Consequently, it may not be as easy to provide the information in a form that the pandemic systems can easily consume. This means very busy people suddenly need to shift gears and figure out how to provide the data. Multiply that problem by each of the 115 hospitals in the state and it is easy to understand how quickly the complications pile up from this one example.
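To make the re-purposing problem concrete, here is a hedged Python sketch of two hospitals exporting the same admissions facts under different field names, and a small function that renames them into one shared layout. The hospital names, field names, and shared schema are all hypothetical; real hospital exports will differ.

```python
from datetime import date

# Hypothetical mappings: each hospital's export field -> shared field name.
FIELD_MAPS = {
    "hospital_a": {"pt_id": "patient_id", "admit_dt": "admitted_on", "dx": "diagnosis"},
    "hospital_b": {
        "PatientNumber": "patient_id",
        "AdmissionDate": "admitted_on",
        "Diagnosis": "diagnosis",
    },
}


def normalize(hospital, record):
    """Rename one hospital-specific record into the shared schema."""
    mapping = FIELD_MAPS[hospital]
    missing = [src for src in mapping if src not in record]
    if missing:
        raise ValueError(f"{hospital} record is missing fields: {missing}")
    # Keep only the mapped fields; extras such as insurance data are dropped.
    row = {target: record[src] for src, target in mapping.items()}
    row["admitted_on"] = date.fromisoformat(row["admitted_on"])  # expects YYYY-MM-DD
    row["hospital"] = hospital
    return row
```

Each additional hospital needs its own mapping entry, so 115 hospitals means up to 115 mappings to write and keep current.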

Why Data Warehouse Automation?

Here is an explanation of Data Warehouse Automation for lay people. Under ordinary circumstances, unstructured data exists all over. “Unstructured Data” here means the scrawled lists and spreadsheets that people gather data into, as well as any data stored in a computer for one purpose that now needs to be applied to a new purpose.

In order to prepare the data for consumption, typically a patchwork of scripts (code) emerges to structure, organize, and translate the data for a new purpose. These “scripts” are the product of highly skilled and expensive computer programmers. As each computer script is written, the benefit is that the data gets to where it needs to go. However, with a proliferation of computer scripts, a high cost for maintaining the scripts emerges. If an expensive script “breaks” at a critical time when a decision-maker needs the information, the costs can be high. In addition, the time and money required to hunt down the faulty script raise maintenance costs sharply unless the computer scripts are effectively organized. Data Warehouse Automation tools organize all of these computer scripts in one place, improving the ability to maintain and expand on the information solution.
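The idea of keeping every script in one place can be illustrated with a short Python sketch. This is not how TimeXtender or any particular DWA product works internally; it is a simplified assumption showing why a single registry of steps makes a broken script easier to find. The step names and the sample fields (county, cases) are hypothetical.

```python
PIPELINE = []  # ordered list of (step_name, function) pairs


def step(name):
    """Decorator that registers a transformation step in one central list."""
    def register(func):
        PIPELINE.append((name, func))
        return func
    return register


@step("drop_blank_rows")
def drop_blank_rows(rows):
    # Discard rows with no county recorded.
    return [r for r in rows if r.get("county")]


@step("parse_counts")
def parse_counts(rows):
    # Convert the text case count into an integer.
    return [{**r, "cases": int(r["cases"])} for r in rows]


def run(rows):
    """Run every registered step in order; name the step that fails."""
    for name, func in PIPELINE:
        try:
            rows = func(rows)
        except Exception as exc:
            raise RuntimeError(f"step '{name}' failed: {exc}") from exc
    return rows
```

Because every step is registered in one list, a failure reports the name of the step that broke, rather than leaving someone to search through scattered scripts.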

Expansion of the issues that DWA addresses from the first graphic above

Data Warehouse Automation dramatically improves insights

All the while the data is making its way to a data warehouse that allows the dashboards on the right of the diagram to actually work. See the second graphic above in this article for a picture of DWA applied to Washington’s systems. Note that in the second graphic, the entire patchwork of scripts is replaced by the DWA tool. On the left, the available data is still chaotic. As we move to the right in the graphic, the organized data warehouse is much fuller than in the first graphic. The availability of information, represented by the dashboards on the right, has increased dramatically. The entire ability to respond to the pandemic with information has improved on the dimensions of speed, reliability, scalability, supportability and simplicity.

For a deeper dive into the technical aspects of Data Warehouse Automation, Eckersen consulting offered this last year. At OmniData, we like TimeXtender and chose them as our DWA software provider. Their software does not break the bank, it has been in the market for years, and it simplifies the complex. TimeXtender expresses their value in simple terms: Position yourself to save 10X in time to deploy your data warehouse. I would also highlight the on-going positives for operational maintenance and support. It is a terrific tool to prepare the data for display using Microsoft’s Power BI analytics and visualization platform.

Next steps for Washington and other states

For Washington, the obvious next step is to adopt a Data Warehouse Automation tool and keep going. The leadership in Washington defined their key information requirements very well. The information they have fuels decisions and helps communicate to the public at large. With the help of the private companies of The HealthTech Community Response Council, Washington has created a best in class Pandemic Response Platform. In conclusion, a DWA tool will make the system faster, more reliable, supportable, scalable, and agile.

For other states, these articles will provide a foundation to copy Washington, because a pandemic is no time for NIH (not invented here) behavior. The results in Washington State will help those who are risk averse in Information Technology believe that they too can achieve the same level of success. Where the goal is to build better emergency response systems, the need is clear. Washington’s example is inspiring and will be a catalyst for others’ success.

About OmniData

OmniData provides products and services at every phase of the data lifecycle. When you need us, we are passionate about your success. We mine your hidden data assets and we will accelerate your time to data insights 10X.

Microsoft Gold Data Partners