The shortest analytic distance between two points

Redshift plays a keystone role in the constellation of technologies for data-centric workloads across the Amazon portfolio. It’s an essential ingredient in the expansion of data agility. It supports established technologies for transactional operations and business intelligence, and extends to streaming data processing, machine learning, and more. Data warehouses, largely originated by Teradata prior to the era of “Big Data,” aspired to crack the code: make data useful even when it came from different sources and different operational processes. Today, that’s history. We’re long past the time when a data warehouse acted as the final word on reconciling variation across applications. The “single source of truth” (which data warehouses once claimed to be) has since drowned in the data lake. In modern analytics, versatility and utility go hand in hand. The massively parallel processing (MPP) philosophy behind Redshift has pushed the frontier from scale-out compute parallelization to ever more sophisticated management of the underlying data storage. The payoff is in combining AWS elastic compute options with the virtues of S3 data storage and genuine development flexibility. Let’s look at some examples.
- In healthcare, the transactional RDBMS tracks each patient procedure as a step-by-step interaction, and those interactions can be as different from one another as endocrinology and cardiac care. But it’s the same patient. Redshift unifies the picture of each patient across these various sources. That means not only a lot more versatility *in* analytics but also more experimentation to identify *which* analytics matter in the course of ongoing research.
- Data flows originating from multiple third-party sources are often neither predictable nor uniform, much less as well structured as a relational database. The flexibility of columnar storage is well suited to continuously changing streams such as those common in adtech. Using Redshift as a central store for all historical information lets it serve as the trends baseline for actionable analytics, such as optimizing the timing of media campaigns.
The cloud data warehouse: not just a data warehouse on cloud

Addressability and proximity for compute are fundamental design considerations in any data architecture. In the evolution from its predecessor, a shared-nothing parallel data processing architecture, to today’s Redshift, AWS has tackled substantial problems, cutting time and cost from top to bottom. (Dogfooding at galactic scale on the consumer shopping side of the Amazon house no doubt helped.) But the real magic is in the rethinking of the fundamentals:
- Columnar storage across flexible compute clustering delivers far greater I/O optimization. It starts with reduced storage requirements. The ability to shrink and grow columns independently for any particular key, together with the improved compression ratios that come from the limited cardinality of values within a column, contributes directly to reduced I/O and to faster reads and writes. (Caveat: simultaneous read/write operations require a different approach.)
- Massive parallelization adapts to different client workload demands with different operational profiles. Modern business intelligence, for example, is no longer the same analysts hitting the same reports to produce unique PowerPoint slides for different bosses every Monday. Redshift’s dynamic workload management handles a wide range of BI workloads. It flexibly manages priorities within the mix of workloads across a broader range of analytics use cases, so short, fast-running queries won’t get stuck behind complex, long-running queries. No more buying big, expensive appliances just to hit the peak Monday morning chart rush.
- Redshift runs as a managed data service, so you can focus less on administration and more on using the data in your workloads. Like so much of the AWS data infrastructure, the secret sauce is S3. Redshift continuously and automatically backs up your data to S3; it can asynchronously replicate those snapshots to S3 in another region for disaster recovery. (A bonus: Redshift Spectrum lets you point queries right at S3 without loading data first.) It can automatically run the VACUUM DELETE operation to reclaim disk space occupied by rows marked for deletion. It auto-monitors cluster health, re-replicates data from failed drives, and replaces nodes as necessary for fault tolerance.
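
To make the columnar storage point concrete, here is a minimal Python sketch of why a column of low-cardinality values compresses well once it is stored separately from the rest of the row. The sample rows and the helper names (`to_columns`, `run_length_encode`, `run_length_decode`) are hypothetical, and run-length encoding is used only as one simple illustration; this is not a description of how Redshift stores data internally.

```python
from itertools import groupby

# Hypothetical sample rows: (patient_id, department, status).
rows = [
    (1, "cardiology", "admitted"),
    (2, "cardiology", "admitted"),
    (3, "cardiology", "discharged"),
    (4, "endocrinology", "discharged"),
    (5, "endocrinology", "discharged"),
    (6, "endocrinology", "discharged"),
]


def to_columns(rows):
    """Pivot row-oriented tuples into one list per column."""
    return [list(col) for col in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original list."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


patient_ids, departments, statuses = to_columns(rows)

# Low-cardinality columns shrink to a handful of pairs...
encoded_departments = run_length_encode(departments)
encoded_statuses = run_length_encode(statuses)

# ...while a column of unique values gains nothing from this encoding.
encoded_ids = run_length_encode(patient_ids)

print(encoded_departments)
print(encoded_statuses)
print(len(encoded_ids), "pairs for", len(patient_ids), "unique ids")
```

In this sketch a query that only needs `departments` touches two pairs instead of six full rows, which is the intuition behind the reduced I/O described above. The unique `patient_ids` column shows the flip side: the benefit depends on the data in each column, which is why each column is encoded independently.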
How Bekitzur ElevationData can help

As a practice that leads with in-depth data workload problem solving, we at ElevationData are focused on constantly improving the agility and cost-effectiveness of “Time to First Answer” across a broad variety of data use cases.
- Data Migration: Relational databases are often a victim of their own success. Whether you’re overpaying for a commercial enterprise license or just need to put the database engine where it’s easier to manage, check out our database migration service as an on-ramp to unlocking the benefits of Redshift and its ecosystem.
- Data Engineering Operations: Modern data applications keep extending to more and more use cases. We can help improve the adoption of your data resources, whether driven by application development, more versatile dashboarding, or the unquenchable thirst for ML/AI.
- Data Science Pipelines: Data diversity is a problem of both supply and demand; unless both sides are addressed, it’s hard to make data scientists productive enough to achieve sustainable strategic value. We can help you with the design, deployment, and operations of your complete data lifecycle, continuously aligning upstream data source operations with the breadth of data consumers, as much for machine learning as for human learning.