Data Warehousing, Business Intelligence and Predictive Analytics
Today, the performance and scale of nearly every production data warehouse are throttled by its underlying storage infrastructure
The traditional work of data warehousing and business intelligence teams is difficult enough. Add the rapid growth of mainstream and mobile business intelligence, hundreds of simultaneous BI users asking complex business questions dozens of times a day, and the organization’s demand for more frequent data warehouse refresh cycles (typically 3-5 times per day), and the choice of storage layer for your data warehousing environment becomes a critical architectural and operational decision. Today, the ability to add more users, more data and more BI functionality to a production data warehouse is usually limited by the performance of the underlying storage, and that is something no database administrator can ultimately tune their way around.
New demands on data warehouses, including semi-structured textual data, real-time streaming data sets, IO-intensive predictive algorithms, and statistical and non-statistical in-database analytics, bring conventional enterprise storage and upstart hybrid arrays to their knees. The result compromises the entire data warehousing environment and damages the credibility of the IT organization with data-hungry end users.
Modern data warehousing, business intelligence and predictive analytics require scalable, high-performance storage engines designed for the unique requirements of ultra-high-volume query processing, and for the particular ways that conventional relational database management systems (RDBMS) and text processing environments process queries against large volumes of structured and semi-structured data. High-performance query-intensive computing requires:
- the right mix of flash storage and fast-read/fast-write hard disk drives
- automatic tuning of storage performance to the actual query demand present at any moment, regardless of the limitations and gaps in the DBMS indexing strategy
- built-in support, without manual tuning or intervention, for the simultaneous mixed workloads that characterize modern data warehouses: insert- and update-oriented extraction, transformation and loading (ETL) workloads, read/write-intensive query-oriented workloads, and the complex high-throughput “I/O storms” created by algorithmically intensive statistical and predictive analyses.
Conventional enterprise storage arrays, and integration-based hybrid storage arrays that depend on commodity hard drives for their capacity, cannot meet the rapidly escalating demands for performance, throughput and capacity in modern data warehousing environments.
And those demands cannot be met economically by exotic alternatives like pure-SSD storage and in-memory database environments, both of which are responses to the failure of conventional enterprise storage arrays rather than appropriate solutions to otherwise-unsolvable problems. Throwing millions of dollars of exotic hardware at a storage problem simply doesn’t make economic sense when cost-effective alternatives exist and are proven to work as well as or better than the exotic options.
X-IO’s Intelligent Storage Element (ISE) is ideally suited for both conventional data warehousing and business intelligence environments and leading-edge predictive and statistical analysis applications. The ISE’s Continuous Adaptive Data Placement (CADP) engine ensures, at every moment, that all in-demand BI data sets are resident in super-fast enterprise-grade solid state storage, while simultaneously providing the industry’s best serialized-read storage performance for the table scans that inevitably appear in even the most carefully tuned data warehouse workload. In X-IO environments, DBAs spend their time adding new capabilities to the data warehousing environment rather than attempting, often with mixed success, to tune their way around intractable storage problems, and storage administrators spend their time working on other problems rather than supporting frustrated DBAs.
The ISE’s patented self-healing storage features mean that data warehouses are never subject to unplanned downtime due to drive failures or data loss.
The ISE’s active-active mirroring capability allows organizations to run fault-tolerant data warehouse clusters: mirrored pairs of ISEs ensure that the data warehouse’s underlying data set is always available.