Curated on 08th March 2012
The Data Management Landscape
Sumedha opened his presentation by noting that each enterprise application comes with unique attributes and behaviors, including its lifecycle, owner, and access pattern, as well as whether its data is structured, unstructured, or semi-structured. To handle these attributes appropriately, businesses must understand the requirements for storing the different types of data and then choose the appropriate form of data management.
To date, relational databases have been the de facto standard for managing data, Sumedha said. "They allow us to store data, read data, search for data and transactions, and support moderate scaling so that a database can withstand a certain type of load."
However, enterprises also use a range of alternative ways to manage data (spreadsheets, message queues, registries, file systems, and caches), each of which offers its own trade-offs.
More recently, enterprises have turned to a new class of NoSQL database management systems. Because NoSQL systems are designed to support heavy read/write workloads, they can more effectively manage the large volumes of data generated when applications move into the cloud. Among NoSQL approaches is the key-value store, which is supported by systems such as the open source Apache Cassandra.
NoSQL systems offer technical advantages for managing data in the cloud. However, unlike SQL databases, for which there are plenty of resources to support technical issues, NoSQL is a technology for which there is virtually no expertise, Sumedha cautioned. IT professionals who adopt a NoSQL database will need to become specialists in their systems, as well as in those systems’ processes, he advised.
Evaluating Data Management Options
Given the different data models available, IT professionals need to address several questions when deciding on an appropriate data management option for an application.
“The data type an application will deal with and the application's structure—if it even has structure or is semi-structured—these are the very first things that you need to understand, before starting an implementation,” Sumedha said.
For example, Sumedha noted, “If your application is transaction-intensive, small in scale, and needs joins between data elements, use a relational database.” Conversely, he said, “If you have to deal with unstructured data, use key-value storage or use a column family type of storage. And if your application has the potential of scaling, and many reads and writes, relational databases don't support this after some point. The relational database fails or performs really slowly.”
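The rules of thumb in the quote above can be sketched as a small decision function. This is only an illustration, not something from the presentation: the `AppProfile` fields, the `suggest_store` name, and the order in which the rules are checked are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    """Hypothetical summary of an application's data needs."""
    structured: bool             # does the data have a fixed structure?
    transaction_intensive: bool  # does the app rely heavily on transactions?
    needs_joins: bool            # does it join data elements together?
    heavy_scale_expected: bool   # many reads/writes and potential to scale?


def suggest_store(profile: AppProfile) -> str:
    """Return a starting-point suggestion based on the quoted rules of thumb."""
    # Unstructured data: key-value or column-family storage.
    if not profile.structured:
        return "key-value or column-family store"
    # Heavy read/write load with growth potential: relational databases
    # are said to fail or slow down after some point.
    if profile.heavy_scale_expected:
        return "key-value or column-family store"
    # Small-scale, transaction-intensive, join-heavy: relational database.
    if profile.transaction_intensive or profile.needs_joins:
        return "relational database"
    # The quoted rules do not cover this case.
    return "no clear recommendation; evaluate further"


small_ledger = AppProfile(structured=True, transaction_intensive=True,
                          needs_joins=True, heavy_scale_expected=False)
print(suggest_store(small_ledger))
```

A real evaluation would weigh more factors than four booleans. The sketch only makes the order of the questions explicit.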
Sumedha also stressed the importance of examining a given data storage system as a long-term solution. For instance, architects and developers need to consider both vertical and horizontal scaling capabilities as options for maximizing scalability and addressing any future processing demands.
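To make the horizontal option concrete, the following minimal sketch spreads keys across several nodes by hashing each key. Vertical scaling, by contrast, means giving a single node more capacity. The node names and the `node_for_key` helper are hypothetical, and simple modulo hashing is used here only as an illustration, not as the approach of any particular product.

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick the node that stores `key` by hashing the key (modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # onto the list of available nodes.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical cluster of three storage nodes.
nodes = ["node-a", "node-b", "node-c"]

# Each key maps deterministically to one node, so load is spread
# across machines instead of concentrated on a single server.
placement = {key: node_for_key(key, nodes)
             for key in ["user:1", "user:2", "order:17"]}
print(placement)
```

One caveat with this simple scheme: adding or removing a node changes the modulus, so most keys move to a different node. Techniques such as consistent hashing reduce that reshuffling.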
Additionally, Sumedha recommended using the CAP theorem to evaluate the limitations of a model, applying the basic concept that a distributed system cannot simultaneously guarantee all three of consistency, availability, and partition tolerance.
At the end of the day, Sumedha said, “You have to justify the use of a particular storage engine based on the particular information coming in.”
To learn more about data management approaches and options supported by WSO2, view Sumedha's full presentation here.