Top Data Developments
Dr. Mohan noted that in the area of data storage, IBM is working to develop very high-density tapes, and there is newer technology to support more efficient data retrieval. Another development he described is solid-state memory technology, which goes beyond flash memory. Ultimately it will be cheaper than DRAM, and have persistence and better efficiency. Equally important is how you present the data, Dr. Mohan said. Visualization technologies are important, and IBM is doing quite a lot of work on how to use visualization to present results. Finally, IBM is looking at ways to exploit massive parallelism, so that even in the presence of failures, it is possible to continue processing the data without having to restart.
Mr. Hohpe recalled that, “When I started writing applications, the data used to be very separate. That has completely evaporated. Everything is married to the data.” This has represented a fundamental shift for the software engineer. The kind of data collected has also changed dramatically, he said. In the past, data was limited to items such as customer name and address. Now some of the most interesting data is dynamic data about what users are doing on a website.
Mr. Rubasinghe said, “I started life earlier doing databases and performance analysis on databases.” Looking from an RDBMS point of view, he observed that, in the last few years, the nature of interactive software has changed significantly. As people access interactive software through the Internet, they are generating more data in larger volumes. “RDBMS, which should have been the backbone, haven’t kept pace,” Mr. Rubasinghe said. Moreover, many popular applications create data that does not fit in a relational model.
New Approaches to Storing Data
Dr. Perera asked the panelists to share their thoughts about new technologies for storing and managing data, such as NoSQL.
Dr. Mohan described how NoSQL developed as various organizations that were not traditional software companies found that relational databases did not fit their needs and that it was hard for non-database people to think along the lines of SQL. However, NoSQL databases have brought their own challenges, he said. The API for NoSQL is very primitive, a painful tradeoff made to achieve greater flexibility and scaling. He added that a lot of features from relational databases are now being reintroduced into their NoSQL counterparts.
Mr. Hohpe observed that the nature of software is for things to grow until they get too big; then they get cut back to the basics, and developers start building back up from there. He added that one of the most interesting areas is the Metadata Access Point (MAP), noting that, “The model is a nice abstraction.” The biggest alternative to MAP is temporal logic, Mr. Hohpe said. However, he noted that they address different problem spaces, so it is hard to compare them.
Mr. Rubasinghe noted that the requirement to support multiple data formats is driving the demand for transformation. In the case of relational databases, it works to have the model predefined. He added that when data has to be transformed, there is a need to identify where you want to process the data.
Data in the Cloud and Security
Dr. Perera next asked the panelists to explore the challenges of managing data in the cloud securely.
Dr. Mohan said that data security in the cloud is a big issue, particularly for larger companies and those that need to comply with government requirements. He explained that this is the reason why private clouds are more popular, as are hybrid clouds where enterprises use the public cloud for what it is good at delivering and then manage the rest in a private cloud. He also advised enterprises to make sure that cloud providers can meet their expectations and that there is a service-level agreement (SLA) for performance and availability.
Mr. Hohpe stated that the point we have reached with data in the cloud is fantastic. “The cloud is leveling the playing field and democratizing the market. Now two guys in a garage can have access to terabytes and petabytes,” he observed. Through public clouds, “what we have really learned is the ease of deploying applications,” he added. “That is the biggest factor. I think a lot of that can be utilized for internal clouds.”
To learn more about the panelists’ perspectives on technologies and best practices for managing big data, view the full panel discussion here.