The key lies in Wikipedia’s definition that “Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”.
For years the Telco, Retail and Financial sectors were being convinced that they needed big databases to ensure all data was accessible for all purposes … the famous “just in case”. That was great for database and equipment sales but not always feasible, because those analytics and reporting functions tended to slow the systems down and the results never came back. The answer was to add more boxes, more horsepower and even more databases, and to link them all together using some sort of middleware bus.
Things got even more bizarre when CRM systems were introduced to collect and display all relevant customer data, resulting in replicated and often unsynchronized data. If that wasn’t enough, data warehouses were introduced that collected data from a number of disparate sources, ‘rationalized’ and reindexed it so that it could be used for offline analytics.
Big Data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers." According to IBM, eighty per cent of the world’s data is unstructured, and most businesses don’t even attempt to use this data to their advantage. And these days it is all about real-time access to that data.
The whole premise of providing the best customer experience is that information can be accessed from anywhere on the network whilst there is interaction with the customer. Data is not resident in one large database, it is certainly not in one common format across the multiple servers on a network, and even loyalty to any single database supplier appears to be a thing of the past.
Google has just announced that it is moving from MySQL to MariaDB, a monumental task. It would seem that open source is key to many Big Data initiatives. Today, the amount of data being collected is growing exponentially, fuelled in part by mobile communications and a growing number of connected devices, including smart meters.
There is a great deal of activity around Apache Hadoop, another open source software project, which enables the distributed processing of large data sets across clusters of commodity servers. In-memory processing is also being pushed as an alternative to Hadoop’s distributed-computing approach, mostly in proprietary form, with SAP HANA one of the strongest contenders.
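The map/shuffle/reduce model that Hadoop popularised can be illustrated without a cluster at all. The following is a minimal, illustrative Python sketch of a word count in that style; it is not Hadoop’s actual API, and the function names are hypothetical:

```python
from collections import defaultdict

def map_phase(records):
    # Mapper: emit a (word, 1) pair for every word in each input record
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

records = ["big data is big", "data is growing"]
print(reduce_phase(shuffle(map_phase(records))))
# → {'big': 2, 'data': 2, 'is': 2, 'growing': 1}
```

In a real Hadoop deployment, the records would be blocks of files in HDFS and each phase would run in parallel on many commodity servers; the single-process pipeline above only shows the shape of the computation.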
Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer. Add to this the growth in use of cloud services and the virtualization of data and we can start seeing issues arising with data integrity and a company’s ability to ‘audit’ that data for regulatory and accounting compliance reasons.
Savvy operators are extending their existing Revenue Management and Assurance practices and tools across all application servers, network elements and databases to keep up with the challenges, and the as-yet-unseen problems, that might arise from Big Data.
Are you prepared?