Big Data - over-hyped, over-priced and over-done

Huw Davis

The science of customer insight and modelling does not come from creating enormous data warehouses, but through understanding the value of maintaining the right level of information.

Big Data: two words with so little meaning, but potentially with a large commercial price tag. Every consultancy, hardware and software business has jumped onto the big data gravy train. Meanwhile, the vast majority of corporations are still on the station platform, waiting for the business benefits and returns from their initial investment in these two small words. Yes, there are the usual examples wheeled out by the likes of IBM or PwC of the success of big data solutions, but they are few and far between. If I hear the Google story of predicting a USA flu epidemic from online searches across North America once more, I will personally have a fever and throw up!

The same happened when neural nets were being developed for marketing purposes, and before that with data mining, and before that with CRM, and, for those who remember them, with the EIS (Executive Information Systems) and MIS (Management Information Systems) of the 1980s. The difference with big data is that the term is so vague that almost any volume of data, whatever its internal structure, can be defined as big data.

Let’s face it: we have always had data to improve targeting and segmentation, but it has normally been generated as a byproduct of administration systems for customer records, inventory management or financial reporting. We are now in the realms of big, bigger and biggest. As most of us are end-users of this tidal wave of information, we need to cut through the rhetoric. We need to understand what is truly useful to us and not forget that we are looking for the right data, not any data. The right data is far more usable, trackable and controllable, and it will increase value within your business more quickly than swamping it with everything you can possibly capture and will probably never use.

Thirty years of analysing client customer data has taught me that there are only about fifteen to twenty strong factors within a client’s own database, and probably only another five to ten that we can glean with confidence from consumer-generated information. What are these factors that drive engagement and value? Well, that depends on the client’s industry and where you sit in the supply chain of delivery to the end-user buyer. All I can tell you is that knowing that 5% of people visited only five of the eight pages on your site will not be a strong driver of general lifetime value prediction. A good starting point, though, is to use factor analysis and then apply a range of stepwise regression techniques to identify the strong factors within your data.
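For readers who want to see what the stepwise part looks like in practice, here is a minimal sketch of forward stepwise selection in plain Python. It is an illustration under stated assumptions, not the method used on any client project: the data is synthetic, the variable positions are hypothetical, and the stopping rule (a minimum gain in R-squared, `min_gain`) is a simplification of the F-test or p-value rules that statistical packages normally apply. The factor analysis step that would usually come first is not shown.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(rows, y, cols):
    """R-squared of a least-squares fit of y on the chosen columns plus an intercept."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
                 for d, t in zip(design, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(rows, y, min_gain=0.01):
    """Add one variable at a time, keeping it only if R-squared rises by min_gain."""
    selected, best = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        score, col = max((r_squared(rows, y, selected + [c]), c) for c in remaining)
        if score - best < min_gain:
            break  # no remaining variable adds enough explanatory power
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best

# Synthetic data (an assumption for illustration): six candidate variables,
# of which only columns 0 and 3 actually drive the outcome.
random.seed(0)
rows = [[random.gauss(0, 1) for _ in range(6)] for _ in range(300)]
y = [3 * r[0] - 2 * r[3] + random.gauss(0, 0.5) for r in rows]

selected, best = forward_stepwise(rows, y)
print("strong factors (column indices):", selected)
print("R-squared of the reduced model:", round(best, 3))
```

The point of the sketch is the shape of the result: six variables go in, and only the ones that carry real signal come out. On a real database you would run this with a proper statistics package and validate the chosen factors on data the model has not seen.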

Before you can get clever with data and dramatically reduce its volume, you first need to understand each variable in your system. As an example of really understanding your data, I have been telling the following story for many years. A department store sells umbrellas, and it is raining heavily outside. The store quickly runs out of its fashionable navy blue umbrellas and is left with only the bright yellow ones. It continues to rain, shoppers continue to demand umbrellas, and eventually sales of the yellow umbrellas go up until they too sell out. If we now took the sales data and fed it into a CMS solution for the following day, the retail outlet sales would show that yellow umbrellas are our trending product and that we should be selling more of them. What is being modelled is a stock control problem, not consumers’ desires.
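The umbrella story can be put into a few lines of Python. The figures below are invented purely to illustrate the point, and the record layout (hour, colour, units sold, whether navy was still in stock) is a hypothetical one, not taken from any real system.

```python
# Invented hourly sales for one rainy day:
# (hour, colour, units_sold, navy_in_stock)
sales = [
    (9, "navy", 12, True), (9, "yellow", 1, True),
    (10, "navy", 15, True), (10, "yellow", 2, True),
    (11, "navy", 8, True), (11, "yellow", 1, True),
    (12, "yellow", 14, False),  # navy has sold out from here on
    (13, "yellow", 16, False),
    (14, "yellow", 11, False),
]

def totals(records):
    """Sum units sold per colour."""
    out = {}
    for _, colour, units, _ in records:
        out[colour] = out.get(colour, 0) + units
    return out

# Naive view: count every sale, regardless of what was on the shelf.
naive = totals(sales)

# Fairer view: count only the hours when shoppers had a real choice.
fair = totals([r for r in sales if r[3]])

naive_top = max(naive, key=naive.get)
fair_top = max(fair, key=fair.get)
print("trending product, all sales:", naive_top)
print("trending product, both colours in stock:", fair_top)
```

Counting every sale makes yellow the best seller; counting only the hours when both colours were on the shelf puts navy back on top. The sales variable means something different once stock availability is taken into account, which is why each variable has to be understood before it is modelled.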

What has surprised me in the last three to five years is that I have become a born-again market researcher. I realised that, as we drown in consumer self-generated data, the only way to swim against the tide is to take control of the emotional data that consumers are generating. The right qualitative techniques, interpreted by an expert, can home in not just on the whats and whens that we can gain from online and offline transactional data, but also on the why factors that great qual will identify in research groups. Combining this with the twenty or so hard data factors will generate the right data, not just big data.