Just how bad is your analytics data?

If you do not know how bad your analytics data is, then the chances are it is much worse than you think. With data analytics, it is not the known data quality issues that will cause you the most trouble, nor even the known unknowns, but the ‘unknown unknowns’ – those issues you only discover as you explore and analyse your data.

Usually, it is only the practitioners closest to the data who understand the full extent of the data quality problem. Too often, poor data quality is treated as something of a dirty secret, not to be fully shared with senior management and decision makers.

Common issues in web and marketing analytics

Let’s look at just some of the most common issues affecting web and marketing analytics data. To begin with, do not assume that the data sources provided by the most common analytics solutions are robust by default. Even the best ones are prone to significant data quality issues and gaps. Take Google Analytics referrer traffic, which often reports an unrealistic level of ‘Direct’ traffic: supposedly visits made by users typing in URLs or using bookmarks, both low-frequency methods of site access. The reason is that ‘Direct’ is, in fact, a default bucket used where no referrer data is available to the analytics server.
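To make the mechanism concrete, here is a minimal Python sketch of how referrer-based channel classification typically behaves. The function name and the rules are illustrative assumptions, not Google Analytics’ actual implementation; the point is simply that a hit arriving with no referrer data, whatever the real cause, falls through to the default ‘direct’ bucket.

```python
from typing import Optional

# Illustrative channel classifier -- not Google Analytics' real logic.
def classify_channel(referrer: Optional[str],
                     utm_source: Optional[str] = None) -> str:
    """Assign a hit to a marketing channel based on its referrer data."""
    if utm_source:                 # explicit campaign tagging wins
        return f"campaign:{utm_source}"
    if not referrer:               # no referrer data at all...
        return "direct"            # ...lands in the default bucket
    if "google." in referrer or "bing." in referrer:
        return "organic_search"
    if "facebook." in referrer or "t.co" in referrer:
        return "social"
    return "referral"

# A broken tag, a redirect that strips the referrer, or an app webview
# all arrive with referrer=None and are silently counted as 'direct':
print(classify_channel(None))                              # direct
print(classify_channel("https://www.google.com/search"))   # organic_search
```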

Experienced web analysts know that high levels of direct traffic usually mean high levels of broken or missing tags, or other technical issues that have caused the true referrer data to be lost.

The major providers also contribute to that other major source of poor data quality: fragmented and disjointed data sources. Google search marketing tags will track conversions, but only from the Google search and display network. Facebook similarly provides tags which link Facebook marketing to sales while ignoring all other channels. Affiliate networks do the same thing, leading to widespread duplication, with the same sale over-attributed to multiple sources. The challenge is exacerbated by attribution rules and look-back windows that differ from platform to platform.
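To see why this produces over-attribution, consider one customer, one sale and three platforms. The sketch below uses hypothetical touchpoint times and look-back windows, not any vendor’s real rules: each platform claims the conversion because the sale falls inside its own window, so summing platform-reported conversions counts the single sale three times.

```python
from datetime import datetime, timedelta

sale_time = datetime(2024, 5, 10, 12, 0)   # one sale (made-up data)

# Last touchpoint per platform for this customer (hypothetical)
touchpoints = {
    "google_ads": sale_time - timedelta(days=20),
    "facebook":   sale_time - timedelta(days=5),
    "affiliate":  sale_time - timedelta(hours=2),
}

# Each platform's own look-back window (these differ in practice)
lookback = {
    "google_ads": timedelta(days=30),
    "facebook":   timedelta(days=7),
    "affiliate":  timedelta(days=1),
}

# Each platform independently checks only its own window
claimed = [p for p, t in touchpoints.items()
           if sale_time - t <= lookback[p]]

print(claimed)        # ['google_ads', 'facebook', 'affiliate']
print(len(claimed))   # 3 reported conversions for a single sale
```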

Having worked with brands of all sizes, I have yet to come across one that does not have some level of tagging issue. A typical example is a big mismatch between emails delivered and emails opened and clicked. Another is social campaigns which are delivered by third-party solutions and then appear as referral sources, due to the use of redirect technology.

Tagging and tracking

Tag management systems help manage this, but unfortunately not by linking the data, just by de-duplicating tag activity at source, which is hardly satisfactory if your goal is to understand multi-touch attribution (MTA) and marketing channel synergy.

Assuming you solve all your tagging issues and have well-structured, soundly applied tags, you should not forget that the tag is only as good as the tracking itself. A major challenge here is the gap in tracking users across devices. You cannot link visits by the same user on different devices without sophisticated tracking that users have signed up to beforehand. This means your tags cannot tell the difference between the same user visiting twice on two different devices and two different users visiting once each.
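A small sketch of the effect, using made-up visit records: counting unique users by a device-scoped cookie ID overcounts people, while a hypothetical login_id field, present only where users have signed in and consented to be linked, joins the devices but covers only a sample of visits.

```python
# Made-up visit records: cookie_id is set per browser/device,
# login_id is only present for signed-in, consented users.
visits = [
    {"cookie_id": "c-111", "login_id": "user_42"},   # laptop
    {"cookie_id": "c-222", "login_id": "user_42"},   # phone, same person
    {"cookie_id": "c-333", "login_id": None},        # anonymous visitor
]

users_by_cookie = len({v["cookie_id"] for v in visits})
users_by_login = len({v["login_id"] for v in visits if v["login_id"]})

print(users_by_cookie)  # 3 -- tags alone see three 'users'
print(users_by_login)   # 1 -- the signed-in view links the two devices,
                        #      but only covers the consented sample
```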

The idea that every one of us can be closely tracked and monitored online is an illusion, even for the biggest technology companies – and perhaps we should be glad of that. Indeed, unique ID tracking and linking is now under closer scrutiny in the age of data security breaches, increased concerns over user privacy and the GDPR. This is yet another source of difficulty for companies looking for a 360-degree view of the customer. Companies have to work with fully consented and well-defined samples of data to make progress in understanding their customers.

For the analyst, this is yet another reason why having huge volumes of data is not enough for user insight and data-driven decision making.

So what can you do about all these data quality challenges?

Data quality is perhaps like muscle memory in sport: use it or lose it. It is only by trying to analyse and find patterns in your data that you uncover the issues that need to be addressed. Once an issue is identified, strategies can be devised to manage these gaps in data quality and to take steps towards improvement. It is a process.

The best advice is to get stuck in. Pick one data source and run with it, making sure to compare it to others and ask whether the data makes sense given what you know about your customers. There are always discrepancies between data sources which should, in theory, report the same numbers: in my experience this is a kind of law of data analytics, so you need to get used to it. Use these differences to help you validate your sources, understand why differences might arise, and accept that there is an acceptable level of difference – say 2-3%.
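As a starting point, a simple reconciliation check along these lines can flag when two sources disagree beyond your tolerance. The 3% threshold is the rule of thumb above, not a standard, and the session counts are invented for illustration.

```python
def within_tolerance(source_a: float, source_b: float, tol: float = 0.03) -> bool:
    """True if the relative difference between two counts is within tol."""
    baseline = max(source_a, source_b)
    if baseline == 0:
        return True
    return abs(source_a - source_b) / baseline <= tol

analytics_sessions = 10_450  # e.g. sessions from your web analytics tool
server_sessions = 10_720     # e.g. sessions from your own server logs

print(within_tolerance(analytics_sessions, server_sessions))
# True: ~2.5% apart, within the ~3% rule of thumb
```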

In data analytics, as in life, you must not let perfect be the enemy of the good. Be wary of the massive data technology project which promises to link all data together in one big data lake and thereby solve your challenges. Bad data plus more bad data does not equal good data. Face up to your terrible data quality, and tackle the ugliest issues head-on. If you ignore the problem, it can only get worse and you will continue to struggle forwards in the dark.

Gabriel Hughes PhD


Can we help unlock the value of your analytics and marketing data? Metageni is a London, UK-based marketing analytics and optimisation company offering support for developing in-house capabilities.

Please email us at hello@metageni.com