
Build your own attribution model with machine learning

Sounds too good to be true? Maybe so, but as machine learning and cloud data technology become more accessible and scalable, building your own data-driven Multi-Touch Attribution (MTA) model is becoming increasingly realistic. The advantage of a predictive machine learning model is that it can be objectively assessed against a blind hold-out data set, so you have clear criteria for judging whether one model is ‘better’ than another.
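To make the hold-out idea concrete, here is a minimal sketch in Python. It assumes two candidate models have already scored the same hold-out journeys with a predicted probability of conversion; the labels, the probabilities and the choice of log loss as the yardstick are all illustrative assumptions, not a prescribed method.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes; lower is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) can never occur
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical hold-out set: 1 = the journey converted, 0 = it did not.
holdout_labels = [1, 0, 0, 1, 0, 0, 0, 1]

# Hypothetical predictions from two candidate models on journeys
# that neither model saw during training.
model_a_probs = [0.8, 0.2, 0.1, 0.7, 0.3, 0.2, 0.1, 0.6]
model_b_probs = [0.5, 0.5, 0.4, 0.5, 0.5, 0.4, 0.5, 0.5]

scores = {
    "model_a": log_loss(holdout_labels, model_a_probs),
    "model_b": log_loss(holdout_labels, model_b_probs),
}
best = min(scores, key=scores.get)  # the model with the lowest hold-out loss
print(scores, best)
```

Whichever metric you choose, the point is that both models are judged on the same unseen data, so the comparison does not depend on anyone's opinion of the model.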

What you need for an in-house approach

First, you need data, and lots of it. Fortunately, digital marketing is one of the most data-rich activities on the planet. Chances are that if you are a B2C online business, you have hundreds of thousands, if not millions, of site visit logs generated every week. These logs contain referrer data, which helps you analyse and understand the customer journey to sale, the first essential step in building your attribution approach.
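As a rough illustration of that first step, the sketch below turns raw visit rows into one time-ordered journey per user. The row layout (user ID, ISO 8601 timestamp, referrer host, converted flag) is an assumption made for the example; your own logs will look different.

```python
from collections import defaultdict

# Hypothetical visit log rows: (user_id, ISO 8601 timestamp, referrer host, converted)
visit_log = [
    ("u1", "2024-03-01T09:00:00", "google.com", False),
    ("u2", "2024-03-01T10:30:00", "", False),
    ("u1", "2024-03-03T20:15:00", "facebook.com", False),
    ("u1", "2024-03-05T12:00:00", "", True),
]

def build_journeys(rows):
    """Group visits by user, ordered in time, as lists of (referrer, converted)."""
    journeys = defaultdict(list)
    # ISO 8601 timestamps in a single format and time zone sort correctly as strings.
    for user_id, _timestamp, referrer, converted in sorted(rows, key=lambda r: r[1]):
        journeys[user_id].append((referrer or "(none)", converted))
    return dict(journeys)

journeys = build_journeys(visit_log)
for user_id, steps in journeys.items():
    print(user_id, steps)
```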

Second, you need a solid, experienced data science and marketing analytics team. Go for a mixture of skills. Typically, some data scientists are strong on theory but weaker on application, while others are great communicators who see the strategic angle but are weaker at data-related programming. You also need domain expertise in marketing analytics, along with visualization and data engineering experts. The fabled ‘unicorn’ data scientist is impossible to hire, so instead you should go for a team with the right mix of skills, with strong leadership to move the project forward.

Third, you need patience. The truth is, getting to an attribution model using machine learning is not easy. It is not a case of throwing some data at a model and waiting for the answers to pop out by magic. Your team needs to decide what data to use, how to configure it as a feature set, what types of model to use, what an appropriate training set is, how to validate the model and so much more besides. You will need to make multiple attempts to get to a half-decent model.
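One of those decisions, how to configure the data as a feature set, can be sketched as follows. This is only one of many possible encodings: it assumes each journey is a list of channel names and summarises it as touch counts plus the first and last channel. The channel list and feature names are hypothetical.

```python
from collections import Counter

# Hypothetical channel list; use whatever channels appear in your own data.
CHANNELS = ["search", "social", "display", "email", "direct"]

def journey_features(touches):
    """Summarise one journey (a list of channel names) as a flat feature dict."""
    counts = Counter(touches)
    features = {f"n_{channel}": counts.get(channel, 0) for channel in CHANNELS}
    features["n_touches"] = len(touches)
    features["first_touch"] = touches[0] if touches else None
    features["last_touch"] = touches[-1] if touches else None
    return features

example = journey_features(["search", "social", "search"])
print(example)
```

Deciding what goes into this dictionary (recency, time gaps, device, campaign) is exactly the kind of choice that takes several attempts to get right.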

Choosing the final model

The best candidate ML models depend on your data – we have had good results with well-optimized classifiers and regression models, which we find often outperform even high order Markov models. While a black box or ensemble method may get better predictive accuracy, you need to consider the trade-off in terms of reduced transparency and interpretability. The best advice is not to commit to a particular modelling approach or feature set too early in the process, but to compare multiple methods.
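For readers who have not met the Markov approach used as a comparison point here, the sketch below shows a simple first-order version using the commonly described ‘removal effect’: how much the overall conversion probability falls when a channel is taken out of the chain. It is a baseline illustration on toy data, not the higher-order variant mentioned above and not our production method; the state names `start`, `conv` and `null` are assumptions of the example.

```python
from collections import defaultdict

def transition_probs(journeys):
    """Estimate first-order transition probabilities from (touches, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for touches, converted in journeys:
        path = ["start"] + list(touches) + ["conv" if converted else "null"]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def conversion_prob(probs, removed=None, iterations=200):
    """Probability of reaching 'conv' from 'start'; a removed channel is a dead end."""
    value = defaultdict(float)  # unknown states (including 'null') default to 0.0
    value["conv"] = 1.0
    for _ in range(iterations):  # repeated sweeps converge for small chains
        for state, nxt in probs.items():
            if state == removed:
                continue
            value[state] = sum(p * value[b] for b, p in nxt.items() if b != removed)
    return value["start"]

def removal_effects(journeys):
    """Share of credit per channel, proportional to its removal effect.

    Assumes at least one journey converted, otherwise the base rate is zero.
    """
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = [state for state in probs if state != "start"]
    raw = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(raw.values())
    return {c: effect / total for c, effect in raw.items()}

# Toy data: (ordered channel touches, did the journey convert?)
example_journeys = [
    (["search", "social"], True),
    (["search"], False),
    (["social"], True),
    (["display", "search"], False),
]
shares = removal_effects(example_journeys)
print(shares)
```

On this toy data social receives the largest share, because every converting path passes through it. A classifier trained on richer features can then be compared against this kind of baseline on the same hold-out data.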

But what then? An advanced machine learning model does not speak for itself. Once you have a model, you then need to be able to interpret it in such a way as to be able to solve the marketing mix allocation problem. What exactly is the contribution of each channel? What happens if spend on that marketing channel is increased or decreased? How does the model handle the combined effects of channels?

All of this will take months, so it is small wonder that many companies ignore the problem or else go for a standard vendor out-of-the-box approach. It is worth remembering, then, that a ‘do it yourself’ approach has some key benefits to consider.

Benefits of an in-house model

If you create your own model, you will discover a great deal about your marketing activity and data in the process. This can lead to immediate benefits – for example, with one major international services firm we worked with, we found significant marketing activity occurring in US states and cities where the company had no local service team. Even with no attribution model defined at that stage, the modelling effort uncovered this issue and saved the company huge sums right away. The point is that your data quality is tested and will become cleaner and more organised through the process of using it, and this, in turn, supports all your data-driven marketing.

Another beneficial side effect is that if you create your own attribution model, you will also learn about your business and decision making. The process will force your marketing teams to collaborate with data scientists and engineers to work out how to grow sales. Other teams need to be involved too, such as finance and your agencies, and this will often spawn further opportunities to learn and collaborate across all these marketing stakeholders.

Attribution is all about how different marketing channels work together, so your various marketing teams and agencies need to collaborate as well – social, search, display as well as above the line, and brand and performance more broadly. Again, this provides intrinsic and additional value over and above the modelling itself.

Finally, it is worth pointing out that you will never actually arrive at the final model. This is quite a fundamental point to bear in mind. By its nature, a machine learning approach means you need to train the model on fresh data as it comes in. Your marketing and your products are also changing all the time, and so are your customers. So really you need to build a marketing attribution modelling process more than you need to build a single attribution model.

So, go ahead, build your model, be less wrong than before, and then when you have finished, start all over again. As we say at Metageni, it is all about the journey.

Gabriel Hughes PhD


Can we help unlock the value of your analytics and marketing data? Metageni is a London UK based marketing analytics and optimisation company offering support for developing in-house capabilities.

Please email us at hello@metageni.com

Just how bad is your analytics data?

If you do not know how bad your analytics data is, then the chances are, it is much worse than you think. With data analytics, it is not the known data quality issues that will cause you the most trouble, not the known unknowns, but the ‘unknown unknowns’ – those issues you uncover and discover as you explore and analyse your data.

Usually, it is only the practitioners who are very close to the data who understand the full extent of the data quality problem. Too often the poor quality of data is kept as something of a dirty secret not to be fully shared with senior management and decision makers.

Common issues in web and marketing analytics

Let’s look at just some of the most common issues affecting web and marketing analytics data. To begin with, do not assume that the data sources provided by the most common analytics solutions are robust by default. Even the best ones are prone to big data quality issues and gaps. Take Google Analytics referrer traffic, which often reports an unrealistic level of ‘Direct’ traffic, supposedly visits made directly through users typing in URLs, or bookmarking, both low frequency methods of site access. The reason is that ‘Direct’ is, in fact, a default bucket used where no referrer data is available to the analytics server.

Experienced web analysts know that high levels of direct traffic usually mean high levels of broken or missing tags, or other technical issues, which mean the true referrer data has been lost.
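The ‘default bucket’ behaviour is easy to picture in code. The sketch below is a simplified illustration, not how any particular analytics product is implemented: the host-to-channel mapping and the channel names are hypothetical, and the only point is that a missing referrer falls through to ‘Direct’ whatever the real source was.

```python
from urllib.parse import urlparse

# Hypothetical mapping from referring host to channel.
CHANNEL_BY_HOST = {"google.com": "Organic search", "facebook.com": "Social"}

def classify(referrer):
    """Assign a visit to a channel from its referrer URL (may be empty or None)."""
    host = urlparse(referrer or "").netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if not host:
        # Default bucket: the referrer is missing, which is not proof
        # that the user typed the URL or used a bookmark.
        return "Direct"
    return CHANNEL_BY_HOST.get(host, "Referral")

def direct_share(referrers):
    """Fraction of visits that end up in the 'Direct' bucket."""
    labels = [classify(r) for r in referrers]
    return labels.count("Direct") / len(labels)

sample = ["https://www.google.com/search?q=shoes", "", None, "https://blog.example.org/post"]
print([classify(r) for r in sample], direct_share(sample))
```

Tracking the direct share over time, and by landing page or campaign, is a cheap way to spot where tags have gone missing.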

The major providers also contribute to that other major source of poor data quality: fragmented and disjointed data sources. Google search marketing tags will track conversions, but only from the Google search and display network. Facebook similarly provides tags which only link Facebook marketing to sales, ignoring all other channels. Affiliate networks do the same thing, leading to widespread duplication, with the same sale over-attributed to multiple sources. The challenge is exacerbated by look-back windows and attribution rules that differ between platforms.
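A quick way to size this duplication, assuming you can export the order IDs each platform claims, is to compare the sum of claims with the number of distinct orders. The platform names and order IDs below are made up for illustration.

```python
# Hypothetical conversions claimed by each platform's own tag, keyed by order ID.
platform_claims = {
    "google": {"o1", "o2", "o3"},
    "facebook": {"o2", "o3", "o4"},
    "affiliates": {"o3", "o5"},
}

def over_attribution(claims):
    """Return (total claimed, distinct orders, claimed-to-distinct ratio).

    Assumes at least one order has been claimed.
    """
    claimed_total = sum(len(orders) for orders in claims.values())
    distinct_orders = set().union(*claims.values())
    return claimed_total, len(distinct_orders), claimed_total / len(distinct_orders)

claimed, distinct, ratio = over_attribution(platform_claims)
print(claimed, distinct, ratio)
```

A ratio well above 1 means the platforms are collectively taking credit for more sales than you actually made.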

Having worked with multiple brands of all sizes, I have yet to come across a brand that does not have some level of tagging issue. A typical issue is a big mismatch between emails delivered and emails opened and clicked. Another is social campaigns which are delivered by 3rd party solutions and then appear as referral sources, due to the use of redirect technology.

Tagging and tracking

Tag management systems help manage this, but unfortunately not by linking the data, just by de-duplicating tag activity at source, which is hardly satisfactory if your goal is to understand Multi-Touch Attribution (MTA) and marketing channel synergy.

Assuming you solve all your tagging issues and have well-structured, soundly applied tags, you should not forget that the tag is only as good as the tracking itself. A great challenge here is the gap that exists in tracking users across devices. You cannot link visits by the same user on different devices without sophisticated tracking that users have signed up to beforehand. This means your tags cannot tell the difference between one user visiting twice on two different devices and two different users.

The idea that every one of us can be closely tracked and monitored online is an illusion for all but the biggest technology companies – and perhaps we should be glad of that. Indeed, unique ID tracking and linking is now under closer scrutiny in the age of data security breaches, increased concerns over user privacy and the GDPR. This is yet another source of difficulty for companies looking for a 360-degree view of the customer. Companies have to work with fully consented and well-defined samples of data to make progress in understanding their customers.

For the analyst, this is yet another reason why having huge volumes of data is not enough for user insight and data-driven decision making.

So what can you do about all these data quality challenges?

Data quality is perhaps like muscle memory in sport. You use it or you lose it. It’s only by trying to analyse and find patterns in your data that you uncover the issues that need to be addressed. Where there is a need, strategies can be devised to manage these gaps in data quality and take steps for improvement. It is a process.

The best advice is to get stuck in. Pick one data source and run with it, making sure to compare it to others and to ask whether the data makes sense given what you know about your customers. There are always discrepancies between data sources which in theory should report the same numbers: in my experience, this is a kind of law of all data analytics, so you need to get used to it. Use these differences to help you validate your sources and understand why they might arise, and accept that some level of difference is tolerable – say 2-3%.
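A tolerance check like this can be written down in a few lines. The sketch below measures the gap between two sources as a share of the larger figure and uses 3% as the default threshold, taken from the range above; both choices are assumptions you should tune to your own data.

```python
def relative_difference(a, b):
    """Absolute difference between two figures as a share of the larger one."""
    largest = max(abs(a), abs(b))
    return 0.0 if largest == 0 else abs(a - b) / largest

def within_tolerance(a, b, tolerance=0.03):
    """True when two sources agree to within the given relative tolerance."""
    return relative_difference(a, b) <= tolerance

# Hypothetical weekly order counts reported by two systems.
analytics_orders = 980
backend_orders = 1000
print(relative_difference(analytics_orders, backend_orders),
      within_tolerance(analytics_orders, backend_orders))
```

Anything that fails the check is a prompt to investigate, not necessarily an error: the two systems may simply define an order differently.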

In data analytics, as in life, you must not let perfect be the enemy of the good. Be wary of the massive data technology project which promises to link all data together in one big data lake and thereby solve your challenges. Bad data plus more bad data does not equal good data. Face up to your terrible data quality, and tackle the ugliest issues head-on. If you ignore the problem, it can only get worse and you will continue to struggle forwards in the dark.

Gabriel Hughes PhD



Your attribution model is unique

‘What attribution model should we use?’ is now a question being asked not just by marketing analysts, but by the CMO and even the CDO, CFO and CEO.

It would be fantastic if there were a simple formula to answer that question but, of course, life is never so easy. Here are some simple guidelines you can follow to help you answer it.

Differences in attribution reflect differences in your business

If your business offers a higher-value, considered purchase, like a holiday or a financial product, then you should expect a longer, more complex research-to-purchase journey, and should therefore give a higher value to the earlier touch points. This means attribution models with longer ‘look back’ windows and rules favouring earlier clicks.
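To show how a look-back window and a rule favouring earlier clicks interact, here is a small rule-based sketch. The rule itself (a fixed weight for the earliest eligible touch, with the remainder shared equally) and the parameter names `lookback_days` and `first_touch_weight` are assumptions chosen for illustration, not a recommended model.

```python
from datetime import datetime, timedelta

def attribute(touches, conversion_time, lookback_days=30, first_touch_weight=0.5):
    """Split one conversion across the touches inside the look-back window.

    touches: list of (channel, datetime) pairs. The earliest eligible touch
    gets first_touch_weight; the rest is shared equally among later touches.
    """
    window_start = conversion_time - timedelta(days=lookback_days)
    eligible = sorted(
        (t for t in touches if window_start <= t[1] <= conversion_time),
        key=lambda t: t[1],
    )
    if not eligible:
        return {}
    if len(eligible) == 1:
        return {eligible[0][0]: 1.0}
    rest = (1.0 - first_touch_weight) / (len(eligible) - 1)
    credit = {}
    for position, (channel, _) in enumerate(eligible):
        weight = first_touch_weight if position == 0 else rest
        credit[channel] = credit.get(channel, 0.0) + weight
    return credit

# Hypothetical journey ending in a sale on 31 March.
conversion_time = datetime(2024, 3, 31)
touches = [
    ("display", datetime(2024, 2, 20)),
    ("search", datetime(2024, 3, 10)),
    ("email", datetime(2024, 3, 30)),
]
short_window = attribute(touches, conversion_time, lookback_days=30)
long_window = attribute(touches, conversion_time, lookback_days=60)
print(short_window, long_window)
```

With the 30-day window the early display touch falls outside the window and receives nothing; with the 60-day window it becomes the first touch and takes the largest share, which is the behaviour a long, considered purchase journey calls for.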

If your product is in an especially competitive market, with lots of similar alternatives available to your customers, then you need to focus on the research and comparison phase, again valuing earlier clicks, and think about how to address the cross-device attribution challenge. Users who shop around research on their phones and tablets, and flip through several options before settling on their final choice. Your product needs to be part of that journey, so if you are to have a chance of being selected, leaving it all to the last click is leaving it too late.

If your business relies heavily on repeat custom, then you should attach higher attributed value to new customer acquisition and retention, exploring a cautious application of lifetime value. You cannot assume customers will stay loyal without work, but a repeat purchase is almost always easier and less costly to achieve than the first one.  

Do you have a diverse portfolio of different products, targeting different customer needs and with a wide range of price points? In this case, you may need different models for different product categories. It sounds complex, but if you think about it, it makes no sense to treat the buyer of (say) an expensive hi-fi system the same as the buyer of a replacement phone charger. Different customers also behave in different ways. Customer segmentation should therefore map on to differences in marketing attribution.

Is your market heavily driven by brand perception and do you invest in TV, outdoor or print? In this case, you will need to explore how to link above the line analysis to your digital attribution.

And so it goes on… The truth is, each business is unique, with unique complexity around the product offer, its customers, how they engage, and the value of each sale. The only way to address the uniqueness of your business is to develop appropriately unique attribution models.

Incorporate these unique features into your model

Once you accept that your attribution challenge and therefore your attribution model is unique, everything becomes easier. For example, instead of plugging into a standard tag or data collection framework, you can leverage the special and unique features of your data as inputs into your model.

You can structure your attribution model to reflect the different customer journeys that you see for each customer segment and product group. As you learn more and more about your customer journeys, your data-driven modelling can adapt.

At Metageni we believe a customised data-driven approach unique to your business is the only way to get to grips with this complex business challenge and turn it into opportunities to increase marketing efficiencies and grow sales. You will not look back!

Gabriel Hughes PhD


Can we help you with your custom attribution model? We collaborate with our clients’ organisations, helping marketing & data science teams create integrated on & offline investment analytics solutions for optimising sales growth.
