Attribution across devices is one of the major measurement challenges for digital advertising, particularly affecting how mobile and upper funnel activity is valued. Many choose to ignore the problem and plump for guesswork. Is this wise? And what are the best strategies for dealing with this key source of measurement bias?
What exactly is the problem here?
The challenge is simply that when users switch devices, we cannot measure the marketing influences that actually led up to a sale. For small purchases, most people research and buy in a single visit on a single device. But for a larger, more complex purchase, such as a holiday, TV, or an insurance policy, often a little research is done on the phone, maybe a bit more on a tablet, and then the purchase might be made on a laptop.
A website visitor is anonymous unless they explicitly sign in at some point. So to be visible as the ‘same’ person on more than one device, a user has to be an existing customer already. Heavily used online services like Amazon, Google and Facebook have done a good job of streamlining user management, but the norm for the rest of the internet is that sites cannot tell whether visits from different devices are made by the same person or by different people. This is a major challenge for advertising measurement, since an ad click followed by a purchase on a different device shows up as the ad leading nowhere and having no apparent ROI.
This measurement bias generally gets worse with more complex and expensive purchases which involve ‘upper funnel’ channels and have longer multi-touch paths to purchase. The early touch points are already penalised by last click attribution, and then the cross-device challenge penalises them even further.
How big is the issue?
Comscore produces global stats on this using their usage panels and reports that globally around 46% of users use multiple devices within any given month, but the figure is larger in more developed markets, at around 50-60% in countries including the US, Germany and UK. The more people use multiple devices in this way, especially during a purchase process, the greater the issue.
Although the growth in devices is a relatively recent phenomenon, the tracking issue is not new. For as long as internet use has been the subject of research, analysts have worried about ‘home versus work’ use whereby someone might do some shopping research during a lunch break and then pick it up later at home – in fact, this is still a major measurement issue. Also, note that the cross-device issue is, strictly speaking, a cross-browser issue: a user who switches browser or app on the same device cannot usually be linked across events either.
The bias in marketing measurement is clear: user journeys appear much shorter and less multi-touch than they really are.
Most companies still overvalue the last click and undervalue earlier touch points. With cross-device switching, even a first click positional model will give too much credit to the last click. This is because a single click is both first and last, and for many advertisers, single-click journeys are the largest single type of user journey visible in their data. In reality, many of these single touch journeys will be from users who have visited before, maybe even very recently, but just on another device.
Multi-touch journeys will also get broken by a switch to a different device. Maybe you make 2-3 clicks doing some research on your phone, then 2-3 more choosing and then purchasing on another device, with a short break in between. Once again the first few touches via upper funnel channels get undervalued and their true contribution to marketing ROI is partially hidden.
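To make the effect concrete, here is a minimal sketch of how one real journey gets broken in two when touch points are grouped by cookie rather than by person. The touch point data, the cookie names and the helper functions are all invented for illustration and do not come from any particular analytics tool.

```python
from collections import defaultdict

# Hypothetical touch points for ONE real person: (cookie_id, channel, converted)
touches = [
    ("phone-cookie", "display", False),
    ("phone-cookie", "social", False),
    ("laptop-cookie", "brand_search", True),
]

def journeys_by_key(touches, key):
    """Group touch points into journeys using the given key function."""
    journeys = defaultdict(list)
    for touch in touches:
        journeys[key(touch)].append(touch)
    return list(journeys.values())

def first_click_credit(journeys):
    """Credit each converting journey's sale to the first channel in that journey."""
    credit = defaultdict(int)
    for journey in journeys:
        if any(converted for _, _, converted in journey):
            credit[journey[0][1]] += 1
    return dict(credit)

# What the analytics data shows: one journey per cookie (that is, per browser).
observed = first_click_credit(journeys_by_key(touches, key=lambda t: t[0]))
# What actually happened: one person, one journey across two devices.
actual = first_click_credit(journeys_by_key(touches, key=lambda t: "same-person"))

print(observed)  # {'brand_search': 1}
print(actual)    # {'display': 1}
```

Even under a first click model, the sale is credited to brand search, because the phone journey appears to lead nowhere and the laptop journey appears to be a single click.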
How to ‘solve’ cross-device attribution
Even just thinking about taking this issue into account is a major step which the majority of advertisers don’t take.
The first step is to consider how big a problem it is for your business specifically. As with other attribution issues, the more considered and complex purchases, often the higher value ones like holidays, high tech or financial services, tend to have a long path to purchase and therefore a greater attribution challenge. You can use published research to get cross-device estimates, but it is better to get some idea of your own customers’ cross-device behaviour, for example by looking at the cross-device report in Google Ads reporting, which draws on Google’s own cross-device tracking.
Many companies have some customer tracking across devices thanks to user login and sign up, apps, and email/CRM – and while this data is heavily skewed to the purchasers (more on this below) it does at least provide a window into your customers’ cross-device behaviour which you can use to explore just how big a challenge this is for your attribution.
Since most companies just ignore the issue, educated guesswork may actually be a better than average approach. In an attribution course which I periodically run, we do an exercise where participants estimate the size of the cross-device bias, simply by considering the proportion of sales which are likely to be affected by the issue and using this estimate to upweight upper funnel attribution models. Maybe you guess that around 40% of your sales involve multi-touch cross-device journeys. This suggests that when comparing first and last click models, the shifts that occur for each channel, whether up or down, are weaker than they should be: in reality, each shift is likely to be a further 40% or so larger.
This kind of simple analysis may be enough for you to give an apparently low ROI channel the benefit of the doubt, as your estimates could show the channel driving more upper funnel activity than initially appears.
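The upweighting exercise can be written down in a few lines. This is only one possible reading of the rule of thumb above: it assumes the guessed cross-device share is applied as a simple multiplier to each channel's shift between last click and first click credit. The channel names and sales figures are made up for illustration.

```python
def adjust_for_cross_device(last_click, first_click, cross_device_share):
    """Stretch each channel's last-to-first-click shift by the guessed share.

    Assumption: a cross-device share of 0.40 means each observed shift
    is understated and should be 40% larger.
    """
    adjusted = {}
    for channel, last in last_click.items():
        shift = first_click[channel] - last
        adjusted[channel] = last + shift * (1 + cross_device_share)
    return adjusted

# Hypothetical sales credited to each channel under the two models.
last_click = {"display": 100, "brand_search": 500}
first_click = {"display": 200, "brand_search": 400}

adjusted = adjust_for_cross_device(last_click, first_click, cross_device_share=0.40)
for channel, sales in adjusted.items():
    print(channel, round(sales))
```

In this made-up case display gains 100 sales when moving from last click to first click, so the adjusted estimate adds a further 40, giving roughly 240, while brand search falls to roughly 360. Total sales are unchanged because the shifts across channels cancel out.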
Device graphs and other technical solutions
Technical solutions fall into either ‘deterministic’ or ‘probabilistic’ methods, and are generally a mixture of both. This is not as scientific as it sounds. Deterministic means you can actually track a user using some kind of technology, while probabilistic means you make inferences (guesswork) in the most robust and correct way possible given the data you have. Crucially, the probabilistic approach depends on some level of deterministic linking, since it relies on using information about ‘known’ cross-device behaviour to try to infer the unknown cross-device behaviour.
So the basic idea is to link as many users as you can using a tracking database, and then make a well educated statistical guess as to the rest.
When you consider a cross-device solution you will no doubt encounter platforms which promise that you can leverage their massive ‘device graph’ database to join users up. It sounds like they have all the information you need. A ‘graph’ in this context means a dataset describing a network of relationships, in this case between different devices. The tactic employed by these companies is to draw on data from multiple companies and networks to map cookies to a wide set of more robust and persistent login based IDs. They therefore ‘know’ that when there are visits from two different browsers, they actually belong to the same user: this is deterministic linking.
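As a rough illustration of deterministic linking, the sketch below treats cookies and login IDs as nodes in a graph and groups together any cookies that are connected through a shared login. It is a toy version under simple assumptions, not a description of how any vendor builds its device graph, and the cookie and login values are invented.

```python
from collections import defaultdict

def link_cookies(sightings):
    """Group cookies that were ever seen with the same login ID.

    `sightings` is an iterable of (cookie_id, login_id) pairs.
    Returns a list of sets, one set of cookies per inferred person.
    """
    parent = {}

    def find(node):
        # Follow parent links to the group's root, shortening the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    # Each sighting is an edge between a cookie node and a login node.
    for cookie_id, login_id in sightings:
        union(("cookie", cookie_id), ("login", login_id))

    groups = defaultdict(set)
    for node in list(parent):
        kind, value = node
        if kind == "cookie":
            groups[find(node)].add(value)
    return sorted(groups.values(), key=sorted)

# Hypothetical sightings of a cookie together with a hashed login ID.
sightings = [
    ("phone-1", "login-a"),
    ("laptop-7", "login-a"),
    ("tablet-3", "login-b"),
]
print(link_cookies(sightings))
```

Here the phone and laptop cookies end up in one group because both were seen with the same login, while the tablet stays on its own. A cookie that is never seen with any login does not appear in the sightings at all, which is exactly the gap the rest of this section is concerned with.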
Technology providers also use their linked data to train a model to predict what happens when these ID mappings are not available: which visits are likely to be associated with visits from other browsers and locations, based on patterns similar to those observed in the deterministic linking. This modelling process is called probabilistic linking.
Before you decide to shell out the large sums required to use these solutions, there are some really major challenges you need to be aware of.
GDPR and user consent, and other challenges
First, the only way a third party can track someone who is not logged in is by mapping IDs based on shared login and tracked users from other sources. This type of large-scale mapping is almost certainly a violation of user privacy and identity under the GDPR legislation, and it’s only a matter of time before these platforms have to delete this data. User data which has been properly consented is likely to be very small relative to the total universe of users out there. After all, would you consent to a company you deal with sharing cross-device usable data with a whole network of other companies?
Second, the data gets out of date quite quickly. Users frequently change logins and suppliers, and also change and upgrade devices. So a large proportion of users who show up have not already been linked to the device graph ID set, which is only ever a small subset of the total number of users who log in. They cannot track everyone, so, for the most part, they have to estimate.
Which brings us to the third issue – and this is actually the biggest one – which is that claims about probabilistic linking are almost certainly overstated. Claims about highly accurate matching rates tend to fudge the difference between precision and recall (look these terms up if you want to understand more). When you dig into the problem, a basic fact hits you square in the eyes: there are many occasions when users do something and then switch devices, and many occasions where users do exactly the same thing, but then do not switch devices. No amount of probabilistic modelling can change this fact.
For example, suppose I see 100 people make 4 clicks on a mobile device, and my data tells me that there is a 3% chance that their next click is on a desktop. This suggests that 3 people should now be modelled as switching to desktop. But which 3 people? There is no way of knowing from the data you have. If you knew, you wouldn’t be using probabilistic linking in the first place! In technical terms, the ‘recall’ is fine, but ‘precision’ is very weak.
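The arithmetic behind this example can be checked with a short calculation. It assumes, as in the scenario above, that the data contains nothing to distinguish the 3 switchers from the other 97 people, so whichever users the model flags are effectively chosen at random. The function name and parameters are my own, for illustration only.

```python
def expected_precision_recall(n_users, n_switchers, n_flagged):
    """Expected precision and recall when flagged users are chosen at random.

    Precision: share of flagged users who really switched device.
    Recall: share of real switchers who were flagged.
    """
    expected_hits = n_flagged * n_switchers / n_users
    precision = expected_hits / n_flagged
    recall = expected_hits / n_switchers
    return precision, recall

# 100 people, 3 of whom really switch; try flagging 3, 50 and all 100 of them.
for n_flagged in (3, 50, 100):
    precision, recall = expected_precision_recall(100, 3, n_flagged)
    print(f"flagged={n_flagged} precision={precision:.2f} recall={recall:.2f}")
```

Under this assumption, expected precision stays at 3% however many users are flagged. Recall can be pushed up to 100%, but only by flagging everyone, which is why a headline match rate that quotes one figure without the other tells you very little.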
Deterministic linking is clearly superior since this is where we simply have all the data we need to match user to user on different devices. What solution providers do then is to effectively offer deterministic linking based on their database, with probabilistic linking used as a fallback option to ‘plug in the gaps’ left by this method.
Again, if you are piggybacking on someone else’s user data, then there are major privacy concerns to consider. However, it is worth noting that most companies already have some data allowing them to join up users across devices – what we might call ‘Do It Yourself’ (DIY) linking. For example, if your website asks users to log in each time, and collects an email address, then if they read their email from you on a phone, you can potentially match them from device to device. It seems there should be a way to leverage that.
Of course, the challenge is that this is always going to be a limited percentage of users, representing a big gap in your knowledge of cross-device usage: deterministic linking is never 100% complete. So one way to leverage it is to try your own probabilistic matching, using the data you can match to make inferences about the data you cannot match.
If you do go down this route you should be aware of another major challenge with this partially linked data, which is that it’s heavily biased towards fully signed up customers and against non-customers. It would be very easy to use your own sign in data to reach the conclusion that people who buy from you have complex cross-device usage, whereas people who do not buy from you have simpler single device interactions. The problem with this is that the people who sign up and thus become trackable tend to be the buyers, and so you inevitably have more visibility on their complex user journey data than for the non-buyers. This kind of data bias can easily generate misleading conclusions.
An alternative solution
Here is where you have to forgive me for plugging our own Metageni solution which we call a ‘cross-device inference model’. We are interested to know the feedback from the community, so let me explain the principles.
The basic idea is that the known or ‘matched’ cross-device data can be viewed in both its matched and non-matched forms, and we can observe how the process of matching itself changes the data distribution. The distributions of interest are relevant data features such as the user path length, device use frequency, and device position in the user path. We use this information to resample from our raw data, to create a sub-sample of non-matched data which has a similar distribution of these features.
Thus, unlike probabilistic matching, we give up on the attempt to somehow create new cross-device data out of the unknown and instead settle for trying to make the overall sample more representative of what the complete cross-device data set would look like.
For example, we might find that when known data is linked, there tends to be a reduction in single clicks from tablet devices, as these get matched to multiple clicks on other devices. Let’s say these types of interaction fall by 10%. So before we use the raw data which has no matching ID, we create a subsample which randomly drops around 10% of these single click tablet interactions. We do this for many different features of the data which we know shift around when the data can be matched. We end up with a data set which is not fully linked across devices, but which includes only non-linked data which is considered representative.
We would love to hear what other experts think about this less aggressive and more GDPR compliant approach.
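The resampling step in the tablet example could look something like the sketch below. It is a deliberately simplified illustration of the principle rather than our production code: it uses a single combined feature (device and number of clicks), assumes the drop rates have already been estimated from the matched data, and all field names and figures are hypothetical.

```python
import random

def resample_unmatched(journeys, drop_rates, seed=0):
    """Randomly drop unmatched journeys according to per-feature drop rates.

    `journeys` is a list of dicts with hypothetical keys "device" and "clicks".
    `drop_rates` maps (device, clicks) to the share of such journeys to drop.
    Journeys whose feature has no entry in `drop_rates` are always kept.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept = []
    for journey in journeys:
        rate = drop_rates.get((journey["device"], journey["clicks"]), 0.0)
        if rng.random() >= rate:
            kept.append(journey)
    return kept

# Hypothetical raw data with no matching ID.
raw = (
    [{"device": "tablet", "clicks": 1} for _ in range(1000)]
    + [{"device": "desktop", "clicks": 3} for _ in range(1000)]
)

# Assumed finding from the matched data: single click tablet journeys fall by 10%.
sample = resample_unmatched(raw, drop_rates={("tablet", 1): 0.10})

tablet_singles = sum(1 for j in sample if j["device"] == "tablet")
desktop_multi = sum(1 for j in sample if j["device"] == "desktop")
print(tablet_singles, desktop_multi)
```

Roughly 900 of the 1,000 single click tablet journeys survive, while the desktop journeys are left untouched. No journeys are invented or stitched together; the sample is only thinned where the matched data suggests the raw data overstates a pattern.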
Cross-device use continues to evolve but is not going away
Whatever you decide to do, hopefully, you can now see that it is best not to ignore the problem. Recent research suggests we may be past the peak of cross-device usage for some users who now just use smartphones for almost all their internet use. Mobile has become dominant over desktop, vindicating those companies that adopted a mobile-first strategy a few years back.
However, cross-device use is in many ways a symptom of a continuing rise in multi-task activity, as we sit listening to music on Spotify, watching a show on YouTube on the TV and playing with holiday ideas on our mobile phone, often all at the same time. Measuring how users move through their digital universe continues to be vital to understanding their behaviour. Marketing analysts ignore this problem at their peril.
Gabriel Hughes PhD
Can we help unlock the value of your analytics and marketing data? Metageni is a London UK based marketing analytics and optimisation company offering custom solutions for in-house capabilities.
Please email us at firstname.lastname@example.org