An overview of 👀 lookalike 👀 modelling

Lookalike modelling is when we try to label one group of people who we think might be similar to another group of people we believe are valuable.

Three words: label, similar, valuable.

We start by identifying who is valuable. The obvious criterion is completing our primary business goal: buyers, power users, referrers and so on. However, this group is typically too small to draw any conclusions about what made them valuable in the first place. That is, the sample is almost always too small to support statistically significant conclusions.

How might we decide who is valuable when it's not so obvious?

We could do this based on who they are, e.g. female, 25-35, lives in an Australian metro area, or based on what they do, e.g. visited our website twice in the last week and added an item to the shopping cart. At this point we can separate everyone into two groups: those who met the criteria ('Team A') and those who didn't ('Team B').
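To make the split concrete, here is a minimal Python sketch of the behavioural example above. The `Person` fields, the `meets_criteria` rule and the function names are all hypothetical, and reading "visited twice" as "at least two visits" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical fields; real data would come from your own analytics.
    person_id: str
    visits_last_week: int
    added_to_cart: bool


def meets_criteria(person: Person) -> bool:
    # Assumed rule: at least two visits in the last week AND an add-to-cart.
    return person.visits_last_week >= 2 and person.added_to_cart


def split_teams(people):
    """Return (team_a, team_b): those who met the criteria and those who didn't."""
    team_a = [p for p in people if meets_criteria(p)]
    team_b = [p for p in people if not meets_criteria(p)]
    return team_a, team_b
```

The same shape works for a 'who they are' rule: only `meets_criteria` changes.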

Now we have two teams, we need to answer the question:

"What other characteristics or behaviours do people in Team A share?"

We want to know if there is something we can learn about Team A that helps us split Team B into 'people who might be in Team A in the future' and everyone else. Let's label these special people 'Maybe Team A in the future'.

How do we know if we are splitting Team B correctly? We observe everyone in Team B and compare conversion rates. If our 'Maybe Team A in the future' people are converting at a higher rate than the Team B average, we're onto something. If not, we guessed wrong. Start again.
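The check above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `team_b_outcomes` is a hypothetical mapping of person id to whether that person later converted, and `maybe_team_a_ids` is the set of ids we labelled 'Maybe Team A in the future'.

```python
def conversion_rate(outcomes):
    """Share of people who converted; outcomes maps person id -> bool."""
    if not outcomes:
        return 0.0
    return sum(outcomes.values()) / len(outcomes)


def lift_over_team_b(team_b_outcomes, maybe_team_a_ids):
    """Ratio of the labelled segment's conversion rate to the Team B average.

    Returns None when the ratio is undefined (empty segment or zero baseline).
    """
    # Keep only the Team B members we labelled 'Maybe Team A in the future'.
    segment = {
        pid: converted
        for pid, converted in team_b_outcomes.items()
        if pid in maybe_team_a_ids
    }
    baseline = conversion_rate(team_b_outcomes)
    if not segment or baseline == 0:
        return None
    return conversion_rate(segment) / baseline
```

A result above 1 means the labelled people convert at a higher rate than the Team B average. With small groups that gap can easily be noise, so treat it as a hint rather than proof.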

That's it.

At this point you can see the assumptions piling up, and this is why lookalike modelling gets such a bad name. It's not that the concept is flawed, but that the assumptions are rarely understood.

Proponents of lookalike modelling tend to be marketers with a paid media focus. Why? Their incentive is to spend the same limited media dollars more efficiently.

To these people, lookalike modelling tends to mean offloading it to the all-knowing Facebook, Google, Amazon and the wider adtech industry. The lookalike modelling narrative is dominated by adtech players comparing their solutions to those of other adtech companies, although they rarely share their process or accuracy rates.

The truth is that paid media lookalike modelling serves a very specific purpose. Even if you throw amazing AI at the problem, something you'll see in every adtech sales deck, these solutions are only as good as how well you describe the audience you are modelling from.

In the next article, I'll show you how to use Graph Compose to describe an audience and check if these attributes are a good measure to create a lookalike.