This document is proprietary and confidential. No part of this document may be disclosed in any manner to a third party without the prior written consent of DirectAvenue.

Babbel TV Signal Exploration

Introduction

In marketing, a frequently asked question is: how much payout X should we expect from a marketing expense of Y dollars in period Z? Marketers and their clients often find it difficult to agree on an answer, because separating true signal from noise is extremely challenging, especially now that clients also invest in digital, radio, email, OTT and other ad-hoc or emerging media vehicles.

In this analysis, we focus on TV campaign performance and approach the question without modeling. More specifically, we look at the immediate traffic spikes triggered right after each TV airing. This approach has the following advantages:

  1. The baseline impact is accounted for, as we only look at spikes above the general traffic level.

  2. There is no need to worry about noise from most online campaigns, as these campaigns are user-driven. Ad exposure is continuous and there is no specific stimulus at any particular date/time stamp, so their effect should already be reflected in the baseline (baseline = natural traffic + user-driven campaign traffic).

  3. We get a clear minute-level read on the short-term incremental lift driven by TV, which avoids debate about how to determine the long-lasting drag effect.

In the following discussions, the date range is 2019-01-01~2019-06-30, with all timestamps converted to EST. The two data sets are Google Analytics session data, as this metric is the traffic signal closest to the airing time of each TV spot, and the TV log data.

For the Google Analytics session data, we include web traffic from 6 major channel groups: Direct, Organic, SEM Brand (cpc), SEM Nonbrand, Social and Content Marketing. The chart below shows the web traffic volume trend by channel group.

 

We can clearly see a strong upward trend for Content Marketing traffic and a downward trend for Organic traffic since mid-May 2019. In addition, Direct traffic has been decreasing since March 2019. Do any of these trends correlate with the TV campaign effort? Let's take a look below.

Correlation Curve

In this section, we look at the correlation curve, which shows how the correlation value changes at each minute-lag (0~15 minutes) used to map minute-level spot data to session data. The correlation value is a scaled measurement that approximates the size of the traffic spike in each minute after a spot's airing time. The minute-lag is the number of minutes by which the session data is offset from the airing time, which lets us see at which offsets the correlation between TV stimulus and web sessions is strongest.
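
The lagged mapping described above can be sketched as follows. This is a minimal illustration, not the method actually used for the charts: it assumes the "correlation value" is a plain Pearson correlation between minute-level spot spend and sessions shifted by the lag, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_curve(spend, sessions, max_lag=15):
    """Correlate spend at minute t with sessions at minute t + lag, for each lag."""
    curve = {}
    for lag in range(max_lag + 1):
        paired_spend = spend[: len(spend) - lag]   # drop the last `lag` minutes
        paired_sessions = sessions[lag:]           # shift sessions back by `lag`
        curve[lag] = pearson(paired_spend, paired_sessions)
    return curve


# Hypothetical toy data: two spots, each followed by a session bump 2 minutes later.
spend = [0, 100, 0, 0, 0, 0, 250, 0, 0, 0, 0, 0]
sessions = [10, 10, 10, 30, 10, 10, 10, 10, 60, 10, 10, 10]

curve = correlation_curve(spend, sessions, max_lag=4)
best_lag = max(curve, key=curve.get)
print(best_lag)
```

In this toy series the curve peaks at the lag where the session bumps line up with the spots, which is the pattern the correlation charts are read for.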

Overall Correlation Curve

First, we look at the correlation curve using session data from all channel groups. In general, we believe signals from Direct, Organic and Paid (SEM) sources have the highest correlation with the TV campaign. Here, we also include Social and Content Marketing, as their traffic volumes are significant as well.

 

The above chart shows that, from minute 1 to minute 4, the correlation between TV spot spend and Direct session volume is significantly (2~3 times) higher than at the other lags. The next question is: does this observation hold for each channel group?

Correlation Curves by Channel Group

 

Plotting the same correlation curve for each channel group, we see spikes in 3 plots, those of the Direct, Organic and SEM Brand channel groups, which aligns with our historical experience. In addition, SEM Brand has the highest correlation, which might suggest that a relatively higher proportion of SEM Brand traffic comes from TV viewers who searched for what they saw in a TV airing and reached the website through a search engine result.

Correlation Curves Overview

If we stack all 3 valid correlation curves in the same plot, along with the overall session curve, we'll have the below chart.

 

Although the correlation values differ at each spike, the above summary chart generally shows that lag minutes 1~3 (the area highlighted in grey) have the highest correlations between TV spend and session volume (Direct, Organic, SEM Brand, Overall). Therefore, we take the incremental lift from the 1st to the 3rd minute after each spot's airing time as the true signal of the TV campaign.

In addition, we find that only the Direct session curve starts to drop after minute 2, while the other 3 curves peak at minute 2 (and start to drop after minute 3). This might be worth exploring (e.g. landing page design, customer behavior, ad message, etc.).

Incremental Session Lift Estimation

In this section, we estimate the spot-level incremental lift driven by TV by calculating a pre/post 5-minute session difference. As before, we compare the results across channel groups, and also look at the cumulative traffic volume made up of sessions from the 3 targeted channel groups (Direct, Organic and SEM Brand). The table below shows the calculated result.

   Traffic                    Lift       Lift.per.spot  Total.Spend    Cost.per.lift
1  Direct_session             54984.00   7.14           $2,565,462.65  $46.66
2  Organic_session            45129.00   5.86           $2,565,462.65  $56.85
3  SEM_brand_session          37526.00   4.87           $2,565,462.65  $68.36
4  Total_session (3-channel)  120587.00  15.66          $2,565,462.65  $21.27
5  Baseline (3-channel)       84318.00                  $0.00
6  Baseline (6-channel)       211367.00                 $0.00

 

Please note:

  • All negative lift values are set to 0.

  • Baseline is calculated based only on the sessions within the 1~3 lagged minutes window we defined from the previous correlation curves.

  • 3-channel baseline is calculated based on sessions from Direct, Organic and SEM Brand channel groups.

  • 6-channel baseline is calculated based on sessions from all 6 channel groups.
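
The pre/post difference and the zero floor on negative lift can be sketched as below. This is only an illustration under stated assumptions: the post window is assumed to start in the minute after the airing, and the function name, toy session series, airing minutes and spend figure are all hypothetical.

```python
def spot_lift(sessions_by_minute, airing_minute, window=5):
    """Post-window sessions minus pre-window sessions for one spot, floored at 0."""
    if airing_minute < window or airing_minute + window >= len(sessions_by_minute):
        raise ValueError("not enough data around the airing minute")
    # Sessions in the `window` minutes before the airing.
    pre = sum(sessions_by_minute[airing_minute - window:airing_minute])
    # Sessions in the `window` minutes after the airing (assumed to start at +1).
    post = sum(sessions_by_minute[airing_minute + 1:airing_minute + 1 + window])
    # Negative lift is set to 0, as in the notes above.
    return max(post - pre, 0)


# Hypothetical toy data: flat traffic of 10 sessions per minute with one bump.
sessions = [10] * 24
sessions[11], sessions[12] = 25, 30
airings = [10, 16]  # hypothetical airing minutes

lifts = [spot_lift(sessions, minute) for minute in airings]
total_lift = sum(lifts)
lift_per_spot = total_lift / len(airings)

hypothetical_spend = 700.0
cost_per_lift = hypothetical_spend / total_lift
print(lifts, lift_per_spot, cost_per_lift)
```

The second spot in the toy data has a negative raw difference, so it contributes 0. Because the floor is applied per spot and per channel, per-channel lifts need not add up to the lift computed on the combined channels.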

From the above table we can see that:

  • The TV campaign does drive initial web traffic for the above 3 channel groups. Ranked by lift volume, the channel groups are Direct > Organic > SEM Brand, whereas ranked by raw session volume they are Organic [2,447,636] > Direct [2,246,099] > SEM Brand [470,523].

  • If we look at the 3-channel total session lift and baseline, the TV-attributable session lift is about 58.85% of overall session volume within the 1~3 lagged minutes window. This percentage would be 2.34% if we included all minutes (with or without TV airing) across the whole period.

  • If we compare the 3-channel total session lift with the 6-channel baseline, the TV-attributable session lift is about 36.33% of overall session volume within the 1~3 lagged minutes window. This percentage would be 1.47% if we included all minutes (with or without TV airing) across the whole period, or:
    • 2.45% for Direct
    • 1.84% for Organic
    • 7.98% for SEM Brand
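
The percentages above can be reproduced from figures quoted in this document. The sketch below assumes each in-window share is lift / (lift + baseline) and each whole-period share is lift / total sessions; the six-channel session total of 8,180,390 is the one reported in the conversion table of this document, and the variable names are illustrative.

```python
lift = {"Direct": 54984, "Organic": 45129, "SEM Brand": 37526}
total_lift_3ch = 120587
baseline_3ch = 84318
baseline_6ch = 211367
sessions = {"Direct": 2246099, "Organic": 2447636, "SEM Brand": 470523}
sessions_all_6ch = 8180390


def pct(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Share of TV lift inside the 1~3 lagged minutes window.
window_share_3ch = pct(total_lift_3ch, total_lift_3ch + baseline_3ch)
window_share_6ch = pct(total_lift_3ch, total_lift_3ch + baseline_6ch)

# Share of TV lift across all minutes of the period.
period_share_3ch = pct(total_lift_3ch, sum(sessions.values()))
period_share_6ch = pct(total_lift_3ch, sessions_all_6ch)

# Per-channel share: channel lift over that channel's total sessions.
channel_share = {ch: pct(lift[ch], sessions[ch]) for ch in lift}

print(window_share_3ch, window_share_6ch, period_share_3ch, period_share_6ch)
print(channel_share)
```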

With that said, if we assume:

  1. These 6 channel groups are the major sources of leads/sales.
  2. There is no significant conversion rate difference among each channel.

Then, about 1.47% of total sales should be attributed to TV campaign effort, based on the session % estimation from the above lift analysis, or:

  • 2.45% of sales resulted from initial Direct sessions
  • 1.84% of sales resulted from initial Organic sessions
  • 7.98% of sales resulted from initial SEM Brand sessions

However, we know that the conversion rate varies by channel group. Therefore, we need to create an index to adjust the above percentages based on each channel group's session-to-transaction conversion rate.

Sales Percentage Estimation by Channel Group

Session-To-Sale Conversion Rate Estimation

To estimate the session-to-sale conversion rate, we refer to the transaction data from Google Analytics, which is pulled in the same way as the session data above. From it we build the summary table below, with the proposed index in the rightmost column. The index expresses each channel group's conversion rate relative to the overall rate (overall = 100).

   Channel.Group      Session     Transaction  Conversion.rate  Index
1  Direct             2246099.00  24062.00     1.07%            140.00
2  Organic            2447636.00  14349.00     0.59%            76.00
3  SEM Brand          470523.00   8407.00      1.79%            233.00
4  SEM Nonbrand       623607.00   4088.00      0.66%            85.00
5  Social             434472.00   8593.00      1.98%            258.00
6  Content Marketing  1958053.00  3270.00      0.17%            22.00
7  Total              8180390.00  62769.00     0.77%            100.00

 

Interestingly, the Direct, SEM Brand and Social channel groups have above-average session-to-transaction conversion rates, while the other 3 are below average.
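
The conversion rates and index values in the table can be recomputed from its session and transaction counts. The sketch below assumes the index is the channel conversion rate divided by the overall conversion rate, times 100 and rounded to a whole number; the variable names are illustrative.

```python
# (sessions, transactions) per channel group, taken from the table above.
channels = {
    "Direct": (2246099, 24062),
    "Organic": (2447636, 14349),
    "SEM Brand": (470523, 8407),
    "SEM Nonbrand": (623607, 4088),
    "Social": (434472, 8593),
    "Content Marketing": (1958053, 3270),
}

total_sessions = sum(s for s, _ in channels.values())
total_transactions = sum(t for _, t in channels.values())
overall_rate = total_transactions / total_sessions

# Conversion rate per channel group, in percent.
rate_pct = {name: round(100 * t / s, 2) for name, (s, t) in channels.items()}

# Index: channel rate relative to the overall rate (overall = 100).
index = {name: round(100 * (t / s) / overall_rate) for name, (s, t) in channels.items()}

print(round(100 * overall_rate, 2))
print(rate_pct)
print(index)
```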

Then, we can use this index to tune the TV attributable order percentage by each channel group as shown below:

  • For Direct, the adjusted % of orders that are attributable to TV would be 2.45% * (140 / 100) = 3.43%
  • For Organic, the adjusted % of orders that are attributable to TV would be 1.84% * (76 / 100) = 1.40%
  • For SEM Brand, the adjusted % of orders that are attributable to TV would be 7.98% * (233 / 100) = 18.58%
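
The index adjustment can be sketched as below. As an assumption, the sketch multiplies the unrounded TV-attributable share (lift / sessions) by the rounded index, which reproduces the three figures quoted above; using the rounded 7.98% for SEM Brand would give 18.59% instead of 18.58%. Variable names are illustrative.

```python
lift = {"Direct": 54984, "Organic": 45129, "SEM Brand": 37526}
sessions = {"Direct": 2246099, "Organic": 2447636, "SEM Brand": 470523}
index = {"Direct": 140, "Organic": 76, "SEM Brand": 233}

adjusted_pct = {}
for channel in lift:
    share_pct = 100 * lift[channel] / sessions[channel]  # unrounded TV share, in %
    adjusted_pct[channel] = round(share_pct * index[channel] / 100, 2)

print(adjusted_pct)
```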

Then, we can either:

  1. use the estimated overall TV-attributable percentage of 1.47% to allocate overall leads/sales to the TV campaign. If we assume that most of the leads/sales came from the 6 channel groups discussed in this analysis and that there is no significant difference in session-to-transaction conversion rate, we would allocate about 1.47% of the total leads/sales to the TV campaign (1,783 leads and 222 sales). These numbers, combined with the TV spend of $2,565,463 over the same period, lead to an estimated CPL of $1,438.66 and CPS of $11,537.03. However, we have already seen that the session-to-transaction conversion rate varies a lot among channel groups. Therefore, these numbers are not accurate (the TV-attributed leads and sales are most likely underestimated, as 2 of the 3 channel groups we targeted have above-average session-to-transaction conversion rates).

or:

  2. use a weighted average percentage, based on the above adjusted percentages for Direct, Organic and SEM Brand, to estimate TV campaign leads/sales. Under this method, the overall TV campaign attributable % would be 3.43% * (2,246,099 / 8,180,390) + 1.40% * (2,447,636 / 8,180,390) + 18.58% * (470,523 / 8,180,390) = 8.85%. Thus, the estimated CPL and CPS would be $239.71 and $1,922.33, respectively.
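
For reference, the weighted-average expression can be evaluated exactly as written with the short sketch below (variable names are illustrative). With the rounded inputs shown, it comes to roughly 2.43% rather than 8.85%, so the reported 8.85%, and the CPL and CPS derived from it, presumably rest on weights or inputs not shown in this document; that reading is an assumption.

```python
adjusted_pct = {"Direct": 3.43, "Organic": 1.40, "SEM Brand": 18.58}
sessions = {"Direct": 2246099, "Organic": 2447636, "SEM Brand": 470523}
total_sessions_6ch = 8180390

# Weight each adjusted percentage by the channel's share of 6-channel sessions.
weighted_pct = sum(
    adjusted_pct[channel] * sessions[channel] / total_sessions_6ch
    for channel in adjusted_pct
)

print(round(weighted_pct, 2))
```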

In addition, please note that this estimation is based on short-term traffic spikes only. Therefore, the actual TV attributable order volume should be higher if we consider:

  • The slower traffic (e.g. from an aging audience) that arrives outside the post 4-minute window of each airing.
  • The long-term brand awareness that exposes the product to potential customers who convert through other campaigns at a later time.