Synthetic Data For Marketers: Sunlit Uplands Or Existential Threat?

Image source: timlouis.ca

As a subscriber to thought-leadership articles from the likes of McKinsey, BCG and Google, I’ve noticed a trending topic over the last 18 months: around one third of the email subject lines and article titles I receive now include the acronym “AI” (Artificial Intelligence).

Of course, many of these articles are Content Marketing for the publishers themselves, for whom AI can provide valuable future revenue streams; their interest in the topic is naturally not just academic.

Nonetheless, since the first widespread trials of ChatGPT (initially built on GPT-3.5) in late 2022, an inflection point seems to have been reached. We are now entering what many writers dub the latest “AI Spring” – albeit one that has followed preceding “AI Winters”.

Mass awareness of the ongoing adoption of AI as an increasingly standard business input appears to have been attained. Hence the editorial stance noted above.

Of course “AI” is a broad term. Generative tools such as ChatGPT are only one application of the underlying Machine Learning (“ML”) evolution. Moreover, the challenges of safely harnessing potential benefits of widespread AI implementation, while mitigating significant associated risks, have also reached new levels of mass awareness.

Perhaps this current “AI Spring” may lead to another cyclical “AI Winter”, but either way, Marketers – from junior Associates up to the C-Suite – are now obliged to divert some of their attention towards a better understanding of this unfolding technology “revolution”.

We’re not just talking about skills enhancement. There are also AI-related questions that employees in many sectors are already asking: “Will my job still exist in the future…and if so, what will it be like?”

In this article I share my views on the topic, with emphasis on another timely AI-related development masked by the current broad GenAI zeitgeist: the acceleration of “Synthetic Data” models and use cases, and their possible implications for Marketers, as well as for the Marketing discipline itself.

WHAT IS SYNTHETIC DATA FOR MARKETERS?

Image source: researchandmarkets.com

For the uninitiated, a quick 101: Synthetic Data (“SD”) is artificially-generated data that mimics the characteristics of real-world data, without containing any information actually sourced from real individuals or events.

These algorithmically-derived datasets replicate the statistical properties, patterns, and distributions found in originating real-world datasets, and so can be used at scale to train ML-related models.

In Marketing, SD has potential use cases in activities such as Market Research, Brand Marketing, Performance Marketing, Media Buying, and even Budgeting.

Its three generic benefits in such cases are: 

(a) The ability to synthesise much larger datasets than offered by traditional data collection and tagging methods.

(b) The ability to do so at much higher speed and lower cost than traditional methods allow.

(c) With careful programming, the ability to do (a) and (b) with high statistical correlation to smaller originating datasets.
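
As a minimal sketch of benefits (a) to (c), the Python below fits a simple two-variable normal model to a small originating dataset, then draws a much larger synthetic dataset from it. Everything here is an assumption made for illustration: the weekly ad spend and order figures are invented, the function names are hypothetical, and real SD tools generally use far richer models than a fitted normal distribution.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sample_std(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def synthesise(xs, ys, n, seed=0):
    """Draw n synthetic (x, y) pairs from a bivariate normal fitted to xs, ys."""
    rng = random.Random(seed)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_std(xs), sample_std(ys)
    r = pearson(xs, ys)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mix z1 into y so the synthetic pairs keep the fitted correlation.
        y = my + sy * (r * z1 + math.sqrt(1 - r * r) * z2)
        pairs.append((x, y))
    return pairs


# Invented originating data: ten weeks of ad spend and resulting orders.
spend = [10, 12, 15, 18, 20, 23, 25, 28, 30, 35]
orders = [52, 58, 61, 70, 74, 79, 88, 90, 99, 108]

synthetic = synthesise(spend, orders, n=10_000)
syn_spend = [x for x, _ in synthetic]
syn_orders = [y for _, y in synthetic]

real_r = pearson(spend, orders)
syn_r = pearson(syn_spend, syn_orders)
print(f"real rows: {len(spend)}, synthetic rows: {len(synthetic)}")
print(f"correlation, real: {real_r:.3f}, synthetic: {syn_r:.3f}")
```

Ten real rows become ten thousand synthetic ones in a fraction of a second, with a correlation close to that of the originating data. The synthetic rows add volume, not new information: they can only reflect whatever the ten originating rows contained.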

TWO SD MARKETING USE CASES CONSIDERED

In Market Research, SD also offers a critical extra benefit. By excluding real-world Personal Information, it can enable researchers to review, analyse and classify customers, cohorts and behaviours in greater depth, without risking the mismanagement of sensitive data.

Thus, originating real-world datasets from sources such as actual Focus Groups, Online Questionnaires, and even rolling Brand Trackers, all of which are time and budget-intensive to set up, and limited in scale, can “feed” ML-based models. 

In turn, these models can augment the depth and breadth of the analysis with large-volume data, unencumbered by Personal Data, derived from analogous sources, including those scraped from the public internet – such as Social and Search data.

SD can also potentially do much to support Performance Marketing Effectiveness. For example, in A/B or Multivariate content or campaign testing, a long-term SD model can be built over time and enhanced with results from periodic campaigns.

This model can then effectively become a Control for any later A/B test, and obviate the perennial cross-channel, cross-platform challenges inherent in managing any group of media channels – especially in a Compliant, Privacy-respecting way.

SD can also help inform testing across – for example – geographical markets, in which historical real-world originating data may vary greatly in depth and quality, especially in labelling or tagging.

Multiple helpful articles on the Performance Marketing Effectiveness topic have been published this year – two examples are here and here.

UPSIDE VS DOWNSIDE

Image source: Shutterstock

So it’s all good news, right? Deeper and more timely insights for hard-pressed Marketers, less heavy lifting, budget benefits…what’s not to like?

To begin with, there are what may be termed “inherent AI challenges” linked to using SD in Marketing. Many of these rest on the need for technologically and quantitatively adept experts to build, run and develop the ML models.

In doing so these experts must mitigate one of the greatest risks in the modelling process: embedding, and possibly reinforcing, biases that may be present in originating data.

For example, Online Questionnaires tend to be completed by respondents who may not truly represent your existing or target customers. It is widely acknowledged that many customers, even if they can be reached with such surveys, won’t be the kind of consumers, or B2B decision-makers, who invest time in completing them. Even those who do may not complete them with deep self-insight or high attention to accuracy.

If such source data is a key part of an ML model, such biases can not only become embedded, but become self-fulfilling by leading unquestioning Marketers to believe that the advanced, scaled-up data analysis results must be true…with the ML models becoming more “digital Black Boxes” upon which they draw without deep statistical understanding.
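
A minimal sketch of how such a bias can carry through, in Python. All figures are invented for illustration: a population in which 30% of customers like a product, and a survey which fans are assumed to be four times as likely to answer as everyone else. The "synthetic" step is deliberately naive, simply generating respondents at the rate seen in the survey.

```python
import random

rng = random.Random(42)

# Invented population: 1 = likes the product, 0 = does not (30% like it).
population = [1] * 3_000 + [0] * 7_000
true_rate = sum(population) / len(population)


def responds(likes_product):
    """Assumed response behaviour: fans answer 20% of the time, others 5%."""
    return rng.random() < (0.20 if likes_product else 0.05)


# The survey only ever sees the customers who chose to respond.
survey = [person for person in population if responds(person)]
survey_rate = sum(survey) / len(survey)

# Naive synthetic data: 100,000 new "respondents" at the surveyed rate.
synthetic = [1 if rng.random() < survey_rate else 0 for _ in range(100_000)]
synthetic_rate = sum(synthetic) / len(synthetic)

print(f"true share who like the product: {true_rate:.2f}")
print(f"share in the survey sample:      {survey_rate:.2f}")
print(f"share in the synthetic dataset:  {synthetic_rate:.2f}")
```

The synthetic dataset is around a hundred times larger than the survey, yet it reproduces the survey's skew rather than the true 30%. Scale makes the figure look more authoritative without making it more accurate.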

Beyond such nuts-and-bolts SD considerations lie broader, more abstract, even deeply ethical AI considerations for Marketers, and for business more broadly. As noted earlier, a key one is the potential downstream impact of AI on current and future staff – and their roles.

I personally witnessed as long ago as 2018 the first rollout of AI-powered forecasting for Demand Planning within an organisation, with insufficient regard for the understandable concern caused in team members whose job titles were “Demand Planner”.

More recently, in the Marketing areas of source Copywriting, and thereafter Translation and Localisation, I’ve observed both new recruits and more seasoned staff deeply worried by senior management excitement at GenAI tools taking over content authoring. 

I have also witnessed Translation Engines taking over from multilingual experts ever more of the process of adapting content for other markets. Similar concerns can be surmised among in-house and agency Designers, Creatives, Media Planners…the list is extensive.

Such concerns aren’t limited to Marketing, or to any other function in a given organisation. As Google enthusiastically expresses, AI-driven use cases are already fully-launched across industries and geographies, regardless of their position on longer-term adoption curves. 

The related Organisational Design and Organisational Behaviour ramifications need explicit discussions by CMOs with all their executive peers. Right now.

CONCLUSION

It is clear that even if we collectively do witness some kind of latest AI Winter, following the recent surges forward in SD modelling, GenAI usability, and the like…The Times, They Are a-Changin’.

Core skillsets for new and experienced Marketers must evolve. Sunlit uplands in which we are collectively freed from creative constraints by AI may indeed emerge.

However, our ability to understand these new ML-driven models in depth, in terms of both their inputs and the processes performed on them, presupposes meaningful skills changes. Moreover, these skills changes are needed before any resultant content, insight or actions can be confidently proposed.

The evolution in the maths and technology skills needed for “the Average Marketer” to help us collectively reach such uplands has even resulted in a new job title being posited by WPP: the Creative Technologist.

There’s more: we will still need robust originating data to feed SD modelling for Marketing – especially in ‘Before and After’ macro-scenarios that deeply affect and change consumer sentiment, such as pandemics, socio-economic shocks, wars and civil unrest.

The originating data that feeds these ML tools will itself need to improve, in order for us to leverage its manifold possible SD derivatives with confidence. 

As one regular and pithy luminary noted late last year in a related article, “A lot of current, human derived market research and marketing planning is…well…pants”.

Or, to paraphrase a well-worn expression: ‘Garbage source data in, Garbage synthetic data out.’

Richard Palk has 15 years of senior international Marketing experience, and 10 years of eCommerce leadership, in multiple Consumer sectors. He uses this professional background, and an MBA from London Business School, to bring Digital-First insights and analysis to a range of contemporary Marketing topics.
