How and Why to Automate Content Transformation with the help of Machine Learning

Author: Lazarina Stoy

Last updated: 25/11/2024

The past year has further amplified the importance of diversifying traffic sources, and the need for companies to become truly omni-present when it comes to their content and brand presence. In this article, I will explore how users' search behaviors have changed (the devices they use, how and why they search for information, etc), and look at various ways you can transform your content with the help of Machine Learning (ML) APIs and tools.

Here’s what I’ll be covering:

Why you should consider investing in content transformation

Content transformation methods and implementation

Text-to-text transformation
Text-to-speech transformation
Speech-to-text transformation
Text-to-video transformation
Image-to-video transformation

How to responsibly scale your content transformation program in the age of AI

Why you should consider investing in content transformation

Search is changing. Not only how we search, but where we go to find information, and what devices we use to get us there.

We now have more places than ever that we can go to to find information - not only search engines, but social media, and most recently - AI chatbots like ChatGPT. Research by Hubspot shows that 31% of people search on social media, while 12% now use AI chatbots to find answers.

How do consumers search for answers to questions online? - Hubspot Blog Research, 2023.

This research also shows the differences between different generations when it comes to social media searches, with nearly a third of Gen Z and Millenials (31% and 28%, respectively) preferring social media over search engines for searching for and finding information online.

What percentage of each generation prefer to search on social media over search engines? - Hubspot Blog Research, 2023.

Another aspect that’s changing search behavior is that we now have access to various devices that we can perform searches on (e.g. our phone, laptop, tablet, Alexa, or even our car), and, as such the contexts in which we’re searching vary too – we’re searching while at work, while driving, while cooking, while watching TV, etc.

Where might searches take place? - Screen Capture from a conference presentation by Heather Physioc, 2024

In other words, our search journeys are longer and more widely distributed in terms of device, context, and search platform than ever before.

2010s Modern Search path - Screen Capture from a conference presentation by Heather Physioc, 2024

What this means is that the need for your brand to be omni-present is greater than ever:

Having a company blog without social media presence that reflects the same information puts you at a disadvantage at reaching a third of Gen Z and Millennials.
Having a presence on different social media accounts, without it reflecting your company’s know-how is a missed opportunity
Not taking into account the complexity of the modern search path and the implicit intent with which people search for information (see the Jobs to be Done theory for more context) is setting you up to fail in your content production and distribution strategy
Relying solely on organic traffic from search is no longer a viable strategy (especially with latest algorithm updates, which take into account brand authority signals)

And since I know how difficult and costly content transformation can become (and because I’m such a big advocate for machine learning implementation in organic search operations), I will share with you an overview of different methods of automating content transformation with the help of machine learning web tools and APIs.

Hopefully, by the end of this article, you will not only have the tools, but also the understanding of how to get started improving your content accessibility and cross-platform organic visibility, regardless of your currently-dominant brand content format.

Content transformation methods and implementation

Text-to-text transformation

Imagine you have a piece of text, and you want to change it into a different form, while still keeping the core message - that’s the easiest way to explain what text-to-text transformation is all about. Text-to-text transformation algorithms rewrite text according to certain rules or instructions. To illustrate further, text-to-text transformation models can be applied in:

Translation: Starting with a sentence in one language, the model transforms the sentence into another language. Some example models that can do this accurately are GPTs (like OpenAI’s GPT4, or GPT4o), Google Translate API, Deep L translate.
Summarization: You take a long article or document and condense it into a shorter summary, capturing the most important points. BERT is amongst the most popular models for summarisation, but any text-based generative AI model (like OpenAI’s GPT4, or Gemini) can do a great job, too.
Paraphrasing: You rewrite a sentence or paragraph using different words and sentence structures, while maintaining the original meaning. LLMs (Large Language Models) are great for this.
Grammar Correction: You take a piece of text with grammatical errors and transform it into a version with correct grammar. Any LLM excels at this task.
Style Transfer: You change the tone or style of a piece of text. For example, you might turn a formal PR announcement, or a case study into a LinkedIn post. Any LLM excels at this task.

Text-to-text transformation can be incredibly powerful for speeding up the distribution of text-based content to other platforms. Referring to the graph by Olaf Kopp below, the goal we should have as marketers is to ensure that we can convey the brand message in as many content formats as possible, capturing multiple micro-moments. This makes our content work harder.

Classification of content types by micro-intents in the customer journey - Graph by Olaf Kopp, Aufgesang GmbH

Here are a few practical use cases for this type of transformation:

If you have a library of high-performing blog posts, you can easily transform blog posts to social media posts.
If you have a library of high-performing blog posts, but no newsletter, you can use an LLM to rewrite these into newsletter drafts
If you have comprehensive guides or reports in PDF format, you can extract key insights, summaries, or actionable tips from these documents and repurpose them into blogs or social posts/threads
If you have a website that could use an update on headings, titles, and meta descriptions, you can apply a programmatic approach utilising extractive summarisation or abstraction (like BERT) to speed up the process

Here’s a rundown of tools and technologies you can use, and the suitability and limitations of each.

I highly recommend checking out Caitlin Hathaway’s Content Repurposer GPT for a quick-start with a no-code tool for content repurposing. Utilizing any LLM API would be a great place to start for programmatically scaling text-to-text transformation.

Text-to-speech transformation

Text-to-speech technology (or TTS, for short) uses ML models to convert written text into spoken words. These models are trained to detect and replicate intricate patterns and nuances of human speech, allowing them to generate natural-sounding voices that can read out any given text.

Needless to say, this enables a ton of opportunities for platform enablement, like:

If you have a high-performing series of written interviews on your website, you can scale production via TTS to convert those into audio, and improve the accessibility of this content type for your audience
If you have tutorial-style content pieces on your website, you can speed-up the process of producing videos by recording natural-sounding demos
If you have a ton of customer reviews for products on your ecommerce store, you can use TTS to incorporate highlights from these reviews to create product ads more quickly

There are many TTS tools and platforms available, ranging from free to premium options, many of which offer not only different voices, but also voices with different dialects and personalities. Here are some of the main ones, alongside their ideal users and limitations.

Using a model like Google Cloud’s text-to-speech API can be as simple as selecting a language and voice from their library, customizing features like dialect and other voice characteristics, and bulk-processing text-files into audio.

Notably, there are also some web tools like Synthesia, the main function of which is creating a digital avatar that you can then use to create videos from a text input. I’d categorize tools like this as no-code, but they do require a video of you speaking to create a life-like digital avatar.

Speech-to-text transformation

Speech-to-text (STT) algorithms convert spoken language into written text. Such algorithms work by analyzing sound waves to identify individual sounds and words, which are then matched to patterns, and stored in the model’s database to accurately transcribe sounds into text.

When it comes to how this technology can be applied in marketing, here are a few ideas:

If you are collecting call transcripts, you can automate the conversion of audio to text, then extract entities, analyze overall sentiment, and sentiment associated with specific entities in Google Sheets
If you are producing podcasts or videos, you can use these models to create captions and subtitles more easily
If you are interested in investigating YouTube as a platform, you can easily perform competitor analysis by transcribing the videos in your niche, analyzing them to identify key entities and topics, and content structures, then mapping those to performance.

You can use a number of tools and technologies for STT transformation, each with specific limitations, and use cases.

I highly recommend checking out the programmatic approaches, as some like OpenAI’s speech-to-text functionalities can be incorporated into processes easily, even by complete beginners, with no coding experience.

When it comes to the programmatic approaches mentioned, Gladia’s blog post includes some pretty interesting insights. In short, OpenAI, Cloud Speech-to-Text, and Amazon Transcribe all have automated punctuation features, and automatically detect language. Cloud Speech-to-Text and Amazon Transcribe have other advanced features making them more superior than OpenAI.

Comparison of different speech-to-text algorithms by OpenAI, Google Cloud, and AWS on over 15 model capabilities - by Gladia.io

Text-to-video transformation

Text-to-video transformation takes words and transforms them into video. This is by far the most advanced and complex type of content transformation, and needless to say - truly, we’re not there yet from a technological standpoint. The aim for these types of models is to create full-blown videos from a prompt, or a script, including scenes and character design, music, and other stylistic elements.

Text-to-Video model concept, demonstrating the conversion from input (text) to output (video)

Creating text-to-video models is challenging as there isn’t enough paired text-video data to train the model on - mostly there are text-image pairs, which is why text-to-image models are improving at such a good rate in comparison. Computing a text-to-video model from scratch is not only costly, but time and computationally intensive. Meta has resolved this problem by utilizing existing text-to-image models as a foundation of their Make-A-Video Model, which was then trained on motion separately through unlabeled video datasets.

When it comes to generative AI text-to-video models, when you provide the text, NLP techniques are applied to break it down into key elements (like objects, actions, emotions), and then these are mapped to visual elements that match those descriptions, with motion applied.

Examples of these types of models are OpenAI’s SORA, and Meta’s Make-A-Video, but neither have been officially released to the public. There is also another type of text-to-video software available, which takes the text you type in, and maps it to stock videos and images, with some slight image-based motion applied. Examples include web-based programs like Invidio.

Text-to-video technology can be revolutionary to so many marketing functions:

If you are producing person-to-camera or tutorial-style videos already, generative AI text-to-video APIs can help you create more dynamic videos by creating original video segments based on parts of your script
If you are responsible for the content on social media platforms, or dedicated video platforms like YouTube and TikTok, you can create videos to support your social posts, or pursue Shorts.
If you are running an ecommerce store, you can create video descriptions for products - note, this might not be possible until the release of more sophisticated models

Let’s go over some of the main contenders in terms of tools in this category.

I highly recommend this blog post by Hugging Face for a further deep-dive and demos of existing models. My personal opinion on text-to-video software and APIs is that they would be able to supplement a person-to-camera set-up, but not replace the need of it entirely, especially with brand content, targeting informational and commercial intent.

Image-to-video transformation

In contrast to text-to-video, image-to-video is a bit easier to execute. The idea is to envision the motion in the provided image, which is only the second step in text-to-video. The way the machine learning models work here is by understanding spatial and temporal patterns – they not only understand image content but also predict the logical motion that might follow based on the image content identified.

In short, API and web image-to-video tools can analyze the content of an image, identify key objects and patterns, and then apply various techniques to create a dynamic video sequence.

The underlying technologies used are generative adversarial networks (GANs), and deep neural networks to translate still images into videos by predicting probable inter-frame transitions.

Here are some applications of this technology in marketing:

If you are running an ecommerce store, you can create motion videos from modeled photos, showcasing your products in action better.
If you are running a business that involves creating or managing a collection of high-quality photographs, you can turn some into short motion videos
If you have infographics or diagrams that explain a complex concept, turning them into animated videos can make the information more accessible and engaging.

Let me introduce you to the tools you can get started with image-to-video transformation with.

How to responsibly scale your content transformation program in the age of AI

Before I conclude this article, I want to offer some advice on making the best use of these transformation technologies:

Start your content transformation program from the most rich, human-centric content format. Meaning, if you plan on incorporating videos, start by recording them, then transform this content to blog and audio. If audio is your richest format - start with that.
Do what makes sense for your brand and audience - don’t do everything, just because you can, with no regard for the quality of the output or end result
Embrace automation for process enhancement, not as a replacement for a team of experts
Embrace automation for platform enablement, to expand your organic reach responsibly and sustainably

Due to the great ROI and productivity promises that ML-enabled automation has in many areas of business, there is a huge pressure for marketers to embrace automation. However, I believe strongly that we should do so in a way that’s responsible.

When incorporating automated content transformation programs, always keep the following in mind:

The real-life cost of replacing an expert with a program
The broader societal impact of widespread automated content (with no human element in it)
The diminishing effect on brand authority fully-automated content programs might have

Know where to draw the line before things become… freaky, robotic, or inauthentic!

Lazarina Stoy - Founder, MLforSEO

Lazarina Stoy is a Consultant in Organic Marketing, who has worked with countless B2B, SaaS, and big tech teams. She is also the Founder of MLforSEO–a machine learning training platform for organic search marketers, and Women in Marketing – Bulgaria community.