Why we need a new UX for AI to succeed

This post was written with my colleague Sivesh as part of our series on investing in AI.

For those new to this space, we’re undergoing one of the biggest technological shifts since cloud computing. We won’t spend too much time on this shift, as there are many great summaries and market maps being published, but here’s a slide we created to summarise the stages of development in AI:

One question that often comes up when exploring the recent Cambrian explosion of AI-powered applications is defensibility. Investors and founders are acutely aware that AI models are becoming commoditised, so the value of the neural network within an AI application is quickly decreasing.

These models have been released out in the open by companies such as OpenAI and Stability.ai. If you haven’t played around with GPT-3 (a model hosted by OpenAI) yet, I strongly advise you do, so you can appreciate the almost magical power of off-the-shelf models. It’s worth noting that even if you wanted to build your own models, it’s becoming increasingly difficult to do so, as AI research is increasingly a function of balance sheet strength.

NLP has advanced a lot in the last decade. LSTM networks vastly improved the performance of RNNs, thanks to their ability to selectively remember or forget different parts of a sequence. Transformers and the “attention mechanism” (alongside scaling laws) then produced a step-change in what was possible, as these models began to genuinely understand language. Now, diffusion models are reshaping how we can generate content of all forms. It used to be the case that breakthroughs in deep learning were used to build competitive advantage; however, there has been a cultural and technological shift (in which Hugging Face played a big part) towards putting these pre-trained models out into the open. Developers can now embed state-of-the-art AI into their products with a few lines of code.

But how do you build a competitive advantage when everyone is using the same models?

Own the UX

We’re only just beginning to understand the powers hidden within the latent space of these large models. The more context you provide, the better they get, and they can quickly become very good at tasks that previously required a huge amount of robust engineering. One question to ask is: will the only interface to these powerful models forever be a simple, static text box?

Most consumers are not AI-aware, so simply putting them in front of a model will rarely help them solve their problem. UX design is a big (and maybe even the biggest) problem in AI products today.

Products need comprehensive workflows that gather the required context from users to construct optimal prompts, as well as intuitive workflows for collecting the feedback data needed to fine-tune models and further build a moat.
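As a rough illustration, here’s a minimal sketch of what that might look like in practice: a guided workflow collects structured inputs from the user and assembles them into a prompt behind the scenes, rather than exposing a blank text box. The field names and the copywriting use case are hypothetical, not a reference to any particular product.

```python
from dataclasses import dataclass


@dataclass
class BriefInputs:
    """Structured context collected through the product's UI (hypothetical fields)."""
    audience: str
    tone: str
    product_description: str
    key_points: list[str]


def build_prompt(brief: BriefInputs) -> str:
    """Assemble a prompt from guided UX inputs so the user never faces a blank text box."""
    points = "\n".join(f"- {p}" for p in brief.key_points)
    return (
        "Write marketing copy for the following product.\n"
        f"Audience: {brief.audience}\n"
        f"Tone: {brief.tone}\n"
        f"Product: {brief.product_description}\n"
        f"Key points to cover:\n{points}\n"
        "Copy:"
    )


prompt = build_prompt(BriefInputs(
    audience="busy founders",
    tone="confident but friendly",
    product_description="an AI assistant that drafts sales emails",
    key_points=["saves two hours a day", "integrates with existing email tools"],
))
print(prompt)
```

The same workflow is also a natural place to capture user edits and accept/reject signals, which become the feedback data for fine-tuning later on.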

There’s a big gap between the “wow” moment of a generative model and converting someone into a paying user, and the majority of that gap is filled by great UX. Jasper.ai is a great example of a company that has executed on this and is now rumoured to be approaching $80m ARR, having only launched last year.

Leverage Prompt Engineering

There’s now a blank canvas for how we interact with AI, which has led to a shift in focus from statistical modelling to Prompt Engineering. This broadly means engineering your input to a model, optimising for ease, accuracy and cost. A few examples are:

  • Zero-Shot — a natural language prompt, as if you’re asking a toddler (who happens to have read the entirety of Wikipedia) to do something, e.g. the input would be “task description”: {target text}. This is clearly the simplest way of interacting with AI.
  • Few-Shot — adding a few examples and some context on the expected output (see the image below and the prompt sketch after this list). This requires more “engineering” but can vastly improve accuracy. However, adding that context to every prompt means it can cost a lot more (more on this below).
  • Fine-Tuning — taking many (hundreds or thousands of) examples and re-training a pre-trained model, changing its parameters so that you no longer need to include examples in each prompt. This process can be very expensive, potentially costing millions of dollars, but once it’s done, it’s done.
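To make the first two approaches concrete, here’s a minimal sketch of a zero-shot versus a few-shot prompt sent to GPT-3. This assumes the pre-v1 openai Python library, the text-davinci-003 completion model and an API key in the environment; the translation task is purely illustrative.

```python
import openai  # pre-v1 openai library; reads OPENAI_API_KEY from the environment

# Zero-shot: just the task description and the target text.
zero_shot = "Translate the following English sentence into French: 'Where is the station?'"

# Few-shot: the same task, with a handful of worked examples prepended as context.
few_shot = """Translate English to French.

English: Good morning. -> French: Bonjour.
English: How much does this cost? -> French: Combien ça coûte ?
English: Where is the station? -> French:"""

for name, prompt in [("zero-shot", zero_shot), ("few-shot", few_shot)]:
    response = openai.Completion.create(
        model="text-davinci-003",  # GPT-3 era completion model
        prompt=prompt,
        max_tokens=50,
        temperature=0,
    )
    print(name, response["choices"][0]["text"].strip())
```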

For most startups building on LLMs, few-shot learning seems to strike the best balance of ease, accuracy and cost. It’s worth digging into how cost works, taking text applications built on GPT-3 as an example (see the Model-nomics section below).

Source: OpenAI — Language Models are Few-Shot Learners

Focus on Use-case

AI is becoming a platform, similar to Cloud or Mobile. Many companies are focusing on building that platform, and there’s no doubt they’ll capture a huge amount of value, as evidenced by the $20bn valuation of OpenAI. However, there’s a reason AWS doesn’t focus on building vertical SaaS solutions: it’s extremely hard to build a platform and the use cases on top of that platform at the same time, as Apple’s own mediocre apps further demonstrate. We believe there’s a huge amount of value to be unlocked by focusing on specific AI use-cases and applications, similar to how the Uber business model was unlocked by mobile.

However, this argument must be taken with a pinch of salt: many AI use cases sit firmly in the “feature” bucket rather than forming a full product. PhotoRoom, which we recently partnered with, was one of the first companies to leverage Stable Diffusion to build a very practical AI feature and has since accelerated its growth. Many larger companies, such as Notion and Microsoft, are now leveraging off-the-shelf models to enhance their products, further evidence that the strategy of owning the UX and prompt engineering, rather than building your own models, seems to be winning.

We should also caveat that in some circumstances it can make sense to own the model and build the AI from the ground up. One particularly exciting area is Decision Transformers, which leverage the breakthrough transformer architecture to generate actions rather than just content. Adept.ai is an awesome company doing just this. I’ll explore this further in another post…

Understand Model-nomics

OpenAI charges $0.02 per 1,000 tokens (roughly 750 words), down from $0.06 earlier this summer. When using few-shot learning, up to 90% of the prompt can be “context”, meaning costs can be ~10x those of zero-shot. Smart businesses can build an advantage by optimising the ratio of “context” to “target text” and doing clever things such as removing any words from the “target text” that don’t affect the output.
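As a back-of-the-envelope sketch (the token counts below are made up purely to illustrate the arithmetic), the few-shot premium falls straight out of per-token pricing:

```python
# Illustrative maths only; the token counts are invented for the example.
PRICE_PER_1K_TOKENS = 0.02  # USD per 1,000 tokens, as quoted above


def prompt_cost(tokens: int) -> float:
    """Cost of the prompt portion of a single request."""
    return tokens * PRICE_PER_1K_TOKENS / 1000


target_text_tokens = 50   # the user's actual input
context_tokens = 450      # few-shot examples: ~90% of the prompt is context

zero_shot = prompt_cost(target_text_tokens)
few_shot = prompt_cost(target_text_tokens + context_tokens)

print(f"zero-shot prompt: ${zero_shot:.4f}")                               # $0.0010
print(f"few-shot prompt:  ${few_shot:.4f} ({few_shot / zero_shot:.0f}x)")  # $0.0100 (10x)
```

Completion tokens are billed at the same rate on top of this, so the real-world ratio depends on output length, but the direction of travel is clear: every token of context is paid for on every single call.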

It’s clear that businesses built on third-party models carry pricing risk, in the same way that businesses built on the Cloud carry pricing risk from their cloud providers. We believe AI will find the same balance the Cloud has: the value generated justifies paying for the agility and power provided by third-party companies such as OpenAI. Many early-stage AI businesses we’ve met are able to operate at a gross margin of 70–80%, and we think this will improve as they strengthen their value proposition going forward.

Additionally, it’s worth noting that the majority of cloud compute is already spent on deep learning use-cases. This signals that steady-state pricing for AI platforms may land in the same region as cloud compute today, a level most businesses seem comfortable with.

There’s a realistic probability of compute power catching up with model growth, such that state-of-the-art (or at least near state-of-the-art) models can be run on-device, which would push the marginal cost of AI towards zero. Stability.ai is already able to run some of its models on-device. There is also an increasing number of AI platforms (Cohere, AI21 etc.), many of which are choosing to open-source their models, and there are clever ways to minimise inference costs, such as model distillation.
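For the curious, model distillation boils down to training a small, cheap “student” model to imitate a large, expensive “teacher”. Here is a minimal sketch of the standard loss, assuming PyTorch; the temperature and weighting are illustrative defaults, not tuned values.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend the usual hard-label loss with a soft-target loss against the teacher."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```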

All of this limits the platforms’ overall pricing power, and they may have to come up with more creative business models, beyond charging per API call, in order to monetise their research.

Conclusion

If the market is large enough, we believe there is huge potential for start-up disruption by building applications from the ground up with off-the-shelf models at their core. Gong and Otter, for example, are two great companies that built their products on top of proprietary transcription models.

Now that state-of-the-art transcription models are out in the open and the cost of AI is tending to zero, it’s a level playing field. This has opened up a huge opportunity for start-ups to capture value in the massive productivity market by owning the UX and Prompt Engineering layer. If you’re a founder leveraging AI, we’d love to speak with you.