Investing in deepset: Giving every enterprise its own AI platform

For over twenty years now, our team at Balderton have been investing in open-source solutions, machine learning applications, and enterprise software. Whether it was our early days supporting a new era for databases with MySQL and ETL with Talend, or our investments in Sophia Genetics, leveraging AI for cancer diagnosis, or Wayve’s application of computer vision for driving, we believe that huge and impactful businesses can be built by leveraging the best of OSS and AI.

While AI solutions have been hugely impactful, their adoption has been relatively slow, especially in the enterprise. But in the last year this curve has been totally reshaped. Thanks to unprecedented improvements in the power of AI with transformer and diffusion models, in particular LLMs, and a heightened awareness of their potential through consumer successes like ChatGPT, the willingness to try, and to buy, AI solutions has never been higher.

I've been blown away by the speed of development and the quality of the demos we have seen in this time. However, despite the many impressive examples of what these new models can do, very few use cases have made it into production so far.

To help us navigate this new paradigm, we published a three-part series on some of the principles we were applying to make investments in such a fast-changing sector. Today we’re pleased to announce our latest investment under this thesis, in deepset, the company behind deepset Cloud and Haystack, which is helping some of the largest organisations in the world develop secure applications on top of LLMs using their own data.

Deepset immediately chimed with us. Firstly, we feel that UX, rather than the accuracy of the underlying models, is one of the critical bottlenecks to AI adoption. Deepset Cloud and Haystack provide crucial abstractions that allow developers and end users to extract maximum value from LLMs with as much or as little coding as they need.

We also saw that semantic search was by far the largest production use case for LLMs, and deepset Cloud has been hyper-focused on these applications for that reason.

Finally, we predicted that “Actionable LLMs” (i.e. models which generate actions rather than just content) would be the largest opportunity for AI going forward. Haystack has positioned itself well for this by being one of the first frameworks to introduce the concept of Agents.

On top of the above, deepset also fits with our broader theme of putting security at the heart of AI. I’ve sat on the boards of companies using machine learning on top of incredibly sensitive data, like genetics, and know what a priority security must be. Companies in sectors like healthcare, finance, law and government all need to put security and model transparency at the heart of the applications they are building, and deepset Cloud allows them to do just that.

The composability of deepset, enabled by the Haystack framework, was also something we instantly recognised as integral to the success of any company looking to thrive in the fast-moving world of LLMs. In the last 12 months alone we have seen huge advances in the abilities of different models, across open-source and closed-source approaches, alongside specialised models for different industries or compute requirements and many other pieces of infrastructure, like vector databases, launching and scaling quickly. Deepset gives its users the ability to trial and compare different models with the click of a button, meaning its customers can keep up with the state of the art.

Finally, deepset’s team were both early and deep into this new field. Founded by Milos Rusic, Malte Pietsch, and Timo Möller, deepset recognised the transformative potential of NLP for the enterprise in 2018. The team became familiar with the hurdles in developing large language model applications through their first experiences with Google’s BERT, which included publishing the first German BERT model.

To solve these challenges, deepset launched Haystack, its open-source framework for NLP applications, in 2019. Haystack is designed specifically to help put LLMs into production, giving developers the ability to choose whichever combination of LLMs (OpenAI, Cohere, Llama 2, etc.) and databases they desire. This composability, as well as the maturity of the documentation and stability of the tools, has proven incredibly popular: Haystack is used by an estimated 50,000 users across the world. To further enhance its offering, deepset rolled out a cloud-based LLM developer platform, deepset Cloud, in 2022 to help enterprises securely build applications on their most sensitive internal data.

Hear how deepset got started and the vision for enterprise AI here:

James Wise interviewing Deepset's founders on how the platform helps enterprises 

While it is easy to agree with the principles of security and composability on which deepset has been built, the clincher was in our own use of the tool. We're now using deepset Cloud at Balderton to query internal documentation, meaning we can do everything from answering questions on the terms of our legal contracts to summarising old investment memos and meeting notes. Historically, even simple queries, like which of our portfolio companies have an ESOP over 10%, took hours of digging. Deepset can give us that answer in seconds. The ease with which you can switch pipelines (for example, to try new models or test agents) enables us to keep up with the state of the art in an instant and to customise the approach we take for different data sets depending on the use case. It's enabled us to make better use of our long history in venture, and everything we have learned along the way. Our legal team is as excited about working with deepset as I am!

By combining best practices in OSS, their machine learning experience and enterprise-grade security, the team at deepset have built a hugely powerful platform. We look forward to helping them scale the product and the team in the years to come, and to bring secure, composable and scalable AI solutions to the enterprise.