As a new architecture for AI applications comes into focus, we’re seeing more and more venture firms share their takes and predictions. Over the past few weeks alone, Sequoia, Greylock, and a16z have each published helpful frameworks and findings informed by the conversations happening among AI startup founders and engineers.
At TheGP, we’ve been forming our perspective by experimenting with new AI applications ourselves. We’ve worked in the trenches to build AI products and features for portfolio companies, work that has deepened our understanding of what’s working, where problems persist, and how new startups will fill in the gaps. Here’s some of what we’ve learned over the past few months from those projects:
We've chosen to build—not buy—our LLM observability tools
Anyone building applications powered by language models needs a way to monitor the quality of the model outputs. In helping ship Dynaboard AI (a project led by Steven Li on our team), we looked across the market for a solution that offered a rigorous, quantitative approach to evaluating outputs but couldn’t find exactly what we were looking for. In the end, we built our own. Doing so eliminated an external dependency and gave us a simpler, more effective system for our needs.
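To make the idea concrete, here is a minimal sketch of the kind of quantitative evaluation harness we mean: a fixed set of prompts run through the model and scored with deterministic pass/fail checks, so the aggregate score can be tracked across prompt and model changes. The `call_model` stub and the specific checks are illustrative assumptions, not Dynaboard's actual system.

```python
# A minimal sketch of a quantitative output-evaluation harness.
# The model call is stubbed; the checks and scoring are illustrative.
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    checks: list[Callable[[str], bool]]  # deterministic pass/fail checks

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for your LLM provider call.
    return '{"widgets": ["table", "chart"]}'

def is_valid_json(output: str) -> bool:
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

CASES = [
    EvalCase(
        prompt="Generate a dashboard layout as JSON.",
        checks=[is_valid_json, lambda out: "widgets" in out],
    ),
]

def run_evals(cases: list[EvalCase]) -> float:
    passed, total = 0, 0
    for case in cases:
        output = call_model(case.prompt)
        for check in case.checks:
            total += 1
            passed += check(output)
    return passed / total  # track this score across prompt/model changes

if __name__ == "__main__":
    print(f"pass rate: {run_evals(CASES):.0%}")
```

The value is less in any single check than in the trend line: rerunning the same suite after every prompt tweak or model upgrade turns "the outputs feel better" into a number.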
We're betting on the models getting better, faster, and cheaper
When we started our work with Dynaboard, the models were too slow (and too expensive) to power its UI-generation features. Rather than de-prioritize those features, we bet that costs and latency would keep falling, and set the product roadmap accordingly. By the time Dynaboard AI was ready to ship, model improvements had solved our latency issues, allowing us to deliver a usable feature.
Open-source models haven’t been the best option for our production use cases (yet)
Across our engagements with portfolio companies, we haven’t yet encountered a scenario where it made more sense to use an open-source LLM like LLaMA over a closed-source LLM like GPT-3.5/4. Not only is open-source performance still catching up, but getting and keeping those models running is often more expensive than simply using the private alternative. Where open-source models have made sense so far is in narrower use cases, served by smaller models that don’t require expensive GPUs to run.
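For illustration, here's a minimal sketch of that kind of narrow use case, assuming the Hugging Face `transformers` library and a compact sentiment model that runs comfortably on CPU; the specific model is an illustrative assumption, not a recommendation from our projects.

```python
# A sketch of a narrow use case where a small open-source model suffices:
# single-label classification served locally on CPU, no GPU required.
from transformers import pipeline  # pip install transformers

# A compact sentiment model; downloads once, then runs locally.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The new dashboard builder is fantastic."))
# [{'label': 'POSITIVE', 'score': 0.99...}]
```

For a task this constrained, a model measured in hundreds of megabytes can match a hosted frontier model on accuracy while beating it on cost, latency, and data locality.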
It's wise to maintain a code-first, AI-second product development approach
Used at the right moment, an LLM can unlock magical user experiences and new functionality. Used everywhere, it produces a cascade of errors that renders your product unusable. Reserve the LLM for only those things that can't be solved with code.
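As a sketch of what code-first, AI-second can look like in practice, consider a parser that handles the common cases with cheap, deterministic code and falls back to a model only on the long tail; `call_llm` here is a hypothetical stand-in for a provider call, not a specific API.

```python
# "Code-first, AI-second": deterministic code handles the predictable
# cases; the LLM is reserved for inputs code cannot parse.
from datetime import datetime

def parse_date_deterministic(text: str) -> datetime | None:
    # Cheap, predictable path: try known formats first.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"):
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM provider call.
    raise NotImplementedError

def parse_date(text: str) -> datetime:
    parsed = parse_date_deterministic(text)
    if parsed is not None:
        return parsed  # no LLM cost, no nondeterminism
    # Fall back to the model only for the long tail,
    # e.g. "a week from next Tuesday".
    return datetime.fromisoformat(call_llm(f"Convert to an ISO date: {text}"))
```

The ordering matters: every input the deterministic path absorbs is one the model can't get wrong, and the LLM's error surface shrinks to the inputs that genuinely need it.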
Going forward
It’s important to note that this is only a snapshot of what it’s like to build with LLMs today. We expect the tradeoffs and best practices to evolve rapidly. As Taras Glek on our team noted earlier this month, open-source models are growing increasingly competent and may soon be the default option for those optimizing for latency, cost, and domain-specific accuracy. As ever, we plan to learn more as we build, which will allow us to constantly question and revise our thinking—a necessity in the new AI normal.
Builders: do your experiments line up with these themes? We’d love to hear your strong opinions and findings. Get in touch to chat with our investment or engineering team.
Many thanks to the TheGP engineers who contributed their valuable insights to this report.