As a new architecture for AI applications comes into focus, we’re seeing more and more venture firms share their takes and predictions. Over the past few weeks alone, Sequoia, Greylock, and a16z have each published helpful frameworks and findings informed by the conversations happening among AI startup founders and engineers.
At TheGP, we’ve been forming our perspective by experimenting with new AI applications ourselves. We’ve worked in the trenches to build AI products and features for portfolio companies, work that has deepened our understanding of what’s working, where problems persist, and how new startups will fill in the gaps. Here’s some of what we’ve learned over the past few months from those projects:
We've chosen to build—not buy—our LLM observability tools
Anyone building applications powered by language models needs a way to monitor the quality of the model outputs. In helping ship Dynaboard AI (a project led by Steven Li on our team), we looked across the market for a solution that offered a rigorous, quantitative approach to evaluating outputs but couldn’t find exactly what we were looking for. In the end, we built our own. Doing so eliminated an external dependency and gave us a simpler, more effective system for our needs.
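To make the idea concrete, here is a minimal sketch of the kind of quantitative evaluation harness we mean: a fixed set of prompts run through the model and scored with deterministic pass/fail checks, so the aggregate score can be tracked across prompt and model changes. The `call_model` stub and the specific checks are illustrative assumptions, not Dynaboard's actual system.

```python
# A minimal sketch of a quantitative output-evaluation harness.
# The model call is stubbed; the checks and scoring are illustrative.
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    checks: list[Callable[[str], bool]]  # deterministic pass/fail checks

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for your LLM provider call.
    return '{"widgets": ["table", "chart"]}'

def is_valid_json(output: str) -> bool:
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

CASES = [
    EvalCase(
        prompt="Generate a dashboard layout as JSON.",
        checks=[is_valid_json, lambda out: "widgets" in out],
    ),
]

def run_evals(cases: list[EvalCase]) -> float:
    passed, total = 0, 0
    for case in cases:
        output = call_model(case.prompt)
        for check in case.checks:
            total += 1
            passed += check(output)
    return passed / total  # track this score across prompt/model changes

if __name__ == "__main__":
    print(f"pass rate: {run_evals(CASES):.0%}")
```

The value is less in any single check than in the trend line: rerunning the same suite after every prompt tweak or model upgrade turns "the outputs feel better" into a number.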
We're betting on the models getting better, faster, and cheaper
When we started our work with Dynaboard, the models were too slow (and too expensive) to power its UI-generation features. Rather than de-prioritize those features, we bet that costs and latency would keep falling, and set the product roadmap accordingly. By the time Dynaboard AI was ready to ship, model improvements had solved our latency issues, allowing us to deliver a usable feature.
Open-source models haven’t been the best option for our production use cases (yet)
Across our engagements with portfolio companies, we haven’t yet encountered a scenario where it made more sense to use an open-source LLM like LLaMA over a closed-source LLM like GPT-3.5/4. Not only is open-source performance still catching up, but getting and keeping those models running is often more expensive than simply using the private alternative. Where open-source models have made sense so far is in narrower use cases, served by smaller models that don’t require expensive GPUs to run.
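For illustration, here's a minimal sketch of that kind of narrow use case, assuming the Hugging Face `transformers` library and a compact sentiment model that runs comfortably on CPU; the specific model is an illustrative assumption, not a recommendation from our projects.

```python
# A sketch of a narrow use case where a small open-source model suffices:
# single-label classification served locally on CPU, no GPU required.
from transformers import pipeline  # pip install transformers

# A compact sentiment model; downloads once, then runs locally.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The new dashboard builder is fantastic."))
# [{'label': 'POSITIVE', 'score': 0.99...}]
```

For a task this constrained, a model measured in hundreds of megabytes can match a hosted frontier model on accuracy while beating it on cost, latency, and data locality.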
It's wise to maintain a code-first, AI-second product development approach
Used at the right moment, an LLM can unlock magical user experiences and new functionality. Used everywhere, it produces a cascade of errors that renders your product unusable. Reserve the LLM for only those things that can't be solved with code.
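As a sketch of what code-first, AI-second can look like in practice, consider a parser that handles the common cases with cheap, deterministic code and falls back to a model only on the long tail; `call_llm` here is a hypothetical stand-in for a provider call, not a specific API.

```python
# "Code-first, AI-second": deterministic code handles the predictable
# cases; the LLM is reserved for inputs code cannot parse.
from datetime import datetime

def parse_date_deterministic(text: str) -> datetime | None:
    # Cheap, predictable path: try known formats first.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"):
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM provider call.
    raise NotImplementedError

def parse_date(text: str) -> datetime:
    parsed = parse_date_deterministic(text)
    if parsed is not None:
        return parsed  # no LLM cost, no nondeterminism
    # Fall back to the model only for the long tail,
    # e.g. "a week from next Tuesday".
    return datetime.fromisoformat(call_llm(f"Convert to an ISO date: {text}"))
```

The ordering matters: every input the deterministic path absorbs is one the model can't get wrong, and the LLM's error surface shrinks to the inputs that genuinely need it.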
Going forward
It’s important to note that this is only a snapshot of what it’s like to build with LLMs today. We expect the tradeoffs and best practices to evolve rapidly. As Taras Glek on our team noted earlier this month, open-source models are growing increasingly competent and may soon be the default option for those optimizing for latency, cost, and domain-specific accuracy. As ever, we plan to learn more as we build, which will allow us to constantly question and revise our thinking—a necessity in the new AI normal.
Builders: do your experiments line up with these themes? We’d love to hear your strong opinions and findings. Get in touch to chat with our investment or engineering team.
Many thanks to the TheGP engineers who contributed their valuable insights to this report.