May 20, 2026

Rethinking Build-Measure-Learn in 2026 When Everyone Can Ship

Product Manager

In 2011, Eric Ries wrote The Lean Startup for a world where building was slow and expensive. He introduced the build-measure-learn framework, the central engine of the lean startup methodology, as a direct response to a specific failure pattern.

It’s where companies spend months developing products in isolation, shipping them fully formed, and discovering too late that nobody wanted them. The loop was designed to compress that discovery cycle into fast, cheap iterations.

That world is gone. AI coding tools have collapsed the build phase from months to days, and as a PM shipping features at Userpilot, I see this every week. The fastest teams in 2026 are not winning because they ship faster than everyone else. Everyone ships fast now. They are winning because they measure better and learn faster.

In this article, I’ll walk through what that means in practice and how working product teams are leveraging AI to its full potential.

Does build-measure-learn still make sense in 2026?

Yes, and most of the criticism aimed at build-measure-learn in 2026 is really criticism of how teams interpret it, not at the loop itself.

Eric Ries has a line that has been quoted to death:

“A head start is rarely large enough to matter, and time spent in stealth mode — away from customers — is unlikely to provide a head start. The only way to win is to learn faster than anyone else.”

So I think these three pieces of the lean startup method are still true:

The first is validated learning: The most valuable asset of any early-stage product team is not money, code, or even users. It is validated information about user behavior. Code is cheap in 2026. Opinions are cheaper. Validated learning is the only thing in the loop that still counts.
The second is the rule to observe before you ask: One of the most underrated lines in The Lean Startup is “We must learn what customers really want, not what they say they want or what we think they should want.” That rule is the difference between a feature that ships and a feature that gets used.
The third is the embarrass-to-ship rule: If you are embarrassed by the first version of your feature, it is ready to ship. The point is to put it in front of users in the early stages and replace your guesses with data.

But in 2026, build isn’t the bottleneck anymore, and I can ship a feature in an afternoon. That means the most difficult parts have changed to measure and learn.

How has the build-measure-learn loop changed?

The single biggest change is that build no longer takes so much time and effort as before.

GitHub Copilot generates roughly 46% of the code it touches, internal AI coding tools at Userpilot have collapsed feature scaffolding from days to hours, and the development process across most of the industry is no longer rate-limited by typing speed. Iterative development that used to mean weeks per cycle now means hours per cycle.

What changed alongside the build collapse is who can run the loop at all.

I have three numbers that we should sit with.

Solo-founded startups went from 23.7% of all new companies in 2019 to 36.3% by mid-2025, according to Carta’s 2025 Solo Founders Report. The LinkedIn-expert-turned-founder is not an outlier story anymore. It is the default for new business model experimentation in 2026.
Lovable, the AI app builder, hit $200 million ARR in roughly a year with around 146 employees, doubling from $100M to $200M ARR in the four months from July to November 2025. CEO Anton Osika has credited a mix of staying in Europe, defining the vibe-coding category early, and an open-source audience that knew the team before the commercial product launched. That is what “everyone can distribute” looks like in practice. Audience first, product second.
40% of enterprise applications will feature task-specific AI agents by the end of 2026, up from less than 5% in 2025, per Gartner. The pattern most enterprise teams use to ship those agents is the early-access cohort, where we deploy one agent against one well-defined workflow, measure for four to six weeks against the business goals, and expand if the signal is good.

For example, Anthropic’s Project Glasswing runs Claude Mythos Preview through 12 founding partners and around 40 additional critical-infrastructure organizations on the cybersecurity side. Google launched its Gemini Enterprise Agent Platform in April 2026 to give large customers the same kind of governed rollout. That is the lean startup loop, run by Fortune 500 companies on agent features.

The combined story is that the build-measure-learn approach is no longer a thing only startups do. The solopreneur, the scale-up, and the enterprise are all running variations of the same iterative process, but who is running it most properly?

The measure part of the loop has also changed. The metric set most product teams use was designed for a world where every user was a human at a keyboard, clicking, scrolling, and triggering events.

In an account where an AI agent runs half the workflows through MCP, that assumption is wrong. We need new event types, customer feedback channels, and other ways to collect data from these non-traditional users, too.

That is why a big part of our AI strategy with our AI agent Lia is accelerating the measure part of the loop so product teams can spend more time on the learning part. Lia, our agent for Analytics, focuses on agent observability, frustration detection, topic trends, drop-off analysis, and measuring the retention or revenue impact of AI agents.

The new competitive advantage is not who can build the fastest, but who can learn the fastest from increasingly non-human systems.

Lia ties engagement signals to analytics in one view. The PM does not have to assemble the picture manually, which is how the measure phase gets compressed.

💡 Read related blog posts: Product analytics in 2026: How the measure phase is changing for product teams

What is the build-measure-learn playbook in the AI era?

If the build-measure-learn cycle is the table-stakes loop every product team runs, the differentiator is who can run it well.

The best working example of the 2026 playbook in motion is Lovable. It has the timeline, the doubles-down, and the sunset all in public.

Let me recap the Lovable arc:

Pre-launch: “GPT Engineer” open-source project gains 50,000+ GitHub stars. Audience precedes product.
Nov 2024: Commercial product launches as Lovable. Hits #1 on Product Hunt.
Jan 2025: Figma Import feature ships.
July 2025: Agent Mode launches. 91% fewer errors reported.
Aug 2025: Visual Editing and Design Viewship.
Nov 2025: Figma Import is removed. Lovable doubles down on “prompt-first or visual-edit-first.”
Dec 2025: $330M Series B at $6.6B valuation, $100M ARR, 2.3M+ users.

Now let’s talk about the playbook Lovable has run.

Rule 1: Build the smallest thing that earns the right to keep going

I think one thing people misunderstand about AI is that they assume the minimum viable product concept is outdated because building has gotten cheaper. I actually think the opposite happened. MVPs matter more because teams can ship so much faster that they can waste time faster, too.

The MVP patterns Ries and Blank talked about have been compressed.

The minimal viable products Ries and Blank wrote about in 2011 still work, and most of them got cheaper in 2026.

The landing page test: where you can spin up a one-page site describing the value proposition, drive traffic, and measure signup intent before you build. That used to take a week; now it takes an afternoon.
The concierge MVP: still works because you can manually deliver the workflow before investing in infrastructure.
The Wizard of Oz MVP: a working front-end with humans running the back-end. In 2026, this has gotten interesting again because the “person behind the curtain” can partially be an AI agent instead of an ops team.

That means the cost structure has reduced. Lovable is a good example of that. People talk about the growth story, but the important part is the sequencing. The open-source GPT Engineer project already proved there was demand. The 50,000-plus GPT Engineer GitHub stars were the validated learning. The commercial product was just the next experiment to see whether people would pay for it.

That’s why distribution-first MVPs are startups’ favorite thing now.

If you already have potential customers in a community, an audience, or design partners, your product can launch in a much rougher state because the distribution layer absorbs some of the friction.

The only mistake I’d mention is teams treating AI-generated output as if speed alone is the win when it’s not. If you do not define the outcome on your business model canvas before building and investing further, you are setting yourself up for failure.

Rule 2: Measure outcomes, not events, on both human and agent users

I think a lot of product metrics are not working anymore in 2026.

For example, DAU/MAU becomes less useful once agents enter the workflow because an agent does not need to “log in” every day to prove the product is valuable. Session length also becomes misleading. If an agent completes the task in 1.4 seconds instead of a human taking three minutes, that is a better product experience, not lower engagement.

When I asked James Mitchinson, Userpilot’s Head of Customer Success, what the most common pattern was on accounts that churned despite “good” usage numbers, his answer was a phrase that has stuck with our team:

“High logins, zero outcomes. Customers logging in, clicking around, generating clean event data, and getting nothing real done. The usage curve looked healthy. The renewal conversation did not.”

That means product teams need to measure outcomes that map to customer needs, instead of usage or activities. In Lovable’s case, I’d say their north star is “did the user ship an app,” instead of “did the user click around in the editor.”

Every AI product eventually moves in this direction because agents become a thing. You cannot rely on human interaction metrics anymore when part of the workflow is happening machine-to-machine.

So you need a second data collection layer:

Did the agent complete the task?
How many retries happened?
Where did the workflow fail?
When did the task escalate back to a human?
Did the customer reach the promised outcome that ties back to their business goals?

Rule 3: Learn passively first, then gather feedback from users

There are more products, copilots, and agents competing for the same jobs-to-be-done than ever before. So users behave differently. If something feels confusing, unreliable, or low-quality, most people do not give early feedback or spend time explaining the problem anymore, and they just switch tools.

Unless they are in an enterprise workflow where migration is painful, users have almost no reason to help you improve the product.

That’s not to mention survey response rates have been falling for a decade. Email NPS, the traditional user feedback channel, sits in the mid-teens. Refiner’s 2025 in-app survey benchmark put response rates at 27.5% on web and 36.1% on mobile, which is better than email’s 15% but still describes a world where two out of three users in the sample do not answer.

That is why the learning loop now starts with observation, not feedback. Something like this:

Funnels show where people drop.
Session replay shows what happened.
And surveys help validate the pattern afterward.

We had a case internally where an email workflow was underperforming. Funnel data narrowed it to one step. The replay showed users hesitating around a missing follow-up action. We added an in-app checklist that same afternoon, and the drop-off rate decreased.

💡 Read related blog posts: In-app surveys that actually get answered: A 2026 guide

Rule 4: Double down on what is working

The feedback loop only pays off if you act on what you learn. While the 2011 version of “act” was a backlog ticket and a Friday retrospective, that in 2026 should be a follow-up shipped within weeks of the signal.

Lovable’s product shipping velocity is an example. Agent Mode in July 2025. Visual Editing and Design View in August 2025. Two doubles-down within 60 days of each other, and I can say the team did not wait for an annual planning cycle.

And the more interesting signal is probably what they stopped investing in.

Lovable shipped the Figma Import in January 2025. Ten months later, in November 2025, they removed it. The stated reason was quality issues, but the replacement they invested in wasn’t a better importer. It was a deeper Visual Editor, which made the Figma pipeline unnecessary.

The underlying rule is one Ries borrowed from the engines-of-growth idea: pick the engine that is driving customer acquisition and pour into it.

If new sign-ups come from word of mouth, ship referral features. If they come from a specific use case, optimize and make the product better for that use case. The mistake teams make is treating “double down” as a slogan rather than as a quarterly calendar entry. Doubling down is a specific feature, on a specific date, against a specific signal.

This is also where AI agents become important internally. One of the directions we are exploring with Userpilot’s Lia is reducing the delay between detecting a signal and acting on it.

Because most teams don’t lack data anymore, they lack clarity and speed. That is why Lia is interesting to me conceptually. It surfaces dashboards, monitors product signals continuously, identifies friction or growth patterns, and helps teams decide where to focus next.

And I think that is ultimately where the competitive advantage is.

Rule 5: Sunset what is not working before it costs you

The pivot-or-persevere decision used to be binary. In 2026, it will have a third branch: sunset the feature to prevent the product from drifting away from its target market.

The framework I use for sunset decisions has three questions:

What is the cost-to-maintain over the next year, including support, docs, and tech debt?
What is the retention value, measured by churn delta, if the feature disappeared tomorrow?
What is the strategic clarity gain, measured by how much easier the product is to explain without it?

If cost-to-maintain is greater than retention value, and strategic clarity gain is meaningful, the feature should be sunset. Sunsetting lets the product earn back focus without abandoning the customers you already won.

Most teams do not sunset enough. The further development costs of carrying every feature you have ever shipped will eventually compound.

Will the build-measure-learn framework sustain in the future?

The honest answer is yes, and probably for longer than the 15 years it has already had.

Build-measure-learn is the shape that any iterative process takes when a team is trying to reduce uncertainty by running small experiments. The shape stays. The tools change. The vocabulary keeps adding words like MCP, agent satisfaction, and outcome events, but I think the underlying lean methodology is durable.

The build-measure phase keeps getting cheaper. The learn phase keeps demanding more judgment. That asymmetry is the structural reason the loop will keep mattering.

The first is that measurement moves from descriptive to predictive, surfacing customer satisfaction drops before they show up in the renewal conversation. That is the direction we are pushing Lia at Userpilot, where AI agents play a big part in giving product teams data upfront.

Lia’s predictive view: a usage trendline catches a drop weeks before it shows up in the monthly active users chart.

The second force is that distribution keeps democratizing. Every expert with an audience becomes a future founder. Every enterprise team becomes an internal startup shipping sub-feature agents to early-access cohorts. Continuous innovation becomes the default state, and the shape of the loop becomes universal, which means running it well, not running it at all, becomes the moat.

Measure and learn faster with Userpilot!

If the playbook above sounds like a lot of measurement work, that is because it is. The measure part of the build-measure-learn loop is where most product teams lose hours that should have gone into deciding the next experiment. Userpilot is built to give those hours back:

Userpilot Product Analytics and AI Agent Analytics give you outcome events across both human signals and agent signals in one customer segment view. The same dashboard answers “did the human in this account complete the workflow” and “did the agent in this account finish the task.”
Session Replay, Funnels, and In-App Surveys collapse the diagnostic triage into one. You can move from “where did it break” to “why” to “what the user thought” without switching tools.
Lia watches the combined signal continuously and surfaces patterns the team has not yet noticed. The PM owns the decision. Lia owns the legwork.

So book a demo, and we will walk you through it on your own data!

💡 Read related blog posts: Product-led growth in 2026: The era framework that maps to this loop

FAQ

What are the 7 stages of a startup?

The seven stages most product teams recognize are: idea, validation, MVP, iteration (the active build-measure-learn loop), product-market fit, growth, and maturity. The first three are where the lean startup playbook matters most. Validation is the cheapest stage to run experiments, because reputation and integration costs have not yet started compounding. Most teams that fail in the early stages do so because they skipped validation and went straight to MVP.

What are the 5 principles of lean startup?

Eric Ries names five principles in The Lean Startup that define the lean startup process. First, entrepreneurs are everywhere, meaning the framework applies inside enterprises and solo projects, not just venture-backed startups. Second, entrepreneurship is management, requiring a different kind of leadership for high-uncertainty work. Third, validated learning is the goal of every experiment. Fourth, build-measure-learn is the engine of that learning. Fifth, innovation accounting tracks progress against learning, not against revenue alone.

What is the BML process?

BML stands for build-measure-learn, the product development process Eric Ries codified in 2011. It has four moves. Form a hypothesis about what users want or how they will behave. Build the smallest test of that hypothesis you can, often a landing page, a concierge service, or a feature stub. Measure user behavior against the hypothesis with concrete metrics. Then learn: persevere with the current direction, pivot to a different one, or sunset the experiment entirely. Repeat the cycle, narrowing the hypothesis each time.

What is Eric Ries known for?

Eric Ries is the entrepreneur and writer best known for The Lean Startup, published in 2011, which introduced the build-measure-learn framework, validated learning, and innovation accounting into mainstream product vocabulary. He has also written The Startup Way and founded Lean Startup Co.

About the author

Abrar Abutouq