Artificial intelligence is transforming how we build software. Today it is possible to generate a complete CRUD in just a few minutes using tools like ChatGPT, Claude, Gemini and other AI platforms.

In this context, a strategic question emerges: which technology stack consumes the fewest tokens and delivers the greatest productivity when generating code?

This question matters because token consumption directly impacts the cost of using the AI, the speed of generation, the clarity of responses, the ease of maintenance, and the amount of context available for business rules.

What tokens really are

Tokens are the units of text processed by language models. Each model has a finite token budget per request, and everything it reads or produces consumes part of that budget. All of the following are broken into tokens and count against the budget:

  • Whole words or word fragments.
  • Symbols and punctuation.
  • Source code snippets, with variable names, types and structures.
  • Spaces, line breaks and indentation.

The more verbose the code, the higher the token consumption. Frameworks that demand heavy boilerplate or extensive decorators drain the budget before the actual business logic even appears.
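As a rough illustration, token counts can be approximated without calling a real tokenizer. The sketch below assumes the common rule of thumb of about four characters per token; actual tokenizers differ per model, so treat the output as an order of magnitude only. The function name estimateTokens and both snippets are hypothetical.

```typescript
// Rough heuristic only: about 4 characters per token is a rule of thumb,
// not the behavior of any specific model's tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Two hypothetical ways of exposing the same listing route.
const leanSnippet = `app.get("/customers", listCustomers);`;

const verboseSnippet = `
@Controller("customers")
export class CustomersController {
  constructor(private readonly service: CustomersService) {}

  @Get()
  findAll(): Promise<Customer[]> {
    return this.service.findAll();
  }
}`;

console.log("lean:", estimateTokens(leanSnippet));
console.log("verbose:", estimateTokens(verboseSnippet));
```

For real measurements, the tokenizer or usage counters of the model provider should replace this heuristic.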

Why this matters

When building applications with AI assistance, part of the token budget is consumed by the structure of the code itself. Leaner stacks offer direct advantages to the engineering team.

  • Lower operational cost in calls to the model API.
  • Faster responses, because the context sent is smaller.
  • Less need to split prompts into multiple steps.
  • More room to describe business rules and specific requirements.

The fewer tokens infrastructure consumes, the more tokens are left for what actually generates value for the business.
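
The trade-off comes down to simple arithmetic. The sketch below uses a hypothetical context window, hypothetical token counts and a hypothetical price per million tokens; none of the numbers are measurements.

```typescript
// All figures below are illustrative assumptions, not measured values.
interface RequestBudget {
  contextWindow: number;     // total tokens the model accepts per request
  structuralTokens: number;  // tokens spent on boilerplate and scaffolding
  reservedForOutput: number; // tokens kept free for the model's answer
}

// Tokens left over for describing business rules and requirements.
function tokensForBusinessRules(b: RequestBudget): number {
  const remaining = b.contextWindow - b.structuralTokens - b.reservedForOutput;
  return Math.max(0, remaining);
}

// Cost of the input side of one call, given a price per million tokens.
function inputCost(tokens: number, pricePerMillion: number): number {
  return (tokens / 1_000_000) * pricePerMillion;
}

const leanStack: RequestBudget = {
  contextWindow: 100_000,
  structuralTokens: 5_000,
  reservedForOutput: 8_000,
};
const heavyStack: RequestBudget = {
  contextWindow: 100_000,
  structuralTokens: 20_000,
  reservedForOutput: 8_000,
};

console.log(tokensForBusinessRules(leanStack));  // 87000
console.log(tokensForBusinessRules(heavyStack)); // 72000
```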

Benchmark methodology

To make the comparison fair, we defined a standard scenario that any stack needs to deliver: a complete CRUD for the Customer entity, with explicit business rules and standardized REST endpoints.

The Customer entity has the following fields:

  • id
  • name
  • email
  • document
  • phone
  • status
  • created_at
  • updated_at
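
One possible shape for this entity in TypeScript is sketched below. The article only lists field names, so every type here is an assumption (a numeric id, a nullable phone, Date timestamps); the status values follow the business rule stated later in this section.

```typescript
// Field types are assumptions; the article lists only the field names.
type CustomerStatus = "active" | "inactive";

interface Customer {
  id: number;
  name: string;
  email: string;
  document: string;
  phone: string | null; // assumed nullable, since no rule requires it
  status: CustomerStatus;
  created_at: Date;
  updated_at: Date;
}

// Type guard for values arriving from outside, such as a request body.
function isCustomerStatus(value: unknown): value is CustomerStatus {
  return value === "active" || value === "inactive";
}

// Hypothetical sample record.
const sampleCustomer: Customer = {
  id: 1,
  name: "Ana Souza",
  email: "ana@example.com",
  document: "12345678900",
  phone: null,
  status: "active",
  created_at: new Date(),
  updated_at: new Date(),
};
```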

The expected REST endpoints are:

  • POST /customers: creates a new customer.
  • GET /customers: lists customers with pagination.
  • GET /customers/:id: retrieves a specific customer.
  • PUT /customers/:id: updates customer data.
  • DELETE /customers/:id: removes the customer.
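
To make the contract concrete, the five endpoints can be written as a route table with a small matcher. This is a framework-independent sketch, not the API of any stack evaluated here; the action names and the matchRoute helper are hypothetical.

```typescript
type Method = "GET" | "POST" | "PUT" | "DELETE";

interface Route {
  method: Method;
  pattern: string; // for example "/customers/:id"
  action: string;  // hypothetical handler name
}

const routes: Route[] = [
  { method: "POST", pattern: "/customers", action: "create" },
  { method: "GET", pattern: "/customers", action: "list" },
  { method: "GET", pattern: "/customers/:id", action: "show" },
  { method: "PUT", pattern: "/customers/:id", action: "update" },
  { method: "DELETE", pattern: "/customers/:id", action: "remove" },
];

interface Match {
  action: string;
  params: Record<string, string>;
}

// Compare path segments one by one; ":name" segments capture a parameter.
function matchRoute(method: Method, path: string): Match | null {
  const segments = path.split("/").filter(Boolean);
  for (const route of routes) {
    if (route.method !== method) continue;
    const expected = route.pattern.split("/").filter(Boolean);
    if (expected.length !== segments.length) continue;
    const params: Record<string, string> = {};
    const ok = expected.every((part, i) => {
      const actual = segments[i] ?? "";
      if (part.startsWith(":")) {
        params[part.slice(1)] = actual;
        return true;
      }
      return part === actual;
    });
    if (ok) return { action: route.action, params };
  }
  return null;
}

console.log(matchRoute("GET", "/customers/42"));
```

Every stack in the comparison has to express this same table in its own idiom, which is where the differences in verbosity appear.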

Business rules that each implementation must enforce:

  • The name field is required.
  • The email must be unique among customers.
  • The document must be unique among customers.
  • The status only accepts the values active or inactive.
  • The listing returns paginated results.
  • Errors are standardized in JSON responses.
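
The rules above can be sketched as plain functions over an in-memory list. This is a minimal illustration under stated assumptions: the error envelope (code, message, field) and the page shape are hypothetical, since the article only requires standardized JSON errors and paginated results, and a real implementation would enforce uniqueness in the database.

```typescript
interface CustomerInput {
  name?: string;
  email: string;
  document: string;
  status: string;
}

// Assumed error envelope; the article does not prescribe its fields.
interface ApiError {
  error: { code: string; message: string; field: string };
}

function fail(code: string, message: string, field: string): ApiError {
  return { error: { code, message, field } };
}

// Returns the first violated rule, or null when the input is valid.
function validateCustomer(input: CustomerInput, existing: CustomerInput[]): ApiError | null {
  if (!input.name || input.name.trim() === "") {
    return fail("required", "name is required", "name");
  }
  if (existing.some((c) => c.email === input.email)) {
    return fail("conflict", "email must be unique", "email");
  }
  if (existing.some((c) => c.document === input.document)) {
    return fail("conflict", "document must be unique", "document");
  }
  if (input.status !== "active" && input.status !== "inactive") {
    return fail("invalid", "status must be active or inactive", "status");
  }
  return null;
}

interface Page<T> {
  data: T[];
  page: number;
  perPage: number;
  total: number;
}

// Pages are 1-based; an out-of-range page yields an empty data array.
function paginate<T>(items: T[], page: number, perPage: number): Page<T> {
  const start = (page - 1) * perPage;
  return { data: items.slice(start, start + perPage), page, perPage, total: items.length };
}
```

On update, the uniqueness checks would need to exclude the record being edited; that detail is left out to keep the sketch short.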

Evaluated stacks

  • PHP with Laravel.
  • Bun with Elysia and Drizzle.
  • Bun with Hono and Drizzle.
  • Go with Fiber and Ent.
  • Go with Gin and GORM.
  • TypeScript with NestJS and Prisma.

For each stack we track the same evaluation criteria:

  • Total tokens consumed during generation.
  • Lines of code produced.
  • Number of files generated.
  • Implementation time until the first local deploy.
  • Number of adjustments required after the first run.
  • Long-term maintenance complexity.
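
These criteria map naturally to one record per stack. The sketch below shows one possible structure for collecting them; the field names are hypothetical and the two sample rows are placeholders with invented numbers, not benchmark results.

```typescript
interface BenchmarkResult {
  stack: string;
  totalTokens: number;
  linesOfCode: number;
  files: number;
  minutesToFirstDeploy: number;
  adjustmentsAfterFirstRun: number;
  maintenanceNotes: string; // qualitative assessment
}

// Returns a new array ordered from fewest to most tokens.
function rankByTokens(results: BenchmarkResult[]): BenchmarkResult[] {
  return [...results].sort((a, b) => a.totalTokens - b.totalTokens);
}

// Placeholder rows only; the numbers are invented for illustration.
const placeholderResults: BenchmarkResult[] = [
  {
    stack: "stack-a",
    totalTokens: 1200,
    linesOfCode: 300,
    files: 9,
    minutesToFirstDeploy: 20,
    adjustmentsAfterFirstRun: 3,
    maintenanceNotes: "placeholder",
  },
  {
    stack: "stack-b",
    totalTokens: 800,
    linesOfCode: 180,
    files: 4,
    minutesToFirstDeploy: 12,
    adjustmentsAfterFirstRun: 1,
    maintenanceNotes: "placeholder",
  },
];

console.log(rankByTokens(placeholderResults).map((r) => r.stack));
```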

Expected result

Before running the benchmark in practice, it is worth recording the ranking hypothesis we hold today, based on real-world experience using AI for code generation:

  • 1. Bun with Elysia and Drizzle: excellent balance between conciseness and typing.
  • 2. PHP with Laravel: extremely high productivity and maturity.
  • 3. Bun with Hono and Drizzle: lean and simple structure.
  • 4. Go with Fiber and Ent: strong with schema-based code generation.
  • 5. Go with Gin and GORM: good productivity, but more verbose.
  • 6. TypeScript with NestJS and Prisma: robust architecture, but more boilerplate per feature.

Laravel: the classic benchmark

Laravel remains a reference for productivity in the PHP world. A single route declaration (for example, Route::apiResource) registers every REST route for a resource, and the ecosystem delivers practically everything a CRUD needs: validation, migrations, ORM, authentication and tests.

  • Mature conventions that reduce repetitive decisions.
  • Powerful CLI to generate controllers, models and migrations.
  • Eloquent as a robust and expressive ORM.
  • Large ecosystem with well-established packages.

Bun with Elysia and Drizzle: the new contender

Bun combines a modern runtime with native TypeScript support. Elysia offers an extremely concise API to define routes and validations, and Drizzle keeps database access typed, lightweight and close to SQL.

  • Lean code, with few lines to expose a complete route.
  • Excellent end-to-end type inference.
  • High performance on the Bun runtime.
  • Low boilerplate, with no reliance on class decorators.

Go with Ent: productivity through code generation

Go is traditionally more explicit than dynamic languages, but Ent significantly reduces boilerplate. Once the entity schema is defined, most CRUD operations are generated automatically, with strong types and safe queries.

This brings Go closer to the productivity of more opinionated frameworks, without giving up simple binaries and high performance in production.
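
The general idea of deriving CRUD operations from a single schema definition can be illustrated in a few lines. The sketch below is written in TypeScript and is not Ent's API: Ent generates Go code at build time, while this hypothetical generateCrud helper builds an in-memory store at runtime. It only shows why a schema-first approach needs so little hand-written code per entity.

```typescript
// Not Ent's API: a runtime illustration of schema-driven CRUD.
type FieldType = "string" | "number";
type Schema = Record<string, FieldType>;
type Row = Record<string, string | number>;

interface Crud {
  create(data: Row): Row;
  get(id: number): Row | undefined;
  update(id: number, data: Row): Row | undefined;
  remove(id: number): boolean;
}

function generateCrud(schema: Schema): Crud {
  const rows = new Map<number, Row>();
  let nextId = 1;

  // Reject unknown fields and wrong types, based only on the schema.
  function check(data: Row): void {
    for (const [field, value] of Object.entries(data)) {
      const expected = schema[field];
      if (expected === undefined) throw new Error(`unknown field: ${field}`);
      if (typeof value !== expected) throw new Error(`${field} must be a ${expected}`);
    }
  }

  return {
    create(data) {
      check(data);
      const id = nextId++;
      const row: Row = { ...data, id };
      rows.set(id, row);
      return row;
    },
    get: (id) => rows.get(id),
    update(id, data) {
      const current = rows.get(id);
      if (!current) return undefined;
      check(data);
      const updated: Row = { ...current, ...data, id };
      rows.set(id, updated);
      return updated;
    },
    remove: (id) => rows.delete(id),
  };
}

// The only entity-specific code is the schema itself.
const customerCrud = generateCrud({ name: "string", email: "string" });
```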

NestJS with Prisma: enterprise architecture

NestJS is excellent for complex systems and large teams. Decorators, modules and dependency injection create a predictable architecture, even at the cost of more tokens per generated file.

  • Consistent architecture across modules and features.
  • Clear separation of responsibilities between controllers, services and repositories.
  • Strong adherence to enterprise patterns and DDD.
  • Mature ecosystem of integrations and testing tools.

The role of the framework

The language is only part of the equation. The biggest impact on token consumption comes from decisions made by the chosen framework.

  • Conventions that avoid rewriting the same structure for every feature.
  • Automatic code generation from schemas or definitions.
  • Level of abstraction chosen to encapsulate common rules.
  • Verbosity required by the framework's reference architecture.

In many cases the framework influences the token budget more than the language itself. In our experience, swapping frameworks within the same language can reduce consumption substantially, although we have not yet measured by how much.

What Anthropic says about this

This reasoning is not just the opinion of someone using AI day to day. In a recent article about Claude Code in large codebases, Anthropic itself reinforces that the environment around the model matters more than the model in isolation.

The ecosystem built around the model, the harness, determines how Claude Code performs more than the model alone.

Translated to our stack debate: the framework, its conventions, the chosen ORM and the default architecture act as the model's harness. That layer decides whether the AI spends tokens understanding your structure or advancing the business logic.

Claude's ability to help in a large codebase is bounded by its ability to find the right context.

Leaner stacks like Bun with Elysia and Drizzle deliver this legibility naturally: few files, inferred types and low indirection. More opinionated frameworks compensate with strong conventions, but the cost is paid in tokens per file.

Too much context loaded into every session degrades performance, while too little context leaves Claude to navigate blind.

The point is straightforward: there is a balance. Stacks that minimize structural noise leave more room for what matters, without falling into the opposite extreme of requiring additional context with every prompt.

Which stack to choose

The choice depends on which priority sits at the top of the project. Some combinations work better in specific scenarios.

  • Priority on lower token consumption: Bun with Elysia and Drizzle, followed by Laravel.
  • Priority on productivity and maturity: Laravel.
  • Priority on modern type safety: Bun with Elysia and Drizzle.
  • Priority on performance and simple binaries: Go with Fiber and Ent.
  • Priority on enterprise architecture: NestJS with Prisma.

My recommendation

For most business projects the recommendation is to adopt Bun with Elysia and Drizzle as a modern stack that is extremely efficient in token usage. The combination delivers strong typing, low boilerplate and enough performance for the vast majority of scenarios.

If the goal is maximum productivity with a consolidated ecosystem and libraries for practically any integration, Laravel remains an exceptional choice.

Conclusion

When using AI for development, token consumption becomes a strategic factor, not just a technical detail. Leaner stacks reduce cost, accelerate generation, simplify iterations and free up context for business rules.

The best stack is the one that maximizes the team's productivity and lets AI focus its effort on the logic that differentiates the product.

Bun with Elysia and Drizzle stands out by combining modernity, strong typing, performance and lean code. Laravel remains one of the most productive and mature platforms on the market for teams that value convention over configuration.

Limitations of this analysis

It is fair to acknowledge where this article does not yet close the loop. The reading presented here rests on practical experience using AI for code generation, not on controlled lab measurements.

  • We do not yet have absolute token counts per stack for the same scope.
  • The ranking presented is a working hypothesis, not a sealed result.
  • Differences between models (Claude, GPT, Gemini) may shift the order in specific cases.
  • Total cost of ownership also depends on talent availability, ecosystem support and operational costs outside code generation.

Anthropic's own material about Claude Code in large codebases acknowledges a similar point: the recommendations are qualitative, without absolute numbers for tokens consumed or cost per task. That is precisely the gap the next step of this benchmark intends to fill.

Next steps

In a future article I plan to run this benchmark in practice and present real metrics for each stack.

  • Tokens consumed per file and per iteration.
  • Lines of code generated for the same functional scope.
  • Generation time until the first local deploy.
  • Performance under simple load tests.
  • Ease of maintenance and evolution of the CRUD.

If you are also using AI to accelerate development, this comparison can help you choose a more efficient stack aligned with your product's technology strategy.