Artificial intelligence is transforming how we build software. Today it is possible to generate a complete CRUD in just a few minutes using tools like ChatGPT, Claude, Gemini and other AI platforms.
In this context, a strategic question emerges: which technology stack consumes fewer tokens and delivers greater productivity when generating code?
This question matters because token consumption directly impacts the cost of using the AI, the speed of generation, the clarity of responses, the ease of maintenance, and the amount of context available for business rules.
What tokens really are
Tokens are the units of text processed by language models. Each model has a finite token budget per request, and everything it reads or produces consumes part of that budget. A token can correspond to:
- Whole words or word fragments.
- Symbols and punctuation.
- Source code snippets, with variable names, types and structures.
- Spaces, line breaks and indentation.
The more verbose the code, the higher the token consumption. Frameworks that demand heavy boilerplate or extensive decorators drain the budget before the actual business logic even appears.
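To make the effect of verbosity concrete, here is a minimal sketch that uses a crude regular-expression splitter as a stand-in for a tokenizer. The function name `rough_token_count` and both snippets are illustrative assumptions. Real tokenizers (including Qwen's) use learned subword vocabularies, so absolute counts will differ; only the direction of the comparison is meant to carry over.

```python
import re

# Crude proxy for a tokenizer: each word/identifier, each run of whitespace
# and each punctuation character counts as one unit.
# This is NOT how a real subword tokenizer works; it only approximates the trend.
_PIECE = re.compile(r"\w+|\s+|[^\w\s]")


def rough_token_count(text: str) -> int:
    """Return the number of word, whitespace and punctuation pieces in text."""
    return len(_PIECE.findall(text))


# Hypothetical snippets: a one-line convention-based registration
# versus a single explicit, decorated handler.
concise = "router.register('customers', CustomerViewSet)"
verbose = (
    "@Get(':id')\n"
    "async findOne(@Param('id') id: string) {\n"
    "    return this.customerService.findOne(id);\n"
    "}\n"
)

print("concise:", rough_token_count(concise))
print("verbose:", rough_token_count(verbose))
```

Indentation and line breaks are counted as pieces too, which matches the point above that whitespace is part of the budget.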
Why this matters
When building applications with AI assistance, part of the token budget is consumed by the structure of the code itself. Leaner stacks offer direct advantages to the engineering team.
- Lower operational cost in calls to the model API.
- Faster responses, because the context sent is smaller.
- Less need to split prompts into multiple steps.
- More room to describe business rules and specific requirements.
“The fewer tokens infrastructure consumes, the more tokens are left for what actually generates value for the business.”
Benchmark methodology
To make the comparison fair, we defined a standard scenario that any stack needs to deliver: a complete CRUD for the Customer entity, with explicit business rules and standardized REST endpoints.
The Customer entity has the following fields:
- id
- name
- document
- phone
- status
- created_at
- updated_at
The expected REST endpoints are:
- POST /customers, creates a new customer.
- GET /customers, lists customers with pagination.
- GET /customers/:id, retrieves a specific customer.
- PUT /customers/:id, updates customer data.
- DELETE /customers/:id, removes the customer.
Business rules that each implementation must enforce:
- The name field is required.
- The email must be unique among customers.
- The document must be unique among customers.
- The status only accepts the values active or inactive.
- The listing returns paginated results.
- Errors are standardized in JSON responses.
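The rules above can be sketched in a framework-agnostic way. This is a minimal in-memory illustration, not the code produced by any stack in the benchmark: `CustomerStore`, `error`, `list_page` and the shape of the error body are all assumptions made for the example.

```python
from dataclasses import dataclass, field

ALLOWED_STATUS = {"active", "inactive"}


def error(code: str, message: str) -> dict:
    """Build a standardized error body (this JSON shape is an assumption)."""
    return {"error": {"code": code, "message": message}}


@dataclass
class CustomerStore:
    """In-memory stand-in for the customers table."""

    rows: list = field(default_factory=list)

    def create(self, data: dict) -> dict:
        # Rule: name is required.
        if not data.get("name"):
            return error("validation_error", "name is required")
        # Rule: status only accepts active or inactive.
        if data.get("status", "active") not in ALLOWED_STATUS:
            return error("validation_error", "status must be active or inactive")
        # Rules: email and document must be unique among customers.
        for key in ("email", "document"):
            value = data.get(key)
            if value is not None and any(r.get(key) == value for r in self.rows):
                return error("conflict", f"{key} must be unique")
        row = {"id": len(self.rows) + 1, "status": "active", **data}
        self.rows.append(row)
        return row

    def list_page(self, page: int = 1, page_size: int = 10) -> dict:
        # Rule: the listing returns paginated results.
        start = (page - 1) * page_size
        return {
            "data": self.rows[start:start + page_size],
            "page": page,
            "page_size": page_size,
            "total": len(self.rows),
        }


store = CustomerStore()
print(store.create({"name": "Ana", "email": "ana@example.com", "document": "1"}))
print(store.create({"name": "Bia", "email": "ana@example.com", "document": "2"}))
```

Each stack has to express these same checks somewhere, whether through serializers, form requests, DTO annotations or hand-written handler code, and that is where the token counts diverge.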
Evaluated stacks
Nine stacks went through the same prompt, ranging from ecosystems where CRUD is the central use case to languages whose strength lies in different kinds of services. The diversity was intentional: we wanted to see whether stacks traditionally associated with CRUD actually pay dividends when the AI is the one doing the typing.
- PHP with Laravel.
- Python with Django and Django REST Framework.
- Bun with Elysia and Drizzle.
- Bun with Hono and Drizzle.
- TypeScript with NestJS and Prisma.
- Java with Spring Boot and JPA.
- C# with ASP.NET Core and Entity Framework Core.
- Go with Fiber and Ent.
- Go with Gin and GORM.
The criteria of this benchmark split into three groups, separated by how each one was evaluated:
- Measured directly from the model's response: total tokens consumed during generation, lines of code produced and number of files generated.
- Evaluated qualitatively in this article: adherence of the code to real functionality, based on a line-by-line review of the outputs to identify how much rework each stack would require.
- Reserved for the next cycle: implementation time until the first local deploy and long-term maintenance complexity, both requiring code execution and multiple iterations.
Benchmark results
We ran the benchmark locally with Qwen 2.5 Coder 14B through Ollama on a MacBook Pro. Each stack was executed three times to dampen the model's natural run-to-run variance, and the reported number is the median of output tokens produced during code generation.
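As a small illustration of why the median was a reasonable choice for three runs, the sketch below compares it with the mean. The run values are hypothetical placeholders, not the benchmark's raw data.

```python
from statistics import mean, median

# Hypothetical output-token counts for three runs of a single stack.
# Placeholder values only; the article does not publish per-run numbers.
runs = [1010, 990, 1200]

reported = median(runs)   # middle value after sorting: 1010
average = mean(runs)      # pulled upward by the 1200 outlier

print(f"median={reported}, mean={average:.1f}")
```

With only three samples, one unusually long generation moves the mean but leaves the median untouched.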
1. Python with Django and Django REST Framework: 681 tokens, 92 lines, 6 files
2. PHP with Laravel: 725 tokens, 138 lines, 5 files
3. Bun with Elysia and Drizzle: 1,205 tokens, 145 lines, 7 files
4. Go with Gin and GORM: 1,247 tokens, 186 lines, 5 files
5. Go with Fiber and Ent: 1,317 tokens, 211 lines, 5 files
6. TypeScript with NestJS and Prisma: 1,393 tokens, 209 lines, 9 files
7. Java with Spring Boot and JPA: 1,514 tokens, 281 lines, 7 files
8. Bun with Hono and Drizzle: 1,587 tokens, 200 lines, 8 files
9. C# with ASP.NET Core and Entity Framework Core: 1,981 tokens, 341 lines, 7 files
Four takeaways stand out. First, the podium shows that stacks where CRUD is the central use case, with strong conventions and implicit route generation, spend significantly fewer tokens. Django led with 681 tokens thanks to DRF's ModelViewSet, which replaces five REST routes with a single class. Laravel came in second, only 44 tokens behind, with Route::apiResource playing an equivalent role.
Second, the gap between first and last was nearly three times the consumption. C# with ASP.NET Core and Entity Framework Core ended ninth at 1,981 tokens — a direct result of the verbose style of the .NET platform, with separate DTOs, explicit DataAnnotations, formal repositories and a Program.cs configured through the builder pattern. All of that is ergonomic for the team, but it gets expensive when the AI is the one doing the typing.
Third, the Go stacks landed in the middle of the ranking, not at the top. Explicit conventions, manual parsing and individual error handling in each handler add up in tokens. Go is an excellent choice for infrastructure services or high-performance APIs, but the typical CRUD domain favors languages with more convention.
Fourth, NestJS came in sixth at 1,393 tokens, a better position than the apparent weight of its decorators and modules would suggest. The structure is repetitive, but it replaces initialization and routing code that other stacks have to describe explicitly. In terms of absolute tokens per entity, it holds its ground. The real weight of NestJS shows up when the number of entities grows — we will come back to that in the complexity analysis.
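The figures quoted in these takeaways can be recomputed from the reported medians. The sketch below only restates the published numbers; the dictionary keys are shortened labels chosen for the example.

```python
# Median output tokens per stack, as reported in the benchmark results.
results = {
    "Django + DRF": 681,
    "Laravel": 725,
    "Bun + Elysia + Drizzle": 1205,
    "Go + Gin + GORM": 1247,
    "Go + Fiber + Ent": 1317,
    "NestJS + Prisma": 1393,
    "Spring Boot + JPA": 1514,
    "Bun + Hono + Drizzle": 1587,
    "ASP.NET Core + EF Core": 1981,
}

# Sort ascending by token count to obtain the ranking.
ranked = sorted(results.items(), key=lambda item: item[1])
first, second, last = ranked[0], ranked[1], ranked[-1]

gap_top_two = second[1] - first[1]          # Laravel minus Django
ratio_last_to_first = last[1] / first[1]    # ASP.NET Core relative to Django

print(f"gap between 1st and 2nd: {gap_top_two} tokens")
print(f"last / first: {ratio_last_to_first:.2f}x")
```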
[Figure: benchmark ranking chart, top 3 highlighted in purple. Model: Qwen 2.5 Coder 14B via Ollama.]
Django with DRF: the benchmark leader
Django dominated the benchmark with 681 tokens, a direct result of Django REST Framework. ModelViewSet replaces the five REST routes with a single class; DefaultRouter exposes the full set with two lines in urls.py. That is what happens when the framework was designed around CRUD and the AI can lean on those conventions without reinventing the wheel.
- ModelViewSet automates list, create, retrieve, update and delete with no additional code.
- DefaultRouter exposes all REST routes from a single register call.
- ModelSerializer infers fields from the model, eliminating duplication.
- Automatic migrations via makemigrations and migrate.
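The idea behind convention-based routing can be sketched as a toy function that expands one resource name into the five endpoints of the benchmark scenario. This is not DRF's or Laravel's implementation, and `api_resource` is a hypothetical name; real routers also add extras such as PATCH for partial updates.

```python
def api_resource(name: str) -> list[tuple[str, str, str]]:
    """Expand one resource registration into (method, path, action) routes.

    Toy model of what a convention-based router does on a single call.
    """
    base = f"/{name}"
    return [
        ("POST", base, "create"),
        ("GET", base, "list"),
        ("GET", f"{base}/:id", "retrieve"),
        ("PUT", f"{base}/:id", "update"),
        ("DELETE", f"{base}/:id", "delete"),
    ]


for method, path, action in api_resource("customers"):
    print(f"{method:6} {path:16} -> {action}")
```

Because the expansion lives inside the framework, the model only has to emit the one-line registration, which is where the token saving comes from.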
In the benchmark, Django delivered a four-line ViewSet that covers the entire CRUD:
from rest_framework import viewsets
from .models import Customer
from .serializers import CustomerSerializer

class CustomerViewSet(viewsets.ModelViewSet):
    queryset = Customer.objects.all()
    serializer_class = CustomerSerializer

The router wraps it up in three more lines, with no need to declare each endpoint individually:
from django.urls import path, include
from rest_framework.routers import DefaultRouter
from .views import CustomerViewSet

router = DefaultRouter()
router.register(r'customers', CustomerViewSet)
urlpatterns = [path('', include(router.urls))]

Laravel: the classic benchmark
Laravel came in second at 725 tokens, only 44 tokens behind Django. A single line creates every REST route for a resource, and the ecosystem delivers practically everything a CRUD needs: validation, migrations, ORM, authentication and tests.
- Mature conventions that reduce repetitive decisions.
- Powerful CLI to generate controllers, models and migrations.
- Eloquent as a robust and expressive ORM.
- Large ecosystem with well-established packages.
In the benchmark, Laravel generated the complete routing in a single line of code. The entire definition of the five REST endpoints fits here:
use Illuminate\Support\Facades\Route;
use App\Http\Controllers\API\CustomerController;

Route::apiResource('customers', CustomerController::class);

The controller that powers those routes is equally concise, delegating validation to Form Requests and serialization to Resources:
class CustomerController extends Controller
{
    public function index(Request $request)
    {
        $page = $request->input('page', 1);
        $pageSize = $request->input('pageSize', 10);

        return CustomerResource::collection(
            Customer::paginate($pageSize, ['*'], 'page', $page)
        );
    }

    public function store(CustomerRequest $request)
    {
        $customer = Customer::create($request->validated());

        return new CustomerResource($customer);
    }

    public function show(Customer $customer)
    {
        return new CustomerResource($customer);
    }

    public function update(CustomerRequest $request, Customer $customer)
    {
        $customer->update($request->validated());

        return new CustomerResource($customer);
    }

    public function destroy(Customer $customer)
    {
        $customer->delete();

        return response()->json(null, 204);
    }
}

Bun with Elysia and Drizzle: modern typing in TypeScript
Bun combines a modern runtime with native TypeScript support. Elysia offers an extremely concise API to define routes and validations, and Drizzle keeps database access typed, lightweight and close to SQL.
- Lean code, with few lines to expose a complete route.
- Excellent end-to-end type inference.
- High performance on the Bun runtime.
- Low boilerplate and decorator overhead.
In the benchmark, all five CRUD routes chain together in a single Elysia object, with no decorators and no extra routing files:
import { Elysia } from 'elysia';
import { CustomerService } from '../services/CustomerService';

const customerService = new CustomerService();

export const customersRoute = new Elysia()
  .post('/', async ({ body }) => {
    return await customerService.create(body);
  })
  .get('/', async ({ query }) => {
    const page = parseInt(query.page ?? '1', 10);
    const pageSize = parseInt(query.pageSize ?? '10', 10);
    return await customerService.list(page, pageSize);
  })
  .get('/:id', async ({ params }) => {
    return await customerService.get(params.id);
  })
  .put('/:id', async ({ params, body }) => {
    return await customerService.update(params.id, body);
  })
  .delete('/:id', async ({ params }) => {
    await customerService.delete(params.id);
    return { success: true };
  });

Go with Ent: productivity through code generation
Go is traditionally more explicit than dynamic languages, but Ent significantly reduces boilerplate. By defining the entity schema, most CRUD operations are generated automatically, with strong types and safe queries.
This brings Go closer to the productivity of more opinionated frameworks, without giving up simple binaries and high performance in production. Even so, it landed in the middle of the table at 1,317 tokens: each handler has to deal explicitly with parsing, validation and errors, without the syntactic sugar of frameworks dedicated to CRUD. Go is an excellent choice for infrastructure services or high-performance APIs; for application CRUD, other stacks pay fewer tokens for an equivalent structure.
func (h *CustomerHandler) Create(c *fiber.Ctx) error {
    var input struct {
        Name     string `json:"name" validate:"required"`
        Email    string `json:"email" validate:"required,email"`
        Document string `json:"document" validate:"required"`
        Phone    string `json:"phone,omitempty"`
        Status   string `json:"status,omitempty"`
    }

    if err := c.BodyParser(&input); err != nil {
        return c.Status(http.StatusBadRequest).JSON(fiber.Map{
            "error":   "Invalid input",
            "message": err.Error(),
        })
    }

    customer, err := h.client.Customer.Create().
        SetName(input.Name).
        SetEmail(input.Email).
        SetDocument(input.Document).
        SetPhone(input.Phone).
        SetStatus(input.Status).
        Save(context.Background())
    if err != nil {
        return c.Status(http.StatusInternalServerError).JSON(fiber.Map{
            "error":   "Failed to create customer",
            "message": err.Error(),
        })
    }

    return c.JSON(customer)
}

NestJS with Prisma: enterprise architecture
NestJS is excellent for complex systems and large teams. Decorators, modules and dependency injection create a predictable architecture, and it would be reasonable to expect this architectural overhead to pay dearly in tokens.
- Consistent architecture across modules and features.
- Clear separation of responsibilities between controllers, services and repositories.
- Strong adherence to enterprise patterns and DDD.
- Mature ecosystem of integrations and testing tools.
The numbers showed the opposite. NestJS came in sixth with a median of 1,393 tokens, close to the middle of the table. The decorators are repetitive, but they replace initialization and routing code that other stacks have to describe explicitly, and the AI leverages that pattern to generate compact handlers:
@Controller('customers')
export class CustomerController {
  constructor(private readonly customerService: CustomerService) {}

  @Post()
  async create(@Body() dto: CreateCustomerDto) {
    try {
      const created = await this.customerService.create(dto);
      return { data: created };
    } catch (error) {
      return { error: true, message: error.message };
    }
  }

  @Get()
  async findAll(
    @Query('page') page: number,
    @Query('pageSize') pageSize: number,
  ) {
    return { data: await this.customerService.findAll(page, pageSize) };
  }

  @Get(':id')
  async findOne(@Param('id') id: string) {
    return { data: await this.customerService.findOne(id) };
  }

  @Put(':id')
  async update(
    @Param('id') id: string,
    @Body() dto: UpdateCustomerDto,
  ) {
    return { data: await this.customerService.update(id, dto) };
  }

  @Delete(':id')
  async remove(@Param('id') id: string) {
    return { data: await this.customerService.remove(id) };
  }
}

Spring Boot with JPA: enterprise on the JVM
Spring Boot came in seventh at 1,514 tokens. JpaRepository eliminates part of the work — an empty interface automatically gains findAll, findById, save and delete — but the rest of the ecosystem charges its price in explicit ergonomics: annotations on each entity property, mandatory getters and setters in traditional Java, separate DataSource configuration and a service layer formally injected via @Autowired. In return, the team gets a framework with decades of adherence to the enterprise world.
package com.example.demo.repository;

import com.example.demo.model.Customer;
import org.springframework.data.jpa.repository.JpaRepository;
import java.util.UUID;

public interface CustomerRepository extends JpaRepository<Customer, UUID> {
}

The equivalent controller keeps the Spring pattern of @Autowired injection and @RequestMapping routing. Each handler explicitly describes the HTTP method and return type:
@RestController
@RequestMapping("/customers")
public class CustomerController {

    @Autowired
    private CustomerService customerService;

    @GetMapping
    public Page<Customer> getAllCustomers(Pageable pageable) {
        return customerService.getAllCustomers(pageable);
    }

    @PostMapping
    public ResponseEntity<Customer> createCustomer(@RequestBody Customer customer) {
        Customer created = customerService.createCustomer(customer);
        return ResponseEntity.created(null).body(created);
    }

    @PutMapping("/{id}")
    public ResponseEntity<Customer> updateCustomer(
        @PathVariable UUID id,
        @RequestBody Customer customerDetails
    ) {
        return ResponseEntity.ok(customerService.updateCustomer(id, customerDetails));
    }
}

ASP.NET Core with EF Core: verbose by design
C# with ASP.NET Core and Entity Framework Core finished ninth at 1,981 tokens, almost three times the consumption of Django. This is not necessarily a criticism: the .NET platform traditionally prioritizes expressiveness, strict separation of responsibilities and declarative validation through DataAnnotations. Every one of these choices adds tokens to the generation.
- Separate DTOs for create, update and read, each with their own validation annotations.
- Formal repository pattern with a dedicated interface and implementation injected through DI.
- OnModelCreating in DbContext to configure EF Core behaviors.
- Program.cs with builder pattern, service registration and HTTP pipeline configuration.
The result is highly structured code, but the AI has to describe each of these pieces explicitly to deliver the CRUD. The difference compared to Django, which does almost everything via ModelViewSet, shows up clearly in Program.cs:
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();
builder.Services.AddDbContext<CustomerDbContext>(options =>
{
    options.UseNpgsql(builder.Configuration.GetConnectionString("DefaultConnection"));
});
builder.Services.AddScoped<ICustomerRepository, CustomerRepository>();

var app = builder.Build();

if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}

app.UseHttpsRedirection();
app.UseAuthorization();
app.MapControllers();
app.Run();

The role of the framework
The language is only part of the equation. The biggest impact on token consumption comes from decisions made by the chosen framework.
- Conventions that avoid rewriting the same structure for every feature.
- Automatic code generation from schemas or definitions.
- Level of abstraction chosen to encapsulate common rules.
- Verbosity required by the framework's reference architecture.
In many cases the framework influences the token budget more than the language itself. Swapping frameworks within the same language can shift consumption noticeably: on the same Bun and Drizzle base, Elysia came in at 1,205 tokens against 1,587 for Hono.
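The benchmark contains two same-language pairs that make the framework effect measurable. The sketch below computes the relative difference for each, using only the reported medians; `relative_saving` is a helper name introduced for the example. Note that the Go pair changes both the web framework and the ORM, so it is not a pure framework swap.

```python
def relative_saving(higher: int, lower: int) -> float:
    """Fraction of tokens saved when moving from `higher` to `lower`."""
    return (higher - lower) / higher


# Bun: Hono + Drizzle (1,587) versus Elysia + Drizzle (1,205).
bun_saving = relative_saving(1587, 1205)

# Go: Fiber + Ent (1,317) versus Gin + GORM (1,247).
go_saving = relative_saving(1317, 1247)

print(f"Bun pair: {bun_saving:.1%} fewer tokens with Elysia")
print(f"Go pair:  {go_saving:.1%} fewer tokens with Gin and GORM")
```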
“Every extra file your architecture requires is an invisible tax on every conversation with the AI.”
The complexity of token consumption
The benchmark measured a single case: one Customer entity with seven fields. Real systems don't have one entity, they have dozens. It is worth formalizing how this token consumption grows when N entities enter the scope, because that is where the difference between a trivial technical choice and a strategic one shows up.
For every stack evaluated, the token consumption to generate a system with N entities is approximately linear in N:
“T(N) ≈ a + b × N”
Where a is the fixed boilerplate cost (database configuration, server bootstrap, root module) and b is the marginal cost per entity (migration, model, controller, validation, route). In Big O notation, this is O(N): growth is proportional to the number of entities, with no quadratic or exponential terms. The good news is that linearity means predictability; the bad news is that the slope b takes its toll with every new entity.
Estimating a and b from the proportion of fixed files versus per-entity files in the benchmark, and applying them to the tokens measured with N=1, we arrive at the following parameters and projections for a system with ten entities:
- Django with DRF: a ≈ 40 tokens, b ≈ 640 tokens per entity. Projection for N=10: about 6,440 tokens.
- Laravel: a ≈ 25, b ≈ 700. Projection for N=10: about 7,025 tokens.
- Bun with Elysia and Drizzle: a ≈ 170, b ≈ 1,035. Projection for N=10: about 10,520 tokens.
- Go with Gin and GORM: a ≈ 140, b ≈ 1,107. Projection for N=10: about 11,210 tokens.
- Go with Fiber and Ent: a ≈ 210, b ≈ 1,107. Projection for N=10: about 11,280 tokens.
- Spring Boot with JPA: a ≈ 250, b ≈ 1,264. Projection for N=10: about 12,890 tokens.
- NestJS with Prisma: a ≈ 75, b ≈ 1,318. Projection for N=10: about 13,255 tokens.
- Bun with Hono and Drizzle: a ≈ 230, b ≈ 1,357. Projection for N=10: about 13,800 tokens.
- C# with ASP.NET Core and EF Core: a ≈ 400, b ≈ 1,581. Projection for N=10: about 16,210 tokens.
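The projections in this list follow directly from T(N) ≈ a + b × N. As a minimal sketch, the function below reproduces them from the estimated parameters; `projected_tokens` and the shortened stack labels are names chosen for the example, and the (a, b) values remain the article's estimates rather than measurements.

```python
# Estimated (a, b) per stack: fixed cost and marginal cost per entity, in tokens.
params = {
    "Django + DRF": (40, 640),
    "Laravel": (25, 700),
    "Bun + Elysia + Drizzle": (170, 1035),
    "Go + Gin + GORM": (140, 1107),
    "Go + Fiber + Ent": (210, 1107),
    "Spring Boot + JPA": (250, 1264),
    "NestJS + Prisma": (75, 1318),
    "Bun + Hono + Drizzle": (230, 1357),
    "ASP.NET Core + EF Core": (400, 1581),
}


def projected_tokens(a: int, b: int, n: int) -> int:
    """Linear cost model T(N) = a + b * N."""
    return a + b * n


for stack, (a, b) in params.items():
    print(f"{stack:24} N=10 -> {projected_tokens(a, b, 10):>6,} tokens")
```

Changing the last argument gives the projection for any other number of entities under the same linear assumption.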
The reading is revealing. Django and Laravel keep the lead at any N because they have the lowest slopes b in the table, a direct consequence of strong conventions (DRF's ModelViewSet, Laravel's Route::apiResource). C# with ASP.NET, on the other hand, has the highest b — each new entity demands a create DTO, an update DTO, a read DTO, validations on each one and configurations in the DbContext. In large projects, the gap between first and last exceeds two and a half times the token consumption.
NestJS is another interesting case: it landed in sixth at N=1, but its b is high. Each new entity demands a module, controller, service, create dto, update dto, interface, model and schema. In projects with few entities, NestJS is competitive; in large systems, the slope catches up and it tends to surpass both Spring Boot and Go.
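Under the estimated parameters, the point where NestJS overtakes Spring Boot can be located with a simple search. This is a sketch over the article's estimates, so the exact crossover is only as reliable as a and b; `first_n_where_costlier` is a hypothetical helper.

```python
def first_n_where_costlier(stack, other, max_n=100):
    """Return the smallest N >= 1 where `stack` costs more than `other`.

    Each argument is an (a, b) pair for the model T(N) = a + b * N.
    Returns None if no crossover happens up to max_n.
    """
    (a1, b1), (a2, b2) = stack, other
    for n in range(1, max_n + 1):
        if a1 + b1 * n > a2 + b2 * n:
            return n
    return None


nestjs = (75, 1318)    # low fixed cost, high slope
spring = (250, 1264)   # higher fixed cost, lower slope

print(first_n_where_costlier(nestjs, spring))  # 4
```

At three entities NestJS is still slightly cheaper (4,029 against 4,042 tokens); from four entities on, its higher slope dominates.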
This linear model holds as long as entities live isolated from each other. Once they start to relate to one another, and the AI has to describe these relationships to produce consistent code, the analysis changes. If each of the N entities can relate to every other, describing the full graph grows in the worst case as O(N²). In practice, real systems have a few highly connected hubs and many nearly isolated entities, so growth lands between O(N) and O(N²), typically closer to linear.
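The quadratic worst case comes from counting pairs. The sketch below assumes undirected relationships with at most one per pair of entities, which is a simplification; `max_relationships` is a name introduced for the example.

```python
def max_relationships(n: int) -> int:
    """Number of distinct unordered pairs among n entities: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (5, 10, 50):
    print(f"{n:>2} entities -> up to {max_relationships(n):>4} relationships")
```

Ten entities allow up to 45 relationships and fifty allow up to 1,225, which is why a fully connected model would grow much faster than the linear estimate.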
Another cost shows up as the system grows: maintenance. Every new modification requires the model to load part of the existing code as context to avoid breaking invariants already in place, and that context also scales with N. The total cost of ownership for AI-generated code has two terms: the initial generation cost and the cost of each subsequent iteration, both growing with N. Stacks that minimize tokens per entity reduce both at once, and the gain compounds with every iteration.
Finally, there is a hard ceiling. Models have a finite context window. When the existing code exceeds this window, the AI has to choose what to read, and that selection has its own cost. Stacks that deliver dense context, with few files and low indirection, preserve more tokens for the new business rule. Stacks fragmented across many files force the model to read more to deliver the same amount of effective reasoning.
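To give a feel for this ceiling, the sketch below inverts the linear model and asks how many entities fit in a given window. The window size is a hypothetical value, the function name `max_entities` is invented for the example, and the calculation assumes that reading code back costs about as many tokens as generating it, ignoring the prompt and the response.

```python
def max_entities(window: int, a: int, b: int) -> int:
    """Largest N with a + b * N <= window (0 if the fixed cost alone does not fit)."""
    if window < a:
        return 0
    return (window - a) // b


WINDOW = 32_768  # hypothetical context window, in tokens

# (a, b) estimates taken from the complexity analysis.
print("Django + DRF:", max_entities(WINDOW, 40, 640))
print("ASP.NET Core + EF Core:", max_entities(WINDOW, 400, 1581))
```

Under these assumptions the window holds roughly 51 Django entities but only about 20 in ASP.NET Core, before any room is reserved for the new business rule.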
“In long-lived projects, the slope b matters more than the starting point a.”
In projects expected to live long and accumulate dozens of entities, choosing a stack with a low b pays compound dividends at every iteration. In small projects or proofs of concept, the difference is marginal and other factors such as DX, team familiarity and ecosystem tend to dominate the decision.
Adherence of the generated code
Tokens spent are not the same as value delivered. Code that spends few tokens but needs to be rewritten before it runs ends up costing much more across the cycle. To capture this other axis, we reviewed the output of each stack line by line and classified how much rework would be needed for the code to become functional.
- Laravel: high adherence. Conventions respected, correct Route::apiResource, Eloquent with idiomatic fillable and casts, FormRequest with valid rules. It would run with minimal adjustments.
- Django with DRF: high adherence. Correct ModelViewSet, idiomatic DefaultRouter, ModelSerializer with proper Meta. A small fix on the UUIDField default (should be uuid.uuid4) and folder organization that mixes models/views/serializers outside the standard single-app layout.
- Go with Gin and GORM: high adherence. Idiomatic structure, clear separation between handlers, services and routes. Small, isolated adjustments such as missing strconv imports for pagination.
- Spring Boot with JPA: medium-high adherence. Correct entity with Jakarta Persistence, clean Repository, idiomatic Service. Inconsistency in the configuration file (mixes javax.persistence with jakarta.persistence) that would need adjustment in Spring Boot 3+ projects.
- C# with ASP.NET Core: high adherence. Complete idiomatic structure. The only bug is the [Index(IsUnique = true)] attribute placed on a property: in EF Core the [Index] attribute is applied to the entity class, so the unique index has to move there or to the Fluent API. The rest of the code runs with minimal adjustments.
- Bun with Elysia and Drizzle: medium adherence. Clean Elysia structure, but Drizzle still came out with small API inconsistencies in some runs (withCount, pgEnum syntax). Needs targeted review in the service.
- Go with Fiber and Ent: medium adherence. Correct concepts, but the model left literal paths in some files and the update call chain is poorly constructed. Incomplete imports in update handlers.
- Bun with Hono and Drizzle: medium adherence. Correct Hono routing structure, no confusion with other runtimes. Small details around imports and schema that need review.
- NestJS with Prisma: medium adherence. Prisma was generated correctly this time, with schema.prisma and PrismaService. Validation through class-validator is in place. Some inconsistencies in the standardized try/catch pattern that could become a global filter.
It is worth noting that this adherence depends heavily on model size. Smaller models, with fewer parameters, tend to hallucinate on long-tail frameworks — less popular libraries appear rarely in training data and the model falls back to syntax from better-documented neighbors. With Qwen 2.5 Coder 14B this effect was rare; with smaller models, it is common to see Hono swapped for Next.js or Prisma swapped for Mongoose in CRUD generation.
“The effective cost of a stack is generation tokens plus rework tokens. On small models, stacks with thin documentation in the training data pay the second term silently.”
This axis confirms the benchmark's reading. Django and Laravel not only spent fewer tokens, they also delivered outputs close to functional. C# was the most verbose, but it also delivered highly structured and nearly ready code. The stacks in the middle of the table require small adjustments, expected in any AI code generation at this moment.
What Anthropic says about this
This reasoning is not just the opinion of someone using AI day to day. In a recent article about Claude Code in large codebases, Anthropic itself reinforces that the environment around the model matters more than the model in isolation.
“The ecosystem built around the model, the harness, determines how Claude Code performs more than the model alone.”
Translated to our stack debate: the framework, its conventions, the chosen ORM and the default architecture act as the model's harness. That layer decides whether the AI spends tokens understanding your structure or advancing the business logic.
“Claude's ability to help in a large codebase is bounded by its ability to find the right context.”
Leaner stacks like Bun with Elysia and Drizzle deliver this legibility naturally: few files, inferred types and low indirection. More opinionated frameworks compensate with strong conventions, but the cost is paid in tokens per file.
“Too much context loaded into every session degrades performance, while too little context leaves Claude to navigate blind.”
The point is straightforward: there is a balance. Stacks that minimize structural noise leave more room for what matters, without falling into the opposite extreme of requiring additional context with every prompt.
Which stack to choose
The choice depends on which priority sits at the top of the project. Some combinations work better in specific scenarios.
- Priority on lower token consumption: Django with DRF or Laravel.
- Priority on productivity and maturity: Laravel or Django.
- Priority on modern type safety: Bun with Elysia and Drizzle.
- Priority on performance and simple binaries: Go with Fiber and Ent.
- Priority on JVM enterprise architecture: Spring Boot with JPA.
- Priority on integration with the Microsoft ecosystem: ASP.NET Core with Entity Framework.
- Priority on predictable architecture in TypeScript: NestJS with Prisma.
My recommendation
For projects where the goal is to deliver business value quickly with AI writing a large share of the code, Django with DRF and Laravel are essentially tied at the top of the benchmark — the 44-token gap between them is noise. The choice between the two comes down to team familiarity and preferred ecosystem, not consumption. Both combine the lowest token consumption in the benchmark with strong conventions that reduce the chance of the model losing its way.
When the priority is end-to-end modern typing, a high-performance runtime and a team that lives in TypeScript, Bun with Elysia and Drizzle remains a strong choice. It came in third on token consumption, delivers end-to-end type inference no other stack in the benchmark offers, and continues to be a solid bet for teams escaping the weight of traditional Node.
For enterprise projects with stack constraints (JVM or .NET for corporate reasons), Spring Boot and ASP.NET Core deliver structured and functional code, with the expected cost in tokens. It is not reasonable to expect these platforms, designed around strong typing and strict separation of responsibilities, to top a ranking of conciseness — but the predictability gain is what justifies the consumption.
Conclusion
When using AI for development, token consumption becomes a strategic factor, not just a technical detail. Leaner stacks reduce cost, accelerate generation, simplify iterations and free up context for business rules.
“The best stack is the one that maximizes the team's productivity and lets AI focus its effort on the logic that differentiates the product.”
Django with DRF and Laravel finished essentially tied at the top, both combining mature conventions, a robust ecosystem and the lowest token consumption among the evaluated stacks — a combination that is hard to beat for projects that need to accelerate delivery with AI assistance. Bun with Elysia and Drizzle comes right after, with the added advantage of modern typing and a high-performance runtime for teams that live in TypeScript. On the other end, ASP.NET Core showed that the cost of the enterprise style can reach almost three times that of a CRUD-specialized framework — a decision to be made consciously.
Limitations of this analysis
The numbers above were produced by a specific model under specific conditions, and it is important to make that transparent so the reader can interpret the results with the right calibration.
- Model used: Qwen 2.5 Coder 14B, running locally via Ollama on a MacBook Pro M3 with 18GB of memory. Larger commercial models such as Claude Opus, GPT-5 or Gemini Ultra may generate more elaborate code and therefore more tokens in absolute terms. The ranking between stacks tends to hold, but absolute values are not interchangeable.
- Each model family has its own tokenizer. The counts reported here reflect Qwen's tokenizer and are not directly comparable with Claude, GPT or Gemini for the same code. What stays consistent across tokenizers is the relative ranking.
- Each stack was generated three times to compute the median. Runs of the same stack fell into a narrow range, making the median stable and the ranking reliable.
- The a and b coefficients in the complexity section are estimates derived from the proportion of fixed files to per-entity files in the one-entity benchmark. Validating those projections would require running the benchmark across varied N values (next cycle).
- Total cost of ownership also depends on talent availability, ecosystem support and operational costs outside code generation.
Next steps
The next cycle of this benchmark will answer the questions still left open.
- Run the same prompt on Claude, GPT and Gemini to compare the ranking across commercial tokenizers and measure whether positions hold with frontier models.
- Empirically validate the formula T(N) ≈ a + b × N by running the benchmark with N varying from one to ten entities, measuring a and b directly instead of estimating.
- Measure the time until the first local deploy by executing the generated code on each stack, installing dependencies and bringing up the server with its database.
- Measure maintenance cost: how many tokens each stack spends when adding a new field to the entity, not just when creating from scratch.
- Expand to stacks still not covered: Rails, FastAPI, Phoenix, Quarkus.
- Evaluate how each stack behaves over long-term iterations, simulating six months of product evolution.
If you are also using AI to accelerate development, this comparison can help you choose a more efficient stack aligned with your product's technology strategy.
Glossary
Technical terms used throughout the article, in order of appearance:
- CRUD: acronym for Create, Read, Update, Delete. The four basic operations every application performs on persisted data.
- REST: an architectural style for APIs that uses HTTP verbs (GET, POST, PUT, DELETE) and URLs as resources.
- Endpoint: a specific API URL that accepts requests and returns responses.
- Boilerplate: repetitive code every feature needs but does not carry business logic (configuration, imports, scaffolding).
- ORM: Object-Relational Mapping. A library that translates operations on language objects into SQL queries on the database.
- Migration: a versioned script that changes the database schema in a controlled and reversible way.
- DTO: Data Transfer Object. An object designed specifically to carry data between layers or across the network, without behavior.
- Decorators: syntax that attaches metadata or behavior to a class, method or property. Exists in TypeScript, Python, Java, C# and others.
- Tokenizer: algorithm that breaks text into smaller units (tokens) before the model processes it. Each model family has its own, and counts are not interchangeable.
- Harness: the set of tools, system prompts and infrastructure surrounding an AI model to make it useful in a real application. Includes file reading, context management and the tools available to the model. In the stack debate of this article, frameworks and their conventions act as the AI's harness when generating code.
- Context window: the maximum amount of tokens a model accepts in a single call, summing the prompt sent and the generated response.
- Big O: notation that describes how the time or consumption of an operation grows as input increases. O(N) is linear growth, O(N²) is quadratic.
- Slope: in a line of the form y = a + b × x, b is the slope: how much y grows for each unit x increases.
