
The Hidden Cost of append: Why Pre-allocating Slices and Maps Matters in Go


If you've been writing Go for a while, you've probably used append thousands of times without thinking twice. It just works. But under the hood, that innocent-looking function might be silently killing your application's performance.

In this post, I'll show you exactly what happens when you skip pre-allocation, demonstrate the performance impact with real benchmarks, and give you practical rules for writing faster Go code.

What Actually Happens When a Slice Grows

Slices in Go are backed by arrays. When you call append and the slice doesn't have enough capacity, Go must:

  1. Allocate a new, larger array (roughly 2x for small slices, ~1.25x for larger ones)
  2. Copy all existing elements to the new array
  3. Add your new element
  4. Let the garbage collector clean up the old array

Here's what that looks like visually:

Appending to a slice at capacity:

Before: [A][B][C][D]  cap=4, len=4
                 ↓ append(E)
Step 1: [_][_][_][_][_][_][_][_]  allocate new array (cap=8)
Step 2: [A][B][C][D][_][_][_][_]  copy 4 elements
Step 3: [A][B][C][D][E][_][_][_]  finally add new element
        [A][B][C][D] → garbage collection

For a single append, this overhead is negligible. But in a loop adding thousands of elements? You're paying this cost over and over.
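You can watch this growth happen yourself by printing cap as you append. Each printed line marks a reallocation. The exact capacity sequence depends on your Go version and element type, so treat the output as illustrative:

```go
package main

import "fmt"

func main() {
	var s []int
	prevCap := cap(s)
	for i := 0; i < 2000; i++ {
		s = append(s, i)
		if c := cap(s); c != prevCap {
			// A reallocation happened: a new backing array was
			// allocated and all existing elements were copied over.
			fmt.Printf("len=%4d  cap %4d -> %4d\n", len(s), prevCap, c)
			prevCap = c
		}
	}
}
```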

The Benchmark

I wrote a comprehensive benchmark to measure the real-world impact. Here's the core comparison:

// Bad: No pre-allocation - slice grows dynamically
func SliceNoPrealloc(n int) []int {
    var s []int // len=0, cap=0
    for i := 0; i < n; i++ {
        s = append(s, i)
    }
    return s
}

// Good: Pre-allocated with exact capacity
func SlicePrealloc(n int) []int {
    s := make([]int, 0, n) // len=0, cap=n
    for i := 0; i < n; i++ {
        s = append(s, i)
    }
    return s
}

// Alternative: Direct index assignment (fastest)
func SliceDirectAssign(n int) []int {
    s := make([]int, n) // len=n, cap=n
    for i := 0; i < n; i++ {
        s[i] = i
    }
    return s
}

Running go test -bench=. -benchmem reveals the damage:

Slice Results (integers)

Elements    Method        Time       Memory    Allocations
100         No prealloc   1,077 ns   2,040 B   8
100         Prealloc      420 ns     896 B     1
10,000      No prealloc   154 μs     357 KB    19
10,000      Prealloc      29 μs      82 KB     1
1,000,000   No prealloc   15.5 ms    41.7 MB   38
1,000,000   Prealloc      1.6 ms     8 MB      1

At one million elements, pre-allocation is 10x faster and uses 5x less memory. The non-pre-allocated version triggers 38 separate allocations, each one copying increasingly large amounts of data.

Map Results

Maps behave similarly. They rehash and redistribute buckets when the load factor exceeds a threshold:

// Bad: No size hint
m := make(map[int]int)

// Good: Size hint provided
m := make(map[int]int, expectedSize)
Elements Method Time Memory Allocations
100 No prealloc 7,065 ns 5,364 B 16
100 Prealloc 3,777 ns 2,908 B 6
10,000 No prealloc 769 μs 687 KB 276
10,000 Prealloc 392 μs 322 KB 11
100,000 No prealloc 7.7 ms 5.8 MB 4,016
100,000 Prealloc 4.4 ms 2.8 MB 1,681

The difference is less dramatic than with slices (maps are more complex internally), but it's still a consistent 2x improvement in both speed and memory.
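The size hint pays off whenever you bulk-load a map whose final size you can see up front, for example when building an index over a slice. A small sketch (buildIndex is my own example, not from the benchmark suite):

```go
package main

import "fmt"

// buildIndex maps each id to its position in the input slice.
// The size hint lets the runtime allocate enough buckets up front,
// avoiding incremental growth and rehashing during the load.
func buildIndex(ids []int64) map[int64]int {
	idx := make(map[int64]int, len(ids)) // one entry per id
	for pos, id := range ids {
		idx[id] = pos
	}
	return idx
}

func main() {
	idx := buildIndex([]int64{42, 7, 99})
	fmt.Println(idx[7]) // prints 1: id 7 sits at position 1 of the input
}
```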

The Real-World Pattern: Transforming Slices

The most common scenario where this matters is transforming one slice into another with some filtering:

// This pattern appears everywhere
func GetActiveUsers(users []User) []User {
    var active []User
    for _, u := range users {
        if u.IsActive {
            active = append(active, u)
        }
    }
    return active
}

You have three options to optimize this:

// Option 1: No pre-allocation (baseline)
func TransformNoPrealloc(input []int) []int {
    var result []int
    for _, v := range input {
        if v%2 == 0 {
            result = append(result, v*2)
        }
    }
    return result
}

// Option 2: Two-pass with exact count
func TransformPreallocExact(input []int) []int {
    count := 0
    for _, v := range input {
        if v%2 == 0 {
            count++
        }
    }
    
    result := make([]int, 0, count)
    for _, v := range input {
        if v%2 == 0 {
            result = append(result, v*2)
        }
    }
    return result
}

// Option 3: Estimate based on heuristics
func TransformPreallocEstimate(input []int) []int {
    // Assume ~50% will match
    result := make([]int, 0, len(input)/2)
    for _, v := range input {
        if v%2 == 0 {
            result = append(result, v*2)
        }
    }
    return result
}

The benchmark results surprised me:

Method             Time      Memory    Allocations
No prealloc        912 μs    1.96 MB   25
Two-pass exact     334 μs    401 KB    1
Estimate (len/2)   194 μs    401 KB    1

The estimate method wins because it avoids iterating over the data twice. (In this benchmark the 50% guess happens to be exact, since half the integers are even, but even a rough estimate typically beats a second pass.) This is a useful insight: when in doubt, over-allocate slightly rather than under-allocate or iterate twice.
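Note that an estimate doesn't have to be right for the code to be correct. If the guess is low, append simply grows the slice past the reserved capacity; you pay for a few extra allocations, never in correctness:

```go
package main

import "fmt"

func main() {
	// Deliberately under-estimate: reserve room for 4, then append 10.
	s := make([]int, 0, 4)
	for i := 0; i < 10; i++ {
		s = append(s, i)
	}
	// append grew the backing array as needed; cap is now at least len.
	fmt.Println(len(s), cap(s) >= 10) // prints: 10 true
}
```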

How to Detect Pre-allocation Issues

1. Benchmark with Memory Stats

go test -bench=. -benchmem

Look for high allocs/op values. If your function is doing 10+ allocations for what should be a single slice build, you have a problem.

2. Escape Analysis

go build -gcflags="-m" yourfile.go

This shows where allocations happen:

./main.go:19:11: make([]int, 0, n) escapes to heap
./main.go:27:11: make(map[int]int) escapes to heap

3. Static Analysis with prealloc

go install github.com/alexkohler/prealloc@latest
prealloc ./...

This linter specifically catches the var s []T followed by append in a loop pattern.

4. Runtime Profiling

For production systems, use pprof to find allocation hotspots:

import _ "net/http/pprof"

// Access via:
// go tool pprof http://localhost:6060/debug/pprof/allocs

Practical Rules

After years of optimizing Go code, here are my rules of thumb:

Rule 1: Known exact size → pre-allocate exactly

result := make([]T, 0, len(input))
// or if filling sequentially:
result := make([]T, len(input))

Rule 2: Upper bound known → pre-allocate to upper bound

// Filtering from input - output can't exceed input length
result := make([]T, 0, len(input))

Rule 3: Estimate available → use the estimate

// If you know roughly a third typically match
result := make([]T, 0, len(input)/3)

Rule 4: Building from external source → estimate reasonably

// Reading lines from a file? Guess based on typical file
lines := make([]string, 0, 1000)

Rule 5: Maps with predictable keys → always hint

// Indexing users by ID
userMap := make(map[int64]*User, len(users))

When It Doesn't Matter

Pre-allocation isn't always worth the cognitive overhead. Skip it when:

  • The slice will have fewer than ~10 elements
  • The code isn't in a hot path
  • You genuinely have no idea about the size (rare in practice)
  • Readability suffers significantly

The goal is writing fast, maintainable code—not obsessing over micro-optimizations in cold paths.

Conclusion

Pre-allocating slices and maps is one of the simplest performance wins in Go. The pattern is easy to remember: if you know (or can estimate) the size, tell Go upfront with make([]T, 0, n) or make(map[K]V, n).

The benchmarks speak for themselves: 10x speed improvements and 5x memory reduction for large slices. In high-throughput systems processing millions of events, this adds up fast.

Next time you write var s []T followed by a loop with append, pause and ask: do I know roughly how big this will get? If yes, a single make call will save you allocations, copies, and GC pressure.

Your future self (and your servers) will thank you.


Want to verify these numbers yourself? Run the benchmarks with go test -bench=. -benchmem -benchtime=2s and see the results on your own hardware.