
Flask Gunicorn Performance Tuning Guide

If you're trying to improve Flask performance in production or stabilize Gunicorn under load, this guide shows you how to tune Gunicorn step-by-step. The goal is to set safe defaults, match worker settings to your app profile, and validate performance with logs and runtime checks.

Quick Setup

Use this as a safe baseline for a small to medium Flask app behind Nginx:

```bash
gunicorn wsgi:app \
  --bind 127.0.0.1:8000 \
  --workers 3 \
  --worker-class gthread \
  --threads 4 \
  --timeout 30 \
  --graceful-timeout 30 \
  --keep-alive 2 \
  --max-requests 1000 \
  --max-requests-jitter 100 \
  --access-logfile - \
  --error-logfile -
```

Example systemd ExecStart:

```ini
ExecStart=/path/venv/bin/gunicorn wsgi:app \
  --bind unix:/run/gunicorn.sock \
  --workers 3 \
  --worker-class gthread \
  --threads 4 \
  --timeout 30 \
  --graceful-timeout 30 \
  --keep-alive 2 \
  --max-requests 1000 \
  --max-requests-jitter 100
```

Use this baseline first, then increase workers gradually. Prefer gthread for apps that wait on database or external I/O. For CPU-heavy workloads, test sync workers first.

What’s Happening

Gunicorn performance depends on worker model, worker count, threads, timeouts, and process recycling. Too little concurrency causes request queueing and higher latency. Too much concurrency causes CPU contention, memory pressure, and unstable behavior. Reliable tuning is a balance between throughput, latency, and resource usage.

Step-by-Step Guide

  1. Measure the server before changing settings.
    Check CPU cores, memory, and current load:
    ```bash
    nproc
    free -m
    uptime
    ```
  2. Identify whether the app is CPU-bound or I/O-bound.
    • Use sync if requests spend most of their time executing Python code or heavy computation.
    • Use gthread if requests often wait on PostgreSQL, external APIs, storage, or other I/O.
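A rough way to check is to compare wall-clock time with CPU time for a representative handler. The `classify_workload` helper and its 50% threshold below are illustrative assumptions for demonstration, not a Flask or Gunicorn API:

```python
import time

def classify_workload(handler):
    """Run a representative handler and compare wall time to CPU time.

    A high CPU share suggests sync workers; a low CPU share means the
    handler mostly waits, where gthread usually helps.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    handler()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return "cpu-bound" if cpu / wall > 0.5 else "io-bound"

# Simulated handlers: a busy loop vs. a blocking wait.
busy = lambda: sum(i * i for i in range(2_000_000))
waiting = lambda: time.sleep(0.2)
```

Run this against real handlers in a development shell, not in production; profiling real traffic gives the final answer.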
  3. Start with a conservative worker count.
    For a small VPS, begin with 2 to 4 workers. For a larger host, test this baseline formula:
    ```text
    workers = (2 x CPU cores) + 1
    ```

    This is only a starting point. Final values must be validated under load.
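Because the Gunicorn config file is plain Python, the formula can be computed at startup instead of hard-coded. A minimal `gunicorn.conf.py` fragment (treat the result as a starting point, as above):

```python
# gunicorn.conf.py fragment: derive the baseline worker count at boot.
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1
```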
  4. Add threads only for I/O-bound apps.
    Example baseline:
    ```bash
    gunicorn wsgi:app \
      --worker-class gthread \
      --workers 3 \
      --threads 4
    ```

    Keep thread counts modest, usually 2 to 8.
  5. Set conservative timeouts.
    Start with:
    ```bash
    --timeout 30 --graceful-timeout 30 --keep-alive 2
    ```

    Do not raise timeouts to hide slow application code; increase them only when a request is legitimately slow and its work cannot be moved out of the request path.
  6. Enable worker recycling.
    This helps reduce long-running memory growth and stale worker processes:
    ```bash
    --max-requests 1000 --max-requests-jitter 100
    ```
  7. Bind Gunicorn privately and let Nginx handle public traffic.
    Use either a localhost port or Unix socket:
    ```bash
    --bind 127.0.0.1:8000
    ```

    or

    ```bash
    --bind unix:/run/gunicorn.sock
    ```
  8. Move flags into a Gunicorn config file.
    Create gunicorn.conf.py:
    ```python
    bind = "unix:/run/gunicorn.sock"
    workers = 3
    worker_class = "gthread"
    threads = 4
    timeout = 30
    graceful_timeout = 30
    keepalive = 2
    max_requests = 1000
    max_requests_jitter = 100
    accesslog = "-"
    errorlog = "-"
    ```
  9. Update systemd to use the config file.
    Example unit fragment:
    ```ini
    [Service]
    User=www-data
    Group=www-data
    WorkingDirectory=/path/app
    Environment="PATH=/path/venv/bin"
    ExecStart=/path/venv/bin/gunicorn -c /path/app/gunicorn.conf.py wsgi:app
    Restart=always
    RestartSec=5
    ```

    Reload and restart:
    ```bash
    sudo systemctl daemon-reload
    sudo systemctl restart gunicorn
    sudo systemctl status gunicorn
    ```
  10. Load test after each change.

Test one change at a time:

```bash
hey -n 1000 -c 20 https://your-domain/
```

or

```bash
wrk -t2 -c20 -d30s https://your-domain/
```

Compare:

  • average latency
  • p95/p99 latency
  • error rate
  • CPU usage
  • memory usage
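If your load tool only prints raw latencies, p95/p99 can be computed in a few lines of Python using the nearest-rank method. The `percentile` helper below is illustrative, not part of any load-testing tool:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (any unit)."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

latencies_ms = list(range(1, 101))   # stand-in for measured values
p95 = percentile(latencies_ms, 95)   # 95
p99 = percentile(latencies_ms, 99)   # 99
```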
  11. Watch for signs of over-tuning.
  • If CPU is saturated continuously, reduce workers or threads.
  • If latency is high but CPU stays low, increase concurrency carefully.
  • If memory grows until the kernel kills processes, reduce concurrency and review recycling.
  12. Move slow work out of request handling.

Do not solve long-running requests by only increasing timeouts. Move tasks like:

  • email sending
  • report generation
  • file processing
  • multi-API fan-out

into a background job system.
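In production this usually means Celery, RQ, or a similar queue backed by a broker. Purely as an in-process illustration of the pattern (not a substitute for a real job system), the stdlib can sketch the handoff:

```python
import queue
import threading

jobs = queue.Queue()

def worker():
    """Drain the queue so request handlers can return immediately."""
    while True:
        task = jobs.get()
        if task is None:      # shutdown sentinel
            break
        try:
            task()            # e.g. send the email, build the report
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# Inside a request handler: enqueue and respond without waiting.
results = []
jobs.put(lambda: results.append("report generated"))
jobs.join()  # only for this demo; real handlers never block on the queue
```

A real job system adds persistence, retries, and worker processes outside Gunicorn, which is exactly why slow work belongs there.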

  13. Validate Nginx and Gunicorn together.

Check that Nginx upstream settings, socket path, and timeout values do not override or hide Gunicorn tuning issues. If needed, review Flask Nginx Performance Tuning Guide and Fix Flask 502 Bad Gateway (Step-by-Step Guide).

  14. Keep a known-good rollback profile.

Save the previous working gunicorn.conf.py and systemd unit values before each tuning round.

Common Causes

  • Too many workers → excessive context switching, high memory use, CPU thrash → reduce worker count and retest.
  • Too few workers → requests queue during spikes, high latency with low CPU utilization → increase workers or threads gradually.
  • Wrong worker class → poor performance for the workload pattern → use sync for CPU-heavy handling and gthread for moderate I/O-bound traffic.
  • Too many threads per worker → increased contention or memory use with little gain → lower thread count and test again.
  • Timeout too low → workers killed during valid slow requests → optimize the request path or increase timeout only when justified.
  • Timeout too high → stuck requests consume capacity too long → reduce timeout and move long work to background jobs.
  • No worker recycling → memory growth over time from leaks or fragmentation → add max-requests and max-requests-jitter.
  • Nginx timeout mismatch → Gunicorn appears slow or broken when the proxy is the actual limit → align proxy and upstream settings.
  • Database pool too small → workers block waiting for DB connections → tune SQLAlchemy or database pool settings.
  • Small VPS RAM limits → OOM kills or swap thrashing after concurrency increases → reduce workers and threads, then retest.
  • Application code is the bottleneck → Gunicorn tuning has little effect → profile database queries, external calls, rendering, and caching.

Debugging Section

Check service state and logs:

```bash
sudo systemctl status gunicorn
sudo journalctl -u gunicorn -n 200 --no-pager
```

Look for:

  • worker timeouts
  • boot failures
  • repeated restarts
  • import errors
  • signal exits

Check Gunicorn processes, CPU, and memory:

```bash
ps -o pid,ppid,%cpu,%mem,rss,cmd -C gunicorn
ps -eLf | grep gunicorn | grep -v grep
top -H -p $(pgrep -d',' -f gunicorn)
```

Look for:

  • more processes or threads than expected
  • runaway memory usage
  • constant CPU saturation
  • workers restarting repeatedly

Validate listening sockets:

```bash
ss -ltnp | grep 8000
ss -lx | grep gunicorn.sock
```

Confirm Gunicorn is listening on the same address or socket Nginx uses.

Check Nginx for upstream-related errors:

```bash
sudo journalctl -u nginx -n 100 --no-pager
sudo tail -n 100 /var/log/nginx/error.log
```

Look for:

  • connect() failed
  • upstream timed out
  • no live upstreams
  • socket permission errors

Check for OOM events:

```bash
dmesg -T | grep -i -E 'killed process|out of memory|oom'
```

If OOM events exist, lower concurrency or add memory.

Run a baseline load test:

```bash
hey -n 1000 -c 20 https://your-domain/
wrk -t2 -c20 -d30s https://your-domain/
```

If using SQLAlchemy, compare Gunicorn concurrency with the application DB pool size. If concurrency exceeds pool capacity, requests may block even when Gunicorn itself is healthy.
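A quick sanity check: each Gunicorn worker is a separate process with its own engine and pool, so compare per-worker thread concurrency against per-process pool capacity, and total worker count against what the database server allows. The sketch below assumes the baseline settings from this guide and SQLAlchemy's default `pool_size=5` and `max_overflow=10`; substitute your own values:

```python
workers, threads = 3, 4            # Gunicorn baseline from this guide
pool_size, max_overflow = 5, 10    # SQLAlchemy create_engine defaults

per_process_demand = threads                  # threads in one worker share one pool
per_process_capacity = pool_size + max_overflow

total_concurrency = workers * threads                        # 12 requests in flight
worst_case_db_connections = workers * per_process_capacity   # 45 server-side connections

# If demand exceeds capacity, requests block on connection checkout
# even though Gunicorn itself looks healthy.
assert per_process_demand <= per_process_capacity
```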

Checklist

  • Gunicorn worker class matches the app workload.
  • Worker count is based on CPU and RAM, not guesswork.
  • Threads are used only when they improve I/O-bound concurrency.
  • timeout and graceful-timeout are set deliberately.
  • max-requests and max-requests-jitter are enabled for long-running stability.
  • systemd uses a persistent Gunicorn config file or explicit flags.
  • Nginx points to the correct Gunicorn socket or host:port.
  • CPU, memory, latency, and error rate were checked after each change.
  • Long-running work is removed from request/response paths where possible.
  • A rollback configuration is available if tuning degrades performance.

FAQ

Q: How many Gunicorn workers should I start with?
A: Start with 2 to 4 workers on a small server, then measure CPU, memory, and latency. Increase gradually only if the system has headroom.

Q: When should I use gthread?
A: Use gthread when requests spend time waiting on a database, API, or filesystem and your app does not require a fully async stack.

Q: Should I use sync or gthread for CPU-heavy endpoints?
A: Start with sync for CPU-heavy workloads. Threads usually help less when Python execution is the bottleneck.

Q: Does increasing timeout improve performance?
A: No. It only allows slow requests to run longer. If the root cause is not fixed, it can reduce available capacity.

Q: What does max-requests-jitter do?
A: It randomizes worker restarts so all workers do not recycle at the same time.
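Conceptually, each worker draws its own restart threshold once at startup, roughly like this simplified sketch of Gunicorn's behavior:

```python
import random

max_requests, max_requests_jitter = 1000, 100

# Each worker picks its own limit at boot, so recycling spreads out
# instead of all workers restarting on the same request count.
limits = [max_requests + random.randint(0, max_requests_jitter) for _ in range(3)]
```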

Q: Why is Gunicorn still slow after tuning?
A: The bottleneck is often outside Gunicorn: slow SQL queries, external APIs, filesystem latency, missing caching, or proxy configuration. Review Flask Nginx Performance Tuning Guide and Flask Production Checklist (Everything You Must Do).

Final Takeaway

Gunicorn tuning is controlled capacity planning, not random flag changes. Start with a conservative baseline, match the worker model to the workload, measure under load, and adjust one variable at a time. If tuning does not improve results, the bottleneck is usually in application code, the database, or reverse proxy configuration rather than Gunicorn itself.