Flask Gunicorn Performance Tuning Guide
If you're trying to improve Flask performance in production or stabilize Gunicorn under load, this guide shows you how to tune Gunicorn step by step. The goal is to set safe defaults, match worker settings to your app profile, and validate performance with logs and runtime checks.
Quick Fix / Quick Setup
Use this as a safe baseline for a small to medium Flask app behind Nginx:
gunicorn wsgi:app \
--bind 127.0.0.1:8000 \
--workers 3 \
--worker-class gthread \
--threads 4 \
--timeout 30 \
--graceful-timeout 30 \
--keep-alive 2 \
--max-requests 1000 \
--max-requests-jitter 100 \
--access-logfile - \
--error-logfile -
Example systemd ExecStart:
ExecStart=/path/venv/bin/gunicorn wsgi:app \
--bind unix:/run/gunicorn.sock \
--workers 3 \
--worker-class gthread \
--threads 4 \
--timeout 30 \
--graceful-timeout 30 \
--keep-alive 2 \
--max-requests 1000 \
--max-requests-jitter 100
Use this baseline first, then increase workers gradually. Prefer gthread for apps that wait on database or external I/O. For CPU-heavy workloads, test sync workers first.
What’s Happening
Gunicorn performance depends on worker model, worker count, threads, timeouts, and process recycling. Too little concurrency causes request queueing and higher latency. Too much concurrency causes CPU contention, memory pressure, and unstable behavior. Reliable tuning is a balance between throughput, latency, and resource usage.
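To make the balance concrete, here is a minimal sketch that compares the request slots a gthread setup provides with the concurrency the traffic needs. It uses Little's law (average in-flight requests = arrival rate x mean latency). The function names and the traffic numbers are illustrative assumptions, not values from a real deployment.

```python
import math


def required_concurrency(requests_per_second: float, mean_latency_seconds: float) -> int:
    """Little's law: average in-flight requests = arrival rate x mean latency."""
    if requests_per_second < 0 or mean_latency_seconds < 0:
        raise ValueError("inputs must be non-negative")
    # round() removes floating-point noise before rounding up to whole slots
    return math.ceil(round(requests_per_second * mean_latency_seconds, 9))


def capacity(workers: int, threads: int) -> int:
    """Request slots for gthread: each worker serves up to `threads` requests at once."""
    return workers * threads


# Assumed traffic: 40 requests/second at 200 ms mean latency
slots = capacity(workers=3, threads=4)
needed = required_concurrency(requests_per_second=40, mean_latency_seconds=0.2)
print(f"slots={slots} needed={needed} headroom={slots - needed}")
```

If the needed value approaches the slot count, requests start to queue during spikes. If slots far exceed it, you are probably paying for idle memory.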
Step-by-Step Guide
- Measure the server before changing settings.
Check CPU cores, memory, and current load:
nproc
free -m
uptime
- Identify whether the app is CPU-bound or I/O-bound.
- Use sync if requests spend most of their time executing Python code or heavy computation.
- Use gthread if requests often wait on PostgreSQL, external APIs, storage, or other I/O.
- Start with a conservative worker count.
For a small VPS, begin with 2 to 4 workers. For a larger host, test this baseline formula:
workers = (2 x CPU cores) + 1
This is only a starting point. Final values must be validated under load.
- Add threads only for I/O-bound apps.
Example baseline:
gunicorn wsgi:app \
--worker-class gthread \
--workers 3 \
--threads 4
Keep thread counts modest, usually 2 to 8.
- Set conservative timeouts.
Start with:
--timeout 30
--graceful-timeout 30
--keep-alive 2
Do not raise timeouts to hide slow application code unless the request is valid and cannot be moved out of the request path.
- Enable worker recycling.
This helps reduce long-running memory growth and stale worker processes:
--max-requests 1000
--max-requests-jitter 100
- Bind Gunicorn privately and let Nginx handle public traffic.
Use either a localhost port or Unix socket:
--bind 127.0.0.1:8000
or
--bind unix:/run/gunicorn.sock
- Move flags into a Gunicorn config file.
Create gunicorn.conf.py:
bind = "unix:/run/gunicorn.sock"
workers = 3
worker_class = "gthread"
threads = 4
timeout = 30
graceful_timeout = 30
keepalive = 2
max_requests = 1000
max_requests_jitter = 100
accesslog = "-"
errorlog = "-"
- Update systemd to use the config file.
Example unit fragment:
[Service]
User=www-data
Group=www-data
WorkingDirectory=/path/app
Environment="PATH=/path/venv/bin"
ExecStart=/path/venv/bin/gunicorn -c /path/app/gunicorn.conf.py wsgi:app
Restart=always
RestartSec=5
Reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart gunicorn
sudo systemctl status gunicorn
- Load test after each change.
Test one change at a time:
hey -n 1000 -c 20 https://your-domain/
or
wrk -t2 -c20 -d30s https://your-domain/
Compare:
- average latency
- p95/p99 latency
- error rate
- CPU usage
- memory usage
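To compare runs on the metrics above, a small helper can summarize raw latency samples. This is a sketch under assumptions: `summarize` and `percentile` are hypothetical names, the samples are made up, and the percentile uses the nearest-rank method, which may differ slightly from what hey or wrk report.

```python
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (pct is an integer 1..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms, errors=0):
    """Average, p95, p99, and error rate for one load-test run."""
    ordered = sorted(latencies_ms)
    total = len(ordered) + errors
    return {
        "avg_ms": statistics.fmean(ordered),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "error_rate": errors / total,
    }


# Made-up samples: successful request latencies in milliseconds, plus 2 failed requests
before = summarize([120, 130, 125, 400, 135, 128, 122, 900, 131, 127], errors=2)
print(before)
```

Run it once per tuning round and keep the output next to the configuration that produced it, so each change can be judged against the previous one.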
- Watch for signs of over-tuning.
- If CPU is saturated continuously, reduce workers or threads.
- If latency is high but CPU stays low, increase concurrency carefully.
- If memory grows until the kernel kills processes, reduce concurrency and review recycling.
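The three rules above can be written as a small decision helper. This is only a sketch: the function name, the 90% CPU threshold, and the p95 target are assumptions you would replace with your own limits.

```python
def next_tuning_step(cpu_percent, p95_ms, p95_target_ms, oom_kills, cpu_saturated_at=90):
    """Map observed symptoms to the adjustment suggested in this guide.

    Thresholds are illustrative assumptions, not Gunicorn defaults.
    """
    if oom_kills > 0:
        # Memory exhaustion takes priority over latency and CPU signals
        return "reduce concurrency and review recycling"
    if cpu_percent >= cpu_saturated_at:
        return "reduce workers or threads"
    if p95_ms > p95_target_ms:
        # Latency is high while CPU has headroom
        return "increase concurrency carefully"
    return "keep current settings"


print(next_tuning_step(cpu_percent=35, p95_ms=800, p95_target_ms=300, oom_kills=0))
```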
- Move slow work out of request handling.
Do not solve long-running requests by only increasing timeouts. Move tasks like:
- email sending
- report generation
- file processing
- multi-API fan-out
into a background job system.
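The pattern looks roughly like the sketch below: the request handler only enqueues the task and returns immediately, and a separate consumer does the slow work. It uses an in-process queue and thread purely as a stand-in. Under Gunicorn, an in-process queue is per worker and is lost when the worker recycles, so a real deployment would use a dedicated job system such as Celery or RQ. All names here (`handle_request`, `send_email`, `start_worker`) are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def send_email(recipient):
    """Hypothetical slow task that should not run inside the request."""
    return f"sent to {recipient}"


def worker_loop():
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            func, args = job
            results.append(func(*args))
        finally:
            jobs.task_done()


def start_worker():
    thread = threading.Thread(target=worker_loop, daemon=True)
    thread.start()
    return thread


def handle_request(recipient):
    """What a view would do: enqueue the work and answer right away (HTTP 202)."""
    jobs.put((send_email, (recipient,)))
    return {"status": "queued"}, 202
```

The request now finishes in the time it takes to enqueue the job, so the Gunicorn timeout can stay low.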
- Validate Nginx and Gunicorn together.
Check that Nginx upstream settings, socket path, and timeout values do not override or hide Gunicorn tuning issues. If needed, review Flask Nginx Performance Tuning Guide and Fix Flask 502 Bad Gateway (Step-by-Step Guide).
- Keep a known-good rollback profile.
Save the previous working gunicorn.conf.py and systemd unit values before each tuning round.
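The worker-count formula from the steps above can be computed inside gunicorn.conf.py instead of being hard-coded. This is a sketch under assumptions: `suggested_workers` is a hypothetical helper, the per-worker memory figure must come from your own measurements, and `GUNICORN_WORKERS` is an illustrative environment variable name chosen for this example.

```python
import multiprocessing
import os


def suggested_workers(cpu_cores, ram_mb=None, per_worker_mb=None, hard_cap=None):
    """Baseline (2 x CPU cores) + 1, optionally capped by a memory budget.

    ram_mb is the memory you are willing to give Gunicorn, not total host RAM.
    """
    count = 2 * cpu_cores + 1
    if ram_mb is not None and per_worker_mb:
        count = min(count, max(1, ram_mb // per_worker_mb))
    if hard_cap is not None:
        count = min(count, hard_cap)
    return max(1, count)


# In gunicorn.conf.py: an explicit override wins, otherwise use the formula
workers = int(os.environ.get("GUNICORN_WORKERS", suggested_workers(multiprocessing.cpu_count())))
```

For example, suggested_workers(4, ram_mb=1024, per_worker_mb=256) returns 4 rather than 9, because memory is the tighter limit. Treat the result as a starting point to validate under load.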
Common Causes
- Too many workers → excessive context switching, high memory use, CPU thrash → reduce worker count and retest.
- Too few workers → requests queue during spikes, high latency with low CPU utilization → increase workers or threads gradually.
- Wrong worker class → poor performance for the workload pattern → use sync for CPU-heavy handling and gthread for moderate I/O-bound traffic.
- Too many threads per worker → increased contention or memory use with little gain → lower thread count and test again.
- Timeout too low → workers killed during valid slow requests → optimize the request path or increase timeout only when justified.
- Timeout too high → stuck requests consume capacity too long → reduce timeout and move long work to background jobs.
- No worker recycling → memory growth over time from leaks or fragmentation → add max-requests and max-requests-jitter.
- Nginx timeout mismatch → Gunicorn appears slow or broken when the proxy is the actual limit → align proxy and upstream settings.
- Database pool too small → workers block waiting for DB connections → tune SQLAlchemy or database pool settings.
- Small VPS RAM limits → OOM kills or swap thrashing after concurrency increases → reduce workers and threads, then retest.
- Application code is the bottleneck → Gunicorn tuning has little effect → profile database queries, external calls, rendering, and caching.
Debugging Section
Check service state and logs:
sudo systemctl status gunicorn
sudo journalctl -u gunicorn -n 200 --no-pager
Look for:
- worker timeouts
- boot failures
- repeated restarts
- import errors
- signal exits
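To get a quick count of these events from a log dump, a small scanner can help. This is a sketch: the marker strings are the ones Gunicorn and Python commonly emit as far as I know, but exact wording can vary by version, so adjust them to what your logs actually contain. The sample lines are made up for illustration.

```python
from collections import Counter

# Substrings to look for; treat these as assumptions and adapt to your own logs
MARKERS = {
    "worker_timeout": "WORKER TIMEOUT",
    "worker_boot": "Booting worker with pid",
    "import_error": "ModuleNotFoundError",
}


def count_markers(lines):
    """Count how many lines contain each marker substring."""
    counts = Counter()
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts


# Illustrative lines, not real output; in practice pass sys.stdin or an open log file
sample = [
    "[CRITICAL] WORKER TIMEOUT (pid:4121)",
    "[INFO] Booting worker with pid: 4150",
    "[INFO] Booting worker with pid: 4151",
]
print(dict(count_markers(sample)))
```

Many boot lines in a short window usually mean workers are restarting repeatedly, which is worth investigating before changing any tuning flags.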
Check Gunicorn processes, CPU, and memory:
ps -o pid,ppid,%cpu,%mem,rss,cmd -C gunicorn
ps -eLf | grep gunicorn | grep -v grep
top -H -p $(pgrep -d',' -f gunicorn)
Look for:
- more processes or threads than expected
- runaway memory usage
- constant CPU saturation
- workers restarting repeatedly
Validate listening sockets:
ss -ltnp | grep 8000
ss -lx | grep gunicorn.sock
Confirm Gunicorn is listening on the same address or socket Nginx uses.
Check Nginx for upstream-related errors:
sudo journalctl -u nginx -n 100 --no-pager
sudo tail -n 100 /var/log/nginx/error.log
Look for:
- connect() failed
- upstream timed out
- no live upstreams
- socket permission errors
Check for OOM events:
dmesg -T | grep -i -E 'killed process|out of memory|oom'
If OOM events exist, lower concurrency or add memory.
Run a baseline load test:
hey -n 1000 -c 20 https://your-domain/
wrk -t2 -c20 -d30s https://your-domain/
If using SQLAlchemy, compare Gunicorn concurrency with the application DB pool size. If concurrency exceeds pool capacity, requests may block even when Gunicorn itself is healthy.
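A rough way to check this is sketched below. It assumes each Gunicorn worker process has its own SQLAlchemy engine and pool, that each thread holds at most one connection at a time, and that the defaults are SQLAlchemy's QueuePool values (pool_size=5, max_overflow=10) and PostgreSQL's default max_connections of 100. `pool_report` is a hypothetical helper name.

```python
def pool_report(workers, threads, pool_size=5, max_overflow=10, db_max_connections=100):
    """Compare Gunicorn concurrency with per-process pool capacity and the DB limit."""
    per_process_limit = pool_size + max_overflow
    total_possible = workers * per_process_limit
    return {
        # More threads than pool slots: some threads wait for a connection
        "threads_can_block_on_pool": threads > per_process_limit,
        # Worst case if every worker opens its full pool plus overflow
        "max_db_connections_used": total_possible,
        "exceeds_db_limit": total_possible > db_max_connections,
    }


print(pool_report(workers=3, threads=4))
print(pool_report(workers=3, threads=8, pool_size=5, max_overflow=0))
```

The second call reports that threads can block: 8 threads share 5 connections per worker, so requests wait on the pool even though Gunicorn has free slots.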
Checklist
- Gunicorn worker class matches the app workload.
- Worker count is based on CPU and RAM, not guesswork.
- Threads are used only when they improve I/O-bound concurrency.
- timeout and graceful-timeout are set deliberately.
- max-requests and max-requests-jitter are enabled for long-running stability.
- systemd uses a persistent Gunicorn config file or explicit flags.
- Nginx points to the correct Gunicorn socket or host:port.
- CPU, memory, latency, and error rate were checked after each change.
- Long-running work is removed from request/response paths where possible.
- A rollback configuration is available if tuning degrades performance.
Related Guides
- Deploy Flask with Nginx + Gunicorn (Step-by-Step Guide)
- Flask Production Checklist (Everything You Must Do)
- Fix Flask 502 Bad Gateway (Step-by-Step Guide)
- Flask Gunicorn Service Failed to Start
FAQ
Q: How many Gunicorn workers should I start with?
A: Start with 2 to 4 workers on a small server, then measure CPU, memory, and latency. Increase gradually only if the system has headroom.
Q: When should I use gthread?
A: Use gthread when requests spend time waiting on a database, API, or filesystem and your app does not require a fully async stack.
Q: Should I use sync or gthread for CPU-heavy endpoints?
A: Start with sync for CPU-heavy workloads. Threads usually help less when Python execution is the bottleneck.
Q: Does increasing timeout improve performance?
A: No. It only allows slow requests to run longer. If the root cause is not fixed, it can reduce available capacity.
Q: What does max-requests-jitter do?
A: It randomizes worker restarts so all workers do not recycle at the same time.
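As a small illustration, the sketch below simulates per-worker restart thresholds. It assumes each worker adds a random offset between 0 and the jitter value to max-requests, which is how the Gunicorn documentation describes the setting. The function name and seed are illustrative.

```python
import random


def recycle_thresholds(worker_count, max_requests=1000, jitter=100, seed=None):
    """Simulated request count at which each worker would restart."""
    rng = random.Random(seed)
    return [max_requests + rng.randint(0, jitter) for _ in range(worker_count)]


print(recycle_thresholds(3, seed=42))    # spread between 1000 and 1100
print(recycle_thresholds(3, jitter=0))   # all workers restart at 1000
```

Without jitter, workers that receive similar traffic reach the limit at about the same time and restart together, which briefly reduces capacity.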
Q: Why is Gunicorn still slow after tuning?
A: The bottleneck is often outside Gunicorn: slow SQL queries, external APIs, filesystem latency, missing caching, or proxy configuration. Review Flask Nginx Performance Tuning Guide and Flask Production Checklist (Everything You Must Do).
Final Takeaway
Gunicorn tuning is controlled capacity planning, not random flag changes. Start with a conservative baseline, match the worker model to the workload, measure under load, and adjust one variable at a time. If tuning does not improve results, the bottleneck is usually in application code, the database, or reverse proxy configuration rather than Gunicorn itself.