How to Scale ODK Central for 10,000+ Submissions: A Performance Tuning Guide

When a health NGO I worked with hit 15,000+ monthly submissions, their self-hosted ODK Central server slowed to a crawl—forms took 30 seconds to load, and syncs failed constantly. After weeks of optimizations, we got it handling 50K+ submissions smoothly.

Here’s how to scale ODK Central for high-volume deployments, whether you’re using Docker, Kubernetes, or cloud hosting.

Step 1: Infrastructure Scaling

A. Server Requirements for Heavy Loads

Component | Small (1K subs/month) | Large (10K+ subs/month)
--------- | --------------------- | -----------------------
CPU       | 2 cores               | 4+ cores
RAM       | 4GB                   | 16GB+
Storage   | 50GB SSD              | 200GB+ NVMe
Database  | Single PostgreSQL     | PostgreSQL + read replicas

B. Deployment Options

  1. Docker Compose (Simplest)
    • Increase resources in docker-compose.yml:

      services:
        central:
          deploy:
            resources:
              limits:
                cpus: '4'
                memory: 8G

  2. Kubernetes (Best for 50K+ Subs; see the Deployment sketch after this list)
  3. Cloud Hosting (AWS/GCP)
    • AWS Setup:
      • EC2: t3.xlarge (4 vCPU, 16GB RAM).
      • RDS PostgreSQL: db.m6g.large (HA setup).
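
Option 2 ships without a manifest here; as a rough starting point, a Deployment like the following spreads the Central frontend across replicas. This is a sketch, assuming you build and push your own image; the image name, container port, and resource figures are placeholders, not official ODK values:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: odk-central
spec:
  replicas: 3                                     # scale horizontally behind the Service
  selector:
    matchLabels:
      app: odk-central
  template:
    metadata:
      labels:
        app: odk-central
    spec:
      containers:
        - name: central
          image: your-registry/odk-central:latest   # placeholder: your own built image
          ports:
            - containerPort: 8383                   # placeholder: match your container
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "2"
              memory: 4Gi
---
apiVersion: v1
kind: Service
metadata:
  name: odk-central
spec:
  selector:
    app: odk-central
  ports:
    - port: 80
      targetPort: 8383

Keep PostgreSQL outside the cluster (managed, or on dedicated nodes) so the web pods stay stateless and cheap to scale.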

Step 2: Database Optimization

A. PostgreSQL Tuning

Edit /etc/postgresql/14/main/postgresql.conf:

max_connections = 200
shared_buffers = 4GB                  # 25% of RAM
effective_cache_size = 12GB           # 75% of RAM
maintenance_work_mem = 1GB
work_mem = 64MB
synchronous_commit = off              # Faster writes, but a crash can lose the most recent commits

Restart PostgreSQL:

sudo systemctl restart postgresql

B. Index Heavy Queries

-- For faster form listing
CREATE INDEX idx_submissions_form_id ON submissions (form_id);

-- For user activity tracking
CREATE INDEX idx_audits_actor_id ON audits (actor_id);

Step 3: Central Configuration Tweaks

A. Increase Worker Processes

Edit .env:

NODE_ENV=production
WEB_CONCURRENCY=4                    # Match CPU cores

B. Enable Caching

Add Redis to docker-compose.yml:

services:
  redis:
    image: redis:alpine
  central:
    environment:
      REDIS_URL: redis://redis

C. Optimize File Storage

  • For AWS S3, add to .env:

      S3_BUCKET=your-bucket
      S3_ACCESS_KEY=xxx
      S3_SECRET_KEY=xxx

  • For local storage, mount /tmp as tmpfs in docker-compose.yml:

      central:
        tmpfs:
          - /tmp

Step 4: Load Balancing & High Availability

A. Multiple Central Instances

  1. Duplicate Central services in docker-compose.yml (a fuller compose sketch follows this list):

      central1:
        <<: *central
        ports: ["3000:3000"]
      central2:
        <<: *central
        ports: ["3001:3000"]

  2. Add an NGINX load balancer:

      upstream central {
        server central1:3000;
        server central2:3000;
      }
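
The <<: *central merge keys above assume a shared base service definition. Here is a minimal sketch of how the pieces can fit together, with service names and build settings as stand-ins for your own:

x-central: &central
  build: .                 # stand-in for your existing Central service definition
  env_file: .env

services:
  central1:
    <<: *central
  central2:
    <<: *central
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro   # holds the upstream block above
    depends_on:
      - central1
      - central2

With NGINX routing traffic inside the compose network, the host port mappings on central1/central2 become optional.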

B. Database Replication

  • Set up 1 primary + 2 read replicas (using AWS RDS or Patroni); a minimal Patroni sketch follows.
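
If you self-host replication with Patroni instead of RDS, each PostgreSQL node runs with a config along these lines; this is a sketch, and the node addresses, etcd endpoint, and credentials are placeholders:

scope: odk-central
name: pg-node1
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.11:8008        # placeholder node address
etcd3:
  hosts: 10.0.0.5:2379                   # placeholder DCS endpoint
bootstrap:
  dcs:
    postgresql:
      parameters:
        hot_standby: "on"                # replicas serve read-only queries
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.11:5432
  data_dir: /var/lib/postgresql/14/main
  authentication:
    replication:
      username: replicator
      password: change-me                # placeholder credential

Patroni then handles leader election and automatic failover through the DCS (etcd here).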

Step 5: Monitoring & Maintenance

A. Key Metrics to Watch

Metric             | Tool         | Alert Threshold
------------------ | ------------ | ---------------
CPU Usage          | Prometheus   | >70% for 5 min
Database Latency   | pgAdmin      | >500ms
Failed Submissions | Central Logs | >1% of total
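
As one concrete example, the CPU row above can be written as a Prometheus alerting rule; this sketch assumes node_exporter metrics and an existing Alertmanager setup:

groups:
  - name: odk-central
    rules:
      - alert: CentralHighCPU
        # 100 minus average idle time = CPU utilization per instance
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 70
        for: 5m                          # matches the ">70% for 5 min" threshold
        labels:
          severity: warning
        annotations:
          summary: "CPU above 70% for 5 minutes on {{ $labels.instance }}"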

B. Automated Cleanup

Add a cron job to prune old submissions (note this is a permanent delete at the database level, so export or back them up first):

0 3 * * * docker exec central-pg psql -U odkcentral -c "DELETE FROM submissions WHERE created_at < NOW() - INTERVAL '6 months'"

Real-World Example: National Survey

For a 120K-submission education survey, we:

  1. Deployed on AWS (4 nodes + RDS).
  2. Reduced form load time from 12s → 0.8s with Redis caching.
  3. Cut storage costs by 60% using S3 lifecycle policies (sketched below).
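
Expressed as a CloudFormation snippet, the lifecycle idea from point 3 looks roughly like this; the bucket name and day counts are illustrative, not the survey's actual values:

Resources:
  SubmissionMediaBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: your-odk-media-bucket          # placeholder
      LifecycleConfiguration:
        Rules:
          - Id: tier-old-media
            Status: Enabled
            Transitions:
              - StorageClass: STANDARD_IA        # infrequent-access tier after 90 days
                TransitionInDays: 90
              - StorageClass: GLACIER            # archive tier after a year
                TransitionInDays: 365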


P.S. What’s your biggest scaling challenge? Share below! 👇
