Full-stack, end-to-end RLHF data collection and annotation platform, modeled on the kind of human-feedback pipeline used to train assistants such as ChatGPT and Claude.
tasks_api.py: Researchers create coding or reasoning tasks with prompts, expected behaviors, and evaluation criteria. Tasks are versioned and tagged.
annotations_api.py: Annotators are assigned tasks from a queue and submit reward signals, qualitative feedback, and binary preference labels for agent outputs.
feedback_api.py: Collects scalar rewards, pairwise rankings, and free-text rationales per task, and aggregates inter-annotator agreement scores.
quality_worker.py: Monitors annotator consistency, task completion rates, feedback distributions, and statistical outliers in reward signals.
metrics_api.py: Real-time Grafana dashboard of task throughput, reward distributions, annotator performance, and dataset health.
export_worker.py: Generates clean JSONL/Parquet files formatted for preference-based fine-tuning (RLHF, DPO), filtering out data below quality thresholds before export.
backend: FastAPI with auto-generated OpenAPI docs, async endpoints, and strict request/response validation. SQLAlchemy 2.0 async ORM, with Alembic for migrations.
frontend: React Query (TanStack Query) for server state, Zustand for UI state, React Hook Form for task-creation forms, and Tailwind CSS for styling.
data: Postgres for all relational data, with JSONB for task metadata. Redis for the Celery task queues and for caching hot metrics queries.
workers: Async workers for quality score computation, dataset export generation, annotator assignment, and notification dispatch.
observability: FastAPI instrumented with prometheus-fastapi-instrumentator, plus dashboards for reward distributions, task throughput, and annotator stats.
infra: Docker Compose for local development. Deployed to Render (API, worker, frontend); GitHub Actions auto-deploys on push to main. Upstash Redis with TLS backs the production queue.
Authentication (implemented): JWT tokens with role-based access; researcher and annotator endpoints are separated and enforced at the API level. Passwords are hashed with bcrypt + SHA-256.
Rate limiting (implemented): Login is limited to 10 requests/min per IP and registration to 5/min, to mitigate brute-force attacks. Implemented via slowapi.
CORS (implemented): Locked to the frontend origin only, with no wildcard origins. Configured per environment via the ALLOWED_ORIGINS env var.
Secrets (implemented): All secrets are stored as environment variables on Render. .env and prometheus.yml are excluded from git, and Groq and Grafana tokens are never committed.
CI/CD (implemented): GitHub Actions triggers on push to main and deploys to Render via deploy hooks. The API and frontend deploy in parallel; the worker deploys only after the API deploy succeeds.
Monitoring (implemented): Prometheus scrapes /metrics every 60 s and remote-writes to Grafana Cloud. Dashboards cover request rate, p95 latency, error rate, and per-endpoint breakdown.
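The platform aggregates inter-annotator agreement scores, but the statistic is not named. As one plausible choice, this is a minimal sketch of Cohen's kappa over two annotators' preference labels for the same items. `cohens_kappa` is a hypothetical helper, not the platform's code; setups with more than two annotators per item would need a different statistic such as Fleiss' kappa or Krippendorff's alpha.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently at their own base rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical preference labels ("A" = response A preferred) from two annotators.
    print(cohens_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # 0.5
```

In the usage example the annotators agree on 75% of items, but half of that agreement is expected by chance from their label frequencies alone, so kappa comes out at 0.5.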
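Quality monitoring flags statistical outliers in reward signals, without saying how. A robust option is the modified z-score of Iglewicz and Hoaglin, which scales deviations by the median absolute deviation (MAD) instead of the standard deviation. This is a sketch under that assumption: `reward_outliers` is a hypothetical name, and the 3.5 cutoff is a common convention rather than a value taken from the project.

```python
import statistics
from typing import Sequence


def reward_outliers(rewards: Sequence[float], threshold: float = 3.5) -> list[int]:
    """Return indices of rewards whose modified z-score exceeds `threshold`."""
    median = statistics.median(rewards)
    # MAD: a spread estimate that a few extreme rewards cannot inflate.
    mad = statistics.median(abs(r - median) for r in rewards)
    if mad == 0:
        return []  # more than half the rewards equal the median; no robust spread to scale by
    # 0.6745 makes MAD comparable to a standard deviation for normally distributed data.
    return [i for i, r in enumerate(rewards) if abs(0.6745 * (r - median) / mad) > threshold]


if __name__ == "__main__":
    # Hypothetical scalar rewards for one task; the last one looks like a mis-click.
    print(reward_outliers([0.8, 0.7, 0.75, 0.72, 0.78, -5.0]))  # [5]
```

Median and MAD are used instead of mean and standard deviation because a single extreme reward would inflate the standard deviation enough to hide itself.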
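The export step writes JSONL for DPO after filtering by quality thresholds. A minimal sketch of that idea follows, assuming the prompt/chosen/rejected record layout commonly used by DPO trainers such as Hugging Face TRL. The `PreferencePair` shape, the `quality_score` field, and the 0.7 cutoff are all hypothetical; Parquet output, also mentioned above, is not shown.

```python
import io
import json
from dataclasses import dataclass
from typing import Iterable, TextIO


@dataclass
class PreferencePair:
    # Hypothetical record shape; the real schema lives in the project's database models.
    prompt: str
    chosen: str
    rejected: str
    quality_score: float  # assumed aggregate quality/agreement score in [0, 1]


def export_dpo_jsonl(pairs: Iterable[PreferencePair], out: TextIO, min_quality: float = 0.7) -> int:
    """Write one JSON object per line for pairs at or above `min_quality`; return the count."""
    written = 0
    for pair in pairs:
        if pair.quality_score < min_quality:
            continue  # drop low-quality pairs before they reach the training set
        record = {"prompt": pair.prompt, "chosen": pair.chosen, "rejected": pair.rejected}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written


if __name__ == "__main__":
    buf = io.StringIO()
    pairs = [
        PreferencePair("What is 2 + 2?", "4", "5", 0.92),
        PreferencePair("Name a prime.", "7", "9", 0.41),  # below threshold, skipped
    ]
    print(export_dpo_jsonl(pairs, buf))  # 1
    print(buf.getvalue(), end="")
```

Writing to any text stream (a file, `io.StringIO`, or a compressed wrapper) keeps the filter logic independent of where the export is stored.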
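Annotators are assigned tasks from a queue, and a background worker handles annotator assignment, but the policy is not described. One simple policy is least-loaded assignment with a min-heap, giving each task several distinct annotators so agreement can be measured. This is a sketch under those assumptions; `assign_tasks` and the `per_task=2` overlap are hypothetical.

```python
import heapq
from collections import defaultdict


def assign_tasks(task_ids: list[str], annotators: list[str], per_task: int = 2) -> dict[str, list[str]]:
    """Give each task `per_task` distinct annotators, always picking the least-loaded ones."""
    if per_task > len(annotators):
        raise ValueError("not enough annotators for the requested overlap")
    # Min-heap of (assigned task count, annotator id); ties break alphabetically by id.
    heap = [(0, a) for a in annotators]
    heapq.heapify(heap)
    assignments: dict[str, list[str]] = defaultdict(list)
    for task in task_ids:
        # Popping `per_task` entries guarantees distinct annotators for this task.
        picked = [heapq.heappop(heap) for _ in range(per_task)]
        for load, annotator in picked:
            assignments[annotator].append(task)
            heapq.heappush(heap, (load + 1, annotator))
    return dict(assignments)


if __name__ == "__main__":
    print(assign_tasks(["t1", "t2", "t3"], ["ann", "bob", "cy"]))
    # {'ann': ['t1', 't2'], 'bob': ['t1', 't3'], 'cy': ['t2', 't3']}
```

A production version would also need to account for tasks already in flight and annotators going offline; this sketch only balances counts within one batch.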
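CORS origins are configured per environment through ALLOWED_ORIGINS with no wildcards. A small sketch of reading that variable follows, assuming it holds a comma-separated list (the format is not stated above). `load_allowed_origins` is a hypothetical helper that fails fast on an empty or wildcard value.

```python
import os


def load_allowed_origins(env_var: str = "ALLOWED_ORIGINS") -> list[str]:
    """Parse a comma-separated origin list from the environment, refusing wildcards."""
    raw = os.environ.get(env_var, "")
    # Trim whitespace and trailing slashes; browsers send Origin without a trailing slash.
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    if not origins:
        raise RuntimeError(f"{env_var} is empty; refusing to start with no CORS origins")
    if "*" in origins:
        raise RuntimeError(f"{env_var} must list explicit origins, not '*'")
    return origins
```

In a FastAPI app the resulting list would be passed as `allow_origins` to `fastapi.middleware.cors.CORSMiddleware` via `app.add_middleware(...)`, so a misconfigured deploy fails at startup instead of silently accepting every origin.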
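Login and registration are rate-limited per IP with slowapi. To make the "10 requests/min per IP" semantics concrete, here is a plain sliding-window limiter. It is a stand-in for illustration, not slowapi's implementation or API (slowapi's own windowing strategy may differ). `SlidingWindowLimiter` is hypothetical, and the injectable `clock` exists only to make it testable.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key (e.g. a client IP)."""

    def __init__(self, limit: int, window: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

With `SlidingWindowLimiter(limit=10)` keyed by client IP, the eleventh login attempt within any 60-second span is refused, while other IPs are unaffected; a registration limiter would use `limit=5`.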