TerrierTA

Academic AI

Production evaluation pipeline with rubric-aligned generative assessment and self-consistency verification. Retrieval-augmented feedback synthesis across configurable assessment criteria. 500+ documents per evaluation cycle, 3 institutional deployments, 60% overhead reduction.

Python, LangGraph, FastAPI, React, PostgreSQL, Redis, RAG, Multi-Agent, Groq API

About

Grading hundreds of assignments by hand means slow, uneven feedback, and quality drops as the volume grows. TerrierTA gives consistent, rubric-grounded feedback at scale, with a human in the loop where it counts. It runs as a production evaluation pipeline across 3 institutional deployments and processes 500+ documents per evaluation cycle. The architecture addresses four core challenges: maintaining assessment consistency, grounding generative feedback in configurable rubric criteria, balancing throughput with granularity, and providing human-in-the-loop override without slowing the pipeline.

The Problem

High-throughput evaluation environments face a fundamental scaling constraint. Manual assessment at scale results in delayed feedback cycles, inconsistent evaluation across parallel tracks, and reduced granularity under time pressure.

The Approach

The system addresses the consistency-throughput tradeoff through rubric-aligned generation with self-consistency verification at the output boundary. Uncertain evaluations are flagged for human review. Configurable assessment criteria allow adaptation across deployment targets without architectural changes.
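The page does not say how self-consistency verification is implemented, so the following is only a minimal sketch of one common form: sample the grader several times per rubric criterion, take the majority score, and flag the result for human review when too few samples agree. Every name here (`Criterion`, `Verdict`, `self_consistent_score`, `grade_once`, `samples`, `min_agreement`) is hypothetical and chosen for illustration; `grade_once` stands in for whatever model call the real pipeline makes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One configurable rubric criterion (hypothetical schema)."""
    name: str
    max_points: int


@dataclass(frozen=True)
class Verdict:
    criterion: str
    score: int           # majority score across samples
    agreement: float     # share of samples that matched the majority score
    needs_review: bool   # True when agreement is below the threshold


def self_consistent_score(
    criterion: Criterion,
    submission: str,
    grade_once: Callable[[Criterion, str], int],
    samples: int = 5,
    min_agreement: float = 0.8,
) -> Verdict:
    """Grade the same submission several times and majority-vote the score."""
    if samples < 1:
        raise ValueError("samples must be at least 1")

    # Assumption: grade_once is a non-deterministic grader (e.g. an LLM call).
    raw = [grade_once(criterion, submission) for _ in range(samples)]

    # Clamp each sample into the valid range for this criterion.
    scores = [max(0, min(criterion.max_points, s)) for s in raw]

    majority, count = Counter(scores).most_common(1)[0]
    agreement = count / samples

    # Low agreement is treated as uncertainty and routed to a human.
    return Verdict(criterion.name, majority, agreement, agreement < min_agreement)
```

Because the rubric is passed in as data, a different deployment could supply its own list of `Criterion` values without changing the function. The 0.8 threshold and the sample count of 5 are placeholder defaults, not values taken from the project.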

Tech Stack

  • Frontend: React 18, TypeScript, TailwindCSS, Vite
  • Backend: Python 3.11, FastAPI, PostgreSQL, Redis, Celery
  • AI/ML: Groq API / OpenAI, LangGraph, Sentence Transformers, RAG Pipeline, Custom Rubric Engine