A 3.35B-parameter model, a LangGraph agent, and an open-source platform walk into a support queue. What follows is a surprisingly competitive multilingual pipeline.
The gist: We built a multilingual support agent with Cohere's Tiny Aya (3.35B) on a LangGraph + Idun Engine stack. The 3.35B model handles 23 languages and code-mixed prompts well enough to triage real tickets, at ~1/40th the inference cost of GPT-4-class models. Full architecture, eval methodology, and production deployment patterns below.
Enterprise support does not happen in one language. A French engineer files a bug report with a full stack trace. A Turkish operations team submits a ticket about a failing backup schedule, complete with cron expressions and log excerpts. A Japanese developer describes a memory leak, referencing heap dumps and JVM flags.
Before anyone can triage, classify, or respond to these tickets, someone (or something) needs to understand them. And understanding a multilingual technical support ticket is harder than it sounds. You need to preserve error codes, API endpoints, SQL fragments, and the causal reasoning that makes a bug report actionable. Translate "le serveur renvoie une erreur 504" wrong and you lose the status code. Paraphrase a stack trace and you lose the debugging path entirely.
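To make the "lost status code" failure concrete, here is a minimal sketch of a check that flags technical tokens present in the source ticket but missing from its translation. The regex patterns, the function names, and the `/api/v1/backups` path are illustrative assumptions, not part of the pipeline described in this post.

```python
import re

# Hypothetical patterns for tokens that must survive translation untouched.
TECH_TOKEN = re.compile(
    r"\b\d{3,}\b"               # numeric codes such as 504
    r"|/[\w./-]+"               # paths such as /api/v1/backups
    r"|\b[A-Z][A-Z0-9_]{2,}\b"  # constants such as ECONNRESET
)


def technical_tokens(text: str) -> set[str]:
    """Return the set of technical tokens found in the text."""
    return set(TECH_TOKEN.findall(text))


def missing_tokens(source: str, translation: str) -> set[str]:
    """Return tokens that appear in the source but not in the translation."""
    return technical_tokens(source) - technical_tokens(translation)


source = "le serveur renvoie une erreur 504 sur /api/v1/backups"
faithful = "the server returns a 504 error on /api/v1/backups"
lossy = "the server returns a gateway error"

print(missing_tokens(source, faithful))
print(missing_tokens(source, lossy))
```

A check like this is deliberately crude: it only catches tokens that vanish entirely and says nothing about whether the surrounding reasoning was translated correctly.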
The standard approach is to route everything through a large model API, typically 30B+ parameters on a cloud endpoint. It works. It is also slow (10 to 30 seconds per request), expensive at scale, and creates a hard dependency on external infrastructure.
We wanted to see if we could build a complete multilingual support agent powered primarily by a 3.35B-parameter model. Language detection, translation, verification, classification, all running fast enough for real-time triage.
When Cohere Labs released Tiny Aya in February 2026, it caught our attention. Most multilingual models at this scale treat language coverage as a checkbox: they support many languages, but performance degrades sharply outside English and a few high-resource languages. Tiny Aya does something different.
Rather than spreading thin, Tiny Aya covers 67 languages with instruction-tuned variants that specialize in different linguistic regions:
- TinyAya-Global: the generalist, balanced across all 67 languages.
- TinyAya-Earth: Africa and West Asia (Arabic, Turkish, Swahili, Amharic, Hausa, etc.)
- TinyAya-Fire: South Asia (Hindi, Bengali, Tamil, Urdu, etc.)
- TinyAya-Water: Asia-Pacific and Europe (French, Japanese, Chinese, Portuguese, Spanish, Korean, etc.)
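One way to encode this regional split is a lookup table from detected language to variant. The sketch below is an assumption about how such routing could look: it covers only the languages named above, keys them by ISO 639-1 code, and uses the variant names from this post as plain labels rather than real model identifiers.

```python
# Hypothetical routing table: ISO 639-1 code -> regional variant label.
# Only the languages listed in the post are included; everything else
# falls back to the generalist model.
REGIONAL_VARIANT = {
    **dict.fromkeys(["ar", "tr", "sw", "am", "ha"], "TinyAya-Earth"),
    **dict.fromkeys(["hi", "bn", "ta", "ur"], "TinyAya-Fire"),
    **dict.fromkeys(["fr", "ja", "zh", "pt", "es", "ko"], "TinyAya-Water"),
}


def pick_variant(lang_code: str) -> str:
    """Return the regional variant for a language code, or the global model."""
    return REGIONAL_VARIANT.get(lang_code.lower(), "TinyAya-Global")


print(pick_variant("tr"))
print(pick_variant("xx"))
```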
This structure immediately suggested an architecture. Instead of picking one model and hoping it works everywhere, we could run the global model as a baseline while simultaneously running the specialized variant for the detected language. Two translations, one verification step, best output wins.
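The "two translations, best output wins" idea can be sketched in a few lines. This is a simplified illustration, not the agent's actual code: the translator and scorer callables are hypothetical stand-ins for the model calls and the verification step, and a thread pool stands in for whatever parallelism the real workflow uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Translator = Callable[[str], str]      # source text -> translation
Scorer = Callable[[str, str], float]   # (source, candidate) -> quality score


def best_translation(
    text: str,
    translate_global: Translator,
    translate_sub: Translator,
    score: Scorer,
) -> str:
    """Run both translators concurrently and return the higher-scoring output.

    On a tie, the global model's output wins because it is listed first.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(translate_global, text),
            pool.submit(translate_sub, text),
        ]
        candidates = [future.result() for future in futures]
    return max(candidates, key=lambda candidate: score(text, candidate))


# Stub translators and scorer, for illustration only.
def stub_global(text: str) -> str:
    return "the server returns a gateway error"


def stub_regional(text: str) -> str:
    return "the server returns a 504 error"


def stub_score(source: str, candidate: str) -> float:
    return 1.0 if "504" in candidate else 0.0


print(best_translation(
    "le serveur renvoie une erreur 504", stub_global, stub_regional, stub_score
))
```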
Worth noting: Tiny Aya's tokenizer was designed to reduce fragmentation across scripts. Where other multilingual tokenizers produce long token sequences for non-Latin scripts (slower inference, more memory), Tiny Aya produces significantly fewer tokens per sentence. That translates directly into faster inference and lower memory usage.
We built the agent as a LangGraph workflow. Each node does one job and passes state forward. Here is the full pipeline:
```
START
  ├── fetch_ticket ──────┐
  │                      ├── detect_language ──┬── translate_global ──┐
  └── fetch_categories ──┘                     └── translate_sub ─────┤
                                                                      ├── verify_translation ── classify ── update_ticket
                                                                                                                  │
                                                                                                                 END
```
Eight nodes, three parallelization points, two model families.
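To see which nodes can run side by side, the topology can be written as a plain dependency map and grouped into stages. The sketch below uses only the standard library rather than LangGraph itself, and the edges are our reading of the diagram: in particular, having `fetch_categories` join before `detect_language` is an assumption.

```python
from graphlib import TopologicalSorter

# node -> set of nodes that must finish before it can start (assumed edges)
DEPENDENCIES = {
    "fetch_ticket": set(),
    "fetch_categories": set(),
    "detect_language": {"fetch_ticket", "fetch_categories"},
    "translate_global": {"detect_language"},
    "translate_sub": {"detect_language"},
    "verify_translation": {"translate_global", "translate_sub"},
    "classify": {"verify_translation"},
    "update_ticket": {"classify"},
}


def parallel_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into stages; nodes in the same stage have no mutual dependency."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)
    return stages


for index, stage in enumerate(parallel_stages(DEPENDENCIES), start=1):
    print(index, stage)
```

In an actual LangGraph build, each key would correspond to a node added to a `StateGraph`, and each dependency to an edge. The stage grouping here is only a way to reason about where the fan-out and fan-in happen.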