Talent.com
AI Evaluation Engineer (Knowledge & Research)

Gramian Consulting Group • KE
9 days ago
Job description

About Us

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role overview

We are looking for an AI Evaluation Engineer with a strong research background to design and evaluate complex, multi-agent tasks used to benchmark next-generation AI systems.

In this role, you will work at the intersection of research, data structuring, and AI evaluation, building high-quality tasks that require deep document understanding, structured reasoning, and multi-step synthesis. You will create datasets and evaluation frameworks that test whether AI agents can truly read, reason, and extract knowledge from large-scale unstructured data.

This is a high-precision, detail-oriented role requiring strong analytical thinking, structured problem decomposition, and the ability to translate research content into measurable evaluation tasks.

Commitment required: 8 hours per day, including 4 hours of overlap with PST.

Employment type: Contractor assignment (no medical/paid leave)

Duration of contract: 5+ weeks

Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam

Interview: take-home assessment (60 min)

Responsibilities

  • Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
  • Curate real-world research corpora — academic papers, case studies, technical reports — and design questions that require comprehensive analysis
  • Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
  • Design LLM judge prompts that evaluate agent output field-by-field against the oracle
  • Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)
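
For illustration only, a structured ground-truth oracle and a field-by-field check against it might look like the Python sketch below. The field names, the sample values, and the `normalize` and `score_fields` helpers are hypothetical and not taken from the role's actual tooling. Exact matching after normalization stands in here for the LLM judge, which in practice would handle paraphrased answers.

```python
import json

# Hypothetical oracle for a single task; field names and values are made up.
ORACLE_JSON = """
{
  "task_id": "example-001",
  "fields": {
    "sample_size": "412",
    "primary_method": "difference-in-differences",
    "reported_effect": "0.18"
  }
}
"""


def normalize(value):
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(str(value).lower().split())


def score_fields(oracle, agent_output):
    """Compare agent output to the oracle one field at a time.

    Returns a dict mapping each oracle field name to True or False.
    A missing field counts as incorrect.
    """
    results = {}
    for name, expected in oracle["fields"].items():
        actual = agent_output.get(name)
        results[name] = actual is not None and normalize(actual) == normalize(expected)
    return results


oracle = json.loads(ORACLE_JSON)

# Hypothetical agent answer: two fields right, one wrong.
agent_output = {
    "sample_size": "412",
    "primary_method": "Difference-in-Differences",
    "reported_effect": "0.25",
}

report = score_fields(oracle, agent_output)
accuracy = sum(report.values()) / len(report)
print(report)
print(f"field accuracy: {accuracy:.2f}")
```

Scoring each field separately makes it clear which specific facts the agent missed, rather than producing a single pass or fail for the whole answer.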

Requirements

  • 5+ years of experience in research (academic or industry) in a scientific, technical, or analytical domain
  • Strong ability to read, analyze, and extract structured information from unstructured documents
  • Experience designing or working with structured data formats (JSON, schemas, validation)
  • Proficiency in Python scripting (data processing, validation, or evaluation scripts)
  • Experience with AI evaluation, coding benchmarks, or structured reasoning tasks (e.g., SWE-bench, Terminal-bench, or similar)
  • Experience working with Docker (building images, debugging containers)
  • Strong attention to detail, especially when defining exact, verifiable outputs
  • Ability to design complex, multi-step problem-solving workflows