IntelliCredit-X — AI Credit Officer

How It Works

Credit AI in 4 Steps

Each episode simulates a full credit committee lifecycle. The agent must gather evidence, reason through risk, and submit compliant decisions under real regulatory pressure.

01

Receive Application

The agent receives a 55-dimensional observation covering financials, forensic alerts, portfolio state, macro conditions, and memory of past decisions.

02

Investigate with Tools

Makes up to 4 tool calls per step, choosing from get_financial_report(), check_compliance(), and get_market_intel().

03

Submit Decision

Calls submit_decision(action, reasoning) with reasoning of at least 50 characters. Six hard rules automatically override any decision that violates them.
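A minimal sketch of the kind of client-side check an agent wrapper could run before calling submit_decision. The helper name validate_decision, the string action labels, and the whitespace stripping are assumptions for illustration; the environment's six hard rules are not described here and are not modelled.

```python
# Hypothetical pre-flight check; not the environment's own validation code.
VALID_ACTIONS = {"APPROVE", "CONDITIONAL", "REJECT"}
MIN_REASONING_CHARS = 50  # from the "at least 50 characters" requirement


def validate_decision(action: str, reasoning: str) -> list[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    problems = []
    if action not in VALID_ACTIONS:
        problems.append(f"unknown action: {action!r}")
    # Assumption: surrounding whitespace does not count toward the minimum.
    length = len(reasoning.strip())
    if length < MIN_REASONING_CHARS:
        problems.append(
            f"reasoning has {length} chars, need at least {MIN_REASONING_CHARS}"
        )
    return problems
```

Returning a list of problems rather than raising lets a training loop log every formatting failure in one pass.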

04

Face Consequences

Loans mature 10–30 steps after the decision. Regulator audits fire at roughly steps 10/20/30/40/50. Three consecutive audit failures shut the episode down (−50 reward).

Multi-Agent System

Three Agents, One Environment

The environment simulates the full credit ecosystem — not just individual decisions.

Credit Officer (LLM)

Your Agent Under Training

Mistral-7B fine-tuned via GRPO. Receives the 55-dimensional observation, calls investigation tools, and submits APPROVE / CONDITIONAL / REJECT with written reasoning.

Borrower Agent

Adversarial Pressure

Rejected borrowers reapply up to 3x with improved surface metrics but unchanged hidden PD — forcing the agent to learn true risk signals.
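The reapplication behaviour can be pictured with a small sketch. The Application class, its field names, and the fixed surface_boost are hypothetical; the only facts taken from the description are the cap of three reapplications and the unchanged hidden PD.

```python
from dataclasses import dataclass, replace
from typing import Optional

MAX_REAPPLICATIONS = 3  # "reapply up to 3x"


@dataclass(frozen=True)
class Application:
    surface_score: float  # hypothetical visible metric the agent observes
    hidden_pd: float      # true probability of default, hidden from the agent
    attempt: int = 0      # 0 = original application


def reapply(app: Application, surface_boost: float = 0.1) -> Optional[Application]:
    """Return a cosmetically improved reapplication, or None once the cap is hit."""
    if app.attempt >= MAX_REAPPLICATIONS:
        return None
    return replace(
        app,
        surface_score=min(1.0, app.surface_score + surface_boost),
        attempt=app.attempt + 1,  # hidden_pd is deliberately left untouched
    )
```

Because hidden_pd never changes, an agent that approves on the third attempt purely because surface_score went up is taking on exactly the risk it rejected the first time.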

Regulator Agent

Compliance Enforcer

Audits portfolio at steps ≈10/20/30/40/50 (±1 jitter). Checks NPA rate, CRAR, sector concentration. Episode shutdown on 3 consecutive fails.
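A rough sketch of the audit timing and shutdown rule as described. The function names are hypothetical, the jitter is assumed to be drawn uniformly from −1, 0, +1, and the actual NPA, CRAR, and concentration thresholds are not given here, so each audit is reduced to a pass/fail flag.

```python
import random

AUDIT_BASE_STEPS = (10, 20, 30, 40, 50)
MAX_CONSECUTIVE_FAILS = 3
SHUTDOWN_REWARD = -50.0


def audit_schedule(rng: random.Random) -> list[int]:
    """Jitter each nominal audit step by -1, 0, or +1 (assumed uniform)."""
    return [step + rng.choice((-1, 0, 1)) for step in AUDIT_BASE_STEPS]


def run_audits(results: list[bool]) -> tuple[bool, float]:
    """results[i] is True if audit i passed. Returns (shutdown, reward_penalty)."""
    streak = 0
    for passed in results:
        streak = 0 if passed else streak + 1  # a pass resets the failure streak
        if streak >= MAX_CONSECUTIVE_FAILS:
            return True, SHUTDOWN_REWARD
    return False, 0.0
```

Under this reading, a single passed audit between failures resets the counter, so the agent can recover from two bad audits by repairing the portfolio before the next one.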

Benchmark Results

Before vs. After GRPO

Evaluated across 3 task difficulties. Zero regressions — every metric improved or held steady.

Task	Difficulty	Metric	Base Mistral-7B	GRPO Model	Delta
Task 1	Easy	Accuracy	80.0%	86.7%	+6.7% ✓
Task 1	Easy	Capital Utilization	40.0%	60.0%	+20.0% ✓
Task 2	Medium	Total Reward	10.305	10.584	+0.279 ✓
Task 3	Hard	Total Reward	0.215	2.491	+10x ✓
Task 3	Hard	NPA Rate	16.7%	8.3%	-8.3% ✓
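The Delta column can be re-derived from the two model columns. The sketch below uses only the rounded values printed in the table, so small differences from the published deltas are expected: the NPA change comes out as −8.4 rather than −8.3, presumably because the table was computed from unrounded rates, and the Task 3 reward ratio is about 11.6×, which the table summarises as "+10x".

```python
# (metric, base, grpo) copied from the table above
RESULTS = [
    ("Task 1 accuracy (%)", 80.0, 86.7),
    ("Task 1 capital utilization (%)", 40.0, 60.0),
    ("Task 2 total reward", 10.305, 10.584),
    ("Task 3 total reward", 0.215, 2.491),
    ("Task 3 NPA rate (%)", 16.7, 8.3),
]


def delta(base: float, grpo: float) -> float:
    """Absolute change (percentage points for the % metrics)."""
    return round(grpo - base, 3)


def ratio(base: float, grpo: float) -> float:
    """Multiplicative change, GRPO value divided by base value."""
    return round(grpo / base, 2)


for name, base, grpo in RESULTS:
    print(f"{name}: delta={delta(base, grpo):+}, ratio={ratio(base, grpo)}x")
```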

Training Curves

What the Training Curves Tell Us

Four panels reveal the full story of what the model learned and when — across three curriculum stages (dashed lines mark transitions). Mean reward climbs from −2.0 to +1.0, format compliance rises from 0% to 65%, and KL divergence stays safely below 0.12, confirming the model changed without forgetting language capabilities.

Figure 1 GRPO v2 Training Curves — 3-Stage Curriculum

GRPO Loss: Starts near zero, climbs to 0.02–0.05, indicating healthy policy divergence from the reference.

Mean Reward: −2.0 → 0 at Stage 1 end → stable +0.5–1.0. Stage 3 dip, then re-stabilises.

KL Divergence: Grows 0 → 0.08 and stays below the 0.12 threshold: genuine learning, no catastrophic forgetting.

submit_pct (format compliance): 0% → 40–65%. The model acquired the vocabulary of the task.
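For readers new to GRPO, the reward curve above feeds the policy update through group-relative advantages: several completions are sampled for the same prompt and each reward is normalised against its own group. The sketch below shows that normalisation in its commonly published form; the group size, epsilon, and any extra reward shaping used for IntelliCredit-X are not stated here and are assumptions.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalise each reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no gradient. This is one reason early format compliance matters: a group in which no completion reaches submit_decision tends to produce identical penalties and therefore no learning signal.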

Evaluation

Before vs. After GRPO — Full Comparison

Per-task, per-metric comparison of base Mistral-7B (blue) vs. GRPO-trained IntelliCredit model (green). Zero regressions across all 24 metric-task combinations. The hardest task (Task 3) shows the most dramatic improvement — NPA rate cut in half, total reward up 10×.

Figure 2 Base Mistral-7B vs. GRPO IntelliCredit — All Tasks

Task 1 (Easy): Accuracy +6.7%, capital utilization +20%. The GRPO model deploys more capital into correctly identified safe loans.

Task 2 (Medium): Both models hit a perfect Task Score (1.000). GRPO squeezes +0.28 extra reward from better capital efficiency.

Task 3 (Hard): Total reward 0.215 → 2.491 (over 10×). NPA 16.7% → 8.3% (halved). True portfolio-level risk management learned.

Key Insight: The model learned that surface improvement combined with behavioural red flags signals escalating risk. It calls tools; the base model doesn't.

Quick Start

Start an Episode in 2 Calls

The environment is live and accepts HTTP from any client — no install required.

bash — curl

# Step 1: Reset (start a new episode)
curl -X POST https://vssksn-intellicredit-openenv.hf.space/reset \
  -H "Content-Type: application/json" \
  -d '{"episode_id":"demo-1","seed":42,"task_id":"task3"}'

# Step 2: Submit a decision  (0=APPROVE 1=CONDITIONAL 2=REJECT)
curl -X POST https://vssksn-intellicredit-openenv.hf.space/step \
  -H "Content-Type: application/json" \
  -d '{"episode_id":"demo-1","action":{"decision":2}}'
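The same two calls can be made from Python with only the standard library. This is a sketch that mirrors the curl payloads above; the helper names are hypothetical, and it assumes both endpoints return JSON bodies, which is not shown here.

```python
import json
import urllib.request

BASE_URL = "https://vssksn-intellicredit-openenv.hf.space"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl commands."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_demo() -> tuple[dict, dict]:
    # Step 1: reset, then Step 2: submit a decision (2 = REJECT).
    obs = post_json("/reset", {"episode_id": "demo-1", "seed": 42, "task_id": "task3"})
    result = post_json("/step", {"episode_id": "demo-1", "action": {"decision": 2}})
    return obs, result
```

Calling run_demo() performs the live requests; build_request can be inspected on its own without touching the network.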

Resources

Everything Open Source

All artefacts published on Hugging Face and GitHub under MIT License.

Technical Blog

Architecture, 2-stage GRPO, training curves, full results.

Fine-Tuned Model

Mistral-7B post-trained on live environment via online GRPO.

Training Dataset

2,000 GRPO prompts across 5 task levels — intellicredit-grpo-v2.

Stage 1 — Offline GRPO