Securing an AI Agent
End-to-end guide for integrating Sentinel into a production AI agent that controls a Solana wallet
This guide walks through a complete integration of Sentinel into a production AI agent: one that receives user instructions, builds Solana transactions, and executes them. By the end, every action the agent takes will pass through both the PromptGuard and the ExecutionSandbox.
The Problem
An AI agent controlling a wallet is a high-value target. Two attack surfaces matter most:
- Adversarial user input — a user (or upstream system) crafts a message that manipulates the agent into behavior outside its intended scope
- Unconstrained execution: even a correctly behaving agent can be instructed to move more funds than it should, interact with untrusted programs, or execute outside business hours
Sentinel addresses both with a two-layer pipeline.
Architecture
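The two-layer flow can be sketched roughly as follows. This is a minimal illustration, not Sentinel's actual implementation: `GuardFn`, `SandboxFn`, `PipelineResult`, and `runPipeline` are hypothetical names, and the result shapes borrow only the `safe` and `approved` fields used later in this guide.

```typescript
// Hypothetical stand-ins for the two layers; not the Sentinel SDK's real types.
type GuardResult = { safe: boolean; threatType?: string };
type SandboxResult = { approved: boolean; riskScore?: number };

type GuardFn = (input: string) => Promise<GuardResult>;
type SandboxFn = (transaction: string) => Promise<SandboxResult>;

type PipelineResult =
  | { approved: true }
  | { approved: false; blockedBy: 'guard' | 'sandbox' };

// Layer 1 screens the input; layer 2 only runs if layer 1 passes.
async function runPipeline(
  guard: GuardFn,
  sandbox: SandboxFn,
  input: string,
  transaction: string,
): Promise<PipelineResult> {
  const guardResult = await guard(input);
  if (!guardResult.safe) {
    return { approved: false, blockedBy: 'guard' };
  }
  const sandboxResult = await sandbox(transaction);
  if (!sandboxResult.approved) {
    return { approved: false, blockedBy: 'sandbox' };
  }
  return { approved: true };
}
```

Because the guard runs first, a blocked input never reaches the transaction layer.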
Step 1: Install and Configure
npm install @sentinel-sdk/core
Create the Sentinel instance once at agent startup, not per-request:
// sentinel.ts
import { Sentinel } from '@sentinel-sdk/core';

let _sentinel: Sentinel | null = null;

export async function getSentinel(): Promise<Sentinel> {
  if (_sentinel) return _sentinel;

  _sentinel = await Sentinel.create({
    mode: 'full',
    promptGuard: {
      mode: 'both', // rules for speed, LLM for accuracy
      rules: {
        rulePacks: ['default'],
      },
      llm: {
        provider: 'anthropic',
        apiKeyEnvVar: 'ANTHROPIC_API_KEY',
        timeoutMs: 4000,
      },
    },
    executionSandbox: {
      rpcEndpoint: process.env.SOLANA_RPC_URL!,
      policy: {
        spendingLimits: {
          maxPerTx: 5,
          maxDaily: 50,
          maxWeekly: 200,
        },
        programAllowlist: [
          'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA', // SPL Token
          'ATokenGPvbdGVxr1b2hvZbsiqW5xWH25efTNsLJA8knL', // Associated Token
          '11111111111111111111111111111111', // System Program
        ],
        cooldown: { minMs: 2000 },
        riskThreshold: 70,
      },
    },
  });

  return _sentinel;
}
Create once, reuse across requests
Sentinel.create() loads rule packs and initializes components. Create the instance at startup and reuse it across all agent invocations for best performance.
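One subtlety with a lazily created singleton: if two requests arrive before the first `Sentinel.create()` call resolves, a check on the resolved instance lets both through and creates two instances. Caching the promise rather than the instance avoids that. The `once` helper below is a hypothetical, generic sketch and is not part of the Sentinel SDK.

```typescript
// Hypothetical helper: wraps an async factory so it runs at most once.
// Concurrent callers share the same in-flight promise.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached !== null) return cached;
    const pending = factory().catch((err) => {
      // Clear the cache on failure so a later call can retry.
      cached = null;
      throw err;
    });
    cached = pending;
    return pending;
  };
}
```

With the module from Step 1, this might look like `export const getSentinel = once(() => Sentinel.create(config));`, where `config` is the options object shown there.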
Step 2: Guard the Input
Before passing a user message to your agent LLM, scan it:
// agent.ts
import { getSentinel } from './sentinel';

export async function handleUserMessage(message: string): Promise<AgentResponse> {
  const sentinel = await getSentinel();

  // 1. Scan the input
  const guardResult = await sentinel.scanInput(message);
  if (!guardResult.safe) {
    return {
      success: false,
      error: 'Input contains disallowed content',
      threatType: guardResult.threatType,
    };
  }

  // 2. Pass to agent LLM
  const agentOutput = await runAgentLLM(message);
  // ...
}
Step 3: Guard the Transaction
After your agent LLM builds a transaction, evaluate it before broadcasting:
export async function handleUserMessage(message: string): Promise<AgentResponse> {
  const sentinel = await getSentinel();

  // Guard input
  const guardResult = await sentinel.scanInput(message);
  if (!guardResult.safe) {
    return { success: false, error: 'Input blocked', threatType: guardResult.threatType };
  }

  // Run agent LLM
  const { transaction } = await runAgentLLM(message);

  // Guard transaction
  const sandboxResult = await sentinel.evaluateTransaction(transaction);
  if (!sandboxResult.approved) {
    return {
      success: false,
      error: 'Transaction blocked by policy',
      violations: sandboxResult.policyViolations,
      riskScore: sandboxResult.riskScore,
    };
  }

  // Broadcast
  const signature = await broadcastTransaction(transaction);
  return { success: true, signature };
}
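The guide does not define `AgentResponse`. One possible shape, inferred only from the fields returned in the snippets above, is a discriminated union on `success`. The field types (for example `string` for `threatType` and `unknown[]` for `violations`) are assumptions, and `describeResponse` is a hypothetical consumer added for illustration.

```typescript
// Assumed shape, inferred from the return statements in Steps 2 and 3.
type AgentResponse =
  | { success: true; signature: string }
  | {
      success: false;
      error: string;
      threatType?: string;    // set when the input guard blocks
      violations?: unknown[]; // set when the sandbox blocks
      riskScore?: number;
    };

// Hypothetical consumer: turns a response into a one-line status message.
function describeResponse(res: AgentResponse): string {
  if (res.success) {
    return `Broadcast OK: ${res.signature}`;
  }
  const detail = res.threatType ?? (res.violations ? 'policy violation' : 'unknown');
  return `Blocked (${detail}): ${res.error}`;
}
```

Narrowing on `success` means callers cannot read `signature` from a blocked response without the compiler complaining.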
Step 4: Use execute() for Simpler Code
If you have both an input and a transaction, use the unified execute() method:
const result = await sentinel.execute({
  input: message,
  transaction: builtTransaction,
});

if (!result.approved) {
  return {
    success: false,
    blockedBy: result.blocked_by,
    guardResult: result.guardResult,
    sandboxResult: result.sandboxResult,
  };
}

await broadcastTransaction(builtTransaction);
execute() runs the guard first and only runs the sandbox if the guard passes — saving a simulation call when the input is already blocked.
Step 5: Wire Up Events for Observability
const sentinel = await getSentinel();

sentinel.on('threat:detected', ({ result }) => {
  metrics.increment('sentinel.threats', {
    threatType: result.threatType,
    confidence: result.confidence?.toFixed(1),
  });
  logger.warn('Threat detected', { result });
});

sentinel.on('policy:violated', ({ violation }) => {
  metrics.increment('sentinel.policy_violations', {
    rule: violation.rule,
  });
  logger.warn('Policy violated', { violation });
});

sentinel.on('tx:simulated', ({ result }) => {
  metrics.histogram('sentinel.risk_score', result.riskScore);
});
Step 6: Handle Errors Gracefully
execute() never throws. Handle the approved: false result uniformly:
async function safeExecute(message: string, transaction: string) {
  const sentinel = await getSentinel();
  const result = await sentinel.execute({ input: message, transaction });

  if (!result.approved) {
    logger.info('Action blocked', {
      blockedBy: result.blocked_by,
      latency_ms: result.latency_ms,
    });
    return null;
  }

  return result;
}
Production Checklist
Before deploying an agent with Sentinel:
- Tune riskThreshold: start at 70, lower gradually as you observe real traffic
- Set programAllowlist: restrict to only the programs your agent legitimately calls
- Configure timeBounds: if your agent runs in a business context, restrict to appropriate hours
- Set cooldown: prevents rapid-fire execution loops
- Monitor tx:simulated events: track risk score distribution to spot anomalies
- Log all blocked actions: blocked actions are your threat intelligence feed
- Test with mode: 'both', since parallel rule + LLM scanning gives the best coverage
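To make the checklist items concrete, here is a rough sketch of how a per-transaction limit, a daily limit, an allowlist, and a cooldown could be checked together. It is illustrative only and does not reflect how Sentinel's sandbox is implemented. `Policy`, `TxSummary`, `SpendState`, and `checkPolicy` are hypothetical names, and amounts are treated as plain numbers in an unspecified unit.

```typescript
// Hypothetical types for illustration; not the Sentinel SDK's policy schema.
interface Policy {
  maxPerTx: number;
  maxDaily: number;
  programAllowlist: string[];
  cooldownMs: number;
}

interface TxSummary {
  amount: number;
  programIds: string[];
  timestampMs: number;
}

interface SpendState {
  spentToday: number;
  lastTxAtMs: number | null;
}

// Returns the names of every rule the transaction would violate.
function checkPolicy(tx: TxSummary, state: SpendState, policy: Policy): string[] {
  const violations: string[] = [];

  if (tx.amount > policy.maxPerTx) violations.push('maxPerTx');
  if (state.spentToday + tx.amount > policy.maxDaily) violations.push('maxDaily');

  for (const id of tx.programIds) {
    if (policy.programAllowlist.indexOf(id) === -1) {
      violations.push(`programAllowlist:${id}`);
    }
  }

  // Cooldown: reject if the previous transaction was too recent.
  if (state.lastTxAtMs !== null && tx.timestampMs - state.lastTxAtMs < policy.cooldownMs) {
    violations.push('cooldown');
  }

  return violations;
}
```

Collecting all violations, rather than stopping at the first, gives richer data for the logging and monitoring items in the checklist.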
Failing closed is intentional
If the LLM judge times out, Sentinel falls back to the rule engine. If the rule engine fails, Sentinel returns safe: false. If the sandbox RPC errors, it returns approved: false. This fail-closed design means Sentinel never silently allows a potentially dangerous action.
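The same fail-closed idea can be applied to code around Sentinel, for instance a custom check of your own. A minimal sketch, assuming a hypothetical `failClosed` helper and `Decision` type (neither is part of the Sentinel SDK): any thrown error or timeout becomes a denial instead of propagating or silently passing.

```typescript
// Hypothetical result type for an arbitrary approval check.
type Decision = { approved: boolean; reason?: string };

// Runs `check`, converting errors and timeouts into a denial.
function failClosed(
  check: () => Promise<Decision>,
  timeoutMs: number,
): Promise<Decision> {
  return new Promise<Decision>((resolve) => {
    // If the check takes too long, deny. Later resolve() calls are no-ops.
    const timer = setTimeout(
      () => resolve({ approved: false, reason: 'timeout' }),
      timeoutMs,
    );

    // Promise.resolve().then(check) also captures synchronous throws.
    Promise.resolve()
      .then(check)
      .then(
        (decision) => resolve(decision),
        () => resolve({ approved: false, reason: 'error' }),
      )
      .then(() => clearTimeout(timer));
  });
}
```

The key property is that there is no code path on which a failure yields `approved: true`.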
Example: DeFi Rebalancing Agent
import { Sentinel } from '@sentinel-sdk/core';

// ScanResult, PolicyViolation, the program ID constants, broadcastTransaction,
// and alertOncall are assumed to be defined or imported elsewhere.
class RebalancingAgent {
  // Assigned in init(); the `!` tells TypeScript it is set before use.
  private sentinel!: Sentinel;

  async init() {
    this.sentinel = await Sentinel.create({
      mode: 'full',
      promptGuard: { mode: 'rules', rules: { rulePacks: ['default'] } },
      executionSandbox: {
        rpcEndpoint: process.env.SOLANA_RPC_URL!,
        policy: {
          spendingLimits: { maxPerTx: 100, maxDaily: 500, maxWeekly: 2000 },
          programAllowlist: [RAYDIUM_AMM, ORCA_WHIRLPOOL, SPL_TOKEN],
          riskThreshold: 80,
          cooldown: { minMs: 10_000 },
        },
      },
    });

    this.sentinel.on('threat:detected', (e) => this.onThreat(e.result));
    this.sentinel.on('policy:violated', (e) => this.onViolation(e.violation));
  }

  async rebalance(instruction: string, swapTx: string) {
    const result = await this.sentinel.execute({
      input: instruction,
      transaction: swapTx,
    });

    if (!result.approved) {
      throw new Error(`Rebalance blocked: ${result.blocked_by}`);
    }

    return broadcastTransaction(swapTx);
  }

  private onThreat(result: ScanResult) {
    alertOncall(`Threat in rebalancer: ${result.threatType} (${result.confidence})`);
  }

  private onViolation(v: PolicyViolation) {
    alertOncall(`Policy violation: ${v.rule} — ${v.message}`);
  }
}