Securing an AI Agent

End-to-end guide for integrating Sentinel into a production AI agent that controls a Solana wallet

This guide walks through a complete integration of Sentinel into a production AI agent: one that receives user instructions, builds Solana transactions, and executes them. By the end, every action the agent takes will pass through both the PromptGuard and the ExecutionSandbox.

The Problem

An AI agent controlling a wallet is a high-value target. Two attack surfaces matter most:

  1. Adversarial user input: a user (or upstream system) crafts a message that manipulates the agent into acting outside its intended scope
  2. Unconstrained execution: even a correctly behaving agent can be instructed to move more funds than it should, interact with untrusted programs, or execute outside business hours

Sentinel addresses both with a two-layer pipeline.

Architecture

[Diagram: user message → PromptGuard → agent LLM → ExecutionSandbox → broadcast]

Step 1: Install and Configure

bash
npm install @sentinel-sdk/core

Create the Sentinel instance once at agent startup, not per-request:

typescript
// sentinel.ts
import { Sentinel } from '@sentinel-sdk/core';

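// Cache the promise rather than the instance so that concurrent first
// calls share a single Sentinel.create() instead of racing.
let _sentinel: Promise<Sentinel> | null = null;

export function getSentinel(): Promise<Sentinel> {
  if (_sentinel) return _sentinel;

  _sentinel = Sentinel.create({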
    mode: 'full',

    promptGuard: {
      mode: 'both',  // rules for speed, LLM for accuracy
      rules: {
        rulePacks: ['default'],
      },
      llm: {
        provider: 'anthropic',
        apiKeyEnvVar: 'ANTHROPIC_API_KEY',
        timeoutMs: 4000,
      },
    },

    executionSandbox: {
      rpcEndpoint: process.env.SOLANA_RPC_URL!,
      policy: {
        spendingLimits: {
          maxPerTx: 5,
          maxDaily: 50,
          maxWeekly: 200,
        },
        programAllowlist: [
          'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA',  // SPL Token
          'ATokenGPvbdGVxr1b2hvZbsiqW5xWH25efTNsLJe1bAe',  // Associated Token
          '11111111111111111111111111111111',                 // System Program
        ],
        cooldown: { minMs: 2000 },
        riskThreshold: 70,
      },
    },
  });

  return _sentinel;
}

Create once, reuse across requests

Sentinel.create() loads rule packs and initializes components. Create the instance at startup and reuse it across all agent invocations for best performance.
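
The configuration above reads SOLANA_RPC_URL with a non-null assertion (`!`), which satisfies the compiler but would still pass `undefined` through at runtime if the variable is unset. One way to fail fast at startup is a small helper along these lines. `requireEnv` is a hypothetical name used for illustration and is not part of the Sentinel SDK.

```typescript
// Hypothetical helper (not part of the Sentinel SDK): read a required
// environment variable or throw a descriptive error at startup.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in sentinel.ts (sketch):
//   rpcEndpoint: requireEnv('SOLANA_RPC_URL', process.env),
```

Passing the environment in as a parameter keeps the helper easy to unit test without mutating `process.env`.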

Step 2: Guard the Input

Before passing a user message to your agent LLM, scan it:

typescript
// agent.ts
import { getSentinel } from './sentinel';

export async function handleUserMessage(message: string): Promise<AgentResponse> {
  const sentinel = await getSentinel();

  // 1. Scan the input
  const guardResult = await sentinel.scanInput(message);

  if (!guardResult.safe) {
    return {
      success: false,
      error: 'Input contains disallowed content',
      threatType: guardResult.threatType,
    };
  }

  // 2. Pass to agent LLM
  const agentOutput = await runAgentLLM(message);

  // ...
}

Step 3: Guard the Transaction

After your agent LLM builds a transaction, evaluate it before broadcasting:

typescript
export async function handleUserMessage(message: string): Promise<AgentResponse> {
  const sentinel = await getSentinel();

  // Guard input
  const guardResult = await sentinel.scanInput(message);
  if (!guardResult.safe) {
    return { success: false, error: 'Input blocked', threatType: guardResult.threatType };
  }

  // Run agent LLM
  const { transaction } = await runAgentLLM(message);

  // Guard transaction
  const sandboxResult = await sentinel.evaluateTransaction(transaction);

  if (!sandboxResult.approved) {
    return {
      success: false,
      error: 'Transaction blocked by policy',
      violations: sandboxResult.policyViolations,
      riskScore: sandboxResult.riskScore,
    };
  }

  // Broadcast
  const signature = await broadcastTransaction(transaction);
  return { success: true, signature };
}
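
The handler above returns an `AgentResponse`, which this guide does not define. One possible shape, inferred only from the fields returned in Steps 2 and 3, is a discriminated union on `success`. The field types below are assumptions, and `summarize` is a hypothetical helper added to show how narrowing works.

```typescript
// One possible AgentResponse shape, inferred from the fields returned above.
// The field types are assumptions; prefer the SDK's own types if it exports them.
type AgentResponse =
  | { success: true; signature: string }
  | {
      success: false;
      error: string;
      threatType?: string;    // set when the PromptGuard blocks the input
      violations?: unknown[]; // set when the ExecutionSandbox blocks the transaction
      riskScore?: number;
    };

// Narrowing on `success` lets callers handle each case without casts.
function summarize(response: AgentResponse): string {
  if (response.success) {
    return `broadcast ${response.signature}`;
  }
  const detail =
    response.threatType ?? `${response.violations?.length ?? 0} policy violation(s)`;
  return `${response.error}: ${detail}`;
}
```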

Step 4: Use execute() for Simpler Code

If you have both an input and a transaction, use the unified execute() method:

typescript
const result = await sentinel.execute({
  input: message,
  transaction: builtTransaction,
});

if (!result.approved) {
  return {
    success: false,
    blockedBy: result.blocked_by,
    guardResult: result.guardResult,
    sandboxResult: result.sandboxResult,
  };
}

await broadcastTransaction(builtTransaction);

execute() runs the guard first and only runs the sandbox if the guard passes — saving a simulation call when the input is already blocked.
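
The short-circuit order described above can be illustrated with stand-in checks. This is a sketch of the control flow only, not Sentinel's implementation: `runPipeline`, `Check`, `PipelineResult`, and the `'guard' | 'sandbox'` labels are all hypothetical.

```typescript
// Illustration of guard-then-sandbox short-circuiting with stand-in checks.
// Not Sentinel's implementation: every name here is hypothetical.
type Check = () => Promise<boolean>;

interface PipelineResult {
  approved: boolean;
  blockedBy?: 'guard' | 'sandbox';
}

async function runPipeline(guard: Check, sandbox: Check): Promise<PipelineResult> {
  // Run the input scan first.
  if (!(await guard())) {
    return { approved: false, blockedBy: 'guard' }; // the sandbox never runs
  }
  // Only simulate the transaction once the input has passed.
  if (!(await sandbox())) {
    return { approved: false, blockedBy: 'sandbox' };
  }
  return { approved: true };
}

// Demo: count sandbox invocations to show that a blocked input skips it.
let sandboxCalls = 0;
const countingSandbox: Check = async () => {
  sandboxCalls += 1;
  return true;
};

async function demo(): Promise<PipelineResult[]> {
  const blocked = await runPipeline(async () => false, countingSandbox);
  const allowed = await runPipeline(async () => true, countingSandbox);
  return [blocked, allowed];
}
```

In the demo, only the second call reaches the sandbox, which mirrors the saved simulation call for inputs that are already blocked.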

Step 5: Wire Up Events for Observability

typescript
const sentinel = await getSentinel();

sentinel.on('threat:detected', ({ result }) => {
  metrics.increment('sentinel.threats', {
    threatType: result.threatType,
    confidence: result.confidence?.toFixed(1),
  });
  logger.warn('Threat detected', { result });
});

sentinel.on('policy:violated', ({ violation }) => {
  metrics.increment('sentinel.policy_violations', {
    rule: violation.rule,
  });
  logger.warn('Policy violated', { violation });
});

sentinel.on('tx:simulated', ({ result }) => {
  metrics.histogram('sentinel.risk_score', result.riskScore);
});
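
The handlers above assume that `metrics` and `logger` objects already exist in your agent. For local testing, a minimal in-memory stand-in such as the following may be enough. It is hypothetical: the method names simply mirror the calls in this guide and do not correspond to any particular metrics library.

```typescript
// Minimal in-memory stand-in for the `metrics` object used above.
// Hypothetical: method names mirror the calls in this guide, not a real client.
type Tags = Record<string, string | undefined>;

class InMemoryMetrics {
  readonly counters = new Map<string, number>();
  readonly histograms = new Map<string, number[]>();

  // Count one occurrence, keyed by metric name plus sorted tags.
  increment(name: string, tags: Tags = {}): void {
    const key = this.key(name, tags);
    this.counters.set(key, (this.counters.get(key) ?? 0) + 1);
  }

  // Record a raw observation; a real client would bucket these.
  histogram(name: string, value: number): void {
    const values = this.histograms.get(name) ?? [];
    values.push(value);
    this.histograms.set(name, values);
  }

  // Sorting tag names makes the key independent of property order.
  private key(name: string, tags: Tags): string {
    const parts = Object.keys(tags)
      .sort()
      .map((k) => `${k}=${tags[k] ?? ''}`);
    return [name, ...parts].join('|');
  }
}

const metrics = new InMemoryMetrics();
```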

Step 6: Handle Errors Gracefully

execute() never throws. Failures surface as approved: false, so a single branch can handle every blocked or failed action:

typescript
async function safeExecute(message: string, transaction: string) {
  const sentinel = await getSentinel();
  const result = await sentinel.execute({ input: message, transaction });

  if (!result.approved) {
    logger.info('Action blocked', {
      blockedBy: result.blocked_by,
      latency_ms: result.latency_ms,
    });
    return null;
  }

  return result;
}

Production Checklist

Before deploying an agent with Sentinel:

  • Tune riskThreshold — start at 70, lower gradually as you observe real traffic
  • Set programAllowlist — restrict to only the programs your agent legitimately calls
  • Configure timeBounds — if your agent runs in a business context, restrict to appropriate hours
  • Set cooldown — prevents rapid-fire execution loops
  • Monitor tx:simulated events — track risk score distribution to spot anomalies
  • Log all blocked actions — blocked actions are your threat intelligence feed
  • Test with mode: 'both' — parallel rule + LLM scanning gives the best coverage

Failing closed is intentional

If the LLM judge times out, Sentinel falls back to the rule engine rather than skipping the scan. If the rule engine fails, Sentinel returns safe: false. If the sandbox RPC errors, it returns approved: false. This fail-closed design means Sentinel never silently allows a potentially dangerous action.
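
The same fail-closed principle can be applied to checks in your own agent code. The sketch below wraps an arbitrary async check so that a thrown error or a timeout counts as a denial. `failClosed` is a hypothetical helper, not a Sentinel API, and it says nothing about how Sentinel implements this internally.

```typescript
// Hypothetical fail-closed wrapper for your own checks (not a Sentinel API):
// a thrown error or a timeout is treated as a denial, never as approval.
async function failClosed(
  check: () => Promise<boolean>,
  timeoutMs: number,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    // Whichever settles first wins; a slow check loses to the timeout.
    return await Promise.race([check(), timeout]);
  } catch {
    return false; // the check itself failed, so deny
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that the wrapper does not cancel a slow check. It only stops waiting for it, so the underlying call should have its own timeout if it holds resources.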

Example: DeFi Rebalancing Agent

typescript
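// Assumes Sentinel, ScanResult and PolicyViolation are in scope (for example,
// imported from '@sentinel-sdk/core'), and that RAYDIUM_AMM, ORCA_WHIRLPOOL and
// SPL_TOKEN are program ID strings defined elsewhere in your code.
class RebalancingAgent {
  // Assigned in init(); call init() before rebalance().
  private sentinel!: Sentinel;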

  async init() {
    this.sentinel = await Sentinel.create({
      mode: 'full',
      promptGuard: { mode: 'rules', rules: { rulePacks: ['default'] } },
      executionSandbox: {
        rpcEndpoint: process.env.SOLANA_RPC_URL!,
        policy: {
          spendingLimits: { maxPerTx: 100, maxDaily: 500, maxWeekly: 2000 },
          programAllowlist: [RAYDIUM_AMM, ORCA_WHIRLPOOL, SPL_TOKEN],
          riskThreshold: 80,
          cooldown: { minMs: 10_000 },
        },
      },
    });

    this.sentinel.on('threat:detected', (e) => this.onThreat(e.result));
    this.sentinel.on('policy:violated', (e) => this.onViolation(e.violation));
  }

  async rebalance(instruction: string, swapTx: string) {
    const result = await this.sentinel.execute({
      input: instruction,
      transaction: swapTx,
    });

    if (!result.approved) {
      throw new Error(`Rebalance blocked: ${result.blocked_by}`);
    }

    return broadcastTransaction(swapTx);
  }

  private onThreat(result: ScanResult) {
    alertOncall(`Threat in rebalancer: ${result.threatType} (${result.confidence})`);
  }

  private onViolation(v: PolicyViolation) {
    alertOncall(`Policy violation: ${v.rule} — ${v.message}`);
  }
}