Google Gemini
Category: AI / LLM + Image Generation
Integration type: Platform-level API key
External SDK: @google/generative-ai
Purpose
Google Gemini serves two distinct roles in the platform:
- Alternative LLM — Gemini Pro and Flash as drop-in alternatives to Claude/GPT-4o for content agents (useful for tenants with Google Cloud credits or who prefer Google’s models)
- AI Image Generation — Gemini’s Imagen model (imagen-3.0-generate-001) for generating custom images for blog posts, social content, and GBP posts when stock photos are insufficient
Model lineup
| Model | Use | Notes |
|---|---|---|
| gemini-2.5-pro | Complex reasoning, long context | Comparable to Claude Opus / GPT-4o |
| gemini-2.5-flash | Fast, balanced | Comparable to Claude Sonnet |
| gemini-2.0-flash-lite | Cheapest, fastest | Simple tasks, high-volume |
| gemini-nano | On-device / embedded | Not applicable for server-side; included for reference |
| imagen-3.0-generate-001 | Text-to-image | High-quality photorealistic output |
| gemini-2.5-flash-image | Image generation + editing | Multi-modal — can take reference images |
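The text-model tiers above map naturally onto a small selector. A sketch — the tier names and helper are illustrative, not part of the platform API; only the model IDs come from the lineup:

```typescript
// Illustrative tier → model mapping using the lineup above.
// TaskTier and pickModel are assumptions for illustration, not platform API.
type TaskTier = 'complex' | 'balanced' | 'high_volume';

const MODEL_BY_TIER: Record<TaskTier, string> = {
  complex: 'gemini-2.5-pro',            // long-context reasoning
  balanced: 'gemini-2.5-flash',         // platform default
  high_volume: 'gemini-2.0-flash-lite', // cheap, simple, high-volume tasks
};

function pickModel(tier: TaskTier): string {
  return MODEL_BY_TIER[tier];
}

console.log(pickModel('balanced')); // gemini-2.5-flash
```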
Config Structure
Platform config (env vars)
```bash
GOOGLE_AI_API_KEY=AIzaSyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  # From console.cloud.google.com
GOOGLE_AI_DEFAULT_MODEL=gemini-2.5-flash
GOOGLE_AI_IMAGE_MODEL=imagen-3.0-generate-001
```
Integration Pattern
LLM adapter (packages/agent-core/src/adapters/gemini.ts)
Gemini implements the same LLMAdapter interface as Claude and OpenAI:
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

class GeminiAdapter implements LLMAdapter {
  private client: GoogleGenerativeAI;
  private history: { role: 'user' | 'model'; parts: { text: string }[] }[] = [];

  constructor(
    private apiKey: string,
    private model: string = 'gemini-2.5-flash',
  ) {
    this.client = new GoogleGenerativeAI(apiKey);
  }

  async *run(systemPrompt: string, userPrompt: string): AsyncGenerator<LLMEvent> {
    const generativeModel = this.client.getGenerativeModel({
      model: this.model,
      systemInstruction: systemPrompt,
    });

    // Record the new user turn, but start the chat with history *excluding* it —
    // sendMessageStream() supplies the current turn itself.
    this.history.push({ role: 'user', parts: [{ text: userPrompt }] });
    const chat = generativeModel.startChat({ history: this.history.slice(0, -1) });
    const stream = await chat.sendMessageStream(userPrompt);

    let assistantText = '';
    for await (const chunk of stream.stream) {
      const text = chunk.text();
      assistantText += text;
      yield { type: 'content', text };
    }
    this.history.push({ role: 'model', parts: [{ text: assistantText }] });

    const usage = (await stream.response).usageMetadata;
    yield {
      type: 'usage',
      inputTokens: usage?.promptTokenCount ?? 0,
      outputTokens: usage?.candidatesTokenCount ?? 0,
    };
  }
}
```
Tool use with Gemini
Gemini supports native function calling. The adapter registers tools in the Gemini format:
```typescript
const tools = [{
  functionDeclarations: [
    {
      name: 'web_search',
      description: 'Search the web for current information',
      parameters: {
        type: 'object',
        properties: { query: { type: 'string', description: 'Search query' } },
        required: ['query'],
      },
    },
    {
      name: 'rag_search',
      description: 'Search the client knowledge base',
      parameters: {
        type: 'object',
        properties: {
          query: { type: 'string' },
          datasetIds: { type: 'array', items: { type: 'string' } },
        },
        required: ['query'],
      },
    },
  ],
}];
```
Image generation (packages/tools/src/gemini-image.ts)
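For orientation before the implementation: the shape of the Imagen :predict REST response that this section's tool consumes. This is a sketch inferred from the mapping code below, not an exhaustive schema:

```typescript
// Assumed shape of the Imagen :predict response, inferred from how the tool
// below reads it — not the full API surface.
interface ImagenPredictResponse {
  predictions: {
    bytesBase64Encoded: string; // raw image bytes, base64-encoded
    mimeType?: string;          // treated as image/png when absent
  }[];
}

// A canned response of the kind the unit tests mock:
const sample: ImagenPredictResponse = {
  predictions: [{ bytesBase64Encoded: 'abc' }],
};

console.log(sample.predictions[0].mimeType ?? 'image/png'); // image/png
```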
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';
import axios from 'axios';

class GeminiImageTool implements ImageGenerationTool {
  readonly name = 'gemini';
  private client: GoogleGenerativeAI;

  constructor(private apiKey: string) {
    this.client = new GoogleGenerativeAI(apiKey);
  }

  async generateImage(options: {
    prompt: string;
    negativePrompt?: string; // Things to avoid in the image
    aspectRatio?: '1:1' | '16:9' | '9:16' | '4:3' | '3:4';
    style?: 'photograph' | 'illustration' | 'painting' | 'digital_art';
    count?: number; // 1–4 images
  }): Promise<GeneratedImage[]> {
    // Imagen 3 via REST API (SDK wrapper not yet stable)
    const response = await axios.post(
      'https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-001:predict',
      {
        instances: [{ prompt: this.buildPrompt(options) }],
        parameters: {
          sampleCount: options.count ?? 1,
          aspectRatio: options.aspectRatio ?? '16:9',
          negativePrompt: options.negativePrompt,
        },
      },
      {
        headers: {
          'Content-Type': 'application/json',
          'x-goog-api-key': this.apiKey,
        },
      },
    );

    return response.data.predictions.map((pred: any) => ({
      provider: 'gemini',
      model: 'imagen-3.0-generate-001',
      imageBase64: pred.bytesBase64Encoded,
      mimeType: pred.mimeType ?? 'image/png',
      revisedPrompt: null,
    }));
  }

  // Generate with reference images (Gemini 2.5 Flash Image multimodal)
  async editImage(options: {
    prompt: string;
    referenceImages: Buffer[]; // Input images for style/content reference
    aspectRatio?: string;
  }): Promise<GeneratedImage[]> {
    // Image *output* requires the image-capable variant from the model lineup
    // above; plain gemini-2.5-flash returns text only.
    const model = this.client.getGenerativeModel({ model: 'gemini-2.5-flash-image' });

    const parts: any[] = [
      { text: options.prompt },
      ...options.referenceImages.map(buf => ({
        inlineData: {
          data: buf.toString('base64'),
          mimeType: 'image/jpeg',
        },
      })),
    ];

    const result = await model.generateContent({ contents: [{ role: 'user', parts }] });
    const imagePart = result.response.candidates?.[0]?.content.parts
      .find((p: any) => p.inlineData);
    if (!imagePart) throw new Error('Gemini did not return an image');

    return [{
      provider: 'gemini',
      model: 'gemini-2.5-flash-image',
      imageBase64: imagePart.inlineData.data,
      mimeType: imagePart.inlineData.mimeType,
      revisedPrompt: null,
    }];
  }

  private buildPrompt(options: { prompt: string; style?: string }): string {
    const styleModifiers: Record<string, string> = {
      photograph: 'professional photograph, realistic, high quality, 4K',
      illustration: 'digital illustration, clean vector art style',
      painting: 'oil painting, artistic, painterly strokes',
      digital_art: 'digital art, concept art, highly detailed',
    };
    const suffix = options.style ? `, ${styleModifiers[options.style]}` : '';
    return `${options.prompt}${suffix}`;
  }
}
```
Image Generation Provider Abstraction
All image generation providers (Gemini, DALL-E, Stability AI) implement a shared interface:
```typescript
interface ImageGenerationTool {
  readonly name: string;
  generateImage(options: ImageGenerationOptions): Promise<GeneratedImage[]>;
}

interface GeneratedImage {
  provider: string;
  model: string;
  imageBase64: string;          // Base64-encoded image
  mimeType: string;
  revisedPrompt: string | null; // Some models return a revised prompt
}
```
The agent execution engine resolves which image provider to use based on tenant config and the task type.
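The resolution step itself isn't shown in this doc. A minimal sketch, assuming the tenant config carries a provider name (the field name, registry shape, and fallback are illustrative):

```typescript
// Sketch of provider resolution. The tenant-config field and registry shape
// are assumptions for illustration, not the platform's actual API.
interface ImageGenerationToolLike {
  readonly name: string;
}

function resolveImageProvider<T extends ImageGenerationToolLike>(
  registry: T[],
  tenantProvider: string | undefined,
  fallback = 'gemini',
): T {
  const wanted = tenantProvider ?? fallback;
  const tool =
    registry.find(t => t.name === wanted) ??
    registry.find(t => t.name === fallback); // fall back if tenant choice is unregistered
  if (!tool) throw new Error(`No image provider registered for "${wanted}"`);
  return tool;
}

const registry = [{ name: 'gemini' }, { name: 'dalle' }, { name: 'stability' }];
console.log(resolveImageProvider(registry, 'dalle').name);   // dalle
console.log(resolveImageProvider(registry, undefined).name); // gemini
```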
Gemini Nano
gemini-nano is Google’s smallest model, designed for on-device inference (Android AICore). It is not applicable to Leadmetrics server-side agents and is listed here for completeness — any reference to “Nano” in platform docs refers to the model family generally, not to a supported integration.
Cost Reference
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.075 | $0.30 |
| gemini-2.0-flash-lite | $0.0075 | $0.03 |
| imagen-3.0-generate-001 | — | $0.03/image |
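The per-token rates above combine with the adapter's usage event to give a per-run cost estimate. A sketch — the rate table simply mirrors the pricing above, and the helper is illustrative:

```typescript
// Rough cost estimate from token counts, using the per-1M-token rates above.
// estimateCostUSD is an illustrative helper, not platform code.
const RATES: Record<string, { input: number; output: number }> = {
  'gemini-2.5-pro':        { input: 1.25,   output: 10.0 },
  'gemini-2.5-flash':      { input: 0.075,  output: 0.3 },
  'gemini-2.0-flash-lite': { input: 0.0075, output: 0.03 },
};

function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const rate = RATES[model];
  if (!rate) throw new Error(`No rate card for ${model}`);
  return (inputTokens / 1e6) * rate.input + (outputTokens / 1e6) * rate.output;
}

// 10k input + 2k output tokens on gemini-2.5-pro ≈ $0.0325
console.log(estimateCostUSD('gemini-2.5-pro', 10_000, 2_000).toFixed(4)); // 0.0325
```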
Test Cases
Unit tests (packages/agent-core/src/adapters/gemini.test.ts)
| Test | Approach |
|---|---|
| run() calls startChat with history minus last message | Assert history passed to startChat excludes current turn |
| run() yields content events from stream chunks | Mock stream; assert text chunks emitted |
| run() yields usage event with token counts | Mock usageMetadata; assert values |
| run() appends model reply to this.history | Assert history has model entry after run |
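The first assertion can be exercised without the real SDK. A framework-agnostic sketch with a hand-rolled fake (the actual tests presumably mock @google/generative-ai with jest/vitest; all names here are illustrative):

```typescript
// Hand-rolled fake that captures what startChat receives, illustrating the
// "history minus last message" assertion. Names are illustrative only.
type Turn = { role: 'user' | 'model'; parts: { text: string }[] };

class FakeModel {
  capturedHistory: Turn[] | null = null;
  startChat(opts: { history: Turn[] }) {
    this.capturedHistory = opts.history;
    return {};
  }
}

// Mirror the adapter's sequence: push the new user turn, then start the chat
// with everything *except* that turn.
const history: Turn[] = [{ role: 'user', parts: [{ text: 'earlier turn' }] }];
const fake = new FakeModel();

history.push({ role: 'user', parts: [{ text: 'current turn' }] });
fake.startChat({ history: history.slice(0, -1) });

// The current turn must be excluded; only the earlier turn is passed through.
console.log(fake.capturedHistory!.length);            // 1
console.log(fake.capturedHistory![0].parts[0].text);  // earlier turn
```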
Unit tests (packages/tools/src/gemini-image.test.ts)
| Test | Approach |
|---|---|
| generateImage() POSTs to Imagen endpoint | Mock axios.post; assert URL and x-goog-api-key |
| generateImage() appends style modifier to prompt | Assert prompt includes style string |
| generateImage() returns base64 image | Mock { predictions: [{ bytesBase64Encoded: 'abc' }] } |
| editImage() includes reference images as inlineData | Assert parts contain inlineData blocks |
Related
- DALL-E Provider — OpenAI image generation
- Stability AI Provider — Stable Diffusion image generation
- OpenAI Provider — GPT-4o LLM (similar adapter pattern)
- Claude Provider — primary LLM