DALL-E (OpenAI Image Generation)
Category: AI Image Generation
Integration type: Platform-level API key (same as OpenAI LLM)
External SDK: openai
Purpose
DALL-E 3 is OpenAI’s text-to-image generation model. It produces photorealistic images, illustrations, and stylised artwork from text prompts. It is used when stock photo searches (Pixabay, Unsplash) don’t yield a sufficiently specific image for the content being created.
DALL-E 3 uses the same API key as the OpenAI LLM integration — no separate credential required.
When DALL-E vs Stock Photos
| Scenario | Use |
|---|---|
| Need a generic nature/business photo | Unsplash or Pixabay (free) |
| Need a specific scene that doesn’t exist in stock | DALL-E 3 (generates it) |
| Need brand-specific illustration | DALL-E 3 with brand style prompt |
| Need to edit an existing image | DALL-E 2 inpainting or Gemini edit |
Config Structure
Platform config (env vars)
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # Shared with OpenAI LLM
DALLE_DEFAULT_MODEL=dall-e-3 # 'dall-e-3' | 'dall-e-2'
DALLE_DEFAULT_SIZE=1024x1024 # See size options per model below
Integration Pattern
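Before the tool is constructed, the platform config variables could be read and validated in one place. The following is a minimal sketch, not the platform's actual loader: `DalleConfig`, `isDalleModel`, and `loadDalleConfig` are hypothetical names, and the fallback values simply mirror the defaults shown in the env vars.

```typescript
// Hypothetical config shape; these names are illustrative and not part of the platform.
type DalleModel = 'dall-e-3' | 'dall-e-2';

interface DalleConfig {
  apiKey: string;
  defaultModel: DalleModel;
  defaultSize: string;
}

// Type guard so an arbitrary env string narrows to a known model id.
function isDalleModel(value: string): value is DalleModel {
  return value === 'dall-e-3' || value === 'dall-e-2';
}

// Reads the three variables from an env-like map and fails fast on bad input.
function loadDalleConfig(env: Record<string, string | undefined>): DalleConfig {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error('OPENAI_API_KEY is not set');
  }
  const model = env.DALLE_DEFAULT_MODEL ?? 'dall-e-3';
  if (!isDalleModel(model)) {
    throw new Error(`Unsupported DALLE_DEFAULT_MODEL: ${model}`);
  }
  return {
    apiKey,
    defaultModel: model,
    defaultSize: env.DALLE_DEFAULT_SIZE ?? '1024x1024',
  };
}
```

In a Node process this would typically be called once at start-up with `process.env`, and the resulting `apiKey` passed to the tool's constructor.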
Image generation tool (packages/tools/src/dall-e.ts)
import OpenAI from 'openai';

class DallETool implements ImageGenerationTool {
  readonly name = 'dall-e';
  private client: OpenAI;

  constructor(apiKey: string) {
    this.client = new OpenAI({ apiKey });
  }

  // Text-to-image generation (DALL-E 3 by default)
  async generateImage(options: {
    prompt: string;
    model?: 'dall-e-3' | 'dall-e-2';
    size?: string; // See per-model options below
    quality?: 'standard' | 'hd'; // DALL-E 3 only; default 'standard'
    style?: 'vivid' | 'natural'; // DALL-E 3 only; default 'vivid'
    count?: number; // DALL-E 3: max 1; DALL-E 2: max 10
  }): Promise<GeneratedImage[]> {
    const model = options.model ?? 'dall-e-3';
    const size = options.size ?? this.defaultSize(model);
    // Clamp the requested count to the per-model maximum
    const count = Math.min(options.count ?? 1, model === 'dall-e-3' ? 1 : 10);

    const response = await this.client.images.generate({
      model,
      prompt: options.prompt,
      n: count,
      size: size as any,
      // quality and style are only sent for DALL-E 3
      quality: model === 'dall-e-3' ? (options.quality ?? 'standard') : undefined,
      style: model === 'dall-e-3' ? (options.style ?? 'vivid') : undefined,
      response_format: 'b64_json',
    });

    return response.data.map(image => ({
      provider: 'dall-e',
      model,
      imageBase64: image.b64_json!,
      mimeType: 'image/png',
      revisedPrompt: image.revised_prompt ?? null, // DALL-E 3 always revises the prompt
    }));
  }

  // DALL-E 2 image editing (inpainting)
  async editImage(options: {
    image: Buffer; // PNG, must be square; if no mask is given, its transparent areas act as the mask
    mask?: Buffer; // PNG with transparent areas indicating where to edit
    prompt: string;
    size?: '256x256' | '512x512' | '1024x1024';
    count?: number;
  }): Promise<GeneratedImage[]> {
    const response = await this.client.images.edit({
      image: new File([options.image], 'image.png', { type: 'image/png' }),
      mask: options.mask ? new File([options.mask], 'mask.png', { type: 'image/png' }) : undefined,
      prompt: options.prompt,
      n: options.count ?? 1,
      size: (options.size ?? '1024x1024') as any,
      response_format: 'b64_json',
    });

    return response.data.map(image => ({
      provider: 'dall-e',
      model: 'dall-e-2',
      imageBase64: image.b64_json!,
      mimeType: 'image/png',
      revisedPrompt: null,
    }));
  }

  // Image variation (DALL-E 2 only)
  async createVariation(options: {
    image: Buffer; // PNG, square
    count?: number;
    size?: string;
  }): Promise<GeneratedImage[]> {
    const response = await this.client.images.createVariation({
      image: new File([options.image], 'image.png', { type: 'image/png' }),
      n: options.count ?? 1,
      size: (options.size ?? '1024x1024') as any,
      response_format: 'b64_json',
    });

    return response.data.map(image => ({
      provider: 'dall-e',
      model: 'dall-e-2',
      imageBase64: image.b64_json!,
      mimeType: 'image/png',
      revisedPrompt: null,
    }));
  }

  // Default size used when the caller does not specify one
  private defaultSize(model: string): string {
    return model === 'dall-e-3' ? '1024x1024' : '512x512';
  }
}
}
Size Options
DALL-E 3
| Size | Aspect ratio | Notes |
|---|---|---|
| 1024x1024 | Square | Best for Instagram, general use |
| 1792x1024 | Landscape 16:9 | Blog featured images, banners |
| 1024x1792 | Portrait 9:16 | Stories, Pinterest |
DALL-E 2
| Size | Notes |
|---|---|
| 256x256 | Low resolution; low cost |
| 512x512 | Medium resolution |
| 1024x1024 | Full resolution |
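Because each model accepts a different set of sizes, a caller-supplied size could be checked before the request is sent. The sketch below encodes the two tables above; `SUPPORTED_SIZES`, `isSupportedSize`, and `resolveSize` are hypothetical helpers, and the fallback sizes are an assumption that mirrors the tool's `defaultSize()`.

```typescript
type DalleModel = 'dall-e-3' | 'dall-e-2';

// Sizes per model, taken from the tables above.
const SUPPORTED_SIZES: Record<DalleModel, readonly string[]> = {
  'dall-e-3': ['1024x1024', '1792x1024', '1024x1792'],
  'dall-e-2': ['256x256', '512x512', '1024x1024'],
};

function isSupportedSize(model: DalleModel, size: string): boolean {
  return SUPPORTED_SIZES[model].includes(size);
}

// Returns the requested size if valid, the model's default if none was given,
// and throws for a size the model does not accept.
function resolveSize(model: DalleModel, requested?: string): string {
  const fallback = model === 'dall-e-3' ? '1024x1024' : '512x512';
  if (requested === undefined) return fallback;
  if (!isSupportedSize(model, requested)) {
    throw new Error(`Size ${requested} is not supported by ${model}`);
  }
  return requested;
}
```

Validating locally avoids spending a round trip on a request the API would reject anyway.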
DALL-E 3 Prompt Revision
DALL-E 3 automatically rewrites the input prompt to add detail and apply safety filtering. The revised_prompt field in the response shows the prompt that was actually used. The platform logs this for debugging and transparency. If an agent’s prompts are consistently revised in ways that change their intent, the system prompt for that agent should be adjusted.
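One way to make that log entry useful is to record both prompts side by side, with a flag for whether they differ. This is only a sketch: `RevisionRecord` and `describeRevision` are hypothetical names, and the whitespace- and case-insensitive comparison is an assumption about what should count as "unchanged".

```typescript
interface RevisionRecord {
  originalPrompt: string;
  revisedPrompt: string | null;
  wasRevised: boolean;
}

// Compares the submitted prompt with the revised_prompt returned by the API,
// ignoring differences in case and whitespace.
function describeRevision(originalPrompt: string, revisedPrompt: string | null): RevisionRecord {
  const normalise = (s: string): string => s.trim().replace(/\s+/g, ' ').toLowerCase();
  const wasRevised =
    revisedPrompt !== null && normalise(revisedPrompt) !== normalise(originalPrompt);
  return { originalPrompt, revisedPrompt, wasRevised };
}
```

A record like this could be attached to whatever structured logger the platform already uses. A null `revisedPrompt` (as returned for DALL-E 2) is reported as not revised.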
Content Policy
DALL-E 3 applies strict content moderation:
- No depictions of people who could be mistaken for real individuals, unless there is explicit consent
- No violent, adult, or disturbing content
- Prompts involving brands or logos may be refused
Agent system prompts must therefore avoid requesting real people’s faces, real brand logos, or sensitive subjects. A rejected prompt fails with an HTTP 400 error whose code is content_policy_violation.
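Callers may want to tell a policy rejection apart from other failures, for example to fall back to stock photos instead of retrying. The sketch below assumes the thrown error exposes `status` and `code` properties, as the openai SDK's API errors are understood to do. `ApiErrorLike` and `isContentPolicyViolation` are hypothetical names.

```typescript
// Structural view of the fields this check relies on (assumed, not imported from the SDK).
interface ApiErrorLike {
  status?: number;
  code?: string | null;
}

// True only for an HTTP 400 carrying the content_policy_violation code.
function isContentPolicyViolation(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false;
  const { status, code } = err as ApiErrorLike;
  return status === 400 && code === 'content_policy_violation';
}
```

In a `catch` block around `generateImage()`, a `true` result would mean the prompt itself needs changing, so retrying the same request is unlikely to help.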
Cost Reference
| Model | Size | Quality | Price per image |
|---|---|---|---|
| DALL-E 3 | 1024×1024 | Standard | $0.040 |
| DALL-E 3 | 1024×1024 | HD | $0.080 |
| DALL-E 3 | 1792×1024 | Standard | $0.080 |
| DALL-E 3 | 1792×1024 | HD | $0.120 |
| DALL-E 2 | 1024×1024 | — | $0.020 |
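For budgeting, the table can be turned into a small lookup. The sketch below includes only the combinations listed above and returns `undefined` for anything else, rather than guessing a price. `PRICE_PER_IMAGE` and `estimateCostUsd` are hypothetical names, and the prices are copied from this table, so they may go stale.

```typescript
type DalleModel = 'dall-e-3' | 'dall-e-2';
type Quality = 'standard' | 'hd';

// USD per image, keyed as model|size|quality. DALL-E 2 has no quality tier,
// so it is stored under 'standard'.
const PRICE_PER_IMAGE: Record<string, number> = {
  'dall-e-3|1024x1024|standard': 0.04,
  'dall-e-3|1024x1024|hd': 0.08,
  'dall-e-3|1792x1024|standard': 0.08,
  'dall-e-3|1792x1024|hd': 0.12,
  'dall-e-2|1024x1024|standard': 0.02,
};

// Returns the estimated cost in USD, or undefined if the combination is not in the table.
function estimateCostUsd(
  model: DalleModel,
  size: string,
  quality: Quality = 'standard',
  count = 1,
): number | undefined {
  const effectiveQuality = model === 'dall-e-2' ? 'standard' : quality;
  const unitPrice = PRICE_PER_IMAGE[`${model}|${size}|${effectiveQuality}`];
  return unitPrice === undefined ? undefined : unitPrice * count;
}
```

Returning `undefined` for unlisted combinations keeps the estimator from reporting a cost the table does not support.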
Test Cases
Unit tests (packages/tools/src/dall-e.test.ts)
| Test | Approach |
|---|---|
| generateImage() calls images.generate with correct params | Mock openai.images.generate; assert model, size, quality, style |
| generateImage() caps count at 1 for DALL-E 3 | Pass count: 3; assert n: 1 in API call |
| generateImage() returns revisedPrompt from response | Mock { revised_prompt: 'revised text' } |
| editImage() passes image and mask as File objects | Assert File instances in images.edit call |
| generateImage() uses response_format: 'b64_json' | Always assert this param |
| Throws on content policy violation (400) | Mock 400 with content_policy_violation; assert error |
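The count-capping case can be illustrated without a mocking library by using a hand-rolled recording fake. This is a reduced sketch and not the real test file: `ImagesClient`, `RecordingImagesClient`, and `generateWithCap` are hypothetical stand-ins that keep only the capping logic from `generateImage()`.

```typescript
type DalleModel = 'dall-e-3' | 'dall-e-2';

interface GenerateParams {
  model: DalleModel;
  prompt: string;
  n: number;
}

// Minimal stand-in for the part of the SDK client that the tool touches.
interface ImagesClient {
  generate(params: GenerateParams): Promise<{ data: { b64_json: string }[] }>;
}

// Fake client that records every call instead of hitting the network.
class RecordingImagesClient implements ImagesClient {
  readonly calls: GenerateParams[] = [];

  generate(params: GenerateParams): Promise<{ data: { b64_json: string }[] }> {
    this.calls.push(params);
    return Promise.resolve({ data: [{ b64_json: 'AAAA' }] });
  }
}

// Reduced version of generateImage(): only the per-model count cap is kept.
function generateWithCap(
  client: ImagesClient,
  prompt: string,
  model: DalleModel,
  count = 1,
): Promise<{ data: { b64_json: string }[] }> {
  const n = Math.min(count, model === 'dall-e-3' ? 1 : 10);
  return client.generate({ model, prompt, n });
}
```

A test would call `generateWithCap(fake, prompt, 'dall-e-3', 3)` and assert that `fake.calls[0].n` is 1. The same fake can be reused for the parameter-forwarding cases in the table.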
Related
- Google Gemini Provider — Imagen 3 image generation
- Stability AI Provider — Stable Diffusion image generation
- Pixabay Provider — free stock photos (use before AI generation)
- Unsplash Provider — premium stock photos
- OpenAI Provider — GPT-4o LLM (same API key)