Generating and Editing Images with the Gemini API: A Practical Guide

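The Gemini API now supports image generation. Its model gemini-2.5-flash-image-preview (also known as Nano Banana) lets you create and process visual content conversationally. You can prompt the model with text, images, or a combination of both, which gives you fine-grained creative control over the result.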

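Core capabilities:

Text-to-Image: generate high-quality images from simple or complex text descriptions.
Image editing (Image + Text-to-Image): supply an existing image and use text instructions to add, remove, or modify elements, or to adjust its style or colors.
Image composition and style transfer (Multi-image to Image): combine several input images into a new scene, or apply the artistic style of one image to another.
Iterative refinement: fine-tune an image step by step over a conversation until it looks right.
High-fidelity text rendering: generate clear, legible, well-placed text inside images, which suits logos, diagrams, and posters.

All images generated with this feature include a SynthID digital watermark.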

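Text-to-Image Generation

This is the most basic usage: provide a descriptive text prompt and the model generates a matching image.

The following code generates an image from the prompt "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme".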

Python

from google import genai
from PIL import Image
from io import BytesIO

# The client reads your API key from the GEMINI_API_KEY environment variable.
# Alternatively, pass it explicitly: genai.Client(api_key="YOUR_API_KEY")

client = genai.Client()
prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt],
)

# The response may contain both text parts and image parts
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        # Save the image data to a file
        image_data = part.inline_data.data
        image = Image.open(BytesIO(image_data))
        image.save("generated_image.png")
        print("Image saved as generated_image.png")
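The loop above writes every image to the same file name, so a reply with several image parts would keep only the last one. The following is a minimal sketch of a helper that saves each image part under its own name. The function name `save_image_parts`, the extension table, and the `mime_type` lookup are illustrative assumptions rather than part of the SDK; the helper only relies on the `text` and `inline_data.data` attributes used in the example above.

```python
from pathlib import Path

# Assumed mapping from MIME type to file extension; extend as needed.
_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_image_parts(parts, stem="generated_image", out_dir="."):
    """Write every inline image in `parts` to disk and return the paths.

    Text parts are printed. `parts` is expected to look like
    response.candidates[0].content.parts from the example above.
    """
    saved = []
    for part in parts:
        if getattr(part, "text", None):
            print(part.text)
            continue
        blob = getattr(part, "inline_data", None)
        if blob is None or not blob.data:
            continue
        # Fall back to .png when the MIME type is missing or unknown.
        ext = _EXTENSIONS.get(getattr(blob, "mime_type", None), ".png")
        path = Path(out_dir) / f"{stem}_{len(saved)}{ext}"
        path.write_bytes(blob.data)
        saved.append(path)
    return saved
```

Calling `save_image_parts(response.candidates[0].content.parts)` returns an empty list when the model answered with text only, which is a convenient signal for retrying with a more explicit prompt.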

JavaScript (Node.js)

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

async function main() {
    // Pass your API key explicitly, or set the GEMINI_API_KEY environment variable
    const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

    const prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme";

    // The @google/genai SDK exposes generation through ai.models.generateContent
    const response = await ai.models.generateContent({
        model: "gemini-2.5-flash-image-preview",
        contents: prompt,
    });

    // The response may contain both text parts and image parts
    for (const part of response.candidates[0].content.parts) {
        if (part.text) {
            console.log(part.text);
        } else if (part.inlineData) {
            // Image data arrives base64-encoded
            const imageData = part.inlineData.data;
            const buffer = Buffer.from(imageData, "base64");
            fs.writeFileSync("generated_image.png", buffer);
            console.log("Image saved as generated_image.png");
        }
    }
}

main();

Go

package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"google.golang.org/genai"
)

func main() {
	ctx := context.Background()
	// Configure your API key first
	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		APIKey:  "YOUR_API_KEY",
		Backend: genai.BackendGeminiAPI,
	})
	if err != nil {
		log.Fatal(err)
	}

	prompt := genai.Text("Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme")

	resp, err := client.Models.GenerateContent(ctx, "gemini-2.5-flash-image-preview", prompt, nil)
	if err != nil {
		log.Fatal(err)
	}

	// The response may contain both text parts and image parts
	for _, part := range resp.Candidates[0].Content.Parts {
		if part.Text != "" {
			fmt.Println(part.Text)
		} else if part.InlineData != nil {
			// Save the raw image bytes to a file
			if err := os.WriteFile("generated_image.png", part.InlineData.Data, 0644); err != nil {
				log.Fatal(err)
			}
			fmt.Println("Image saved as generated_image.png")
		}
	}
}

REST API (cURL)

curl -s -X POST " \
 -H "x-goog-api-key: $GEMINI_API_KEY" \
 -H "Content-Type: application/json" \
 -d '{
      "contents": [{
        "parts": [
          {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"}
        ]
      }]
    }' \
 | grep -o '"data": "[^"]*"' \
 | cut -d '"' -f4 \
 | base64 --decode > generated_image.png
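The `grep` and `cut` pipeline above depends on the exact spacing of the JSON output and does not separate multiple images. As an alternative, here is a minimal sketch that parses the saved JSON reply and decodes every inline image. The function name `extract_images` is hypothetical, and the code assumes the image field may appear either as `inlineData` or as `inline_data`, so it checks both.

```python
import base64
import json


def extract_images(response_json: str) -> list[bytes]:
    """Return the decoded bytes of every inline image in a generateContent reply."""
    body = json.loads(response_json)
    images = []
    for candidate in body.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            # Assumption: the field is spelled inlineData or inline_data.
            blob = part.get("inlineData") or part.get("inline_data")
            if blob and "data" in blob:
                images.append(base64.b64decode(blob["data"]))
    return images
```

One way to use it is to redirect the `curl` output to a file such as `response.json`, read that file as text, and write each returned `bytes` object to its own image file.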

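Image Editing (Image + Text-to-Image)

This feature lets you upload an image and modify it with text instructions.

Important: make sure you hold the necessary rights to every image you upload. Do not generate content that infringes on the rights of others. You must comply with the applicable use-restriction policies when using this service.

The following example uploads a picture of a cat and asks the model to combine it with a nano banana, a fancy restaurant, and the Gemini constellation.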

Python

from google import genai
from PIL import Image
from io import BytesIO

# The client reads your API key from the GEMINI_API_KEY environment variable.
# Alternatively, pass it explicitly: genai.Client(api_key="YOUR_API_KEY")

client = genai.Client()

prompt = (
    "Create a picture of my cat eating a nano-banana in a "
    "fancy restaurant under the Gemini constellation"
)
image = Image.open("/path/to/your/cat_image.png")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt, image],
)

# Handle the response: print any text and save the edited image
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image_data = part.inline_data.data
        modified_image = Image.open(BytesIO(image_data))
        modified_image.save("edited_image.png")
        print("Edited image saved as edited_image.png")

JavaScript (Node.js)

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

async function main() {
    // Pass your API key explicitly, or set the GEMINI_API_KEY environment variable
    const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

    // Read the source image and base64-encode it for the request
    const imagePath = "/path/to/your/cat_image.png";
    const imageData = fs.readFileSync(imagePath);
    const base64Image = imageData.toString("base64");

    const prompt = [
        { text: "Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation" },
        {
            inlineData: {
                mimeType: "image/png",
                data: base64Image,
            },
        },
    ];

    const response = await ai.models.generateContent({
        model: "gemini-2.5-flash-image-preview",
        contents: prompt,
    });

    // Print any text and save the edited image
    for (const part of response.candidates[0].content.parts) {
        if (part.text) {
            console.log(part.text);
        } else if (part.inlineData) {
            const buffer = Buffer.from(part.inlineData.data, "base64");
            fs.writeFileSync("edited_image.png", buffer);
            console.log("Edited image saved as edited_image.png");
        }
    }
}

main();

Go

package main

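import (
	"context"
	"fmt"
	"log"
	"os"

	"google.golang.org/genai"
)

func main() {
	ctx := context.Background()
	// Configure your API key
	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		APIKey:  "YOUR_API_KEY",
		Backend: genai.BackendGeminiAPI,
	})
	if err != nil {
		log.Fatal(err)
	}

	imagePath := "/path/to/your/cat_image.png"
	imgData, err := os.ReadFile(imagePath)
	if err != nil {
		log.Fatal(err)
	}

	// One user turn made of a text part and an inline image part
	parts := []*genai.Part{
		genai.NewPartFromText("Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation"),
		{InlineData: &genai.Blob{MIMEType: "image/png", Data: imgData}},
	}
	contents := []*genai.Content{genai.NewContentFromParts(parts, genai.RoleUser)}

	resp, err := client.Models.GenerateContent(ctx, "gemini-2.5-flash-image-preview", contents, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Handle the response: print any text and save the edited image
	for _, part := range resp.Candidates[0].Content.Parts {
		if part.Text != "" {
			fmt.Println(part.Text)
		} else if part.InlineData != nil {
			if err := os.WriteFile("edited_image.png", part.InlineData.Data, 0644); err != nil {
				log.Fatal(err)
			}
			fmt.Println("Edited image saved as edited_image.png")
		}
	}
}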

REST API (cURL)

IMG_PATH="/path/to/your/cat_image.jpeg"

# Pick the base64 flag that matches the installed base64 implementation
if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl -X POST \
  " \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d "{
        \"contents\": [{
          \"parts\":[
            {\"text\": \"Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation\"},
            {\"inline_data\": {
                \"mime_type\":\"image/jpeg\",
                \"data\": \"$IMG_BASE64\"
              }
            }
          ]
        }]
      }" \
  | grep -o '"data": "[^"]*"' \
  | cut -d '"' -f4 \
  | base64 --decode > edited_image.png
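Escaping a JSON body inside a double-quoted shell string is error-prone, and the MIME type has to be kept in sync with the file by hand. The following is a minimal sketch that builds the same request body in Python instead. The function name `build_edit_payload` is hypothetical; the field names `inline_data`, `mime_type`, and `data` are copied from the request above, and the MIME type is inferred from the file extension.

```python
import base64
import json
import mimetypes
from pathlib import Path


def build_edit_payload(prompt: str, image_path: str) -> str:
    """Build a JSON request body with one text part and one inline image part."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {image_path!r}")
    # Base64-encode the raw file bytes, as the shell example does with base64.
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    payload = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
            ]
        }]
    }
    return json.dumps(payload)
```

The resulting string can be written to a file such as `payload.json` and sent with `curl -d @payload.json`, which avoids shell quoting altogether.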

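Other Advanced Generation Modes

Gemini also supports more complex interaction modes, depending on the structure and context of the prompt:

Interleaved text and images: generate images together with related text.
- Example prompt: "Generate an illustrated recipe for paella."
Image and text in, image and text out: use input images and text to create new, related images and text.
- Example prompt: (with a photo of a furnished room) "What other sofa colors would work in my space? Please update the image to show the result."
Multi-turn image editing (chat mode): keep generating and modifying images conversationally.
- Example conversation:
  1. The user uploads a picture of a blue car.
  2. User: "Turn this car into a convertible."
  3. User: "Now change the color to yellow."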
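The multi-turn flow above can be sketched as a small loop that sends one instruction per turn and keeps the images returned at each step. The function name `run_edit_session` is hypothetical. It assumes only that `chat` has a `send_message` method whose reply exposes `candidates[0].content.parts`, as in the earlier examples, so it can be exercised with a stand-in object before being pointed at a real chat session.

```python
def run_edit_session(chat, instructions):
    """Send each instruction in turn and collect the image bytes returned per turn.

    `chat` is any object with a send_message(message) method whose reply
    exposes .candidates[0].content.parts.
    """
    history = []
    for instruction in instructions:
        response = chat.send_message(instruction)
        parts = response.candidates[0].content.parts
        # Keep only the parts that carry inline image data.
        images = [
            part.inline_data.data
            for part in parts
            if getattr(part, "inline_data", None) is not None
        ]
        history.append((instruction, images))
    return history
```

With the Python SDK, a chat object would presumably come from something like `client.chats.create(model="gemini-2.5-flash-image-preview")`, with the first message carrying the uploaded car picture; treat that call as an assumption to check against the SDK documentation.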

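Prompting Guide and Strategies

To get the most out of Gemini's image generation, follow one core principle: describe a complete scene rather than listing keywords. Because the model has strong language understanding, it handles narrative descriptions better, and the resulting images are more coherent and closer to what you intended.

The strategies and templates below can help you write effective prompts.

1. Creating Photorealistic Images

For realistic photos, use photography terms such as camera angle, lens type, lighting, and fine detail to steer the model toward a photorealistic result.

Template: A photorealistic [shot type, e.g. close-up, wide shot] of [subject], [action or expression], set in [environment]. The scene is illuminated by [lighting description], creating a [mood] atmosphere. Captured with a [camera/lens details], emphasizing [key textures and details]. The image should be in a [aspect ratio] format.

Example prompt: A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful. Vertical portrait orientation.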

# Example code (reuses the client created in the earlier examples)
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful. Vertical portrait orientation.",
)
# ... followed by the code that saves the image ...
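Bracketed templates like the one in this section are easy to fill by hand, but a forgotten slot ends up in the prompt as literal `[text]`. The following is a minimal sketch of a filler that substitutes every `[placeholder]` and fails loudly when one is missing. `PHOTO_TEMPLATE` is an English rendering of the photo template with shortened slot names, and `fill_template` is a hypothetical helper, not part of any SDK.

```python
import re

PHOTO_TEMPLATE = (
    "A photorealistic [shot type] of [subject], [action or expression], "
    "set in [environment]. The scene is illuminated by [lighting], "
    "creating a [mood] atmosphere. Captured with a [camera/lens details], "
    "emphasizing [key textures and details]. "
    "The image should be in a [aspect ratio] format."
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; raise KeyError if one is missing."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing value for [{key}]")
        return values[key]

    # Match the text between one pair of square brackets, without nesting.
    return re.sub(r"\[([^\[\]]+)\]", substitute, template)
```

The same function works for the other templates in this guide, since it only depends on the square-bracket convention; the filled string can be passed as `contents` in the example code.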

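2. Designing Stylized Illustrations and Stickers

When creating stickers, icons, or assets, state the art style you want explicitly; you can also request a transparent background.

Template: A [style, e.g. kawaii, flat] sticker of [subject], featuring [key characteristics] and a [color palette] color palette. The design should have [line style] and [shading style]. The background must be transparent.

Example prompt: A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white.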

# Example code (reuses the client created in the earlier examples)
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white.",
)
# ... followed by the code that saves the image ...

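3. Rendering Text Accurately in Images

Gemini excels at rendering text. State clearly in the prompt the text to render, the font style (in descriptive words), and the overall design.

Template: Create a [image type, e.g. logo, poster] for [brand/concept] with the text "[text to render]" in a [font style description] font. The design should be [style description], with a [color scheme] color scheme.

Example prompt: Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a coffee bean seamlessly integrated with the text. The color scheme is black and white.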

# Example code (reuses the client created in the earlier examples)
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a coffee bean seamlessly integrated with the text. The color scheme is black and white.",
)
# ... followed by the code that saves the image ...
