6. LangChain4j + Multimodal Visual Understanding: A Detailed Guide

Image understanding with LangChain4j

Multimodality: visual understanding

https://docs.langchain4j.dev/tutorials/chat-and-language-models#multimodality

TextContent: text sent to the model

ImageContent: an image sent to the model

AudioContent: audio sent to the model

VideoContent: video sent to the model

PdfFileContent: a PDF file sent to the model

LangChain4j multimodality in practice

Preparation: in Alibaba Cloud Bailian, switch the model to qwen-vl-max, which accepts image input.

Alibaba Cloud Bailian console: https://bailian.console.aliyun.com/

https://help.aliyun.com/zh/model-studio/models#850732b1aabs0

https://help.aliyun.com/zh/model-studio/models#3f1f1c8913fvo

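Image understanding with LangChain4j (vision-language multimodal tasks)

The goal is to have the model interpret the contents of an image.

Create the module project:

Add the dependencies to pom.xml.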

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.rainbowsea</groupId>
        <artifactId>langchain4j-studys</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>

    <artifactId>langchain4j-04chat-image</artifactId>
    <packaging>jar</packaging>

    <name>langchain4j-04chat-image</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>


        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
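        <!-- langchain4j-open-ai: base integration -->
        <!-- All calls follow the OpenAI protocol, which gives a consistent interface design. LangChain4j integrates with many LLM providers;
        the simplest way to get started is the OpenAI integration: https://docs.langchain4j.dev/get-started -->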
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-open-ai</artifactId>
        </dependency>
        <!-- langchain4j: high-level API -->
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j</artifactId>
        </dependency>
        <!--lombok-->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <!--hutool-->
        <dependency>
            <groupId>cn.hutool</groupId>
            <artifactId>hutool-all</artifactId>
            <version>5.8.22</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

Write the configuration class that holds the three basic model settings (API key, model name, base URL):

package com.rainbowsea.langchain4j04chatimage.config;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * https://docs.langchain4j.dev/tutorials/chat-and-language-models/#image-content
 */
@Configuration
public class LLMConfig {
    @Bean
    public ChatModel imageModel() {
        return OpenAiChatModel.builder()
                .apiKey(System.getenv("aliQwen_api"))
                // qwen-vl-max is a multimodal model that accepts combined image and text input, suited to vision-language tasks.
                .modelName("qwen-vl-max")
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();
    }

}
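
The configuration class reads the API key with System.getenv("aliQwen_api"), which returns null when the variable is not set, and the resulting error may not point clearly at the missing variable. Below is a minimal sketch of a fail-fast check. ApiKeys and its require method are hypothetical helpers introduced here for illustration (they are not part of LangChain4j), and taking the environment as a Map is only an assumption that keeps the check easy to test.

```java
import java.util.Map;

/** Hypothetical helper (not part of LangChain4j): resolves an API key and fails fast when it is absent. */
final class ApiKeys {

    private ApiKeys() {
    }

    /**
     * Returns the trimmed value of {@code name} from {@code env}.
     * Throws IllegalStateException when the variable is missing or blank.
     */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Environment variable '" + name + "' is not set");
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // In the configuration class this could be: ApiKeys.require(System.getenv(), "aliQwen_api")
        Map<String, String> fakeEnv = Map.of("aliQwen_api", " sk-demo ");
        System.out.println(require(fakeEnv, "aliQwen_api"));
    }
}
```

With a helper like this, the bean method could pass ApiKeys.require(System.getenv(), "aliQwen_api") to apiKey(...), so a missing key stops the application at startup with a message that names the variable.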

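Write the business class:

1) Place mi.jpg under resources/static/images, the path the controller reads through @Value. This is the image the model will be asked to interpret.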

https://docs.langchain4j.dev/tutorials/chat-and-language-models/#multimodality

package com.rainbowsea.langchain4j04chatimage.controller;

import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.Base64;

/**
 * REST controller that sends a classpath image together with a text prompt to the multimodal model.
 */
@RestController
@Slf4j
public class ImageModelController {
    @Autowired
    private ChatModel chatModel;

    @Value("classpath:static/images/mi.jpg")
    // An image is a resource too, so it can be injected with @Value.
    // The classpath: prefix resolves against the classpath root (the resources directory).
    private Resource resource;    // org.springframework.core.io.Resource

    /**
     * Base64-encodes the image into a string, then sends it to the model
     * together with a text prompt as ImageContent and TextContent.
     * Test URL: http://localhost:9004/image/call
     */
    @GetMapping(value = "/image/call")
    public String readImageContent() throws IOException {

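        // Raw image bytes cannot be embedded in the text-based request as they are,
        // so read the image into a byte[] and Base64-encode it into a string.
        byte[] byteArray = resource.getContentAsByteArray();
        String base64Data = Base64.getEncoder().encodeToString(byteArray);

        // Wrap everything sent to the model in a single UserMessage.
        UserMessage userMessage = UserMessage.from(
                TextContent.from("From the image below, extract the source website name, the stock price trend, and the stock price on May 30"),
                // The MIME type tells the model what kind of file it is reading; the registered type for JPEG is image/jpeg.
                ImageContent.from(base64Data, "image/jpeg")
        );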

        ChatResponse chatResponse = chatModel.chat(userMessage);
        String result = chatResponse.aiMessage().text();

        System.out.println(result);

        return result;
    }
}
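
The controller Base64-encodes the image and passes a MIME type alongside it. As a rough illustration of what such a payload can look like, Base64 image data is commonly carried to OpenAI-compatible APIs as a data URI of the form data:<mime>;base64,<data>. The sketch below builds one using only the JDK. ImageDataUri, mimeTypeFor and toDataUri are hypothetical names, the extension table is deliberately small, and whether LangChain4j builds exactly this string internally is an assumption rather than something the source confirms.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

/** Hypothetical helper: turns image bytes into a Base64 data URI. */
final class ImageDataUri {

    private ImageDataUri() {
    }

    /** Maps a file name to an image MIME type; only a few common extensions are covered. */
    static String mimeTypeFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) {
            return "image/jpeg"; // the registered type; "image/jpg" is not
        }
        if (lower.endsWith(".png")) {
            return "image/png";
        }
        if (lower.endsWith(".gif")) {
            return "image/gif";
        }
        if (lower.endsWith(".webp")) {
            return "image/webp";
        }
        throw new IllegalArgumentException("Unsupported image type: " + fileName);
    }

    /** Base64-encodes the bytes and prefixes them with the data URI header. */
    static String toDataUri(byte[] imageBytes, String mimeType) {
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + base64;
    }

    public static void main(String[] args) {
        // Two ASCII bytes stand in for real image data to keep the output short.
        byte[] bytes = "hi".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDataUri(bytes, mimeTypeFor("mi.jpg"))); // prints data:image/jpeg;base64,aGk=
    }
}
```

Deriving the MIME type from the file name avoids hard-coding a value that may not match the actual file.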

Run and test: start the application and open http://localhost:9004/image/call in a browser.

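Image generation with Alibaba Tongyi Wanxiang (text-to-image)

https://docs.langchain4j.dev/integrations/language-models/

LangChain4j integrates third-party platforms through its own integration modules:

Note: DashScope is the Alibaba Cloud service behind the Qwen (Tongyi Qianwen) models.

https://docs.langchain4j.dev/integrations/language-models/dashscope

Following the official documentation, add the Maven configuration.

Import the dependency XML for the model:

For consistency, put this configuration in the top-level pom.xml.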

        <!-- langchain4j-community: version of the Alibaba Cloud Bailian dependency BOM (goes inside <properties>) -->
        <langchain4j-community.version>1.0.1-beta6</langchain4j-community.version>


            <!-- Import the Alibaba Cloud Bailian dependency BOM (goes inside <dependencyManagement><dependencies>)
   https://docs.langchain4j.dev/integrations/language-models/dashscope
   -->
            <dependency>
                <groupId>dev.langchain4j</groupId>
                <artifactId>langchain4j-community-bom</artifactId>
                <version>${langchain4j-community.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>

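Alibaba Tongyi Wanxiang: WanxImageModel

https://docs.langchain4j.dev/integrations/language-models/dashscope#configurable-parameters

Switch to the Tongyi Wanxiang text-to-image model wanx2.1-t2i-turbo, which generates an image from a one-sentence prompt.

https://help.aliyun.com/zh/model-studio/text-to-image

In the submodule's pom.xml, add the DashScope integration dependency (the langchain4j-community-dashscope artifact described in the LangChain4j DashScope documentation).

The version is inherited from the top-level pom.xml.

Switch the model in the configuration class to wanx2.1-t2i-turbo, a text-to-image model: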

package com.rainbowsea.langchain4j04chatimage.config;

import dev.langchain4j.community.model.dashscope.WanxImageModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * https://docs.langchain4j.dev/tutorials/chat-and-language-models/#image-content
 */
@Configuration
public class LLMConfig {

    /**
     * Uses Tongyi Wanxiang to generate images.
     * Source: https://help.aliyun.com/zh/model-studio/text-to-image
     * @author zzyybs@126.com
     */
    @Bean
    public WanxImageModel wanxImageModel()
    {
        return WanxImageModel.builder()
                .apiKey(System.getenv("aliQwen_api"))
                .modelName("wanx2.1-t2i-turbo") // text-to-image model: https://help.aliyun.com/zh/model-studio/text-to-image
                .build();
    }
}

Write the controller methods for text-to-image generation:

package com.rainbowsea.langchain4j04chatimage.controller;

import com.alibaba.dashscope.aigc.imagesynthesis.ImageSynthesis;
import com.alibaba.dashscope.aigc.imagesynthesis.ImageSynthesisParam;
import com.alibaba.dashscope.aigc.imagesynthesis.ImageSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import dev.langchain4j.community.model.dashscope.WanxImageModel;
import dev.langchain4j.data.image.Image;
import dev.langchain4j.model.output.Response;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;

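/**
 * REST controller with two text-to-image endpoints: one through LangChain4j's WanxImageModel,
 * one calling the DashScope SDK directly.
 */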
@RestController
@Slf4j
public class WanxImageModelController
{
    @Autowired
    private WanxImageModel wanxImageModel;

    // http://localhost:9006/image/create2
    @GetMapping(value = "/image/create2")
    public String createImageContent2() throws IOException
    {
        System.out.println(wanxImageModel);
        Response<Image> imageResponse = wanxImageModel.generate("a little rabbit");

        System.out.println(imageResponse.content().url());

        return imageResponse.content().url().toString();

    }




    // http://localhost:9006/image/create3
    @GetMapping(value = "/image/create3")
    public String createImageContent3() throws IOException
    {

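        // This endpoint calls the DashScope SDK (ImageSynthesis) directly instead of going through LangChain4j.
        String prompt = "Close-up shot of an 18-year-old Chinese girl in ancient costume, round face, looking straight at the camera, " +
                "elegant ethnic clothing, commercial photography, outdoors, cinematic lighting, half-body close-up, delicate light makeup, sharp edges.";
        ImageSynthesisParam param =
                ImageSynthesisParam.builder()
                        // Same environment variable as in LLMConfig.
                        .apiKey(System.getenv("aliQwen_api"))
                        .model(ImageSynthesis.Models.WANX_V1)
                        .prompt(prompt)
                        .style("<watercolor>")
                        .n(1)
                        .size("1024*1024")
                        .build();

        ImageSynthesis imageSynthesis = new ImageSynthesis();
        ImageSynthesisResult result = null;

        try {
            System.out.println("---sync call, please wait a moment----");
            result = imageSynthesis.call(param);
        } catch (ApiException | NoApiKeyException e) {
            // Keep the original exception as the cause so its stack trace is not lost.
            throw new RuntimeException(e.getMessage(), e);
        }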


        System.out.println(JsonUtils.toJson(result));

        return JsonUtils.toJson(result);
    }
}
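
The createImageContent3 method catches the SDK's exceptions and rethrows them as an unchecked exception. Below is a minimal sketch of that wrapping pattern, with the original exception passed along as the cause so the root error stays visible in logs. CauseDemo, FakeSdkException and the two generate methods are hypothetical stand-ins introduced for illustration; no DashScope class is used.

```java
/** Hypothetical demo of wrapping a checked exception without losing its cause. */
final class CauseDemo {

    private CauseDemo() {
    }

    /** Hypothetical stand-in for a checked SDK exception such as NoApiKeyException. */
    static class FakeSdkException extends Exception {
        FakeSdkException(String message) {
            super(message);
        }
    }

    /** Simulates an SDK call that either returns a result or throws a checked exception. */
    static String generate(boolean fail) throws FakeSdkException {
        if (fail) {
            throw new FakeSdkException("no api key");
        }
        return "image-url";
    }

    /** Wraps the checked exception in an unchecked one while keeping it as the cause. */
    static String generateWrapped(boolean fail) {
        try {
            return generate(fail);
        } catch (FakeSdkException e) {
            throw new IllegalStateException("Image generation failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            generateWrapped(true);
        } catch (IllegalStateException e) {
            // The cause is still available for logging and debugging.
            System.out.println(e.getCause().getClass().getSimpleName());
        }
    }
}
```

Passing only e.getMessage() to the wrapper would drop the original stack trace, which makes failures such as a missing API key harder to diagnose.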

Run and test: start the application and open the test URLs noted in the comments above each method.

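Finally:

“In this closing section, I want to express my gratitude to every reader. Your attention and replies are what drive me to write, and I have drawn endless inspiration and courage from you. I will keep your encouragement at heart and continue to strive in other fields. Thank you; we will meet again at some point.”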

posted @ 2025-09-04 11:42 Rainbow-Sea

TheMagicalRainbowSea

A person's wealth always matches his or her ability, without exception. Juejin: [ https://juejin.cn/user/752533564566951 ] Tencent Cloud: [ https://cloud.tencent.com/developer/user/10317357 ] CSDN: [ https://blog.csdn.net/weixin_61635597?spm=1000.2115.3001.5343 ]
