[Embracing HarmonyOS] Dual-Channel Preview and Text Recognition on HarmonyOS NEXT

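OCR features are common in apps on many other platforms. How well does HarmonyOS support OCR, how can we start using it quickly, and what should we watch out for along the way? Let's find out.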

[Development Environment]

Version rule number: HarmonyOS NEXT
Version type: Developer Preview2
OpenHarmony API Version: 11 Release
compileSdkVersion: 4.1.0(11)
IDE: DevEco Studio 4.1.3.700 (Mac)

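Goal

By implementing the basic features of Core Vision Kit, we build album image selection, OCR, camera preview, and image format conversion. Along the way we get familiar with the ArkTS development workflow and its details, and deepen our understanding of the basic libraries in HarmonyOS.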

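Glossary

Core Vision Kit: basic vision service
Camera Kit: camera service
Core File Kit: basic file service
OCR: Optical Character Recognition, also called general text recognition
URI: Uniform Resource Identifier; in this article, URI means the access path of an image resource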

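Core Functionality

The core feature covered in this article is general text recognition (OCR).

OCR is a technology that uses optical input such as photographing or scanning to turn the text of printed material (receipts, cards and certificates, forms, newspapers, books, and so on) into image information, and then uses text recognition to convert that image information into character data that computers and other devices can use.

First, we implement picking an image from the album and recognizing the text in it. This relies on the OCR capability of the system's Core Vision Kit.

Create an ImageOCRUtil class to encapsulate the OCR-related functionality.
Import the textRecognition module from CoreVisionKit, declare a class named ImageOCRUtil, and export a default instance created with new.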

import { textRecognition } from '@kit.CoreVisionKit';

export class ImageOCRUtil {}

export default new ImageOCRUtil();

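Implement text recognition for images in ImageOCRUtil.
Build an async method: async recognizeText(image: PixelMap | undefined, resultCallback: Function). PixelMap is the image pixel class, used to read or write image data and to get image information. Currently a serialized PixelMap can be at most 128 MB; anything larger fails to display. The size is computed as width x height x bytes per pixel.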

import { hilog } from '@kit.PerformanceAnalysisKit';
import { BusinessError } from '@kit.BasicServicesKit';

export class ImageOCRUtil {

  /**
   * Text recognition
   *
   * @param image source image data
   * @param resultCallback called with the recognized text
   * @returns
   */
  static async recognizeText(image: PixelMap | undefined, resultCallback: Function) {
    // Guard against a missing image
    if (!image) {
      hilog.error(0x0000, 'OCR', 'the image does not exist');
      return;
    }

    let visionInfo: textRecognition.VisionInfo = {
      pixelMap: image
    };

    let textConfiguration: textRecognition.TextRecognitionConfiguration = {
      isDirectionDetectionSupported: false
    };

    textRecognition.recognizeText(visionInfo, textConfiguration, (error: BusinessError, data: textRecognition.TextRecognitionResult) => {
      // Recognition succeeded: fetch the result
      if (error.code == 0) {
        let recognitionRes = data.value.toString();
        // Pass the result back to the caller
        resultCallback(recognitionRes);
      } else {
        hilog.error(0x0000, 'OCR', `recognizeText failed, code: ${error.code}, message: ${error.message}`);
      }
    });
  }
}
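The 128 MB serialization limit and the width x height x bytes-per-pixel rule can be checked before an image is handed to recognizeText. The following is only a minimal sketch: pixelMapByteSize and fitsSerializationLimit are hypothetical helpers, not part of any HarmonyOS kit, and the default of 4 bytes per pixel assumes an RGBA_8888 image.

```typescript
// Hypothetical helpers (not a HarmonyOS API) illustrating the size rule.
const MAX_PIXELMAP_BYTES: number = 128 * 1024 * 1024; // the 128 MB limit

// Size in bytes = width x height x bytes per pixel.
function pixelMapByteSize(width: number, height: number, bytesPerPixel: number): number {
  return width * height * bytesPerPixel;
}

// Assumption: 4 bytes per pixel, as for RGBA_8888.
function fitsSerializationLimit(width: number, height: number, bytesPerPixel: number = 4): boolean {
  return pixelMapByteSize(width, height, bytesPerPixel) <= MAX_PIXELMAP_BYTES;
}

console.log(pixelMapByteSize(1920, 1080, 4));    // 8294400 (about 7.9 MB)
console.log(fitsSerializationLimit(1920, 1080)); // true
console.log(fitsSerializationLimit(8192, 8192)); // false (268435456 bytes)
```

An ordinary 1080p frame is far below the limit; the check only matters for very large images such as stitched panoramas.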

Implement getting an image URI from the album in ImageOCRUtil.
This uses Core File Kit: the photo picker returns the storage path of the selected image.

import { picker } from '@kit.CoreFileKit';

/**
  * Open the album and pick an image
  * @returns the image URI, asynchronously (an empty string on failure)
  */
static openAlbum(): Promise<string> {
    return new Promise<string>((resolve) => {
      let photoPicker = new picker.PhotoViewPicker();
      photoPicker.select({
        MIMEType: picker.PhotoViewMIMETypes.IMAGE_TYPE,
        maxSelectNumber: 1
      }).then((res: picker.PhotoSelectResult) => {
        resolve(res.photoUris[0]);
      }).catch((err: BusinessError) => {
        hilog.error(0x0000, "OCR", `Failed to get photo uri, code: ${err.code}, message: ${err.message}`)
        resolve('')
      })
    })
}

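UI and Invocation

To verify the recognition result, we can build a simple UI that covers the flow: pick an image from the album -> recognize text -> show the result.

In the Index page, the UI code is as follows: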

import { image } from '@kit.ImageKit'
import { hilog } from '@kit.PerformanceAnalysisKit';
import { ImageOCRUtil } from '../common/utils/ImageOCRUtil';
import { CommonUtils } from '../common/utils/CommonUtils';
import { fileIo } from '@kit.CoreFileKit';

@Entry
@Component
struct Index {
  private imageSource: image.ImageSource | undefined = undefined;
  @State selectedImage: PixelMap | undefined = undefined;
  @State dataValues: string = '';

  build() {
    Column() {
      // The selected image
      Image(this.selectedImage)
        .objectFit(ImageFit.Fill)
        .height('60%')

      // The recognized text
      Text(this.dataValues)
        .copyOption(CopyOptions.LocalDevice)
        .height('15%')
        .width('60%')
        .margin(10)

      // "Select image" button
      Button('Select image')
        .type(ButtonType.Capsule)
        .fontColor(Color.White)
        .width('80%')
        .margin(10)
        .onClick(() => {
          this.selectImage();
        })

      // "Start recognition" button
      Button('Start recognition')
        .type(ButtonType.Capsule)
        .fontColor(Color.White)
        .alignSelf(ItemAlign.Center)
        .width('80%')
        .margin(10)
        .onClick(() => {
          // Handle the "Start recognition" tap
        })
    }
    .width('100%')
    .height('100%')
    .justifyContent(FlexAlign.Center)
  }

  private async selectImage() {
    let uri = await ImageOCRUtil.openAlbum();
    // openAlbum() resolves with an empty string on failure, so test for any falsy value
    if (!uri) {
      hilog.error(0x0000, 'OCR', 'Failed to get the uri of photo.')
      return;
    }

    this.loadImage(uri);
  }

  loadImage(path: string) {
    setTimeout(async () => {
      let fileSource = await fileIo.open(path, fileIo.OpenMode.READ_ONLY);
      this.imageSource = image.createImageSource(fileSource.fd);
      this.selectedImage = await this.imageSource.createPixelMap();
    })
  }
}

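In the click handler of the "Start recognition" button, we call ImageOCRUtil.recognizeText and display the result in its callback.
We then call release() on imageSource and selectedImage to free their memory.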

ImageOCRUtil.recognizeText(this.selectedImage, (content: string) => {
  if (!CommonUtils.isEmpty(content)) {
    this.dataValues = content;
  }
  
  // Free the memory
  this.imageSource?.release();
  this.selectedImage?.release();
});
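A callback-style call like the one above can also be wrapped in a Promise so that the caller can use await. The sketch below is an illustration only: Recognizer, recognizeAsync, and fakeRecognizer are hypothetical names, and fakeRecognizer stands in for ImageOCRUtil.recognizeText so that the example runs without the HarmonyOS SDK. Because the callback is never invoked when recognition is skipped or fails, the wrapper adds a timeout (an assumption, 5 seconds by default) so the Promise always settles.

```typescript
// Shape of a callback-style recognizer such as ImageOCRUtil.recognizeText.
type Recognizer<T> = (image: T | undefined, resultCallback: (text: string) => void) => void;

// Hypothetical wrapper: resolves with the text, or rejects after timeoutMs
// if the recognizer never calls back.
function recognizeAsync<T>(recognizer: Recognizer<T>, image: T | undefined,
  timeoutMs: number = 5000): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('recognition timed out')), timeoutMs);
    recognizer(image, (text: string) => {
      clearTimeout(timer);
      resolve(text);
    });
  });
}

// Stand-in recognizer for the demo; it mirrors the early return on a missing image.
const fakeRecognizer: Recognizer<string> = (image, resultCallback) => {
  if (!image) {
    return;
  }
  setTimeout(() => resultCallback(`text from ${image}`), 10);
};

recognizeAsync(fakeRecognizer, 'photo.jpg')
  .then((text: string) => console.log(text)); // text from photo.jpg
```

With such a wrapper, the button handler could await the text and release the image resources in a finally block, which keeps the cleanup in one place.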

The result looks like this:

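Dual-Channel Preview

To extend text recognition, we can combine it with the camera's dual-channel preview to obtain image frames in real time and run text recognition on them.

We create a page named XComponentPage and add a camera preview view to it.

Get the SurfaceId of the ImageReceiver.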

async getImageReceiverSurfaceId(receiver: image.ImageReceiver): Promise<string | undefined> {
    let ImageReceiverSurfaceId: string | undefined = undefined;
    if (receiver !== undefined) {
      console.info('receiver is not undefined');
      // Assign to the outer variable; redeclaring it here with let would shadow it
      // and the method would always return undefined
      ImageReceiverSurfaceId = await receiver.getReceivingSurfaceId();
      console.info(`ImageReceived id: ${ImageReceiverSurfaceId}`);
    } else {
      console.error('createImageReceiver failed');
    }
    return ImageReceiverSurfaceId;
  }

Create the XComponent Surface.

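XComponent({
        // Unique identifier of the component
        id: 'LOXComponent',
        // surface: for EGL/OpenGL ES and media data output; component: for developer-customized drawing
        type: XComponentType.SURFACE,
        // Name of the dynamic library built from the app's native layer; only valid when the XComponent type is "surface"
        libraryname: 'SingleXComponent',
        // Controller bound to the component for calling its methods; only valid when the XComponent type is "surface"
        controller: this.mXComponentController
      })// Called when the plugin has finished loading
        .onLoad(() => {
          // Set the Surface size (1920*1080). Choose the preview size from the preview resolutions supported by the current device, as obtained earlier in previewProfilesArray
          // The preview stream and the video output stream must have the same aspect ratio
          this.mXComponentController.setXComponentSurfaceSize({ surfaceWidth: 1920, surfaceHeight: 1080 });
          // Get the Surface ID
          this.xComponentSurfaceId = this.mXComponentController.getXComponentSurfaceId();
        })// Called when the plugin has finished unloading
        .onDestroy(() => {

        })
        .width("100%")
        .height(display.getDefaultDisplaySync().width * 9 / 16)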
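The comments in the snippet above say that the surface size should come from the preview resolutions the device supports, with a consistent aspect ratio. One way to pick such a resolution is sketched below. This is an illustration under stated assumptions: ProfileLike is a local stand-in for objects that, like camera.Profile, expose a size with width and height, and pickProfile is a hypothetical helper, not a Camera Kit API.

```typescript
interface SizeLike {
  width: number;
  height: number;
}

// Local stand-in for a profile object that exposes a size.
interface ProfileLike {
  size: SizeLike;
}

// Hypothetical helper: among profiles with exactly the target aspect ratio,
// return the one whose width is closest to the target width.
function pickProfile<P extends ProfileLike>(profiles: P[], targetWidth: number,
  targetHeight: number): P | undefined {
  let best: P | undefined = undefined;
  for (const p of profiles) {
    // Cross-multiply to compare ratios without floating-point division.
    if (p.size.width * targetHeight !== p.size.height * targetWidth) {
      continue;
    }
    if (best === undefined ||
      Math.abs(p.size.width - targetWidth) < Math.abs(best.size.width - targetWidth)) {
      best = p;
    }
  }
  return best;
}

const supported: ProfileLike[] = [
  { size: { width: 640, height: 480 } },
  { size: { width: 1280, height: 720 } },
  { size: { width: 1920, height: 1080 } }
];
console.log(pickProfile(supported, 1920, 1080)?.size); // the 1920 x 1080 entry
console.log(pickProfile(supported, 1000, 1000));       // undefined (no 1:1 profile)
```

Returning undefined when no profile matches lets the caller decide whether to fall back to the first supported profile or to report an error.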

Implement the dual-channel preview.

import camera from '@ohos.multimedia.camera';
import { image } from '@kit.ImageKit';


async createDualChannelPreview(cameraManager: camera.CameraManager, XComponentSurfaceId: string, receiver: image.ImageReceiver): Promise<void> {
    // Get the supported camera devices
    let camerasDevices: Array<camera.CameraDevice> = cameraManager.getSupportedCameras();

    // Get the supported scene modes
    let sceneModes: Array<camera.SceneMode> = cameraManager.getSupportedSceneModes(camerasDevices[0]);
    let isSupportPhotoMode: boolean = sceneModes.indexOf(camera.SceneMode.NORMAL_PHOTO) >= 0;
    if (!isSupportPhotoMode) {
      console.error('photo mode not support');
      return;
    }

    // Get the output capability (profiles) of this camera device
    let profiles: camera.CameraOutputCapability = cameraManager.getSupportedOutputCapability(camerasDevices[0], camera.SceneMode.NORMAL_PHOTO);
    let previewProfiles: Array<camera.Profile> = profiles.previewProfiles;

    // Preview stream 1
    let previewProfilesObj: camera.Profile = previewProfiles[0];

    // Preview stream 2
    let previewProfilesObj2: camera.Profile = previewProfiles[0];

    // Create the output object for preview stream 1 (rendered by the XComponent)
    let previewOutput: camera.PreviewOutput = cameraManager.createPreviewOutput(previewProfilesObj, XComponentSurfaceId);

    // Create the output object for preview stream 2 (delivered to the ImageReceiver)
    let imageReceiverSurfaceId: string = await receiver.getReceivingSurfaceId();
    let previewOutput2: camera.PreviewOutput = cameraManager.createPreviewOutput(previewProfilesObj2, imageReceiverSurfaceId);

    // Create the cameraInput object
    let cameraInput: camera.CameraInput = cameraManager.createCameraInput(camerasDevices[0]);

    // Open the camera
    await cameraInput.open();

    // Create the session
    let photoSession: camera.PhotoSession = cameraManager.createSession(camera.SceneMode.NORMAL_PHOTO) as camera.PhotoSession;

    // Begin configuring the session
    photoSession.beginConfig();

    // Add the CameraInput to the session
    photoSession.addInput(cameraInput);

    // Add preview stream 1 to the session
    photoSession.addOutput(previewOutput);

    // Add preview stream 2 to the session
    photoSession.addOutput(previewOutput2);

    // Commit the configuration
    await photoSession.commitConfig();

    // Start the session
    await photoSession.start();
  }

Receive preview images in real time through the ImageReceiver.

import { BusinessError } from '@kit.BasicServicesKit';

onImageArrival(receiver: image.ImageReceiver): void {
  receiver.on('imageArrival', () => {
    receiver.readNextImage((err: BusinessError, nextImage: image.Image) => {
      if (err || nextImage === undefined) {
        console.error('readNextImage failed');
        return;
      }
      nextImage.getComponent(image.ComponentType.JPEG, async (err: BusinessError, imgComponent: image.Component) => {
        if (err || imgComponent === undefined) {
          console.error('getComponent failed');
        }
        if (imgComponent && imgComponent.byteBuffer) {
          let imageArrayBuffer = imgComponent.byteBuffer as ArrayBuffer;
          // JSON.stringify() on an ArrayBuffer yields "{}", so log only the length
          console.log('Image data length: ' + imageArrayBuffer.byteLength);

          // TODO: run OCR on the frame

        } else {
          console.error('byteBuffer is null');
        }
        nextImage.release();
      })
    })
  })
}

Finally, we run text recognition on the preview frames. The preview data, imageArrayBuffer, is an ArrayBuffer, so we first convert it to a PixelMap and then call recognizeText().

// Convert the frame data to a PixelMap and recognize the text in it
let opts: image.InitializationOptions = {
  editable: true,
  pixelFormat: image.PixelMapFormat.RGBA_8888, // same value as the literal 3
  size: { height: 320, width: 320 }
}
image.createPixelMap(imageArrayBuffer, opts).then((pixelMap: image.PixelMap) => {
  console.info('Succeeded in creating pixelmap.');

  ImageOCRUtil.recognizeText(pixelMap, (res: string) => {
    console.info('Recognition result: ' + res);
  });
}).catch((error: BusinessError) => {
  console.error(`Failed to create pixelmap, code: ${error.code}`);
})

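With this in place, run XComponentPage, open the preview, and point the camera at an object that carries text; the recognized text then appears in the log.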
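Preview frames arrive many times per second, and running OCR on every one of them is likely to be wasteful. One option is to skip frames while a recognition is still in progress and to enforce a minimum interval between recognitions. The sketch below is only an illustration of that idea: FrameGate is a hypothetical class, and the 500 ms interval is an arbitrary assumption, not a value from the original project.

```typescript
// Hypothetical gate that decides whether a given frame should be sent to OCR.
class FrameGate {
  private busy: boolean = false;
  private lastStartMs: number = -Infinity;
  private minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true if the caller may start OCR on the frame arriving at nowMs.
  tryEnter(nowMs: number): boolean {
    if (this.busy || nowMs - this.lastStartMs < this.minIntervalMs) {
      return false;
    }
    this.busy = true;
    this.lastStartMs = nowMs;
    return true;
  }

  // Call this when the OCR callback has fired, whether it succeeded or failed.
  leave(): void {
    this.busy = false;
  }
}

const gate = new FrameGate(500);
console.log(gate.tryEnter(0));   // true: first frame is accepted
console.log(gate.tryEnter(100)); // false: previous recognition still running
gate.leave();
console.log(gate.tryEnter(200)); // false: only 200 ms since the last start
console.log(gate.tryEnter(600)); // true: idle and 600 ms have passed
```

In the imageArrival handler, a frame would be converted and recognized only when tryEnter(Date.now()) returns true, and leave() would be called once the recognition finishes.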

Full code -> hosgo-vision

Embrace HarmonyOS, embrace the future. Having chosen a distant goal, press on through wind and rain.


posted @ 2025-06-03 15:36 by 鄭知魚
