[ESP32 Online Voice] Buffering for audio capture and chunking for network transmission

First, buffering already comes into play when the I2S device is initialized.

// Initialize the I2S device (INMP441)
  Serial.println("Setup I2S ...");
  i2s_install();
  i2s_setpin();
  esp_err_t err = i2s_start(I2S_PORT_0);

Here i2s_install() configures the I2S settings. The function is as follows:

/**
 * @brief Configure the I2S parameters
 * 
 */
void i2s_install()
{
  const i2s_config_t i2s_config = {
      .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
      .sample_rate = SAMPLE_RATE,
      .bits_per_sample = i2s_bits_per_sample_t(16),
      .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
      .communication_format = i2s_comm_format_t(I2S_COMM_FORMAT_STAND_I2S),
      .intr_alloc_flags = 0, // default interrupt priority
      .dma_buf_count = 8,
      .dma_buf_len = 1024,
      .use_apll = false};

  esp_err_t err = i2s_driver_install(I2S_PORT_0, &i2s_config, 0, NULL);
  if (err != ESP_OK)
  {
    Serial.printf("I2S driver install failed (I2S_PORT_0): %d\n", err);
    while (true)
        ;
  }
  else
  {
    Serial.printf("I2S driver install OK\r\n");
  }
}

The parts related to buffering are the following struct members and the i2s_driver_install() function:

      .dma_buf_count = 8,
      .dma_buf_len = 1024,


    int                     dma_buf_count;              /**< The total number of DMA buffers to receive/transmit data.
                                                          * A descriptor includes some information such as buffer address,
                                                          * the address of the next descriptor, and the buffer length.
                                                          * Since one descriptor points to one buffer, therefore, 'dma_desc_num' can be interpreted as the total number of DMA buffers used to store data from DMA interrupt.
                                                          * Notice that these buffers are internal to'i2s_read' and descriptors are created automatically inside of the I2S driver.
                                                          * Users only need to set the buffer number while the length is derived from the parameter described below.
                                                          */
    int                     dma_buf_len;                /**< Number of frames in a DMA buffer.
                                                          *  A frame means the data of all channels in a WS cycle.
                                                          *  The real_dma_buf_size = dma_buf_len * chan_num * bits_per_chan / 8.
                                                          *  For example, if two channels in stereo mode (i.e., 'channel_format' is set to 'I2S_CHANNEL_FMT_RIGHT_LEFT') are active,
                                                          *  and each channel transfers 32 bits (i.e., 'bits_per_sample' is set to 'I2S_BITS_PER_CHAN_32BIT'),
                                                          *  then the total number of bytes of a frame is 'channel_format' * 'bits_per_sample' = 2 * 32 / 8 = 8 bytes.
                                                          *  We assume that the current 'dma_buf_len' is 100, then the real length of the DMA buffer is 8 * 100 = 800 bytes.
                                                          *  Note that the length of an internal real DMA buffer shouldn't be greater than 4092.
                                                          */

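From the comment ("Notice that these buffers are internal to 'i2s_read'") we can conclude that DMA only has to be configured, not implemented by hand: the driver creates the descriptors and buffers automatically.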
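The formula in the header comment can be applied to the configuration used here. The following is a minimal sketch, not part of the original code: the helper names are hypothetical, and it assumes that I2S_CHANNEL_FMT_ONLY_LEFT counts as one channel (chan_num = 1).

```cpp
#include <cstddef>

// Hypothetical helpers, following the formula quoted from the header comment:
// real_dma_buf_size = dma_buf_len * chan_num * bits_per_chan / 8
constexpr std::size_t dma_buffer_bytes(std::size_t dma_buf_len, std::size_t chan_num,
                                       std::size_t bits_per_chan)
{
    return dma_buf_len * chan_num * bits_per_chan / 8;
}

// Total bytes held by all DMA buffers together.
constexpr std::size_t dma_total_bytes(std::size_t dma_buf_count, std::size_t dma_buf_len,
                                      std::size_t chan_num, std::size_t bits_per_chan)
{
    return dma_buf_count * dma_buffer_bytes(dma_buf_len, chan_num, bits_per_chan);
}

// How many milliseconds of audio the DMA buffers can hold (one frame per sample period).
constexpr std::size_t dma_buffered_ms(std::size_t dma_buf_count, std::size_t dma_buf_len,
                                      std::size_t sample_rate_hz)
{
    return dma_buf_count * dma_buf_len * 1000 / sample_rate_hz;
}

// The example from the header comment: 100 frames, stereo, 32 bits per channel.
static_assert(dma_buffer_bytes(100, 2, 32) == 800, "header example");
// The configuration in this article: 8 buffers of 1024 frames, mono (assumed), 16 bit, 16 kHz.
static_assert(dma_buffer_bytes(1024, 1, 16) == 2048, "one buffer, below the 4092-byte limit");
static_assert(dma_total_bytes(8, 1024, 1, 16) == 16384, "all buffers");
static_assert(dma_buffered_ms(8, 1024, 16000) == 512, "about half a second of audio");
```

Under these assumptions the driver can hold roughly 512 ms of audio before i2s_read() has to drain it.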

i2s_driver_install() is declared as follows:

/**
 * @brief Install and start I2S driver.
 *
 * @param i2s_num         I2S port number
 *
 * @param i2s_config      I2S configurations - see i2s_config_t struct
 *
 * @param queue_size      I2S event queue size/depth.
 *
 * @param i2s_queue       I2S event queue handle, if set NULL, driver will not use an event queue.
 *
 * This function must be called before any I2S driver read/write operations.
 *
 * @return
 *     - ESP_OK              Success
 *     - ESP_ERR_INVALID_ARG Parameter error
 *     - ESP_ERR_NO_MEM      Out of memory
 *     - ESP_ERR_INVALID_STATE  Current I2S port is in use
 */
esp_err_t i2s_driver_install(i2s_port_t i2s_num, const i2s_config_t *i2s_config, int queue_size, void *i2s_queue);

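The actual code does not use the event queue (it passes 0 and NULL). The "I2S events" are driver-level notifications such as I2S_EVENT_RX_DONE, I2S_EVENT_TX_DONE and I2S_EVENT_DMA_ERROR, posted when a DMA buffer has been received or sent or when a DMA error occurs. They are not related to VAD (voice activity detection) endpoint detection.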

The actual I2S read loop is as follows:

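/**
 * @brief Recording loop
 * recordingSize: number of uint16_t samples recorded so far, i.e. half the number of bytes read
 *                (each sample is 16 bits deep, so it occupies 2 bytes)
 * recordTimeSeconds: 3 s here
 * SAMPLE_RATE: 16 kHz here
 * recordTimeSeconds * SAMPLE_RATE: number of uint16_t samples that I2S captures in 3 s
 */
while (recordingSize < recordTimeSeconds * SAMPLE_RATE)
{
    // Record in a loop and append each read to pcm_data.
    // Note: the last read can overshoot the 3 s target, so pcm_data needs room for it.
    esp_err_t result = i2s_read(I2S_PORT_0, audioData, sizeof(audioData), &bytes_read, portMAX_DELAY);
    if (result != ESP_OK)
        break; // stop recording on a driver error
    // pcm_data is a uint16_t pointer, so the offset is counted in samples and needs no * 2
    memcpy(pcm_data + recordingSize, audioData, bytes_read);
    recordingSize += bytes_read / 2; // convert bytes to 16-bit samples
}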

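This code keeps reading I2S data into audioData, storing the number of bytes read in bytes_read, until 3 s of audio have been collected (this relies on the sample rate having been set to 16 kHz during I2S initialization).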
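The arithmetic behind the loop condition can be checked off-device. This is a sketch under stated assumptions: the helper names are hypothetical, and the 1024-byte read buffer is only an example value, since the size of audioData is not shown in the original code.

```cpp
#include <cstddef>

// Number of 16-bit samples the loop is waiting for.
constexpr std::size_t target_samples(std::size_t seconds, std::size_t sample_rate_hz)
{
    return seconds * sample_rate_hz;
}

// Upper bound on the samples written to pcm_data: the loop still runs when it is
// one sample short of the target, and that last read may deliver a full buffer.
constexpr std::size_t worst_case_samples(std::size_t target, std::size_t read_buffer_bytes)
{
    return target == 0 ? 0 : target - 1 + read_buffer_bytes / 2;
}

static_assert(target_samples(3, 16000) == 48000, "3 s at 16 kHz");
static_assert(target_samples(3, 16000) * 2 == 96000, "the same recording in bytes");
// With an assumed 1024-byte read buffer, pcm_data should hold at least this many samples.
static_assert(worst_case_samples(48000, 1024) == 48511, "target plus overshoot");
```

Sizing pcm_data for the worst case (or clamping the final copy) keeps the last memcpy inside the buffer.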

The key call here is i2s_read().

/**
 * @brief Read data from I2S DMA receive buffer
 *
 * @param i2s_num         I2S port number
 *
 * @param dest            Destination address to read into
 *                        (where the data goes)
 * @param size            Size of data in bytes
 *                        (capacity of the destination buffer)
 * @param[out] bytes_read Number of bytes read, if timeout, bytes read will be less than the size passed in.
 *                        (number of bytes actually read)
 * @param ticks_to_wait   RX buffer wait timeout in RTOS ticks. If this many ticks pass without bytes becoming available in the DMA receive buffer, then the function will return (note that if data is read from the DMA buffer in pieces, the overall operation may still take longer than this timeout.) Pass portMAX_DELAY for no timeout.
 *
 * @note If the built-in ADC mode is enabled, we should call i2s_adc_enable and i2s_adc_disable around the whole reading process,
 *       to prevent the data getting corrupted.
 *
 * @return
 *     - ESP_OK               Success
 *     - ESP_ERR_INVALID_ARG  Parameter error
 */
esp_err_t i2s_read(i2s_port_t i2s_num, void *dest, size_t size, size_t *bytes_read, TickType_t ticks_to_wait);

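Possible improvements to the code:

Risk of blocking reads: portMAX_DELAY means waiting for data indefinitely. In a real-time system this can block the calling task forever, and there is no timeout protection.
Cost of memory copies: every read is followed by a memcpy into the global buffer, which may become a bottleneck when processing high-rate audio streams. Reading directly into the destination buffer would be a better choice.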
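Both points can be tried out off-device. The sketch below is an illustration only, not the original implementation: ReadFn is a hypothetical stand-in for a call to i2s_read() with a finite timeout, returning the number of bytes it wrote (0 on timeout), and record_direct is a hypothetical helper that reads straight into the destination at the current write offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for i2s_read() with a finite timeout: writes up to
// `size` bytes to `dest` and returns the number of bytes written (0 = timed out).
using ReadFn = std::size_t (*)(void *dest, std::size_t size);

// Records up to `capacity_samples` 16-bit samples directly into `pcm`, without an
// intermediate buffer. Stops after `max_timeouts` consecutive reads that deliver
// less than one whole sample. Returns the number of samples recorded.
std::size_t record_direct(std::int16_t *pcm, std::size_t capacity_samples,
                          std::size_t chunk_bytes, int max_timeouts, ReadFn read)
{
    assert(pcm != nullptr && read != nullptr);
    std::size_t recorded = 0;
    int timeouts = 0;
    while (recorded < capacity_samples && timeouts < max_timeouts) {
        // Never ask for more than the space left, so the buffer cannot overflow.
        const std::size_t room = (capacity_samples - recorded) * sizeof(std::int16_t);
        const std::size_t want = room < chunk_bytes ? room : chunk_bytes;
        const std::size_t got = read(pcm + recorded, want);
        if (got < sizeof(std::int16_t)) {
            ++timeouts; // nothing usable arrived: count it instead of blocking forever
            continue;
        }
        timeouts = 0;
        recorded += got / sizeof(std::int16_t);
    }
    return recorded;
}
```

On the ESP32 the callback would wrap i2s_read() with a tick count other than portMAX_DELAY; how many consecutive timeouts to tolerate is a design choice that depends on the application.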

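The call flow of the iFLYTEK Spark voice dictation API is documented at the link below. As it shows, uploading the audio requires some special handling: https://www.xfyun.cn/doc/asr/voicedictation/API.html#%E6%8E%A5%E5%8F%A3%E8%B0%83%E7%94%A8%E6%B5%81%E7%A8%8B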

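Because the captured data is in PCM format, each transmission is set to 1280 bytes (in testing, sending 1280 x 8 bytes at a time also works, and the response is faster).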
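Splitting the recording into fixed-size pieces can be sketched as follows. The names Chunk, chunk_count and chunk_at are hypothetical; the code only computes offsets and lengths and leaves out the message format of the upload itself.

```cpp
#include <cstddef>

// Position and size of one chunk inside the recorded PCM buffer.
struct Chunk {
    std::size_t offset; // byte offset into the PCM buffer
    std::size_t length; // bytes in this chunk (the last one may be shorter)
};

// Number of chunks needed to send `total_bytes`, rounding up.
constexpr std::size_t chunk_count(std::size_t total_bytes, std::size_t chunk_bytes)
{
    return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}

// The chunk with the given index; `index` must be less than chunk_count().
constexpr Chunk chunk_at(std::size_t total_bytes, std::size_t chunk_bytes, std::size_t index)
{
    return Chunk{index * chunk_bytes,
                 total_bytes - index * chunk_bytes < chunk_bytes
                     ? total_bytes - index * chunk_bytes
                     : chunk_bytes};
}

// 3 s of 16 kHz, 16-bit mono audio is 96000 bytes.
static_assert(chunk_count(96000, 1280) == 75, "1280-byte chunks divide evenly");
static_assert(chunk_count(96000, 1280 * 8) == 10, "10240-byte chunks need a short last chunk");
static_assert(chunk_at(96000, 1280 * 8, 9).offset == 92160, "start of the last chunk");
static_assert(chunk_at(96000, 1280 * 8, 9).length == 3840, "length of the last chunk");
```

A send loop would iterate index from 0 to chunk_count() - 1 and transmit length bytes starting at offset each time.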

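In addition, the concept of a "frame" deserves a short explanation. It starts from a question: for mono, 16-bit PCM audio sampled at 16 kHz, how large is one frame?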

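Important clarification: frame vs. packet

In practical programming and processing (for example with WebRTC or the Opus encoder), the "frame" people talk about often means a block of time, such as 20 ms of audio data. That is closer to a packet.

Let's compute how large a 20 ms packet is:

Number of samples in 20 ms:
sample rate × time = 16000 samples/s × 0.020 s = 320 samples
Total size of the packet:
number of samples × bytes per sample × number of channels = 320 × 2 bytes × 1 = 640 bytes

So although a frame is technically 2 bytes, in discussions of network transmission or encoding a block of 320 samples (that is, 320 frames) with a size of 640 bytes is also often called "one audio frame" or "one audio packet".

(For iFLYTEK Spark the time block is presumably 40 ms, i.e. 640 samples × 2 bytes, which is why the documentation specifies 1280 bytes.)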
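The packet arithmetic above can be written as two small helpers. This is a sketch with hypothetical function names; the values checked are the ones from this article.

```cpp
#include <cstddef>

// Size in bytes of a block of PCM audio lasting `duration_ms` milliseconds.
constexpr std::size_t packet_bytes(std::size_t sample_rate_hz, std::size_t duration_ms,
                                   std::size_t bytes_per_sample, std::size_t channels)
{
    return sample_rate_hz * duration_ms / 1000 * bytes_per_sample * channels;
}

// Duration in milliseconds covered by `bytes` bytes of PCM audio.
constexpr std::size_t packet_duration_ms(std::size_t bytes, std::size_t sample_rate_hz,
                                         std::size_t bytes_per_sample, std::size_t channels)
{
    return bytes * 1000 / (sample_rate_hz * bytes_per_sample * channels);
}

static_assert(packet_bytes(16000, 20, 2, 1) == 640, "20 ms packet");
static_assert(packet_bytes(16000, 40, 2, 1) == 1280, "40 ms packet");
static_assert(packet_duration_ms(1280, 16000, 2, 1) == 40, "1280 bytes cover 40 ms");
static_assert(packet_duration_ms(1280 * 8, 16000, 2, 1) == 320, "1280 x 8 bytes cover 320 ms");
```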

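Short answer

For 16 kHz, mono, 16-bit PCM audio, the size of one frame is 2 bytes.

Detailed explanation

To understand this answer, a few concepts need to be defined first:

Sample rate: how many samples are captured (or played) per second. 16 kHz means 16,000 samples per second.
Channels: mono means there is only 1 audio channel.
Bit depth: how many bits represent each sample. 16 bit means each sample occupies 16 bits.
Frame: in PCM audio, one frame is the data of a single sampling instant across all channels.

Calculation steps

For mono audio the calculation is very simple:

frame size (bytes) = (bit depth / 8) × number of channels

Bit depth (bytes): 16 bit / 8 = 2 bytes (1 byte = 8 bits; this is the size of each sample)
Number of channels: 1

So:
frame size = 2 bytes/sample × 1 channel = 2 bytes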
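The same formula in code, as a minimal sketch with hypothetical helper names. The second check reuses the stereo, 32-bit example from the dma_buf_len comment quoted earlier.

```cpp
#include <cstddef>

// Size of one PCM frame: one sample for every channel.
constexpr std::size_t frame_bytes(std::size_t bits_per_sample, std::size_t channels)
{
    return bits_per_sample / 8 * channels;
}

// Raw PCM data rate: one frame per sample period.
constexpr std::size_t bytes_per_second(std::size_t sample_rate_hz, std::size_t bits_per_sample,
                                       std::size_t channels)
{
    return sample_rate_hz * frame_bytes(bits_per_sample, channels);
}

static_assert(frame_bytes(16, 1) == 2, "16-bit mono frame");
static_assert(frame_bytes(32, 2) == 8, "32-bit stereo frame");
static_assert(bytes_per_second(16000, 16, 1) == 32000, "data rate of the stream in this article");
```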

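Summary

By the pure PCM format definition: one frame = 2 bytes.
- This is the most precise notion at the low level of audio processing (for example when operating directly on a PCM array).
In the context of real-time audio transmission or encoding: one frame (or one packet) may mean 640 bytes (20 ms of audio).
- This is the more common usage in higher-level applications (such as WebRTC and VoIP).

When talking to others or writing code, always confirm from the context what "frame" means. If you are handling a raw PCM byte stream, 2 bytes is the correct answer.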

posted @ 2025-10-29 01:19 FBshark