WebGPU Learning (6): Studying the "rotatingCube" Sample

Hello everyone. This post studies the rotatingCube sample from Chrome -> webgpu-samplers.

Previous post:
WebGPU Learning (5): A Survey of Key Techniques in Modern Graphics APIs and Their Support in WebGPU

Next post:
WebGPU Learning (7): Studying the "twoCubes" and "instancedCube" Samples

Studying rotatingCube.ts

We have already studied the "draw a triangle" sample. Compared with it, this sample adds the following:

Adds a uniform buffer object (ubo for short) that carries the product of the model, view, and projection matrices (the mvp matrix for short) and is updated every frame
Sets up vertices
Enables face culling
Enables depth testing

Next, we open rotatingCube.ts and go through the additions in turn:

Adding a uniform buffer object

Introduction

In WebGL 1, we pass the uniform variables of each gameObject (such as diffuseMap, diffuse color, and model matrix) to the shader through functions such as uniform1i and uniform4fv.
Many of these values are identical and do not need to be passed again. For example:
if gameObject1 and gameObject3 use the same shader1 and have the same diffuse color, only one diffuse color needs to be passed. In WebGL 1, however, we usually pass both, which incurs redundant overhead.

WebGPU uses uniform buffer objects to pass uniform variables. A uniform buffer is a global buffer: we set its values once, and then before each draw we select the range of data to use (through offset and size), so the same data is reused. If a uniform value changes, we only need to modify the corresponding data in the uniform buffer.

In WebGPU, we can put the model matrices of all gameObjects into one ubo, the view and projection matrices of all cameras into one ubo, the data (such as diffuse color and specular color) of each kind of material (such as phong material and pbr material) into one ubo, and the data (such as light color and light position) of each kind of light (such as direction light and point light) into one ubo. This effectively reduces the cost of transferring uniform variables.
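
The offset/size reuse described above can be sketched as follows. This is only a minimal illustration that runs without a GPU: packMatrices and its offsetAlignment parameter are hypothetical names, and the value 256 passed below is an assumption about the alignment a device may require for uniform buffer offsets, not something taken from the sample.

```typescript
// Byte size of one mat4 (16 floats of 4 bytes each).
const MAT4_BYTES = 16 * Float32Array.BYTES_PER_ELEMENT;

interface BufferRange {
  offset: number; // byte offset into the shared buffer
  size: number;   // byte size of the data one draw call sees
}

// Hypothetical helper: packs several 4x4 matrices into one Float32Array.
// Each matrix starts at a multiple of `offsetAlignment` bytes
// (assumed to be a multiple of 4).
function packMatrices(
  matrices: Float32Array[],
  offsetAlignment: number
): { data: Float32Array; ranges: BufferRange[] } {
  const stride = Math.ceil(MAT4_BYTES / offsetAlignment) * offsetAlignment;
  const data = new Float32Array(
    (stride * matrices.length) / Float32Array.BYTES_PER_ELEMENT
  );
  const ranges: BufferRange[] = [];
  for (let index = 0; index < matrices.length; index++) {
    const offset = index * stride;
    data.set(matrices[index], offset / Float32Array.BYTES_PER_ELEMENT);
    ranges.push({ offset, size: MAT4_BYTES });
  }
  return { data, ranges };
}

const identity = new Float32Array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]);
const doubled = new Float32Array([2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2]);
const packed = packMatrices([identity, doubled], 256);
console.log(packed.ranges);
```

Each entry of packed.ranges is the (offset, size) pair that would be selected before the corresponding draw, while packed.data is uploaded to the buffer only once.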

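In addition, we need to pay attention to the memory layout of a ubo:
The default layout is std140. Roughly speaking, it aligns each vector, matrix column, and array element to a slot of 4 elements (16 bytes).
Here is an example:
The uniform block below corresponds to a ubo and declares the std140 layout: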

layout (std140) uniform ExampleBlock
{
    float value;
    vec3  vector;
    mat4  matrix;
    float values[3];
    bool  boolean;
    int   integer;
};

Its actual layout in memory is:

layout (std140) uniform ExampleBlock
{
                     // base alignment  // aligned offset
    float value;     // 4               // 0 
    vec3 vector;     // 16              // 16  (must be multiple of 16 so 4->16)
    mat4 matrix;     // 16              // 32  (column 0)
                     // 16              // 48  (column 1)
                     // 16              // 64  (column 2)
                     // 16              // 80  (column 3)
    float values[3]; // 16              // 96  (values[0])
                     // 16              // 112 (values[1])
                     // 16              // 128 (values[2])
    bool boolean;    // 4               // 144
    int integer;     // 4               // 148
};

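In other words, counting in 4-byte elements starting from 1: element 1 of this ubo is value, and elements 2-4 are padding (for alignment);
elements 5-7 are the x, y, and z of vector, and element 8 is padding;
elements 9-24 are the values of matrix (column-major);
elements 25, 29, and 33 are values[0], values[1], and values[2], because std140 pads each array element to 16 bytes, so elements 26-28, 30-32, and 34-36 are padding;
element 37 (byte offset 144) is boolean and element 38 (byte offset 148) is integer; each takes 4 bytes and they are packed next to each other.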
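
The offsets in the table above can be reproduced with a small calculator. This is a sketch under stated assumptions: it covers only the member types that appear in ExampleBlock, and Member, STD140, and std140Offsets are hypothetical names introduced for illustration.

```typescript
// Only the types used by ExampleBlock are covered in this sketch.
type MemberType = "float" | "int" | "bool" | "vec3" | "mat4";

interface Member {
  name: string;
  type: MemberType;
  arrayLength?: number; // set for arrays such as `float values[3]`
}

// std140 base alignment and byte size of each non-array type.
const STD140: Record<MemberType, { align: number; size: number }> = {
  float: { align: 4, size: 4 },
  int: { align: 4, size: 4 },
  bool: { align: 4, size: 4 },
  vec3: { align: 16, size: 12 },
  mat4: { align: 16, size: 64 }, // stored as four vec4 columns
};

// Round `value` up to the next multiple of `alignment`.
function alignUp(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

// Returns the aligned byte offset of every member, keyed by name.
function std140Offsets(members: Member[]): Record<string, number> {
  const offsets: Record<string, number> = {};
  let cursor = 0;
  for (const member of members) {
    let align = STD140[member.type].align;
    let size = STD140[member.type].size;
    if (member.arrayLength !== undefined) {
      // Every array element is padded to a multiple of 16 bytes.
      align = alignUp(align, 16);
      size = alignUp(size, 16) * member.arrayLength;
    }
    const offset = alignUp(cursor, align);
    offsets[member.name] = offset;
    cursor = offset + size;
  }
  return offsets;
}

const exampleOffsets = std140Offsets([
  { name: "value", type: "float" },
  { name: "vector", type: "vec3" },
  { name: "matrix", type: "mat4" },
  { name: "values", type: "float", arrayLength: 3 },
  { name: "boolean", type: "bool" },
  { name: "integer", type: "int" },
]);
console.log(exampleOffsets);
```

Dividing a byte offset by 4 and adding 1 gives the 1-based element index used in the description above.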

Analyzing the corresponding code in this sample

Defining the uniform block in the vertex shader

The code is as follows:

  const vertexShaderGLSL = `#version 450
  layout(set = 0, binding = 0) uniform Uniforms {
    mat4 modelViewProjectionMatrix;
  } uniforms;
  ...
  void main() {
    gl_Position = uniforms.modelViewProjectionMatrix * position;
    fragColor = color;
  }
  `;

The layout is the default std140. The block specifies set and binding and contains one mvp matrix.
set and binding are used to match the block to its data, as explained later.

Creating uniformsBindGroupLayout

The code is as follows:

  const uniformsBindGroupLayout = device.createBindGroupLayout({
    bindings: [{
      binding: 0,
      visibility: 1,
      type: "uniform-buffer"
    }]
  });

binding corresponds to the binding of the uniform block in the vertex shader: the first element of the bindings array corresponds to the uniform block whose binding is 0.

visibility is GPUShaderStage.VERTEX (equal to 1), and type is set to "uniform-buffer".

Creating the uniform buffer

The code is as follows:

  const uniformBufferSize = 4 * 16; // BYTES_PER_ELEMENT(4) * matrix length(4 * 4 = 16)

  const uniformBuffer = device.createBuffer({
    size: uniformBufferSize,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
  });

Creating the uniform bind group

The code is as follows:

  const uniformBindGroup = device.createBindGroup({
    layout: uniformsBindGroupLayout,
    bindings: [{
      binding: 0,
      resource: {
        buffer: uniformBuffer,
      },
    }],
  });

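binding corresponds to the binding of the uniform block in the vertex shader: the first element of the bindings array corresponds to the uniform block whose binding is 0.

Updating the mvp matrix data in the uniform buffer every frame

The code is as follows:

  // The camera is fixed, so the projection matrix only needs to be computed once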
  const aspect = Math.abs(canvas.width / canvas.height);
  let projectionMatrix = mat4.create();
  mat4.perspective(projectionMatrix, (2 * Math.PI) / 5, aspect, 1, 100.0);
  
  ...
 
  
  // Compute the mvp matrix
  function getTransformationMatrix() {
    let viewMatrix = mat4.create();
    mat4.translate(viewMatrix, viewMatrix, vec3.fromValues(0, 0, -5));
    let now = Date.now() / 1000;
    mat4.rotate(viewMatrix, viewMatrix, 1, vec3.fromValues(Math.sin(now), Math.cos(now), 0));

    let modelViewProjectionMatrix = mat4.create();
    mat4.multiply(modelViewProjectionMatrix, projectionMatrix, viewMatrix);

    return modelViewProjectionMatrix;
  }
  
  ...
  return function frame() {
    // Update the uniform buffer with setSubData (analyzed later)
    uniformBuffer.setSubData(0, getTransformationMatrix());
    ...
  }
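
To make the matrix math above concrete without depending on gl-matrix, here is a minimal self-contained sketch. The helpers perspective, translation, multiply, and transform are hypothetical stand-ins written for illustration; the perspective formula follows the OpenGL-style convention, which is an assumption about what mat4.perspective computes, and the rotation step is omitted for brevity.

```typescript
// Column-major 4x4 matrix stored as 16 numbers.
type Mat4 = number[];
type Vec4 = [number, number, number, number];

// OpenGL-style perspective projection (assumed convention).
function perspective(fovy: number, aspect: number, near: number, far: number): Mat4 {
  const f = 1 / Math.tan(fovy / 2);
  const nf = 1 / (near - far);
  return [
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (far + near) * nf, -1,
    0, 0, 2 * far * near * nf, 0,
  ];
}

// Translation matrix: identity with the offset in the last column.
function translation(x: number, y: number, z: number): Mat4 {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
}

// Returns a * b for column-major matrices.
function multiply(a: Mat4, b: Mat4): Mat4 {
  const out: Mat4 = [];
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += a[k * 4 + row] * b[col * 4 + k];
      }
      out.push(sum); // lands at index col * 4 + row
    }
  }
  return out;
}

// Returns m * v, which is what the vertex shader does with `position`.
function transform(m: Mat4, v: Vec4): Vec4 {
  const out: Vec4 = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    for (let k = 0; k < 4; k++) {
      out[row] += m[k * 4 + row] * v[k];
    }
  }
  return out;
}

const projection = perspective((2 * Math.PI) / 5, 1, 1, 100);
const view = translation(0, 0, -5);
const mvp = multiply(projection, view);

// The cube center (the origin) sits 5 units in front of the camera.
const clip = transform(mvp, [0, 0, 0, 1]);
console.log(clip);
```

The w component of clip equals the distance in front of the camera, which is what the later perspective divide uses to make distant geometry appear smaller.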

Setting the bind group before draw

The code is as follows:

  return function frame() {
    ...
    // "0" corresponds to "set = 0" of the uniform block in the vertex shader
    passEncoder.setBindGroup(0, uniformBindGroup);
    passEncoder.draw(36, 1, 0, 0);
    ...
  }

A closer look at "updating the uniform buffer"

This sample uses setSubData to update the uniform buffer:

  return function frame() {
    uniformBuffer.setSubData(0, getTransformationMatrix());
    ...
  }

In WebGPU Learning (5): A Survey of Key Techniques in Modern Graphics APIs and Their Support in WebGPU -> Approaching zero driver overhead -> persistent map buffer, we mentioned that WebGPU currently has two ways for the CPU to transfer data to the GPU, that is, to update the values of a GPUBuffer:
1. Call the GPUBuffer->setSubData method
2. Use the persistent map buffer technique

This sample uses the first method.
Let's see how to use the second method in this sample:

function setBufferDataByPersistentMapBuffer(device, commandEncoder, uniformBufferSize, uniformBuffer, mvpMatricesData) {
    const [srcBuffer, arrayBuffer] = device.createBufferMapped({
        size: uniformBufferSize,
        usage: GPUBufferUsage.COPY_SRC
    });

    new Float32Array(arrayBuffer).set(mvpMatricesData);
    srcBuffer.unmap();

    commandEncoder.copyBufferToBuffer(srcBuffer, 0, uniformBuffer, 0, uniformBufferSize);
    const commandBuffer = commandEncoder.finish();

    const queue = device.defaultQueue;
    queue.submit([commandBuffer]);

    srcBuffer.destroy();
}

return function frame() {
    //uniformBuffer.setSubData(0, getTransformationMatrix());
     ...

    const commandEncoder = device.createCommandEncoder({});

    setBufferDataByPersistentMapBuffer(device, commandEncoder, uniformBufferSize, uniformBuffer, getTransformationMatrix());
     ...
}

To verify performance, I ran a benchmark: create a ubo containing 160000 mat4s, update the uniform buffer with each of the two methods, and compare their JS profiles:

Using setSubData (calling the setBufferDataBySetSubData function):
[Screenshot: JS profile when using setSubData]

setSubData accounts for 91.54%

Using persistent map buffer (calling the setBufferDataByPersistentMapBuffer function):
[Screenshot: JS profile when using persistent map buffer]

createBufferMapped and setBufferDataByPersistentMapBuffer account for 72.72 + 18.06 = 90.78%

As we can see, the two perform about the same. However, since persistent map buffer should be faster in principle (the CPU and GPU share one buffer, so no copy is needed), this method should be preferred.

In addition, the WebGPU community is still discussing how to optimize buffer updates (for example, someone has proposed adding a GPUUploadBuffer pass), so we need to keep following progress in this area.

References

Advanced-GLSL -> Uniform buffer objects

Setting up vertices

Passing the vertex position and color data to the attributes of the vertex shader (GLSL 4.5 uses "in" to denote an attribute)

The code is as follows:

  const vertexShaderGLSL = `#version 450
  ...
  layout(location = 0) in vec4 position;
  layout(location = 1) in vec4 color;
  layout(location = 0) out vec4 fragColor;
  void main() {
    gl_Position = uniforms.modelViewProjectionMatrix * position;
    fragColor = color;
  }
  `;

  const fragmentShaderGLSL = `#version 450
  layout(location = 0) in vec4 fragColor;
  layout(location = 0) out vec4 outColor;
  void main() {
    outColor = fragColor;
  }
  `;

The vertex shader assigns color to fragColor (GLSL 4.5 uses "out" for what WebGL 1 calls a varying variable). The fragment shader then receives fragColor and assigns it to outColor, so the color of each fragment comes from the colors of the corresponding vertices, interpolated across the triangle.

Creating the vertices buffer and setting the cube's vertex data

The code is as follows:

cube.ts:

// Each vertex contains position, color, and uv data
// This sample does not use the uv data
export const cubeVertexArray = new Float32Array([
    // float4 position, float4 color, float2 uv,
    1, -1, 1, 1,   1, 0, 1, 1,  1, 1,
    -1, -1, 1, 1,  0, 0, 1, 1,  0, 1,
    -1, -1, -1, 1, 0, 0, 0, 1,  0, 0,
    1, -1, -1, 1,  1, 0, 0, 1,  1, 0,
    1, -1, 1, 1,   1, 0, 1, 1,  1, 1,
    -1, -1, -1, 1, 0, 0, 0, 1,  0, 0,

    1, 1, 1, 1,    1, 1, 1, 1,  1, 1,
    1, -1, 1, 1,   1, 0, 1, 1,  0, 1,
    1, -1, -1, 1,  1, 0, 0, 1,  0, 0,
    1, 1, -1, 1,   1, 1, 0, 1,  1, 0,
    1, 1, 1, 1,    1, 1, 1, 1,  1, 1,
    1, -1, -1, 1,  1, 0, 0, 1,  0, 0,

    -1, 1, 1, 1,   0, 1, 1, 1,  1, 1,
    1, 1, 1, 1,    1, 1, 1, 1,  0, 1,
    1, 1, -1, 1,   1, 1, 0, 1,  0, 0,
    -1, 1, -1, 1,  0, 1, 0, 1,  1, 0,
    -1, 1, 1, 1,   0, 1, 1, 1,  1, 1,
    1, 1, -1, 1,   1, 1, 0, 1,  0, 0,

    -1, -1, 1, 1,  0, 0, 1, 1,  1, 1,
    -1, 1, 1, 1,   0, 1, 1, 1,  0, 1,
    -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,
    -1, -1, -1, 1, 0, 0, 0, 1,  1, 0,
    -1, -1, 1, 1,  0, 0, 1, 1,  1, 1,
    -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,

    1, 1, 1, 1,    1, 1, 1, 1,  1, 1,
    -1, 1, 1, 1,   0, 1, 1, 1,  0, 1,
    -1, -1, 1, 1,  0, 0, 1, 1,  0, 0,
    -1, -1, 1, 1,  0, 0, 1, 1,  0, 0,
    1, -1, 1, 1,   1, 0, 1, 1,  1, 0,
    1, 1, 1, 1,    1, 1, 1, 1,  1, 1,

    1, -1, -1, 1,  1, 0, 0, 1,  1, 1,
    -1, -1, -1, 1, 0, 0, 0, 1,  0, 1,
    -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,
    1, 1, -1, 1,   1, 1, 0, 1,  1, 0,
    1, -1, -1, 1,  1, 0, 0, 1,  1, 1,
    -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,
]);

rotatingCube.ts:

  const verticesBuffer = device.createBuffer({
    size: cubeVertexArray.byteLength,
    usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST
  });
  verticesBuffer.setSubData(0, cubeVertexArray);

Because the vertex data only needs to be set once, setSubData can be used here to set the GPUBuffer data without much impact on performance.

Specifying the vertex shader attributes when creating the render pipeline

The code is as follows:

cube.ts:

export const cubeVertexSize = 4 * 10; // Byte size of one cube vertex.
export const cubePositionOffset = 0;
export const cubeColorOffset = 4 * 4; // Byte offset of cube vertex color attribute.

rotatingCube.ts:

  const pipeline = device.createRenderPipeline({
    ...
    vertexState: {
      vertexBuffers: [{
        arrayStride: cubeVertexSize,
        attributes: [{
          // position
          shaderLocation: 0,
          offset: cubePositionOffset,
          format: "float4"
        }, {
          // color
          shaderLocation: 1,
          offset: cubeColorOffset,
          format: "float4"
        }]
      }],
    },
    ...
  });
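
To see how arrayStride and offset select data from the interleaved array, the lookup can be mimicked on the CPU. This is a minimal sketch: readAttribute is a hypothetical helper, and only the first two vertices of cubeVertexArray are copied in to keep it short.

```typescript
// Same constants as cube.ts.
const cubeVertexSize = 4 * 10; // byte size of one vertex (10 floats)
const cubePositionOffset = 0;  // byte offset of the position attribute
const cubeColorOffset = 4 * 4; // byte offset of the color attribute

// First two vertices of cubeVertexArray: float4 position, float4 color, float2 uv.
const twoVertices = new Float32Array([
  1, -1, 1, 1,   1, 0, 1, 1,  1, 1,
  -1, -1, 1, 1,  0, 0, 1, 1,  0, 1,
]);

// Hypothetical helper: reads `componentCount` floats of one attribute for the
// vertex at `vertexIndex`, using the stride and offset in bytes.
function readAttribute(
  data: Float32Array,
  vertexIndex: number,
  strideBytes: number,
  offsetBytes: number,
  componentCount: number
): number[] {
  const start =
    (vertexIndex * strideBytes + offsetBytes) / Float32Array.BYTES_PER_ELEMENT;
  const result: number[] = [];
  for (let i = 0; i < componentCount; i++) {
    result.push(data[start + i]);
  }
  return result;
}

const position1 = readAttribute(twoVertices, 1, cubeVertexSize, cubePositionOffset, 4);
const color1 = readAttribute(twoVertices, 1, cubeVertexSize, cubeColorOffset, 4);
console.log(position1, color1);
```

The uv floats at the end of each vertex are simply skipped, because no attribute points at them while the stride still steps over them.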

Specifying a vertex count of 36 in render pass -> draw

The code is as follows:

  return function frame() {
    ...
    const passEncoder = commandEncoder.beginRenderPass(renderPassDescriptor);
    ...
    passEncoder.draw(36, 1, 0, 0);
    passEncoder.endPass();
    ...
  }

Enabling face culling

The relevant code is:

  const pipeline = device.createRenderPipeline({
    ...
    rasterizationState: {
      cullMode: 'back',
    },
    ...
  });

The relevant definitions are:

enum GPUFrontFace {
    "ccw",
    "cw"
};
enum GPUCullMode {
    "none",
    "front",
    "back"
};
...

dictionary GPURasterizationStateDescriptor {
    GPUFrontFace frontFace = "ccw";
    GPUCullMode cullMode = "none";
    ...
};

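Here ccw means counter-clockwise and cw means clockwise; frontFace sets which winding direction counts as the "front" face; cullMode sets which face is culled.

Because this sample does not set frontFace, frontFace takes the default ccw, which means triangles whose vertices are connected counter-clockwise are front faces;
and because this sample sets cullMode to back, back faces (triangles whose vertices are connected clockwise) are culled.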
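
The interaction of frontFace and cullMode can be sketched on the CPU with a signed-area test. This is only an illustration under stated assumptions: the triangle is already projected to 2D with the y axis pointing up, and signedArea2 and isCulled are hypothetical helpers, not part of the WebGPU API.

```typescript
type Point = { x: number; y: number };
type FrontFace = "ccw" | "cw";
type CullMode = "none" | "front" | "back";

// Twice the signed area of the triangle a -> b -> c.
// Positive when the vertices turn counter-clockwise (with y pointing up).
function signedArea2(a: Point, b: Point, c: Point): number {
  return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Decides whether a projected triangle would be culled.
function isCulled(
  triangle: [Point, Point, Point],
  frontFace: FrontFace,
  cullMode: CullMode
): boolean {
  if (cullMode === "none") {
    return false;
  }
  const area = signedArea2(triangle[0], triangle[1], triangle[2]);
  if (area === 0) {
    return false; // degenerate triangle: not classified in this sketch
  }
  const winding: FrontFace = area > 0 ? "ccw" : "cw";
  const isFront = winding === frontFace;
  return cullMode === "front" ? isFront : !isFront;
}

const counterClockwise: [Point, Point, Point] = [
  { x: 0, y: 0 },
  { x: 1, y: 0 },
  { x: 0, y: 1 },
];
const clockwise: [Point, Point, Point] = [
  counterClockwise[0],
  counterClockwise[2],
  counterClockwise[1],
];

// Same settings as the sample: frontFace defaults to "ccw", cullMode is "back".
console.log(isCulled(counterClockwise, "ccw", "back"));
console.log(isCulled(clockwise, "ccw", "back"));
```

With these settings only the clockwise triangle is rejected, which matches the description of the sample's configuration.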

References

[WebGL Primer] 6: Vertices and Polygons
Investigation: Rasterization State

Enabling depth testing

We now analyze the relevant code, ignoring the code related to the stencil test:

Setting depthStencilState when creating the render pipeline

The code is as follows:

  const pipeline = device.createRenderPipeline({
    ...
    depthStencilState: {
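      // Enable depth writes so that fragments passing the depth test update the depth buffer
      depthWriteEnabled: true,
      // Set the compare function to less (explained later)
      depthCompare: "less",
      // Use a depth format with at least 24 bits of depth (plus 8 bits of stencil)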
      format: "depth24plus-stencil8",
    },
    ...
  });

Creating the depth texture (note that its size -> depth is 1) and setting its view as render pass -> depthStencilAttachment -> attachment

The code is as follows:

  const depthTexture = device.createTexture({
    size: {
      width: canvas.width,
      height: canvas.height,
      depth: 1
    },
    format: "depth24plus-stencil8",
    usage: GPUTextureUsage.OUTPUT_ATTACHMENT
  });

  const renderPassDescriptor: GPURenderPassDescriptor = {
    ...
    depthStencilAttachment: {
      attachment: depthTexture.createView(),

      depthLoadValue: 1.0,
      depthStoreOp: "store",
      ...
    }
  };

Here, depthStencilAttachment is defined as:

dictionary GPURenderPassDepthStencilAttachmentDescriptor {
    required GPUTextureView attachment;

    required (GPULoadOp or float) depthLoadValue;
    required GPUStoreOp depthStoreOp;
    ...
};

depthLoadValue and depthStoreOp are similar to the loadOp and storeOp of colorAttachment described in WebGPU Learning (2): Studying the "Draw a Triangle" Sample -> Analyzing the render pass. Let's look at the relevant code:


  const pipeline = device.createRenderPipeline({
    ...
    depthStencilState: {
      ...
      depthCompare: "less",
      ...
    },
    ...
  });
  
  ...

  const renderPassDescriptor: GPURenderPassDescriptor = {
    ...
    depthStencilAttachment: {
      ...
      depthLoadValue: 1.0,
      depthStoreOp: "store",
      ...
    }
  };

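At the start of the render pass, the depth buffer is cleared to depthLoadValue (1.0 here). During depth testing, the GPU compares the z value of each fragment (in the range [0.0, 1.0]) with the value currently stored in the depth buffer at that pixel, using the function defined by depthCompare. Here it is less, so for a freshly cleared pixel any fragment whose z value is greater than or equal to 1.0 is discarded.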
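
The depth test for a single pixel can be simulated on the CPU. This is a minimal sketch, assuming the depth buffer is first cleared to depthLoadValue and each incoming fragment is then compared with the value currently stored; runDepthTest is a hypothetical helper, and only a few compare functions are modeled.

```typescript
type DepthCompare = "less" | "less-equal" | "greater" | "always";

// Each function answers: does the incoming fragment pass against the stored depth?
const compareFunctions: Record<
  DepthCompare,
  (fragmentZ: number, storedZ: number) => boolean
> = {
  "less": (fragmentZ, storedZ) => fragmentZ < storedZ,
  "less-equal": (fragmentZ, storedZ) => fragmentZ <= storedZ,
  "greater": (fragmentZ, storedZ) => fragmentZ > storedZ,
  "always": () => true,
};

// Hypothetical helper: simulates the depth test for one pixel.
function runDepthTest(
  fragmentDepths: number[],
  depthLoadValue: number,
  depthCompare: DepthCompare,
  depthWriteEnabled: boolean
): { passed: boolean[]; finalDepth: number } {
  let storedDepth = depthLoadValue; // the clear at the start of the render pass
  const passed: boolean[] = [];
  for (const fragmentZ of fragmentDepths) {
    const pass = compareFunctions[depthCompare](fragmentZ, storedDepth);
    passed.push(pass);
    if (pass && depthWriteEnabled) {
      storedDepth = fragmentZ; // a passing fragment overwrites the stored depth
    }
  }
  return { passed, finalDepth: storedDepth };
}

// Same settings as the sample: clear to 1.0, compare with "less", depth writes on.
const depthResult = runDepthTest([0.7, 0.3, 0.5, 1.0], 1.0, "less", true);
console.log(depthResult);
```

In this run the fragments at 0.7 and 0.3 pass in turn, while 0.5 and 1.0 are rejected because a nearer fragment (0.3) is already stored, so the nearest surface wins regardless of draw order.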