Several Ways to Run the HPCG Benchmark
0 Introduction to HPCG
HPCG (High Performance Conjugate Gradients) is a benchmark for evaluating high-performance computing systems. It measures a supercomputer's real-world performance on sparse-matrix, memory-access-intensive workloads, and is therefore closer to many scientific and engineering computing scenarios than the traditional HPL (LINPACK) benchmark.

HPL (LINPACK) emphasizes raw floating-point throughput (FLOPS); it reflects a processor's peak compute capability but is biased toward compute-bound workloads.
HPCG instead stresses:
- Sparse matrix access patterns
- Memory bandwidth
- Cache efficiency
- Communication latency
As a result, an HPCG score is typically only about 0.3%–4% of the corresponding HPL score, and is closer to the performance of real HPC applications.
HPCG implements the iterative solution of a three-dimensional Poisson problem with the preconditioned conjugate gradient (PCG) method. Each iteration involves (see the sketch after this list):
- Sparse matrix-vector multiplication (SpMV)
- Vector updates (AXPY)
- Dot products
- Global communication (MPI_Allreduce)
- A multigrid preconditioner
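For orientation, here is how one iteration of textbook PCG maps onto those kernels. This is the standard algorithm, not HPCG's exact source code; A is the sparse matrix and M the multigrid preconditioner:
alpha_k = (r_k · z_k) / (p_k · A p_k)          (SpMV + dot products + MPI_Allreduce)
x_{k+1} = x_k + alpha_k p_k                    (AXPY)
r_{k+1} = r_k - alpha_k A p_k                  (AXPY, reusing A p_k)
z_{k+1} = M^{-1} r_{k+1}                       (multigrid preconditioner, MG)
beta_k = (r_{k+1} · z_{k+1}) / (r_k · z_k)     (dot product + MPI_Allreduce)
p_{k+1} = z_{k+1} + beta_k p_k                 (WAXPBY)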
1 Running the standard source code
- Install dependencies, clone the code, and copy a build configuration file
# dnf install -y gcc gcc-c++ make cmake openmpi openmpi-devel
# git clone https://github.com/hpcg-benchmark/hpcg.git
# cd hpcg/setup
# cp Make.Linux_MPI Make.kunpeng
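Note: on RPM-based distributions, Open MPI's binaries typically live under /usr/lib64/openmpi/bin and are not on PATH by default. A minimal sketch to make mpicxx/mpirun visible before the make step below (the module name varies by distribution, so treat it as an assumption):
# module load mpi/openmpi-aarch64
# or, equivalently:
# export PATH=/usr/lib64/openmpi/bin:$PATH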
- Edit the build configuration file
# vim Make.kunpeng
#HEADER
# -- High Performance Conjugate Gradient Benchmark (HPCG)
# HPCG - 3.1 - March 28, 2019
# Michael A. Heroux
# Scalable Algorithms Group, Computing Research Division
# Sandia National Laboratories, Albuquerque, NM
#
# Piotr Luszczek
# Jack Dongarra
# University of Tennessee, Knoxville
# Innovative Computing Laboratory
#
# (C) Copyright 2013-2019 All Rights Reserved
#
#
# -- Copyright notice and Licensing terms:
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions, and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. All advertising materials mentioning features or use of this
# software must display the following acknowledgement:
# This product includes software developed at Sandia National
# Laboratories, Albuquerque, NM and the University of
# Tennessee, Knoxville, Innovative Computing Laboratory.
#
# 4. The name of the University, the name of the Laboratory, or the
# names of its contributors may not be used to endorse or promote
# products derived from this software without specific written
# permission.
#
# -- Disclaimer:
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#@HEADER
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -s -f
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - HPCG Directory Structure / HPCG library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir = .
SRCdir = $(TOPdir)/src
INCdir = $(TOPdir)/src
BINdir = $(TOPdir)/bin
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir = /usr/lib64/openmpi
MPinc = -I$(MPdir)/include
MPlib = -L$(MPdir)/lib -lmpi
#
#
# ----------------------------------------------------------------------
# - HPCG includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPCG_INCLUDES = -I$(INCdir) -I$(INCdir)/$(arch) $(MPinc)
HPCG_LIBS =
#
# - Compile time options -----------------------------------------------
#
# -DHPCG_NO_MPI Define to disable MPI
# -DHPCG_NO_OPENMP Define to disable OPENMP
# -DHPCG_CONTIGUOUS_ARRAYS Define to have sparse matrix arrays long and contiguous
# -DHPCG_DEBUG Define to enable debugging output
# -DHPCG_DETAILED_DEBUG Define to enable very detailed debugging output
#
# By default HPCG will:
# *) Build with MPI enabled.
# *) Build with OpenMP enabled.
# *) Not generate debugging output.
#
HPCG_OPTS = -DHPCG_NO_OPENMP
#
# ----------------------------------------------------------------------
#
HPCG_DEFS = $(HPCG_OPTS) $(HPCG_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CXX = mpicxx
#CXXFLAGS = $(HPCG_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
CXXFLAGS = $(HPCG_DEFS) -O3 -march=armv8-a
#
LINKER = $(CXX)
LINKFLAGS = $(CXXFLAGS)
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
USE_CUDA = 0
#
# ----------------------------------------------------------------------
#
Note that CUDA is disabled here (USE_CUDA = 0), and OpenMP is disabled as well via -DHPCG_NO_OPENMP in HPCG_OPTS.
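If you want the hybrid MPI+OpenMP build instead, a minimal sketch is to clear the define and add the OpenMP flag (assuming GCC; verify the flag for your compiler):
HPCG_OPTS =
CXXFLAGS = $(HPCG_DEFS) -O3 -march=armv8-a -fopenmp
LINKFLAGS = $(CXXFLAGS)
The thread count is then controlled at run time with OMP_NUM_THREADS.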
- Build
# cd ..
# make arch=kunpeng
# cd bin
- Prepare the run-time input file
The default input file, hpcg.dat:
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
104 104 104
60

Our input file, hpcg.dat:
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
128 128 128
300
Note: the last line above is the run time in seconds. At least 1800 s is recommended (and required for official results); runs of 300 s or less may give inaccurate numbers.
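A quick way to sanity-check the problem size is to estimate its memory footprint. HPCG needs roughly 700+ bytes per equation (the log in section 3 below reports about 715 bytes per equation; the exact value varies by build), so 64 ranks at 128^3 points each come to roughly:
# echo '64 * 128^3 * 715 / 1024^3' | bc
89
i.e. about 89 GiB in total. The official rules expect the problem to be far larger than cache and to occupy a substantial fraction of main memory, so scale nx/ny/nz up until that holds on your node.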
- Run:
# mpirun --allow-run-as-root --mca pml ob1 -np 64 ./xhpcg
# tail HPCG-Benchmark_3.1_2025-07-08_10-21-09.txt
DDOT Timing Variations::Avg DDOT MPI_Allreduce time=2.58093
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=16.1137
Final Summary::HPCG 2.4 rating for historical reasons is=16.1678
Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal
Final Summary::Results are valid but execution time (sec) is=310.259
Final Summary::Official results execution time (sec) must be at least=1800
# cat hpcg20250708T100940.txt
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call [0] Number of Iterations [11] Scaled Residual [1.12102e-13]
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call [1] Number of Iterations [11] Scaled Residual [1.12102e-13]
Call [0] Number of Iterations [2] Scaled Residual [2.79999e-17]
Call [1] Number of Iterations [2] Scaled Residual [2.79999e-17]
Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 7.31869e-10
Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 5.92074e-11
SpMV call [0] Residual [0]
SpMV call [1] Residual [0]
Call [0] Scaled Residual [0.00454823]
Call [1] Scaled Residual [0.00454823]
The HPCG result for this machine: 16.1137 GFLOP/s.
The key mpirun parameters are as follows:
--allow-run-as-root: explicitly tells mpirun to allow running the program as the root user.
By default, many MPI implementations (Open MPI in particular) refuse to let root launch parallel applications, since doing so is a security risk, especially on shared clusters. If you are running mpirun as root (or the program genuinely needs root privileges) while the MPI library forbids it, the launch fails with a permission error; this flag bypasses that check.
It is acceptable in test or non-production environments where root really is needed, but on production systems or shared multi-user clusters compute jobs should be run as an ordinary user instead.
--mca pml ob1: an MCA (Modular Component Architecture) option that selects the implementation of the point-to-point messaging layer (PML). ob1 is a commonly used PML component in Open MPI.
Open MPI has a highly modular architecture that lets users choose different components for different functions (point-to-point communication, collectives, process management, and so on). ob1 is one of Open MPI's default and most widely used PML components; it communicates over various network interfaces (InfiniBand, Ethernet, etc.).
Explicitly specifying ob1 is usually done to guarantee a particular communication mechanism or to work around compatibility or performance issues with the default PML. In most cases mpirun would pick ob1 anyway, but naming it explicitly makes the behavior deterministic.
-np 64: the number of MPI processes (ranks) to launch.
An MPI program parallelizes by distributing work across processes; each process has a unique ID (rank) from 0 to np-1.
Here, 64 means you want xhpcg to run as 64 parallel processes. These can be spread across multiple compute nodes or packed onto a single node, depending on your host file and mpirun's other placement options.
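For more reproducible numbers it often helps to pin ranks to cores. A minimal sketch using standard Open MPI placement flags (verify that your Open MPI version supports them):
# mpirun --allow-run-as-root --mca pml ob1 -np 64 --map-by core --bind-to core ./xhpcg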
References
- https://mirrors.huaweicloud.com/kunpeng/archive/HPC/benchmark/
- https://developer.nvidia.com/nvidia-hpc-benchmarks-downloads?target_os=Linux&target_arch=x86_64
- https://github.com/davidrohr/hpl-gpu
- https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks
- https://github.com/NVIDIA/nvidia-hpcg
- https://www.amd.com/en/developer/zen-software-studio/applications/pre-built-applications/zen-hpl.html
- https://www.netlib.org/benchmark/hpl/
2 Running with the Phoronix Test Suite
For installing the Phoronix Test Suite, see: http://www.rzrgm.cn/testing-/p/18303322
- Install hpcg
# phoronix-test-suite install hpcg
# cd /var/lib/phoronix-test-suite/test-profiles/pts/hpcg-1.3.0
- Review (and modify if needed) the configuration file
Contents of test-definition.xml:
<?xml version="1.0"?>
<!--Phoronix Test Suite v10.8.4-->
<PhoronixTestSuite>
<TestInformation>
<Title>High Performance Conjugate Gradient</Title>
<AppVersion>3.1</AppVersion>
<Description>HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Lans focused for super-computer testing with modern real-world workloads compared to HPCC.</Description>
<ResultScale>GFLOP/s</ResultScale>
<Proportion>HIB</Proportion>
<TimesToRun>1</TimesToRun>
</TestInformation>
<TestProfile>
<Version>1.3.0</Version>
<SupportedPlatforms>Linux</SupportedPlatforms>
<SoftwareType>Benchmark</SoftwareType>
<TestType>Processor</TestType>
<License>Free</License>
<Status>Verified</Status>
<ExternalDependencies>build-utilities, fortran-compiler, openmpi-development</ExternalDependencies>
<EnvironmentSize>2.4</EnvironmentSize>
<ProjectURL>http://www.hpcg-benchmark.org/</ProjectURL>
<RepositoryURL>https://github.com/hpcg-benchmark/hpcg</RepositoryURL>
<InternalTags>SMP, MPI</InternalTags>
<Maintainer>Michael Larabel</Maintainer>
</TestProfile>
<TestSettings>
<Option>
<DisplayName>X Y Z</DisplayName>
<Identifier>xyz</Identifier>
<Menu>
<Entry>
<Name>104 104 104</Name>
<Value>--nx=104 --ny=104 --nz=104</Value>
</Entry>
<Entry>
<Name>144 144 144</Name>
<Value>--nx=144 --ny=144 --nz=144</Value>
</Entry>
<Entry>
<Name>160 160 160</Name>
<Value>--nx=160 --ny=160 --nz=160</Value>
</Entry>
<Entry>
<Name>192 192 192</Name>
<Value>--nx=192 --ny=192 --nz=192</Value>
</Entry>
</Menu>
</Option>
<Option>
<DisplayName>RT</DisplayName>
<Identifier>time</Identifier>
<ArgumentPrefix>--rt=</ArgumentPrefix>
<Menu>
<Entry>
<Name>300</Name>
<Value>300</Value>
<Message>Shorter run-time</Message>
</Entry>
<Entry>
<Name>1800</Name>
<Value>1800</Value>
<Message>Official run-time</Message>
</Entry>
</Menu>
</Option>
</TestSettings>
</PhoronixTestSuite>
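To benchmark other grid sizes, extra <Entry> items can be added to the X Y Z menu, following the structure above (a sketch; 128 128 128 is just an example value):
<Entry>
<Name>128 128 128</Name>
<Value>--nx=128 --ny=128 --nz=128</Value>
</Entry>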
- Run the test
# phoronix-test-suite benchmark hpcg
Evaluating External Test Dependencies .......................................................................................................................
Phoronix Test Suite v10.8.4
Installed: pts/hpcg-1.3.0
High Performance Conjugate Gradient 3.1:
pts/hpcg-1.3.0
Processor Test Configuration
1: 104 104 104
2: 144 144 144
3: 160 160 160
4: 192 192 192
5: Test All Options
** Multiple items can be selected, delimit by a comma. **
X Y Z: 1
1: 300 [Shorter run-time]
2: 1800 [Official run-time]
3: Test All Options
** Multiple items can be selected, delimit by a comma. **
RT: 1
System Information
PROCESSOR: ARMv8 @ 2.90GHz
Core Count: 128
Cache Size: 224 MB
Scaling Driver: cppc_cpufreq performance
GRAPHICS: Huawei Hi171x [iBMC Intelligent Management chip w/VGA support]
Screen: 1024x768
MOTHERBOARD: WUZHOU BC83AMDAA01-7270Z
BIOS Version: 11.62
Chipset: Huawei HiSilicon
Network: 6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892
MEMORY: 16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET
DISK: 2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N
File-System: xfs
Mount Options: attr2 inode64 noquota relatime rw
Disk Scheduler: MQ-DEADLINE
Disk Details: Block Size: 4096
OPERATING SYSTEM: Kylin Linux Advanced Server V10
Kernel: 4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314
Display Server: X Server 1.20.8
Compiler: GCC 7.3.0 + CUDA 12.8
Security: itlb_multihit: Not affected
+ l1tf: Not affected
+ mds: Not affected
+ meltdown: Not affected
+ mmio_stale_data: Not affected
+ spec_store_bypass: Mitigation of SSB disabled via prctl
+ spectre_v1: Mitigation of __user pointer sanitization
+ spectre_v2: Not affected
+ srbds: Not affected
+ tsx_async_abort: Not affected
Would you like to save these test results (Y/n): y
Enter a name for the result file: hpcg_45_31
Enter a unique name to describe this test run / configuration:
If desired, enter a new description below to better describe this result set / system configuration under test.
Press ENTER to proceed without changes.
Current Description: ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x [iBMC Intelligent Management chip w/VGA support] on Kylin Linux Advanced Server V10 via the Phoronix Test Suite.
New Description:
High Performance Conjugate Gradient 3.1:
pts/hpcg-1.3.0 [X Y Z: 104 104 104 - RT: 300]
Test 1 of 1
Estimated Trial Run Count: 1
Estimated Time To Completion: 38 Minutes [03:16 CDT]
Started Run 1 @ 02:39:00
X Y Z: 104 104 104 - RT: 300:
69.8633
Average: 69.8633 GFLOP/s
Do you want to view the text results of the testing (Y/n): Y
hpcg_45_31
ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x [iBMC Intelligent Management chip w/VGA support] on Kylin Linux Advanced Server V10 via the Phoronix Test Suite.
ARMv8:
Processor: ARMv8 @ 2.90GHz (128 Cores), Motherboard: WUZHOU BC83AMDAA01-7270Z (11.62 BIOS), Chipset: Huawei HiSilicon, Memory: 16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET, Disk: 2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N, Graphics: Huawei Hi171x [iBMC Intelligent Management chip w/VGA support], Network: 6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892
OS: Kylin Linux Advanced Server V10, Kernel: 4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314, Display Server: X Server 1.20.8, Compiler: GCC 7.3.0 + CUDA 12.8, File-System: xfs, Screen Resolution: 1024x768
High Performance Conjugate Gradient 3.1
X Y Z: 104 104 104 - RT: 300
GFLOP/s > Higher Is Better
ARMv8 . 69.86 |==================================================================================================================================================
Would you like to upload the results to OpenBenchmarking.org (y/n): y
Would you like to attach the system logs (lspci, dmesg, lsusb, etc) to the test result (y/n): y
Results Uploaded To: https://openbenchmarking.org/result/2507083-NE-HPCG4531360
The test results can be viewed in a browser at the URL above.
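For unattended runs, the interactive prompts can also be pre-answered with the PRESET_OPTIONS environment variable. A minimal sketch, assuming the option identifiers xyz and time from the test-definition.xml above (check the exact syntax against your PTS version):
# PRESET_OPTIONS="hpcg.xyz=104 104 104; hpcg.time=300" phoronix-test-suite benchmark hpcg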

3 NVIDIA HPC Benchmarks
# docker pull nvcr.io/nvidia/hpc-benchmarks:25.04
# vi HPCG.dat
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
128 128 128
300
# docker run --rm --gpus all --ipc=host --ulimit memlock=-1:-1 \
-v $(pwd):/host_data \
nvcr.io/nvidia/hpc-benchmarks:25.04 \
mpirun -np 1 \
/workspace/hpcg.sh \
--dat /host_data/HPCG.dat \
--cpu-affinity 0 \
--gpu-affinity 0
=========================================================
================= NVIDIA HPC Benchmarks =================
=========================================================
NVIDIA Release 25.04
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: No InfiniBand devices detected.
Multi-node communication performance may be reduced.
Ensure /dev/infiniband is mounted to this container.
HPCG-NVIDIA 25.4.0 -- NVIDIA accelerated HPCG benchmark -- NVIDIA
Build v0.5.6
Start of application (GPU-Only) ...
Initial Residual = 2838.81
Iteration = 1 Scaled Residual = 0.185703
Iteration = 2 Scaled Residual = 0.101681
...
Iteration = 50 Scaled Residual = 3.94531e-07
GPU Rank Info:
| cuSPARSE version 12.5
| Reference CPU memory = 935.79 MB
| GPU Name: 'NVIDIA GeForce RTX 4090'
| GPU Memory Use: 2223 MB / 24082 MB
| Process Grid: 1x1x1
| Local Domain: 128x128x128
| Number of CPU Threads: 1
| Slice Size: 2048
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call [0] Number of Iterations [11] Scaled Residual [1.19242e-14]
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call [1] Number of Iterations [11] Scaled Residual [1.19242e-14]
Call [0] Number of Iterations [1] Scaled Residual [2.94233e-16]
Call [1] Number of Iterations [1] Scaled Residual [2.94233e-16]
Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 8.42084e-10
Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 4.21042e-10
SpMV call [0] Residual [0]
SpMV call [1] Residual [0]
Initial Residual = 2838.81
Iteration = 1 Scaled Residual = 0.220178
Iteration = 2 Scaled Residual = 0.118926
...
Iteration = 49 Scaled Residual = 4.98548e-07
Iteration = 50 Scaled Residual = 3.08635e-07
Call [0] Scaled Residual [3.08635e-07]
Call [1] Scaled Residual [3.08635e-07]
Call [2] Scaled Residual [3.08635e-07]
...
Call [1501] Scaled Residual [3.08635e-07]
Call [1502] Scaled Residual [3.08635e-07]
HPCG-Benchmark
version=3.1
Release date=March 28, 2019
Machine Summary=
Machine Summary::Distributed Processes=1
Machine Summary::Threads per processes=1
Global Problem Dimensions=
Global Problem Dimensions::Global nx=128
Global Problem Dimensions::Global ny=128
Global Problem Dimensions::Global nz=128
Processor Dimensions=
Processor Dimensions::npx=1
Processor Dimensions::npy=1
Processor Dimensions::npz=1
Local Domain Dimensions=
Local Domain Dimensions::nx=128
Local Domain Dimensions::ny=128
########## Problem Summary ##########=
Setup Information=
Setup Information::Setup Time=0.00910214
Linear System Information=
Linear System Information::Number of Equations=2097152
Linear System Information::Number of Nonzero Terms=55742968
Multigrid Information=
Multigrid Information::Number of coarse grid levels=3
Multigrid Information::Coarse Grids=
Multigrid Information::Coarse Grids::Grid Level=1
Multigrid Information::Coarse Grids::Number of Equations=262144
Multigrid Information::Coarse Grids::Number of Nonzero Terms=6859000
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=2
Multigrid Information::Coarse Grids::Number of Equations=32768
Multigrid Information::Coarse Grids::Number of Nonzero Terms=830584
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=3
Multigrid Information::Coarse Grids::Number of Equations=4096
Multigrid Information::Coarse Grids::Number of Nonzero Terms=97336
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
########## Memory Use Summary ##########=
Memory Use Information=
Memory Use Information::Total memory used for data (Gbytes)=1.49883
Memory Use Information::Memory used for OptimizeProblem data (Gbytes)=0
Memory Use Information::Bytes per equation (Total memory / Number of Equations)=714.697
Memory Use Information::Memory used for linear system and CG (Gbytes)=1.31912
Memory Use Information::Coarse Grids=
Memory Use Information::Coarse Grids::Grid Level=1
Memory Use Information::Coarse Grids::Memory used=0.15755
Memory Use Information::Coarse Grids::Grid Level=2
Memory Use Information::Coarse Grids::Memory used=0.0196946
Memory Use Information::Coarse Grids::Grid Level=3
Memory Use Information::Coarse Grids::Memory used=0.00246271
########## V&V Testing Summary ##########=
Spectral Convergence Tests=
Spectral Convergence Tests::Result=PASSED
Spectral Convergence Tests::Unpreconditioned=
Spectral Convergence Tests::Unpreconditioned::Maximum iteration count=11
Spectral Convergence Tests::Unpreconditioned::Expected iteration count=12
Spectral Convergence Tests::Preconditioned=
Spectral Convergence Tests::Preconditioned::Maximum iteration count=1
Spectral Convergence Tests::Preconditioned::Expected iteration count=2
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon=
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Result=PASSED
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for SpMV=8.42084e-10
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for MG=4.21042e-10
########## Iterations Summary ##########=
Iteration Count Information=
Iteration Count Information::Result=PASSED
Iteration Count Information::Reference CG iterations per set=50
Iteration Count Information::Optimized CG iterations per set=50
Iteration Count Information::Total number of reference iterations=75150
Iteration Count Information::Total number of optimized iterations=75150
########## Reproducibility Summary ##########=
Reproducibility Information=
Reproducibility Information::Result=PASSED
Reproducibility Information::Scaled residual mean=3.08635e-07
Reproducibility Information::Scaled residual variance=0
########## Performance Summary (times in sec) ##########=
Benchmark Time Summary=
Benchmark Time Summary::Optimization phase=0.017375
Benchmark Time Summary::DDOT=6.03317
Benchmark Time Summary::WAXPBY=6.80771
Benchmark Time Summary::SpMV=58.5598
Benchmark Time Summary::MG=227.166
Benchmark Time Summary::Total=298.585
Floating Point Operations Summary=
Floating Point Operations Summary::Raw DDOT=9.5191e+11
Floating Point Operations Summary::Raw WAXPBY=9.5191e+11
Floating Point Operations Summary::Raw SpMV=8.54573e+12
Floating Point Operations Summary::Raw MG=4.76988e+13
Floating Point Operations Summary::Total=5.81484e+13
Floating Point Operations Summary::Total with convergence overhead=5.81484e+13
GB/s Summary=
GB/s Summary::Raw Read B/W=1200
GB/s Summary::Raw Write B/W=277.327
GB/s Summary::Raw Total B/W=1477.32
GB/s Summary::Total with convergence and optimization phase overhead=1457.89
GFLOP/s Summary=
GFLOP/s Summary::Raw DDOT=157.779
GFLOP/s Summary::Raw WAXPBY=139.828
GFLOP/s Summary::Raw SpMV=145.932
GFLOP/s Summary::Raw MG=209.974
GFLOP/s Summary::Raw Total=194.747
GFLOP/s Summary::Total with convergence overhead=194.747
GFLOP/s Summary::Total with convergence and optimization phase overhead=192.185
User Optimization Overheads=
User Optimization Overheads::Optimization phase time (sec)=0.017375
User Optimization Overheads::Optimization phase time vs reference SpMV+MG time=0.0396317
DDOT Timing Variations=
DDOT Timing Variations::Min DDOT MPI_Allreduce time=0.220609
DDOT Timing Variations::Max DDOT MPI_Allreduce time=0.220609
DDOT Timing Variations::Avg DDOT MPI_Allreduce time=0.220609
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=192.185
Final Summary::HPCG 2.4 rating for historical reasons is=193.058
Final Summary::Results are valid but execution time (sec) is=298.585
Final Summary::Official results execution time (sec) must be at least=1800
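On a node with several GPUs, the same container is typically launched with one MPI rank per GPU, passing colon-separated per-rank affinity lists to hpcg.sh. A sketch for four GPUs (affinity values are assumptions; verify the indices against your topology, e.g. with nvidia-smi topo -m):
# docker run --rm --gpus all --ipc=host --ulimit memlock=-1:-1 \
-v $(pwd):/host_data \
nvcr.io/nvidia/hpc-benchmarks:25.04 \
mpirun -np 4 \
/workspace/hpcg.sh \
--dat /host_data/HPCG.dat \
--cpu-affinity 0:1:2:3 \
--gpu-affinity 0:1:2:3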
4 Huawei Kunpeng 920 benchmark
https://mirrors.huaweicloud.com/kunpeng/archive/HPC/benchmark/

- Install Singularity (used to run the benchmark .sif image below):
# dnf install golang -y
# export VERSION=3.8.7    # adjust this as necessary
# wget https://github.com/hpcng/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz
# tar -xzf singularity-${VERSION}.tar.gz && cd singularity
# ./mconfig && make -C builddir && sudo make -C builddir install
- Run:
sudo singularity exec /home/qatest/benchmark/Benchmark-24.0.RC2-OpenEuler22.03-aarch64.sif mpirun --allow-run-as-root -mca pml ucx -np 128 xhpcg --nx 128 --ny 128 --nz 128 --nt 1800 | tee hpcg.log
5 MetaX (沐曦)
docker run -it -e HPCC_PERF_DIR=/tmp -u 0 -e HPCC_SMALL_PAGESIZE_ENABLE=1 -e FORCE_ACTIVE_WAIT=2 --device=/dev/dri --device=/dev/htcd --group-add video image.marsgpu.com/hpc-release/hpcg-x201:0.9.0-2.32.0.3-ubuntu20.04-amd64 sh -c '/opt/HPCG/autotest.sh'