<output id="qn6qe"></output>

    1. <output id="qn6qe"><tt id="qn6qe"></tt></output>
    2. <strike id="qn6qe"></strike>

      亚洲 日本 欧洲 欧美 视频,日韩中文字幕有码av,一本一道av中文字幕无码,国产线播放免费人成视频播放,人妻少妇偷人无码视频,日夜啪啪一区二区三区,国产尤物精品自在拍视频首页,久热这里只有精品12

      高性能計算-Intel IPP庫ARM移植示例(20)

      1. 簡介

      (1) Intel? Integrated Performance Primitives,即英特爾集成性能基元(簡稱IPP),為信號、數據和圖像處理特定應用領域,提供simd優化的一組全面的函數庫。

      (2) 本項目將對 exp、cos、sin、tone、Triangle函數用NEON向量化指令實現ARM移植版本,有串行和向量化兩個版本。

      (3)計算使用泰勒展開,使用前后加和項的比例關系,加速計算。達到接口要求的精度即可。

      (4)項目地址:https://github.com/libo-0379/IPP_ARM

      2. 函數及接口介紹

      (1) exp

      Parameters
      pSrc Pointer to the source vector.
      pDst Pointer to the destination vector.
      len Number of elements in the vectors.
      
      IppStatus ippsExp_32f_A24 (const Ipp32f* pSrc, Ipp32f* pDst, Ipp32s len);
      

      (2) cos

      Parameters
      pSrc Pointer to the source vector.
      pDst Pointer to the destination vector.
      len Number of elements in the vectors.
      
      IppStatus ippsCos_32f_A24 (const Ipp32f* pSrc, Ipp32f* pDst, Ipp32s len);
      

      (3) sin

      Parameters
      pSrc Pointer to the source vector.
      pDst Pointer to the destination vector.
      len Number of elements in the vectors.
      
      IppStatus ippsSin_32f_A24 (const Ipp32f* pSrc, Ipp32f* pDst, Ipp32s len);
      

      (4) Tone

      Description:
      This function generates the tone with the specified frequency rFreq, phase pPhase, and magnitude magn.
      The function computes len samples of the tone, and stores them in the array pDst. For real tones, each
      generated value x[n] is defined as:
      x[n] = magn * cos(2πn*rFreq + phase)
      For complex tones, x[n] is defined as:
      x[n] = magn * (cos(2πn*rFreq + phase)+j* sin(2πn*rFreq + phase))
      The parameter hint suggests using specific code, which provides for either fast but less accurate calculation,
      or more accurate but slower execution.
      
      Parameters:
      magn Magnitude of the tone, that is, the maximum value attained by
      the wave.
      pPhase Pointer to the phase of the tone relative to a cosine wave. It
      must be in range [0.0, 2π). You can use the returned value to
      compute the next continuous data block.
      rFreq Frequency of the tone relative to the sampling frequency. It
      must be in the interval [0.0, 0.5) for real tone and in [0.0, 1.0)
      for complex tone.
      pDst Pointer to the array that stores the samples.
      len Number of samples to be computed.
      
      IppStatus ippsTone_32f(Ipp32f* pDst, int len, Ipp32f magn, Ipp32f rFreq, Ipp32f*
      pPhase, IppHintAlgorithm hint);
      

      函數表達式

      x[n] = magn * cos(2πn*rFreq + phase)

      函數圖像:

      image

      (5) Triangle

      Description:
      This function generates the triangle with the specified frequency rFreq, phase pointed by pPhase, and
      magnitude magn. The function computes len samples of the triangle, and stores them in the array pDst. For
      real triangle, x[n] is defined as:
      x[n] = magn * cth(2π* rFreq*n + phase), n = 0, 1, 2,...
      For complex triangles, x[n] is defined as:
      x[n] = magn * [cth(2π* rFreq*n + phase) + j * sth(2π* rFreq*n + phase)], n = 0, 1, 2,...
      See Triangle-Generating Functions for the definition of functions cth and sth.
      The cth () function is determined as follows:
      H = π + h
      
      Parameters:
      rFreq Frequency of the triangle relative to the sampling frequency. It
      must be in range [0.0, 0.5).
      pPhase Pointer to the phase of the triangle relative to a cosine
      triangular analog wave. It must be in range [0.0, 2π). You can
      use the returned value to compute the next continuous data
      block.
      magn Magnitude of the triangle, that is, the maximum value attained
      by the wave.
      asym Asymmetry h of a triangle. It must be in range [-π , π ). If h=0,
      then the triangle is symmetric and a direct analog of a tone.
      pDst Pointer to the array that stores the samples.
      len Number of samples to be computed
      
      IppStatus ippsTriangle_32fc(Ipp32fc* pDst, int len, Ipp32f magn, Ipp32f rFreq, Ipp32f
      asym, Ipp32f* pPhase);
      

      函數實部和虛部

      image

      函數圖像

      image

      3. 核心代碼

      #include <stdio.h>
      #include <math.h>
      #include "../include/ipp.h"
      
      IppStatus ippsExp_32f_A24(const Ipp32f *pSrc, Ipp32f *dst, Ipp32s len)
      {
          for(int i=0;i<len/4*4;i+=4)
          {   
              //每個元素泰勒展開項計數,初始化為0
              float32x4_t vn = vdupq_n_f32(0);
              float32x4_t vx = vld1q_f32(pSrc+i);
              //記錄當前項的展開值,初始化為第一項的值
              float32x4_t vprior = vdupq_n_f32(1.0);
              //保存每個元素展開的求和
              float32x4_t vsum = vprior;
              while(1)
              {
                  //計算當前展開項
                  // 1. 計算 prior * x / ++n
                  vn = vaddq_f32(vn,vdupq_n_f32(1));
                  // 自己實現的 vdivq_f32_neon 對 -20.055 計算誤差太大,改用系統提供接口 vdivq_f32
                  // float32x4_t vcur = vdivq_f32_neon(vmulq_f32(vprior,vx),vn);
                  float32x4_t vcur = vdivq_f32(vmulq_f32(vprior,vx),vn);
                  // 2. 將當前項賦值為前項            
                  vprior = vcur;
                  // printf("0 %f\n",vgetq_lane_f32(vcur,0));
                  // 3. 如果所有元素的最大展開項小于精度控制,不再計算
                  if(vmaxvq_f32(vabsq_f32(vcur)) - EPSILON < 0)
                      break;
                  // 4. 將當前展開項加和
                  vsum = vaddq_f32(vsum,vcur);
              }
              // printf("n %f\n",vgetq_lane_f32(vn,0));
              vst1q_f32(dst+i,vsum);
          }
      
          //處理尾部數據
          for(int i=len/4*4;i<len;i++)
          {
              // 初始化第一個值
              int n = 0;
              double prior = 1.0; //前一項數值初始化為第一項的數值
              double sum = prior; //求和保存結果
              Ipp32f x = *(pSrc+i);
              while (1)
              {
                  double cur = prior * x / ++n;   //當前加項的數值 = 前項*(src[i]/n)
                  prior = cur;
                  // printf("%f\n",cur);
                  if (fabs(cur)-EPSILON <= 0) //如果小于精度值停止計算
                      break;
                  sum += cur;
              }
              // printf("n %d\n",n);
              *(dst+i) = sum;
          }
          return ippStsNoErr;
      }
      
      IppStatus ippsCos_32f_A24(const Ipp32f *pSrc, Ipp32f *dst, Ipp32s len)
      {
          float32x4_t v2PI = vdupq_n_f32(IPP_2PI);
          for(int i=0;i<len/4*4;i+=4)
          {
              //每個元素泰勒展開項計數,初始化為0
              float32x4_t vn = vdupq_n_f32(0);
              float32x4_t vx = vld1q_f32(pSrc+i);
              //記錄當前項的展開值,初始化為第一項的值
              float32x4_t vprior = vdupq_n_f32(1.0);
              //保存每個元素展開的求和
              float32x4_t vsum = vprior;
              //將x轉換到 [-2π,2π]
              //類型轉換向下取整,注意這里是浮點數除法
              float32x4_t vk = vcvtq_f32_s32(vcvtq_s32_f32(vdivq_f32_neon(vx,v2PI)));   
              vx = vfmsq_f32(vx,v2PI,vk);
              while(1)
              {
                  //計算當前展開項
                  // 1. 計算 prior * (x^2) / (2(++n)*(2n-1))
                  vn = vaddq_f32(vn,vdupq_n_f32(1));
                  float32x4_t vcur = vmulq_n_f32(vn,2);
                  vcur = vdivq_f32_neon(vmulq_f32(vprior,vmulq_f32(vx,vx)),vmulq_f32(vcur,vsubq_f32(vcur,vdupq_n_f32(1.0))));
                  //將當前項賦值為前項,不帶符號          
                  vprior = vcur;
                  // 2. 添加正負號
                  float32_t n = vgetq_lane_f32(vn,0);
                  vcur = vmulq_n_f32(vcur,(int)n%2==0?1:-1);
                  // 3. 如果所有元素的展開項絕對值最大項小于精度控制,不再計算
                  if(vmaxvq_f32(vabsq_f32(vcur)) - EPSILON <= 0)
                      break;
                  // 4. 將當前展開項加和
                  vsum = vaddq_f32(vsum,vcur);
              }
              printf("n %f\n",vgetq_lane_f32(vn,0));
              vst1q_f32(dst+i,vsum);
          }
          //處理尾部數據
          for(int i=len/4*4;i<len;i++)
          {
              // 初始化第一個值
              int n = 0;
              double prior = 1.0; //前一項數值初始化為第一項的數值
              double sum = prior; //求和保存結果
              Ipp64f x = *(pSrc+i);
              //對x轉換到 [-2π,2π]
              Ipp32s k = x/IPP_2PI;
              x -= k*IPP_2PI;
              while (1)
              {
                  int32_t temp1 = 2* ++n;
                  double cur = prior * (x * x) /(temp1-- * temp1) ;   //當前加項的數值 = 前項*(x^2/(2n*(2n-1)))
                  prior = cur;
                  if (fabs(cur) - EPSILON <= 0) //如果小于精度值停止計算
                      break;
                  sum += (n%2==0?1:-1)*cur;   //符號位
              }
              // printf("n %d\n",n);
              *(dst+i) = sum;
          }
          return ippStsNoErr;
      }
      
      IppStatus ippsSin_32f_A24(const Ipp32f *pSrc, Ipp32f *dst, Ipp32s len)
      {    
          float32x4_t v2PI = vdupq_n_f32(IPP_2PI);
          for(int i=0;i<len/4*4;i+=4)
          {
              //每個元素泰勒展開項計數,初始化為0
              float32x4_t vn = vdupq_n_f32(0);
              float32x4_t vx = vld1q_f32(pSrc+i);
              //將x轉換到 [-2π,2π]
              //類型轉換向下取整,注意這里是浮點數除法
              float32x4_t vk = vcvtq_f32_s32(vcvtq_s32_f32(vdivq_f32_neon(vx,v2PI)));   
              vx = vfmsq_f32(vx,v2PI,vk);
              // !! 注意這里易錯點,應在區間轉換后再賦值給 vprior 
              //記錄當前項的展開值,初始化為第一項的值
              float32x4_t vprior = vx;
              //保存每個元素展開的求和
              float32x4_t vsum = vprior;
              printf("x0 %f\n",vgetq_lane_f32(vx,0));
              while(1)
              {
                  // 1. 計算 prior * (x^2) / (2(++n)*(2n+1))
                  vn = vaddq_f32(vn,vdupq_n_f32(1));
                  float32x4_t vcur = vmulq_n_f32(vn,2);
                  vcur = vdivq_f32_neon(vmulq_f32(vprior,vmulq_f32(vx,vx)),vmulq_f32(vcur,vaddq_f32(vcur,vdupq_n_f32(1))));
                  // printf("cur0 %f\n",vgetq_lane_f32(vcur,0));
                  //將當前項賦值為前項,不帶符號          
                  vprior = vcur;
                  // 2. 添加正負號
                  float32_t n = vgetq_lane_f32(vn,0);
                  vcur = vmulq_n_f32(vcur,(int)n%2==0?1:-1);
                  // 3.如果所有元素的展開項絕對值最大項小于精度控制,不再計算
                  if(vmaxvq_f32(vabsq_f32(vcur)) - EPSILON <= 0)
                      break;
                  // 4.將當前展開項加和
                  vsum = vaddq_f32(vsum,vcur);
              }
              printf("n %f\n",vgetq_lane_f32(vn,0));
              vst1q_f32(dst+i,vsum);
          }
          //處理尾部數據
          for(int i=len/4*4;i<len;i++)
          {
              // 初始化第一個值
              int n = 0;
              Ipp64f x = *(pSrc+i);
              //將x轉換到 [-2π,2π]
              Ipp32s k = x/IPP_2PI;
              x -= k*IPP_2PI;
              double prior = x; //前一項數值初始化為第一項的數值
              double sum = prior; //求和保存結果
              while (1)
              {
                  int32_t temp1 = 2* ++n;
                  double cur = prior * (x * x) /(temp1++ * temp1) ;   //當前加項的數值 = 前項*(x^2/(2n*(2n+1)))
                  prior = cur;
                  if (fabs(cur) - EPSILON <= 0) //如果小于精度值停止計算
                      break;
                  sum += (n%2==0?1:-1)*cur;   //符號位
              }
              // printf("n %d\n",n);
              *(dst+i) = sum;
          }
          return ippStsNoErr;
      }
      
      IppStatus ippsTone_32f(Ipp32f* pDst, int len, Ipp32f magn, Ipp32f rFreq,
      Ipp32f* pPhase, IppHintAlgorithm hint)
      {
          if(rFreq<0.0 || (rFreq-0.5)>EPSILON)
              return ippStsToneFreqErr;
          if(*pPhase<0.0 || (*pPhase-IPP_2PI)>=EPSILON)
              return ippStsTonePhaseErr;
          if(magn<=0.0)
              return ippStsToneMagnErr;
          
          float32x4_t vphase = vdupq_n_f32(*pPhase);
          float32x4_t vrFreq= vdupq_n_f32(rFreq);
          for(int i=0;i<len/4*4;i+=4)
          {
              //保存結果
              float32_t result[4] = {0.0};
              float32x4_t vresult = {0.0};
              // IPP_2PI*rFreq*i + phase
              float32_t x[4] = {i,i+1,i+2,i+3};
              float32x4_t vx = vld1q_f32(x);
              vx = vaddq_f32(vphase,vmulq_n_f32(vmulq_f32(vrFreq,vx),IPP_2PI));
              vst1q_f32(x,vx);
              ippsCos_32f_A24(x,result,4);
              vresult = vld1q_f32(result);
              vst1q_f32(pDst+i,vmulq_n_f32(vresult,magn));
          }
          for(int i=len/4*4;i<len;i++)
          {
              float32_t result = 0.0;
              Ipp32f x = IPP_2PI*rFreq*i + (*pPhase);
              ippsCos_32f_A24_s(&x,&result,1);
              *(pDst+i) = result * magn;  //類型強轉
          }
          return ippStsNoErr;
      }
      
      IppStatus ippsTriangle_32fc(Ipp32fc* pDst, int len, Ipp32f magn, Ipp32f rFreq, 
      Ipp32f asym, Ipp32f* pPhase)
      {
          if(rFreq<0.0 || (rFreq-0.5)>EPSILON)
              return ippStsToneFreqErr;
          if(*pPhase<0.0 || (*pPhase-IPP_2PI)>=EPSILON)
              return ippStsTonePhaseErr;
          if(magn<=0.0)
              return ippStsToneMagnErr;
          
          float32x4_t vH = vdupq_n_f32(IPP_PI+asym);
          printf("h %f\n",IPP_PI+asym);
      
          float32x4_t vPI = vdupq_n_f32(IPP_PI);
          float32x4_t v2PI = vdupq_n_f32(IPP_2PI);
          float32x4_t v2PI_H = vsubq_f32(v2PI,vH);
          float32x4_t vphase = vdupq_n_f32(*pPhase);
          float32x4_t vrFreq= vdupq_n_f32(rFreq);
          for(int i=0;i<len/4*4;i+=4)
          {
              float32_t x[4] = {i,i+1,i+2,i+3};
              float32x4_t vx = vld1q_f32(x);
              // 計算位置 x = 2*PI*rFreq*n + phase
              vx = vaddq_f32(vphase,vmulq_n_f32(vmulq_f32(vrFreq,vx),IPP_2PI));
              //因為x>0 可以轉換到 [0,2π]
              //類型轉換向下取整,注意這里是浮點數除法
              float32x4_t vk = vcvtq_f32_s32(vcvtq_s32_f32(vdivq_f32_neon(vx,v2PI)));   
              vx = vfmsq_f32(vx,v2PI,vk);
      
              printf("x %f\n",vgetq_lane_f32(vx,0));
              
              vst1q_f32(x,vx);
              // for(int i=0;i<4;i++)
              //     printf("x %f\n",x[i]);
              //保存結果包含實部虛部
              float32x4x2_t vresult;
              //實部
              float32x4_t vresultRE;
              float32x4_t vresultRE1;
              float32x4_t vresultRE2;
              //虛部
              float32x4_t vresultIM;
              float32x4_t vresultIM_1;
              float32x4_t vresultIM1;
              float32x4_t vresultIM2;
              float32x4_t vresultIM3;
              //對實部和虛部所有區間都計算最后根據 x 的范圍取舍
              //計算實部
              //[0,H] -2*x/H + 1 
              vresultRE1 = vaddq_f32(vdupq_n_f32(1.0),vdivq_f32_neon(vmulq_n_f32(vx,-2),vH));
              //[H,2π] (2(x-PI)-H)/(2PI-H)
              vresultRE2 = vdivq_f32_neon(vsubq_f32(vmulq_n_f32(vsubq_f32(vx,vPI),2),vH),v2PI_H);
      
              //計算虛部
              //[0,π-H/2] 2*x/(2PI-H)
              vresultIM1 = vdivq_f32_neon(vmulq_n_f32(vx,2),v2PI_H);
              //[π-H/2,π+H/2] -2*(x-PI)/H
              vresultIM2 = vdivq_f32_neon(vmulq_n_f32(vsubq_f32(vx,vPI),-2),vH);
              //[π+H/2,2π] 2(x-2PI)/(2PI-H)
              vresultIM3 = vdivq_f32_neon(vmulq_n_f32(vsubq_f32(vx,v2PI),2),v2PI_H);
      
              //實部取舍
              uint32x4_t vcompare = vcltq_f32(vx,vH);
          
              printf("c %u\n",vgetq_lane_u32(vcompare,0));
      
              vresultRE = vbslq_f32(vcompare,vresultRE1,vresultRE2);
              vresultRE = vmulq_n_f32(vresultRE,magn);
              //虛部取舍
              float32x4_t v2_H = vdivq_f32_neon(vH,vdupq_n_f32(2));
              // x< π-H/2 取 vresultIM1,否則取 vresultIM2
              vcompare = vcltq_f32(vx,vsubq_f32(vPI,v2_H));
              vresultIM_1 = vbslq_f32(vcompare,vresultIM1,vresultIM2);
              // x> π+H/2 取 vresultIM3,否則取 vresultIM
              vcompare = vcgtq_f32(vx,vaddq_f32(vPI,v2_H));
              vresultIM = vbslq_f32(vcompare,vresultIM3,vresultIM_1);
              vresultIM = vmulq_n_f32(vresultIM,magn);
      
              vresult.val[0] = vresultRE;
              vresult.val[1] = vresultIM;
              printf("re %f\n",vgetq_lane_f32(vresultRE,0));
              printf("im %f\n",vgetq_lane_f32(vresultIM,0));
              vst2q_f32((float32_t*)(pDst+i),vresult);
          }
          Ipp32f H = IPP_PI+asym;
          for(int i=len/4*4;i<len;i++)
          {
              Ipp32f x = IPP_2PI*rFreq*i + *pPhase;
              //x 需要轉換到 [0,2π]
              Ipp32s k = x/IPP_2PI;
              x -= k*IPP_2PI;
              Ipp32fc result;
              // printf("x %f\n",x);
              //計算實部
              if(x>=0 && (H-x)>=EPSILON)  
                   result.re = -2*x/H + 1.0;  // -2*x/H + 1 
              else    
                  result.re = (2*(x-IPP_PI)-H)/(IPP_2PI-H);// (2(x-PI)-H)/(2PI-H)
              //計算虛部
              if(x>=0.0 && ((IPP_2PI-H)/2)-x >= EPSILON)  
                  result.im = 2*x/(IPP_2PI-H);    // 2*x/(2PI-H)
              else if(x-(IPP_2PI-H)/2 >=EPSILON && (IPP_2PI+H)/2-x>=EPSILON)
                  result.im = -2*(x-IPP_PI)/H;    //-2*(x-PI)/H
              else
                  result.im = 2*(x-IPP_2PI)/(IPP_2PI-H);  //2(x-2PI)/(2PI-H)
              result.re *= magn;
              result.im *= magn;
              *(pDst+i) = result;
          }
      
          return ippStsNoErr;
      }
      

      測試代碼

      #include <stdio.h>
      #include "../include/ipp.h"
      #include "../include/ipp_s.h"
      
      #define N 5
      
      int main()
      {
          Ipp32f src[N] = {-20.055, -5.456, 7.835, 11.89, 18.76};
          Ipp32f dst[N] ={0.0};
          for(int i=0;i<N;i++)
          {
              ippsExp_32f_A24_s(&src[i],&dst[i],1);
              printf("result %f\n",dst[i]);
          }
          printf("======\n");
          ippsExp_32f_A24(src,dst,N);
          for(int i=0;i<N;i++)
              printf("result %f\n",dst[i]);
          printf("======\n");
      
          for(int i=0;i<N;i++)
          {
              ippsCos_32f_A24_s(&src[i],&dst[i],1);
              printf("result %f\n",dst[i]);
          }
          printf("======\n");
          ippsCos_32f_A24(src,dst,N);
          for(int i=0;i<N;i++)
              printf("result %f\n",dst[i]);
          printf("======\n");
      
          for(int i=0;i<N;i++)
          {
              ippsSin_32f_A24_s(&src[i],&dst[i],1);
              printf("result %f\n",dst[i]);
          }
          printf("======\n");
          ippsSin_32f_A24(src,dst,N);
          for(int i=0;i<N;i++)
              printf("result %f\n",dst[i]);
          printf("======\n");
      
          Ipp32f phase = 5.3865;
          ippsTone_32f_s(dst,N,3,0.2456,&phase,ippAlgHintFast);
          for(int i=0;i<N;i++)
              printf("result %f\n",dst[i]);
          printf("======\n");
          ippsTone_32f(dst,N,3,0.2456,&phase,ippAlgHintFast);
          for(int i=0;i<N;i++)
              printf("result %f\n",dst[i]);
          printf("======\n");
      
          Ipp32fc dst1[N];
          ippsTriangle_32fc_s(dst1,N,3,0.2456,1.47,&phase);
          for(int i=0;i<N;i++)
              printf("result %f %f\n",dst1[i].re,dst1[i].im);
          printf("======\n");
      
          ippsTriangle_32fc(dst1,N,3,0.2456,1.47,&phase);
          for(int i=0;i<N;i++)
              printf("result %f %f\n",dst1[i].re,dst1[i].im);
          printf("======\n");
      
          return 0;
      }
      

      4. Intel IPP庫測試

      官網安裝教程:https://www.intel.cn/content/www/cn/zh/developer/tools/oneapi/hpc-toolkit-download.html?packages=hpc-toolkit&hpc-toolkit-os=linux&hpc-toolkit-lin=offline

      運行前設置環境變量

      source /opt/intel/oneapi/setvars.sh
      

      編譯命令

      icx test.c -o test -lippvm -lippcore -lipps
      

      5. 測試數據

      函數 串行單精度 FIELD3 串行雙精度 FIELD5 向量化單精度 FIELD7 intel IPP FIELD9
      ippsExp_32f_A24 1.976372 0 1.976372 0
      0.004269 0.00427 0.004269 0.004271
      2527.535645 2527.535645 2527.535645 2527.535645
      145801.3281 145801.3438 145801.3281 145801.3438
      140399216 140399184 140399184 140399184
      ippsCos_32f_A24 0.357278 0.357278 0.357279 0.357278
      0.676947 0.67695 0.676952 0.67695
      0.01898 0.01898 0.018981 0.01898
      0.77985 0.77985 0.779854 0.77985
      0.995996 0.995993 0.995993 0.995993
      ippsSin_32f_A24 -0.933998 -0.933998 -0.933998 -0.933998
      0.73603 0.736029 0.73603 0.736029
      0.99982 0.99982 0.99982 0.99982
      -0.625968 -0.625967 -0.625965 -0.625966
      -0.089438 -0.089436 -0.089436 -0.089436
      ippsTone_32f 1.872611 1.872609 1.872612 1.872609
      2.394652 2.394654 2.394655 2.394655
      -1.740218 -1.74022 -1.74022 -1.74022
      -2.490863 -2.490862 -2.490861 -2.490863
      1.602509 1.602513 1.602509 1.602513
      ippsTriangle_32fc 實部 虛部 實部 虛部 實部 虛部 實部 虛部
      -0.218555 -2.920779 -0.218555 -2.920779 -0.218555 -2.920779
      2.158905 2.320416 2.158905 2.320415 2.158905 2.320415
      0.15116 1.238589 0.15116 1.238589 0.15116 1.238588
      -1.856586 -0.769157 -1.856585 -0.769157 -1.856585 -0.769157
      -0.615485 -2.776901 -0.615485 -2.776901 -0.615484 -2.776901

      6. 結果分析

      (1) 用單精度計算的結果可能誤差較大,無法保證接口A24要求單精度十進制6位數的準確,建議使用雙精度計算轉換成單精度結果;

      (2) NEON 提供有 vdiv 除法接口,比使用倒數的自定義版本精度更高。

      posted @ 2024-12-17 17:34  安洛8  閱讀(308)  評論(0)    收藏  舉報
      主站蜘蛛池模板: 富源县| 日本丶国产丶欧美色综合| 干中文字幕| 伊人成色综合人夜夜久久| 日韩欧美在线综合网另类| 国产午夜精品无码一区二区 | 中文字幕亚洲精品人妻| 成人又黄又爽又色的视频| 日本大片在线看黄a∨免费| 精品久久久中文字幕一区| а∨天堂一区中文字幕| 久久久久免费看黄a片app| 91精品国产蜜臀在线观看| 国产黄大片在线观看画质优化| 999久久久免费精品播放| 精品日韩人妻中文字幕| 亚洲人成电影在线播放| 成人免费无码大片A毛片抽搐色欲| 精品国产自在久久现线拍| 日韩少妇人妻vs中文字幕| 亚洲最大成人av在线天堂网| 蜜桃av亚洲第一区二区| 安平县| 在线a亚洲老鸭窝天堂| 国产一区二区三区小说| 91香蕉国产亚洲一二三区| 亚洲欧美偷国产日韩| 日本熟妇浓毛hdsex| 亚洲老熟女一区二区三区| 麻豆a级片| 欧美www在线观看| 日本道播放一区二区三区| 99久久国产宗和精品1上映| 亚洲综合高清一区二区三区| 不卡国产一区二区三区| 五月天国产成人AV免费观看| 国产精品一区二区久久岳| 黑人大群体交免费视频| 亚洲天堂一区二区三区三州 | 狠狠色综合tv久久久久久| 亚洲综合网国产精品一区|