
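      Machine Vision: Trying to Run YOLOv8 with DirectML

      What is DirectML

      DirectML is a low-level compute framework from Microsoft, comparable to CUDA. Unlike CUDA, which requires an NVIDIA GPU, DirectML only requires a graphics card that supports DirectX 12, so AMD GPUs and Intel integrated graphics are supported as well.
      To use DirectML as the compute backend for PyTorch, you need to install the Python package torch-directml. In my tests, torch-directml and DirectML do not support every operator, so YOLOv8 cannot be run on them for now.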

      Preparing the Python test environment

      ## Create a python=3.9 virtual environment
      conda.exe create --name yolo_directml python=3.9
      conda.exe activate yolo_directml
      

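      Installing torch-directml

      This package is not updated very often; see the release history at https://pypi.org/project/torch-directml/#history
      At the time of writing, the latest version is 0.2.0.dev230426, which corresponds to PyTorch 2.0; the one before it is 0.1.13.1.dev230413, which corresponds to PyTorch 1.13.1.
      Both versions frequently caused Windows to blue-screen in the YOLOv8 experiments described later.

      Do not install PyTorch first. Installing torch-directml directly will automatically pull in the matching PyTorch version as a dependency.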

      pip install torch-directml==0.1.13.1.dev230413
      

      Checking the environment

      Check the installed Python packages:

      pip list 
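
      As a scripted alternative to reading the pip list output by eye, the sketch below prints the versions of the two packages installed so far. It relies only on the standard-library importlib.metadata module; the helper name installed_version is made up for this illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as they appear on PyPI.
for name in ("torch", "torch-directml"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

      If torch-directml was installed as described above, both lines should show a version, and the torch version should be the one that the chosen torch-directml release depends on.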
      

      Check the device output of torch and torch_directml:

      import torch
      torch.cuda.is_available()
      
      import torch_directml
      torch_directml.is_available()
      torch_directml.device() 
      # Expected output: device(type='privateuseone', index=0)
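
      The same checks can be wrapped in a small function that reports which backend is usable on the current machine. This is only a sketch: the function name pick_backend and the order of preference (DirectML, then CUDA, then CPU) are assumptions made for illustration, not something torch-directml prescribes.

```python
import importlib.util


def pick_backend() -> str:
    """Return 'dml', 'cuda', or 'cpu' depending on what can be imported and used."""
    if importlib.util.find_spec("torch_directml") is not None:
        try:
            import torch_directml

            if torch_directml.is_available():
                return "dml"
        except (ImportError, OSError):
            pass  # package is present but cannot be loaded on this machine
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            if torch.cuda.is_available():
                return "cuda"
        except (ImportError, OSError):
            pass  # same reasoning as above
    return "cpu"


print(pick_backend())
```

      On a machine without an NVIDIA GPU but with a working torch-directml install, this is expected to print dml.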
      

      Installing YOLOv8

      pip install ultralytics
      

      Check the YOLOv8 installation:

      import ultralytics
      ultralytics.checks()
      

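      Modifying the ultralytics\utils\torch_utils.py file

      Out of the box, the YOLOv8 device option only accepts cpu, CUDA devices such as 0, or mps on Apple hardware. To enable the DirectML compute backend, a device=dml option has to be added, which requires modifying torch_utils.py: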

      # Ultralytics YOLO 🚀, AGPL-3.0 license
      
      import math
      import os
      import random
      import time
      from contextlib import contextmanager
      from copy import deepcopy
      from pathlib import Path
      from typing import Union
      
      import numpy as np
      import torch
      import torch.distributed as dist
      import torch.nn as nn
      import torch.nn.functional as F
      import torchvision
      
      from ultralytics.utils import DEFAULT_CFG_DICT, DEFAULT_CFG_KEYS, LOGGER, __version__
      from ultralytics.utils.checks import PYTHON_VERSION, check_version
      
      try:
          import thop
      except ImportError:
          thop = None
      
      TORCH_1_9 = check_version(torch.__version__, "1.9.0")
      TORCH_2_0 = check_version(torch.__version__, "2.0.0")
      TORCHVISION_0_10 = check_version(torchvision.__version__, "0.10.0")
      TORCHVISION_0_11 = check_version(torchvision.__version__, "0.11.0")
      TORCHVISION_0_13 = check_version(torchvision.__version__, "0.13.0")
      
      
      @contextmanager
      def torch_distributed_zero_first(local_rank: int):
          """Decorator to make all processes in distributed training wait for each local_master to do something."""
          initialized = torch.distributed.is_available() and torch.distributed.is_initialized()
          if initialized and local_rank not in (-1, 0):
              dist.barrier(device_ids=[local_rank])
          yield
          if initialized and local_rank == 0:
              dist.barrier(device_ids=[0])
      
      
      def smart_inference_mode():
          """Applies torch.inference_mode() decorator if torch>=1.9.0 else torch.no_grad() decorator."""
      
          def decorate(fn):
              """Applies appropriate torch decorator for inference mode based on torch version."""
              if TORCH_1_9 and torch.is_inference_mode_enabled():
                  return fn  # already in inference_mode, act as a pass-through
              else:
                  return (torch.inference_mode if TORCH_1_9 else torch.no_grad)()(fn)
      
          return decorate
      
      
      def get_cpu_info():
          """Return a string with system CPU information, i.e. 'Apple M2'."""
          import cpuinfo  # pip install py-cpuinfo
      
          k = "brand_raw", "hardware_raw", "arch_string_raw"  # info keys sorted by preference (not all keys always available)
          info = cpuinfo.get_cpu_info()  # info dict
          string = info.get(k[0] if k[0] in info else k[1] if k[1] in info else k[2], "unknown")
          return string.replace("(R)", "").replace("CPU ", "").replace("@ ", "")
      
      
      def select_device(device="", batch=0, newline=False, verbose=True):
          """
          Selects the appropriate PyTorch device based on the provided arguments.
      
          The function takes a string specifying the device or a torch.device object and returns a torch.device object
          representing the selected device. The function also validates the number of available devices and raises an
          exception if the requested device(s) are not available.
      
          Args:
              device (str | torch.device, optional): Device string or torch.device object.
                  Options are 'None', 'cpu', 'dml', or 'cuda', or '0' or '0,1,2,3'. Defaults to an empty string, which
                  auto-selects the first available GPU, or CPU if no GPU is available. 'dml' selects DirectML.
              batch (int, optional): Batch size being used in your model. Defaults to 0.
              newline (bool, optional): If True, adds a newline at the end of the log string. Defaults to False.
              verbose (bool, optional): If True, logs the device information. Defaults to True.
      
          Returns:
              (torch.device): Selected device.
      
          Raises:
              ValueError: If the specified device is not available or if the batch size is not a multiple of the number of
                  devices when using multiple GPUs.
      
          Examples:
              >>> select_device('cuda:0')
              device(type='cuda', index=0)
      
              >>> select_device('cpu')
              device(type='cpu')
      
          Note:
              Sets the 'CUDA_VISIBLE_DEVICES' environment variable for specifying which GPUs to use.
          """
      
          if isinstance(device, torch.device):
              return device
      
          s = f"Ultralytics YOLOv{__version__} 🚀 Python-{PYTHON_VERSION} torch-{torch.__version__} "
          device = str(device).lower()
          for remove in "cuda:", "none", "(", ")", "[", "]", "'", " ":
              device = device.replace(remove, "")  # to string, 'cuda:0' -> '0' and '(0, 1)' -> '0,1'
          cpu = device == "cpu"
          dml = device == "dml"  # DirectML backend via torch-directml (added for this experiment)
          mps = device in ("mps", "mps:0")  # Apple Metal Performance Shaders (MPS)
          if cpu or mps or dml:
              os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # force torch.cuda.is_available() = False
          elif device:  # non-cpu device requested
              if device == "cuda":
                  device = "0"
              visible = os.environ.get("CUDA_VISIBLE_DEVICES", None)
              os.environ["CUDA_VISIBLE_DEVICES"] = device  # set environment variable - must be before assert is_available()
              if not (torch.cuda.is_available() and torch.cuda.device_count() >= len(device.replace(",", ""))):
                  LOGGER.info(s)
                  install = (
                      "See https://pytorch.org/get-started/locally/ for up-to-date torch install instructions if no "
                      "CUDA devices are seen by torch.\n"
                      if torch.cuda.device_count() == 0
                      else ""
                  )
                  raise ValueError(
                      f"Invalid CUDA 'device={device}' requested."
                      f" Use 'device=cpu' or pass valid CUDA device(s) if available,"
                      f" i.e. 'device=0' or 'device=0,1,2,3' for Multi-GPU.\n"
                      f"\ntorch.cuda.is_available(): {torch.cuda.is_available()}"
                      f"\ntorch.cuda.device_count(): {torch.cuda.device_count()}"
                      f"\nos.environ['CUDA_VISIBLE_DEVICES']: {visible}\n"
                      f"{install}"
                  )
      
          if not cpu and not mps and not dml and torch.cuda.is_available():  # prefer GPU if available
              devices = device.split(",") if device else "0"  # range(torch.cuda.device_count())  # i.e. 0,1,6,7
              n = len(devices)  # device count
              if n > 1 and batch > 0 and batch % n != 0:  # check batch_size is divisible by device_count
                  raise ValueError(
                      f"'batch={batch}' must be a multiple of GPU count {n}. Try 'batch={batch // n * n}' or "
                      f"'batch={batch // n * n + n}', the nearest batch sizes evenly divisible by {n}."
                  )
              space = " " * (len(s) + 1)
              for i, d in enumerate(devices):
                  p = torch.cuda.get_device_properties(i)
                  s += f"{'' if i == 0 else space}CUDA:{d} ({p.name}, {p.total_memory / (1 << 20):.0f}MiB)\n"  # bytes to MB
              arg = "cuda:0"
          elif mps and TORCH_2_0 and torch.backends.mps.is_available():
              # Prefer MPS if available
              s += f"MPS ({get_cpu_info()})\n"
              arg = "mps"
          else:  # revert to CPU
              s += f"CPU ({get_cpu_info()})\n"
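              arg = "cpu"
          if verbose:
              LOGGER.info(s if newline else s.rstrip())
          if dml:
              # DirectML: return the torch-directml device (printed as type 'privateuseone').
              # Note that the log line above still reports CPU, because no DirectML branch adds to `s`.
              import torch_directml

              return torch_directml.device()
          return torch.device(arg)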
      
      
      def time_sync():
          """PyTorch-accurate time."""
          if torch.cuda.is_available():
              torch.cuda.synchronize()
          return time.time()
      
      
      def fuse_conv_and_bn(conv, bn):
          """Fuse Conv2d() and BatchNorm2d() layers https://tehnokv.com/posts/fusing-batchnorm-and-conv/."""
          fusedconv = (
              nn.Conv2d(
                  conv.in_channels,
                  conv.out_channels,
                  kernel_size=conv.kernel_size,
                  stride=conv.stride,
                  padding=conv.padding,
                  dilation=conv.dilation,
                  groups=conv.groups,
                  bias=True,
              )
              .requires_grad_(False)
              .to(conv.weight.device)
          )
      
          # Prepare filters
          w_conv = conv.weight.clone().view(conv.out_channels, -1)
          w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))
          fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))
      
          # Prepare spatial bias
          b_conv = torch.zeros(conv.weight.shape[0], device=conv.weight.device) if conv.bias is None else conv.bias
          b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))
          fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)
      
          return fusedconv
      
      
      def fuse_deconv_and_bn(deconv, bn):
          """Fuse ConvTranspose2d() and BatchNorm2d() layers."""
          fuseddconv = (
              nn.ConvTranspose2d(
                  deconv.in_channels,
                  deconv.out_channels,
                  kernel_size=deconv.kernel_size,
                  stride=deconv.stride,
                  padding=deconv.padding,
                  output_padding=deconv.output_padding,
                  dilation=deconv.dilation,
                  groups=deconv.groups,
                  bias=True,
              )
              .requires_grad_(False)
              .to(deconv.weight.device)
          )
      
          # Prepare filters
          w_deconv = deconv.weight.clone().view(deconv.out_channels, -1)
          w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))
          fuseddconv.weight.copy_(torch.mm(w_bn, w_deconv).view(fuseddconv.weight.shape))
      
          # Prepare spatial bias
          b_conv = torch.zeros(deconv.weight.shape[1], device=deconv.weight.device) if deconv.bias is None else deconv.bias
          b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))
          fuseddconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)
      
          return fuseddconv
      
      
      def model_info(model, detailed=False, verbose=True, imgsz=640):
          """
          Model information.
      
          imgsz may be int or list, i.e. imgsz=640 or imgsz=[640, 320].
          """
          if not verbose:
              return
          n_p = get_num_params(model)  # number of parameters
          n_g = get_num_gradients(model)  # number of gradients
          n_l = len(list(model.modules()))  # number of layers
          if detailed:
              LOGGER.info(
                  f"{'layer':>5} {'name':>40} {'gradient':>9} {'parameters':>12} {'shape':>20} {'mu':>10} {'sigma':>10}"
              )
              for i, (name, p) in enumerate(model.named_parameters()):
                  name = name.replace("module_list.", "")
                  LOGGER.info(
                      "%5g %40s %9s %12g %20s %10.3g %10.3g %10s"
                      % (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std(), p.dtype)
                  )
      
          flops = get_flops(model, imgsz)
          fused = " (fused)" if getattr(model, "is_fused", lambda: False)() else ""
          fs = f", {flops:.1f} GFLOPs" if flops else ""
          yaml_file = getattr(model, "yaml_file", "") or getattr(model, "yaml", {}).get("yaml_file", "")
          model_name = Path(yaml_file).stem.replace("yolo", "YOLO") or "Model"
          LOGGER.info(f"{model_name} summary{fused}: {n_l} layers, {n_p} parameters, {n_g} gradients{fs}")
          return n_l, n_p, n_g, flops
      
      
      def get_num_params(model):
          """Return the total number of parameters in a YOLO model."""
          return sum(x.numel() for x in model.parameters())
      
      
      def get_num_gradients(model):
          """Return the total number of parameters with gradients in a YOLO model."""
          return sum(x.numel() for x in model.parameters() if x.requires_grad)
      
      
      def model_info_for_loggers(trainer):
          """
          Return model info dict with useful model information.
      
          Example:
              YOLOv8n info for loggers
              ```python
              results = {'model/parameters': 3151904,
                         'model/GFLOPs': 8.746,
                         'model/speed_ONNX(ms)': 41.244,
                         'model/speed_TensorRT(ms)': 3.211,
                         'model/speed_PyTorch(ms)': 18.755}
              ```
          """
          if trainer.args.profile:  # profile ONNX and TensorRT times
              from ultralytics.utils.benchmarks import ProfileModels
      
              results = ProfileModels([trainer.last], device=trainer.device).profile()[0]
              results.pop("model/name")
          else:  # only return PyTorch times from most recent validation
              results = {
                  "model/parameters": get_num_params(trainer.model),
                  "model/GFLOPs": round(get_flops(trainer.model), 3),
              }
          results["model/speed_PyTorch(ms)"] = round(trainer.validator.speed["inference"], 3)
          return results
      
      
      def get_flops(model, imgsz=640):
          """Return a YOLO model's FLOPs."""
          if not thop:
              return 0.0  # if not installed return 0.0 GFLOPs
      
          try:
              model = de_parallel(model)
              p = next(model.parameters())
              if not isinstance(imgsz, list):
                  imgsz = [imgsz, imgsz]  # expand if int/float
              try:
                  # Use stride size for input tensor
                  stride = max(int(model.stride.max()), 32) if hasattr(model, "stride") else 32  # max stride
                  im = torch.empty((1, p.shape[1], stride, stride), device=p.device)  # input image in BCHW format
                  flops = thop.profile(deepcopy(model), inputs=[im], verbose=False)[0] / 1e9 * 2  # stride GFLOPs
                  return flops * imgsz[0] / stride * imgsz[1] / stride  # imgsz GFLOPs
              except Exception:
                  # Use actual image size for input tensor (i.e. required for RTDETR models)
                  im = torch.empty((1, p.shape[1], *imgsz), device=p.device)  # input image in BCHW format
                  return thop.profile(deepcopy(model), inputs=[im], verbose=False)[0] / 1e9 * 2  # imgsz GFLOPs
          except Exception:
              return 0.0
      
      
      def get_flops_with_torch_profiler(model, imgsz=640):
          """Compute model FLOPs (thop alternative)."""
          if TORCH_2_0:
              model = de_parallel(model)
              p = next(model.parameters())
              stride = (max(int(model.stride.max()), 32) if hasattr(model, "stride") else 32) * 2  # max stride
              im = torch.zeros((1, p.shape[1], stride, stride), device=p.device)  # input image in BCHW format
              with torch.profiler.profile(with_flops=True) as prof:
                  model(im)
              flops = sum(x.flops for x in prof.key_averages()) / 1e9
              imgsz = imgsz if isinstance(imgsz, list) else [imgsz, imgsz]  # expand if int/float
              flops = flops * imgsz[0] / stride * imgsz[1] / stride  # 640x640 GFLOPs
              return flops
          return 0
      
      
      def initialize_weights(model):
          """Initialize model weights to random values."""
          for m in model.modules():
              t = type(m)
              if t is nn.Conv2d:
                  pass  # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
              elif t is nn.BatchNorm2d:
                  m.eps = 1e-3
                  m.momentum = 0.03
              elif t in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:
                  m.inplace = True
      
      
      def scale_img(img, ratio=1.0, same_shape=False, gs=32):
          """Scales and pads an image tensor of shape img(bs,3,y,x) based on given ratio and grid size gs, optionally
          retaining the original shape.
          """
          if ratio == 1.0:
              return img
          h, w = img.shape[2:]
          s = (int(h * ratio), int(w * ratio))  # new size
          img = F.interpolate(img, size=s, mode="bilinear", align_corners=False)  # resize
          if not same_shape:  # pad/crop img
              h, w = (math.ceil(x * ratio / gs) * gs for x in (h, w))
          return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447)  # value = imagenet mean
      
      
      def make_divisible(x, divisor):
          """Returns nearest x divisible by divisor."""
          if isinstance(divisor, torch.Tensor):
              divisor = int(divisor.max())  # to int
          return math.ceil(x / divisor) * divisor
      
      
      def copy_attr(a, b, include=(), exclude=()):
          """Copies attributes from object 'b' to object 'a', with options to include/exclude certain attributes."""
          for k, v in b.__dict__.items():
              if (len(include) and k not in include) or k.startswith("_") or k in exclude:
                  continue
              else:
                  setattr(a, k, v)
      
      
      def get_latest_opset():
          """Return second-most (for maturity) recently supported ONNX opset by this version of torch."""
          return max(int(k[14:]) for k in vars(torch.onnx) if "symbolic_opset" in k) - 1  # opset
      
      
      def intersect_dicts(da, db, exclude=()):
          """Returns a dictionary of intersecting keys with matching shapes, excluding 'exclude' keys, using da values."""
          return {k: v for k, v in da.items() if k in db and all(x not in k for x in exclude) and v.shape == db[k].shape}
      
      
      def is_parallel(model):
          """Returns True if model is of type DP or DDP."""
          return isinstance(model, (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel))
      
      
      def de_parallel(model):
          """De-parallelize a model: returns single-GPU model if model is of type DP or DDP."""
          return model.module if is_parallel(model) else model
      
      
      def one_cycle(y1=0.0, y2=1.0, steps=100):
          """Returns a lambda function for sinusoidal ramp from y1 to y2 https://arxiv.org/pdf/1812.01187.pdf."""
          return lambda x: max((1 - math.cos(x * math.pi / steps)) / 2, 0) * (y2 - y1) + y1
      
      
      def init_seeds(seed=0, deterministic=False):
          """Initialize random number generator (RNG) seeds https://pytorch.org/docs/stable/notes/randomness.html."""
          random.seed(seed)
          np.random.seed(seed)
          torch.manual_seed(seed)
          torch.cuda.manual_seed(seed)
          torch.cuda.manual_seed_all(seed)  # for Multi-GPU, exception safe
          # torch.backends.cudnn.benchmark = True  # AutoBatch problem https://github.com/ultralytics/yolov5/issues/9287
          if deterministic:
              if TORCH_2_0:
                  torch.use_deterministic_algorithms(True, warn_only=True)  # warn if deterministic is not possible
                  torch.backends.cudnn.deterministic = True
                  os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
                  os.environ["PYTHONHASHSEED"] = str(seed)
              else:
                  LOGGER.warning("WARNING ⚠️ Upgrade to torch>=2.0.0 for deterministic training.")
          else:
              torch.use_deterministic_algorithms(False)
              torch.backends.cudnn.deterministic = False
      
      
      class ModelEMA:
          """Updated Exponential Moving Average (EMA) from https://github.com/rwightman/pytorch-image-models
          Keeps a moving average of everything in the model state_dict (parameters and buffers)
          For EMA details see https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage
          To disable EMA set the `enabled` attribute to `False`.
          """
      
          def __init__(self, model, decay=0.9999, tau=2000, updates=0):
              """Create EMA."""
              self.ema = deepcopy(de_parallel(model)).eval()  # FP32 EMA
              self.updates = updates  # number of EMA updates
              self.decay = lambda x: decay * (1 - math.exp(-x / tau))  # decay exponential ramp (to help early epochs)
              for p in self.ema.parameters():
                  p.requires_grad_(False)
              self.enabled = True
      
          def update(self, model):
              """Update EMA parameters."""
              if self.enabled:
                  self.updates += 1
                  d = self.decay(self.updates)
      
                  msd = de_parallel(model).state_dict()  # model state_dict
                  for k, v in self.ema.state_dict().items():
                      if v.dtype.is_floating_point:  # true for FP16 and FP32
                          v *= d
                          v += (1 - d) * msd[k].detach()
                          # assert v.dtype == msd[k].dtype == torch.float32, f'{k}: EMA {v.dtype},  model {msd[k].dtype}'
      
          def update_attr(self, model, include=(), exclude=("process_group", "reducer")):
              """Updates attributes and saves stripped model with optimizer removed."""
              if self.enabled:
                  copy_attr(self.ema, model, include, exclude)
      
      
      def strip_optimizer(f: Union[str, Path] = "best.pt", s: str = "") -> None:
          """
          Strip optimizer from 'f' to finalize training, optionally save as 's'.
      
          Args:
              f (str): file path to model to strip the optimizer from. Default is 'best.pt'.
              s (str): file path to save the model with stripped optimizer to. If not provided, 'f' will be overwritten.
      
          Returns:
              None
      
          Example:
              ```python
              from pathlib import Path
              from ultralytics.utils.torch_utils import strip_optimizer
      
              for f in Path('path/to/weights').rglob('*.pt'):
                  strip_optimizer(f)
              ```
          """
          x = torch.load(f, map_location=torch.device("cpu"))
          if "model" not in x:
              LOGGER.info(f"Skipping {f}, not a valid Ultralytics model.")
              return
      
          if hasattr(x["model"], "args"):
              x["model"].args = dict(x["model"].args)  # convert from IterableSimpleNamespace to dict
          args = {**DEFAULT_CFG_DICT, **x["train_args"]} if "train_args" in x else None  # combine args
          if x.get("ema"):
              x["model"] = x["ema"]  # replace model with ema
          for k in "optimizer", "best_fitness", "ema", "updates":  # keys
              x[k] = None
          x["epoch"] = -1
          x["model"].half()  # to FP16
          for p in x["model"].parameters():
              p.requires_grad = False
          x["train_args"] = {k: v for k, v in args.items() if k in DEFAULT_CFG_KEYS}  # strip non-default keys
          # x['model'].args = x['train_args']
          torch.save(x, s or f)
          mb = os.path.getsize(s or f) / 1e6  # file size
          LOGGER.info(f"Optimizer stripped from {f},{f' saved as {s},' if s else ''} {mb:.1f}MB")
      
      
      def profile(input, ops, n=10, device=None):
          """
          Ultralytics speed, memory and FLOPs profiler.
      
          Example:
              ```python
              from ultralytics.utils.torch_utils import profile
      
              input = torch.randn(16, 3, 640, 640)
              m1 = lambda x: x * torch.sigmoid(x)
              m2 = nn.SiLU()
              profile(input, [m1, m2], n=100)  # profile over 100 iterations
              ```
          """
          results = []
          if not isinstance(device, torch.device):
              device = select_device(device)
          LOGGER.info(
              f"{'Params':>12s}{'GFLOPs':>12s}{'GPU_mem (GB)':>14s}{'forward (ms)':>14s}{'backward (ms)':>14s}"
              f"{'input':>24s}{'output':>24s}"
          )
      
          for x in input if isinstance(input, list) else [input]:
              x = x.to(device)
              x.requires_grad = True
              for m in ops if isinstance(ops, list) else [ops]:
                  m = m.to(device) if hasattr(m, "to") else m  # device
                  m = m.half() if hasattr(m, "half") and isinstance(x, torch.Tensor) and x.dtype is torch.float16 else m
                  tf, tb, t = 0, 0, [0, 0, 0]  # dt forward, backward
                  try:
                      flops = thop.profile(m, inputs=[x], verbose=False)[0] / 1e9 * 2 if thop else 0  # GFLOPs
                  except Exception:
                      flops = 0
      
                  try:
                      for _ in range(n):
                          t[0] = time_sync()
                          y = m(x)
                          t[1] = time_sync()
                          try:
                              (sum(yi.sum() for yi in y) if isinstance(y, list) else y).sum().backward()
                              t[2] = time_sync()
                          except Exception:  # no backward method
                              # print(e)  # for debug
                              t[2] = float("nan")
                          tf += (t[1] - t[0]) * 1000 / n  # ms per op forward
                          tb += (t[2] - t[1]) * 1000 / n  # ms per op backward
                      mem = torch.cuda.memory_reserved() / 1e9 if torch.cuda.is_available() else 0  # (GB)
                      s_in, s_out = (tuple(x.shape) if isinstance(x, torch.Tensor) else "list" for x in (x, y))  # shapes
                      p = sum(x.numel() for x in m.parameters()) if isinstance(m, nn.Module) else 0  # parameters
                      LOGGER.info(f"{p:12}{flops:12.4g}{mem:>14.3f}{tf:14.4g}{tb:14.4g}{str(s_in):>24s}{str(s_out):>24s}")
                      results.append([p, flops, mem, tf, tb, s_in, s_out])
                  except Exception as e:
                      LOGGER.info(e)
                      results.append(None)
                  torch.cuda.empty_cache()
          return results
      
      
      class EarlyStopping:
          """Early stopping class that stops training when a specified number of epochs have passed without improvement."""
      
          def __init__(self, patience=50):
              """
              Initialize early stopping object.
      
              Args:
                  patience (int, optional): Number of epochs to wait after fitness stops improving before stopping.
              """
              self.best_fitness = 0.0  # i.e. mAP
              self.best_epoch = 0
              self.patience = patience or float("inf")  # epochs to wait after fitness stops improving to stop
              self.possible_stop = False  # possible stop may occur next epoch
      
          def __call__(self, epoch, fitness):
              """
              Check whether to stop training.
      
              Args:
                  epoch (int): Current epoch of training
                  fitness (float): Fitness value of current epoch
      
              Returns:
                  (bool): True if training should stop, False otherwise
              """
              if fitness is None:  # check if fitness=None (happens when val=False)
                  return False
      
              if fitness >= self.best_fitness:  # >= 0 to allow for early zero-fitness stage of training
                  self.best_epoch = epoch
                  self.best_fitness = fitness
              delta = epoch - self.best_epoch  # epochs without improvement
              self.possible_stop = delta >= (self.patience - 1)  # possible stop may occur next epoch
              stop = delta >= self.patience  # stop training if patience exceeded
              if stop:
                  LOGGER.info(
                      f"Stopping training early as no improvement observed in last {self.patience} epochs. "
                      f"Best results observed at epoch {self.best_epoch}, best model saved as best.pt.\n"
                      f"To update EarlyStopping(patience={self.patience}) pass a new patience value, "
                      f"i.e. `patience=300` or use `patience=0` to disable EarlyStopping."
                  )
              return stop
      
      

      Training YOLOv8

      Train from the command line. DirectML does not support automatic mixed precision (AMP), so AMP is turned off:

      yolo.exe task=detect mode=train val=True data=d:\data.yaml model=D:\yolov8n.pt epochs=1 workers=0 imgsz=640 seed=1 device=dml amp=False verbose=True 
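
      For readers who prefer the Python API, the command above roughly corresponds to the sketch below. It assumes the same file paths as the command line and the patched select_device shown earlier; without that patch, device="dml" is not accepted. The helper names build_train_args and train are made up for this illustration.

```python
def build_train_args() -> dict:
    """Keyword arguments mirroring the command line above."""
    return {
        "data": r"d:\data.yaml",
        "epochs": 1,
        "workers": 0,
        "imgsz": 640,
        "seed": 1,
        "device": "dml",  # only understood after patching select_device
        "amp": False,  # AMP is turned off because DirectML does not support it
        "val": True,
        "verbose": True,
    }


def train() -> None:
    # Imported lazily so this file can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(r"D:\yolov8n.pt")
    model.train(**build_train_args())


# train()  # uncomment to start training
```

      This follows the same code path as the yolo.exe command, so it should behave the same way, including any operator errors raised by DirectML.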
      

      However, training still fails with the error: RuntimeError: Return counts not implemented for unique operator for DirectML.
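
      The message points at torch.unique being called with return_counts=True on a DirectML tensor. One generic way to get past an operator that is missing on a device is to run it on the CPU and move the results back. The sketch below shows that idea only; it was not tested in the original post, the helper name unique_with_counts_on_cpu is hypothetical, and nothing here implies that training completes on DirectML once this single operator is replaced.

```python
try:
    import torch
except ImportError:  # keeps the sketch loadable on machines without PyTorch
    torch = None


def unique_with_counts_on_cpu(x):
    """Run torch.unique(..., return_counts=True) on the CPU and move results back."""
    values, counts = torch.unique(x.cpu(), return_counts=True)
    return values.to(x.device), counts.to(x.device)


if torch is not None:
    t = torch.tensor([2, 0, 2, 1, 2])  # a CPU tensor, just to exercise the helper
    values, counts = unique_with_counts_on_cpu(t)
    print(values.tolist(), counts.tolist())
```

      The round trip through host memory costs time on every call, so this is a diagnostic aid rather than a performance-neutral fix.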

      posted @ 2024-03-03 22:39  harrychinese