
      Overview of the main structure of the Yuan2.0 code and the implementation of its three parallelism strategies

      The overall structure of the code is shown in the figure below:

      During initialize_megatron, Megatron's initialization routine, data parallelism, pipeline parallelism, and tensor parallelism are all set up. A brief introduction to each and its implementation follows.
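      Before any of the group-building code discussed below can run, the default torch.distributed process group has to exist across all 16 ranks; Megatron sets this up during initialization. A minimal sketch of that prerequisite, assuming a torchrun-style launch (not shown in the original post) that exports RANK, WORLD_SIZE and LOCAL_RANK for every process:

      import os
      import torch

      # Assumed launch, e.g.:
      #   torchrun --nnodes=2 --nproc_per_node=8 pretrain.py ...
      rank = int(os.environ["RANK"])              # global rank: 0..15
      world_size = int(os.environ["WORLD_SIZE"])  # 16
      local_rank = int(os.environ["LOCAL_RANK"])  # 0..7 on each server

      torch.cuda.set_device(local_rank)
      torch.distributed.init_process_group(backend="nccl",
                                           rank=rank,
                                           world_size=world_size)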

      Initializing the distributed model environment:

      Take two servers with 8 GPUs each (16 GPUs in total) as an example, training a model with 12 transformer layers.

      Figure 1

      Figure 2

      In this example the model is cut vertically into 4 parts of 3 layers each to implement pipeline parallelism, and cut horizontally to implement tensor parallelism, splitting the "layers 1-3" block in Figure 2 into two halves.

       

      Figure 3

      The figure above shows, taking model1 as an example, how one model is split into eight parts and placed onto eight GPUs.

      What a complete model replica (model1) consists of:

      The three vertical cuts split the 12 transformer layers into four parts of 3 layers each, in order to implement pipeline parallelism. (This requires pipeline_model_parallel_size=4.)

      The horizontal cut represents tensor parallelism: every block of three layers, from (1, 2, 3) through (10, 11, 12), is split into an upper half and a lower half. (This requires tensor_model_parallel_size=2.) The resulting data-parallel size is worked out in the sketch below.
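      With those two settings fixed, the data-parallel degree is whatever remains of the 16 GPUs. A quick sketch of the arithmetic (variable names mirror the Megatron code quoted later; values are the ones used throughout this post):

      world_size = 16                       # 2 servers x 8 GPUs
      tensor_model_parallel_size = 2        # the horizontal cut
      pipeline_model_parallel_size = 4      # the three vertical cuts -> 4 stages

      # Each full model replica occupies 2 * 4 = 8 GPUs, so the 16 GPUs hold
      # 2 replicas that process different data in parallel.
      data_parallel_size = world_size // (tensor_model_parallel_size *
                                          pipeline_model_parallel_size)
      print(data_parallel_size)             # 2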

      tensor model-parallel groups: the groups of ranks that perform tensor parallelism together. From the figure:

      model1: [0, 1; 8, 9; 4, 5; 12, 13]

      model2: [2, 3; 10, 11; 6, 7; 14, 15]

      This corresponds to the following in the code example:

      8 tensor model-parallel groups:
          [g0, g1], [g2, g3], [g4, g5], [g6, g7], [g8, g9], [g10, g11], [g12, g13], [g14, g15]
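      These eight groups fall directly out of the rank layout: every pair of consecutive ranks forms one tensor-parallel group. A small, self-contained sketch of the enumeration (pure Python, sizes as in this example):

      world_size = 16
      tensor_model_parallel_size = 2

      tensor_groups = [
          list(range(i * tensor_model_parallel_size, (i + 1) * tensor_model_parallel_size))
          for i in range(world_size // tensor_model_parallel_size)
      ]
      print(tensor_groups)
      # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14, 15]]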

       

      pipeline model-parallel groups: the groups of ranks that form one pipeline. From the figure:

      Model model1 is first cut vertically into 4 pipeline stages and then cut horizontally, so it contributes two pipeline groups: the first is [0, 4, 8, 12] and the second is [1, 5, 9, 13].

      The same reasoning applies to model2, which contributes [2, 6, 10, 14] and [3, 7, 11, 15]; the enumeration sketch below reproduces all four groups.
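      The pipeline groups stride through the ranks with step world_size // pipeline_model_parallel_size. A small sketch of the enumeration (pure Python, sizes as in this example):

      world_size = 16
      pipeline_model_parallel_size = 4
      num_pipeline_model_parallel_groups = world_size // pipeline_model_parallel_size  # 4

      pipeline_groups = [
          list(range(i, world_size, num_pipeline_model_parallel_groups))
          for i in range(num_pipeline_model_parallel_groups)
      ]
      print(pipeline_groups)
      # [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]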

       

      data_parallel groups: the data-parallel groups. Data parallelism happens between sub-blocks of the two model replicas that hold the same parameters. From the figure of the model layout on the two servers, ranks 0 and 2 hold the same sub-block, as do ranks 1 and 3, ranks 4 and 6, and so on. This corresponds to the following in the code example:

      8 data_parallel groups:
          [g0, g2], [g1, g3], [g4, g6], [g5, g7], [g8, g10], [g9, g11], [g12, g14], [g13, g15]
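      An equivalent way to see this grouping: two ranks are data-parallel peers exactly when they sit in the same pipeline stage and at the same position inside their tensor-parallel group. A sketch of that criterion (pure Python, sizes as in this example):

      from collections import defaultdict

      world_size = 16
      tensor_model_parallel_size = 2
      pipeline_model_parallel_size = 4
      num_pipeline_model_parallel_groups = world_size // pipeline_model_parallel_size  # 4

      # Group ranks by (pipeline stage, position within the tensor-parallel group);
      # ranks that share both indices hold identical parameters.
      peers = defaultdict(list)
      for rank in range(world_size):
          pipeline_stage = rank // num_pipeline_model_parallel_groups
          tensor_index = rank % tensor_model_parallel_size
          peers[(pipeline_stage, tensor_index)].append(rank)
      print(list(peers.values()))
      # [[0, 2], [1, 3], [4, 6], [5, 7], [8, 10], [9, 11], [12, 14], [13, 15]]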

      Code implementation:

      initialize_model_parallel(
          tensor_model_parallel_size: int = 1,
          pipeline_model_parallel_size: int = 1,
          virtual_pipeline_model_parallel_size: Optional[int] = None,
          pipeline_model_parallel_split_rank: Optional[int] = None,
          use_fp8: bool = False,
      )
      # Values used in this example (2 servers x 8 GPUs, 12 transformer layers):
      tensor_model_parallel_size = 2
      pipeline_model_parallel_size = 4
      world_size = 16
      data_parallel_size: int = world_size // (tensor_model_parallel_size * pipeline_model_parallel_size)  # = 2
      num_tensor_model_parallel_groups: int = world_size // tensor_model_parallel_size  # = 8
      num_pipeline_model_parallel_groups: int = world_size // pipeline_model_parallel_size  # = 4
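      For this example, the call into initialize_model_parallel would therefore look roughly as follows (a sketch; the remaining arguments keep their defaults):

      initialize_model_parallel(
          tensor_model_parallel_size=2,
          pipeline_model_parallel_size=4,
      )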
      
      # Build the data-parallel groups.
      # (rank is the current process's global rank; the _DATA_PARALLEL_* names
      # are module-level globals, like the other _*_GROUP variables below.)
      all_data_parallel_group_ranks = []
      for i in range(pipeline_model_parallel_size):
          start_rank = i * num_pipeline_model_parallel_groups
          end_rank = (i + 1) * num_pipeline_model_parallel_groups
          for j in range(tensor_model_parallel_size):
              ranks = range(start_rank + j, end_rank, tensor_model_parallel_size)
              all_data_parallel_group_ranks.append(list(ranks))
              group = torch.distributed.new_group(ranks)
              group_gloo = torch.distributed.new_group(ranks, backend="gloo")
              if rank in ranks:
                  _DATA_PARALLEL_GROUP = group
                  _DATA_PARALLEL_GROUP_GLOO = group_gloo
                  _DATA_PARALLEL_GLOBAL_RANKS = ranks
      print(all_data_parallel_group_ranks)
      
      Output (all_data_parallel_group_ranks):
      [[0, 2], [1, 3], [4, 6], [5, 7], [8, 10], [9, 11], [12, 14], [13, 15]]
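      For context, the data-parallel group built here is what gradient synchronization later runs over: every parameter gradient is summed across the replicas that hold the same weights and divided by the group size. The helper below is a hypothetical illustration of that collective, not Megatron's actual DDP code:

      def allreduce_data_parallel_grads(model, data_parallel_group, data_parallel_size):
          # Hypothetical sketch: average each gradient across the ranks of the
          # data-parallel group constructed above (e.g. ranks 0 and 2).
          for param in model.parameters():
              if param.grad is not None:
                  torch.distributed.all_reduce(param.grad, group=data_parallel_group)
                  param.grad /= data_parallel_size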
      # Build the model-parallel groups.
      # (i.e. which GPUs one complete model replica occupies)
      global _MODEL_PARALLEL_GROUP
      assert _MODEL_PARALLEL_GROUP is None, 'model parallel group is already initialized'
      for i in range(data_parallel_size):
          ranks = [data_parallel_group_ranks[i] for data_parallel_group_ranks in all_data_parallel_group_ranks]
          group = torch.distributed.new_group(ranks)
          print(ranks)
          if rank in ranks:
              _MODEL_PARALLEL_GROUP = group
      
      Output (ranks; one model-parallel group per model replica):
      [0, 1, 4, 5, 8, 9, 12, 13]
      [2, 3, 6, 7, 10, 11, 14, 15]
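      Note that the model-parallel groups are simply the "transpose" of the data-parallel grouping: picking the i-th member of every data-parallel group collects all GPUs of model replica i, which is exactly what the list comprehension above does. A quick sketch of that check:

      all_data_parallel_group_ranks = [
          [0, 2], [1, 3], [4, 6], [5, 7], [8, 10], [9, 11], [12, 14], [13, 15]
      ]
      # Column i across all data-parallel groups = all GPUs of model replica i.
      model_parallel_groups = [list(col) for col in zip(*all_data_parallel_group_ranks)]
      print(model_parallel_groups)
      # [[0, 1, 4, 5, 8, 9, 12, 13], [2, 3, 6, 7, 10, 11, 14, 15]]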
      # Build the tensor model-parallel groups.
      # (each group of tensor_model_parallel_size consecutive ranks shards the same layers)
      global _TENSOR_MODEL_PARALLEL_GROUP
      assert _TENSOR_MODEL_PARALLEL_GROUP is None, 'tensor model parallel group is already initialized'
      for i in range(num_tensor_model_parallel_groups):
          ranks = range(i * tensor_model_parallel_size, (i + 1) * tensor_model_parallel_size)
          group = torch.distributed.new_group(ranks)
          print(list(ranks))
          if rank in ranks:
              _TENSOR_MODEL_PARALLEL_GROUP = group
      
      Output:
      [0, 1]
      [2, 3]
      [4, 5]
      [6, 7]
      [8, 9]
      [10, 11]
      [12, 13]
      [14, 15]
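      For context on why this group exists: in tensor parallelism each of the two ranks holds a shard of a layer's weights and produces a partial result, and the partial results are summed across the tensor-parallel group. A minimal sketch of that collective (illustrative only, not Megatron's actual layer implementation):

      def row_parallel_forward(partial_output, tensor_model_parallel_group):
          # Each rank in the 2-way group computed a partial matmul with its half
          # of the weight matrix; summing across the group yields the full layer
          # output on every rank of the group (e.g. ranks 0 and 1).
          torch.distributed.all_reduce(partial_output, group=tensor_model_parallel_group)
          return partial_output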
      # Build the pipeline model-parallel groups and embedding groups
      # (the embedding groups tie the first and last pipeline stages together so that embedding gradients can be synchronized)
      for i in range(num_pipeline_model_parallel_groups):
          ranks = range(i, world_size, num_pipeline_model_parallel_groups)
          print(list(ranks))
          group = torch.distributed.new_group(ranks)
          if rank in ranks:
              _PIPELINE_MODEL_PARALLEL_GROUP = group
              _PIPELINE_GLOBAL_RANKS = ranks
          # Setup embedding group (to exchange gradients between
          # first and last stages).
          if len(ranks) > 1:
              embedding_ranks = [ranks[0], ranks[-1]]
              position_embedding_ranks = [ranks[0]]
              print(embedding_ranks)
              print(position_embedding_ranks)
              if pipeline_model_parallel_split_rank is not None:
                  if ranks[pipeline_model_parallel_split_rank] not in embedding_ranks:
                      embedding_ranks = [ranks[0], ranks[pipeline_model_parallel_split_rank], ranks[-1]]
                  if ranks[pipeline_model_parallel_split_rank] not in position_embedding_ranks:
                      position_embedding_ranks = [ranks[0], ranks[pipeline_model_parallel_split_rank]]
          else:
              embedding_ranks = ranks
              position_embedding_ranks = ranks
      
          group = torch.distributed.new_group(embedding_ranks)
          if rank in embedding_ranks:
              _EMBEDDING_GROUP = group
          if rank in ranks:
              _EMBEDDING_GLOBAL_RANKS = embedding_ranks
      
          group = torch.distributed.new_group(position_embedding_ranks)
          if rank in position_embedding_ranks:
              _POSITION_EMBEDDING_GROUP = group
          if rank in ranks:
              _POSITION_EMBEDDING_GLOBAL_RANKS = position_embedding_ranks
      Output (for each pipeline group: the group's ranks, then embedding_ranks, then position_embedding_ranks):
      [0, 4, 8, 12]
      [0, 12]
      [0]
      [1, 5, 9, 13]
      [1, 13]
      [1]
      [2, 6, 10, 14]
      [2, 14]
      [2]
      [3, 7, 11, 15]
      [3, 15]
      [3]
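      Putting it all together, a rank's position along the three parallel dimensions can be read off directly from the group layout above. The formulas below are derived from this particular 16-GPU example (a sketch, not a Megatron API):

      world_size = 16
      tensor_model_parallel_size = 2
      pipeline_model_parallel_size = 4
      num_pipeline_model_parallel_groups = world_size // pipeline_model_parallel_size  # 4

      def parallel_coordinates(rank):
          # Position inside the tensor-parallel group (upper or lower half of the layers).
          tp_rank = rank % tensor_model_parallel_size
          # Pipeline stage (which block of 3 transformer layers).
          pp_rank = rank // num_pipeline_model_parallel_groups
          # Position inside the data-parallel group (which model replica).
          dp_rank = (rank % num_pipeline_model_parallel_groups) // tensor_model_parallel_size
          return tp_rank, pp_rank, dp_rank

      print(parallel_coordinates(0))    # (0, 0, 0)  -> tensor group [0, 1], pipeline group [0, 4, 8, 12], data group [0, 2]
      print(parallel_coordinates(13))   # (1, 3, 0)  -> tensor group [12, 13], pipeline group [1, 5, 9, 13], data group [13, 15]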

      Reference:

      https://zhuanlan.zhihu.com/p/470279673

       
