Request: libai MAE support for graph-mode data parallelism, pipeline parallelism, and model parallelism #259

KellyZhang2020 · 2022-04-12T00:44:47Z

No description provided.

rentainhe · 2022-04-15T06:44:01Z

OK, we will work through this step by step.

rentainhe · 2022-04-15T06:45:21Z

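Summary of interfaces missing when migrating MAE-pytorch to MAE-oneflow (kept in sync with the operator compatibility plan)

torch.cuda.synchronize
torch.cuda.max_memory_allocated
torch.nn.parallel.DistributedDataParallel(): arguments are not aligned
tensor.median() method is missing
oneflow.nn.utils.clip_grad_norm_ does not support being passed None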

BBuf · 2022-04-18T02:38:55Z

Could you write this up in a bit more detail? For example, paste an example of a mismatch or an error.

@rentainhe

rentainhe · 2022-04-18T02:42:27Z

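Could you write this up in a bit more detail? For example, paste an example of a mismatch or an error.

@rentainhe

OK, I will work with the user to put this together.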

rentainhe · 2022-04-21T02:11:07Z

Minimal reproduction examples

tensor.median()

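# PyTorch: prints the median over all elements
import torch
x = torch.randn(1, 2, 4)
print(x.median())

# OneFlow: the same call is reported as failing because Tensor.median() is missing
import oneflow as flow
y = flow.randn(1, 2, 4)
print(y.median())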
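Until a native `median()` is available, the value the PyTorch snippet prints can be reproduced by sorting. Below is a minimal pure-Python sketch of the semantics only, assuming the no-argument form (over all elements, `torch.median` returns the lower of the two middle values when the element count is even). `lower_median` is a hypothetical helper name, not an API of either library.

```python
def lower_median(values):
    """Median over all elements, returning the lower middle value for even counts."""
    flat = sorted(values)
    if not flat:
        raise ValueError("median of an empty sequence is undefined")
    # Index (n - 1) // 2 picks the middle element for odd n and the
    # lower of the two middle elements for even n.
    return flat[(len(flat) - 1) // 2]


print(lower_median([3.0, 1.0, 4.0, 2.0]))  # 2.0 (lower of 2.0 and 3.0)
print(lower_median([5.0, 1.0, 3.0]))       # 3.0
```

On a tensor, the same idea would be to flatten, sort, and index at `(n - 1) // 2`.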

torch.cuda.synchronize
torch.cuda.max_memory_allocated

These two appear to have no corresponding interface in OneFlow.
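While these interfaces are missing, ported training code could guard the calls rather than fail with `AttributeError`. Below is a minimal sketch, assuming a fallback value is acceptable on logging-only paths. `call_if_available` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for a backend's `cuda` module so the snippet runs without either framework installed.

```python
from types import SimpleNamespace


def call_if_available(namespace, name, *args, default=None, **kwargs):
    """Call namespace.<name>(*args, **kwargs) if it exists, else return default."""
    func = getattr(namespace, name, None)
    if callable(func):
        return func(*args, **kwargs)
    return default


# Stand-in for a backend that provides both functions (hypothetical values).
full_backend = SimpleNamespace(
    synchronize=lambda: None,
    max_memory_allocated=lambda: 1024,
)
# Stand-in for a backend that provides neither function.
partial_backend = SimpleNamespace()

print(call_if_available(full_backend, "max_memory_allocated", default=0))     # 1024
print(call_if_available(partial_backend, "max_memory_allocated", default=0))  # 0
```

Skipping `synchronize` makes wall-clock timings around GPU work unreliable, so a fallback like this only suits logging and should not be used where results depend on the synchronization.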

torch.nn.parallel.DistributedDataParallel() arguments are not aligned

import torch
torch.nn.parallel.DistributedDataParallel()
"""
Args:
    module,
    device_ids=None,
    output_device=None,
    dim=0,
    broadcast_buffers=True,
    process_group=None,
    bucket_cap_mb=25,
    find_unused_parameters=False,
    check_reduction=False,
    gradient_as_bucket_view=False,
    static_graph=False,
"""

import oneflow.nn.parallel as parallel
parallel.DistributedDataParallel()
"""
Args:
    module: "flow.nn.Module",
    broadcast_buffers: bool = True,
    bucket_size: int = 10,
"""
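To make the mismatch explicit, the two argument lists above can be diffed with `inspect.signature`. This is a minimal sketch: `torch_ddp` and `oneflow_ddp` are stand-in functions that copy the parameter lists quoted in this comment, not the real classes, and `param_diff` is a hypothetical helper.

```python
import inspect


# Stand-ins copying the two signatures listed above; not the real classes.
def torch_ddp(module, device_ids=None, output_device=None, dim=0,
              broadcast_buffers=True, process_group=None, bucket_cap_mb=25,
              find_unused_parameters=False, check_reduction=False,
              gradient_as_bucket_view=False, static_graph=False):
    pass


def oneflow_ddp(module, broadcast_buffers=True, bucket_size=10):
    pass


def param_diff(reference, candidate):
    """Return (names only in reference, names only in candidate), in declaration order."""
    ref = list(inspect.signature(reference).parameters)
    cand = list(inspect.signature(candidate).parameters)
    return ([p for p in ref if p not in cand],
            [p for p in cand if p not in ref])


only_torch, only_oneflow = param_diff(torch_ddp, oneflow_ddp)
print(only_torch)
print(only_oneflow)  # ['bucket_size']
```

With both frameworks installed, the same helper could in principle be pointed at the real `DistributedDataParallel` classes to track alignment across releases.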

BBuf mentioned this issue Apr 22, 2022

OneFlow operator alignment with PyTorch: completeness plan progress tracker Oneflow-Inc/oneflow#4936

Closed
