机器学习分布式框架horovod安装 (Linux环境)
2021/12/19 7:20:30
本文主要是介绍机器学习分布式框架horovod安装 (Linux环境),对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!
1、openmi 下载安装
下载连接:
https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.1.tar.gz
安装命令
1 2 3 4 5 | shell$ gunzip -c openmpi-4.0.1.tar.gz | tar xf - shell$ cd openmpi-4.0.1 shell$ ./configure --prefix=/usr/local <...lots of output...> shell$ make all install |
sudo ldconfig
2、horovod安装
官方文档: https://github.com/horovod/horovod#install
[sudo] pip3 install horovod
安装支持NCCL的版本的horovod
HOROVOD_GPU_ALLREDUCE=NCCL pip3 install --no-cache-dir horovod
3、horovod 使用
3.1 tensorFLow 修改
import tensorflow as tf import horovod.tensorflow as hvd # Initialize Horovod hvd.init() # Pin GPU to be used to process local rank (one GPU per process) config = tf.ConfigProto() config.gpu_options.visible_device_list = str(hvd.local_rank()) # Build model... loss = ... opt = tf.train.AdagradOptimizer(0.01 * hvd.size()) # Add Horovod Distributed Optimizer opt = hvd.DistributedOptimizer(opt) # Add hook to broadcast variables from rank 0 to all other processes during # initialization. hooks = [hvd.BroadcastGlobalVariablesHook(0)] # Make training operation train_op = opt.minimize(loss) # Save checkpoints only on worker 0 to prevent other workers from corrupting them. checkpoint_dir = '/tmp/train_logs' if hvd.rank() == 0 else None # The MonitoredTrainingSession takes care of session initialization, # restoring from a checkpoint, saving to a checkpoint, and closing when done # or an error occurs. with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir, config=config, hooks=hooks) as mon_sess: while not mon_sess.should_stop(): # Perform synchronous training. mon_sess.run(train_op)
3.2 tensorflow 运行
mpi 指定mca通讯端口
mpirun --allow-run-as-root --oversubscribe \ -np 8-H ubuntu1:4,ubuntu2:4 \ -bind-to none -map-by slot \ -mca plm_rsh_args "-p 22" \ -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \ -mca pml ob1 -mca btl ^openib \ python3 -u train.py
这篇关于机器学习分布式框架horovod安装 (Linux环境)的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!
- 2024-11-12如何创建可引导的 ESXi USB 安装介质 (macOS, Linux, Windows)
- 2024-11-08linux的 vi编辑器中搜索关键字有哪些常用的命令和技巧?-icode9专业技术文章分享
- 2024-11-08在 Linux 的 vi 或 vim 编辑器中什么命令可以直接跳到文件的结尾?-icode9专业技术文章分享
- 2024-10-22原生鸿蒙操作系统HarmonyOS NEXT(HarmonyOS 5)正式发布
- 2024-10-18操作系统入门教程:新手必看的基本操作指南
- 2024-10-18初学者必看:操作系统入门全攻略
- 2024-10-17操作系统入门教程:轻松掌握操作系统基础知识
- 2024-09-11Linux部署Scrapy学习:入门级指南
- 2024-09-11Linux部署Scrapy:入门级指南
- 2024-08-21【Linux】分区向左扩容的方法