python & Machine Learning - cspp.cc.u-tokyo.ac.jp

2021/1/5
4. 10/20
5. 10/27 (2)
10. 12/8 (1)
11. 12/15 (2)
12. 12/22 RB-H GPU (1)
13. 1/5 Python

Python
• (C, CUDA)
• Python
Python
• Python packages are distributed through The Python Package Index (PyPI): https://pypi.org
• Installed with pip
• https://www.python.jp/install/docs/install_plan.html
1. Modules
• module avail
• Deep learning:
• PyTorch: module load pytorch/1.4.0
• TensorFlow: module load anaconda3/2020.07
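After `module load`, it can help to confirm which package versions the active Python interpreter actually sees. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later, which matches the py38 builds listed for Anaconda); the distribution names in the loop are only examples and may not match what a given module provides, and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Example distribution names on PyPI ("torch" is PyTorch); adjust as needed.
    for name in ("numpy", "scipy", "torch", "tensorflow", "cupy-cuda102"):
        print(f"{name:15s} {installed_version(name)}")
```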
Python on Reedbush (RB) (2)
2. Anaconda
4. Containers (Singularity)
2020/12/22 9
Python
• Check the results: $ cat mat-mat-py.bash.oXXXXX

Matrix-matrix product in Python, naive
import numpy as np  # NumPy
import time         # timing
nn = 100
a = np.ndarray([nn, nn], dtype='float64')  # allocate nn x nn arrays
b = np.ndarray([nn, nn], dtype='float64')
c = np.ndarray([nn, nn], dtype='float64')
a[:, :] = 1.0; b[:, :] = 1.0; c[:, :] = 0.0

start = time.perf_counter()  # start timing
for i in range(nn):
    for j in range(nn):
        for k in range(nn):
            c[i, j] += a[i, k] * b[k, j]
end = time.perf_counter()    # stop timing
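The slide's code stops after reading the end time. The results later in this section report a time in seconds and a rate in MFLOPS, and the recorded pairs are consistent with counting 2N³ floating-point operations (one multiply and one add per innermost iteration), i.e. MFLOPS = 2N³ / t / 10⁶. A sketch of that reporting step, with the loop wrapped in a function; `naive_matmul` and `mflops` are illustrative names, not from the original.

```python
import time

import numpy as np


def mflops(n, seconds):
    """Rate for an n x n times n x n product: n**3 multiply-adds = 2*n**3 flops."""
    return 2.0 * n**3 / seconds / 1.0e6


def naive_matmul(a, b):
    """Triple-loop product with the same (i, j, k) loop order as the slide."""
    n = a.shape[0]
    c = np.zeros((n, n), dtype='float64')
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i, j] += a[i, k] * b[k, j]
    return c


if __name__ == "__main__":
    nn = 100
    a = np.ones((nn, nn))
    b = np.ones((nn, nn))
    start = time.perf_counter()
    c = naive_matmul(a, b)
    end = time.perf_counter()
    t = end - start
    print(f"naive: N = {nn} Mat-Mat time = {t} [sec.]")
    print(f"{mflops(nn, t)} [MFLOPS]")
```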
NumPy
• https://numpy.org
• BLAS
• SciPy: https://scipy.org
• Anaconda:
$ conda list |& grep -e numpy -e scipy
numpy       1.19.2  py38h54aff64_0
numpy-base  1.19.2  py38hfa32c7d_0
scipy       1.5.2   py38h0b6359f_0
• PyPI:
$ pip list |& grep -e numpy -e scipy
numpy  1.19.2
scipy  1.5.2
CuPy
• Chainer
• Anaconda (installed with pip):
$ conda list |& grep cupy
cupy-cuda102  8.3.0  pypi_0  pypi
• PyPI:
$ pip list |& grep cupy
cupy-cuda102  8.3.0
c = np.dot(a,b)
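For 2-D float64 arrays, `np.dot`, `np.matmul` and the `@` operator compute the same product and typically hand it to the BLAS library NumPy was built with. A sketch that checks the three agree and times each one, keeping the best of several runs to reduce noise; the helper `time_product` and the random inputs are assumptions for illustration.

```python
import time

import numpy as np


def time_product(fn, a, b, repeat=5):
    """Call fn(a, b) `repeat` times; return (best wall time in seconds, last result)."""
    best = float("inf")
    c = None
    for _ in range(repeat):
        start = time.perf_counter()
        c = fn(a, b)
        best = min(best, time.perf_counter() - start)
    return best, c


if __name__ == "__main__":
    nn = 100
    rng = np.random.default_rng(0)
    a = rng.random((nn, nn))
    b = rng.random((nn, nn))
    variants = {
        "np.dot(a,b)": np.dot,
        "np.matmul(a,b)": np.matmul,
        "a@b": lambda x, y: x @ y,
    }
    reference = np.dot(a, b)
    for label, fn in variants.items():
        t, c = time_product(fn, a, b)
        assert np.allclose(c, reference)  # all three give the same product
        print(f"{label}: N = {nn} Mat-Mat time = {t} [sec.]")
        print(f"{2.0 * nn**3 / t / 1e6} [MFLOPS]")
```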
import numpy as np  # NumPy
import cupy as cp   # CuPy
import time         # timing
nn = 100
a = np.ndarray([nn, nn], dtype='float64')  # host (CPU) arrays
b = np.ndarray([nn, nn], dtype='float64')
c = np.ndarray([nn, nn], dtype='float64')
a[:, :] = 1.0; b[:, :] = 1.0; c[:, :] = 0.0

a_gpu = cp.asarray(a)  # copy a to the GPU => a_gpu
b_gpu = cp.asarray(b)
c_gpu = cp.asarray(c)
Matrix-matrix product in Python, CuPy (2)
• NumPy → CuPy
• np.dot => cp.dot
c_gpu = cp.dot(a_gpu, b_gpu)
cp.cuda.Stream.null.synchronize()  # wait for the GPU to finish

c_gpu = cp.matmul(a_gpu, b_gpu)
cp.cuda.Stream.null.synchronize()

c_gpu = a_gpu @ b_gpu
cp.cuda.Stream.null.synchronize()
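CuPy launches GPU work asynchronously, so a timer read right after `cp.dot` would mostly measure the kernel launch; this is why the slide calls `cp.cuda.Stream.null.synchronize()` after each product. A sketch of a timing helper built on that idea. It also runs one untimed warm-up product (the first call can include one-time setup) and falls back to NumPy when CuPy is not installed, so it also runs on a CPU-only machine. `timed_matmul` and `_sync` are hypothetical helper names.

```python
import time

import numpy as np

try:
    import cupy as cp  # GPU path, used only if CuPy is installed
except ImportError:
    cp = None


def _sync():
    """Wait for queued GPU work on the default (null) stream; no-op without CuPy."""
    if cp is not None:
        cp.cuda.Stream.null.synchronize()


def timed_matmul(a_host, b_host):
    """Copy inputs to the device, time one product, and return (seconds, host result)."""
    xp = cp if cp is not None else np
    a = xp.asarray(a_host)  # host -> GPU copy when xp is CuPy
    b = xp.asarray(b_host)
    _ = a @ b               # untimed warm-up
    _sync()
    start = time.perf_counter()
    c = a @ b
    _sync()                 # without this, only the asynchronous launch would be timed
    elapsed = time.perf_counter() - start
    c_host = cp.asnumpy(c) if cp is not None else c  # GPU -> host copy
    return elapsed, c_host


if __name__ == "__main__":
    nn = 100
    t, c = timed_matmul(np.ones((nn, nn)), np.ones((nn, nn)))
    backend = "CuPy" if cp is not None else "NumPy fallback"
    print(f"a@b: N = {nn} Mat-Mat time = {t} [sec.] ({backend})")
```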
Python job script
#!/bin/bash
#PBS -q h-lecture
#PBS -Wgroup_list=gt59
#PBS -l select=1:mpiprocs=2
#PBS -l walltime=00:05:00
#PBS -j oe

cd $PBS_O_WORKDIR
. /etc/profile.d/modules.sh

python mat-mat.py
python mat-mat-gpu.py
h-lecture: queue name
gt59: group name
136.2896652473447 [MFLOPS]
np.matmul(a,b): N = 100 Mat-Mat time = 0.00039076502434909344 [sec.]
5118.165330511469 [MFLOPS]
a@b: N = 100 Mat-Mat time = 5.2541960030794144e-05 [sec.]
38064.81522249696 [MFLOPS]
0.07115527287393623 [MFLOPS]
7.027896937150386 [MFLOPS]
24868.80690194262 [MFLOPS]
5289.997266916574 [MFLOPS]

MPI for Python: mpi4py
• Use MPI from Python
• https://mpi4py.readthedocs.io
• MPI: openmpi/4.0.5/gnu
import numpy as np
from mpi4py import MPI
comm = MPI.COMM_WORLD       # MPI_COMM_WORLD
myid = comm.Get_rank()      # rank of this process
numprocs = comm.Get_size()  # number of processes

local_sum = np.ndarray(1, dtype='float64')  # this process's partial result
total = np.ndarray(1, dtype='float64')      # global result, valid on rank 0
comm.Reduce(local_sum, total, op=MPI.SUM, root=0)  # MPI_Reduce with MPI_SUM
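The contents of `pi.py` are not shown. A hypothetical sketch of what such a program might look like, assuming it computes π by the midpoint rule for ∫₀¹ 4/(1+x²) dx with `--num` intervals: each rank sums every `numprocs`-th interval and `comm.Reduce` adds the partial sums on rank 0, following the fragment above. If mpi4py cannot be imported, the sketch falls back to a single process so it can be tried without MPI. `partial_pi` is an illustrative name.

```python
import argparse
import math

import numpy as np

try:
    from mpi4py import MPI
except ImportError:  # allow a serial run without mpi4py
    MPI = None


def partial_pi(num, rank, size):
    """Midpoint-rule contribution of intervals rank, rank+size, ... to the integral of 4/(1+x^2) on [0, 1]."""
    h = 1.0 / num
    x = (np.arange(rank, num, size, dtype="float64") + 0.5) * h
    return h * float(np.sum(4.0 / (1.0 + x * x)))


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--num", type=int, default=1_000_000, help="number of intervals")
    args = parser.parse_args()

    if MPI is not None:
        comm = MPI.COMM_WORLD
        myid, numprocs = comm.Get_rank(), comm.Get_size()
    else:
        comm, myid, numprocs = None, 0, 1

    local_sum = np.array([partial_pi(args.num, myid, numprocs)], dtype="float64")
    total = np.zeros(1, dtype="float64")
    if comm is not None:
        comm.Reduce(local_sum, total, op=MPI.SUM, root=0)  # sum onto rank 0
    else:
        total[:] = local_sum

    if myid == 0:
        print(f"pi = {total[0]:.15f}, error = {abs(total[0] - math.pi):.3e}")


if __name__ == "__main__":
    main()
```

Launched as in the job script that follows (`mpirun python pi.py --num 10000000`), each of the 72 ranks would handle roughly 139,000 intervals.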
• $ cat pi-py.bash.oXXXXX
Python job script
#!/bin/bash
#PBS -q h-lecture
#PBS -Wgroup_list=gt59
#PBS -l select=2:mpiprocs=36
#PBS -l walltime=00:05:00
#PBS -j oe

cd $PBS_O_WORKDIR
. /etc/profile.d/modules.sh

mpirun python pi.py --num 10000000
h-lecture: queue name
gt59: group name
• TensorFlow
• Developed by Google
• Version 2.4.0
• (TensorFlow)

• $ cat pytorch-mnist.log.XXXX.reedbush-pbsadmin0
MNIST by PyTorch
• https://github.com/pytorch/examples/blob/master/mnist/main.py
• Epoch: training epoch
• [n/60000 (p%)]: n of the 60,000 training samples processed (p%)
• Loss: training loss
• Accuracy: test-set accuracy
• elapse: elapsed time for 1 epoch
Train Epoch: 1 [0/60000 (0%)] Loss: 2.351223
Train Epoch: 1 [640/60000 (1%)] Loss: 1.251258
…
Train Epoch: 14 [59520/60000 (99%)] Loss: 0.000265
Test set: Average loss: 0.0268, Accuracy: 9918/10000 (99%)
elapse: 16.79 sec
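The [n/60000 (p%)] field can be reproduced from the batch counter. A sketch modeled on the logging statement in the linked `main.py` (check the file itself for the exact code). It assumes that example's default batch size of 64, which is consistent with the 640 reported after 10 batches in the log. `train_log_line` is a hypothetical helper.

```python
import math


def train_log_line(epoch, batch_idx, batch_size, dataset_size, loss):
    """Format a training progress line in the style of the PyTorch MNIST example."""
    num_batches = math.ceil(dataset_size / batch_size)  # what len(train_loader) would be
    return "Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
        epoch,
        batch_idx * batch_size,           # n: samples processed before this batch
        dataset_size,                     # 60000 MNIST training images
        100.0 * batch_idx / num_batches,  # p: percentage of batches processed
        loss,
    )


if __name__ == "__main__":
    print(train_log_line(1, 10, 64, 60000, 1.251258))
```

With 938 batches per epoch, batch 930 gives 59520/60000 and 99%, matching the last training line in the log.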
$ singularity exec /lustre/gt59/share/tf-image.file python -c "import tensorflow as tf; print(tf.__version__, tf.keras.__version__)"
2.4.0 2.4.0
• Submit the job: $ qsub keras-tf-mnist.bash
• $ cat keras-tf-mnist.log.XXXX.reedbush-pbsadmin0
MNIST by Keras+TensorFlow
• https://www.tensorflow.org/tutorials/images/cnn?hl=ja
• PyTorch
…
Successfully opened dynamic library libcudart.so.11.0
Epoch 1/5
1875/1875 [========…=] - 9s 4ms/step - loss: 0.3237 - accuracy: 0.8979
…
Epoch 5/5
1875/1875 [========…=] - 8s 4ms/step - loss: 0.0189 - accuracy: 0.9938
313/313 - 1s - loss: 0.0233 - accuracy: 0.9927
0.9926999807357788
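The step counters in this log follow from the dataset sizes. Assuming the tutorial calls `fit` and `evaluate` without `batch_size`, Keras uses its default of 32, so one epoch over the 60,000 MNIST training images is ⌈60000/32⌉ = 1875 steps and the 10,000-image evaluation is ⌈10000/32⌉ = 313 steps. A small check; `steps_per_epoch` is an illustrative name.

```python
import math


def steps_per_epoch(num_samples, batch_size=32):
    """Batches per pass over the data; a final partial batch still counts as one step."""
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    print(steps_per_epoch(60000), steps_per_epoch(10000))
```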
$ singularity exec container.img python mnist.py
$ singularity run container.img or $ ./container.img
shub://
Singularity recipe

• PyTorch
• Submit the job: $ qsub pytorch-horovod-mnist.bash
• $ cat pytorch-horovod-mnist.log.XXXX.reedbush-pbsadmin0
MNIST (PyTorch + Horovod)
• 2 nodes (4 GPUs): 2.7× speedup (16.79 sec → 6.16 sec)
Train Epoch: 1 [0/15000 (0%)] Loss: 2.346126
Train Epoch: 1 [0/15000 (0%)] Loss: 2.305933
Train Epoch: 1 [0/15000 (0%)] Loss: 2.285888
Train Epoch: 1 [0/15000 (0%)] Loss: 2.354724
…
Train Epoch: 10 [14720/15000 (98%)] Loss: 0.245247
Train Epoch: 10 [14720/15000 (98%)] Loss: 0.101313
Train Epoch: 10 [14720/15000 (98%)] Loss: 0.139264
Train Epoch: 10 [14720/15000 (98%)] Loss: 0.191978
Test set: Average loss: 0.0540, Accuracy: 98.28%
elapse: 6.16 sec
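Each worker's log counts up to 15000 rather than 60000, which is consistent with the 60,000 training images being split over 4 workers. The elapsed time drops from the 16.79 sec of the single PyTorch run to 6.16 sec, a ratio of about 2.7. A sketch of both calculations. The round-robin split mirrors how `torch.utils.data.distributed.DistributedSampler` assigns indices when shuffling is off, but `shard` and `speedup` are hypothetical helpers.

```python
def shard(num_samples, num_workers, rank):
    """Indices handled by one worker under a round-robin split (no shuffling, no padding)."""
    return range(rank, num_samples, num_workers)


def speedup(t_before, t_after):
    """How many times faster the second run is."""
    return t_before / t_after


if __name__ == "__main__":
    print([len(shard(60000, 4, r)) for r in range(4)])  # samples per worker
    print(f"speedup = {speedup(16.79, 6.16):.2f}")
```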
GPU
• MPI processes and GPUs
• MPI process 0: CPU #0 ⇔ GPU #0
• MPI process 1: CPU #1 ⇔ GPU #1
• GPU
NUMA
Reedbush
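One common way to give each MPI process its own GPU is to derive the device number from the process's rank within its node. A sketch, assuming Open MPI (which sets the `OMPI_COMM_WORLD_LOCAL_RANK` environment variable for each launched process) and 2 GPUs per node, as in the process 0 ⇔ GPU #0, process 1 ⇔ GPU #1 layout. `select_gpu` is a hypothetical helper. Binding each process to the CPU socket near its GPU (the NUMA point) is normally handled by `mpirun` options and is not shown here.

```python
import os


def select_gpu(gpus_per_node, env=None):
    """Map the node-local MPI rank to a GPU index (round-robin if there are more ranks than GPUs)."""
    env = os.environ if env is None else env
    # Falls back to 0 when the process was not launched by Open MPI.
    local_rank = int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    return local_rank % gpus_per_node


if __name__ == "__main__":
    gpu = select_gpu(2)
    print("using GPU", gpu)
    # With CuPy, for example: cupy.cuda.Device(gpu).use()
```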
ChainerMN
• Distributed (multi-node, multi-GPU) training for Chainer
• Uses Chainer, MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)
• Gradients are combined across GPUs with Allreduce
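In data-parallel training each GPU computes gradients on its own mini-batch, and an Allreduce leaves every worker with the same combined gradient before the optimizer step. A single-process NumPy sketch of that result, averaging by summing and then dividing by the number of workers. With mpi4py, the summation itself would be `comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)`, and ChainerMN performs this step inside its multi-node optimizer. `allreduce_mean` is a hypothetical helper, not ChainerMN's API.

```python
import numpy as np


def allreduce_mean(local_grads):
    """Simulate Allreduce(SUM) / num_workers: every worker receives the same averaged gradient."""
    total = np.sum(np.stack(local_grads), axis=0)  # element-wise sum over workers
    mean = total / len(local_grads)
    return [mean.copy() for _ in local_grads]      # one copy per worker


if __name__ == "__main__":
    grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 0.0])]
    for rank, g in enumerate(allreduce_mean(grads)):
        print(rank, g)
```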
ImageNet training on Reedbush-H
• ResNet-50
• 100
• 64, 128, 240 GPUs (RB-H)
• ChainerMN 1.0.0
• OpenMPI 2.1.1, NCCL v2
• Chainer 3.1.0
• python 3.6.1, CUDA 8, cuDNN 7, cupy 2.1.0
[Figure: results with 32 GPUs and 64 GPUs]
2. [L40] Reedbush