Thrust & Curand

Dr. Bo Yuan

E-mail: yuanb@sz.tsinghua.edu.cn

Outline

• Thrust Basics

• Working with Thrust

• General Types in Thrust

• Fancy Iterators in Thrust

• cuRAND Library

Thrust Basics

• What is Thrust?

• Containers

• Iterators

• Interaction with raw pointers

• Namespace

What is Thrust?

• C++ template library for CUDA
  – Mimics the Standard Template Library (STL)
• Containers
  – thrust::host_vector<T>
  – thrust::device_vector<T>
• Algorithms
  – thrust::sort()
  – thrust::reduce()
  – thrust::inclusive_scan()
  – …

Containers

• Make common operations concise and readable
  – Hides cudaMalloc, cudaMemcpy and cudaFree

// allocate host vector with two elements
thrust::host_vector<int> h_vec(2);

// copy host vector to device
thrust::device_vector<int> d_vec = h_vec;

// manipulate device values from the host
d_vec[0] = 13;
d_vec[1] = 27;

std::cout << "sum: " << d_vec[0] + d_vec[1] << std::endl;

// vector memory automatically released w/ free() or cudaFree()
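The container idea above can be sketched as a full host-to-device-to-host round trip. This is a minimal sketch rather than code from the slides: the function name `sorted_on_device` is hypothetical, and it assumes a CUDA toolkit with the Thrust headers and compilation with nvcc.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>

// Hypothetical helper (not from the slides): sorts host data on the GPU.
// No explicit cudaMalloc / cudaMemcpy / cudaFree calls are needed.
thrust::host_vector<int> sorted_on_device(const thrust::host_vector<int>& h_in)
{
    // host -> device copy happens in the constructor
    thrust::device_vector<int> d_vec = h_in;

    // the algorithm runs on the device because the iterators are device iterators
    thrust::sort(d_vec.begin(), d_vec.end());

    // device -> host copy of the whole range in one transfer
    thrust::host_vector<int> h_out(d_vec.size());
    thrust::copy(d_vec.begin(), d_vec.end(), h_out.begin());
    return h_out;
}
```

Copying the whole range at once is usually preferable to reading elements one by one with `d_vec[i]` from the host, since each such access is a separate device-to-host transfer.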

Iterators

• Sequences defined by pair of iterators

// allocate device vector
thrust::device_vector<int> d_vec(4);

// returns iterator at first element of d_vec
d_vec.begin();

// returns iterator one past the last element of d_vec
d_vec.end();

// [begin, end) pair defines a sequence of 4 elements

(Figure: d_vec.begin() marks the first element; d_vec.end() marks one past the last element.)

Iterators (continued)

// initialize random values on host
thrust::host_vector<int> h_vec(1000);
thrust::generate(h_vec.begin(), h_vec.end(), rand);

// copy values to device
thrust::device_vector<int> d_vec = h_vec;

// compute sum on host
int h_sum = thrust::reduce(h_vec.begin(), h_vec.end());

// compute sum on device
int d_sum = thrust::reduce(d_vec.begin(), d_vec.end());

Convertible to Raw Pointers

// allocate device vector
thrust::device_vector<int> d_vec(4);

// obtain raw pointer to device vector's memory
int * ptr = thrust::raw_pointer_cast(&d_vec[0]);

// use ptr in a CUDA C kernel
my_kernel<<<N/256, 256>>>(N, ptr);

// Note: ptr cannot be dereferenced on the host!
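The slide leaves `my_kernel` and `N` undefined. A minimal sketch of the same pattern follows; the kernel `double_elements` and the wrapper `double_on_device` are hypothetical names, not part of the slides or of Thrust. The grid size is rounded up so that element counts that are not a multiple of 256 are still covered.

```cuda
#include <thrust/device_vector.h>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the slides): doubles each element in place.
__global__ void double_elements(int n, int* data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard: the last block may be only partly used
        data[i] *= 2;
}

// Launches the kernel on the storage owned by a device_vector.
void double_on_device(thrust::device_vector<int>& d_vec)
{
    int n = static_cast<int>(d_vec.size());
    if (n == 0) return;

    // raw device pointer; valid only while d_vec is alive and not resized
    int* ptr = thrust::raw_pointer_cast(d_vec.data());

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;   // round up
    double_elements<<<blocks, threads>>>(n, ptr);
    cudaDeviceSynchronize();                     // wait for the kernel to finish
}
```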

Wrap Raw Pointers with device_ptr

// raw pointer to device memory
int * raw_ptr;
cudaMalloc((void **) &raw_ptr, N * sizeof(int));

// wrap raw pointer with a device_ptr
thrust::device_ptr<int> dev_ptr(raw_ptr);

// use device_ptr in thrust algorithms
thrust::fill(dev_ptr, dev_ptr + N, (int) 0);

// access device memory through device_ptr
dev_ptr[0] = 1;

// memory obtained with cudaMalloc is not managed by Thrust; free it explicitly
cudaFree(raw_ptr);

Namespaces

• C++ supports namespaces
  – Thrust uses thrust namespace
    • thrust::device_vector
    • thrust::copy
  – STL uses std namespace
    • std::vector
    • std::list
• Avoids collisions
  – thrust::sort()
  – std::sort()
• For brevity
  – using namespace thrust;

Usage of Thrust

• Thrust provides many standard algorithms
  – Transformations
  – Reductions
  – Prefix Sums
  – Sorting
• Generic definitions
  – General Types
    • Built-in types (int, float, …)
    • User-defined structures
  – General Operators
    • reduce with plus operator
    • scan with maximum operator
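Prefix sums and the "scan with maximum operator" case can be sketched as follows. The helper names `running_sum` and `running_max` are hypothetical; `thrust::inclusive_scan` and `thrust::maximum` are the Thrust facilities being illustrated.

```cuda
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/functional.h>

// Hypothetical helper: out[i] = in[0] + in[1] + ... + in[i]
thrust::device_vector<int> running_sum(const thrust::device_vector<int>& in)
{
    thrust::device_vector<int> out(in.size());
    thrust::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Hypothetical helper: out[i] = max(in[0], ..., in[i])
// Same algorithm, with the default plus operator swapped for maximum.
thrust::device_vector<int> running_max(const thrust::device_vector<int>& in)
{
    thrust::device_vector<int> out(in.size());
    thrust::inclusive_scan(in.begin(), in.end(), out.begin(),
                           thrust::maximum<int>());
    return out;
}
```

For the input 3, 1, 4, 1, 5 the running sum is 3, 4, 8, 9, 14 and the running maximum is 3, 3, 4, 4, 5.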

Transformations

// allocate three device_vectors with 10 elements

thrust::device_vector<int> X(10);

thrust::device_vector<int> Y(10);

thrust::device_vector<int> Z(10);

// initialize X to 0,1,2,3, ....

thrust::sequence(X.begin(), X.end());

// compute Y = -X

thrust::transform(X.begin(), X.end(), Y.begin(), thrust::negate<int>());

// fill Z with twos

thrust::fill(Z.begin(), Z.end(), 2);

// compute Y = X mod 2

thrust::transform(X.begin(), X.end(), Z.begin(), Y.begin(),

thrust::modulus<int>());

// replace all the ones in Y with tens

thrust::replace(Y.begin(), Y.end(), 1, 10);

Reductions

// equivalent ways to sum the elements of D (use any one of them)
int sum = thrust::reduce(D.begin(), D.end(), (int) 0, thrust::plus<int>());
int sum = thrust::reduce(D.begin(), D.end(), (int) 0);
int sum = thrust::reduce(D.begin(), D.end());

Other reductions:

• thrust::count_if
• thrust::min_element
• thrust::max_element
• thrust::is_sorted
• thrust::inner_product
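A short sketch of three of the other reductions listed above. The predicate `is_even` and the wrappers `count_even`, `largest` and `dot` are hypothetical names introduced for illustration only.

```cuda
#include <thrust/device_vector.h>
#include <thrust/count.h>
#include <thrust/extrema.h>
#include <thrust/inner_product.h>

// Hypothetical predicate: callable on both host and device.
struct is_even
{
    __host__ __device__
    bool operator()(int x) const { return x % 2 == 0; }
};

// Number of even elements in v.
int count_even(const thrust::device_vector<int>& v)
{
    return static_cast<int>(thrust::count_if(v.begin(), v.end(), is_even()));
}

// Largest element of v; max_element returns an iterator, so dereference it.
// Assumes v is not empty.
int largest(const thrust::device_vector<int>& v)
{
    return *thrust::max_element(v.begin(), v.end());
}

// Dot product of a and b (assumes equal lengths); 0 is the initial value.
int dot(const thrust::device_vector<int>& a,
        const thrust::device_vector<int>& b)
{
    return thrust::inner_product(a.begin(), a.end(), b.begin(), 0);
}
```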

Sorting

• thrust::sort

• thrust::stable_sort

• thrust::sort_by_key

• thrust::stable_sort_by_key
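The sorting functions above can be sketched as follows. The wrapper names `sort_pairs` and `sort_descending` are hypothetical; the Thrust calls inside them are the ones named on the slide, plus `thrust::greater` as a comparison operator.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/functional.h>

// Hypothetical helper: sorts keys in ascending order and applies the same
// permutation to values, so each value stays attached to its key.
void sort_pairs(thrust::device_vector<int>& keys,
                thrust::device_vector<char>& values)
{
    thrust::sort_by_key(keys.begin(), keys.end(), values.begin());
}

// Hypothetical helper: sorts in descending order by passing a comparison
// functor as the third argument.
void sort_descending(thrust::device_vector<int>& v)
{
    thrust::sort(v.begin(), v.end(), thrust::greater<int>());
}
```

The stable variants take the same arguments and additionally preserve the relative order of elements with equal keys.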

General Types and Operators

#include <thrust/reduce.h>

// declare storage
device_vector<int> i_vec = ...
device_vector<float> f_vec = ...

// sum of integers (equivalent calls)
reduce(i_vec.begin(), i_vec.end());
reduce(i_vec.begin(), i_vec.end(), 0, plus<int>());

// sum of floats (equivalent calls)
reduce(f_vec.begin(), f_vec.end());
reduce(f_vec.begin(), f_vec.end(), 0.0f, plus<float>());

// maximum of integers
// (the initial value 0 means the result is never below 0)
reduce(i_vec.begin(), i_vec.end(), 0, maximum<int>());

General Types and Operators

struct negate_float2
{
    __host__ __device__
    float2 operator()(float2 a)
    {
        return make_float2(-a.x, -a.y);
    }
};

// declare storage
device_vector<float2> input = ...
device_vector<float2> output = ...

// create functor
negate_float2 func;

// negate vectors
transform(input.begin(), input.end(), output.begin(), func);

General Types and Operators

// compare x component of two float2 structures
struct compare_float2
{
    __host__ __device__
    bool operator()(float2 a, float2 b)
    {
        return a.x < b.x;
    }
};

// declare storage
device_vector<float2> vec = ...

// create comparison functor
compare_float2 comp;

// sort elements by x component
sort(vec.begin(), vec.end(), comp);

Advanced Thrust: Fancy Iterators

• Fancy iterators behave like “normal” iterators
  – Algorithms don't know the difference
• Examples
  – constant_iterator
  – counting_iterator
  – transform_iterator
  – zip_iterator

Constant Iterator

// create iterators
constant_iterator<int> begin(10);
constant_iterator<int> end = begin + 3;

begin[0]   // returns 10
begin[1]   // returns 10
begin[100] // returns 10

// sum of [begin, end)
reduce(begin, end); // returns 30 (i.e. 3 * 10)

Counting Iterator

// create iterators
counting_iterator<int> begin(10);
counting_iterator<int> end = begin + 3;

begin[0]   // returns 10
begin[1]   // returns 11
begin[100] // returns 110

// sum of [begin, end)
reduce(begin, end); // returns 33 (i.e. 10 + 11 + 12)
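Both iterators can stand in for a vector that would otherwise have to be allocated and filled. A minimal sketch, with hypothetical helper names `fill_ramp` and `add_offset`:

```cuda
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/counting_iterator.h>

// Hypothetical helper: writes start, start+1, start+2, ... into v.
// The source values are generated on the fly by the counting iterator.
void fill_ramp(thrust::device_vector<int>& v, int start)
{
    thrust::counting_iterator<int> first(start);
    thrust::copy(first, first + v.size(), v.begin());
}

// Hypothetical helper: v[i] = v[i] + offset, without storing a vector
// of offsets; the constant iterator supplies the second operand.
void add_offset(thrust::device_vector<int>& v, int offset)
{
    thrust::transform(v.begin(), v.end(),
                      thrust::make_constant_iterator(offset),
                      v.begin(),
                      thrust::plus<int>());
}
```

In both helpers the fancy iterator holds only a single value, so no device memory is spent on the generated sequence.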

Transform Iterator

– Yields a transformed sequence
– Facilitates kernel fusion
– Conserves memory capacity and bandwidth

(Figure: a transform iterator applies F to each element of the sequence X, Y, Z, yielding F(x), F(y), F(z).)

Transform Iterator

// initialize vector
device_vector<int> vec(3);
vec[0] = 10; vec[1] = 20; vec[2] = 30;

// create iterator (type omitted)
begin = make_transform_iterator(vec.begin(), negate<int>());
end   = make_transform_iterator(vec.end(),   negate<int>());

begin[0] // returns -10
begin[1] // returns -20
begin[2] // returns -30

// sum of [begin, end)
reduce(begin, end); // returns -60 (i.e. -10 + -20 + -30)
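Kernel fusion with a transform iterator can be sketched as a sum of squares computed in a single pass. The functor `square` and the function `sum_of_squares` are hypothetical names; the `result_type` typedef is included on the assumption that older Thrust releases need it to deduce the iterator's value type.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/iterator/transform_iterator.h>

// Hypothetical functor: squares its argument.
struct square
{
    typedef int result_type;   // lets older Thrust versions deduce the value type

    __host__ __device__
    int operator()(int x) const { return x * x; }
};

// Sum of squares of v. The squares are computed while reduce reads the
// input, so no temporary vector of squared values is written to memory.
int sum_of_squares(const thrust::device_vector<int>& v)
{
    return thrust::reduce(
        thrust::make_transform_iterator(v.begin(), square()),
        thrust::make_transform_iterator(v.end(),   square()));
}
```

An unfused version would call `thrust::transform` into a temporary vector and then `thrust::reduce` over it, which costs an extra write and read of the whole sequence.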

Zip Iterator

– Looks like an array of structs (AoS)
– Stored in structure of arrays (SoA)

Zip Iterator

// initialize vectors
device_vector<int> A(3);
device_vector<char> B(3);
A[0] = 10; A[1] = 20; A[2] = 30;
B[0] = 'x'; B[1] = 'y'; B[2] = 'z';

// create iterator (type omitted)
begin = make_zip_iterator(make_tuple(A.begin(), B.begin()));
end   = make_zip_iterator(make_tuple(A.end(), B.end()));

begin[0] // returns tuple(10, 'x')
begin[1] // returns tuple(20, 'y')
begin[2] // returns tuple(30, 'z')

// maximum of [begin, end)
maximum< tuple<int,char> > binary_op;
reduce(begin, end, begin[0], binary_op); // returns tuple(30, 'z')

cuRAND Library

• #include <curand.h>

• curandStatus_t curandGenerate(curandGenerator_t generator, unsigned int *outputPtr, size_t num)

• A given state may not be used by more than one block.

• A given block may generate random numbers using multiple states.
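A minimal sketch of the cuRAND host API follows. It uses `curandGenerateUniform` (floats) rather than the `curandGenerate` call shown above (32-bit unsigned integers), but the create, seed, generate, destroy sequence is the same. The function name `uniform_samples` is hypothetical, error checking is omitted for brevity, `n` is assumed to be greater than zero, and the program is assumed to be linked against the cuRAND library (for example with `-lcurand`).

```cuda
#include <curand.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: returns n uniform floats generated on the GPU.
// Return codes of the CUDA and cuRAND calls are ignored in this sketch.
std::vector<float> uniform_samples(size_t n, unsigned long long seed)
{
    // device buffer that the generator writes into
    float* d_data = nullptr;
    cudaMalloc((void**)&d_data, n * sizeof(float));

    // create and seed a pseudo-random generator
    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, seed);

    // generate n floats in (0, 1] directly in device memory
    curandGenerateUniform(gen, d_data, n);

    // copy the results back to the host
    std::vector<float> h_data(n);
    cudaMemcpy(h_data.data(), d_data, n * sizeof(float),
               cudaMemcpyDeviceToHost);

    // release the generator and the device buffer
    curandDestroyGenerator(gen);
    cudaFree(d_data);
    return h_data;
}
```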

Distribution

__device__ float curand_uniform(curandState_t *state)

__device__ float curand_normal(curandState_t *state)

__device__ unsigned int curand_poisson(curandState_t *state, double lambda)

__device__ double curand_uniform_double(curandState_t *state)

__device__ float2 curand_normal2(curandState_t *state)
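These device functions are called from inside a kernel, with each thread holding its own generator state. A minimal sketch, assuming one state per thread initialised with a shared seed and a distinct subsequence number; the kernel `fill_normal` and the wrapper `normal_samples` are hypothetical names, error checking is omitted, and `n` is assumed to be greater than zero.

```cuda
#include <curand_kernel.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each thread draws one standard-normal sample.
__global__ void fill_normal(float* out, int n, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // per-thread state: curand_init(seed, subsequence, offset, state)
    curandState_t state;
    curand_init(seed, i, 0, &state);

    out[i] = curand_normal(&state);
}

// Hypothetical host wrapper: launches the kernel and copies the samples back.
std::vector<float> normal_samples(int n, unsigned long long seed)
{
    float* d_out = nullptr;
    cudaMalloc((void**)&d_out, n * sizeof(float));

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;   // round up
    fill_normal<<<blocks, threads>>>(d_out, n, seed);

    // cudaMemcpy waits for the kernel before copying
    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    return h_out;
}
```

Swapping `curand_normal` for `curand_uniform` or `curand_poisson` changes the distribution without changing the structure of the kernel.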

Estimating π

#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>
#include <curand_kernel.h>
#include <iostream>
#include <iomanip>

struct estimate_pi : public thrust::unary_function<unsigned int, float>
{
    __device__
    float operator()(unsigned int thread_id)
    {
        float sum = 0;
        unsigned int N = 10000; // samples per thread
        unsigned int seed = thread_id;
        curandState s;

        // seed a random number generator
        curand_init(seed, 0, 0, &s);

        // take N samples in a quarter circle
        for(unsigned int i = 0; i < N; ++i)
        {
            // draw a sample from the unit square
            float x = curand_uniform(&s);
            float y = curand_uniform(&s);

            // measure distance from the origin
            float dist = sqrtf(x*x + y*y);

            // add 1.0f if (x,y) is inside the quarter circle
            if(dist <= 1.0f)
                sum += 1.0f;
        }

        // multiply by 4 to get the area of the whole circle
        sum *= 4.0f;

        // divide by N
        return sum / N;
    }
};

int main(void)
{
    // use 30K independent seeds
    int M = 30000;

    float estimate = thrust::transform_reduce(
        thrust::counting_iterator<int>(0),
        thrust::counting_iterator<int>(M),
        estimate_pi(),
        0.0f,
        thrust::plus<float>());

    estimate /= M;

    std::cout << std::setprecision(3);
    std::cout << "pi is approximately ";
    std::cout << estimate << std::endl;
    return 0;
}

Review

• What are containers and iterators?

• How is data transferred between host and device with Thrust?

• How is the thrust namespace used?

• How can containers be associated with raw pointers?

• How is Thrust used to perform transformations, reductions and sorting?

• How are general types used in Thrust?

• Why can constant iterators save storage?

• When do we need counting iterators, and how much memory do they require?

• How are transform iterators used?

• Why do we need zip iterators?

• How can different distributions be generated with cuRAND?

• How is cuRAND used with Thrust?

• How is cuRAND used in a kernel function?

References

• An Introduction to Thrust

• GTC 2010

– (Part 1) - High Productivity Development

– (Part 2) - Thrust By Example

• Thrust_Quick_Start_Guide

• Thrust - A Productivity-Oriented Library for CUDA

• http://code.google.com/p/thrust/

• CURAND_Library