Defines routines to allocate memory using different APIs and copy between them.

Classes
struct	DefaultAllocator< T >
	Default allocator that uses C++ new/delete. This class uses standard C++ routines for allocation and memory manipulation: new[], delete[] and std::copy.

struct	DeviceAllocator< T >
	CUDA device memory allocator that uses cudaMalloc, cudaMemcpy and cudaFree. It creates a pointer to memory allocated on the device. The caller cannot dereference this pointer on the host and should only pass it to a CUDA kernel. The copy uses cudaMemcpy to transfer data between two device arrays.

struct	HostAllocator< T >
	CUDA host memory allocator that uses cudaMallocHost, cudaMemcpy and cudaFreeHost. The host memory allocator is similar to malloc: the pointers point to memory that can be used by C++ code. However, the CUDA documentation states that copying to device memory from a CUDA-allocated host array is faster than copying from memory allocated using malloc.

struct	MappedHostAllocator< T >
	CUDA host memory allocator, similar to HostAllocator, that uses device-mapped memory. Mapped memory is accessible on both host and device. However, the host and device pointers are different, which complicates its use. According to the CUDA manual, version 4.0 of the CUDA SDK uses unified pointers, so there is no need to map the pointer. In that case, the pointer obtained using this allocator can be passed to a kernel.

Functions
template<class A , class T >
void	alloc_copy (A, A, T begin, T end, T *dst)
	Simple copy between two buffers of the same allocator. Uses the copy method of the allocator.

template<class T >
void	alloc_copy (DefaultAllocator< T >, DeviceAllocator< T >, T begin, T end, T *dst)
	Copy from host memory to device memory.

template<class T >
void	alloc_copy (DeviceAllocator< T >, DefaultAllocator< T >, T begin, T end, T *dst)
	Copy from device memory to host memory.

Detailed Description

Defines routines to allocate memory using different APIs and copy between them.

The classes provide a generalized interface for different memory allocators, similar to the allocators in the C++ standard library. The difference is that device memory allocators allocate memory that is not accessible to the CPU, so in addition to allocation we need copy routines between allocators.

Every allocator provides three basic actions:

create
copy
free
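
The three actions can be illustrated with a minimal host-only sketch built on new[], delete[] and std::copy, the routines named for DefaultAllocator. SketchAllocator, its member signatures and the round_trip helper are assumptions made for illustration; they are not the declarations found in allocators.hpp.

```cpp
#include <algorithm>  // std::copy
#include <cassert>
#include <cstddef>    // std::size_t
#include <vector>

// Hypothetical stand-in for an allocator offering create/copy/free.
// Names and signatures are assumptions, not the real allocators.hpp API.
template <class T>
struct SketchAllocator {
    // create: allocate storage for n elements with new[]
    static T* create(std::size_t n) { return new T[n]; }

    // copy: copy [begin, end) into dst within the same memory domain
    template <class It>
    static void copy(It begin, It end, T* dst) {
        assert(dst != nullptr);
        std::copy(begin, end, dst);
    }

    // free: release storage obtained from create
    static void free(T* p) { delete[] p; }
};

// Hypothetical helper: create a buffer, copy into it, read it back, free it.
inline std::vector<int> round_trip(const std::vector<int>& src) {
    int* buf = SketchAllocator<int>::create(src.size());
    SketchAllocator<int>::copy(src.begin(), src.end(), buf);
    std::vector<int> out(buf, buf + src.size());
    SketchAllocator<int>::free(buf);
    return out;
}
```

A device allocator would presumably keep the same three entry points and replace the bodies with the corresponding CUDA calls.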

The copy routine inside an allocator only copies within the same memory domain. For transfers between domains, use the function template alloc_copy, which is specialized for different combinations of source and destination allocators and uses the appropriate CUDA calls for copying.
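
The selection of a copy routine by allocator pair can be sketched with overloads on empty allocator objects. This is only an illustration under stated assumptions: SketchHost and SketchDevice are hypothetical, the "device" side is simulated with ordinary host memory so the example runs without CUDA, the pointer-based parameter types are a guess, and the Route return value exists only to make the chosen overload visible (the documented functions return void).

```cpp
#include <algorithm>  // std::copy
#include <cassert>

// Hypothetical allocators. SketchDevice only simulates a separate memory
// domain using host memory; it performs no CUDA calls.
template <class T>
struct SketchHost {
    static void copy(const T* b, const T* e, T* dst) { std::copy(b, e, dst); }
};
template <class T>
struct SketchDevice {
    static void copy(const T* b, const T* e, T* dst) { std::copy(b, e, dst); }
};

// Demonstration-only return value reporting which overload was chosen.
enum class Route { SameDomain, HostToDevice, DeviceToHost };

// Same allocator on both sides: delegate to the allocator's own copy.
template <class A, class T>
Route alloc_copy(A, A, const T* begin, const T* end, T* dst) {
    A::copy(begin, end, dst);
    return Route::SameDomain;
}

// Host to device: real code would issue a cudaMemcpy with
// cudaMemcpyHostToDevice here; std::copy is a stand-in.
template <class T>
Route alloc_copy(SketchHost<T>, SketchDevice<T>,
                 const T* begin, const T* end, T* dst) {
    assert(begin <= end);
    std::copy(begin, end, dst);
    return Route::HostToDevice;
}

// Device to host: real code would issue a cudaMemcpy with
// cudaMemcpyDeviceToHost here; std::copy is a stand-in.
template <class T>
Route alloc_copy(SketchDevice<T>, SketchHost<T>,
                 const T* begin, const T* end, T* dst) {
    assert(begin <= end);
    std::copy(begin, end, dst);
    return Route::DeviceToHost;
}
```

Passing the allocators as empty objects lets the compiler pick the transfer direction at compile time, so the caller never has to name a CUDA copy kind explicitly.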

Current allocators are:

C++ new/delete
CUDA [GPU] device memory
CUDA host memory
CUDA device-mapped host memory

Definition in file allocators.hpp.
