High Performance Computing Set 2 MCQs

Q1 | NVIDIA CUDA Warp is made up of how many threads?

512
1024
312
32

Answer: 32
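
The warp size can also be read from the runtime rather than memorized. The following is a minimal sketch, assuming a CUDA-capable device at index 0; the helper name queryWarpSize is hypothetical and not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the warp size reported for `device`,
// or -1 if the query fails (for example, when no GPU is present).
int queryWarpSize(int device) {
    int size = 0;
    cudaError_t err = cudaDeviceGetAttribute(&size, cudaDevAttrWarpSize, device);
    return (err == cudaSuccess) ? size : -1;
}

int main(void) {
    int size = queryWarpSize(0);  // device index 0 is an assumption
    if (size < 0) {
        printf("could not query a CUDA device\n");
        return 1;
    }
    printf("warp size: %d threads\n", size);
    return 0;
}
```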

Q2 | Out-of-order instruction execution is not possible on GPUs.

true
false

Answer: false

Q3 | CUDA supports programming in ....

C or C++ only
Java, Python, and more
C, C++, third-party wrappers for Java, Python, and more
Pascal

Answer: C, C++, third-party wrappers for Java, Python, and more

Q4 | FADD, FMAD, FMIN, FMAX are ----- supported by the scalar processors of an NVIDIA GPU.

32-bit IEEE floating point instructions
32-bit integer instructions
both
none of the above

Answer: 32-bit IEEE floating point instructions

Q5 | Each streaming multiprocessor (SM) of first-generation (G80) CUDA hardware has ------ scalar processors (SP).

1024
128
512
8

Answer: 8

Q6 | The NVIDIA G80 GPU (GeForce 8800 GTX) has ------ streaming multiprocessors (SMs).

8
1024
512
16

Answer: 16
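
The SM count depends on the GPU model (16 matches the G80-based GeForce 8800 GTX), so programs usually query it at run time. A minimal sketch, assuming device index 0; querySmCount is a hypothetical helper name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of streaming multiprocessors
// on `device`, or -1 if the properties cannot be read.
int querySmCount(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.multiProcessorCount;
}

int main(void) {
    int count = querySmCount(0);  // device index 0 is an assumption
    if (count < 0) {
        printf("could not query a CUDA device\n");
        return 1;
    }
    printf("streaming multiprocessors: %d\n", count);
    return 0;
}
```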

Q7 | CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----.

“programming-overhead”, 2 clock
“zero-overhead”, 1 clock
64, 2 clock
32, 1 clock

Answer: “zero-overhead”, 1 clock

Q8 | Each warp of GPU receives a single instruction and “broadcasts” it to all of its threads. It is a ---- operation.

SIMD (single instruction multiple data)
SIMT (single instruction multiple thread)
SISD (single instruction single data)
SIST (single instruction single thread)

Answer: SIMT (single instruction multiple thread)
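
To make the SIMT grouping concrete, the sketch below has every thread of one block run the same instructions on its own index and record which warp and lane it falls in. It assumes a single block of 64 threads; the names recordLanes, collectLanes and THREADS are illustrative only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 64  // assumed block size for this illustration

// Every thread executes the same code; only threadIdx.x differs.
__global__ void recordLanes(int *warpIdx, int *laneIdx) {
    int t = threadIdx.x;
    warpIdx[t] = t / warpSize;  // which warp of the block the thread is in
    laneIdx[t] = t % warpSize;  // position of the thread inside that warp
}

// Hypothetical host wrapper: fills two host arrays of THREADS ints.
bool collectLanes(int *hostWarp, int *hostLane) {
    int *devWarp = NULL, *devLane = NULL;
    size_t bytes = THREADS * sizeof(int);
    bool ok = cudaMalloc((void**)&devWarp, bytes) == cudaSuccess
           && cudaMalloc((void**)&devLane, bytes) == cudaSuccess;
    if (ok) {
        recordLanes<<<1, THREADS>>>(devWarp, devLane);
        ok = cudaMemcpy(hostWarp, devWarp, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(hostLane, devLane, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devWarp);  // cudaFree(NULL) is a no-op
    cudaFree(devLane);
    return ok;
}

int main(void) {
    int warpIdx[THREADS], laneIdx[THREADS];
    if (!collectLanes(warpIdx, laneIdx)) {
        printf("CUDA call failed\n");
        return 1;
    }
    int samples[] = {0, 31, 32, 63};
    for (int k = 0; k < 4; ++k) {
        int t = samples[k];
        printf("thread %d: warp %d, lane %d\n", t, warpIdx[t], laneIdx[t]);
    }
    return 0;
}
```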

Q9 | Limitations of a CUDA kernel on early (compute capability 1.x) devices:

recursion, call stack, static variable declaration
no recursion, no call stack, no static variable declarations
recursion, no call stack, static variable declaration
no recursion, call stack, no static variable declarations

Answer: no recursion, no call stack, no static variable declarations
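
These restrictions describe early CUDA hardware. On devices of compute capability 2.0 and later, device-side recursion is accepted by the compiler. The sketch below illustrates that under this assumption; factorial, factorialKernel and deviceFactorial are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Recursive device function; needs compute capability 2.0 or later.
__device__ unsigned int factorial(unsigned int n) {
    return (n <= 1u) ? 1u : n * factorial(n - 1u);
}

__global__ void factorialKernel(unsigned int n, unsigned int *out) {
    *out = factorial(n);
}

// Hypothetical host wrapper: computes n! on the device.
bool deviceFactorial(unsigned int n, unsigned int *result) {
    unsigned int *devOut = NULL;
    if (cudaMalloc((void**)&devOut, sizeof(unsigned int)) != cudaSuccess) {
        return false;
    }
    factorialKernel<<<1, 1>>>(n, devOut);
    bool ok = cudaMemcpy(result, devOut, sizeof(unsigned int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(devOut);
    return ok;
}

int main(void) {
    unsigned int result = 0;
    if (!deviceFactorial(5, &result)) {
        printf("CUDA call failed\n");
        return 1;
    }
    printf("5! = %u\n", result);
    return 0;
}
```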

Q10 | What is Unified Virtual Memory?

it is a technique that allows both the CPU and GPU to read from a single virtual address space simultaneously.
it is a technique for managing separate host and device memory spaces.
it is a technique for executing device code on host and host code on device.
it is a technique for executing general purpose programs on device instead of host.

Answer: it is a technique that allows both the CPU and GPU to read from a single virtual address space simultaneously.
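
One concrete form of a shared address space in the CUDA runtime is managed memory, where a single pointer is valid on both host and device. The sketch below assumes a device that supports cudaMallocManaged; the names increment and managedIncrement are hypothetical. The host waits for the kernel before touching the value again, which is the conservative choice on any hardware.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *x) {
    *x += 1;  // device reads and writes through the shared pointer
}

// Hypothetical helper: stores `start` from the host, increments it on
// the device, and reads the result back through the same pointer.
bool managedIncrement(int start, int *result) {
    int *x = NULL;
    if (cudaMallocManaged(&x, sizeof(int)) != cudaSuccess) {
        return false;
    }
    *x = start;                        // host write, no explicit copy
    increment<<<1, 1>>>(x);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;  // wait for the kernel
    if (ok) {
        *result = *x;                  // host read, no explicit copy
    }
    cudaFree(x);
    return ok;
}

int main(void) {
    int result = 0;
    if (!managedIncrement(41, &result)) {
        printf("CUDA call failed\n");
        return 1;
    }
    printf("result: %d\n", result);
    return 0;
}
```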

Q11 | _______ became the first language specifically designed by a GPU Company to facilitate general purpose computing on ____.

Python, GPUs.
C, CPUs.
CUDA C, GPUs.
Java, CPUs.

Answer: CUDA C, GPUs.

Q12 | The CUDA architecture consists of --------- for parallel computing kernels and functions.

RISC instruction set architecture
CISC instruction set architecture
ZISC instruction set architecture
PTX instruction set architecture

Answer: PTX instruction set architecture

Q13 | CUDA stands for --------, designed by NVIDIA.

common union discrete architecture
complex unidentified device architecture
compute unified device architecture
complex unstructured distributed architecture

Answer: compute unified device architecture

Q14 | The host processor spawns multithreaded tasks (or kernels, as they are known in CUDA) onto the GPU device. State true or false.

true
false

Answer: true

Q15 | The NVIDIA G80 is a ---- CUDA core device, the NVIDIA GT200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device.

128, 240, 512
32, 64, 128
64, 128, 256
256, 512, 1024

Answer: 128, 240, 512

Q16 | NVIDIA 8-series GPUs offer -------- .

50-200 GFLOPS
200-400 GFLOPS
400-800 GFLOPS
800-1000 GFLOPS

Answer: 50-200 GFLOPS

Q17 | IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by the scalar processors of an NVIDIA GPU.

32-bit IEEE floating point instructions
32-bit integer instructions
both
none of the above

Answer: 32-bit integer instructions

Q18 | The CUDA hardware programming model supports: a) a fully general data-parallel architecture; b) general thread launch; c) global load-store; d) parallel data cache; e) scalar architecture; f) integers and bit operations.

a,c,d,f
b,c,d,e
a,d,e,f
a,b,c,d,e,f

Answer: a,b,c,d,e,f

Q19 | The CUDA memory model provides the following memory types: a) registers; b) local memory; c) shared memory; d) global memory; e) constant memory; f) texture memory.

a, b, d, f
a, c, d, e, f
a, b, c, d, e, f
b, c, e, f

Answer: a, b, c, d, e, f
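
Several of these memory types can be seen together in one small kernel. The sketch below touches registers, shared, global and constant memory (local and texture memory are not shown). It assumes a single block and at most 64 elements; scale, scaleAndReverse, runScaleAndReverse and TILE are illustrative names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 64  // assumed upper bound on the element count

__constant__ float scale;  // constant memory, set from the host

__global__ void scaleAndReverse(const float *in, float *out, int n) {
    __shared__ float tile[TILE];         // shared memory, one copy per block
    int i = threadIdx.x;                 // scalar local, normally kept in a register
    if (i < n) tile[i] = in[i] * scale;  // global read, constant read
    __syncthreads();                     // all writes to tile are now visible
    if (i < n) out[i] = tile[n - 1 - i]; // global write
}

// Hypothetical host wrapper: out[i] = in[n - 1 - i] * s for n <= TILE.
bool runScaleAndReverse(const float *hostIn, float *hostOut, int n, float s) {
    if (n < 1 || n > TILE) return false;
    float *devIn = NULL, *devOut = NULL;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMalloc((void**)&devIn, bytes) == cudaSuccess
           && cudaMalloc((void**)&devOut, bytes) == cudaSuccess
           && cudaMemcpyToSymbol(scale, &s, sizeof(float)) == cudaSuccess
           && cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scaleAndReverse<<<1, TILE>>>(devIn, devOut, n);
        ok = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devIn);   // cudaFree(NULL) is a no-op
    cudaFree(devOut);
    return ok;
}

int main(void) {
    float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    if (!runScaleAndReverse(in, out, 4, 2.0f)) {
        printf("CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < 4; ++i) printf("%.1f ", out[i]);
    printf("\n");
    return 0;
}
```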

Q20 | What is the CUDA C equivalent of this general C program: int main(void) { printf("Hello, World!\n"); return 0; }

int main ( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
__global__ void kernel( void ) { } int main ( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
__global__ void kernel( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }
__global__ int main ( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }

Answer: __global__ void kernel( void ) { } int main ( void ) { kernel<<<1,1>>>(); printf("Hello, World!\n"); return 0; }

Q21 | Which function runs on the device (i.e., the GPU)? a) __global__ void kernel (void ) { } b) int main ( void ) { ... return 0; }

a
b
both a,b

Answer: a

Q22 | A simple kernel for adding two integers: __global__ void add( int *a, int *b, int *c ) { *c = *a + *b; } where __global__ is a CUDA C keyword which indicates that:

add() will execute on device, add() will be called from host
add() will execute on host, add() will be called from device
add() will be called and executed on host
add() will be called and executed on device

Answer: add() will execute on device, add() will be called from host

Q23 | If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to allocate memory for dev_a:

cudaMalloc( &dev_a, sizeof( int ) )
malloc( &dev_a, sizeof( int ) )
cudaMalloc( (void**) &dev_a, sizeof( int ) )
malloc( (void**) &dev_a, sizeof( int ) )

Answer: cudaMalloc( (void**) &dev_a, sizeof( int ) )

Q24 | If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to copy the input from a to dev_a:

memcpy( dev_a, &a, size);
cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice );
memcpy( (void*) dev_a, &a, size);
cudaMemcpy( (void*) &dev_a, &a, size, cudaMemcpyDeviceToHost );

Answer: cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice );
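
The add() kernel from Q22 and the allocation and copy calls from Q23 and Q24 fit together roughly as sketched below. The variables b, c, dev_b and dev_c and the wrapper addOnDevice are assumptions added for illustration; only a, dev_a and add() come from the questions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from Q22: runs on the device, is launched from the host.
__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

// Hypothetical host wrapper: allocate, copy in, launch, copy out, free.
bool addOnDevice(int a, int b, int *c) {
    int *dev_a = NULL, *dev_b = NULL, *dev_c = NULL;
    size_t size = sizeof(int);

    bool ok = cudaMalloc((void**)&dev_a, size) == cudaSuccess
           && cudaMalloc((void**)&dev_b, size) == cudaSuccess
           && cudaMalloc((void**)&dev_c, size) == cudaSuccess;

    // Copy the inputs from host variables to device memory.
    ok = ok && cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(dev_b, &b, size, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        add<<<1, 1>>>(dev_a, dev_b, dev_c);  // one block, one thread
        // Copy the result back; this call waits for the kernel to finish.
        ok = cudaMemcpy(c, dev_c, size, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dev_a);  // cudaFree(NULL) is a no-op
    cudaFree(dev_b);
    cudaFree(dev_c);
    return ok;
}

int main(void) {
    int c = 0;
    if (!addOnDevice(2, 7, &c)) {
        printf("CUDA call failed\n");
        return 1;
    }
    printf("2 + 7 = %d\n", c);
    return 0;
}
```

Saved as a .cu file, a program of this shape is normally compiled with nvcc.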

Q25 | What do triple angle brackets (<<< >>>) in a statement inside the main function indicate?

a call from host code to device code
a call from device code to host code
less than comparison
greater than comparison

Answer: a call from host code to device code
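
The two values inside the triple angle brackets are the launch configuration: the number of blocks and the number of threads per block. The sketch below uses them for an element-wise vector addition. The block size of 256 and the names vecAdd and vecAddOnDevice are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 1000

__global__ void vecAdd(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard the last, partial block
}

// Hypothetical host wrapper: c[i] = a[i] + b[i] for i in [0, n).
bool vecAddOnDevice(const int *a, const int *b, int *c, int n) {
    int *devA = NULL, *devB = NULL, *devC = NULL;
    size_t bytes = n * sizeof(int);
    bool ok = cudaMalloc((void**)&devA, bytes) == cudaSuccess
           && cudaMalloc((void**)&devB, bytes) == cudaSuccess
           && cudaMalloc((void**)&devC, bytes) == cudaSuccess
           && cudaMemcpy(devA, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(devB, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;                         // threads per block (assumed)
        int blocks = (n + threads - 1) / threads;  // round up to cover all n
        vecAdd<<<blocks, threads>>>(devA, devB, devC, n);  // host calls device code
        ok = cudaMemcpy(c, devC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devA);  // cudaFree(NULL) is a no-op
    cudaFree(devB);
    cudaFree(devC);
    return ok;
}

int main(void) {
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; }
    if (!vecAddOnDevice(a, b, c, N)) {
        printf("CUDA call failed\n");
        return 1;
    }
    printf("c[0] = %d, c[%d] = %d\n", c[0], N - 1, c[N - 1]);
    return 0;
}
```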