This set of High Performance Computing (HPC) Multiple Choice Questions & Answers (MCQs) focuses on High Performance Computing, Set 2.
Q1 | An NVIDIA CUDA warp is made up of how many threads?
- 512
- 1024
- 312
- 32
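Note: rather than memorising the figure, the warp size can be read from the device at run time. The following is a minimal sketch, assuming a CUDA-capable device is installed as device 0 and the file is compiled with nvcc; it uses the runtime call `cudaGetDeviceProperties` and the `warpSize` and `multiProcessorCount` fields of `cudaDeviceProp`. The printed values depend on the GPU in use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    cudaDeviceProp prop;

    // Query the properties of device 0 (assumption: at least one GPU is present).
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Number of threads that execute together as one warp.
    printf("warp size: %d threads\n", prop.warpSize);
    // Number of streaming multiprocessors (SMs) on this device.
    printf("streaming multiprocessors: %d\n", prop.multiProcessorCount);
    return 0;
}
```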
Q2 | Out-of-order instruction execution is not possible on GPUs.
- true
- false
- --
- --
Q3 | CUDA supports programming in ....
- c or c++ only
- java, python, and more
- c, c++, third party wrappers for java, python, and more
- pascal
Q4 | FADD, FMAD, FMIN, FMAX are ----- supported by Scalar Processors of NVIDIA GPU.
- 32-bit ieee floating point instructions
- 32-bit integer instructions
- both
- none of the above
Q5 | Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors (SP).
- 1024
- 128
- 512
- 8
Q6 | Each NVIDIA GPU has ------ Streaming Multiprocessors
- 8
- 1024
- 512
- 16
Q7 | CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----.
- “programming-overhead”, 2 clock
- “zero-overhead”, 1 clock
- 64, 2 clock
- 32, 1 clock
Q8 | Each warp of GPU receives a single instruction and “broadcasts” it to all of its threads. It is a ---- operation.
- simd (single instruction multiple data)
- simt (single instruction multiple thread)
- sisd (single instruction single data)
- sist (single instruction single thread)
Q9 | What are the limitations of a CUDA kernel?
- recursion, call stack, static variable declaration
- no recursion, no call stack, no static variable declarations
- recursion, no call stack, static variable declaration
- no recursion, call stack, no static variable declarations
Q10 | What is Unified Virtual Machine?
- it is a technique that allows both cpu and gpu to read from a single virtual machine simultaneously.
- it is a technique for managing separate host and device memory spaces.
- it is a technique for executing device code on host and host code on device.
- it is a technique for executing general purpose programs on device instead of host.
Q11 | _______ became the first language specifically designed by a GPU Company to facilitate general purpose computing on ____.
- python, gpus.
- c, cpus.
- cuda c, gpus.
- java, cpus.
Q12 | The CUDA architecture consists of --------- for parallel computing kernels and functions.
- risc instruction set architecture
- cisc instruction set architecture
- zisc instruction set architecture
- ptx instruction set architecture
Q13 | CUDA stands for --------, designed by NVIDIA.
- common union discrete architecture
- complex unidentified device architecture
- compute unified device architecture
- complex unstructured distributed architecture
Q14 | The host processor spawns multithread tasks (or kernels as they are known in CUDA) onto the GPU device. State true or false.
- true
- false
- ---
- ---
Q15 | The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device.
- 128, 256, 512
- 32, 64, 128
- 64, 128, 256
- 256, 512, 1024
Q16 | NVIDIA 8-series GPUs offer -------- .
- 50-200 gflops
- 200-400 gflops
- 400-800 gflops
- 800-1000 gflops
Q17 | IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by Scalar Processors of NVIDIA GPU.
- 32-bit ieee floating point instructions
- 32-bit integer instructions
- both
- none of the above
Q18 | The CUDA hardware programming model supports: a) fully general data-parallel architecture; b) general thread launch; c) global load-store; d) parallel data cache; e) scalar architecture; f) integers, bit operations
- a,c,d,f
- b,c,d,e
- a,d,e,f
- a,b,c,d,e,f
Q19 | In the CUDA memory model, the following memory types are available: a) Registers; b) Local Memory; c) Shared Memory; d) Global Memory; e) Constant Memory; f) Texture Memory.
- a, b, d, f
- a, c, d, e, f
- a, b, c, d, e, f
- b, c, e, f
Q20 | What is the CUDA C equivalent of this general C program: int main(void) { printf("Hello, World!\n"); return 0; }
- int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; }
- __global__ void kernel( void ) { } int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; }
- __global__ void kernel( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; }
- __global__ int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; }
Q21 | Which function runs on Device (i.e. GPU): a) __global__ void kernel (void ) { } b) int main ( void ) { ... return 0; }
- a
- b
- both a,b
- ---
Q22 | A simple kernel for adding two integers: __global__ void add( int *a, int *b, int *c ) { *c = *a + *b; } where __global__ is a CUDA C keyword which indicates that:
- add() will execute on device, add() will be called from host
- add() will execute on host, add() will be called from device
- add() will be called and executed on host
- add() will be called and executed on device
Q23 | If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to allocate memory for dev_a:
- cudaMalloc( &dev_a, sizeof( int ) )
- malloc( &dev_a, sizeof( int ) )
- cudaMalloc( (void**) &dev_a, sizeof( int ) )
- malloc( (void**) &dev_a, sizeof( int ) )
Q24 | If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to copy the input from a to dev_a:
- memcpy( dev_a, &a, size);
- cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice );
- memcpy( (void*) dev_a, &a, size);
- cudaMemcpy( (void*) &dev_a, &a, size, cudaMemcpyDeviceToHost );
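Note: the add() kernel from Q22 and the allocation and copy calls from Q23 and Q24 fit together roughly as in the sketch below. It assumes a CUDA-capable device and compilation with nvcc. The helper name `add_on_device` and the sample inputs 2 and 7 are illustrative and not part of the questions; the runtime calls (`cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, `cudaFree`) are standard CUDA runtime API functions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from Q22: executes on the device, launched from host code.
__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

// Hypothetical helper: adds two ints on the GPU.
// Returns 0 on success and 1 if any CUDA call fails.
int add_on_device(int a, int b, int *result) {
    int *dev_a = NULL, *dev_b = NULL, *dev_c = NULL;
    const size_t size = sizeof(int);
    int status = 1;

    // Allocate device memory, then copy the inputs from host to device.
    if (cudaMalloc((void**)&dev_a, size) == cudaSuccess &&
        cudaMalloc((void**)&dev_b, size) == cudaSuccess &&
        cudaMalloc((void**)&dev_c, size) == cudaSuccess &&
        cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dev_b, &b, size, cudaMemcpyHostToDevice) == cudaSuccess) {

        // Launch the kernel with one block of one thread.
        add<<<1, 1>>>(dev_a, dev_b, dev_c);

        // Check the launch, then copy the result back to the host.
        // cudaMemcpy on the default stream waits for the kernel to finish.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(result, dev_c, size, cudaMemcpyDeviceToHost) == cudaSuccess) {
            status = 0;
        }
    }

    // cudaFree accepts a null pointer, so this is safe on partial failure.
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return status;
}

int main(void) {
    int c = 0;
    if (add_on_device(2, 7, &c) != 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("2 + 7 = %d\n", c);
    return 0;
}
```

Device pointers such as dev_a can be passed around on the host but must not be dereferenced there, which is why the inputs and the result cross the host/device boundary through cudaMemcpy.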
Q25 | What do triple angle brackets (<<< >>>) in a statement inside the main function indicate?
- a call from host code to device code
- a call from device code to host code
- less than comparison
- greater than comparison
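Note: the two values inside the triple angle brackets are the launch configuration, that is, the number of blocks and the number of threads per block. The sketch below is illustrative only and assumes a CUDA-capable device and nvcc. The kernel `double_elements`, the helper `blocks_for`, the array size N, and the block size of 256 are all hypothetical choices, not taken from the questions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 1000

// Each thread doubles one array element, selected by its global index.
__global__ void double_elements(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // guard: the last block may have spare threads
        data[i] *= 2;
    }
}

// Hypothetical helper: number of blocks needed to cover n elements.
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

int main(void) {
    int host[N];
    for (int i = 0; i < N; ++i) host[i] = i;

    int *dev = NULL;
    if (cudaMalloc((void**)&dev, N * sizeof(int)) != cudaSuccess) return 1;
    cudaMemcpy(dev, host, N * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = blocks_for(N, threads);

    // Host-to-device call: <<<blocks, threads>>> sets the launch configuration.
    double_elements<<<blocks, threads>>>(dev, N);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(dev);
        return 1;
    }

    cudaMemcpy(host, dev, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[10] = %d\n", host[10]);
    return 0;
}
```

Rounding the block count up and guarding with `i < n` is a common way to handle array sizes that are not a multiple of the block size.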