This set of High Performance Computing (HPC) Multiple Choice Questions & Answers (MCQs) focuses on "High Performance Computing - Set 2".

Q1 | NVIDIA CUDA Warp is made up of how many threads?
  • 512
  • 1024
  • 312
  • 32
Q2 | Out-of-order instruction execution is not possible on GPUs.
  • true
  • false
  • ---
  • ---
Q3 | CUDA supports programming in ....
  • c or c++ only
  • java, python, and more
  • c, c++, third party wrappers for java, python, and more
  • pascal
Q4 | FADD, FMAD, FMIN, FMAX are ----- supported by the Scalar Processors of an NVIDIA GPU.
  • 32-bit ieee floating point instructions
  • 32-bit integer instructions
  • both
  • none of the above
Q5 | Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors (SPs).
  • 1024
  • 128
  • 512
  • 8
Q6 | Each NVIDIA GPU has ------ Streaming Multiprocessors.
  • 8
  • 1024
  • 512
  • 16
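
The per-SM and per-GPU counts asked about in Q5 and Q6 (and the warp size in Q1) describe the original G80-class hardware; later architectures differ, so the real values are best read from the device itself. A minimal sketch, assuming the CUDA runtime API and a device 0 (compile with nvcc):

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main( void )
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties( &prop, 0 );    /* query device 0 */

        printf( "warp size: %d\n", prop.warpSize );             /* 32 on NVIDIA GPUs */
        printf( "SM count:  %d\n", prop.multiProcessorCount );  /* varies by GPU     */
        return 0;
    }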
Q7 | CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----.
  • “programming-overhead”, 2 clock
  • “zero-overhead”, 1 clock
  • 64, 2 clock
  • 32, 1 clock
Q8 | Each warp of GPU receives a single instruction and “broadcasts” it to all of its threads. It is a ---- operation.
  • simd (single instruction multiple data)
  • simt (single instruction multiple thread)
  • sisd (single instruction single data)
  • sist (single instruction single thread)
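
To make Q8 concrete, a small sketch (simt_demo is an illustrative name, not a standard kernel): all 32 threads of a warp receive the same broadcast instruction, so a data-dependent branch forces the hardware to run the two paths one after the other for the divergent threads.

    __global__ void simt_demo( int *out )
    {
        /* every thread of the warp executes this same instruction stream */
        if ( threadIdx.x % 2 == 0 )
            out[threadIdx.x] = 1;   /* even lanes take this path ...    */
        else
            out[threadIdx.x] = 2;   /* ... then odd lanes take this one */
    }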
Q9 | What are the limitations of a CUDA kernel?
  • recursion, call stack, static variable declaration
  • no recursion, no call stack, no static variable declarations
  • recursion, no call stack, static variable declaration
  • no recursion, call stack, no static variable declarations
Q10 | What is a Unified Virtual Machine?
  • it is a technique that allows both the cpu and the gpu to read from a single virtual machine, simultaneously.
  • it is a technique for managing separate host and device memory spaces.
  • it is a technique for executing device code on host and host code on device.
  • it is a technique for executing general purpose programs on device instead of host.
Q11 | _______ became the first language specifically designed by a GPU Company to facilitate general purpose computing on ____.
  • python, gpus.
  • c, cpus.
  • cuda c, gpus.
  • java, cpus.
Q12 | The CUDA architecture consists of --------- for parallel computing kernels and functions.
  • risc instruction set architecture
  • cisc instruction set architecture
  • zisc instruction set architecture
  • ptx instruction set architecture
Q13 | CUDA stands for --------, designed by NVIDIA.
  • common union discrete architecture
  • complex unidentified device architecture
  • compute unified device architecture
  • complex unstructured distributed architecture
Q14 | The host processor spawns multithreaded tasks (or kernels, as they are known in CUDA) onto the GPU device. State true or false.
  • true
  • false
  • ---
  • ---
Q15 | The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device.
  • 128, 256, 512
  • 32, 64, 128
  • 64, 128, 256
  • 256, 512, 1024
Q16 | NVIDIA 8-series GPUs offer --------.
  • 50-200 gflops
  • 200-400 gflops
  • 400-800 gflops
  • 800-1000 gflops
Q17 | IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by the Scalar Processors of an NVIDIA GPU.
  • 32-bit ieee floating point instructions
  • 32-bit integer instructions
  • both
  • none of the above
Q18 | The CUDA hardware programming model supports: a) fully general data-parallel architecture; b) general thread launch; c) global load-store; d) parallel data cache; e) scalar architecture; f) integers, bit operations
  • a,c,d,f
  • b,c,d,e
  • a,d,e,f
  • a,b,c,d,e,f
Q19 | In the CUDA memory model, the following memory types are available: a) registers; b) local memory; c) shared memory; d) global memory; e) constant memory; f) texture memory.
  • a, b, d, f
  • a, c, d, e, f
  • a, b, c, d, e, f
  • b, c, e, f
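
Most of the memory spaces listed in Q19 show up directly in kernel source. A hedged sketch (memory_demo, coeffs, and tile are illustrative names; texture memory goes through the separate texture-object API and is omitted):

    __constant__ float coeffs[16];             /* constant memory (read-only in kernels) */

    __global__ void memory_demo( float *out )  /* out points into global memory          */
    {
        __shared__ float tile[256];            /* shared memory, per thread block        */
        float x = coeffs[0];                   /* x lives in a register; register spills
                                                  land in local memory                   */
        int t = threadIdx.x;                   /* launch with at most 256 threads/block  */
        tile[t] = x;
        __syncthreads();                       /* make shared-memory writes visible      */
        out[t] = tile[t];
    }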
Q20 | What is the CUDA C equivalent of this general C program: int main( void ) { printf( "Hello, World!\n" ); return 0; }
  • int main( void ) { kernel<<<1,1>>>(); printf( "Hello, World!\n" ); return 0; }
  • __global__ void kernel( void ) { } int main( void ) { kernel<<<1,1>>>(); printf( "Hello, World!\n" ); return 0; }
  • __global__ void kernel( void ) { kernel<<<1,1>>>(); printf( "Hello, World!\n" ); return 0; }
  • __global__ int main( void ) { kernel<<<1,1>>>(); printf( "Hello, World!\n" ); return 0; }
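
Laid out as a complete file, the correct option above compiles and runs as-is with nvcc; the empty kernel does nothing, and the launch only demonstrates the host/device split:

    #include <stdio.h>

    __global__ void kernel( void ) { }    /* runs on the device         */

    int main( void )
    {
        kernel<<<1,1>>>();                /* launch 1 block of 1 thread */
        printf( "Hello, World!\n" );      /* executed by the host       */
        return 0;
    }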
Q21 | Which function runs on the device (i.e., the GPU): a) __global__ void kernel( void ) { } b) int main( void ) { ... return 0; }
  • a
  • b
  • both a,b
  • ---
Q22 | A simple kernel for adding two integers: __global__ void add( int *a, int *b, int *c ) { *c = *a + *b; } Here, __global__ is a CUDA C keyword which indicates that:
  • add() will execute on device, add() will be called from host
  • add() will execute on host, add() will be called from device
  • add() will be called and executed on host
  • add() will be called and executed on device
Q23 | If variable a is a host variable and dev_a is a device (GPU) variable, select the correct statement to allocate memory for dev_a:
  • cudaMalloc( &dev_a, sizeof( int ) )
  • malloc( &dev_a, sizeof( int ) )
  • cudaMalloc( (void**) &dev_a, sizeof( int ) )
  • malloc( (void**) &dev_a, sizeof( int ) )
Q24 | If variable a is a host variable and dev_a is a device (GPU) variable, select the correct statement to copy input from variable a to variable dev_a:
  • memcpy( dev_a, &a, size );
  • cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice );
  • memcpy( (void*) dev_a, &a, size );
  • cudaMemcpy( (void*) &dev_a, &a, size, cudaMemcpyDeviceToHost );
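
Putting Q22-Q24 together, a minimal but complete host program (error checking omitted for brevity) that allocates device memory, copies the inputs in, launches add(), and copies the result back:

    #include <stdio.h>
    #include <cuda_runtime.h>

    __global__ void add( int *a, int *b, int *c ) { *c = *a + *b; }

    int main( void )
    {
        int a = 2, b = 7, c;
        int *dev_a, *dev_b, *dev_c;

        cudaMalloc( (void**) &dev_a, sizeof( int ) );   /* allocate on the device */
        cudaMalloc( (void**) &dev_b, sizeof( int ) );
        cudaMalloc( (void**) &dev_c, sizeof( int ) );

        cudaMemcpy( dev_a, &a, sizeof( int ), cudaMemcpyHostToDevice );  /* host -> device */
        cudaMemcpy( dev_b, &b, sizeof( int ), cudaMemcpyHostToDevice );

        add<<<1,1>>>( dev_a, dev_b, dev_c );   /* called from host, executes on device */

        cudaMemcpy( &c, dev_c, sizeof( int ), cudaMemcpyDeviceToHost ); /* device -> host */
        printf( "2 + 7 = %d\n", c );

        cudaFree( dev_a ); cudaFree( dev_b ); cudaFree( dev_c );
        return 0;
    }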
Q25 | What does the triple angle brackets mark in a statement inside the main function indicate?
  • a call from host code to device code
  • a call from device code to host code
  • less than comparison
  • greater than comparison
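
The two numbers inside the triple angle brackets set the launch configuration: <<<blocks, threads-per-block>>>. A sketch (scale and dev_v are illustrative names) showing how each thread turns those values into a global index:

    __global__ void scale( float *v, float f, int n )
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   /* global thread index */
        if ( i < n )
            v[i] *= f;
    }

    /* host side: the launch below is the host-to-device call Q25 asks about;
       it starts enough 256-thread blocks to cover n elements:
           scale<<< (n + 255) / 256, 256 >>>( dev_v, 2.0f, n );              */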