CUDA Programming 1: Hints

  1. There are 16 integers in total. Each thread block has 4 threads, so your code should launch 4 thread blocks (16 / 4 = 4).
  2. You need to calculate the global index in your kernel code as “int idx = blockIdx.x*blockDim.x + threadIdx.x;”.
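
The two hints can be combined into a minimal sketch. The hints do not say what the kernel should do with the 16 integers, so this example assumes each thread simply writes its own global index into the array. The names write_global_index, fill_global_indices, N and THREADS_PER_BLOCK are made up for illustration and are not part of the assignment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 16                 // total number of integers (hint 1)
#define THREADS_PER_BLOCK 4  // threads per block (hint 1)

// Hypothetical kernel: each thread stores its own global index.
// The real assignment may compute something else per element.
__global__ void write_global_index(int *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // hint 2
    if (idx < n) {  // guard in case the grid covers more than n elements
        out[idx] = idx;
    }
}

// Hypothetical host helper: fills host_out[0..N-1] using the GPU.
// Returns 0 on success and 1 on any CUDA error.
int fill_global_indices(int *host_out)
{
    int *dev_out = NULL;
    if (cudaMalloc(&dev_out, N * sizeof(int)) != cudaSuccess) {
        return 1;
    }

    // 16 integers / 4 threads per block = 4 blocks.
    int blocks = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    write_global_index<<<blocks, THREADS_PER_BLOCK>>>(dev_out, N);

    // Check for launch errors, then copy the result back to the host.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(host_out, dev_out, N * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_out);
    return err == cudaSuccess ? 0 : 1;
}

int main(void)
{
    int host_out[N];
    if (fill_global_indices(host_out) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    for (int i = 0; i < N; ++i) {
        printf("%d ", host_out[i]);
    }
    printf("\n");
    return 0;
}
```

With 4 blocks of 4 threads, blockIdx.x runs from 0 to 3 and threadIdx.x from 0 to 3, so idx covers 0 through 15 exactly once. For example, thread 2 of block 3 gets idx = 3*4 + 2 = 14.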