CPU and/or GPU: An Evaluation of Hybrid Multicore Computing

Kishore Kothapalli, Dip Sankar Banerjee, P. J. Narayanan, Surinder Sood,
Aman Bahl, Shashank Sharma, Shrenik Lad, Krishna Kumar Singh, Kiran Matam,
Sivaramakrishna Bharadwaj, Rohit Nigam, Parikshit Sakurikar, Aditya Deshpande,
Ishan Misra, Siddharth Choudhary, Shubham Gupta
International Institute of Information Technology, Hyderabad
Gachibowli, Hyderabad 500 032, India.

Abstract

Parallel computing using accelerators has gained widespread research attention in the past few years. In particular, using GPUs for general purpose computing has brought forth several success stories with respect to time taken, cost, power, and other metrics. However, accelerator based computing has significantly relegated the role of CPUs in computation. As CPUs evolve and offer matching computational resources, it is important to also include CPUs in the computation. We refer to such a model as hybrid computing.

In this paper we evaluate the case for hybrid multicore computing by experimenting with a set of 13 diverse workloads. The workloads we consider span databases, image processing, sparse matrix kernels, and graph analysis. On a high end hybrid platform consisting of a 6-core Intel i7-980X CPU and an NVIDIA GTX 280 GPU, our hybrid solutions offer an average of 35% speed-up compared to pure GPU solutions. We then experiment with a more democratic hybrid platform, one that closely resembles the computing platform most users are likely to have at their disposal. On such a platform, consisting of an Intel dual core CPU and an NVIDIA GT 520 GPU, we show that our hybrid solutions offer an average of 38% advantage compared to pure GPU solutions.

Our work therefore suggests that hybrid computing can offer tremendous advantages at realistic system scales, bringing significant performance gains and power efficiency to the large scale user community.
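The hybrid model described above amounts to partitioning each workload between the CPU and the GPU in proportion to their relative throughputs. The sketch below is purely illustrative and not taken from the paper: it splits an array operation at a fraction `alpha` (a hypothetical tuning parameter) and runs the two portions concurrently, with a plain Python function standing in for what would be a GPU kernel launch in a real implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def cpu_portion(xs):
    # Work assigned to the CPU cores.
    return [x * x for x in xs]

def gpu_portion(xs):
    # Stand-in for an asynchronous GPU kernel launch; in a real hybrid
    # implementation this would copy data to the device, launch the
    # kernel, and copy results back.
    return [x * x for x in xs]

def hybrid_square(data, alpha=0.3):
    """Split `data` at fraction `alpha`: the first part goes to the CPU,
    the rest to the accelerator; both run concurrently and the results
    are merged in order."""
    k = int(len(data) * alpha)
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_future = pool.submit(cpu_portion, data[:k])
        gpu_future = pool.submit(gpu_portion, data[k:])
        return cpu_future.result() + gpu_future.result()
```

In practice, `alpha` would be chosen per workload (e.g., by profiling each device on a small sample), which is the kind of tuning the evaluated hybrid solutions perform.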

The paper details the experience of implementing hybrid solutions for 13 workloads from diverse application areas. Since the submission could not accommodate all the implementation details, the following items provide links to papers/technical reports that contain the details.

  1. Monte Carlo: The results used in the IPDPS submission appear in the paper titled "An On-Demand Fast Parallel Pseudo Random Number Generator with Applications", to appear in Proc. of the Workshop on Large Scale Parallel Processing (LSPP), 2012, in conjunction with IPDPS 2012. The paper can be accessed here.
  2. List Ranking: The results used in the IPDPS submission appear in the paper titled "Hybrid Algorithms for List Ranking and Graph Connected Components", in the Proc. of 18th Annual International Conference on High Performance Computing (HiPC), Bangalore, India, 2011, and partly in another paper titled "An On-Demand Fast Parallel Pseudo Random Number Generator with Applications", to appear in Proc. of the Workshop on Large Scale Parallel Processing (LSPP), 2012, in conjunction with IPDPS 2012. The paper can be accessed here.
  3. Connected Components: The results used in the IPDPS submission appear in the paper titled "Hybrid Algorithms for List Ranking and Graph Connected Components", in the Proc. of 18th Annual International Conference on High Performance Computing (HiPC), Bangalore, India, 2011.
  4. Image Dithering: The results used in the IPDPS submission appear in the paper titled "Hybrid Implementation of Error Diffusion Dithering", in the Proc. of 18th Annual International Conference on High Performance Computing (HiPC), Bangalore, India, 2011.
  5. Bundle Adjustment : The results used in the IPDPS submission appear in the paper titled "Practical Time Bundle Adjustment for 3D Reconstruction on GPU", in the Proc. of the ECCV 2010 Workshop on Computer Vision on GPUs (CVGPU 2010).
  6. Sparse Matrix-Matrix Multiplication (sgemm): The results used in the IPDPS submission appear in the paper titled "Sparse Matrix Matrix Multiplication on Hybrid CPU+GPU Platforms".
  7. Convolution: A report that details the implementation of the convolution workload that we used in the IPDPS submission is available here. The report is as yet unpublished.
  8. Bilateral Transforms: A report that details the implementation of the bilateral transform workload that we used in the IPDPS submission is available here. The report is as yet unpublished.
  9. spmv: A report that details the implementation of the spmv workload that we used in the IPDPS submission is available here. The report is as yet unpublished.
  10. Ray Tracing: A report that details the implementation of the ray tracing workload that we used in the IPDPS submission is available here. The report is as yet unpublished.
  11. Lattice Boltzmann: A report that details the implementation of the Lattice Boltzmann workload that we used in the IPDPS submission is available here. The report is as yet unpublished.