Object detection is one of the most important tasks in computer vision. Although researchers have proposed many efficient object detection methods, deploying and applying these methods from an engineering perspective remains a challenge. We propose an efficient object detection pipeline implemented on the GPU using CUDA. The pipeline offers an end-to-end solution that takes images as input and outputs bounding boxes marking the detected objects. We use PVANet as the object detection method, followed by a non-maximum suppression (NMS) algorithm that eliminates redundant bounding boxes. We focus on minimizing program latency and footprint by applying various CUDA optimization techniques. We also propose a new method of implementing the NMS algorithm that improves its efficiency on the GPU.
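For readers unfamiliar with NMS, the following is a minimal CPU reference sketch of the standard greedy algorithm, written in host-side C++. It is not the GPU method proposed here; the `Box` layout, the function names `iou` and `nms`, and the threshold parameter are assumptions introduced only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical box layout (an assumption, not taken from the pipeline):
// corner coordinates plus a confidence score.
struct Box {
    float x1, y1, x2, y2;
    float score;
};

// Intersection over union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Standard greedy NMS: visit boxes in descending score order, keep each
// box that has not been suppressed, and suppress every lower-scored box
// whose IoU with it exceeds iou_threshold. Returns indices of kept boxes.
std::vector<std::size_t> nms(const std::vector<Box>& boxes, float iou_threshold) {
    std::vector<std::size_t> order(boxes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&boxes](std::size_t i, std::size_t j) {
                         return boxes[i].score > boxes[j].score;
                     });

    std::vector<bool> suppressed(boxes.size(), false);
    std::vector<std::size_t> keep;
    for (std::size_t a = 0; a < order.size(); ++a) {
        const std::size_t i = order[a];
        if (suppressed[i]) continue;
        keep.push_back(i);
        for (std::size_t b = a + 1; b < order.size(); ++b) {
            const std::size_t j = order[b];
            if (!suppressed[j] && iou(boxes[i], boxes[j]) > iou_threshold) {
                suppressed[j] = true;
            }
        }
    }
    return keep;
}
```

The pairwise IoU comparisons in the inner loop are independent of one another, which is the part a GPU implementation would typically parallelize; the sequential keep-or-suppress decision in the outer loop is the part that is harder to map onto the GPU.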