Abstract: Stereo imaging is a powerful technique for determining the distance to objects using a pair of cameras spaced apart. The extremely high computational requirements of stereo vision restrict it to non-real-time applications where high computing power is available. To overcome this limitation, we apply a general strategy for parallelizing dense cost functions on the Compute Unified Device Architecture (CUDA) with a graphics processing unit (GPU), particularly for pervasive environments. The challenges of mapping a sequential stereo matching algorithm to a massively parallel thread environment are considered. Compared to its CPU counterpart, the CUDA-based stereo matching algorithm runs about 107 to 369 times faster.
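As a rough illustration of why a dense cost function lends itself to parallelization, the minimal CPU sketch below computes a disparity for each pixel independently of every other pixel. It is not the CUDA implementation summarized above. The function names `sad_cost` and `disparity_map`, the sum-of-absolute-differences cost, the window radius, and the border clamping are all assumptions chosen for illustration.

```python
def sad_cost(left, right, x, y, d, radius=1):
    """Sum of absolute differences between the window centred on (x, y)
    in the left image and the window shifted d pixels to the left in the
    right image. Images are lists of rows of grey values (hypothetical
    layout). Coordinates outside the image are clamped to the border."""
    h, w = len(left), len(left[0])
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy = min(max(y + dy, 0), h - 1)
            xl = min(max(x + dx, 0), w - 1)
            xr = min(max(x + dx - d, 0), w - 1)
            total += abs(left[yy][xl] - right[yy][xr])
    return total


def disparity_map(left, right, max_disp, radius=1):
    """Pick, for every pixel, the disparity in 0..max_disp with the lowest
    cost. No pixel depends on the result of another, so on a GPU each
    (x, y) could be assigned to its own thread (an assumed mapping)."""
    h, w = len(left), len(left[0])
    return [
        [
            min(range(max_disp + 1),
                key=lambda d: sad_cost(left, right, x, y, d, radius))
            for x in range(w)
        ]
        for y in range(h)
    ]
```

Calling `disparity_map(left, right, max_disp)` on two equally sized images returns one disparity per pixel. The nested loops over `x` and `y` are the part that a massively parallel thread environment would replace with one thread per pixel.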