Traffic forecasting has emerged as a core component of intelligent transportation systems and is crucial for public safety and resource optimization. Traffic data can be modelled as spatio-temporal data. Predicting such data is hindered by the uncertainty inherent in time series, by the diversity of data patterns, and by the difficulty of capturing and accommodating spatial dynamics, all of which cause inconsistent performance. Most recent traffic prediction works are based on deep learning models, including CNN, RNN, encoder-decoder, graph-based, and transformer architectures. These approaches harness spatial and temporal features for prediction but fail to model spatial and temporal dynamics jointly while retaining good generalization and high model capacity. In this work, we propose TrapNet, which combines convolution and a transformer to obtain better generalization and higher model capacity. Convolution captures the spatial dynamics by modelling spatial features, and self-attention in the transformer captures the temporal dynamics by modelling temporal features. TrapNet is trained and evaluated on the PEMS-BAY traffic dataset and compared with existing machine learning and deep learning techniques. The proposed model improves accuracy over the best baselines by 1.51%, 1.23%, and 2.19% in long-term traffic prediction.
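The division of labour described above, convolution for spatial features and self-attention for temporal features, can be illustrated with a minimal NumPy sketch. This is not the TrapNet architecture, whose layers and hyperparameters are not specified here. It assumes, purely for illustration, that sensors are ordered along a road so that a 1-D convolution over the sensor axis is meaningful. The function names, the smoothing kernel, the random projection weights, and the toy dimensions are all hypothetical.

```python
import numpy as np

def spatial_conv(x, kernel):
    """Slide a 1-D kernel across the sensor axis at every time step.

    x: (T, N) array of readings (T time steps, N sensors).
    kernel: (k,) array with odd k. Edge padding keeps the output at (T, N).
    """
    pad = len(kernel) // 2
    padded = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    out = np.zeros_like(x, dtype=float)
    for offset, w in enumerate(kernel):
        out += w * padded[:, offset:offset + x.shape[1]]
    return out

def temporal_self_attention(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the time axis.

    h: (T, N) features, one row per time step.
    w_q, w_k, w_v: (N, d) projection matrices.
    Returns the attended features (T, d) and the attention weights (T, T).
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

# Toy data: 12 time steps, 6 sensors, attention width 4 (all assumed values).
rng = np.random.default_rng(0)
T, N, d = 12, 6, 4
x = rng.normal(size=(T, N))
kernel = np.array([0.25, 0.5, 0.25])  # hypothetical smoothing kernel

h = spatial_conv(x, kernel)  # mixes information between neighbouring sensors
w_q, w_k, w_v = (rng.normal(size=(N, d)) for _ in range(3))
out, attn = temporal_self_attention(h, w_q, w_k, w_v)  # mixes across time steps
```

Each row of `attn` indicates how strongly one time step draws on every other time step, while the convolution has already blended readings from adjacent sensors. In a trained model the kernel and projection matrices would be learned parameters, not fixed or random values.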