Machine learning (ML) tools are a powerful means of predicting culvert conditions by learning from existing data records. However, the data available for this purpose are usually limited in both quantity and quality. This work illustrates essential data preparation techniques, including data wrangling and feature engineering, that enhance the performance and reliability of ML models across a variety of commonly used ML algorithms. It addresses the niche of culvert records that are insufficiently managed because of limited resources, applying a series of data preparation techniques that improve the models’ prediction efficiency and performance. The study focuses on classification problems. A rich literature is available on the technical and theoretical practice of data wrangling and feature engineering. In the environmental engineering field, however, a knowledge gap remains that may limit the reliability of ML models applied to pipeline condition prediction. This gap can lead to low trust from utilities, or to unsuitable (inaccurate or inappropriate) maintenance scheduling caused by overly optimistic prediction results. This study visually illustrates each step of data preparation to give practitioners better transparency into, and understanding of, the ML models, and therefore greater trust in them.