Using Machine Learning to Predict Freight Mode Choice
Understanding how freight mode decisions are made is essential for developing effective transport policies and forecasting future freight demand. Compared with traditional Multinomial Logit (MNL) models, machine learning methods can offer better predictive performance because they can capture more complex, nonlinear relationships between the factors that influence mode choice.
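For reference, a standard MNL model gives each mode i a utility V_i, usually linear in its attributes, and turns utilities into choice probabilities with P(i) = exp(V_i) / Σ_j exp(V_j). The sketch below is a generic textbook formulation, not the model used in the project. The helper names `linear_utility` and `mnl_probabilities`, the coefficients, and the attribute values are all hypothetical.

```python
import math


def linear_utility(asc, beta_cost, beta_time, cost, time):
    """Linear-in-parameters utility: alternative-specific constant plus weighted attributes."""
    return asc + beta_cost * cost + beta_time * time


def mnl_probabilities(utilities):
    """Multinomial logit probabilities P(i) = exp(V_i) / sum_j exp(V_j)."""
    # Subtracting the largest utility avoids overflow and leaves the probabilities unchanged.
    v_max = max(utilities.values())
    exp_v = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(exp_v.values())
    return {mode: e / total for mode, e in exp_v.items()}


if __name__ == "__main__":
    # Hypothetical coefficients (shared cost and time sensitivities, mode-specific constants)
    # and hypothetical attributes for a single origin-destination pair.
    utilities = {
        "road": linear_utility(0.0, -0.02, -0.05, cost=100, time=10),
        "rail": linear_utility(-0.5, -0.02, -0.05, cost=80, time=20),
        "inland_waterway": linear_utility(-1.0, -0.02, -0.05, cost=60, time=40),
    }
    for mode, p in mnl_probabilities(utilities).items():
        print(f"{mode}: {p:.3f}")
```

Because each attribute enters the utility additively, nonlinear effects or interactions (for example, cost mattering differently over long distances) have to be specified by the analyst. Tree-based methods such as Random Forest and XGBoost can instead learn such patterns from the data.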
At Panteia, a recent project explored the use of machine learning to model freight mode choice in the European Union. More accurate mode choice models can lead to better freight demand forecasts and, ultimately, better-informed policy recommendations for managing freight transport and shifting goods to more sustainable modes.
Three machine learning algorithms (logistic regression, Random Forest, and XGBoost) were trained on aggregated EU freight flow data representing tonnes transported by road, rail, and inland waterway between regions, with explanatory factors such as cost, distance, commodity type, and regional characteristics. The models achieved overall accuracies between 89% and 92%. Performance was strongest for road, the majority mode, and weaker for the less frequent rail and inland waterway classes, a common challenge with imbalanced datasets. The imbalance shows in the per-class F1 scores (the harmonic mean of precision and recall), which ranged from 0.59 to 0.61 for inland waterway, 0.61 to 0.75 for rail, and 0.94 to 0.96 for road.
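To illustrate how high overall accuracy can coexist with low F1 scores for minority classes, here is a minimal sketch that computes accuracy and one-vs-rest precision, recall, and F1 from plain label lists. The labels are hypothetical toy data (100 observations, mostly road), not the study's data or results. The function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Share of observations whose predicted mode equals the observed mode."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, classes):
    """One-vs-rest precision, recall, and F1 for each class."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical toy labels: 80 road, 12 rail, 8 inland waterway observations.
y_true = ["road"] * 80 + ["rail"] * 12 + ["inland_waterway"] * 8
y_pred = (
    ["road"] * 78 + ["rail"] + ["inland_waterway"]  # road: 2 misclassified
    + ["rail"] * 8 + ["road"] * 4  # rail: 4 predicted as road
    + ["inland_waterway"] * 4 + ["road"] * 3 + ["rail"]  # inland waterway: 4 misclassified
)

if __name__ == "__main__":
    print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
    for mode, s in per_class_scores(y_true, y_pred, ["road", "rail", "inland_waterway"]).items():
        print(f"{mode}: precision={s['precision']:.2f} recall={s['recall']:.2f} f1={s['f1']:.2f}")
```

With these toy labels the accuracy is 0.90, yet F1 is roughly 0.95 for road, 0.73 for rail, and 0.62 for inland waterway. A handful of errors on the small classes barely moves overall accuracy but weighs heavily in those classes' own F1 scores. In practice, scikit-learn's `classification_report` reports the same per-class metrics.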
Previous machine learning studies of freight mode choice have used only disaggregate, shipment-level data, which contains detailed information that these methods can exploit for high predictive accuracy. This research extends current knowledge by showing that models trained on aggregated data can also produce meaningful results.
Future work could focus on improving data quality (for example, by including estimates for regions with sparse data), incorporating additional explanatory variables, and exploring hybrid modeling approaches that combine the interpretability of MNL with the predictive power of machine learning. With these developments, machine learning models could be used in scenario analyses to evaluate the potential impacts of new transport policies, such as infrastructure investments or road pricing measures.
This research was carried out as part of a master’s thesis, which can be found here.