Accurate predictions of crop yields at the farm level are essential for improving agricultural productivity, enhancing food security, and supporting informed decision-making among smallholder farmers. However, conventional field assessments and simple statistical models are often time-consuming, limited in scope, and unable to capture complex interactions among climatic and soil factors. To address these challenges, this paper proposes a machine learning-based model for predicting the productivity of multiple crops (maize, rice, and beans) using multi-source farm-level data from Tanzania. The dataset combines climate variables (such as temperature and rainfall) with soil type, farm size, and crop type. Four ensemble learning models, namely Random Forest, Gradient Boosting, Extreme Gradient Boosting, and Extra Trees, were evaluated using an 80/20 train–test split on 9,897 farm-level records collected from the Mbeya, Ruvuma, and Songwe regions between 2022 and 2024. Hyperparameter tuning with fivefold cross-validation was applied to improve model generalization and reduce overfitting. Among the evaluated models, the Extra Trees ensemble achieved the highest performance, with a pooled multi-crop R² of 95%, while crop-specific R² values ranged from 79% to 81% for maize, rice, and beans. These findings demonstrate the potential of the proposed approach to support farm-level cultivation planning and climate adaptation decisions for smallholder farmers.
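
The evaluation protocol described in the abstract (an 80/20 train–test split, fivefold cross-validated hyperparameter tuning, then pooled and crop-specific R²) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' code: the data are synthetic, the column names (`temperature_c`, `rainfall_mm`, `farm_size_ha`, `soil_type`, `crop_type`, `yield_t_ha`) and the small parameter grids are assumptions, and Extreme Gradient Boosting is left out only to avoid an extra dependency.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the farm-level records (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "temperature_c": rng.normal(24, 3, n),
    "rainfall_mm": rng.normal(900, 200, n),
    "farm_size_ha": rng.uniform(0.2, 5.0, n),
    "soil_type": rng.choice(["clay", "loam", "sandy"], n),
    "crop_type": rng.choice(["maize", "rice", "beans"], n),
})
crop_base = df["crop_type"].map({"maize": 2.0, "rice": 3.5, "beans": 1.0})
df["yield_t_ha"] = (crop_base + 0.001 * df["rainfall_mm"]
                    - 0.02 * (df["temperature_c"] - 24) ** 2
                    + rng.normal(0, 0.3, n))

X = df.drop(columns="yield_t_ha")
y = df["yield_t_ha"]

# 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encode = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["soil_type", "crop_type"])],
    remainder="passthrough")

# Candidate models with small, illustrative hyperparameter grids.
candidates = {
    "random_forest": (RandomForestRegressor(n_estimators=100, random_state=0),
                      {"model__min_samples_leaf": [1, 3]}),
    "gradient_boosting": (GradientBoostingRegressor(random_state=0),
                          {"model__max_depth": [2, 3]}),
    "extra_trees": (ExtraTreesRegressor(n_estimators=100, random_state=0),
                    {"model__min_samples_leaf": [1, 3]}),
}

results = {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([("encode", encode), ("model", model)])
    # Fivefold cross-validation on the training split selects the hyperparameters.
    search = GridSearchCV(pipe, grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    pred = search.predict(X_test)

    # Pooled R² over all crops, then R² within each crop separately.
    per_crop = {}
    for crop in ["maize", "rice", "beans"]:
        mask = (X_test["crop_type"] == crop).to_numpy()
        per_crop[crop] = r2_score(y_test[mask], pred[mask])
    results[name] = {"pooled": r2_score(y_test, pred), "per_crop": per_crop}

for name, scores in results.items():
    print(name, round(scores["pooled"], 3),
          {c: round(v, 3) for c, v in scores["per_crop"].items()})
```

Reporting both scores matters because they measure different things: the pooled R² is computed against the variance of yields across all crops, which includes differences between crops, whereas each crop-specific R² is computed only against the variance within that crop. A pooled figure above the per-crop figures, as reported in the abstract, is therefore not contradictory.
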
| Publisher | Elsevier |
| Geographic coverage | Tanzania |
| Originally published | 13 Mar 2026 |
| Knowledge service / Metadata | Global Food and Nutrition Security; Research and Innovation; Smallholder farmer |
| Digital Europa Thesaurus (DET) | Forecasting, Farm, cereals, Rice, leguminous vegetable, Agriculture, machine learning, Data, crop production, climate change, Crop yield |