Knowledge4Policy

  • Publication | 2026
Farm-level yield prediction for maize, rice, and beans in Tanzania using machine learning and multi-source agricultural data

Accurate predictions of crop yields at the farm level are essential for improving agricultural productivity, enhancing food security, and supporting informed decision-making among smallholder farmers. However, conventional field assessments and simple statistical models are often time-consuming, limited in scope, and unable to capture complex interactions among climatic and soil factors. To address these challenges, this paper proposes a machine learning-based model for predicting the yield of multiple crops (maize, rice, and beans) using multi-source farm-level data from Tanzania. The dataset integrates climate variables such as temperature and rainfall with soil type, farm size, and crop type. Four ensemble learning models, namely Random Forest, Gradient Boosting, Extreme Gradient Boosting, and Extra Trees, were evaluated using an 80/20 train–test split on 9,897 farm-level records collected in the Mbeya, Ruvuma, and Songwe regions between 2022 and 2024. Hyperparameter tuning with fivefold cross-validation was applied to improve model generalization and reduce overfitting. Among the evaluated models, the Extra Trees ensemble achieved the highest performance, with a pooled multi-crop R² of 0.95, while crop-specific R² values ranged from 0.79 to 0.81 for maize, rice, and beans. These findings demonstrate the potential of the proposed approach to support farm-level cultivation planning and climate adaptation decisions for smallholder farmers.
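
The abstract reports a pooled multi-crop R² that is noticeably higher than the crop-specific values. One plausible reading, which the abstract itself does not state, is that a pooled score also credits the model for separating crops with different typical yields, whereas a crop-specific score only measures how well within-crop variation is explained. The sketch below illustrates that general property of R² on synthetic data. The per-crop mean yields, the noise levels, and the helper `r_squared` are assumptions made for illustration and are not taken from the paper.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical mean yields per crop (illustrative values, not from the paper).
crop_means = {"maize": 1.5, "rice": 3.0, "beans": 0.8}

y_true, y_pred, crops = [], [], []
for crop, mean in crop_means.items():
    for _ in range(500):
        signal = rng.gauss(0.0, 0.3)   # within-crop variation the model captures
        noise = rng.gauss(0.0, 0.15)   # within-crop variation the model misses
        y_true.append(mean + signal + noise)
        y_pred.append(mean + signal)
        crops.append(crop)

# Pooled score: all crops evaluated together.
pooled_r2 = r_squared(y_true, y_pred)

# Crop-specific scores: each crop evaluated on its own records only.
per_crop_r2 = {}
for crop in crop_means:
    idx = [i for i, c in enumerate(crops) if c == crop]
    per_crop_r2[crop] = r_squared(
        [y_true[i] for i in idx], [y_pred[i] for i in idx]
    )

print(f"pooled R2: {pooled_r2:.3f}")
for crop, score in per_crop_r2.items():
    print(f"{crop} R2: {score:.3f}")
```

In this synthetic setting the prediction errors are identical in both views; only the reference variance changes. The pooled total variance includes the spread between crop means, so the pooled R² comes out higher than any of the crop-specific values. Whether the same mechanism accounts for the gap reported in the paper would need to be checked against the full text.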