New machine learning model offers simple solution to crop yield predictions

Published: October 2, 2024

A new machine-learning model that predicts crop yield from environmental data and genetic information could lead to the development of higher-performing crop varieties.

Igor Fernandes, a statistics and analytics master’s student at the University of Arkansas, entered agriculture studies with a data science background and some exposure to agronomy as an undergraduate assistant for Embrapa, the Brazilian Agricultural Research Corporation.

He developed a novel approach to forecast how crop varieties will perform in the field.

Sam Fernandes, Igor’s advisor and an assistant professor of agricultural statistics and quantitative genetics with the Arkansas Agricultural Experiment Station, said Igor had an idea “that was not at all what we would use in genetics, and it was just surprising that it worked well.”

Igor Fernandes’ model, which focused on environmental data, earned him second place in this year’s international Genomes to Fields competition.

The competition entry showed environmental data alone worked better than expected at predicting crop yield, and researchers saw an opportunity to build a comprehensive study that compared the novel approach to established prediction models used in genomic breeding.

Genomic breeding, a process of screening thousands of candidates for field trials based on DNA alone, can save time and resources needed to develop a new plant variety. An important part of genomic breeding involves genomic prediction to estimate a plant’s yield using its DNA.

Adding information to the model on how that plant would interact with environmental conditions increases the accuracy of the genomic prediction. The practice is called enviromics. Still, there is no consensus on the best machine-learning approach to combine environmental and genetic data.

The study used the same data on corn plots from the Genomes to Fields Initiative that were used in the competition, but the researchers set up the inputs as genetic data alone, environmental data alone, or a combination of both, combined in either an “additive” or a “multiplicative” manner. Including environmental and genetic data in the more straightforward “additive” manner gave better prediction accuracy than the more complicated “multiplicative” manner.
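
For readers who want a concrete picture of the difference, here is a minimal Python sketch of one common way to read “additive” versus “multiplicative.” It is an illustration only, not the researchers’ code: the function names and the toy numbers are hypothetical, and the study may have built its inputs differently. In the additive case the genetic and environmental values simply sit side by side, while in the multiplicative case every genetic value is multiplied by every environmental value, so the number of inputs grows much faster.

```python
from itertools import product


def additive_features(genetic, environmental):
    """Place the genetic and environmental values side by side."""
    return list(genetic) + list(environmental)


def multiplicative_features(genetic, environmental):
    """Multiply every genetic value by every environmental value."""
    return [g * e for g, e in product(genetic, environmental)]


# Hypothetical toy inputs: three genetic marker scores and
# two environmental summaries for a single field plot.
genetic = [0.0, 1.0, 2.0]
environmental = [21.5, 3.0]

additive = additive_features(genetic, environmental)
multiplicative = multiplicative_features(genetic, environmental)

print(len(additive))        # 3 + 2 inputs
print(len(multiplicative))  # 3 * 2 inputs
```

With thousands of genetic markers and dozens of environmental variables, the gap between adding and multiplying the two counts becomes very large, which is one plausible reason a simpler combination would be quicker to compute.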

The simpler model took less time for the computer to process, and the mean prediction accuracy improved seven per cent over the established model. The experiment was validated in three scenarios typically encountered in plant breeding.

“One of the unique things that Igor did is how he processed the environmental data,” Sam Fernandes said. “There are fancier models that people can throw in all sorts of information. But what Igor did is a simple yet efficient way of combining the genetic and environmental data using feature engineering to process the information and get a summary of variables that is more informative.”
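
The article does not spell out which summary variables were used, so the following Python sketch only illustrates the general idea of feature engineering on environmental data. The function name, the choice of mean and maximum, and the temperature readings are all assumptions made for the example. It takes a series of daily readings, splits it into a few periods of the season, and keeps a couple of summary numbers per period instead of every daily value.

```python
from statistics import mean


def summarize_weather(daily_values, n_periods):
    """Split a daily series into n_periods windows and summarize each one."""
    size = len(daily_values) // n_periods
    summaries = {}
    for i in range(n_periods):
        start = i * size
        # The last window absorbs any leftover days.
        end = len(daily_values) if i == n_periods - 1 else start + size
        window = daily_values[start:end]
        summaries[f"period{i + 1}_mean"] = mean(window)
        summaries[f"period{i + 1}_max"] = max(window)
    return summaries


# Hypothetical daily temperatures (degrees Celsius) for one location.
daily_temp_c = [18, 20, 22, 24, 26, 28, 30, 29]

features = summarize_weather(daily_temp_c, n_periods=2)
print(features)
```

Eight daily readings become four summary values here; over a full growing season the same step could reduce hundreds of readings to a short list of variables that a model can use directly.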

Collectively, the researchers say the results are promising, especially with increasing interest in combining environmental features and genetic data for prediction purposes. Their immediate goal is to apply the approach to improve the screening of genotypes for field trials.
