The statistical analyses on the following pages are an attempt to take us beyond the realm of any single grower's anecdotal experience or subjective perception, and to add some scientific rigor to the analysis of the significant influences on crop yield. There are many fields or variables contained in the YOR database, and we are applying powerful statistical analytic techniques to try to tease apart these possible influences on crop yield to determine what really matters and what doesn't.
"So in the early rounds of analysis, we do simple correlations between one predictor at a time and crop yield, to see if the predictor seems to have any influence on crop yield. We do a large number of these correlations, one for each predictor, and remove from future consideration any variable that appears to have only a very weak correlation with crop yield."
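For anyone curious what that screening step might look like in practice, here is a rough Python sketch. It is not the authors' code: the function name `screen_predictors`, the 0.1 cutoff, the column names, and the synthetic data are all made up for illustration, since the quoted passage does not say what counts as "very weak" or which tools were used.

```python
import numpy as np

def screen_predictors(X, y, names, threshold=0.1):
    """Correlate each predictor with y, one at a time.

    Keeps a predictor only if |Pearson r| >= threshold.
    The threshold is an assumed value, not one taken from the report.
    """
    kept = {}
    for j, name in enumerate(names):
        # Pearson correlation between column j and the outcome
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= threshold:
            kept[name] = r
    return kept

# Synthetic stand-in data (hypothetical; NOT from the YOR database)
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
names = ["predictor_a", "predictor_b", "noise_only"]

for name, r in screen_predictors(X, y, names).items():
    print(f"{name}: r = {r:.3f}")
```

One caveat with screening one variable at a time: a predictor that only matters in combination with another can look weak by itself and get dropped early.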
Man, do these guys know their math. I can't believe how much work it must have been to make this.
Here we can see that there is a fairly good correspondence between predicted and actual weight, indicating that the model is a good one. As the footnote in the graph indicates, the overall correlation between predicted and actual weight is .737 (compared with a maximum theoretical coefficient of 1.00). If we square this number and then do some adjustments for our sample size, we arrive at a value of about .53. This means that we are explaining about 53% of the database's variation in crop yield using just these four predictors. And while some people might think this isn't a very good model, it actually is. In fact, it is rarely possible to explain anywhere near all of the variation in data such as those we are working with here. There are undoubtedly many subtle influences not captured in the YOR database. In addition, there is probably some error in the reporting of the data by growers.
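The arithmetic behind the .53 can be sketched as follows. Squaring .737 gives about .543. The "adjustments for our sample size" are presumably the standard adjusted R-squared formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), though the passage does not name it. The number of predictors (p = 4) comes from the text, but the sample size is not stated, so `n_hypothetical` below is only a placeholder and not a figure from the report.

```python
def adjusted_r_squared(r, n, p):
    """Adjusted R^2 from a multiple correlation r, sample size n, and p predictors."""
    if n <= p + 1:
        raise ValueError("n must be greater than p + 1")
    r2 = r ** 2
    # Shrink R^2 to account for the number of predictors relative to n
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r = 0.737             # correlation between predicted and actual weight (from the text)
p = 4                 # number of predictors (from the text)
n_hypothetical = 150  # ASSUMED sample size; the real n is not given in the passage

print(f"R^2          = {r ** 2:.3f}")
print(f"adjusted R^2 = {adjusted_r_squared(r, n_hypothetical, p):.3f}")
```

The adjustment shrinks as n grows, so with a large database the adjusted value stays close to the plain squared correlation.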