ANOVA redefined – a fresh perspective

Analysis of variance, or ANOVA for short, is a classic topic in statistical data analysis. Has everything about it already been written? Perhaps. However, here we’ll focus on illustrating various concepts associated with it using appropriate graphs and diagrams. After all, a picture is worth a thousand words…
In ANOVA, we test whether factors have a significant effect on a numerical quantity that can vary continuously, which we call the dependent variable. We assume that no factor under investigation behaves like the dependent variable; in other words, each factor is categorical and takes one of only a few possible values (levels). The level a factor assumes either affects the average level of the dependent variable or does not, and if it does not, we call the factor insignificant. The origins of ANOVA are typically traced back to field and agricultural experiments conducted over a century ago, and we often associate its development with this gentleman. Does anyone still recognize him?

Let’s look at an example question that ANOVA can answer.
Example Question:
How do irrigation level, soil type, and fertilization method affect potato yield, expressed in quintals per hectare?

 Diagram showing the influence of various factors (irrigation, soil type, fertilization) on potato growth and development, in the context of yield.

Here we have three factors, and let’s say each combination of their values, such as:

medium irrigation + clay soil + fertilizer B

is applied to a certain number of separate plots. Besides these known factors, the yield is influenced by numerous other elements that are essentially beyond our control, and their combined effect is treated as random.

Whether in this or another scenario, the total variability of the dependent variable – its variance – is what we aim to break down into components originating from different sources. Hence the name “analysis of variance.” For potatoes, this might look as follows:

Pie chart showing the percentage influence of various factors (irrigation, soil, fertilization, randomness) on yield, with 'Randomness' as the dominant variable.
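
The breakdown of variability described above can be sketched in a few lines of Python. This is a minimal illustration for a single factor (fertilizer) only, not the full three-factor analysis; the yields, the `yields` dictionary, and the function name `decompose_variability` are invented for this example.

```python
from statistics import mean

# Hypothetical yields (quintals per hectare) on four plots per fertilizer.
# The numbers are invented for illustration only.
yields = {
    "A": [250.0, 262.0, 247.0, 255.0],
    "B": [281.0, 290.0, 276.0, 285.0],
    "C": [260.0, 258.0, 270.0, 264.0],
}


def decompose_variability(groups):
    """Split the total sum of squares into between-group and within-group parts."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    # Total variability of the dependent variable around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    # Part attributable to the factor: group means around the grand mean.
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Part treated as random: observations around their own group mean.
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    return ss_total, ss_between, ss_within


ss_total, ss_between, ss_within = decompose_variability(yields)

# F statistic of the one-way layout: ratio of the two mean squares.
k = len(yields)
n_total = sum(len(ys) for ys in yields.values())
f_statistic = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(f"share attributed to fertilizer: {ss_between / ss_total:.1%}")
print(f"share left to randomness:       {ss_within / ss_total:.1%}")
print(f"F statistic: {f_statistic:.2f}")
```

The two shares always add up to 100%, which is what a pie chart like the one above presents. Turning the F statistic into a p-value requires the F distribution, which is left out here to keep the sketch free of third-party dependencies.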

Naturally, we also want to determine the average level of the dependent variable for each factor level, provided the factor turns out to be significant. For instance, it could be the average yield for each type of fertilizer, considered either alone or in combination with soil type. ANOVA provides answers to all of this, as long as relevant assumptions are met and the correct relationships between factors are identified.

As shown in the above pie chart, such a relationship often appears as an interaction, where the effect of one factor depends on the level of another factor. Evidence of an interaction is “significant non-parallelism” in how the average values change across levels, a concept that becomes clearer when viewing the following graph:

Line chart showing the interaction between three soil types (I, II, III) and four fertilizer types (A, B, C, D) on yield, with visible confidence intervals.

If there were no interaction, the lines above would be approximately parallel.
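
One way to put a number on non-parallelism is to compute, for each soil and fertilizer combination, how far the cell mean departs from what the two factors would predict additively. The sketch below assumes a balanced design and uses invented cell means; the names `cell_means` and `interaction_effects` are hypothetical. It is only descriptive: deciding whether the departures are significant still requires the ANOVA test itself.

```python
from statistics import mean

# Hypothetical mean yields (q/ha) by soil type and fertilizer; invented numbers.
# Soils I and II differ by a constant 10, so their lines are parallel;
# soil III breaks the pattern.
cell_means = {
    "I": {"A": 250.0, "B": 270.0, "C": 260.0},
    "II": {"A": 240.0, "B": 260.0, "C": 250.0},
    "III": {"A": 230.0, "B": 275.0, "C": 225.0},
}


def interaction_effects(cells):
    """Return cell mean - soil mean - fertilizer mean + grand mean for each cell.

    Assumes a balanced design (same number of plots in every cell).
    All effects equal to zero means perfectly parallel lines.
    """
    soils = list(cells)
    fertilizers = list(cells[soils[0]])
    grand = mean([cells[s][f] for s in soils for f in fertilizers])
    soil_mean = {s: mean([cells[s][f] for f in fertilizers]) for s in soils}
    fert_mean = {f: mean([cells[s][f] for s in soils]) for f in fertilizers}
    return {
        (s, f): cells[s][f] - soil_mean[s] - fert_mean[f] + grand
        for s in soils
        for f in fertilizers
    }


effects = interaction_effects(cell_means)
for (soil, fert), value in effects.items():
    print(f"soil {soil:>3}, fertilizer {fert}: {value:+.2f}")
```

Restricting the table to soils I and II gives effects of zero everywhere, which is the numerical counterpart of parallel lines on the graph.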

What if we applied different fertilizers depending on soil type – say, three per soil type? With three soil types, we’d have nine fertilizers in total, and knowing the fertilizer would immediately indicate which soil type it was applied to. In that case, we would be dealing with nested factors, as depicted in the following diagram:

Diagram showing three soil types (brown squares) with three different fertilizers (triangles F, S, M; N, U, P; E, L, H) nested within each, illustrating nested factors.
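
Whether one factor is nested within another can be checked directly from the design: every level of the inner factor must occur with exactly one level of the outer factor. The sketch below uses the fertilizer letters from the diagram; the soil labels I, II, III and the function name `is_nested` are assumptions made for this example.

```python
from collections import defaultdict


def is_nested(observations):
    """Return True if every inner level occurs with exactly one outer level.

    `observations` is an iterable of (outer_level, inner_level) pairs,
    one pair per plot.
    """
    outer_levels_by_inner = defaultdict(set)
    for outer, inner in observations:
        outer_levels_by_inner[inner].add(outer)
    return all(len(outers) == 1 for outers in outer_levels_by_inner.values())


# Nested design: three fertilizers specific to each soil type (nine in total).
nested_design = [
    ("I", "F"), ("I", "S"), ("I", "M"),
    ("II", "N"), ("II", "U"), ("II", "P"),
    ("III", "E"), ("III", "L"), ("III", "H"),
]

# Crossed design for comparison: the same fertilizers used on every soil type.
crossed_design = [(soil, fert) for soil in ("I", "II", "III") for fert in ("A", "B", "C")]

print(is_nested(nested_design))   # fertilizer identifies the soil type
print(is_nested(crossed_design))  # fertilizer does not identify the soil type
```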

If the dependent variable is measured repeatedly over time, we introduce a repeated-measures factor. This factor can enter the ANOVA either on its own or alongside other, standard factors. It requires us to account for one more factor, the subject factor, which identifies the unit undergoing the repeated measurements. We usually don’t test its significance. More importantly, it is a random factor, whereas most factors discussed so far are fixed.

What’s the difference? A factor is random when its levels in the study are randomly sampled from a larger set of possible levels. If the levels included in the analysis are precisely those intended for study – or simply all possible levels – the factor is fixed. This distinction can be visualized as follows:

In the case of potato yield, a random factor might be soil, if instead of selecting three specific soil types, we randomly sampled three planting locations from, say, dozens of locations with various soil conditions.
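
For a random factor, the usual interest is less in the individual levels and more in how much variability the whole population of levels contributes. As a rough sketch, assuming a balanced one-way layout with location as the only factor, this variance component can be estimated from the two mean squares. The location names, yields, and the function `variance_components` are invented for illustration.

```python
from statistics import mean

# Hypothetical yields (q/ha) on four plots at each of three locations,
# imagined as randomly sampled from dozens of candidate locations.
yields_by_location = {
    "location_07": [248.0, 255.0, 251.0, 246.0],
    "location_19": [279.0, 284.0, 275.0, 282.0],
    "location_31": [262.0, 259.0, 268.0, 263.0],
}


def variance_components(groups):
    """Estimate between-location and residual variance (method of moments).

    Assumes a balanced one-way random-effects layout, where
    E[MS_between] = residual variance + n * location variance.
    """
    sizes = {len(ys) for ys in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes the same number of plots per location")
    n = sizes.pop()
    k = len(groups)
    grand = mean([y for ys in groups.values() for y in ys])
    ms_between = n * sum((mean(ys) - grand) ** 2 for ys in groups.values()) / (k - 1)
    ms_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys) / (k * (n - 1))
    # A negative estimate is truncated at zero, a common convention.
    var_location = max(0.0, (ms_between - ms_within) / n)
    return var_location, ms_within


var_location, var_residual = variance_components(yields_by_location)
print(f"estimated variance between locations: {var_location:.1f}")
print(f"estimated residual variance:          {var_residual:.1f}")
```

With a fixed factor, the same data would instead be summarized by the mean yield of each specific level, because those levels are exactly the ones of interest.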

To summarize, we can consider interactions, nesting, fixed/random factors, repeated measures… and even nesting within interactions. There are no mathematical obstacles – only psychological or common-sense ones 😊

What should ANOVA look like for your data? We can guide you through these complexities and help tailor ANOVA specifically to your data through training sessions, consultations, or commissioned analysis.

Author: Paweł Januszewski, Senior Consultant in the Data Analysis Team.
