Fellowship+R

The group were unsure what the p value meant in relation to the null hypothesis; another lecture on interpreting chi-squared results would definitely be useful.


 * || Black || Red || Orange || Yellow || Green ||
 * Grass A || 6 || 0 || 6 || 7 || 1 ||
 * Grass B || 5 || 9 || 1 || 3 || 2 ||
 * Grass C || 5 || 4 || 4 || 5 || 2 ||
 * Wood A || 7 || 3 || 3 || 6 || 1 ||
 * Wood B || 5 || 3 || 4 || 5 || 3 ||
 * Wood C || 6 || 0 || 6 || 5 || 3 ||

In this table Yellow and Green are not grouped together. This poses a problem: the degrees of freedom (20) exceed the total number of green jelly babies (12), so the statistical test would not be accurate.

Grouping Yellow and Green addresses this by reducing the degrees of freedom to 15 and removing the low-frequency green category.
 * || Black || Red || Orange || Yellow/Green ||
 * Grass A || 6 || 0 || 6 || 8 ||
 * Grass B || 5 || 9 || 1 || 5 ||
 * Grass C || 5 || 4 || 4 || 7 ||
 * Wood A || 7 || 3 || 3 || 7 ||
 * Wood B || 5 || 3 || 4 || 8 ||
 * Wood C || 6 || 0 || 6 || 8 ||
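The effect of grouping on the degrees of freedom can be checked with a minimal sketch. It assumes the usual contingency-table rule, (rows - 1) × (columns - 1), which matches the values of 20 and 15 quoted above; the helper name `degrees_of_freedom` is illustrative only.

```python
# Colour counts per sample in the order Black, Red, Orange, Yellow, Green
# (copied from the ungrouped table).
counts = {
    "Grass A": [6, 0, 6, 7, 1],
    "Grass B": [5, 9, 1, 3, 2],
    "Grass C": [5, 4, 4, 5, 2],
    "Wood A": [7, 3, 3, 6, 1],
    "Wood B": [5, 3, 4, 5, 3],
    "Wood C": [6, 0, 6, 5, 3],
}


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a table of samples by colours."""
    rows = len(table)
    cols = len(next(iter(table.values())))
    return (rows - 1) * (cols - 1)


# Pool the last two columns (Yellow and Green) into a single Yellow/Green class.
pooled = {name: row[:3] + [row[3] + row[4]] for name, row in counts.items()}

# Total number of green jelly babies across all six samples.
green_total = sum(row[4] for row in counts.values())

print(degrees_of_freedom(counts), green_total)  # 20 12
print(degrees_of_freedom(pooled))               # 15
print(pooled["Grass A"])                        # [6, 0, 6, 8]
```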

The null hypothesis is that there is no significant deviation from the expected values, i.e. no significant difference in the population frequencies between samples.


__** Is variation present? **__

The group decided to compare the proportions of polymorphisms within each sample. The p value is the probability of obtaining a chi-squared value equal to or greater than the one observed if the null hypothesis is true, so a p value of 0.05 corresponds to a 5% chance. The results are treated as significant if the p value is less than or equal to 0.05.
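As a rough illustration of how a p value follows from a chi-squared value, the sketch below computes the upper-tail probability using only the Python standard library. The function name `chi2_p_value` is hypothetical, and the recurrence for the incomplete gamma function is one possible way to do this, not necessarily how the group obtained their figures.

```python
import math


def chi2_p_value(statistic, dof):
    """Probability of a chi-squared value >= statistic under the null
    hypothesis, for a positive integer number of degrees of freedom."""
    t = statistic / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-t)             # starting value Q(1, t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))  # starting value Q(1/2, t)
    # Recurrence: Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1)
    while a < dof / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


# A chi-squared value of 7.2 with 3 degrees of freedom gives p of about 0.0658,
# which is above 0.05, so the null hypothesis is not rejected.
p = chi2_p_value(7.2, 3)
print(f"p = {p:.4f}, significant: {p <= 0.05}")
```

With 3 degrees of freedom the chi-squared value has to reach roughly 7.815 before the p value drops to 0.05.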

Below are the results for each sample. In every sample the p value is greater than 0.05, so the null hypothesis cannot be rejected: no sample deviates significantly from the expected equal proportions. The deviations that are seen are therefore consistent with sampling error, but the same result could also be produced through the action of genetic drift, selection or gene flow, which prompts further investigation as variation is present. A low p value would indicate a large deviation from the expected values between the polymorphisms.


 * GRASS A || Black || Red || Orange || Y/G || Totals ||
 * observed || 6 || 0 || 6 || 8 || 20 ||
 * expected || 5 || 5 || 5 || 5 || 20 ||
 * O - E || 1 || -5 || 1 || 3 ||  ||
 * (O - E)^2 || 1 || 25 || 1 || 9 || chi^2 = ||
 * ((O - E)^2)/E || 0.2 || 5 || 0.2 || 1.8 || 7.2 ||
 * ||  ||   ||   ||   || p-value = ||
 * ||  ||   ||   ||   || 0.0658 ||


 * GRASS B || Black || Red || Orange || Y/G || Totals ||
 * observed || 5 || 9 || 1 || 5 || 20 ||
 * expected || 5 || 5 || 5 || 5 || 20 ||
 * O - E || 0 || 4 || -4 || 0 ||  ||
 * (O - E)^2 || 0 || 16 || 16 || 0 || chi^2 = ||
 * ((O - E)^2)/E || 0 || 3.2 || 3.2 || 0 || 6.4 ||
 * ||  ||   ||   ||   || p-value = ||
 * ||  ||   ||   ||   || 0.0937 ||


 * GRASS C || Black || Red || Orange || Y/G || Totals ||
 * observed || 5 || 4 || 4 || 7 || 20 ||
 * expected || 5 || 5 || 5 || 5 || 20 ||
 * O - E || 0 || -1 || -1 || 2 ||  ||
 * (O - E)^2 || 0 || 1 || 1 || 4 || chi^2 = ||
 * ((O - E)^2)/E || 0 || 0.2 || 0.2 || 0.8 || 1.2 ||
 * ||  ||   ||   ||   || p-value = ||
 * ||  ||   ||   ||   || 0.753 ||


 * WOOD A || Black || Red || Orange || Y/G || Totals ||
 * observed || 7 || 3 || 3 || 7 || 20 ||
 * expected || 5 || 5 || 5 || 5 || 20 ||
 * O - E || 2 || -2 || -2 || 2 ||  ||
 * (O - E)^2 || 4 || 4 || 4 || 4 || chi^2 = ||
 * ((O - E)^2)/E || 0.8 || 0.8 || 0.8 || 0.8 || 3.2 ||
 * ||  ||   ||   ||   || p-value = ||
 * ||  ||   ||   ||   || 0.3618 ||


 * WOOD B || Black || Red || Orange || Y/G || Totals ||
 * observed || 5 || 3 || 4 || 8 || 20 ||
 * expected || 5 || 5 || 5 || 5 || 20 ||
 * O - E || 0 || -2 || -1 || 3 ||  ||
 * (O - E)^2 || 0 || 4 || 1 || 9 || chi^2 = ||
 * ((O - E)^2)/E || 0 || 0.8 || 0.2 || 1.8 || 2.8 ||
 * ||  ||   ||   ||   || p-value = ||
 * ||  ||   ||   ||   || 0.4235 ||

 * WOOD C || Black || Red || Orange || Y/G || Totals ||
 * observed || 6 || 0 || 6 || 8 || 20 ||
 * expected || 5 || 5 || 5 || 5 || 20 ||
 * O - E || 1 || -5 || 1 || 3 ||  ||
 * (O - E)^2 || 1 || 25 || 1 || 9 || chi^2 = ||
 * ((O - E)^2)/E || 0.2 || 5 || 0.2 || 1.8 || 7.2 ||
 * ||  ||   ||   ||   || p-value = ||
 * ||  ||   ||   ||   || 0.0658 ||

None of the p values above is below 0.05, so none of the samples deviates significantly from the expected values, although some variation between the samples is visible in the counts. Whether this variation is due to genetic drift, sampling error, gene flow or selection is unclear at this point.
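The six per-sample calculations above follow the same pattern and can be sketched in a few lines. The sketch assumes equal expected counts (5 per colour class) as in the tables, and compares each chi-squared value with 7.815, the standard tabulated 5% critical value for 3 degrees of freedom; `goodness_of_fit` is an illustrative name.

```python
# Pooled counts (Black, Red, Orange, Yellow/Green) from the grouped table.
samples = {
    "Grass A": [6, 0, 6, 8],
    "Grass B": [5, 9, 1, 5],
    "Grass C": [5, 4, 4, 7],
    "Wood A": [7, 3, 3, 7],
    "Wood B": [5, 3, 4, 8],
    "Wood C": [6, 0, 6, 8],
}

# Tabulated chi-squared critical value at the 5% level for 3 degrees of freedom.
CRITICAL_5_PERCENT_DOF3 = 7.815


def goodness_of_fit(observed):
    """Chi-squared value against equal expected counts in every class."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


results = {name: goodness_of_fit(obs) for name, obs in samples.items()}

for name, stat in results.items():
    verdict = "significant" if stat > CRITICAL_5_PERCENT_DOF3 else "not significant"
    print(f"{name}: chi^2 = {stat:.1f} ({verdict} at the 5% level)")
```

Comparing against a critical value gives the same decision as comparing the p value with 0.05; it is used here only to keep the sketch short.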

__** Is genetic drift present? **__

Here the three samples within each environment are compared with one another, and the null hypothesis is that there is no significant difference between the samples. A P value below 0.05 means this null hypothesis is rejected, i.e. the samples differ by more than sampling error alone would be expected to produce; the group took this as an indication of selection or gene flow. If genetic drift is the predominant factor in causing variation, the group expected P values above 0.05; however, such values could also be due to sampling error.

The results for the grass environment are significant as the overall p value (0.0336) is under 0.05; this suggests the differences are not due to sampling error alone and are more likely to be due to selection or gene flow.
 * **Separate environments GRASS** ||  ||   ||   ||   ||   ||
 * || Black || Red || Orange || Yellow/Green || Totals ||
 * EXPECTED || 5.333333 || 4.333333 || 3.666667 || 6.666667 ||  ||
 * O - E (Grass A) || 0.666667 || -4.33333 || 2.333333 || 1.333333 ||  ||
 * O - E (Grass B) || -0.33333 || 4.666667 || -2.66667 || -1.66667 ||  ||
 * O - E (Grass C) || -0.33333 || -0.33333 || 0.333333 || 0.333333 ||  ||
 * (O - E)^2 (Grass A) || 0.444444 || 18.77778 || 5.444444 || 1.777778 ||  ||
 * (O - E)^2 (Grass B) || 0.111111 || 21.77778 || 7.111111 || 2.777778 ||  ||
 * (O - E)^2 (Grass C) || 0.111111 || 0.111111 || 0.111111 || 0.111111 ||  ||
 * ((O - E)^2)/E (Grass A) || 0.083333 || 4.333333 || 1.484848 || 0.266667 ||  ||
 * ((O - E)^2)/E (Grass B) || 0.020833 || 5.025641 || 1.939394 || 0.416667 ||  ||
 * ((O - E)^2)/E (Grass C) || 0.020833 || 0.025641 || 0.030303 || 0.016667 ||  ||
 * DoF = 6 ||  ||   ||   ||   ||   ||
 * Chi^2 || 0.125 || 9.384615 || 3.454545 || 0.7 || 13.664 ||
 * P value || 0.94 || 0.0091 || 0.1777 || 0.7046 || 0.0336 ||
 * **Separate environments WOOD** ||  ||   ||   ||   ||   ||
 * || Black || Red || Orange || Yellow/Green || Totals ||
 * EXPECTED || 6 || 2 || 4.333333 || 7.666667 ||  ||
 * O - E (Wood A) || 1 || 1 || -1.33333 || -0.66667 ||  ||
 * O - E (Wood B) || -1 || 1 || -0.33333 || 0.333333 ||  ||
 * O - E (Wood C) || 0 || -2 || 1.666667 || 0.333333 ||  ||
 * (O - E)^2 (Wood A) || 1 || 1 || 1.777778 || 0.444444 ||  ||
 * (O - E)^2 (Wood B) || 1 || 1 || 0.111111 || 0.111111 ||  ||
 * (O - E)^2 (Wood C) || 0 || 4 || 2.777778 || 0.111111 ||  ||
 * ((O - E)^2)/E (Wood A) || 0.166667 || 0.5 || 0.410256 || 0.057971 ||  ||
 * ((O - E)^2)/E (Wood B) || 0.166667 || 0.5 || 0.025641 || 0.014493 ||  ||
 * ((O - E)^2)/E (Wood C) || 0 || 2 || 0.641026 || 0.014493 ||  ||
 * DoF = 6 ||  ||   ||   ||   ||   ||
 * Chi^2 || 0.333333 || 3 || 1.076923 || 0.086957 || 4.497 ||
 * P value || 0.85 || 0.22 || 0.58 || 0.96 || 0.6097 ||
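The within-environment comparison can be reproduced with a short sketch. It assumes the standard contingency-table expectation (row total × column total / grand total), which here equals the column average because every sample contains 20 jelly babies. The function names are illustrative, and the closed-form p value used is valid only for an even number of degrees of freedom, such as the 6 used here.

```python
import math

# Pooled counts (Black, Red, Orange, Yellow/Green) for samples A, B and C.
grass = [[6, 0, 6, 8], [5, 9, 1, 5], [5, 4, 4, 7]]
wood = [[7, 3, 3, 7], [5, 3, 4, 8], [6, 0, 6, 8]]


def homogeneity_chi2(rows):
    """Chi-squared value and degrees of freedom for a table of counts,
    with expected = row total * column total / grand total."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for row, rt in zip(rows, row_totals):
        for obs, ct in zip(row, col_totals):
            exp = rt * ct / grand
            stat += (obs - exp) ** 2 / exp
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_even_dof(stat, dof):
    """Upper-tail chi-squared probability; closed form for even dof only."""
    t = stat / 2.0
    return math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(dof // 2))


for name, table in (("Grass", grass), ("Wood", wood)):
    stat, dof = homogeneity_chi2(table)
    p = p_value_even_dof(stat, dof)
    print(f"{name}: chi^2 = {stat:.3f}, DoF = {dof}, p = {p:.4f}")
```

Run on the tables above, this should give values close to the totals reported (13.664 and 0.0336 for grass, 4.497 and 0.6097 for wood).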

The results for wood may be non-significant because the total number of red jelly babies (6) was equal to the degrees of freedom, so the statistical test may not be completely accurate. The result may also be down to sampling error, or to genetic drift having a greater effect on the polymorphisms than selection.

__** Is it gene flow? **__

The results above suggest that the grass environment could be experiencing selection or gene flow to a greater extent than genetic drift. The next table shows p values for the averages of the raw data in each environment. The idea is that a significant difference between the values for the two environments could suggest that selection is taking place. It could also indicate a reduced role for gene flow (if gene flow were taking place, the values would be expected to be similar). The null hypothesis is that there is no significant variation from the expected values.

The overall P value (0.7954) is very high, so the result is not significant: the averages for the two environments deviate only slightly from the expected values, and the small differences are likely to be due to sampling error (or genetic drift). The test therefore gives no evidence of selection acting differently in the two environments, and because similar values are what gene flow would be expected to produce, gene flow cannot be discounted on this basis.
 * **Comparing average of different environments** ||  ||   ||   ||   ||   ||
 * || Black || Red || Orange || Yellow/Green || Totals ||
 * Average Grass || 5.333333 || 4.333333 || 3.666667 || 6.666667 ||  ||
 * Average Wood || 6 || 2 || 4.333333 || 7.666667 ||  ||
 * EXPECTED || 5.666667 || 3.166667 || 4 || 7.166667 ||  ||
 * O - E (Grass) || -0.33333 || 1.166667 || -0.33333 || -0.5 ||  ||
 * O - E (Wood) || 0.333333 || -1.16667 || 0.333333 || 0.5 ||  ||
 * (O - E)^2 (Grass) || 0.111111 || 1.361111 || 0.111111 || 0.25 ||  ||
 * (O - E)^2 (Wood) || 0.111111 || 1.361111 || 0.111111 || 0.25 ||  ||
 * ((O - E)^2)/E (Grass) || 0.019608 || 0.429825 || 0.027778 || 0.034884 ||  ||
 * ((O - E)^2)/E (Wood) || 0.019608 || 0.429825 || 0.027778 || 0.034884 ||  ||
 * DoF = 3 ||  ||   ||   ||   ||   ||
 * Chi^2 || 0.039216 || 0.859649 || 0.055556 || 0.069767 || 1.02395 ||
 * P value || 0.843 || 0.3538 || 0.8136 || 0.7916 || 0.7954 ||
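The comparison between environments can be sketched in the same way. The first calculation uses the per-environment averages, as in the table. As an aside that is not part of the group's analysis, the second repeats it on the column totals (three samples of 20 per environment), because a chi-squared value scales with the size of the counts it is computed from. The function name `chi2_two_rows` is illustrative.

```python
def chi2_two_rows(row_a, row_b):
    """Chi-squared value for a 2 x k table, with
    expected = row total * column total / grand total."""
    rows = [row_a, row_b]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    grand = sum(row_totals)
    stat = 0.0
    for row, rt in zip(rows, row_totals):
        for obs, ct in zip(row, col_totals):
            exp = rt * ct / grand
            stat += (obs - exp) ** 2 / exp
    return stat


# Column totals over samples A, B and C (Black, Red, Orange, Yellow/Green).
grass_totals = [16, 13, 11, 20]
wood_totals = [18, 6, 13, 23]

# Averages per sample, matching the "Average Grass" and "Average Wood" rows.
grass_avg = [x / 3 for x in grass_totals]
wood_avg = [x / 3 for x in wood_totals]

stat_avg = chi2_two_rows(grass_avg, wood_avg)           # about 1.024, 3 DoF
stat_totals = chi2_two_rows(grass_totals, wood_totals)  # three times larger

print(f"averages: chi^2 = {stat_avg:.3f}")
print(f"totals:   chi^2 = {stat_totals:.3f}")
```

Both values are below 7.815, the 5% critical value for 3 degrees of freedom, so neither version of the calculation gives a significant difference between the environments.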

__** Conclusion **__

In conclusion, variation in the polymorphisms was present in all the samples, but no sample deviated significantly from the expected proportions, so selection, gene flow, genetic drift and sampling error all remain possible causes. Furthermore, the significant result for the grass environment suggests that selection or gene flow is acting on a greater scale there than in the wood environment, where genetic drift or sampling error appears to have the greater effect. Finally, the averaged frequencies of the two environments did not differ significantly, so this comparison does not separate selection, gene flow, genetic drift and sampling error. Taken together, the differences in polymorphisms in grass are most plausibly due to selection or gene flow, and those in wood to genetic drift or sampling error; gene flow between the environments cannot be discounted on the basis of these tests.