
Choosing and Using Statistics

A Biologist’s Guide

Calvin Dytham

Department of Biology, University of York

Third Edition


Preface

My aim was to produce a statistics book with two characteristics: it assumes that the reader is using a computer to analyse data, and it contains absolutely no equations.

This is a handbook for biologists who want to process their data through a statistical package on the computer, to select the most appropriate methods and to extract the important information from the often confusing output that is produced. It is aimed primarily at undergraduates and masters students in the biological sciences who have to use statistics in practical classes and projects. Such users of statistics don’t have to understand exactly how the test works or how to do the actual calculations. These things are not covered in this book as there are more than enough books providing such information already. What is important is that the right statistical test is used and the right inferences made from the output of the test. An extensive key to statistical tests is included to address the former, and the bulk of the book is made up of descriptions of how to carry out the tests to address the latter.

In several years of teaching statistics to biology students it is clear to me that most students don’t really care how or why the test works. They do care a great deal that they are using an appropriate test and interpreting the results properly. I think that this is a fair aim to have for occasional users of statistics. Of course, anyone going on to use statistics frequently should become familiar with the way that calculations manipulate the data to produce the output as this will give a better understanding of the test.

If this book has a message it is this: think about the statistics before you collect the data! So many times I have seen rather distraught students unable to analyse their precious data because the experimental design they used was inappropriate. On such occasions I try to find a compromise test that will make the best of a bad job, but this often leads to a weaker conclusion than might have been possible if more forethought had been applied from the outset. There is no doubt that if experiments or sampling strategies are designed with the statistics in mind, better science will result.

Statistics are often seen by students as the ‘thing you must do to data at the end’. Please try to avoid falling into this trap yourself. Thought experiments producing dummy data are a good way to try out experimental designs and are much less labour-intensive than real ones!

Although there are almost no equations in this book I’m afraid there was no way to totally avoid statistical jargon. To ease the pain somewhat, an extensive Glossary and key to symbols are included. So when you are navigating your way through the key to choosing a test you should look up any words you don’t understand.

In this book I have given extensive instructions for the use of four commonly encountered software packages: SPSS, R, Excel and MINITAB. However, the key to choosing a statistical test is not at all package-specific, so if you use a software package other than the four I focus on or if you are using a calculator you will still be able to get a good deal out of this book.

If every sample gave the same result there would be no need for statistics. However, all aspects of biology are filled with variation. It is statistics that can be used to penetrate the haze of experimental error and the inherent variability of the natural world to reach the underlying causes and processes at work. So, try not to hate statistics, they are merely a tool that, when used wisely and properly, can make the life of a biologist much simpler and give conclusions a sound basis.

The third edition

In the 8 years since I wrote the second edition of this book there have, of course, been several new versions of the software produced. I have received many comments about the previous editions and I am grateful for the many suggestions on how to improve the text and coverage. Requests to add further statistical packages have been the most common suggestion for change. There was surprisingly little consensus on the packages to add for the second edition, but since 2000 the freely available, and very powerful, package R has become extremely widely used so I have added that to the mix this time.

How to use this book

This is definitely not a book that should be read from cover to cover. It is a book to refer to when you need assistance with statistical analysis, either when choosing an appropriate test or when carrying it out. The basics of statistical analysis and experimental design are covered briefly but those sections are intended mostly as a revision aid, or to outline some of the more important concepts. The reviews of other statistics books may help you choose those that are most appropriate for you if you want or need more details.

The heart of the book is the key. The rest of the book hinges on the key, explaining how to carry out the tests, giving assistance with the statistical terms in the Glossary or giving tips on the use of computers and packages.

Packages used

MINITAB® version 15, MINITAB Inc.

SPSS® versions 16 and 17, SPSS Inc.

Excel version 2007 and 2008 for Mac, Microsoft Corporation

R, The R Foundation for Statistical Computing

Running on:

Windows® versions XP, 2000, 7 and Vista, Microsoft Corporation

Mac OS 10, Apple Inc.

Example data

In the spirit of dummy data collection, all example data used throughout this book have been fabricated. Any similarity to data alive or dead is purely coincidental.

Acknowledgements for the first edition

Thanks to Sheena McNamee for support during the writing process, to Andrea Gillmeister and two anonymous reviewers for commenting on an early version of the manuscript and to Terry Crawford, Jo Dunn, David Murrell and Josephine Pithon for recommending and lending various books. Thanks also to Ian Sherman and Susan Sternberg at Blackwell and to many of my colleagues who told me that the general idea of a book like this was a sound one. Finally, I would especially like to thank the students at the University of York, UK, who brought me the problems that provided the inspiration for this book.

Acknowledgements for the second edition

Thanks to all the many people who contacted me with suggestions and comments about the first edition. I hope you can see that many of the corrections and improvements have come directly from you. Five anonymous reviewers provided many useful comments about the proposal for a second edition. Thanks to Sarah Shannon, Cee Brandston, Katrina McCallum and many others at Blackwell for seeing this book through and especially for producing a second superb and striking cover. S’Albufera Natural Parc and Nick Riddiford provided a very convenient bolt-hole for writing. Once again, I give special thanks to Sheena and to my colleagues, PhD students and undergraduate students at the University of York. Finally, thanks to everyone on the MRes EEM course over the last 4 years.

Acknowledgements for the third edition

It’s been thanks to the pushing of Ward Cooper at Wiley-Blackwell and Sheena McNamee that this third edition has seen the light of day. Thanks to Emma Rand, Olivier Missa and Frank Schurr for encouraging me to enter the brave new world of R. Thanks to Nik Prowse for guiding me through the final editing.

Calvin Dytham,

York 1998, 2002 and 2010

1
Eight steps to successful data analysis

This is a very simple sequence that, if you follow it, will integrate the statistics you use into the process of scientific investigation. As I make clear here, statistical tests should be considered very early in the process and not left until the end.

  1. Decide what you are interested in.
  2. Formulate a hypothesis or several hypotheses (see Chapters 2 and 3 for guidance).
  3. Design the experiment, manipulation or sampling routine that will allow you to test the hypotheses (see Chapters 2 and 4 for some hints on how to go about this).
  4. Collect dummy data (i.e. make up approximate values based on what you expect to obtain). The collection of ‘dummy data’ may seem strange but it will convert the proposed experimental design or sampling routine into something more tangible. The process can often expose flaws or weaknesses in the data-collection routine that will save a huge amount of time and effort.
  5. Use the key presented in Chapter 3 to guide you towards the appropriate test or tests.
  6. Carry out the test(s) using the dummy data. (Chapters 6–9 will show you how to input the data, use the statistical packages and interpret the output.)
  7. If there are problems go back to step 3 (or 2); otherwise, proceed to the collection of real data.
  8. Carry out the test(s) using the real data. Report the findings and/or return to step 2.
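
Step 4 can feel abstract, so here is a minimal sketch of what collecting dummy data might look like. All the numbers here are invented for illustration (a hypothetical two-group comparison with expected means of roughly 25 and 28); the book itself works in SPSS, R, Excel and MINITAB, not Python.

```python
import random

# Invent plausible dummy values centred on the means you EXPECT each
# group to show, with a guess at the likely scatter. These stand in for
# real measurements while you trial the design and the analysis.
random.seed(1)  # fixed seed only so the illustration is repeatable

group_a = [round(random.gauss(25.0, 3.0), 1) for _ in range(10)]
group_b = [round(random.gauss(28.0, 3.0), 1) for _ in range(10)]

print(group_a)
print(group_b)
```

Running the intended test on values like these (step 6) quickly shows whether the planned design can actually answer the question before any real effort is spent.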

I implore you to use this sequence. I have seen countless students who have spent a long time and a lot of effort collecting data only to find that the experimental or sampling design was not quite right. The test they are forced to use is much less powerful than one they could have used with only a slight change in the experimental design. This sort of experience tends to turn people away from statistics and makes them ‘scared’ of it. This is a great shame, as statistics are a hugely useful and vital tool in science.

The rest of the book follows this eight-step process; use it for guidance and advice whenever you are unsure of what to do.

2
The basics

The aim of this chapter is to introduce, in rather broad terms, some of the recurring concepts of data collection and analysis. Everything introduced here is covered at greater length in later chapters and certainly in the many statistics textbooks that aim to introduce statistical theory and experimental design to scientists.

The key to statistical tests in the next chapter assumes that you are familiar with most of the basic concepts introduced here.

Observations

These are the raw material of statistics and can include anything recorded as part of an investigation. They can be on any scale from a simple ‘raining or not raining’ dichotomy to a very sophisticated and precise analysis of nutrient concentrations. The type of observations recorded will have a great bearing on the type of statistical tests that are appropriate.

Observations can be simply divided into three types: categorical, where the observations can be in a limited number of categories which have no obvious scale (e.g. ‘oak’, ‘ash’, ‘elm’); discrete, where there is a real scale but not all values are possible (e.g. ‘number of eggs in a nest’ or ‘number of species in a sample’); and continuous, where any value is theoretically possible, restricted only by the measuring device (e.g. lengths, concentrations).

Different types of observations are considered in more detail in Chapter 5.

Hypothesis testing

The cornerstone of scientific analysis is hypothesis testing. The concept is rather simple: almost every time a statistical test is carried out it is testing the probability that a hypothesis is correct. If the probability is small then the hypothesis is deemed to be untrue and it is rejected in favour of an alternative. This is done in what seems to be a rather upside down way as the test is always of what is called the null hypothesis rather than the more interesting hypothesis. The null hypothesis is the hypothesis that nothing is going on (it is often labelled as H0). For example, if the weights of bulbs for two cultivars of daffodils were being investigated, the null hypothesis would be that there is no weight difference between cultivars: ‘the weights of the two groups of bulbs are the same’ or, more correctly, ‘the two groups of bulbs are samples from a larger population with the same distribution’. A statistical test is carried out to find out how likely that null hypothesis is to be true. If we decide to reject the null hypothesis we must accept the alternative, more interesting, hypothesis (H1) that: ‘the weights of bulbs for the two cultivars are different’ or, more correctly, that ‘the groups are samples from populations with different distributions’.
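The daffodil example can be made concrete with a small sketch. The bulb weights below are made up, and the choice of a randomization (permutation) test is mine for illustration, not the book's: it tests H0 directly by asking how often shuffled cultivar labels produce a difference as large as the one observed.

```python
import random
from statistics import mean

# Invented bulb weights (g) for two hypothetical daffodil cultivars.
cultivar_1 = [21.3, 24.8, 23.1, 22.5, 25.0, 23.7, 22.9, 24.1]
cultivar_2 = [25.9, 27.2, 24.8, 26.5, 28.1, 25.4, 26.8, 27.5]

observed = abs(mean(cultivar_1) - mean(cultivar_2))

# Under H0 the cultivar labels are arbitrary, so shuffle the labels many
# times and count how often chance alone gives a difference this large.
random.seed(42)
pooled = cultivar_1 + cultivar_2
n = len(cultivar_1)
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
        extreme += 1

p_value = extreme / trials
print(f"observed difference = {observed:.2f} g, P = {p_value:.4f}")
```

A small P here would lead us to reject H0 (‘the two groups come from populations with the same distribution’) in favour of H1.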

P-values

The P-value is the bottom line of most statistical tests. (Incidentally, you may come across it written in upper or lower case, italic or not: e.g. P value, P-value, p value or p-value.) It is the probability of seeing data this extreme or more extreme if the null hypothesis is true. So if a P-value is given as 0.06 it indicates that you have a 6% chance of seeing data like this if the null hypothesis is true. In biology it is usual to take a value of 0.05, or 5%, as the critical level for the rejection of a hypothesis. This means that provided data as extreme as this would arise less than one time in 20 under the null hypothesis, we reject that hypothesis. As it is the null hypothesis that is nearly always being tested, we are always looking for low P-values to reject this hypothesis and accept the more interesting alternative hypothesis.

Clearly the smaller the P-value the more confident we can be in the conclusions drawn from it. A P-value of 0.0001 indicates that if the null hypothesis is true the chance of seeing data as extreme or more extreme than that being tested is one in 10 000. This is much more convincing than a marginal P = 0.049.

P-values and the types of errors that are implicitly accepted by their use are considered further in Chapter 4.

Sampling

Observations have to be collected in some way. This process of data acquisition is called sampling. Although there are almost as many different methods that can be used for sampling as there are possible things to sample, there are some general rules. One of the most obvious is that a large number of observations is usually better than a small number. Balanced sampling is also important (i.e. when comparing two groups take the same number of observations from each group).

Most statistical tests assume that samples are taken at random. This sounds easy but is actually quite difficult to achieve. For example, if you are sampling beetles from pit-fall traps the sample may seem totally random but in fact is quite biased towards those species that move around the most and fail to avoid the traps. Another common bias is to choose a point at random and then measure the nearest individual to that point, assuming that this will produce a random sample. It will not be random at all, as isolated individuals and those at the edges of clumps are more likely to be selected than those in the middle. There are methods available to reduce problems associated with non-random sampling but the first step is to be aware of the problem.
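One practical way to get a genuinely random sample is to number every individual first and let a random-number generator choose which ones to measure. The sketch below is illustrative only (the 200 numbered plants are invented), assuming the whole population can be enumerated in advance, which is often the hard part in the field.

```python
import random

# Number every individual in the population first, so each has an equal
# chance of selection; then draw the sample with no human choice involved.
individuals = [f"plant_{i:03d}" for i in range(1, 201)]  # 200 numbered plants

random.seed(7)  # fixed seed only so the illustration is repeatable
chosen = random.sample(individuals, k=20)  # 20 distinct picks, no repeats

print(chosen[:5])
```

Because `random.sample` draws without replacement, no individual can be measured twice, which also respects the once-only assumption mentioned below.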

A further assumption of sampling is that individuals are either only measured once or they are all sampled on several occasions. This assumption is often violated if, for example, the same site is visited on two occasions and the same individuals or clones are inadvertently remeasured.

The sets of observations collected are called variables. A variable can be almost anything it is possible to record as long as different individuals can be assigned different values.

Some of the problems of sampling are considered in Chapter 4.

Experiments

In biology many investigations use experiments of some sort. An experiment occurs when anything is altered or controlled by the investigator. For example, an investigation into the effect of fertilizer on plant growth will use a control plot (or several control plots) where there is no fertilizer added and then one or more plots where fertilizer has been added at known concentrations set by the investigators. In this way the effect of fertilizer can be determined by comparison of the different concentrations of fertilizer. The condition being controlled (e.g. fertilizer) is usually called a factor and the different levels used are called treatments or factor levels (e.g. concentrations of fertilizer). The design of this experiment will be determined by the hypothesis or hypotheses being investigated. If the effect of the fertilizer on a particular plant is of interest then perhaps a range of different soil types might be used with and without fertilizer. If the effect on plants in general is of interest then an experiment using a variety of plants is required, either in isolation or together. If the optimum fertilizer treatment is required then a range of concentrations will be applied and a cost-benefit analysis carried out.

More details and strategies for experimental design are considered in Chapter 4.

Statistics

In general, statistics are the results of manipulating observations to produce a single result, or a small number of results. There are various categories of statistics depending on the type of summary required. Here I divide statistics into four categories.

Descriptive statistics

The simplest statistics are summaries of data sets. Summary statistics are easy to understand but should not be overlooked: although they are not always thought of as statistics at all, they are extremely useful for data investigation. The most widely used are measures of the ‘location’ of a set of numbers, such as the mean or median. Then there are measures of the ‘spread’ of the data, such as the standard deviation. The choice of appropriate descriptive statistics and the best ways of displaying the results are considered in Chapters 5 and 6.
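As a quick sketch of ‘location’ and ‘spread’, here are the three measures just named computed on a set of invented leaf lengths (the data and units are mine, for illustration only).

```python
from statistics import mean, median, stdev

# Invented leaf lengths (mm) to illustrate location and spread.
leaf_lengths = [42.1, 38.5, 40.2, 44.8, 39.9, 41.7, 43.2, 40.8]

print(f"mean   = {mean(leaf_lengths):.2f} mm")    # location
print(f"median = {median(leaf_lengths):.2f} mm")  # location, robust to outliers
print(f"stdev  = {stdev(leaf_lengths):.2f} mm")   # spread (sample standard deviation)
```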

Tests of difference

A familiar question in any field of investigation is going to be something like ‘is this group different from that group?’. A question of this kind can then be turned into a null hypothesis with a form: ‘this group and that group are not different’. To answer this question, and test the null hypothesis, a statistical test of difference is required. There are many tests that all seem to answer the same type of question but each is appropriate when certain types of data are being considered. After the simple comparison of two groups there are extensions to comparisons of more than two groups and then to tests involving more than one way of dividing the individuals into groups. For example, individuals could be assigned to two groups by sex and also into groups depending on whether they had been given a drug or not. This could be considered as four groups or as what is known as a factorial test, where there are two factors, ‘sex’ and ‘drug’, with all combinations of the levels of the two factors being measured in some way. Factorial designs can become very complicated but they are very powerful and can expose subtleties in the way the factors interact that can never be found through investigation of the data using one factor at a time.

Tests of difference can also be used to compare variables with known distributions. These can be statistical distributions or derived from theory. Chapter 7 considers tests of difference in detail.

Tests of relationships

Another familiar question that arises in scientific investigation is in the form ‘is A associated with B?’. For example, ‘is fat intake related to blood pressure?’. This type of question should then be turned into a null hypothesis that ‘A is not associated with B’ and then tested using one of a variety of statistical tests. As with tests of difference there are many tests that seem to address the same type of problem, but again each is appropriate for different types of data.

Tests of relationships fall into two groups, called correlation and regression, depending on the type of hypothesis being investigated. Correlation is a test to measure the degree to which one set of data varies with another: it does not imply that there is any cause-and-effect relationship. Regression is used to fit a relationship between two variables such that one can be predicted from the other. This does imply a cause-and-effect relationship or at least an implication that one of the variables is a ‘response’ in some way. So in the investigation of fat intake and blood pressure a strong positive correlation between the two shows an association but does not show cause and effect. If a regression is used and there is a significant positive regression line, this would imply that blood pressure can be predicted using fat intake or, if the regression uses the fat intake as the ‘response’, that fat intake can be predicted from blood pressure.

There are many additional techniques that can be employed to consider the relationships between more than two sets of data. Tests of relationships are described in Chapter 8.

Tests for data investigation

A whole range of tests is available to help investigators explore large data sets. Unlike the tests considered above, data investigation need not have a hypothesis for testing. For example, in a study of the morphology of fish there may be many fin measures from a range of species and sites that offer far too many potential hypotheses for investigation. In this case the application of a multivariate technique may show up relationships between individuals, help assign unknown specimens to categories or just suggest which hypotheses are worth further consideration.

A few of the many different techniques available are considered in Chapter 9.