This A to Z of Research Skills has been compiled to help undergraduate researchers understand some of the key concepts they need to be familiar with when producing research.
The A to Z is very much 'work in progress' and any suggested amendments or additions would be very welcome from students or from staff. Please email Pete Smith at prsmith at brookes dot ac dot uk
Analysis is the process of examining a set of data in order to detect any patterns or regularities that will address the research objective. Even the smallest set of data will consist of a great deal of unsorted information and analysis is sometimes referred to as ‘data reduction’ or ‘data editing’ in that it picks out the elements that are important to the topic. For example, in a study of recreation a host of different kinds of data may be collected to try to explain the reasons why some students take more exercise than others. If a set of quantitative data has been collected, then the analysis will involve the techniques of descriptive statistics and correlation. If a set of qualitative data has been collected, then the analysis should pick out key words from the text. These key words are referred to as codes and you should then examine the codes and see whether they can be sensibly grouped into categories that can be given a more general name. It is sometimes appropriate to transform qualitative data in order to make use of quantitative techniques and this is known as content analysis. Whatever kind of data you are dealing with there will be a number of subjective decisions to be made about how the analysis is carried out and it is therefore important to think about its reliability. The direction of your analysis should always be guided by the conceptual framework and the results of the analysis can then be used to modify that framework.
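The step from codes to categories to counts can be illustrated with a minimal sketch. The codes, the interviews and the grouping into categories below are all invented for illustration; in a real project they would come from your own reading of the text and would involve the subjective decisions mentioned above.

```python
from collections import Counter

# Hypothetical codes picked out of three interview transcripts (illustrative only).
coded_interviews = [
    ["cost", "time", "friends"],
    ["time", "workload", "cost"],
    ["friends", "time"],
]

# Hypothetical grouping of codes into categories with a more general name.
categories = {
    "cost": "resources",
    "time": "resources",
    "workload": "academic pressure",
    "friends": "social factors",
}


def count_codes(interviews):
    """Count how often each code appears across all interviews."""
    return Counter(code for interview in interviews for code in interview)


def count_categories(code_counts, mapping):
    """Add up the code frequencies that fall within each broader category."""
    totals = Counter()
    for code, n in code_counts.items():
        totals[mapping[code]] += n
    return totals


code_counts = count_codes(coded_interviews)
category_counts = count_categories(code_counts, categories)
print(code_counts.most_common())
print(category_counts.most_common())
```

Counting codes in this way is one simple route into content analysis; it says nothing about whether the codes were chosen reliably in the first place.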
A baseline is a set of information against which you can gauge the result from an analysis of data. For example, a study on recreation at a university may have established the level of participation in sport by students by calculating, for example, some descriptive statistics. It would be useful to know whether this figure was high or low and so you should attempt to compare it with other universities which may have carried out their own research. The figures from these other universities would be called baseline figures and it is, of course, important that the figures have been collected in the same kind of way. A research project should try to collect as many different kinds of baseline figures as possible and another kind of baseline would be the recreation levels at the same university in previous years. It would also be useful to establish baseline data for other groups in the population besides students and likewise against baselines consisting of regional and national averages for levels of participation in sport.
A case is an individual for which data is collected on some of their characteristics or attributes. A case will always be one of a set of several cases so that, for example, in a study of student recreation a set of several students will be selected and data would be collected on some of their characteristics to do with recreational behaviour. A case will often be a human individual but may also be an object, a place, or an organisation so the recreational study could collect data on, for instance, a set of sports centres, a set of towns or a set of universities. If quantitative data is collected, then it is likely that a relatively large sample of cases will be chosen and the characteristics of the cases will be known as variables. If qualitative data is collected, then a much smaller sample of cases will probably be used. Whatever the type of data it is important that the cases are given a clear definition in order that it is quite clear which ones should be included. For example, in the study of student recreation the researcher needs to know whether to include part-time students as well as full-time students, graduate students as well as undergraduate students.
A case study involves carrying out an in-depth piece of research on a small number of cases and sometimes only one case. For example, a study of the level of participation in sport may have identified a particular group of students who have found difficulties in making use of university facilities. A research project could therefore set out to find the reasons for this by carrying out an in-depth examination of a small sample of the students involved. A case study will use every available kind of data and is sometimes called a mixed method approach because it will invariably include both quantitative data and qualitative data. The collective value of several kinds of data will be increased if they can somehow be linked together and this is sometimes called triangulation. For example, a questionnaire could be conducted and analysed and used to help generate the topics to be covered in some in-depth interviews.
Causality refers to the process where a cause leads to an effect. If quantitative data is being used, then it is said that a causal variable (for short, X) leads to an effect variable (for short, Y). For example, it is likely that the amount of study you do will influence the mark that you achieve in a module. Here the amount of study is the X variable and your module mark is the Y variable. It is important to correctly set out the logic of this relationship and label the variables accordingly so that X is followed by Y, as in the alphabet. To decide which is X and which is Y, think of which variable occurs first and which variable occurs second. Which is X and which is Y depends entirely on the two variables you are thinking about. So, for example, level of motivation may influence amount of study so here amount of study is now the Y with level of motivation the X. Several variables can be linked together to show a number of causal relationships and it is useful to show all of these in a flowchart, which is sometimes called a causal system. This flowchart or causal system can function as a conceptual framework where the variable at the conclusion features in the objective of the project. When some data has been collected an analysis can be carried out to establish whether there is any evidence to support the hypothesis that there is a relationship between an X and a Y. The technique of analysis used is correlation but great care should be taken to ensure that a strong correlation truly indicates a strong causal relationship.
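A causal system of this kind can be written down as a simple set of X-to-Y links. The sketch below is only one possible representation and uses the three variables from the example above; the function name `upstream_causes` is an assumption made for illustration.

```python
# Each key is a cause (X) and each value lists the effects (Y) it leads to.
# The links follow the example in the text: motivation -> study -> module mark.
causal_system = {
    "level of motivation": ["amount of study"],
    "amount of study": ["module mark"],
}


def upstream_causes(system, effect):
    """Return every variable that leads, directly or indirectly, to `effect`."""
    found = set()
    frontier = [effect]
    while frontier:
        target = frontier.pop()
        for cause, effects in system.items():
            if target in effects and cause not in found:
                found.add(cause)
                frontier.append(cause)
    return found


print(upstream_causes(causal_system, "module mark"))
```

Listing the upstream causes of the variable named in the objective is a quick check that the framework includes everything the project intends to examine.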
A concept is an abstract and wide-ranging idea or notion which is considered to be a key element of the topic being researched. A topic will invariably contain several concepts and they will need to be organised into some kind of conceptual framework in order to supply the project with a sense of direction. In a study of students at university a concept of interest might be ‘the student experience’. Other examples of concepts are ‘quality of life’, ‘deprivation’ and ‘accessibility’. A concept is often too vague and abstract to be observed or measured and must therefore be defined if it is going to be used in research. When a concept is defined it is often found to have several dimensions or components so that, for instance, the student experience may have the dimensions of academic studies, residential accommodation and leisure. These components convey rather more about what might be meant by the student experience but they will themselves need further definition to produce variables for which quantitative data can be collected. In the collection of qualitative data not all researchers would subscribe to the advance definition of concepts and would instead use the data collection itself to develop a sense of how concepts are perceived. For example, in the investigation of the student experience the researcher would ask the students themselves to talk about how this experience is defined.
A conceptual framework (or theoretical framework or, simply, a theory) is a structure for systematically organising all of the concepts in the topic that is being investigated. Importantly, the framework will need to include the concept that is the focus of the research. So, for example, if the objective of your project is to explain amounts of student recreation, then this should feature in the framework together with the concepts which might influence amounts of recreation. A conceptual framework would here show how different concepts are linked together in a process of causality and should be based on a thorough literature review to represent current understanding of a particular subject area. The framework should be used to guide the collection of data and the subsequent analysis, which can then be used to modify the framework. Eventually the framework can help to place the results in a context when writing up the final conclusion.
A conclusion will always be the final chapter in a research report and will draw together everything that the study has achieved. Importantly, the chapter of conclusions should only cover material that has already been presented in greater detail in an earlier part of the report and should not introduce new ideas. The conclusion should address the original objective and should be placed in a context by making use of the conceptual framework. You should appraise the internal validity and the external validity of your findings and draw attention to the flaws that you are aware of but, at the same time, do not overlook the lessons you can pass on about how to do research. Wherever possible you should make suggestions on how any shortcomings could have been addressed and indicate further research which could be carried out. More often than not, the conclusions will be one of the first chapters that a reader will visit and you should strive to create a good first impression. However, the conclusions will be almost the last chapter you write so be sure to leave yourself time before the deadline to find the right words.
Correlation is a technique of analysis which is applied to a set of quantitative data to establish whether the values of two variables display a statistical connection. For example, a set of student data may contain the number of hours a week spent on academic study and the percentage module marks. If these two variables showed a correlation close to 1.0, then the variables would be very strongly correlated. In other words, students who studied a lot would also have high marks and students who studied little would have low marks. If, on the other hand, the correlation is close to 0.0, then the variables would not be correlated and it would not be possible to describe any connection between study and marks. The interpretation of a correlation must be mindful of the process of causality and which variable is the cause X and which is the effect Y. In addition, any correlation between X and Y needs to be interpreted by writing a sentence which logically explains the connection. For example, by spending more time on academic study X a student will gain a better grasp of the subject and produce a submission which gains a high module mark Y.
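As a rough illustration, a correlation can be calculated in a few lines. The sketch below assumes the Pearson correlation coefficient, which is a common choice for two variables on a numerical scale but is not named in this entry; the hours and marks are made-up figures for five students.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of the products of the deviations from each mean.
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of the squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("a variable with no variation has no correlation")
    return co_deviation / sqrt(ss_x * ss_y)


# Hypothetical data: X is hours of study per week, Y is the module mark (%).
hours_of_study = [2, 4, 6, 8, 10]
module_marks = [45, 52, 58, 66, 71]

r = pearson_r(hours_of_study, module_marks)
print(f"correlation = {r:.2f}")
```

A value this close to 1.0 would still need the written interpretation described above, since the calculation itself cannot tell you which variable is the cause.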
Descriptive statistics are calculated in the analysis of a set of quantitative data. An example of a descriptive statistic is the average which could be calculated for a variable such as the age of a group of students. You would simply add up the ages of all the students and then divide by the number of students in the group. The kind of descriptive statistic that is appropriate depends on the scale of measurement of the variable. An average can be used for age because the variable is on a numerical scale. In contrast, gender is a category and it would make no sense to calculate the average of males and females. Instead you should count up how many students are male and female and then calculate the percentages of each. Descriptive statistics are calculated for one variable at a time and are produced in order to create a profile of the cases in a set of data. Thus, the age and gender of students would be only two of several variables for which descriptive statistics would be calculated in order to understand what kinds of students are in the group. Once descriptive statistics have been calculated it is usual to go on and use correlation to find out whether there are any connections between pairs of variables.
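The two calculations described here, an average for a numerical variable and percentages for a category variable, can be sketched as follows. The five students and their values are invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical set of cases: each student has an age and a gender recorded.
students = [
    {"age": 19, "gender": "female"},
    {"age": 20, "gender": "male"},
    {"age": 21, "gender": "female"},
    {"age": 22, "gender": "male"},
    {"age": 23, "gender": "female"},
]


def percentages(values):
    """Percentage of cases falling into each category."""
    counts = Counter(values)
    return {category: 100 * n / len(values) for category, n in counts.items()}


# Numerical variable: add up the ages and divide by the number of students.
mean_age = mean(student["age"] for student in students)

# Category variable: count each category and convert to a percentage.
gender_percentages = percentages([student["gender"] for student in students])

print(mean_age)
print(gender_percentages)
```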
Attention to ethics is imperative in the collection of either quantitative data or qualitative data where human participants are directly or indirectly involved. For example, in a study of university recreation you might recruit some students to complete a questionnaire or to take part in an interview. Alternatively, you might invite them to undergo a trial using some gym equipment or carry out some observations in a sports centre. In all such kinds of data collection you must be mindful of the twin ethical principles of DO GOOD and DO NO HARM. The first of these principles requires that research needs to be conducted for some useful purpose and by a trained researcher. The second principle requires that the project should avoid subjecting participants to coercion, intrusion, deception, loss of confidentiality and anonymity, and physical or psychological harm. The ethical requirement is that any harm that participants may be exposed to can be justified by the good that may come from the research. In other words, the ethics of data collection is not judged against absolute standards but by examining whether any potential harm is warranted by the potential good. For example, an experiment may not be justified to research a new cosmetic but it might be justified to research a treatment for a serious illness. The ethics of data collection is assessed through a process of ethics review.
The external validity of a research study is the level of support which is provided for claims that the results can be generalised beyond the cases that have actually been investigated. For example, if a study of student recreation establishes that males spend longer on physical exercise than females, then this finding might be extended to students on other campuses, or to students in the future, or to other kinds of people who live in the town. Generalisations (or inferences) of this kind are an important feature of academic research but they can only be made if it can be demonstrated that the sample of cases that has been researched is representative of further cases that have not been researched. Thus, to generalise the findings on student recreation, it is necessary that the students where the study has been carried out are genuinely typical of students in other universities. Assessing external validity only makes logical sense if you have first assessed internal validity. Even if it is difficult to establish the external validity of your work you should always try to discuss this aspect of your research in the final conclusion.
See Qualitative data
Internal validity is the extent to which the evidence in a research report supports the findings that have been produced for the cases examined or, in other words, the cases that are internal to the research. For example, in a study of student recreation you may conclude that students with more than the average number of modules in their programme are likely to spend fewer hours on physical exercise. Establishing the internal validity of this finding involves asking some searching questions about the way in which the finding was arrived at. Did the conceptual framework allow all of the possible explanations for levels of exercise to be tested? Were the definitions of the concepts and variables appropriate and consistent with established theory and practice? Was the quantitative data and/or qualitative data the appropriate way of recording student behaviour? Were the techniques of analysis appropriate and does their interpretation deliver a convincing result? If the internal validity of a project is sound, then the researcher should go on to examine its external validity.
A key informant is an individual who through their job or their position in a community can supply a good deal of information or data that would otherwise have to be gained from several different sources. For example, in a study of student recreation it may be useful to contact the person who manages the university sports centre as they would clearly be able to tell you a lot about the centre and the people who use it. It is likely that a key informant should be accessed at an early stage in a project to help guide the collection of both quantitative data and qualitative data. However, you should make good use of the key informant’s time and should certainly have worked up a conceptual framework and examined any published data that is available on your topic.
Knowledge consists of an understanding of how the real world works and the production of new knowledge is the over-arching objective of research. It is important to realise that facts or data alone do not constitute knowledge and the expression ‘theoretical knowledge’ is sometimes used to emphasise that research needs to establish that facts conform to some kind of theoretical pattern. A research project therefore needs to examine data in the light of a conceptual framework (or theoretical framework or theory) and such frameworks often consist of some process of causality explaining how things happen. For example, you may have accurate data on the levels of use of the university sports centre but knowledge only starts to be established when there is an understanding of the reasons for the different levels. It should be emphasised that this account of knowledge is very much derived from a scientific view of research, which invariably makes use of quantitative data. Some social researchers have resisted this approach with the counter that human phenomena can only be explained with the use of qualitative data. The most extreme point of view is that each human phenomenon is unique and therefore cannot be made the subject of a general theory. For example, every individual student may have their own personal reasons for how much physical exercise they do. Furthermore, knowledge itself may be seen differently by different people. A student may feel that they do not have enough time for physical exercise because of the amount of work they have been set. A tutor may take the view that the student does not find the time for exercise because of poor time management skills. Thus, knowledge can be regarded as not what is true but, rather, what an individual believes to be true. These kinds of issues are very much the subject of epistemology, which is the study of what are the appropriate ways of establishing knowledge.
See Case study
An objective sets out what a research project is intending to find out in terms of establishing some new knowledge about the topic you are interested in. The most useful knowledge that can be established by research is to find the reasons that have given rise to a particular problem or concern or issue. The objective of research is probably best expressed as a question so that in a study of student recreation, for example, the objective question might be ‘What are the reasons for the amount of physical exercise taken by students?’ The reasons for students’ levels of physical exercise would combine in a process of causality, and this process would be developed as the conceptual framework to guide the collection of quantitative data and qualitative data. In this way the objective is the starting point of a research project and, if neglected, the study will lack focus and direction. It is important that an objective is seen to be worth pursuing and supported by authoritative sources, most obviously by previous research that has drawn attention to gaps in knowledge. In the conclusion of a project it is crucial that the objective is explicitly addressed together with indications of shortcomings in the work that require further research.
Qualitative data is usually in the form of written text but, more recently, images have been recognised as a useful way of recording information. Qualitative data contrasts with quantitative data in that it is usually collected for a small sample of cases for an in-depth examination. For example, in a study on student recreation you might speak to someone so that they can describe how and why they take part in physical exercise. The written text of this description would be keyed in to an electronic file for an analysis that picks out the essentials of what the person said. In small-scale research projects, the most common method of collecting qualitative data is through an interview, which importantly should allow the interviewee to supply information in their own words. In advance of the interview, an interview guide should be prepared consisting of a list of topics to be covered and the session should be recorded electronically and/or by taking notes. Interviews require a good deal of skill and tact on the part of the researcher and it is important that the work is subjected to a process of ethics review. More experienced researchers may use alternative methods of qualitative data collection such as focus group discussions and participant observation. When a range of different types of data are collected in the same project, it is said that a case study approach is being taken.
Quantitative data (also called statistical data or numerical data) contains numerical values that are recorded on a scale of measurement for a set of variables. Quantitative data contrasts with qualitative data in that it is usually collected for a large sample of cases for a breadth of mainly factual information. For example, a study on student recreation might collect some data on variables such as amount of exercise, amount of academic study, age, and attitude to exercise. This data would be assembled in an electronic file and examined using a technique of analysis such as descriptive statistics or correlation. Quantitative data can be collected from a published source, especially on the internet, as government or similar agencies routinely collect a vast amount of such data. These sources are extremely valuable in providing data over long periods of time and/or for large and numerous geographical areas. Published sources should always be thoroughly reviewed before you consider collecting your own data as this is time-consuming and requires you to consider questions of ethics. However, generating your own data may be necessary where you want information about particular small groups or about a narrow topic of interest. In small-scale research projects, the most common method of collecting quantitative data is through a questionnaire or by direct observation. For example, you could create a questionnaire to ask students about their recreation and you could observe the use of a sports centre by carrying out a pedestrian count of visitors to the centre. For both of these sorts of exercise a survey form should be created to record the information and a pilot survey should be carried out to check for problems. In particular, it is important that a survey is designed to maximise the response rate.
If a method of processing data is repeated a second time and produces a different result from the first time, then the method is said to be unreliable. Reliability is an issue in both the collection of quantitative data and qualitative data and in the analysis of data. For example, an interviewer might ask an interviewee how often they visit a sports centre. If a second and third interviewer asked the same question and got a quite different reply, then the information could not be relied on and this might be for a number of reasons. Perhaps the interviewee tends to be inconsistent or perhaps the interviewers presented the question in quite different ways. Likewise, if the text of an interview is summarised by several researchers, then they would need to produce similar results for the method of analysis to be reliable. It is, of course, rare for either data collection or analysis to be conducted more than once and it is therefore the responsibility of the researcher to be mindful of the possibility of unreliability and to report any concerns in their conclusions. If any part of the method of research is not reliable, then this will compromise the internal validity of the study.
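One simple way to put a figure on reliability is to compare two researchers' work item by item. The sketch below calculates the percentage of items on which two coders agree; this measure and the example codes are assumptions made for illustration and are not a method prescribed in this entry.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of items to which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


# Hypothetical codes given by two researchers to the same four interview extracts.
researcher_1 = ["cost", "time", "time", "friends"]
researcher_2 = ["cost", "time", "workload", "friends"]

agreement = percent_agreement(researcher_1, researcher_2)
print(f"{agreement:.0f}% agreement")
```

A low figure would suggest that the coding depends heavily on who carries it out, which is the kind of concern that should be reported in the conclusions.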
A response rate is the proportion of people who take part in a method of quantitative data collection with those who do not take part being referred to as non-respondents. For example, if a questionnaire survey selects a sample of 120 students and 80 students reply (so there are 40 non-respondents), then the response rate would be 80 divided by 120 which is approximately 0.67. This figure is usually multiplied by 100 to be expressed as a percentage so here it would be about 67%. The problem with non-response is that the data will inevitably be biased so that the analysis will be distorted. It is therefore important to try to maximise the response rate by encouraging people to reply but without coercing them in ways that would contravene principles of ethics. A good response rate is more likely to be achieved if respondents are fully informed about the survey and empathise with the subject of the study. It is important that any bias from non-response is reported in the conclusion.
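The arithmetic can be checked with a short sketch that uses the sample of 120 students and 80 replies from the example. The function name `response_rate` is simply a label chosen for illustration.

```python
def response_rate(sample_size, respondents):
    """Proportion of the selected sample who took part."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= respondents <= sample_size:
        raise ValueError("respondents must lie between 0 and sample_size")
    return respondents / sample_size


sample_size = 120
respondents = 80
non_respondents = sample_size - respondents

rate = response_rate(sample_size, respondents)
print(f"non-respondents: {non_respondents}")
print(f"response rate: {rate:.2f} ({rate * 100:.0f}%)")
```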
A sample is a set of cases which is selected from a much larger set of cases, often known as a population. For example, in a study of recreation you may select a sample of students from the much larger group of all the students on the university campus. The advantage of a sample is that it reduces the amount of data to be collected but then presents the problem of establishing whether the results of the analysis can be generalised to the complete population. If quantitative data is being collected, then it is usual to use random sampling to pick out a reasonably large sample of 30 or more cases. A large random sample should ensure it is representative of the population but the practical difficulty is that a low response rate then introduces bias. If qualitative data is being collected, then a much smaller sample is selected by using non-random sampling to hand-pick particular cases of interest.
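Random sampling can be sketched with Python's standard `random` module. The population of 500 numbered students and the seed value are assumptions made for illustration; the seed is fixed only so that the same sample can be drawn again.

```python
import random


def draw_random_sample(population, size, seed=None):
    """Select `size` different cases at random from the population."""
    rng = random.Random(seed)
    return rng.sample(population, size)


# Hypothetical population: a list of 500 student identifiers.
population = [f"student_{i:03d}" for i in range(1, 501)]

sample = draw_random_sample(population, 30, seed=42)
print(len(sample))
print(sample[:5])
```

Every student has the same chance of being selected, which is what allows a large random sample to be treated as representative, provided the response rate is high enough.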
A scale of measurement shows the range of values which a variable can display for the cases that feature in a set of quantitative data. There are three main types of scale: a numerical scale, an ordinal scale and a nominal scale. A numerical scale (or interval/ratio scale) is the kind of scale we use to measure everyday things using a ruler or a tape measure. In a study on recreation, the kinds of variables that we might record on a numerical scale could be the number of hours spent on physical exercise and the number of kilometres someone lives from a sports centre. For a variable on a numerical scale it makes sense to carry out analysis which involves simple arithmetic and allows the calculation of descriptive statistics such as an average. The other two scales of measurement, the ordinal scale and the nominal scale, both contain values that are categories. On an ordinal scale the categories can be ranked or ordered so, for example, someone’s attitude to physical exercise could be placed on an ordinal scale by asking them to select a category such as very interested, interested, neutral, uninterested, and very uninterested. This five-point opinion scale is quite common in research and is sometimes called a Likert scale. The scale allows the researcher to identify whether one person’s opinion of exercise is higher or lower than another person’s but does not allow the calculation of the amount by which their opinions differ. Thus arithmetic cannot be used on the values and the calculation of descriptive statistics such as an average is not appropriate. A nominal scale is where a case can be placed into one of a number of categories that cannot be rank ordered. If you asked someone to select their favourite sports centre, you could offer a list of centres from which they could choose. Each centre is here a category but it would not be possible to say that one person’s favourite centre was higher or lower than another.
When values are recorded on a category scale (either ordinal or nominal) the analysis involves calculating the number and percentage of cases who selected each category. A variable is often derived by defining a broader concept and the definition should usually imply the scale of measurement.
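The link between the scale of measurement and the kind of analysis can be sketched as a single function that picks the summary to suit the scale. The function name `summarise`, the scale labels and the example values are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

CATEGORY_SCALES = {"ordinal", "nominal"}


def summarise(values, scale):
    """Return an average for a numerical scale, or counts and percentages for categories."""
    if not values:
        raise ValueError("no values to summarise")
    if scale == "numerical":
        return {"average": mean(values)}
    if scale in CATEGORY_SCALES:
        counts = Counter(values)
        return {
            category: {"count": n, "percent": 100 * n / len(values)}
            for category, n in counts.items()
        }
    raise ValueError(f"unknown scale of measurement: {scale}")


# Hypothetical data for four students.
hours_of_exercise = [2, 3, 5, 6]                                   # numerical scale
favourite_centre = ["Centre A", "Centre B", "Centre A", "Centre A"]  # nominal scale

hours_summary = summarise(hours_of_exercise, "numerical")
centre_summary = summarise(favourite_centre, "nominal")
print(hours_summary)
print(centre_summary)
```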
See Case study
A variable features in a set of quantitative data and is a characteristic of a set of cases which is different from one case to another. For example, in a study of recreation it may be found that different students spend different amounts of time on physical recreation and so physical recreation is a variable. A variable is recorded on a scale of measurement and is often derived by defining a broader concept. If a concept produces several variables, then these variables are often referred to as indicators. The analysis of a set of data will usually begin with the calculation of descriptive statistics for each of the variables. Further analysis would involve the correlation of different pairs of variables and this requires a distinction to be made between the independent variable (or X-variable) and the dependent variable (or Y-variable).