Grammar, Spelling, and Presentation Things Not To Do
Common Errors & How To Avoid Them
Hugo van den Berg
MOAC and Systems Biology Doctoral Training Centres
Warwick University
2011
The illustrated version may be downloaded here.
I. Composition
“Grammar don’t matter, do it?” The following is a list of elements of style, grammar and spelling, to which you must pay attention whenever you write something to hand in. You may object that this is unfair: that all that matters is the quality of your scientific insight, knowledge, and achievements, not your grasp of grammar or the elegance of your writing. Indeed, you may be more cynical and suggest that success in science does not even depend primarily on the quality of your work. Still, if you wish your written work to have lasting value and appeal to people in future generations whom you cannot influence by other means, you will have to learn to write with clarity. Moreover, it is easy to grossly overestimate how well you understand a given topic. Attempting to write with clarity is a useful reality check. You may object that language is just a set of conventions. True, and you must adhere to these conventions for the same reasons you observe the Highway Code. Remember that written text is a poor medium, compared to conversation. When you speak to a person, he or she can indicate that you need to explain something in more detail (or, on the contrary, that they know all about it so you can cut to the chase). But when you are writing you lack all these cues, and the elements of style that make up good prose constitute one way of making up for these shortcomings.
An asterisk (*) indicates that an incorrect sentence or clause follows. Error codes used when marking students’ work are indicated in bold face.
Agreement (Ag) The grammatical number of the verb must be the same as that of the corresponding noun:
* The pH of the P-phase and the N-phase were measured.
The pH of the P-phase and the N-phase was measured.
This is a typical example where the plurality of the intervening clause causes the writer to forget that it is the pH that was measured. Note that statistics, dynamics, genetics, proteomics, genomics are all singular. Data is actually the plural of datum, but is nowadays treated by almost all speakers as a singular mass term (which raises the question of what to call a single data item: a data point? an observation? say datum and you sound like the professor who ordered a martinus).
one bacterium two or more bacteria
one criterion two or more criteria
one phenomenon two or more phenomena
one ganglion two or more ganglia
Bacteria (the plural) might refer to several bacterial cells, or two or several bacterial species. The locution they for a singular person looks and sounds much better than he or she or (s)he, but in written text it is jarring because it looks too much like an agreement error and, moreover, many still view singular they as a colloquialism (q.v.).
Apostrophe (Apo) The apostrophe indicates relations of possession:
the enzyme’s = of the enzyme
the enzymes = more than one enzyme
the enzymes’ = of more than one enzyme
The rule is no different for acronyms and abbreviations:
the RNA’s = of the RNA
the RNAs = more than one RNA
the RNAs’ = of more than one RNA
although some writers feel that the plural of an acronym needs an apostrophe, too. Names ending in -s follow the same rules (Bridget Jones’s Diary, the Joneses’ new car), with the exception of time-honoured luminaries (Jesus’ teachings; and no, you do not belong to this set). The rule is different for its, which like the pronouns theirs and hers is a possessive without an apostrophe; it’s means it is or it has, but remember that you should not use contractions in academic writing. Irregular plural possessives are formed thus: children’s, people’s. Thus, men’s clothing is men’s wear, even though retail signage invariably reads *menswear.
Bastardized English (BE) Foreign students should take care to note that not everything they have come to believe is English actually is English. They are kindly requested not to refer to a data projector as a beamer, the latter being a car (“beemer”) manufactured by BMW. They should avoid non-idiomatic constructions such as
*This is how it looks like.
*We now have the possibility to obtain an asymptotic result.
The first of these must be the most common example of non-idiomatic English uttered in seminars; the second sounds like something a Ukrainian gangster might say (the grammar, not the maths). Such things can and do change, but this is best left to native speakers. German students should refrain from referring to their mobile phones as handies (or, even worse, Handy’s) and learn the idiomatic differences between to do and to make. Asians should take care to avoid incorrect locutions with about:
* Discuss about… * Mention about… *Analyse about… *A problem about…
Already and yet require a perfect tense:
*The experiment was done already by Ed et al.
Do not use since where for is correct, as in:
*The protocol, due to Al et al., has been in use since ten years.
In each of the following pairs of sentences, the two juxtaposed sentences mean different things:
I like to express my gratitude. I would like to express my gratitude.
I am interesting. I am interested.
The ones on the left express distinctly oddball sentiments.
Colloquialisms (Coll) Strive to write as you speak (indeed, you will avoid most syntactical errors if you simply avoid writing things you would never say) but remember that written text lacks some of the advantages of interpersonal contact. In particular, written text can look odd, jejune, or strained when it is too informal:
*This leaves the RNA polymerase molecule in a bit of a bind.
*The law of large numbers is da bomb.
*Hopefully the octopus makes another attempt to copulate.
*Anaerobic bacteria are ideally suited to this sort of thing.
The first example may well be perfectly acceptable ten years from now, whereas the second example will be, like, so last decennium. While hopefully could be defended as an elliptic idiom, the trouble is that the third sentence can be read as imputing hope to octopi, which is probably not what is meant (although the sentence would be acceptable as part of a wildlife video narration). In a slightly informal expository text, an expression such as this sort of thing might not be out of place. Overused filler words (very, really, definitely, fairly, quite, nice) should be avoided, unless of course you really really mean it (to ban all such words outright would be pedantic; nonetheless, be careful). Mentally substitute the word damned for very whenever you want to write the latter and decide whether you really do feel that strongly about it.
*Separation of variables is a very important technique.
Separation of variables is a technique that often proves useful in practice.
Here, the need to eliminate very prompted a more precise and informative rephrasing. One reason why these words are overused in conversation, and look so sloppy in writing, is that each of them can mean many different things. If you are tempted to use such a word, try to think of a synonym with a less wide meaning. For instance, instead of really consider truly, genuinely, considerably; instead of very consider extremely, intensely, utmost, or, better yet, add a phrase that explains the very and renders it superfluous. Avoid dropping successfully in sentences reporting even the slightest of accomplishments.
Dangling elements (Dang) A dangler is a participle or gerund that is not linked to a corresponding noun:
*Considering the affinity, the mutant enzyme had a lower Km.
*Using these definitions, the key equation follows.
*Having spoken at various conferences, Diplodocus was a giant herbivore.
*When studying spiders, salticids are not easily mistaken for something else.
The -ing forms that start these sentences express an action not possible for the subjects of these sentences (enzyme, equation, Diplodocus, salticids, although intriguingly salticids do seem to be keen observers of fellow arachnids). While danglers could be defended as idiomatic elliptical constructions, they should be avoided in view of the comical effect they can have. Some students, vaguely remembering that -ing forms at the beginning of a sentence are associated with some sort of trouble, will seek the safety of the following construction:
*In terms of affinity, the mutant enzyme had a lower Km.
Whereas this is not strictly wrong, such clunky use of in terms of does not make for attractive prose and is symptomatic of lazy writing.
Green squiggles: The built-in grammar checker that puts green squiggles underneath some bits of your prose is usually right, but not always.
Heterogeneous co-ordination (Het) Nouns that are syntactically co-ordinate should belong to the same category of meaning:
*The Calvin cycle is more costly than heterotrophy.
*Genomics includes alternative splicing.
*Multiple signaling pathways control homeostasis.
Heterotrophy, as a mode of existence, should be compared to autotrophy (a key component of which is the biochemical pathway of the Calvin cycle).
Irrelevant material (Irr) Your essay, assignment write-up, or research report is there to get a point across (or a cluster of related points). Anything that detracts from this goal should not be there. Material that interrupts the flow of the text too much but should be there to serve the needs of some readers (long tables, detailed proofs) should be relegated to appendices. Above all, do not succumb to the feeling that you need to include material merely to showcase your knowledge or understanding (some lecturers do play “gotcha” but if this happens you can console yourself with the knowledge that they are poor teachers, and that you will do better when you become one).
Mixed construction (Mix) The construction of the sentence should not change in mid-stream:
*Meiosis is when the diploid genome becomes haploid.
Such errors occur very frequently and can easily be prevented simply by listening to what you have written.
Colon, semi-colon, comma, full stop (Punc) The colon is the “double dot” and is used when the following material elaborates the implications of the initial statement:
Substance X is a non-competitive inhibitor: it changes Vmax but not Km.
The semi-colon is the “dot-comma” and separates statements that are complementary and parallel. When in doubt, use a full stop (unless all your sentences end up being less than 10 words long, which will make you sound like a robot). The subject of your sentence does not end with a comma, even when it is a long subject complement clause:
*Integrative homeostatic dynamics models, have been used more recently.
If you are afraid the sentence becomes too difficult to parse without the comma, you should rephrase it. A comma is nowadays more and more used where one would traditionally expect a semi-colon or a full stop:
*Microarrays chart gene expression patterns, two systems are available.
This sounds as if the writer does not properly understand the logical connection between the two clauses. The comma should not be regarded as a one-stop shop for connecting any old pair of related thoughts:
*The mutant ligand is ineffective, it is unable to bind the receptor.
Instead, use a full stop or an appropriate conjunction:
The mutant ligand is ineffective, because it is unable to bind the receptor.
Note that you could not use therefore instead of because in this last sentence. To develop a feeling for where commas should go, read your sentences out loud and pause where you have written commas. You will hear superfluous commas as unnatural pauses. From this discussion you may get the impression that a full stop is your best bet when in doubt; this is not too bad as a general rule of thumb, as long as you remember that each sentence should be complete, with main verb and predicate, and that too many short sentences following upon one another result in a staccato “machine gun” effect.
A subordinate clause, which you would read out in a lower voice, should be flanked on both sides by commas:
The Van der Waals forces, named after one of the many brilliant Dutch physicists, play a key role in intermolecular interactions.
*We will explain with the aid of examples, the advantages of differential equations.
The last sentence requires either another comma (before with) or that the one that follows examples be left out. The word however has two meanings. In the meaning “be this as it may” (or simply “but”), however should be flanked by commas or, if it appears at the beginning of a sentence, it should be followed by a comma:
However, the second experiment showed an unexpected result.
The microarray analysis, however, did not confirm our hypothesis.
When however has its other meaning of “regardless of” it is not followed by a comma:
The neurone did not hyperpolarize, however much ATP was added.
Good writers instinctively form enumerations of threes: “this, that, and the other” where the comma preceding the “and” is the famous Oxford comma. It is good practice in scientific writings, because the individual items will themselves often contain an “and” so that the Oxford comma helps the reader to parse the list as intended by the author.
Full stops (periods) end sentences. Having a full stop where one should have a semi-colon is usually admissible, but a semi-colon for a full stop may look pretentious. Full stops also end abbreviations, but not those that end in the last letter of the unabbreviated word:
doctor: Dr doctors: Drs
mister: Mr misters: Messrs
A selection of Latin abbreviations that occur regularly in scientific writing:
cf. = compare (confer) It does not mean “see”.
c.q. = in which case (casu quo) It does not mean “or”.
c.s. = and fellows (cum suis)
et al. = and others (et alia) No period follows et which is a complete word.
etc. = and so on (et cetera) When speaking, avoid saying “egg seterah”.
e.g. = for example (exempli gratia)
i.e. = that is (id est) When speaking, try to say “that is” and not “Aye ee”.
q.v. = which one should look up (quod vide)
s.l. = in the broad sense (sensu lato)
s.s. = strictly speaking, in the narrow sense (sensu stricto)
viz. = namely (videlicet)
The abbreviation c.s. is used to refer to a usually prominent person together with the people he or she works with or who follow him or her. The abbreviation et al. is now spelled et al without the full stop by many scientific journals. Sensu lato and sensu stricto are usually written out in full. Note that the word cum, when Latin, as in “kitchen-cum-dining area” is not to be rendered as “come”.
It is lazy writing to put etc. at the end of a list or enumeration when you have a vague feeling you may have forgotten one or more similar items (and are afraid, perhaps, that the reader will take you to task for it). Only use etc. if the reader can easily supply more examples:
Specialized training is required to treat zoo animals such as monkeys, elephants, crocodiles, tigers etc.
*The blood transports oxygen, nutrients, enzymes etc.
In the second sentence, there certainly are other blood components that have been left out, but they do not belong to a single category and the list is therefore not readily extendable. You can always use including or some phrase to similar effect to indicate the fact that the enumeration is not complete, nor meant to be. (Another legitimate use of etc. is to abbreviate a formula such as a list of honorifics, but you are unlikely to find yourself needing this in scientific writing.)
In the type-setting language LaTeX, input
i.\ e.\ or: i.e.\ et al.\
to obtain proper spacing following the full stop (omit the second backslash if the abbreviation actually ends the sentence, and note in passing that a single full stop will do the job of ending both abbreviation and sentence). Microsoft Word is hopeless at this sort of thing, so it is better to write i.e. than i. e. Also, you are not required to italicize these abbreviations, although you should feel free to do so.
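As a minimal sketch of the spacing advice above (the sentences are invented for illustration), a backslash followed by a space after the abbreviation's full stop asks LaTeX for an ordinary inter-word space rather than the wider end-of-sentence space:

```latex
\documentclass{article}
\begin{document}
% Mid-sentence: "\ " after the full stop gives a normal inter-word space.
The enzyme accepts several hexoses, e.g.\ glucose, i.e.\ it is not
strictly specific. This was confirmed by Ed et al.\ in a later study.

% End of sentence: no backslash, and one full stop ends both the
% abbreviation and the sentence.
The protocol is due to Al et al.
\end{document}
```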
Quotations & reference (Quo) Always attribute facts and findings to the source that provided them, both to pay tribute to the original contribution and to assign responsibility. (Of course, your source is in no way responsible for any misinterpretations on your part.) By all means use Wikipedia, but always follow up references; if the Wikipedia page does not provide them, find your own. Wikipedia cannot be trusted; its editing process means that pages often do not even concord with their own references! Fragments of text that you lift from your sources should be put between quotation marks and be attributed. If you fail to do this you are plagiarizing. Note that opening quotes are “sixes” and closing quotes are “nines”. In the last sentence the nines precede the full stop, whereas standard practice reverses this order; you should feel free to follow either convention. In scientific prose the need seldom arises to quote whole paragraphs (this is different for scholarly work). If you quote sentence fragments, make sure they are syntactically contiguous with the surrounding text. Single sixes and nines can be employed to distinguish the mention of a word from its use:
‘Boston’ has six letters, whereas Boston has six million inhabitants.
Alternatively, you can put the mentioned words in italics (Boston has six letters). Arguing from a strictly logical point of view, you would expect that offensive words become inoffensive when you mention them rather than use them, but this is not the case: such words still jump from the page and may trigger outrage.
Restrictive versus non-restrictive (Res) Compare the following:
The fuel of red blood cells is the carbohydrate glucose.
*The fuel of red blood cells is the carbohydrate, glucose.
The second sentence suggests (incorrectly) that glucose is the only carbohydrate. Additional (non-restrictive) information appears between commas:
Lactate dehydrogenase, which is a protein, is found in red blood cells.
If you use that instead of which in the previous sentence, you imply that there is also a non-proteinaceous lactate dehydrogenase (which could be true but is probably not what you meant). Defining (restrictive) information cannot appear between commas:
The enzyme that converts pyruvate to lactate is found in red blood cells.
It would be incorrect to put a comma before that and/or following lactate. In British English, it is acceptable to use which instead of that in a restrictive clause, but that can only appear in a restrictive clause.
Split infinitive (SI) It is not always wrong to split an infinitive:
(*?)To fully understand the effect, a more detailed analysis is required.
Nevertheless, in some cases it is better avoided:
*The parasite attempts to forcefully enter the host.
(*??)To systematically elucidate the relationship between HDL and atherosclerotic risk, we need to better understand the key regulatory factors.
Spelling (Spell) Use the facilities available (automated spelling correction, oed.com). Spell checkers do not pick up mistakes if the misspelled word happens to spell something else:
weather = a meteorological condition; whether = if in the subjunctive sense
which = that; witch = gothic-looking woman who casts spells
principal = main, foremost; principle = fundamental element, axiom
to forgo = to give up on, to do without; to forego = to go before, to precede
to effect = to make happen; to affect = to modify, alter, influence
an effect = a consequent phenomenon; an affect = a certain cognitive state
complimentary = courtesy-wise; complementary = supplying the remainder
to = the preposition; too = also
to advise (verb); advice (noun)
to extend (verb); an extent (noun)
to save (verb); safe (noun and adjective)
to price (cost), to prize (appreciate); a price (cost), a prize (award)
ensure = make sure; insure = what an insurance company does
If you come across a paper in which the authors perform *“principle component analysis” you should wonder whether the authors have any idea what they are talking about. British and American spelling are equally valid, but you should be consistent in your choice. Verbs ending in -ize or -ise present a special problem. One solution is to use -ise in all cases (as William Shakespeare has a character exclaim: “Thou whoreson zed! Thou unnecessary letter!”). However, etymology and phonetics both favour -ize in most cases (to characterize, to realize). Exceptions (which should always be spelled with an s) include: to devise, to advise, to apprise, to comprise, to despise, to excise, to revise, to supervise, to surmise, to exercise, to improvise. In British spelling, to analyse and to catalyse also take an s, since they derive from the Greek lysis rather than from the suffix -izein.
The indefinite article is written either a or an. Correct usage follows phonetics, not spelling:
an mRNA molecule an LSD-derivative an x-axis an NYPD officer
a uniform a Yemenite a utopia a NASA initiative
Symbols at beginning of sentences (Sym) Avoid beginning a sentence with a mathematical symbol or a chemical formula or a digit:
*f is defined by an ordinary differential equation.
The function f is defined by an ordinary differential equation.
*Al responded differently.
Aluminium responded differently.
*4 mutants were selected for further study.
Four mutants were selected for further study.
This When this is followed by its referent, confusion is unlikely to arise:
This phenomenon is called ‘stochastic resonance’.
When this refers back to an element in a preceding sentence, its precise meaning may elude the reader:
*For larger parameter values, two stationary points appear. This is a bifurcation.
This (!) is even more the case from paragraph to paragraph. It is safer to augment such occurrences of this with a noun or clause that recapitulates the referent:
This variation of the number of stationary points as the parameter value changes is known as a bifurcation.
Repetitive material (Rep) Saying things more than once in different ways is a key technique in exposition, so not all repetition is automatically bad. However, a paragraph or sentence that contains nothing new, or does not permit the reader to view the matter in a different way, serves no purpose and had better be omitted.
Usage (Usage) Note the difference between whether and if:
We must see whether the weather allows it, and if it does, we will go.
Adverbs in English tend to end in -ly or -wise or -ways, but not always:
Work hard and you will succeed.
Shakespeare would still have written hardly here and not meant it in the modern sense. Not everything that seems to explicate a verb is an adverb:
*The door was painted redly.
We say red here because it is a predicative adjunct (it says more about what the door becomes than about the painting process). The advice to native speakers is not to add -ly where their instincts tell them to leave it out.
Other things to bear in mind:
We need only show... It suffices to show...
different from... (never *different than)
farther (distance) further (anything else)
to imply (A implies B) to infer (a person infers B from A)
to compare = seek similarities to contrast = bring out differences
uninterested = not interested disinterested = without a stake in the matter
*warm/cold temperature high/low temperature
*expensive/cheap cost or price high/low cost/price (or: prohibitive etc.)
*irregardless irrespective (or: regardless)
few people, a few attempts less money, less daunting, less water
Fewer than is now almost invariably replaced by less than in everyday speech, and it is to be expected that written English will follow suit within the next few decades.
A thesaurus is a good tool if you momentarily cannot think of the expression that is on the tip of your tongue, but do not be tempted by the delicious unusual words you see along the way (the plural of rhinoceros is rhinocerotes; the collective noun of butterflies is a kaleidoscope). Stick to words that belong to your normal voice and if you do try something new, make sure the word or phrase means what you think it means.
Oh, and it’s wookiee, not wookie.
II. Presenting Statistical Results
“Lies, damned lies, and statistics” You will learn about statistics in a separate module and you may well decide that mastering the technical nitty-gritty of it is not for you. Be that as it may, you are negligent if you fail to heed the following points of advice.
Mean versus median The mean is the average of the data (the sum of the observations divided by their number). The median is any number such that half the data are larger than this number (i.e. the 50th percentile). In symmetric distributions, the mean is a median, but this is not the case when the distribution from which the data were sampled is skewed. In the latter case, it may be better to report the median.
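A minimal sketch of the difference, using Python's standard library. Both samples are made up for illustration and are not data from this guide.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: one large observation drags the mean upwards.
skewed = [1, 2, 2, 3, 3, 4, 5, 40]
# Hypothetical symmetric sample, for comparison.
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in (("skewed", skewed), ("symmetric", symmetric)):
    # In the symmetric sample the two summaries coincide;
    # in the skewed sample the mean lies well above the median.
    print(f"{name}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```

In the skewed sample, seven of the eight observations lie below the mean, which is why the median is the more honest summary there.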
SEM versus SD The SEM is the standard error of the mean. It estimates the accuracy of the sample mean as an estimate of the population mean. The latter is unknown, but is of course more reliably estimated when the sample size increases. Thus, as the number of observations becomes ever larger, the SEM shrinks to zero. The SEM is often (incorrectly) abbreviated to SE, ‘standard error’. The SD is the sample standard deviation. It is a measure of the variability (often called “spread”) in the data around the sample mean. You should use the one marked sₙ₋₁ on your calculator, since its square provides an unbiased estimate of the variance of the distribution from which the data were sampled. The SD should be used when one is reporting data. However, almost all scientists incorrectly use the SEM nowadays because it is invariably smaller than the SD. If your supervisor is one of these people, try to re-educate him or her.
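The distinction can be sketched as follows. The function name sd_and_sem and the sample are assumptions made for illustration. The SD uses the n-1 denominator, and the SEM is the SD divided by the square root of the number of observations.

```python
from math import sqrt
from statistics import stdev

def sd_and_sem(data):
    """Return (SD, SEM) for a sample; hypothetical helper for illustration."""
    sd = stdev(data)              # sample SD, n-1 denominator
    sem = sd / sqrt(len(data))    # standard error of the mean
    return sd, sem

# Hypothetical sample of eight observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sd, sem = sd_and_sem(sample)

# Repeating the same values four times mimics a larger sample with the same
# spread: the SD barely moves, whereas the SEM is roughly halved.
sd_big, sem_big = sd_and_sem(sample * 4)
print(f"n = {len(sample)}: SD = {sd:.2f}, SEM = {sem:.2f}")
print(f"n = {len(sample) * 4}: SD = {sd_big:.2f}, SEM = {sem_big:.2f}")
```

The SD describes the data, and does not vanish as observations accumulate. The SEM describes only how well the mean has been pinned down.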
Statistical significance versus scientific importance Report the P-value. If this is not possible, present the lowest of the conventional cut-off values that is higher than the P-value. (They are, traditionally, 0.001, 0.01, and 0.05 although you may encounter other values. Thus, if P=0.005, report P<0.01, not P<0.05 even though the latter is of course also true.) Now, if you cross a certain busy road two times a day with a 5 percent probability of being run over each time, your chances of being alive next week are worse than even and you will almost certainly be dead a year from now. It is therefore right and proper that P-values above 0.05 are not considered to indicate a statistically significant result, but you should remember that, say, P=0.04 is not much better. A few quality journals have moved the goal post to 0.001, which is commendable.
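Both points in the preceding paragraph come down to simple arithmetic, sketched here. The function names are hypothetical, and the crossings are assumed to be independent, which the text implies but does not state.

```python
def survival_probability(p_event, n_trials):
    """Probability of escaping an event of probability p_event in each of
    n_trials independent trials."""
    return (1.0 - p_event) ** n_trials

def report_p(p, cutoffs=(0.001, 0.01, 0.05)):
    """Return the lowest conventional cut-off that exceeds p,
    or the P-value itself if it exceeds every cut-off."""
    for cutoff in sorted(cutoffs):
        if p < cutoff:
            return f"P<{cutoff}"
    return f"P={p}"

week = survival_probability(0.05, 2 * 7)     # 14 crossings
year = survival_probability(0.05, 2 * 365)   # 730 crossings
print(f"alive after a week: {week:.3f}")     # about 0.49, worse than even
print(f"alive after a year: {year:.1e}")     # vanishingly small
print(report_p(0.005))                       # P<0.01
```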
Whether or not a finding is statistically significant is much less important than the associated confidence interval. Suppose that a drug is found to lower blood pressure on average 8 mm Hg from 100 to 92 mm Hg (mm Hg is not the SI unit of pressure, but it is what most medics use). The finding is statistically significant. Is it clinically significant? Well, that depends. The confidence interval may be 2 to 14 mm Hg. The higher end of this range is clinically important, whereas the lower end is not. Altogether the findings are clinically inconclusive. You may agree that the very term ‘significant’ tends to obfuscate the issue. Many statisticians nowadays prefer the term statistically detectable; you are encouraged to adopt this usage and re-educate your supervisors.
Returning to the drug example, suppose that the study is followed up by a similar study with more observations. Now the confidence interval is 7 to 10 mm Hg, which is a clinically significant range. Why not do the study with more subjects straight away? Because resources are scarce, so that on the whole it makes sense to do a pilot study first. More observations mean higher statistical power (q.v.), which means that smaller differences become statistically detectable.
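The way the interval narrows with the number of subjects can be sketched under a normal approximation. The standard deviation of 15 mm Hg and the sample sizes are assumptions chosen for illustration, as is the function name mean_ci; none of them comes from an actual study.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(mean_effect, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95 % interval
    half_width = z * sd / sqrt(n)
    return mean_effect - half_width, mean_effect + half_width

# Assumed values: mean reduction 8 mm Hg, SD 15 mm Hg.
for n in (25, 100, 400):
    low, high = mean_ci(8.0, 15.0, n)
    print(f"n = {n:3d}: {low:.1f} to {high:.1f} mm Hg")
```

With these assumed numbers, 25 subjects give an interval of roughly 2 to 14 mm Hg. Quadrupling the number of subjects halves the width of the interval.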
Fishing for significance Genomics, proteomics and metabolomics all afford huge, richly structured data sets that can be subjected to any number of significance tests. Do enough tests and you are bound to come up with a statistically significant (better to say: statistically detectable) result or two. Any number of statistically detectable differences can be found, in fact, if you keep at it long enough (there are many different ways of forming subgroups of the objects in your study, if they have several attributes). Of course it is inappropriate to report only the statistically detectable findings. Worse, it is unethical.
There are various ways to deal correctly with multiple testing situations. A simple and straightforward procedure is the Bonferroni correction (q.v.), which has the drawback that it is far too conservative for the modern “omics” environment. More suitable for this environment is the step-down procedure (q.v.). The ethical entanglements can be side-stepped by specifying in advance which tests are going to be carried out. This leads to the difficulty that the observations may throw up something important and unexpected. The proper way to deal with this is to report the original hypothesis with the originally projected test in one paper, and use the new findings as a pilot upon which you base a study specifically directed at the new finding.
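The two corrections can be sketched as follows. The text does not say which step-down procedure is meant; Holm's variant is assumed here because it is the simplest one. The P-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its P-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm's step-down procedure (assumed variant): test the P-values in
    ascending order against alpha/m, alpha/(m-1), ... and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # all larger P-values are retained as well
    return reject

# Hypothetical P-values from five tests on the same data set.
p_values = [0.001, 0.011, 0.02, 0.04, 0.30]
print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm(p_values))
```

For these values Bonferroni rejects only the first hypothesis, whereas Holm also rejects the second, which illustrates why the step-down approach is the less conservative of the two.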
Percentages There is a bizarre perception, deeply ingrained in our culture, that things are somehow easier to understand in terms of percentages. Quite often this is simply not the case. For instance, a natural way to express mortality and morbidity is in terms of per person, per kilo-annum. (Annum means year.) But one usually hears these things quoted as a percentage. For example, breast cancer mortality is reported as 0.5 percent, which presumably means that one in 200 women diagnosed with breast cancer die every year. Early diagnosis reduces this rate. For instance, in Sweden it was found that the rate went down from 0.51 % to 0.39 % during a breast cancer screening trial. This good news was reported as a 24 percent reduction in the media, with an all too predictable effect on public opinion. Perhaps there is just no percentage in honesty—there certainly is little honesty in percentages. Be deeply suspicious of any science-related percentages being bandied about in the media, and always try to select honest ways of expressing your own findings.
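The arithmetic behind the Swedish example can be made explicit. This is a sketch using the two rates quoted above. The function name is hypothetical, and the rates are read as annual, following the interpretation given in the text.

```python
def risk_reduction(rate_before, rate_after):
    """Return (absolute, relative) reduction between two rates."""
    absolute = rate_before - rate_after
    relative = absolute / rate_before
    return absolute, relative

# Rates quoted in the text: 0.51 % and 0.39 %.
absolute, relative = risk_reduction(0.0051, 0.0039)
print(f"relative reduction: {relative:.0%}")
print(f"absolute reduction: {absolute * 1000:.1f} per 1000 women")
```

The same finding is thus either a reduction of about a quarter or a reduction of about one death per thousand women, depending on which number one chooses to report.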
III. Graphical Presentation of Results
The message of this section can be summarized in a single sentence: whatever the MS-Excel default would have you do, do the exact opposite.
Line graphs For time series, a line graph is usually better than a column chart. (A) Use straight lines to connect the means at the various time points sampled. Smooth interpolating curves are best not used when the purpose of the graph is to report the data, since you are in effect adding pseudo-data. If you have a mathematical model that purports to capture the processes that gave rise to the data, you can present a curve derived from this model together with the data. Usually the curve is shown for the parameter values that result in the closest “fit” to the data set. Sometimes the data are going to be used in a calculation, and some preprocessing (e.g. smoothing) is used. In such cases it is permissible to exhibit the curve that represents this preprocessing. (B) Use different line styles (e.g. solid, dashed, dotted) to distinguish multiple time courses in the same graph from one another. Different symbols at the data points simply do not work well. (C) Use axes of even length. For long time series, graphs that are much wider than they are tall are appropriate, but it is not appropriate to deviate from squarish proportion in order to exaggerate the impression you want the graph to make. (D) In graphs with linear scales, the most objective representation is obtained if the axes cross at the point (0,0), although you may decide to deviate from this rule to avoid too much waste space in the panel.
Column charts and bar charts A set of vertically (occasionally horizontally) arranged bars is often used to represent data sets. Such a graph is called a ‘column chart’ (‘bar chart’ if the bars are horizontal). These are mostly used to represent categorical data rather than time series, where each observation corresponds to a different treatment, mutant, peptide, or whatever the case may be. The length of a column or bar is proportional to the value you wish to depict. A default choice is to let the column length correspond to a sample mean, and to adorn the column with T-shaped extensions that indicate the standard deviation (or, incorrectly, the standard error). Instead, one could show the raw data as a cloud of points along the centre line of where the column would go. This has several advantages. Readers can assess the shape of the distribution and the spread in the data for themselves, which is important to judge whether appropriate statistical tests have been used. Moreover, it is easier to see if alleged statistically significant differences are due to outliers or groups of outliers. Quite a bit of information is conveyed in the same amount of space a column chart would occupy. Disadvantages are that the chart becomes too crowded if the data set contains more than (say, roughly) 40 data points, and that cognoscenti can glean more from your chart than you wish to reveal. It is not unusual for experimentalists to be coy about their raw data, their averred stock-in-trade.
Instead of mean and SD, one can show the 5th, 25th, 50th, 75th, and 95th percentiles. A traditional format is the box-and-whisker plot (q.v.), which has the drawback of being tremendously space-consuming. Slightly more economical is an open column without fill colour and horizontal bars at the percentile points.
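The five percentiles can be computed with the Python standard library, as in the sketch below. The helper name five_percentiles is hypothetical. The "inclusive" interpolation method is one of several conventions, so other software may give slightly different values for small samples.

```python
from statistics import quantiles

def five_percentiles(data):
    """Return the 5th, 25th, 50th, 75th and 95th percentiles of a sample."""
    # n=20 yields 19 cut points at 5 %, 10 %, ..., 95 %.
    cuts = quantiles(data, n=20, method="inclusive")
    return {p: cuts[p // 5 - 1] for p in (5, 25, 50, 75, 95)}

# Hypothetical data: the integers 0 to 100, for which the percentiles are obvious.
print(five_percentiles(list(range(101))))
```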
Three dimensional column charts show the data as a Manhattan cityscape. Use this option wisely, and do not be tempted to add a phony third dimension to what is really a conventional graph.
Pie charts Categorical data that add up to 100 % are often depicted in a pie chart. Such charts are extremely space consuming (one could print them smaller to save space, but then they become harder to interpret). The same information can be depicted in a column or bar with differently coloured (or cross-hatched) subsections. This is particularly useful if a number of distributions are to be compared.
Keys and legends Graphs are easier to interpret if key information appears in the field of the panel (e.g. labels for the various lines, arrows that indicate when a drug was added, with the name of the drug next to it). Incredibly, there are journals that insist that all of this information must appear in the legend (the caption below the graph). This made sense in a time when graphs had to be prepared by artists who would (typically) not quite understand the subject matter, which caused graphs with keys and labels to go through many time and money consuming iterations. Today, when investigators can prepare the graphs themselves on a computer, there is no good reason to adhere to this practice.