How to read a scientific paper


Referring to the Evidence-Based Pyramid may help a veterinarian decide if the results presented in the study are likely to be true.
Courtesy Arlt, S. P., & Heuwieser, W. (2016). The Staircase of Evidence – a New Metaphor Displaying the Core Principles of Evidence-based Veterinary Medicine. Veterinary Evidence, 1(1).

It is important for veterinarians to constantly strive to improve their knowledge of diagnostic and therapeutic interventions, so they remain consistent with new findings. To do otherwise puts them at risk of failing to keep up with the latest clinical advancements.

The most accessible source of the latest information is usually clinical journals. However, the biomedical literature is expanding rapidly, while the time to read it is becoming increasingly difficult to find. If a veterinarian wants to stay current with developments in veterinary medicine without getting overwhelmed, it is important to develop a system for reading and evaluating scientific papers.

Given the sometimes-conflicting demands of clinical practice and the desire to maintain some sort of life outside of practice, it is reasonable to assume: 1) most veterinarians are already behind in their reading, and 2) they will never have more time to read than they do right now. Therefore, veterinarians should consider focusing their limited reading time on the relatively few articles that are both scientifically valid and applicable to their area of interest, while rejecting most other articles almost immediately.

Learning the clinical course and prognosis

Here, the first thing to look for is whether a control group was assembled. Ideally, the control group is identical to the study group, except it does not possess the characteristic or has not been exposed to the treatment under study. Controls should ideally include no-treatment groups; some studies also include groups that receive placebo treatments.

Controlled studies may be divided into case-control studies, in which cases have already developed the disease, and the controls are those that have not; or cohort studies, in which the cases are separated into study and control groups before the investigator is aware of whether the cases have or will develop the condition being studied.

Controls are especially important in studies that rely on subjective assessments (did the animal get “better?”), where there might be significant observer bias; that is, where the observer might be inclined to believe that there was improvement merely because the patient was treated. If the study fails to include controls, its results cannot be reliably interpreted.

Distinguishing a therapy

Check to confirm that the study was randomized; that is, that a method was used to randomly assign patients to treatment groups. Randomization is the best way to assemble, at the start of a trial, groups of patients that are as similar as possible in their risk of the events being studied.

Randomization tends to balance the groups for prognostic factors, such as the severity of the studied disease. Unevenly distributed prognostic factors can exaggerate, cancel, or even counteract the effects of therapy, leading to false-positive or false-negative results. Further, if the studying clinicians are unaware of the randomization (“blinded”), they will not know which treatment the next patient will receive. Thus, they will not be able to distort, consciously or unconsciously, the balance between the two groups being compared. Failure to conceal randomization tends to lead to situations where patients with a more favorable prognosis receive the experimental therapy, exaggerating the benefits of the therapy and perhaps even leading to a false-positive conclusion.
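The balancing effect of randomization can be illustrated with a minimal Python sketch. The patient records, the 1-to-10 severity score, and the function names are hypothetical and are not taken from any study; the sketch only shows simple 1:1 random allocation.

```python
import random
from statistics import mean


def make_patients(n, seed=0):
    """Create hypothetical patients, each with a made-up severity score from 1 to 10."""
    rng = random.Random(seed)
    return [{"id": i, "severity": rng.randint(1, 10)} for i in range(n)]


def randomize(patients, seed=None):
    """Shuffle the patients and split them into two equal arms (simple 1:1 allocation)."""
    rng = random.Random(seed)
    shuffled = list(patients)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    treatment, control = randomize(make_patients(1000), seed=42)
    # With enough patients, mean severity should be close in the two arms.
    print("treatment mean severity:", round(mean(p["severity"] for p in treatment), 2))
    print("control mean severity:  ", round(mean(p["severity"] for p in control), 2))
```

Because chance alone decides each assignment, no one choosing patients can steer the milder cases toward the experimental therapy. With small groups, an imbalance can still occur by chance.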

Ideally, the study will also be “double-blinded”; that is, neither the patient nor the clinician will know who is receiving treatment until after the study is over. Unfortunately, blinding is not possible in all studies (e.g. surgical ones). However, it is still a goal that should be pursued wherever possible to try to eliminate subtle biases that may influence study results (See “Evaluation matters”).

An article about diagnostic tests should include an independent blind comparison with a “gold standard” of diagnosis. That is, some objective method of determination, such as a biopsy, surgery, postmortem examination, or long-term follow-up, should have been used to show the patients had the disease in question. There must be a second group of patients shown not to have had the disease. Then, the test should have been interpreted by “blind” clinicians who did not know whether a given patient really had the disease. Afterward, these test results should be compared to the gold standard. If these procedures are not followed, the paper (and the test) might be rejected out of hand.
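One way to picture the comparison with a gold standard is the sketch below. It assumes the usual summary measures, sensitivity and specificity, which the text does not name; the counts are invented for illustration.

```python
def diagnostic_accuracy(results):
    """Compare test results with the gold standard.

    `results` is an iterable of (test_positive, has_disease) pairs, where
    has_disease comes from the gold standard (e.g. biopsy or postmortem).
    """
    tp = fp = fn = tn = 0
    for test_positive, has_disease in results:
        if has_disease:
            if test_positive:
                tp += 1
            else:
                fn += 1
        else:
            if test_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        # Both a diseased and a disease-free group are needed for the comparison.
        raise ValueError("need patients with and without the disease")
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients the test detects
        "specificity": tn / (tn + fp),  # disease-free patients the test clears
    }


if __name__ == "__main__":
    # Hypothetical data: 40 diseased patients (36 test positive) and
    # 60 disease-free patients (6 test positive).
    data = ([(True, True)] * 36 + [(False, True)] * 4
            + [(True, False)] * 6 + [(False, False)] * 54)
    print(diagnostic_accuracy(data))
```

The function refuses to run without a disease-free group, which mirrors the requirement that a second group of patients be shown not to have the disease.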


How much can inadequate randomization, blinding, and controls exaggerate the apparent odds of a therapy’s effectiveness? Quite a bit.

  • Inadequate method of treatment allocation: results larger by 41 percent
  • Unclear method of treatment allocation: results larger by 30 percent
  • Trials not double blind: results larger by 17 percent¹

If a paper does not meet randomization and blinding criteria, it is reasonable to consider rejecting it out of hand. If there is still curiosity about the validity of the paper, it’s important to look critically at the evidence provided in support of the conclusions and at how the conclusions were drawn.

Relevance of the study

If the study does not apply to an individual’s practice, it may not be worth reading at all. One might be able to tell if an article is worth a more thorough evaluation by:

  • Looking at the title. Is the article potentially useful in an individual’s practice? If not, consider going on to the next article. For example, is an article about a new orthopedic procedure useful for a general practitioner?
  • Reading the summary. Here, the objective is simply to decide if the conclusion, if valid, would be important. The issue here is not whether the results are true; rather, it is whether the results, if true, would be useful.
  • Considering the expertise required. For example, if a new technique for laparoscopic renal biopsy is proposed, would the practice have the expertise, or access to the required facilities and equipment, to perform the technique? A technique that is applied to dogs may not be useful for a practice limited to cats. Otherwise stated, are the patients in the study similar to those in the practice, and could the results be applied?

Validity of the study

Courtesy David Ramey

A study is not necessarily worth a look just because it appears in a reputable journal. The review and editorial policies of even the best journals do not protect the reader from errors. The mere fact that the article got printed makes it subject to potential biases such as:

  • Submission bias: Research workers are more strongly motivated to complete studies that have positive results and submit them for publication.
  • Publication bias: Editors are more likely to publish positive studies.
  • Methodological bias: Errors such as flawed randomization produce positive biases.

In fact, it is reasonable to be skeptical of any conclusion from the outset. Accepting an article solely based on its conclusion runs the risk of accepting false information unless the conclusion is negative (i.e. negative results are more likely to be valid than positive ones¹). Thus, in considering if an article is valid, there is no alternative to reviewing the Materials and Methods sections of an article for the methodology used in the study. That is where the meat of any article can be found.

Evidence of the study

If a research paper is not randomized, blinded, and/or controlled, the veterinarian may be in a bit of a bind; otherwise stated, the evidence may not be that good. In such cases, it may be helpful to think about the evidence in terms of quality. Not all evidence is created equal (Figure 1).

Even if the quality of evidence is poor, some things may be gleaned from poor studies.

If the treatment effect is huge, it is less likely that it is a false-positive study. This usually only happens when the prognosis is uniformly terrible, which is rare.

If the nonrandomized study concluded that a therapy was useless or harmful, it is usually safe to accept that conclusion. False-negative conclusions from studies are less likely than false-positive ones.

Results of the study

If it has been decided that a paper is worth reading, and the evidence may be applicable, the next step is to assess how likely it is that the results are significant.

The most commonly used indicator of significance is the p value. The p value is a measure of how likely it is that results like those observed would arise merely by chance or natural variation; a p value of 0.05 has been arbitrarily chosen as the standard of significance. Ideally, a p value should be as far below 0.05 as possible.

It should be noted that a p of 0.05 does not mean there is a 95 percent chance the results of the study are valid. Even at a p of 0.05, there is still a five percent probability (one in 20) that a difference this large would appear by chance alone when no real difference exists. Statistical significance tests measure probabilities, but no matter what level of significance is chosen, there is always some probability of seeing a difference between studied groups when none really exists.
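The one-in-20 figure can be checked with a small simulation. The sketch below is illustrative only: the trial size, the 30 percent event rate, and the choice of a pooled two-proportion z-test are all assumptions. Both groups receive the same "treatment," so every significant result is a false positive.

```python
import math
import random


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p value from a pooled two-proportion z-test (normal approximation)."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (events_a / n_a - events_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(trials=2000, n_per_group=200, true_rate=0.3,
                        alpha=0.05, seed=1):
    """Fraction of simulated trials that reach p < alpha although no real difference exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups share the same true event rate.
        a = sum(rng.random() < true_rate for _ in range(n_per_group))
        b = sum(rng.random() < true_rate for _ in range(n_per_group))
        if two_proportion_p_value(a, n_per_group, b, n_per_group) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("share of 'significant' null trials:", false_positive_rate())
```

The printed share should land near 0.05, which is the sense in which roughly one comparison in 20 will look significant even when nothing is going on.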

The p value has other limitations as well. It does not indicate the size of the difference in the study groups.

A statistically significant difference may have only slight clinical relevance. For example, if a study of 40,000 individuals showed that a new treatment reduced the risk of death by 0.5 deaths per 100 compared with an older treatment, even though the difference would be significant, it would mean a difference of one life per 200 individuals treated. Whether the benefit is worth the cost is not a question that can be answered statistically.
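The arithmetic behind this example can be worked through in a short sketch. The text gives only the study size and the difference of 0.5 deaths per 100; the equal split of 20,000 per group and the baseline of 5.0 versus 4.5 deaths per 100 are assumptions added here so that a p value can be computed.

```python
import math


def absolute_risk_reduction(deaths_old, n_old, deaths_new, n_new):
    """Difference in death risk between the older and the new treatment."""
    return deaths_old / n_old - deaths_new / n_new


def number_needed_to_treat(arr):
    """How many patients must be treated to prevent one death."""
    if arr <= 0:
        raise ValueError("no risk reduction, so the number is undefined")
    return 1 / arr


def two_sided_p_value(deaths_old, n_old, deaths_new, n_new):
    """Pooled two-proportion z-test, normal approximation."""
    pooled = (deaths_old + deaths_new) / (n_old + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (deaths_old / n_old - deaths_new / n_new) / se
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Assumed counts: 1,000 deaths in 20,000 (old) vs 900 deaths in 20,000 (new).
    arr = absolute_risk_reduction(1000, 20000, 900, 20000)
    print("absolute risk reduction:", round(arr, 4))
    print("number needed to treat:", round(number_needed_to_treat(arr)))
    print("p < 0.05:", two_sided_p_value(1000, 20000, 900, 20000) < 0.05)
```

Under these assumed counts the difference clears the 0.05 threshold, yet 200 patients must be treated to save one life. The p value and the size of the benefit answer different questions.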

By using a systematic approach to reading scientific papers (Figure 2), a practitioner will almost certainly be able to dramatically reduce his or her reading time. However, following such strict criteria may mean the practitioner will have virtually nothing to read. Nevertheless, using a systematic approach to reading scientific papers will help veterinary practitioners sort out the wheat from the chaff and help make sure that the precious time devoted to reading is well spent.
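As a rough illustration, the screening steps described in this article can be written as a short checklist. The field names and the wording of the outcomes are hypothetical, and the order is inferred from the text rather than copied from Figure 2.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical summary of what a reader finds in the title, summary, and methods."""
    relevant_to_practice: bool
    controlled: bool
    randomized: bool
    blinded: bool


def triage(paper):
    """Return a suggested next step for the paper."""
    if not paper.relevant_to_practice:
        return "skip: results would not be useful in this practice"
    if not (paper.controlled and paper.randomized and paper.blinded):
        return "caution: weigh the quality of the evidence or reject"
    return "read: assess the size and significance of the results"


if __name__ == "__main__":
    print(triage(Paper(relevant_to_practice=True, controlled=True,
                       randomized=False, blinded=False)))
```

Relevance comes first because it can be judged from the title and summary alone, before any time is spent on the Materials and Methods section.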


There might be any number of reasons why veterinarians might want to consider reading scientific journals, including to:

  • Improve the health of patients and to respect the costs to the client
  • Keep abreast of news in the profession
  • Understand pathobiology
  • Find out how a seasoned clinician handles a particular problem
  • Find out whether to use a new or existing diagnostic test
  • Learn the clinical features and course of a disorder
  • Distinguish useful from useless or even harmful therapies
  • Determine etiology or causation
  • Sort out claims concerning new therapies
  • Read the letters to the editor

David Ramey, DVM, is a Los Angeles, Calif.-based equine practitioner. He is also the current president of the Evidence-Based Veterinary Medicine Association (EBVMA).


  1. Data from Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273:408–412.
  2. Ioannidis JPA. Why most published research findings are false. PLOS Medicine. 2005. Accessed 8.31.2022.
  3. Ramey D. How to Read a Scientific Paper. Proc 45th AAEP. 1999:281. Reprinted with permission of the American Association of Equine Practitioners.
