For our protocols, the effects of DNA extraction and PCR amplification were much larger than those of sequencing and classification.

We created two additional sets of 80 mock communities by mixing prescribed quantities of DNA and PCR product to quantify the relative contribution to bias of (1) DNA extraction, (2) PCR amplification, and (3) sequencing and taxonomic classification for particular choices of protocols for each step.

We developed models to predict the “true” composition of environmental samples based on the observed proportions, and applied them to a set of clinical vaginal samples from a single subject during four visits.
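
To make the idea of predicting a "true" composition from observed proportions concrete, the following is a minimal sketch under a deliberately simple assumption: each taxon is detected with a fixed multiplicative efficiency that can be estimated from a mock community of known composition. This is an illustrative assumption, not necessarily the model developed here, and the function names and taxon labels (`taxon_A`, `taxon_B`) are hypothetical.

```python
def estimate_efficiency(true_props, observed_props):
    """Estimate per-taxon detection efficiency from one mock community.

    Assumption (illustrative only): observed proportion is proportional to
    true proportion times a fixed per-taxon efficiency.
    """
    if true_props.keys() != observed_props.keys():
        raise ValueError("true and observed compositions must list the same taxa")
    if any(p <= 0 for p in true_props.values()):
        raise ValueError("every taxon must be present in the mock community")
    return {t: observed_props[t] / true_props[t] for t in true_props}


def correct_composition(observed_props, efficiency):
    """Divide observed proportions by efficiency, then renormalize to sum to 1."""
    if observed_props.keys() != efficiency.keys():
        raise ValueError("observed composition and efficiencies must list the same taxa")
    if any(e <= 0 for e in efficiency.values()):
        raise ValueError("efficiencies must be positive")
    adjusted = {t: observed_props[t] / efficiency[t] for t in observed_props}
    total = sum(adjusted.values())
    if total <= 0:
        raise ValueError("observed composition must contain at least one positive value")
    return {t: value / total for t, value in adjusted.items()}


# Hypothetical mock community: equal true amounts, but taxon_A is over-represented.
mock_true = {"taxon_A": 0.5, "taxon_B": 0.5}
mock_observed = {"taxon_A": 0.8, "taxon_B": 0.2}
efficiency = estimate_efficiency(mock_true, mock_observed)

# Hypothetical sample processed with the same protocols as the mock community.
sample_observed = {"taxon_A": 0.8, "taxon_B": 0.2}
sample_corrected = correct_composition(sample_observed, efficiency)
```

Under this assumption, a sample showing the same 80:20 split as the mock community is corrected back to an even mixture. Real bias may depend on which other taxa are present, which a single per-taxon efficiency cannot capture.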

We observed that using different DNA extraction kits can produce dramatically different results, but bias is introduced regardless of the choice of kit.

We observed error rates from bias of over 85% in some samples, while technical variation was very low at less than 5% for most bacteria.

Analysis of mock communities can help assess bias and facilitate the interpretation of results from environmental samples.

Next-generation sequencing (NGS) technology allows a much deeper characterization of the structure of microbial communities using metagenomic approaches.

Metagenomic surveys often use a hypervariable region of the highly conserved and universal 16S rRNA gene as a phylogenetic marker.

Characterizing microbial communities via next-generation sequencing is subject to a number of pitfalls involving sample processing.

The observed community composition can be a severe distortion of the quantities of bacteria actually present in the microbiome, hampering analysis and threatening the validity of conclusions from metagenomic studies.