High-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) allows for high-resolution, genome-wide mapping of the binding sites of RNA-binding proteins. This methodology is frequently used to validate predicted targets of microRNA binding, as well as direct targets of other RNA-binding proteins. Hence, the accuracy and sensitivity of binding site identification are critical.
We found that substantial mispriming during reverse transcription results in the overrepresentation of sequences complementary to the primer used for reverse transcription. Up to 45% of peaks in publicly available HITS-CLIP libraries are attributable to this mispriming artifact, and the majority of libraries have detectable levels of mispriming. We also found that standard techniques for validating microRNA-target interactions fail to differentiate between artifactual peaks and physiologically relevant peaks.
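As an illustration of what "sequences complementary to the primer" means in practice, the following minimal Python sketch scans a transcript for sites where the 3' end of a reverse transcription primer could anneal. It is not the analysis used in this work: the function names, the example primer, and the choice of a 6-nucleotide match length (`k`) are all hypothetical, and exact string matching is a simplifying assumption.

```python
from typing import List


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def candidate_mispriming_sites(transcript: str, primer: str, k: int = 6) -> List[int]:
    """Find 0-based positions where the primer's 3' end could anneal.

    Assumption: a site is a candidate if the transcript contains an exact
    match to the reverse complement of the last k nucleotides of the primer.
    The value of k is illustrative, not taken from the source.
    """
    target = reverse_complement(primer[-k:])
    # Accept RNA or DNA input by converting U to T.
    seq = transcript.upper().replace("U", "T")
    positions = []
    start = seq.find(target)
    while start != -1:
        positions.append(start)
        start = seq.find(target, start + 1)
    return positions


# Hypothetical primer and transcript, for illustration only.
example_primer = "GATCGTCGGACT"
example_transcript = "UUAGUCCGAA"
sites = candidate_mispriming_sites(example_transcript, example_primer)
```

Peaks whose boundaries coincide with such candidate sites would merit closer inspection, although a sequence match alone does not establish that a peak is artifactual.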
Here, we present a modification to the HITS-CLIP protocol that effectively eliminates this artifact and improves the sensitivity and complexity of resulting libraries.