{"id":608711,"date":"2023-02-16T20:49:23","date_gmt":"2023-02-17T02:49:23","guid":{"rendered":"https:\/\/news.sellorbuyhomefast.com\/index.php\/2023\/02\/16\/prediction-of-prime-editing-insertion-efficiencies-using-sequence-features-and-dna-repair-determinants\/"},"modified":"2023-02-16T20:49:23","modified_gmt":"2023-02-17T02:49:23","slug":"prediction-of-prime-editing-insertion-efficiencies-using-sequence-features-and-dna-repair-determinants","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2023\/02\/16\/prediction-of-prime-editing-insertion-efficiencies-using-sequence-features-and-dna-repair-determinants\/","title":{"rendered":"Prediction of prime editing insertion efficiencies using sequence features and DNA repair determinants"},"content":{"rendered":"<p>Science &#038; Nature <\/p>\n<div>\n<div id=\"Sec1-section\" data-title=\"Main\">\n<h2 id=\"Sec1\">Main<\/h2>\n<div id=\"Sec1-content\">\n<p>The efficient insertion of short DNA sequences into genomes could change the course of biotechnology and medicine<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 1\" title=\"Anzalone, A. V. et al. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat. Biotechnol. 40, 731\u2013740 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR1\" id=\"ref-link-section-d69891743e460\">1<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 2\" title=\"Yarnall, M. T. N. et al. Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases. Nat. Biotechnol. 
\n                https:\/\/doi.org\/10.1038\/s41587-022-01527-4\n                \n               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR2\" id=\"ref-link-section-d69891743e463\">2<\/a><\/sup>. Small insertions can encode protein tags for purification and visualization, or manipulate protein function by altering protein localization, half-life or interaction profiles. Integrating sequences for transcription factor binding sites and splicing modulators provides control over gene expression while introducing structural elements or recombinase sites can change DNA conformation and provide a substrate for large-scale engineering. For therapeutic opportunities, over 16,000 small deletion variants have been causally linked to disease<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862\u2013D868 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR3\" id=\"ref-link-section-d69891743e467\">3<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062\u2013D1067 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR4\" id=\"ref-link-section-d69891743e470\">4<\/a><\/sup>, and could in principle be restored by inserting the missing sequence<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\" title=\"Geurts, M. H. et al. Evaluating CRISPR-based prime editing for cancer modeling and CFTR repair in organoids. Life Sci. 
Alliance 4, e202000940 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR5\" id=\"ref-link-section-d69891743e474\">5<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 6\" title=\"Schene, I. F. et al. Prime editing for functional repair in patient-derived disease models. Nat. Commun. 11, 5352 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR6\" id=\"ref-link-section-d69891743e477\">6<\/a><\/sup>. A prominent example is cystic fibrosis, where 70% of cases are caused by a three-nucleotide (nt) deletion<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Drumm, M. L., Ziady, A. G. &#038; Davis, P. B. Genetic variation and clinical heterogeneity in cystic fibrosis. Annu. Rev. Pathol. 7, 267\u2013282 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR7\" id=\"ref-link-section-d69891743e481\">7<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 8\" title=\"Zielenski, J. &#038; Tsui, L. C. Cystic fibrosis: genotypic and phenotypic variations. Annu. Rev. Genet. 29, 777\u2013807 (1995).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR8\" id=\"ref-link-section-d69891743e484\">8<\/a><\/sup>. To enable reversing these mutations in practice, a technology must integrate insertions efficiently, accurately and safely, avoiding the unintended outcomes and double-strand break stress that hampers existing Cas9-based therapies<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Leibowitz, M. L. et al. Chromothripsis as an on-target consequence of CRISPR-Cas9 genome editing. Nat. Genet. 
53, 895\u2013905 (2021).\" href=\"http:\/\/www.nature.com\/#ref-CR9\" id=\"ref-link-section-d69891743e488\">9<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. \n                https:\/\/doi.org\/10.1038\/nbt.4317\n                \n               (2018).\" href=\"http:\/\/www.nature.com\/#ref-CR10\" id=\"ref-link-section-d69891743e488_1\">10<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 11\" title=\"Kosicki, M., Tomberg, K. &#038; Bradley, A. Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol. 36, 765\u2013771 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR11\" id=\"ref-link-section-d69891743e491\">11<\/a><\/sup>.<\/p>\n<p>Prime editors can insert short DNA sequences without generating double-strand breaks or requiring an external template. They consist of a nicking version of Cas9 fused to a reverse transcriptase domain, which is complexed with a prime editing guide RNA (pegRNA)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149\u2013157 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR12\" id=\"ref-link-section-d69891743e498\">12<\/a><\/sup>. The pegRNA comprises a primer binding site homologous to the sequence in the target, and a reverse transcriptase template that includes the intended edit, all in the 3\u2032 extension of a standard CRISPR\u2013Cas9 guide RNA. 
At the target site, Cas9 nicks one strand of the genomic DNA, which then anneals to the primer binding site on the pegRNA, and is extended by the Cas9-fused reverse transcriptase using the pegRNA-encoded template sequence. Next, DNA repair mechanisms resolve the conflicting sequences on the two DNA strands, ultimately writing the intended edit into the genome. Where CRISPR\u2013Cas9 was compared with molecular scissors capable of disrupting target genes, and base editors were seen as molecular pencils for their ability to substitute single nucleotides, prime editors can be described as molecular word processors, able to perform search and replace operations directly on the genome<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Anzalone, A. V., Koblan, L. W. &#038; Liu, D. R. Genome editing with CRISPR\u2013Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824\u2013844 (2020).\" href=\"http:\/\/www.nature.com\/#ref-CR13\" id=\"ref-link-section-d69891743e502\">13<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Liu, G., Lin, Q., Jin, S. &#038; Gao, C. The CRISPR-Cas toolbox and gene editing technologies. Mol. Cell 82, 333\u2013347 (2022).\" href=\"http:\/\/www.nature.com\/#ref-CR14\" id=\"ref-link-section-d69891743e502_1\">14<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Chen, P. J. &#038; Liu, D. R. Prime editing for precise and highly versatile genome manipulation. Nat. Rev. Genet. 
\n                https:\/\/doi.org\/10.1038\/s41576-022-00541-1\n                \n               (2022).\" href=\"http:\/\/www.nature.com\/#ref-CR15\" id=\"ref-link-section-d69891743e502_2\">15<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Doman, J. L., Sousa, A. A., Randolph, P. B., Chen, P. J. &#038; Liu, D. R. Designing and executing prime editing experiments in mammalian cells. Nat. Protoc. 17, 2431\u20132468 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR16\" id=\"ref-link-section-d69891743e505\">16<\/a><\/sup>.<\/p>\n<p>The prime editing system is complex, and the determinants of its efficiency are not fully understood. Several partly independent steps, including three DNA binding events and successful DNA repair, are needed to produce an edit, each potentially influenced by the introduced sequence. In the largest study so far to understand these biases, Kim et al. comprehensively tested the consequences of varying the reverse transcription templates and primer binding site lengths using a library of 55,000 pegRNAs. The editing rate increased with Cas9 guide RNA activity, as well as GC content and melting temperature of the primer binding site. While further optimization of sequences was possible, primer binding sites of 11\u201313\u2009nt and reverse transcriptase templates of 10\u201312\u2009nt had the highest average editing efficiencies<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198\u2013206 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR17\" id=\"ref-link-section-d69891743e512\">17<\/a><\/sup>.<\/p>\n<p>The majority of libraries used by Kim et al. 
contained the same single-nucleotide substitution 5\u2009nt upstream of the nick site. Similarly, nearly all investigations of prime editing efficacy to date have focused on single-nucleotide substitutions<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149\u2013157 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR12\" id=\"ref-link-section-d69891743e519\">12<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198\u2013206 (2021).\" href=\"http:\/\/www.nature.com\/#ref-CR17\" id=\"ref-link-section-d69891743e522\">17<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Kweon, J. et al. Engineered prime editors with PAM flexibility. Mol. Ther. 29, 2001\u20132007 \n                https:\/\/doi.org\/10.1016\/j.ymthe.2021.02.022\n                \n               (2021).\" href=\"http:\/\/www.nature.com\/#ref-CR18\" id=\"ref-link-section-d69891743e522_1\">18<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Liu, Y. et al. Efficient generation of mouse models with the prime editing system. Cell Discov. 6, 27 (2020).\" href=\"http:\/\/www.nature.com\/#ref-CR19\" id=\"ref-link-section-d69891743e522_2\">19<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. 
Cell 184, 5635\u20135652.e29 (2021).\" href=\"http:\/\/www.nature.com\/#ref-CR20\" id=\"ref-link-section-d69891743e522_3\">20<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Nelson, J. W. et al. Engineered pegRNAs improve prime editing efficiency. Nat. Biotechnol. 40, 402\u2013410 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR21\" id=\"ref-link-section-d69891743e525\">21<\/a><\/sup>. Of the many possible useful sequences in molecular biology, only a handful have been introduced with prime editing<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149\u2013157 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR12\" id=\"ref-link-section-d69891743e529\">12<\/a><\/sup>. Therefore, in contrast to a relatively deep understanding of Cas9 mutagenesis<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 10\" title=\"Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. \n                https:\/\/doi.org\/10.1038\/nbt.4317\n                \n               (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR10\" id=\"ref-link-section-d69891743e533\">10<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 
34, 184\u2013191 (2016).\" href=\"http:\/\/www.nature.com\/#ref-CR22\" id=\"ref-link-section-d69891743e536\">22<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Meier, J. A., Zhang, F. &#038; Sanjana, N. E. GUIDES: sgRNA design for loss-of-function screens. Nat. Methods 14, 831\u2013832 (2017).\" href=\"http:\/\/www.nature.com\/#ref-CR23\" id=\"ref-link-section-d69891743e536_1\">23<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning\u2013based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR24\" id=\"ref-link-section-d69891743e539\">24<\/a><\/sup> and base editing outcomes<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Pallaseni, A. et al. Predicting base editing outcomes using position-specific sequence determinants. Nucleic Acids Res. 50, 3551\u20133564 (2022).\" href=\"http:\/\/www.nature.com\/#ref-CR25\" id=\"ref-link-section-d69891743e543\">25<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463\u2013480.e30 (2020).\" href=\"http:\/\/www.nature.com\/#ref-CR26\" id=\"ref-link-section-d69891743e543_1\">26<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 
38, 1037\u20131043 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR27\" id=\"ref-link-section-d69891743e546\">27<\/a><\/sup>, very little is known about how the inserted sequence affects efficiency, and the length range of insertions feasible with prime editing has not been defined.<\/p>\n<p>Here, we systematically measure the insertion efficiency of 3,604 sequences in several target sites and a variety of cellular and repair pathway contexts. We find that insertion sequence length, nucleotide composition and secondary structure all affect insertion efficiency. Moreover, we define the precise effect of mismatch repair (MMR) on thousands of insertion sequences and discover that overexpression of the 3\u2032 flap nucleases TREX1 and TREX2 abolishes the insertion of longer sequences. Together, sequence features and repair pathway activity explain most of the variation in insertion rate. We then use these insights to train a sequence-based prediction model informed by MMR efficiency that predicts editing outcomes for novel sequences with high accuracy, and demonstrate the model\u2019s usefulness for the selection of optimal reagents for new insertions.<\/p>\n<\/div>\n<\/div>\n<div id=\"Sec2-section\" data-title=\"Results\">\n<h2 id=\"Sec2\">Results<\/h2>\n<div id=\"Sec2-content\">\n<p>We sought to systematically characterize how the length and composition of the inserted sequence, as well as cell line, target site and the version of the prime editor system, affect insertion rates. To do so, we designed 3,604 pegRNAs encoding insertions immediately upstream of the nick site. These comprise 270 sequences useful for molecular biology (for example, His-6 tag, recombinase sites and mNeonGreen-11 (ref. <sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Feng, S. et al. Improved split fluorescent proteins for endogenous protein labeling. Nat. Commun. 
8, 370 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR28\" id=\"ref-link-section-d69891743e562\">28<\/a><\/sup>)); 1,957 eukaryotic linear motifs<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Dinkel, H. et al. The eukaryotic linear motif resource ELM: 10 years and counting. Nucleic Acids Res. 42, D259\u2013D266 (2014).\" href=\"http:\/\/www.nature.com\/#ref-CR29\" id=\"ref-link-section-d69891743e566\">29<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Dinkel, H. et al. ELM 2016\u2014data update and new functionality of the eukaryotic linear motif resource. Nucleic Acids Res. 44, D294\u2013D300 (2016).\" href=\"http:\/\/www.nature.com\/#ref-CR30\" id=\"ref-link-section-d69891743e566_1\">30<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Puntervoll, P. et al. ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 31, 3625\u20133630 (2003).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR31\" id=\"ref-link-section-d69891743e569\">31<\/a><\/sup>; 439 sequences with variable secondary structure; all single nucleotides, dinucleotides, trinucleotides and tetranucleotides; and 100 random sequences of each length between 5 and 10\u2009nt (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig1\">1a<\/a>). Insertions ranged in length from 1 to 69\u2009nt and varied in GC content (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig1\">1b<\/a>), while the primer binding site and homology arm lengths in the pegRNA were fixed at 13 and 34\u2009nt, respectively. We used lentiviruses to deliver the libraries against four target sites (three previously tested: <i>HEK3<\/i>, <i>EMX1<\/i>, <i>FANCF<\/i><sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149\u2013157 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR12\" id=\"ref-link-section-d69891743e588\">12<\/a><\/sup>, and the safe-harbor <i>CLYBL<\/i> locus<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Cerbini, T. et al. Transcription activator-like effector nuclease (TALEN)-mediated CLYBL targeting enables enhanced transgene expression and one-step generation of dual reporter human induced pluripotent stem cell (iPSC) and neural stem cell (NSC) lines. PLoS ONE 10, e0116032 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR32\" id=\"ref-link-section-d69891743e595\">32<\/a><\/sup>) in two cell lines (HEK293T and HAP1), followed by transient transfection of the prime editor 2 plasmid (HEK293T cells) or doxycycline induction of a PiggyBac transposase-integrated prime editor (HAP1 cells), five days of selection and sequencing of two amplicons from the cell pool, one of the targeted locus and one of the pegRNA locus (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig1\">1c<\/a>). 
We calculated insertion efficiencies as the fraction of reads in the target site amplicon with a given insertion divided by the fraction of reads for the pegRNA encoding it in the pegRNA amplicon, and analyzed them as the main statistic in the rest of the study.<\/p>\n<div data-test=\"figure\" data-container-section=\"figure\" id=\"figure-1\" data-title=\"High-throughput measurement of prime insertion efficiencies.\">\n<figure><figcaption><b id=\"Fig1\" data-test=\"figure-caption-text\">Fig. 1: High-throughput measurement of prime insertion efficiencies.<\/b><\/figcaption><div>\n<div><a data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y\/figures\/1\" rel=\"nofollow\"><picture><source type=\"image\/webp\" ><img decoding=\"async\" aria-describedby=\"Fig1\" src=\"http:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41587-023-01678-y\/MediaObjects\/41587_2023_1678_Fig1_HTML.png\" alt=\"Science &amp; Nature figure 1\" loading=\"lazy\" width=\"685\" height=\"383\"><\/picture><\/a><\/div>\n<p><b>a<\/b>, Screen setup. Set 1 and Set 2 libraries were screened separately and data merged (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Sec8\">Methods<\/a>); panels <b>d<\/b>\u2013<b>f<\/b> reflect Set 1 results only. <b>b<\/b>, Library composition. The number of sequences in the library (<i>y<\/i> axis) with different insert sequence lengths (<i>x<\/i> axis, top panel) and %GC content (<i>x<\/i> axis, bottom panel). <b>c<\/b>, Experimental design. NGS, next generation sequencing. <b>d<\/b>, Editing frequencies. Average mutation frequency (<i>y<\/i> axis) for different screens (<i>x<\/i> axis) stratified by mutation type (blue, insertions; gray, unintended outcomes). 
Markers represent one replicate and bars the average across <i>n<\/i>\u2009=\u20093 biological replicates. <b>e<\/b>, Replicate concordance. Pearson\u2019s <i>R<\/i> between insertion rates in two screens (<i>x<\/i> axis) for different comparisons (<i>y<\/i> axis, colors). Markers, correlation value of one pair of screens (for replicate correlations, mean of pairwise comparison across <i>n<\/i>\u2009=\u20093 biological replicates); line and whiskers, mean and s.e.m. <b>f<\/b>, Representative examples of categories from <b>e<\/b>. Percentage insertion in the <i>HEK3<\/i> locus in HEK293T cells (<i>y<\/i> axis) compared with values (<i>x<\/i> axis) in other contexts (panels, colors) for insertion sequences (markers). Left panel, comparison of biological replicates; other panels, comparison of replicate averages. Label, <i>R<\/i> of values in linear scale. Colors as in <b>e<\/b>.<\/p>\n<\/div>\n<\/figure>\n<\/div>\n<p>Insertion efficiencies of sequences varied widely. The top 5% of templates were inserted 27\u2013134 times more efficiently than the bottom 5% across the various target site and cell line combinations (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">1a,b<\/a>), indicating substantial sequence-dependent variation. The insertion rates were highly consistent across biological replicates (median <i>R<\/i>\u2009=\u20090.70; Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">1c\u2013i<\/a>), but differed in magnitude across screens (average across pegRNAs, 0.18% for the <i>CLYBL<\/i> locus in HEK293T to 6.7% for the <i>HEK3<\/i> locus in HEK293T cells; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig1\">1d<\/a>). Unintended editing outcomes we observed included single-base mutations, small insertions and deletions around the nicking site, deletions overlapping primer binding site and reverse transcription template, insertion of mutated library sequences, duplications of the reverse transcription template, as well as partial scaffold integrations (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig1\">1d<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">2a\u2013c<\/a>). These outcomes were rare overall (0.06\u20130.45%). Base changes at the target site were infrequent in reads with and without insertions (0.038% versus 0.030%), but slightly elevated upon insertion immediately downstream of the nick site and for the first nucleotides after the end of the homology arm (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">2d\u2013f<\/a>). 
Overall, the intended insertions were the dominant mutations generated, and we do not consider the unintended edits further.<\/p>\n<p>To understand the consistency of insertion efficiencies across contexts, we next compared them between replicates, cell lines and target sites. Insertion rates into the same target site in different cell lines were more correlated (mean <i>R<\/i>\u2009=\u20090.52) than into different target sites in the same line (mean <i>R<\/i>\u2009=\u20090.38). The correlation was weakest when both the target site and cell line were different (mean <i>R<\/i>\u2009=\u20090.17; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig1\">1e,f<\/a>), demonstrating both target sequence-specific and cell line-dependent biases on insertion.<\/p>\n<h3 id=\"Sec3\">Insert size and MMR activity effects<\/h3>\n<p>Given the repeatable sequence-dependent variation in insertion rates that spans over three orders of magnitude, we sought to understand the responsible features, starting with insert length. Insertion frequency did not decrease monotonically with insert length in HEK293T cells, but instead, had two modes of high values. First, sequences of 3 and 4\u2009nt were inserted on average 2.0\u20134.1 times more efficiently than others across the four targeted sites (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2a<\/a>). Second, sequences between 15 and 21\u2009nt were inserted on average 1.3\u20131.6 times more efficiently than 10\u201314-nt ones, and 1.5\u20132.0 times more efficiently than sequences longer than 21\u2009nt (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2a<\/a>). 
These relative biases in efficiency were shared between all target sites, despite a 20-fold range of their average insertion rates. Inserts longer than 45\u2009nt were incorporated less frequently, at a screen average rate that is 4\u20138 times lower than that of sequences shorter than 45\u2009nt. The longest sequence that was inserted at >1% frequency (1.4%, <i>HEK3<\/i> site in HEK293T cells) was 66\u2009nt, demonstrating that integration of moderately long sequences is feasible with prime editing.<\/p>\n<div data-test=\"figure\" data-container-section=\"figure\" id=\"figure-2\" data-title=\"Prime insertion efficiency depends on insert length and MMR.\">\n<figure><figcaption><b id=\"Fig2\" data-test=\"figure-caption-text\">Fig. 2: Prime insertion efficiency depends on insert length and MMR.<\/b><\/figcaption><div>\n<div><a data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y\/figures\/2\" rel=\"nofollow\"><picture><source type=\"image\/webp\" ><img decoding=\"async\" aria-describedby=\"Fig2\" src=\"http:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41587-023-01678-y\/MediaObjects\/41587_2023_1678_Fig2_HTML.png\" alt=\"Science &amp; Nature figure 2\" loading=\"lazy\" width=\"685\" height=\"507\"><\/picture><\/a><\/div>\n<p><b>a<\/b>, Insertion rate in HEK293T cells. Percentage of reads with insertion (<i>y<\/i> axis, cut-off at 3\u2009s.d. above mean) for different insert sizes (<i>x<\/i> axis) of individual sequences (blue markers) and averages for lengths with at least 30 measured sequences (dark blue line and markers) at different target sites (panels). Data represent the average of <i>n<\/i>\u2009=\u20093 biological replicates. <b>b<\/b>, As <b>a<\/b>, but for HAP1 cells. <b>c<\/b>, As <b>a<\/b>, but for HAP1 <i>\u2206MLH1<\/i> cells. 
<b>d<\/b>, Insertion rate in one cell context (<i>y<\/i> axis) compared with in another context (<i>x<\/i> axis) at the <i>HEK3<\/i> target of individual sequences (markers), comparing HEK293T with HAP1 cells (left panel) and HEK293T cells with HAP1 <i>\u2206MLH1<\/i> cells (middle panel). Red, short sequences (up to 4\u2009nt); blue, medium sequences (5\u201313\u2009nt); teal, longer sequences (>13\u2009nt). Label, <i>R<\/i> between rates. The data are an average from <i>n<\/i>\u2009=\u20093 biological replicates (HEK293T) or <i>n<\/i>\u2009=\u20092 biological replicates (HAP1). <b>e<\/b>, Average insertion rates (<i>y<\/i> axis) across insert lengths (<i>x<\/i> axis) with at least 30 measured sequences in various cell line contexts (colors). Data are presented as mean\u2009\u00b1\u2009s.e.m. <i>n<\/i>\u2009=\u20093 biological replicates (HEK293T) or <i>n<\/i>\u2009=\u20092 biological replicates (HAP1). <b>f<\/b>, The ratio of relative insertion rates (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Sec8\">Methods<\/a>) at the <i>HEK3<\/i> locus between HAP1 <i>\u2206MLH1<\/i> and HAP1 cells (<i>y<\/i> axis) for different lengths (<i>x<\/i> axis) stratified by colors as in <b>d<\/b>. Box, median and quartiles; whiskers, least extreme of 1.5 times the interquartile range from the quartile and most extreme values. Line, fit from an exponential model (ratio\u2009\u2248\u2009<i>a<\/i>\u2009\u00d7 exp(\u2212<i>b<\/i>\u2009\u00d7\u2009length)\u2009+\u20091). 
<i>n<\/i>\u2009=\u20092 biological replicates.<\/p>\n<\/div>\n<p xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\"><a data-test=\"article-link\" data-track=\"click\" data-track-label=\"button\" data-track-action=\"view figure\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y\/figures\/2\" data-track-dest=\"link:Figure2 Full size image\" aria-label=\"Full size image figure 2\" rel=\"nofollow\"><span>Full size image<\/span><\/a><\/p>\n<\/figure>\n<\/div>\n<p>In contrast to HEK293T cells, the insertion frequency of the short 1\u20134-nt sequences was not substantially higher than that of longer ones in HAP1 cells (0.60\u20131.27 times; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2b<\/a>). This reduced the concordance of insertion rates in the two cell lines at the same site (<i>R<\/i>\u2009=\u20090.41 for <i>FANCF<\/i> and 0.54 for <i>HEK3<\/i>; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2d<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3a<\/a>) compared with replicates (median <i>R<\/i>\u2009=\u20090.78; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig1\">1e<\/a>). One possible explanation is MMR proficiency, since HEK293T cells are partly MMR deficient due to promoter methylation of <i>MLH1<\/i> (ref. <sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"Trojan, J. et al. Functional analysis of hMLH1 variants and HNPCC-related mutations using a human expression system. 
Gastroenterology 122, 211\u2013219 (2002).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR33\" id=\"ref-link-section-d69891743e914\">33<\/a><\/sup>), while HAP1 cells are not. The MMR pathway recognizes and excises short mismatches of less than 13\u2009nt and could therefore remove short insertions in HAP1 cells before the nicked strand is re-ligated<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Gupta, S., Gellert, M. &#038; Yang, W. Mechanism of mismatch recognition revealed by human MutS\u03b2 bound to unpaired DNA loops. Nat. Struct. Mol. Biol. 19, 72\u201378 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR34\" id=\"ref-link-section-d69891743e918\">34<\/a><\/sup>. Indeed, MMR antagonizes prime editing for substitutions and short insertions<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 20\" title=\"Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635\u20135652.e29 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR20\" id=\"ref-link-section-d69891743e923\">20<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Ferreira da Silva, J. et al. Prime editing efficiency and fidelity are enhanced in the absence of mismatch repair. Nat. Commun. 13, 760 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR35\" id=\"ref-link-section-d69891743e926\">35<\/a><\/sup>. 
Consistent with this explanation, we observed strong correlations between insertion rates in HAP1 and HEK293T cells for sequences longer than 13\u2009nt that are not affected by MMR (<i>R<\/i>\u2009=\u20090.78 for the <i>FANCF<\/i> locus and 0.91 for the <i>HEK3<\/i> locus; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2d<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3a<\/a>).<\/p>\n<p>To experimentally test the hypothesis that rates of insertion of short sequences differ between cell lines due to MMR activity, we screened the <i>HEK3<\/i>- and <i>FANCF<\/i>-targeted libraries in HAP1 cells that are knocked out for <i>MLH1<\/i> (HAP1 <i>\u2206MLH1<\/i>; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2c<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3b,c<\/a>). We found that the average insertion rates of 1\u20134-nt sequences were most affected by the knockout, increasing by 7.2\u201311-fold, while the rates of 5\u201313-nt sequences increased 2.1\u20132.7-fold (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2e<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3d<\/a>). Overall, 66% (<i>HEK3<\/i>) and 67% (<i>FANCF<\/i>) of the variance in the fold changes (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2f<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3d<\/a>) was explained by a model where the loss of MMR increases the insertion rate of 1-nt sequences by 23\u201328-fold, with the increase in insertion efficiency dropping 40\u201348% for every additional nucleotide. The low correlations of insertion rates between HEK293T and wild-type HAP1 cells (<i>R<\/i>\u2009=\u20090.41\u20130.54) also rose to near replicate-level concordance once MMR status was matched (<i>R<\/i>\u2009=\u20090.73\u20130.96 between HEK293T and HAP1 <i>\u2206MLH1<\/i> cell lines; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2d<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3a,e<\/a>). In summary, our findings highlight that MMR proficiency is the major source of independent variation between the tested cellular contexts for prime insertion of short sequences.<\/p>\n<h3 id=\"Sec4\">Effects of prime editing steps<\/h3>\n<p>Having confirmed MMR as a length-dependent determinant of insertion efficiency, we next sought to understand how different steps of prime editing affect insertion rates of our library sequences. Specifically, we dissected the contributions of (1) pegRNA expression, (2) reverse transcription by two different reverse transcriptases, (3) presence of a nicking guide and (4) overexpression of 3\u2032 and 5\u2032 flap nucleases (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3a<\/a>).<\/p>\n<div data-test=\"figure\" data-container-section=\"figure\" id=\"figure-3\" data-title=\"Effects of prime editing steps.\">\n<figure><figcaption><b id=\"Fig3\" data-test=\"figure-caption-text\">Fig. 3: Effects of prime editing steps.<\/b><\/figcaption><div>\n<div><a data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y\/figures\/3\" rel=\"nofollow\"><picture><source type=\"image\/webp\" ><img decoding=\"async\" aria-describedby=\"Fig3\" src=\"http:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41587-023-01678-y\/MediaObjects\/41587_2023_1678_Fig3_HTML.png\" alt=\"Science &amp; Nature figure 3\" loading=\"lazy\" width=\"685\" height=\"871\"><\/picture><\/a><\/div>\n<p><b>a<\/b>, Schematic of molecular steps involved in prime editing. <b>b<\/b>, Normalized pegRNA count derived from sequencing of PCR amplicons from genomic DNA (<i>x<\/i> axis) or PCR amplicons from RNA (<i>y<\/i> axis) for the <i>HEK3<\/i> site in HEK293T cells for individual pegRNAs (markers). Pink, inserts with four or more consecutive adenines. Data represent the average of <i>n<\/i>\u2009=\u20093 biological replicates. <b>c<\/b>, Top panel, average insertion rate relative to length bin median (<i>y<\/i> axis) for inserts stratified by the longest consecutive run of adenines (<i>x<\/i> axis). Bottom panel, instead showing transcription rate (read counts from RNA\/read counts from DNA) on the <i>y<\/i> axis. Data are presented as mean\u2009\u00b1\u2009s.e.m. <i>n<\/i>\u2009=\u20093 biological replicates. 
<b>d<\/b>, Insertion frequencies at the <i>HEK3<\/i> site in HEK293T using the standard MMLV reverse transcriptase (PE2, <i>x<\/i> axis) and the FeLV reverse transcriptase (PE-FeLV, <i>y<\/i> axis) for different insertion sequences (markers). Colors, number of neighboring points. <i>n<\/i>\u2009=\u20093 biological replicates. <b>e<\/b>, As <b>d<\/b>, but comparing PE3 and PE2 at the <i>EMX1<\/i> site. <b>f<\/b>, Schematic of screens with overexpression constructs. <b>g<\/b>, Insertion frequencies for different overexpressions (<i>y<\/i> axis and panels) compared with no overexpression (<i>x<\/i> axis) for three biological replicate screens (markers) stratified by insertion sequence lengths (colors). <b>h<\/b>, Average insertion rates (<i>y<\/i> axis) across insert lengths (<i>x<\/i> axis) with at least 30 measured sequences for overexpression constructs (colors). Data are presented as mean\u2009\u00b1\u2009s.e.m. <i>n<\/i>\u2009=\u20093 biological replicates. <b>i<\/b>, As <b>h<\/b>, but instead displaying the insertion rate fold changes of screens with overexpressions compared with no overexpression (<i>y<\/i> axis), calculated from the ratio of sums of all sequences (lines) or of ten randomly sampled sequences. <b>j<\/b>, Top, average insertion frequency (grayscale) of four sequences with varying lengths (<i>x<\/i> axis) when overexpressing eGFP stratified by homology arm lengths (panels). Bottom, insertion rate fold changes compared with eGFP (<i>y<\/i> axis) when overexpressing TREX1 and TREX2 (colors). <i>n<\/i>\u2009=\u20092 biological replicates. <b>k<\/b>, Fraction of the nontemplated adenine allele at the +9 position (<i>y<\/i> axis) for cells with overexpression constructs (<i>x<\/i> axis and colors) stratified by experiment and homology arm lengths (panels). 
Markers show screen averages from three biological replicates for the pooled screen or from separate pegRNAs for the individual validation experiment.<\/p>\n<\/div>\n<p xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\"><a data-test=\"article-link\" data-track=\"click\" data-track-label=\"button\" data-track-action=\"view figure\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y\/figures\/3\" data-track-dest=\"link:Figure3 Full size image\" aria-label=\"Full size image figure 3\" rel=\"nofollow\"><span>Full size image<\/span><\/a><\/p>\n<\/figure>\n<\/div>\n<p>We first assessed expression levels of pegRNAs targeting the <i>HEK3<\/i> site in HEK293T cells using deep sequencing. Abundance in the transcriptome was well correlated between replicates (median <i>R<\/i>\u2009=\u20090.97; Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">4a<\/a>) and with the DNA-derived read count frequency (<i>R<\/i>\u2009=\u20090.56; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3b<\/a>). The exceptions were sequences that resulted in four or more consecutive thymines on the pegRNA cassette (adenines in the inserted DNA), which act as transcription terminators for RNA polymerase III (refs. <sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR\/Cas system. 
Cell 155, 1479\u20131491 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR36\" id=\"ref-link-section-d69891743e1168\">36<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Porrua, O., Boudvillain, M. &#038; Libri, D. Transcription termination: variations on common themes. Trends Genet. 32, 508\u2013522 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR37\" id=\"ref-link-section-d69891743e1171\">37<\/a><\/sup>). Upon removing pegRNAs with terminator motifs, the correlation between measured DNA and RNA sequence coverage increased to 0.59 (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3b<\/a>). Sequences with four or more consecutive adenines were 4.8-fold less expressed and, accordingly, their average insertion rate was 4.8-fold lower compared with other sequences (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3c<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">4b<\/a>). Overall, 23 of the 24 inserts (96%) that were not observed in any screen contained at least one run of four or more adenines, highlighting this feature as a useful filter in pegRNA design.<\/p>\n<div data-test=\"figure\" data-container-section=\"figure\" id=\"figure-4\" data-title=\"Cytosine content and secondary structure of the insert sequence are positively correlated with the insertion rate.\">\n<figure><figcaption><b id=\"Fig4\" data-test=\"figure-caption-text\">Fig. 
4: Cytosine content and secondary structure of the insert sequence are positively correlated with the insertion rate.<\/b><\/figcaption><div>\n<div><a data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y\/figures\/4\" rel=\"nofollow\"><picture><source type=\"image\/webp\" ><img decoding=\"async\" aria-describedby=\"Fig4\" src=\"http:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41587-023-01678-y\/MediaObjects\/41587_2023_1678_Fig4_HTML.png\" alt=\"Science &amp; Nature figure 4\" loading=\"lazy\" width=\"685\" height=\"615\"><\/picture><\/a><\/div>\n<p><b>a<\/b>, Correlation of length-normalized insertion rate with nucleotide frequency in the insert (colors) for each nucleotide (<i>y<\/i> axis) in each screen (<i>x<\/i> axis). Data represent the average of <i>n<\/i>\u2009=\u20093 (HEK293T) or <i>n<\/i>\u2009=\u20092 (HAP1) biological replicates. <b>b<\/b>, As <b>a<\/b>, but for a new set of screens with 18-nt inserts and 15-nt homology arms targeting five novel sites within 1\u2009kb of the <i>HEK3<\/i> site. <b>c<\/b>, Insertion rate at the <i>HEK3<\/i> site in HEK293T cells relative to length bin median (<i>y<\/i> axis) for inserts (markers) with different cytosine content (<i>x<\/i> axis). Line, linear regression fit; shaded area, 95% posterior confidence interval of the fit. Data represent the average of <i>n<\/i>\u2009=\u20093 biological replicates. <b>d<\/b>, Insertion rates at the <i>HEK3<\/i> site in HEK293T cells relative to length bin median (<i>y<\/i> axis) for inserts (markers) with calculated Gibbs free energy (\u2206<i>G<\/i>) from ViennaFold (<i>x<\/i> axis). Line, linear regression fit; shaded area, 95% posterior confidence interval of the fit. Data represent the average of <i>n<\/i>\u2009=\u20093 biological replicates. 
<b>e<\/b>, Correlation (<i>x<\/i> axis) between insertion rates and insert sequence free energy calculated from different parts of the 3\u2032 extension (<i>y<\/i> axis). Box, median and quartiles; whiskers, least extreme of 1.5 times the interquartile range from the quartile and most extreme values. <i>n<\/i>\u2009=\u20093 (HEK293T) or <i>n<\/i>\u2009=\u20092 (HAP1) biological replicates. <b>f<\/b>, Insertion rates for sequences (markers) at the <i>HEK3<\/i> site in HEK293T for pegRNAs (<i>x<\/i> axis) and epegRNAs (<i>y<\/i> axis). Data represent the average of <i>n<\/i>\u2009=\u20093 biological replicates. <b>g<\/b>, Percentage increase in insertion rate with each standard deviation increase in structure strength (colors) for different overexpression constructs (<i>x<\/i> axis) and insertion sequence lengths (<i>y<\/i> axis). <b>h<\/b>, Insertion rates relative to length bin median (<i>y<\/i> axis) for sequences that disrupt or preserve (<i>x<\/i> axis) scaffold loops (panels). Colored lines show screen medians and the thicker black lines and dots show the median across all screens. <b>i<\/b>, The predicted secondary structure of a 66-nt insert sequence (ELMI003108) with the <i>HEK3<\/i> homology arm.<\/p>\n<\/div>\n<p xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\"><a data-test=\"article-link\" data-track=\"click\" data-track-label=\"button\" data-track-action=\"view figure\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y\/figures\/4\" data-track-dest=\"link:Figure4 Full size image\" aria-label=\"Full size image figure 4\" rel=\"nofollow\"><span>Full size image<\/span><\/a><\/p>\n<\/figure>\n<\/div>\n<p>Second, to disentangle the contribution of the reverse transcription step, we made a prime editor construct with the nicking Cas9 fused to an engineered feline leukemia virus reverse transcriptase (MashUp RT: pipettejockey.com) with similar fidelity to the murine leukemia virus enzyme used in prime editor 2. 
The average insertion rates observed using this construct were 6.7-fold lower compared with the standard PE2 (0.72% and 4.86%, respectively; Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">5a\u2013d<\/a>), but highly correlated with those of PE2 (<i>R<\/i>\u2009=\u20090.80; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3d<\/a>). Therefore, the effects of the insert sequence on insertion are not specific to the murine reverse transcriptase used in PE2, which highlights the possibility of performing prime editing experiments with alternative constructs.<\/p>\n<p>The PE3 system includes an additional guide RNA to nick the nonedited strand, which increases editing efficiency as well as indel formation rate<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149\u2013157 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR12\" id=\"ref-link-section-d69891743e1335\">12<\/a><\/sup>. We explored how the addition of this extra sgRNA affects the insertion frequencies of our library. We chose the <i>EMX1<\/i> locus in HEK293T cells, where we observed poor insertion efficiencies of 0.28% on average without the nicking guide RNA, and cotransfected a nicking guide RNA that targets 77\u2009nt downstream of the pegRNA target<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Liu, P. et al. Improved prime editors enable pathogenic allele correction and cancer modelling in adult mice. Nat. Commun. 
12, 2121 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR38\" id=\"ref-link-section-d69891743e1342\">38<\/a><\/sup>. We found that the extra nick increased the average insertion rate by 5.6-fold to 1.5% (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">5d\u2013g<\/a>), and increased the indel rate by 2.3-fold to 0.31%, including deletions between the nick sites of the pegRNA and sgRNA that were not observed for PE2 (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">5h<\/a>). Importantly, the relative insertion rates for sequences in the library were highly concordant between PE2 and PE3 in HEK293T cells (<i>R<\/i>\u2009=\u20090.84; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3e<\/a>).<\/p>\n<p>An important step in prime editing is the resolution of two competing intermediates, one with a 5\u2032 flap (containing the wild-type sequence) and one with a 3\u2032 flap (containing the insertion). We speculated that the activity of the respective flap nucleases can steer the balance between the two outcomes. To test this, we overexpressed the 5\u2032 flap nuclease FEN1 and the 3\u2032 flap nucleases TREX1 and TREX2 in the context of the <i>HEK3<\/i> site-targeting screen in HEK293T cells. As a control, we overexpressed eGFP in the same backbone used for the nucleases (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3f<\/a>). 
The insertion rates after FEN1 or eGFP overexpression were highly correlated with those measured in screens without overexpression (<i>R<\/i>\u2009=\u20090.93 and 0.97; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3g<\/a>) with similar length dependence (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3h<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">6a\u2013d<\/a>). Intriguingly, TREX1 and TREX2 overexpression strongly suppressed the insertion of longer sequences. For cells that did not overexpress nucleases or overexpressed eGFP, the average insertion rate for sequences longer than 4\u2009nt was 4.4\u20136.0%, which is 4.4\u20135.8 times lower than for shorter sequences. This is in contrast to cells overexpressing TREX1 and TREX2, where the average insertion rate for sequences >4\u2009nt was only 0.66% or 0.97%, 25.3\u201326.7-fold lower than that of shorter ones (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3h,i<\/a>).<\/p>\n<p>We next confirmed that TREX1 and TREX2 antagonize prime insertions in a length-dependent manner. To do so, we cotransfected HEK293T cells with overexpression constructs encoding eGFP, TREX1 or TREX2 (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3f<\/a> and Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">6e<\/a>) and individual pegRNAs targeting the <i>HEK3<\/i> site encoding a 1-, 3-, 9- or 30-nt insertion (C, CAG, BCL6 binding site and Myc-tag) in the context of 25- or 34-nt homology arms (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3j<\/a>). Overexpressing TREX1 and TREX2 decreased editing rates across all insert and homology arm lengths, but disproportionately more for longer inserts (1.6\u20133.0-fold for the 1-nt insertion compared with 20\u2013108-fold for the 30-nt insertion; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3j<\/a>). This effect could be driven by the length of the insert sequence alone or of the entire 3\u2032 flap (corresponding to insertion\u2009+\u2009homology arm). In line with the results from our pooled screens (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3i<\/a>), we observed a strong correlation between the log fold change of insertion rates for TREX1\/2 over eGFP and the insert sequence length (<i>R<\/i>\u2009=\u20090.97), which decreased when considering the total extension length (<i>R<\/i>\u2009=\u20090.86\u20130.92; Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">6f<\/a>), suggesting a more important role for the insertion length than the overall flap length.<\/p>\n<p>The <i>HEK3<\/i> locus in HEK293T contains a single-nucleotide variation at position 9 after the prime editor nick site. The pegRNA homology arm encodes a G for this position, while one of the three chromosome copies encodes an A. If a 3\u2032 flap containing the edit and at least 9\u2009nt of the homology arm was fixed into the genome, we would expect a decreased frequency of the A allele. Indeed, for both pooled and validation screen conditions without TREX1\/2 overexpression, we only observed 0.95\u20131.6% (screen averages) of reads with library insertions containing A in the +9 position compared with 33\u201336% for unedited reads (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3k<\/a>). This is in contrast to screens overexpressing TREX1\/2 where the percentage of the A allele increased to 3.4\u20136.9%, suggesting a higher proportion of flaps where the homology arm was digested to below 9\u2009nt (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig3\">3k<\/a>). Taken together, our data demonstrate that TREX1\/2 antagonize the insertion of longer sequences with prime editing, presumably by digesting the 3\u2032 flap intermediate containing the edit.<\/p>\n<h3 id=\"Sec5\">Sequence content effects on insertion efficiency<\/h3>\n<p>We next examined sequence content-dependent variation in insertion rate. 
To address this in a length-independent way, we calculated the insertion rate of each insert relative to sequences with the same or similar length (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Sec8\">Methods<\/a>) and then measured its correlation with sequence features, computed from the perspective of the written sequence (that is, the reverse complement of the pegRNA molecule sequence). We observed a consistent cytosine preference across all four target sites and cell lines (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4a<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">7a<\/a>), with each extra percentage of cytosine in the insert increasing the relative insertion rate by an average of 2.2%. Conversely, the percentages of adenine and thymine decreased insertion rates for all loci and cell lines (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4a<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">7a<\/a>).<\/p>\n<p>Our observations of nucleotide content effect were limited to four target sites, and moderately variable. 
To confirm whether the sequence influences hold more broadly, we performed an additional set of screens in HEK293T cells, targeting the original <i>HEK3<\/i> site and five novel sites within 1\u2009kilobase (kb) of the <i>HEK3<\/i> site (dubbed <i>HEK3-S2<\/i> to <i>HEK3-S6<\/i>) with pegRNA libraries encoding 356\u2013388 18-nt inserts on pegRNAs with 15-nt homology arms (average insertion rate 3.2%, median <i>R<\/i> between replicates 0.81; Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">7b<\/a>). Reassuringly, the sequence preferences were recapitulated in this experiment, with a strong preference for cytosines (average <i>R<\/i> between insertion rate and cytosine fraction\u2009=\u20090.47; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4b<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">7c<\/a>).<\/p>\n<p>We next sought to understand how pegRNA secondary structure affects insertion rates. As the strength of the structure depends on the length of the insert, we calculated the secondary structure\u2019s free energy relative to a large sample of sequences of the same length (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Sec8\">Methods<\/a>). We observed that sequences with relatively stronger structures were more efficiently inserted (<i>R<\/i>\u2009=\u20090.46; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4d<\/a>). 
To better understand this effect, we considered which combination of the pegRNA parts (primer binding site, insert and homology arm) gives predicted free energies that best reflect insertion efficacy. We observed the strongest correlation when the structure was calculated from the reverse transcribed portion of the extension (that is, the combination of insert sequence and homology arm; average <i>R<\/i> across screens\u2009=\u20090.38), and the additional inclusion of the primer binding site sequence decreased correlation (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4e<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">8a,b<\/a>). Further, the free energies of pegRNA extensions designed for one target site always predicted insertion efficiency better at the same site than at other target sites (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">8c<\/a>). Since the homology arm is specific to the target, this also explains some of the differences in insertion rates we observed across the target sites.<\/p>\n<p>Structure in the insert and homology arm could increase prime editing efficiency by protecting the pegRNA itself from nuclease degradation, a strategy explored in engineered pegRNAs (epegRNAs), which contain structured RNA elements 3\u2032 of the primer binding site<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Nelson, J. W. et al. Engineered pegRNAs improve prime editing efficiency. Nat. Biotechnol. 
40, 402\u2013410 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR21\" id=\"ref-link-section-d69891743e1510\">21<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\" title=\"Li, X. et al. Enhancing prime editing efficiency by modified pegRNA with RNA G-quadruplexes. J. Mol. Cell. Biol. 14, mjac022 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR39\" id=\"ref-link-section-d69891743e1513\">39<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\" title=\"Zhang, G. et al. Enhancement of prime editing via xrRNA motif-joined pegRNA. Nat. Commun. 13, 1856 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR40\" id=\"ref-link-section-d69891743e1516\">40<\/a><\/sup>. However, we did not observe an increased abundance of more structured pegRNAs in the transcriptome (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">8d<\/a>), suggesting an alternative mechanism. To better understand the interplay of structure in various parts of the pegRNA and how it affects insertion rates, we screened 439 inserts of varying free energy from the original pegRNA library in the epegRNA construct, targeting the <i>HEK3<\/i> site in HEK293T cells (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">8e\u2013g<\/a>). 
We found that the additional structure in the insert and homology arm also increased insertion rates for epegRNAs (<i>R<\/i>\u2009=\u2009\u22120.34) but to a lesser extent than for regular pegRNAs (<i>R<\/i>\u2009=\u2009\u22120.53; Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">8h<\/a>), and that the insertion rates of regular pegRNAs and epegRNAs were highly correlated (<i>R<\/i>\u2009=\u20090.79; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4f<\/a>). Together, this implies that structure past the protective cap still influences insertion rates through mechanisms other than transcript abundance, and that our results on insertion efficiencies are relevant for epegRNAs as well.<\/p>\n<p>We further noticed that structure in the reverse transcribed portion of the pegRNA was not correlated with the insertion rates of sequences &lt;5\u2009nt, but was well correlated for longer sequences (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4g<\/a>). Since insertion rates of longer sequences are more impacted by overexpression of TREX1 and TREX2, we speculated that the structure protects the reverse transcribed 3\u2032 DNA flap containing the edit from degradation. Indeed, we observed that structure has a 2.4\u20132.6-fold stronger effect for cells overexpressing TREX1 or TREX2 compared with cells overexpressing FEN1, eGFP or nothing (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4g<\/a> and Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">9a\u2013d<\/a>).<\/p>\n<p>Structure plays a role in other parts of the pegRNA molecule as well. For instance, the 13\u2009nt of the primer binding site are perfectly complementary to the protospacer (positions 5\u201317) and can therefore hybridize with each other. If the first nucleotides of the insert create further base pairing with the protospacer and scaffold, the strength of this structure is enhanced, and the protospacer could be sequestered from base pairing with the target site or ribonucleoprotein complex formation with Cas9 could be impaired. To test if this additional pairing affects insertion rates, we predicted minimum free energy configurations of the primer binding site and the first three insert nucleotides with the spacer and the first guanine of the scaffold and observed 27% lower editing rates for inserts with extended base pairing 3\u2009nt into the protospacer compared with no extension (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">10a<\/a>). Finally, we tested if the disruption of the structural scaffold loops, which are required for association with Cas9, by the insert sequence reduces insertion rates. We calculated the minimum free energy configuration of the insert with the scaffold and observed 26% lower average editing for the pegRNAs with the first scaffold loop disrupted (screen range 10\u201343%) and 20% with the second and third loops (screen range 11\u201335%) compared with other inserts of the same length (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4h<\/a>). 
This loop dependence is in agreement with recent findings that scaffold variants with additional point mutations to stabilize the stem-loops can increase prime editing efficiencies<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Li, X. et al. Highly efficient prime editing by introducing same-sense mutations in pegRNA or stabilizing its structure. Nat. Commun. 13, 1669 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR41\" id=\"ref-link-section-d69891743e1567\">41<\/a><\/sup>.<\/p>\n<p>The combined effects of insert sequence length, cytosine content and structure explained why some sequences are inserted much better than others. For example, the long 66-nt ELMI003108 sequence that was inserted in the <i>HEK3<\/i> locus at 1.39% insertion frequency (0.66% on average for the other 10 sequences &gt;66\u2009nt) formed a strong structure together with the <i>HEK3<\/i> homology arm (minimum free energy\u2009=\u2009\u221235.2\u2009kcal\u2009mol<sup>\u22121<\/sup>; 1.5\u2009s.d. lower than the average free energy of 66-nt sequences; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig4\">4i<\/a>). Other longer sequences that inserted frequently relative to their size were recombinase sites, which are often near-palindromic and therefore form strong structures (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">10b,c<\/a>). Finally, our library included eight codon variations of the His-6 tag in forward and reverse orientations. 
The average insertion difference between the best codon variant and the worst was 13.3-fold, with the highest insertion rate for the cytosine-richest CAC histidine codons (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">10d<\/a>). This directly demonstrates the practical utility of this new understanding for guiding the codon choice for tags to insert (see the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">Supplementary Note<\/a> for a more thorough discussion).<\/p>\n<h3 id=\"Sec6\">Predicting insertion rates<\/h3>\n<p>Given our improved understanding of prime insertion rates, we next aimed to predict the relative efficiencies of inserting different sequences into the same site. We extracted 53 salient features such as insert length, nucleotide composition and folding energy for each pegRNA in eight screens (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5a<\/a>, Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">1<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">11<\/a>), and used tenfold cross-validation to select an accurate model (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Sec8\">Methods<\/a> and Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">12<\/a>). Based on feature correlations, their marginal effect we uncovered above and interpretability, we manually picked a final set of ten features, such that adding the remaining 43 extracted features did not improve the model performance further on the training data (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5b<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">12<\/a>). The contribution of individual features to prediction reflected the understanding developed above: insert sequence length, the secondary structure of the pegRNA and reverse transcribed sequence, sequence composition and MMR each had a substantial impact, and the direction of these effects was consistent with expectations (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5c<\/a> and Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">1<\/a>). The final model trained on the full training set achieved a correlation of 0.68 on held-out sequences, with performance ranging from <i>R<\/i>\u2009=\u20090.44 to 0.92 when restricted to individual screens, exceeding correlation of individual biological replicates in noisier ones (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5d<\/a> and Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">12<\/a>). We call this method MinsePIE (Modeling insertion efficiency for Prime Insertion Experiments) and incorporated it into a package available at <a href=\"https:\/\/github.com\/julianeweller\/MinsePIE\">https:\/\/github.com\/julianeweller\/MinsePIE<\/a>, and produced a web application to predict prime editing insertion rates at <a href=\"https:\/\/elixir.ut.ee\/minsepie\/\">https:\/\/elixir.ut.ee\/minsepie\/<\/a>.<\/p>\n<div data-test=\"figure\" data-container-section=\"figure\" id=\"figure-5\" data-title=\"Predicting prime insertion efficiencies.\">\n<figure><figcaption><b id=\"Fig5\" data-test=\"figure-caption-text\">Fig. 5: Predicting prime insertion efficiencies.<\/b><\/figcaption><div>\n<div><a data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y\/figures\/5\" rel=\"nofollow\"><picture><source type=\"image\/webp\" ><img decoding=\"async\" aria-describedby=\"Fig5\" src=\"http:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41587-023-01678-y\/MediaObjects\/41587_2023_1678_Fig5_HTML.png\" alt=\"Science &amp; Nature figure 5\" loading=\"lazy\" width=\"685\" height=\"634\"><\/picture><\/a><\/div>\n<p><b>a<\/b>, Schematic representation of model features. <b>b<\/b>, Tenfold cross-validation model performance on the training set (<i>y<\/i> axis) using different feature sets. System: MMR proficiency and Oligo(A) length. Sequence effects: length, reverse transcriptase template (RTT) structure, nucleotide composition and all of them combined (\u2018Total\u2019). Model: combination of ten features. Extra: 53 features. Dashed line, median of \u2018Model\u2019. Box, median and quartiles. Whiskers, 1.5 times interquartile range. <b>c<\/b>, Feature importance. 
Left, distribution of SHAP values (<i>x<\/i> axis) for each feature (<i>y<\/i> axis, colors). Right, respective mean absolute SHAP values (<i>x<\/i> axis). <b>d<\/b>, Concordance of predicted (<i>y<\/i> axis) and observed (<i>x<\/i> axis) insertion efficiencies on the held-out test set (markers). Solid line, <i>y<\/i>\u2009=\u2009<i>x<\/i>. Label, Pearson\u2019s <i>R<\/i>. An additional 18 points are beyond the plot limits (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">12<\/a>). <b>e<\/b>, Concordance of predicted and observed values at new sites. Pearson\u2019s <i>R<\/i> between predicted and observed normalized insertion efficiencies (<i>y<\/i> axis) for 356\u2013388 18-nt sequences inserted into six different sites within the HEK3 locus (left bars) and 66 codon variants of six protein tags into nine sites in HEK293T cells (right bars). Line, performance on the dataset from <b>d<\/b>. <b>f<\/b>, Mean replicate correlation (light gray) \u00b1s.e.m. and concordance of predicted and observed rates (yellow) on 6- and 9-nt insertions (63 and 1,908 sequences, respectively) at the TAPE-1 target from (ref. <sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Choi, J. et al. A time-resolved, multi-symbol molecular recorder via sequential genome editing. Nature 608, 98\u2013107 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR42\" id=\"ref-link-section-d69891743e1726\">42<\/a><\/sup>). <b>g<\/b>, Distribution of Pearson\u2019s <i>R<\/i> between observed and predicted insertion rates (<i>x<\/i> axis) of seven insertions into 134 loci from (ref. 
<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198\u2013206 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR17\" id=\"ref-link-section-d69891743e1739\">17<\/a><\/sup>). Dashed line, median. <b>h<\/b>\u2013<b>j<\/b>, Measured insertion rates of predicted high- and low-inserting codon versions of six protein tags into nine sites. <b>h<\/b>, Measurements of insertion rate relative to mean insertion rate of codon sequences (<i>y<\/i> axis, colors) separated into variants predicted to insert at high and low rates (<i>x<\/i> axis). <b>i<\/b>, Insertion rates (<i>x<\/i> axis) of codon variants (markers) of six protein tags (<i>y<\/i> axis) into the NOLC1 site in HEK293T cells. Red, high predicted rate; blue, low predicted rate. Bar and whiskers, mean\u2009\u00b1\u2009s.e.m. <b>j<\/b>, Concordance of observed and predicted insertion rates of all sequences for all target sites and codon variants. <b>k<\/b>, Effect of padding. Insertion rates (<i>y<\/i> axis) of three sequences (<i>x<\/i> axis) inserted without modification (gray) and padded with optimally predicted sequences to 18\u2009nt (green).<\/p>\n<\/div>\n<\/figure>\n<\/div>\n<p>After establishing and interpreting the model, we next tested whether its predictions extrapolate to observations beyond our original screening context (Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">13<\/a>). We first measured insertion efficiencies of 356\u2013388 sequences of 18\u2009nt into the <i>HEK3<\/i> and five novel nearby sites, as well as insertions of 66 codon versions of different protein tags in nine novel sites. In spite of new insert sequences, previously unobserved target sites and shorter 15-nt homology arms, the MinsePIE model predicted relative insertion efficacies well, with Pearson\u2019s <i>R<\/i> of 0.46\u20130.95, compared with replicate reproducibility of <i>R<\/i>\u2009=\u20090.36\u20130.98 (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5e<\/a> and Supplementary Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">7b<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">14<\/a>). We then assessed generalizability on external datasets. A recent study by Choi et al. inserted 63 6-nt and 1,908 9-nt sequences (NNNGGA and NNNNNNGGA) into the synthetic, genome-integrated TAPE-1 target sequence using a 13-nt primer binding sequence and a 9-nt homology arm<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Choi, J. et al. A time-resolved, multi-symbol molecular recorder via sequential genome editing. Nature 608, 98\u2013107 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR42\" id=\"ref-link-section-d69891743e1814\">42<\/a><\/sup>. 
MinsePIE prediction quality was close to measurement repeatability (<i>R<\/i>\u2009=\u20090.63 and 0.37 for 6-nt and 9-nt insertions, respectively, for prediction versus measurement; <i>R<\/i>\u2009=\u20090.73 and 0.33 for replicate versus replicate; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5f<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">15a,b<\/a>). Finally, to evaluate MinsePIE performance at many unseen target sites, we predicted the insertion rates of A, C, G, T, AG, AGGAA and AGGAATCATG sequences into 134 loci using pegRNAs with 13-nt primer binding sites and 14-nt homology arms as measured by Kim et al.<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198\u2013206 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR17\" id=\"ref-link-section-d69891743e1831\">17<\/a><\/sup>. The median prediction accuracy for these sites was <i>R<\/i>\u2009=\u20090.68 (range 0.0\u20130.97; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5g<\/a> and Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">15<\/a>), which is consistent with the observed model performance on other internal and external datasets.<\/p>\n<p>A predictive model of insertion rate will be useful for experimental optimization, such as selecting the best nucleotide sequence to insert for the common task of tagging endogenous proteins. We used MinsePIE to predict high- and low-performing codon variants of six different protein tags frequently used in molecular biology: His-6, HiBiT<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Oh-Hashi, K., Furuta, E., Fujimura, K. &#038; Hirata, Y. Application of a novel HiBiT peptide tag for monitoring ATF4 protein expression in Neuro2a cells. Biochem. Biophys. Rep. 12, 40\u201345 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR43\" id=\"ref-link-section-d69891743e1847\">43<\/a><\/sup>, glycine-rich linker, mNeongreen-11 (ref. <sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Feng, S. et al. Improved split fluorescent proteins for endogenous protein labeling. Nat. Commun. 8, 370 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR28\" id=\"ref-link-section-d69891743e1851\">28<\/a><\/sup>), mNeongreen-11 endowed with a linker and a drug-inducible superdegron<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Jan, M. et al. Reversible ON- and OFF-switch chimeric antigen receptors controlled by lenalidomide. Sci. Transl. Med. 
13, eabb6295 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR44\" id=\"ref-link-section-d69891743e1855\">44<\/a><\/sup>, to generate in-frame fusions for <i>ACTB<\/i>, <i>LMNB1<\/i>, <i>NOLC1<\/i>, <i>RNF2<\/i> and <i>TP53<\/i> using pegRNAs that targeted both the forward and the reverse strand. We then tested the predicted sequences experimentally and observed a higher relative insertion rate of codon variants predicted to insert well compared with variants predicted to insert at low rates (median fold increase of 1.63; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5h,i<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">14<\/a>). This demonstrates the advantage of codon-optimization with the MinsePIE model. Beyond grouping into highly and lowly predicted sets, the measured insertion rates of all sequences correlated well with model predictions (<i>R<\/i>\u2009=\u20090.78; Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5j<\/a>). Finally, since sequences between 15 and 21\u2009nt were inserted more efficiently than 10\u201314-nt ones, we hypothesized that padding shorter sequences to 18\u2009nt will increase their insertion rates. We used our model to predict optimal padding sequences for three 12\u201313-nt sequences: a BRE-TATA box element, an endoplasmic reticulum retention (ERret) signal and a consensus splice site, and observed an average increase of 1.4-fold in insertion efficiency when using the padded sequences over the unmodified ones (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig5\">5k<\/a>). Together, these results demonstrate that our computational model can generalize to novel target sites and can help choose the most efficient sequences to write into the genome.<\/p>\n<\/div>\n<\/div>\n<div id=\"Sec7-section\" data-title=\"Discussion\">\n<h2 id=\"Sec7\">Discussion<\/h2>\n<div id=\"Sec7-content\">\n<p>We presented a comprehensive analysis of prime editing insertion efficiencies using 3,604 pegRNAs and diverse follow-up experiments (summarized in Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">17<\/a>). We found that short sequences insert with predictable frequencies across cell lines, target sites, repair contexts and prime editor systems based on their length, cytosine content and tendency to form secondary structure. We discovered that overexpression of the 3\u2032 flap nucleases TREX1 and TREX2 inhibited the insertion of longer sequences, and confirmed that active MMR antagonizes the insertion of shorter ones. The sequence and repair features, through MinsePIE, enable accurate prediction of relative insertion rates for novel sequences, and facilitate optimal design choices for writing short stretches of DNA into genomes.<\/p>\n<p>We uncovered a complex relationship between insertion sequence features and efficiency that is shaped by DNA processing and repair mechanisms. For the shortest sequences of up to 10\u2009nt, it is increasingly appreciated that MMR proficiency is a strong factor<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 20\" title=\"Chen, P. J. et al. 
Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635\u20135652.e29 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR20\" id=\"ref-link-section-d69891743e1906\">20<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Ferreira da Silva, J. et al. Prime editing efficiency and fidelity are enhanced in the absence of mismatch repair. Nat. Commun. 13, 760 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR35\" id=\"ref-link-section-d69891743e1909\">35<\/a><\/sup>, and we directly and comprehensively reaffirm this connection here. Surprisingly, sequences between 15 and 21\u2009nt could insert at higher rates than shorter ones in MMR-proficient cells, and elongating the insertion can improve its insertion efficacy. This effect is likely due to a combination of antagonization by MMR for the shortest sequences, and the potential steric issues for the 10\u201314-nt ones.<\/p>\n<p>Sequences longer than 30\u2009nt are incorporated less frequently. This could partly be explained by our discovery that the 3\u2032 flap nucleases TREX1 and TREX2 antagonize prime editing in an insert sequence length-dependent way. One explanation, supported by our observation that more structured long sequences insert at higher frequencies due to factors beyond RNA stability, is that DNA flaps with longer insertions and less structure likely spend more time in a nonhybridized state and expose more single-stranded DNA even when hybridized, thus making them more vulnerable to nuclease degradation. This demonstrates that flap nucleases modulate prime editing, which motivates strategies for the next generation of long sequence insertions.<\/p>\n<p>We further discovered that stronger secondary structure of the pegRNA homology arm and insert sequence led to higher insertion efficiency. 
This effect was evident when comparing different inserts into the same target, but also explained variable rates when attempting to write the same sequence into different target sites. We observed strong correlations between structure and insertion rates in the context of epegRNAs as well, and correlation was highest when the structure was confined to the insert and the homology arm, indicating that the effects of structures in these two regions are separate. Therefore, we hypothesize that while the epegRNA structure improves editing rates by preventing degradation of the RNA 3\u2032 extension, structure in the transcribed template does so by preventing degradation of the single-stranded DNA flap intermediate by flap nucleases. Indeed, flap nucleases had a smaller impact on insertions which resulted in more structured flaps. Alternatively, structured inserts could ease pairing of the edited strand with the nonedited strand due to being sterically smaller via folding onto themselves.<\/p>\n<p>Our improved understanding of insertion efficiency using the prime editing system naturally leads to recommendations for experimental design. First, we suggest choosing sequences with high cytosine content that are prone to form secondary structures. Inserts with runs of adenines should be avoided when using U6 promoters for pegRNAs. For sequences shorter than 14\u2009nt, transiently inhibiting MMR (as implemented in PE4 or PE5 systems)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 20\" title=\"Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635\u20135652.e29 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR20\" id=\"ref-link-section-d69891743e1923\">20<\/a><\/sup>, or knocking out <i>MLH1<\/i>, will drastically improve insertion rates in MMR-proficient cells. 
If MMR inhibition is undesired, padding the sequences to 18\u2009nt or installing additional silent mutations on the reverse transcriptase template can increase insertion rates.<\/p>\n<p>We put these recommendations to the test, and greatly improved the efficiency of protein tagging. For example, the His-6 tag, especially if choosing the CAC codon, inserts almost six times as well as the next best tag in our library (Myc-tag). To correct pathogenic deletions, our model can help prioritize targets and pick high-efficiency replacement sequences (for example, through codon variation). We provide empirical measurements on insertion efficiency into multiple target sites for over 100 useful sequences (Supplementary Data <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">2<\/a>). For predicting the insertion efficiency of novel sequences, we provide the MinsePIE algorithm as a command-line package<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Weller, J. et al. MinsePIE: Modelling insertion efficiency for Prime Insertion Experiments (Version 3.0). Zenodo \n                https:\/\/doi.org\/10.5281\/zenodo.7505816\n                \n               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR45\" id=\"ref-link-section-d69891743e1936\">45<\/a><\/sup> and user-friendly website (<a href=\"https:\/\/elixir.ut.ee\/minsepie\/\">https:\/\/elixir.ut.ee\/minsepie\/<\/a>).<\/p>\n<p>Our study measures thousands of sequences in up to 18 target sites in three cell lines across four prime editor systems. Nevertheless, our insights and the models we built have limitations. 
First, we measured on-target insertion, and predicted the relative insertion rate of the intended sequence, but did not assay genome-wide off-target editing, or model the insertion of nontemplated or mutated sequences that we observed to be rare. Other efforts have comprehensively characterized inserting a small number of edits into a large number of synthetic target sites<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198\u2013206 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR17\" id=\"ref-link-section-d69891743e1950\">17<\/a><\/sup>, and our model performs well to predict the relative efficiency on the majority of these data. A few target sites remained where our model did not perform well and datasets with diverse insertions into many more target sites will be needed to improve the predictions further. While the small number of sites we included limits our ability to model the target site effect, and guide RNA efficacy scores did not account for the target site influence, we believe that some features we uncovered (structure in the reverse transcriptase template, percentage of cytosines, disruption of the scaffold<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Li, X. et al. Highly efficient prime editing by introducing same-sense mutations in pegRNA or stabilizing its structure. Nat. Commun. 
13, 1669 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR41\" id=\"ref-link-section-d69891743e1954\">41<\/a><\/sup> and so on) also explain differences between efficiencies of pegRNAs more broadly and for edits beyond insertions.<\/p>\n<p>The prime editing field is moving rapidly<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Chen, P. J. &#038; Liu, D. R. Prime editing for precise and highly versatile genome manipulation. Nat. Rev. Genet. \n                https:\/\/doi.org\/10.1038\/s41576-022-00541-1\n                \n               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR15\" id=\"ref-link-section-d69891743e1961\">15<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Scholefield, J. &#038; Harrison, P. T. Prime editing\u2014an update on the field. Gene Ther. 28, 396\u2013401 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR46\" id=\"ref-link-section-d69891743e1964\">46<\/a><\/sup>. Diverse applications are already emerging<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 47\" title=\"Erwood, S. et al. Saturation variant interpretation using CRISPR prime editing. Nat. Biotechnol. 40, 885\u2013895 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR47\" id=\"ref-link-section-d69891743e1968\">47<\/a><\/sup> and some of the most exciting ones are specifically built around the insertion of short sequences. Examples include insertion of recombinase sites using prime editing to enable directed insertion of large DNA cargo of up to 36\u2009kb (refs. 
<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 1\" title=\"Anzalone, A. V. et al. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat. Biotechnol. 40, 731\u2013740 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR1\" id=\"ref-link-section-d69891743e1972\">1<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 2\" title=\"Yarnall, M. T. N. et al. Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases. Nat. Biotechnol. \n                https:\/\/doi.org\/10.1038\/s41587-022-01527-4\n                \n               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR2\" id=\"ref-link-section-d69891743e1975\">2<\/a><\/sup>), creating long deletions and insertions using paired pegRNAs<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 1\" title=\"Anzalone, A. V. et al. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat. Biotechnol. 40, 731\u2013740 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR1\" id=\"ref-link-section-d69891743e1979\">1<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Wang, J. et al. Efficient targeted insertion of large DNA fragments without DNA donors. Nat. 
Methods 19, 331\u2013340 (2022).\" href=\"http:\/\/www.nature.com\/#ref-CR48\" id=\"ref-link-section-d69891743e1982\">48<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Jiang, T., Zhang, X.-O., Weng, Z. &#038; Xue, W. Deletion and replacement of long genomic sequences using prime editing. Nat. Biotechnol. 40, 227\u2013234 (2022).\" href=\"http:\/\/www.nature.com\/#ref-CR49\" id=\"ref-link-section-d69891743e1982_1\">49<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Choi, J. et al. Precise genomic deletions using paired prime editing. Nat. Biotechnol. 40, 218\u2013226 (2022).\" href=\"http:\/\/www.nature.com\/#ref-CR50\" id=\"ref-link-section-d69891743e1982_2\">50<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Kweon, J. et al. Targeted genomic translocations and inversions generated using a paired prime editing strategy. Mol. Ther. \n                https:\/\/doi.org\/10.1016\/j.ymthe.2022.09.008\n                \n               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR51\" id=\"ref-link-section-d69891743e1985\">51<\/a><\/sup>, as well as clever utilization of short sequence insertion to generate a molecular recorder for sequential cellular events<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Choi, J. et al. A time-resolved, multi-symbol molecular recorder via sequential genome editing. 
Nature 608, 98\u2013107 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR42\" id=\"ref-link-section-d69891743e1989\">42<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Loveless, T. B. et al. Molecular recording of sequential cellular events into DNA. Preprint at bioRxiv \n                https:\/\/doi.org\/10.1101\/2021.11.05.467507\n                \n               (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR52\" id=\"ref-link-section-d69891743e1992\">52<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Chen, W. et al. Multiplex genomic recording of enhancer and signal transduction activity in mammalian cells. Preprint at bioRxiv \n                https:\/\/doi.org\/10.1101\/2021.11.05.467434\n                \n               (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR53\" id=\"ref-link-section-d69891743e1995\">53<\/a><\/sup>. A better understanding of how cellular determinants<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 20\" title=\"Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635\u20135652.e29 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR20\" id=\"ref-link-section-d69891743e2000\">20<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Nambiar, T. S., Baudrier, L., Billon, P. &#038; Ciccia, A. CRISPR-based genome editing through the lens of DNA repair. Mol. 
Cell 82, 348\u2013388 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR54\" id=\"ref-link-section-d69891743e2003\">54<\/a><\/sup> and pegRNA features affect prime editing rates<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198\u2013206 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR17\" id=\"ref-link-section-d69891743e2007\">17<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Nelson, J. W. et al. Engineered pegRNAs improve prime editing efficiency. Nat. Biotechnol. 40, 402\u2013410 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR21\" id=\"ref-link-section-d69891743e2010\">21<\/a><\/sup> provides a foundation for these advances. Our work adds the important dimension of short sequence insertion in different DNA repair contexts, which holds promise in enabling both sophisticated genome engineering and the correction of thousands of pathogenic mutations.<\/p>\n<\/div>\n<\/div>\n<div id=\"Sec8-section\" data-title=\"Methods\">\n<h2 id=\"Sec8\">Methods<\/h2>\n<div id=\"Sec8-content\">\n<h3 id=\"Sec9\">Mammalian cell culture<\/h3>\n<p>The human HEK293T cell line was purchased from AMS Biotechnology (EP-CL-0005). The HAP1 WT cell line was provided by Andrew Waters (Wellcome Sanger Institute) and the HAP1 <i>\u2206MLH1<\/i> cell line was purchased from Horizon Discovery (HZGHC000343c022). 
HEK293T cells were cultured in DMEM (Invitrogen) and HAP1 cells in IMDM (Invitrogen), both supplemented with 10% FCS (Invitrogen), 2\u2009mM glutamine (Invitrogen), 100\u2009U\u2009ml<sup>\u22121<\/sup> penicillin and 100\u2009mg\u2009ml<sup>\u22121<\/sup> streptomycin (Invitrogen) at 37\u2009\u00b0C and 5% CO<sub>2<\/sub>.<\/p>\n<h3 id=\"Sec10\">Primers<\/h3>\n<p>All primers used in this study are listed in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3<\/a>.<\/p>\n<h3 id=\"Sec11\">Plasmid cloning<\/h3>\n<p>Plasmids generated in this study are listed in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">4<\/a>.<\/p>\n<p><i>pCMV-PE2-P2A-PuroR<\/i> was generated by replacing eGFP from pCMV-PE2-P2A-GFP (Addgene 132776) with PuroR. A gene fragment containing parts of the MMLV reverse transcriptase and the puromycin resistance gene was ordered from IDT (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM3\">5<\/a>). The gene fragment and pCMV-PE2-P2A-GFP were digested using AgeI, purified with the Monarch PCR &#038; DNA Cleanup Kit (NEB) and ligated with T4 DNA ligase (NEB). The ligation product was transformed into XL10-Gold Ultracompetent Cells (Agilent). 
Plasmid DNA was isolated using the Plasmid Plus Midi Kit (Qiagen).<\/p>\n<p><i>pCMV-PE-FeLV-P2A-EGFP<\/i> was generated by using Gibson Assembly to replace the MMLV coding sequence between the XTEN linker and the 2A cleavage peptide with a gene fragment synthesized by IDT. The fragment encodes an IDT human codon-optimized version of the MashUp reverse transcriptase (pipettejockey.com), which is engineered from the feline leukemia virus (UniProt <a href=\"https:\/\/www.uniprot.org\/uniprot\/Q85521\">Q85521<\/a>).<\/p>\n<p><i>pLentiGuide-BlastR<\/i> was generated by replacing the puromycin resistance gene from Lenti_gRNA-Puro (Addgene 84752) with a blasticidin resistance gene. A gene fragment containing parts of the EF1a promoter and the blasticidin resistance gene was ordered from Twist Biosciences (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM3\">5<\/a>). The gene fragment and Lenti_gRNA-Puro were digested using FseI (NEB) and MluI-HF (NEB), purified with the Monarch PCR &#038; DNA Cleanup Kit (NEB), ligated with T4 DNA ligase (NEB) and transformed into XL10-Gold Ultracompetent Cells (Agilent). Plasmid DNA was isolated using the Qiagen Spin Miniprep Kit.<\/p>\n<p><i>pPB-TREG3G-PE2-rtTA3G-P2A-eGFP<\/i> was generated by fusing three gene fragments with restriction cloning. The first part contains the ITR sequences for the PiggyBac transposase, the second part contains prime editor 2 under the control of the third-generation doxycycline-inducible rtTA3G promoter and the third part was synthesized by Twist Biosciences and contains a PGK promoter followed by the rtTA3G protein, a P2A sequence and eGFP.<\/p>\n<p><i>pTwist_FEN1-T2A-tagBFP<\/i>, <i>TREX1-T2A-mScarlet<\/i>, <i>TREX2-T2A-emiRFP670<\/i> and <i>Acceptor-T2A-eGFP<\/i> were ordered from Twist Biosciences in a pTwist EF1 Alpha cloning vector. 
The protein sequences encoded by the primary transcripts of FEN1, TREX1 and TREX2 were identified on ensembl.org (July 2022), fused with the T2A sequence and the respective fluorophores, and reverse translated into codon-optimized nucleotide sequences (Twist Biosciences).<\/p>\n<p>The pCMV-PE2-P2A-PuroR, pLentiGuide-BlastR and pPB-TREG3G-PE2-rtTA3G-P2A-eGFP plasmids will be made available on Addgene.<\/p>\n<h3 id=\"Sec12\">Generating HAP1 cell lines that stably express prime editor<\/h3>\n<p>HAP1 cell lines expressing prime editors were generated by cotransfecting pCMV-hyPBase<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Yusa, K., Zhou, L., Li, M. A., Bradley, A. &#038; Craig, N. L. A hyperactive piggyBac transposase for mammalian applications. Proc. Natl Acad. Sci. USA 108, 1531\u20131536 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR55\" id=\"ref-link-section-d69891743e2116\">55<\/a><\/sup> and pPB-TREG3G-PE2-rtTA3G-P2A-eGFP. First, 500,000 HAP1 WT and 500,000 HAP1 <i>\u2206MLH1<\/i> cells were each seeded into one well of a six-well plate 1\u2009d before transfection. For each transfection, 3\u2009\u00b5g of each plasmid was mixed with 6\u2009\u00b5l of Plus reagent and 7.5\u2009\u00b5l of Lipofectamine LTX (Invitrogen) reagent, incubated for 30\u2009min and then added to the cells. At 2\u2009weeks post transfection, cells were sorted into single clones based on eGFP expression. Two different individual clones were used for each screen.<\/p>\n<h3 id=\"Sec13\">Library design<\/h3>\n<p>Set 1: The insert sequence libraries contained 2,666 unique sequences, made up of useful molecular biology sequences, the eukaryotic linear motif (ELM) library and sequences with strong secondary structure. 
We designed four separate versions of this library with identical insert sequences to target the <i>CLYBL<\/i>, <i>EMX1<\/i>, <i>FANCF<\/i> and <i>HEK3<\/i> sites. The pegRNAs contained a 13-nt PBS and a 34-nt homology arm on the reverse transcriptase template. The utility sequences were hand-picked for their usefulness in molecular biology. The ELM instances library with the corresponding fasta file of the genes was downloaded from elm.eu.org\/instances.html?q=* (refs. <sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463\u2013480.e30 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR26\" id=\"ref-link-section-d69891743e2144\">26<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 38, 1037\u20131043 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR27\" id=\"ref-link-section-d69891743e2147\">27<\/a><\/sup>) on 19 November 2020 and filtered to only contain sequences from \u2018homo sapiens\u2019 that are longer than one\u2009amino acid. The amino acid motifs were extracted from the fasta file based on the indicated start and end sites. Finally, the amino acid motifs were reverse translated into DNA sequence using the \u2018reversetranslate\u2019 R package (v.1.0.0) and using the most frequent codon from the \u2018homo sapiens\u2019 codon table. 
For the secondary structure library, 100,000 random DNA sequences of 20- and 30-nt length were generated (RBioinf::randDNA function; v.1.48.0) and their secondary structure was calculated (see the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Sec23\">Data analysis and feature generation<\/a> section). The sequences were distributed into ten bins based on the strength of their secondary structure and 20 sequences were randomly picked from each structure bin to be included in the library. Finally, 30 random perfect 20- and 30-nt RNA hairpins were generated and appended to the secondary structure library. The combined library of insert sequences is included as Supplementary Data <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">1<\/a>. The insert sequences were then flanked with primer binding sites, random nucleotide stuffer sequences for shorter inserts, BsmBI sites and target vector compatible overhangs, resulting in 11,166 sequences of 199\u2009nt. The oligonucleotide library was ordered from Twist Biosciences.<\/p>\n<p>Set 2: This set of insert sequences was focused on short sequences between 1 and 10\u2009nt. It included all 1-, 2-, 3- and 4-nt sequences and 100 random sequences (RBioinf::randDNA function; v.1.48.0), respectively, of 5\u201310\u2009nt, and 61 sequences &lt;10\u2009nt from Set 1 for a total of 999 unique inserts (938 were recovered in screens). 
The libraries were endowed with target-site-specific adapter sequences and ordered the same way as Set 1.<\/p>\n<p>Eighteen-nt insert sequence libraries: This set of sequences consisted of six sublibraries that were designed to target the <i>HEK3<\/i> site and five additional nearby sites (within 1\u2009kb), dubbed <i>HEK3-2<\/i>, <i>HEK3-3<\/i>, <i>HEK3-4<\/i>, <i>HEK3-5<\/i> and <i>HEK3-6<\/i>. The sublibraries shared 100 identical, randomly generated (RBioinf::randDNA function; v.1.48.0) 18-nt insert sequences and 256\u2013288 sublibrary-specific 18-nt insert sequences that were picked based on their ability to form secondary structure in the reverse transcriptase template. In contrast to Set 1 and Set 2, we ordered oligos for this set of sequences that already included the spacer (20\u2009nt), improved scaffold (86\u2009nt, sequence: gtttaagagctatgctggaaacagcatagcaagtttaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc), PBS (13\u2009nt), insert (18\u2009nt) and homology arm (HA) (15\u2009nt). The oligos were endowed with BsmBI sites, overhangs for cloning and primer binding sites for amplification of the oligo pool. The oligonucleotide library was ordered from Twist Biosciences.<\/p>\n<p>Codon variation library: six protein tags, His-6 (HHHHHH), Flag (DYKDDDDK), a glycine-rich linker (GSSGGSSG), the HiBiT tag (VSGWRLFKKIS)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Oh-Hashi, K., Furuta, E., Fujimura, K. &#038; Hirata, Y. Application of a novel HiBiT peptide tag for monitoring ATF4 protein expression in Neuro2a cells. Biochem. Biophys. Rep. 
12, 40\u201345 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR43\" id=\"ref-link-section-d69891743e2186\">43<\/a><\/sup>, mNeongreen-11 (TELNFKEWQKAFTDMM)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Feng, S. et al. Improved split fluorescent proteins for endogenous protein labeling. Nat. Commun. 8, 370 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR28\" id=\"ref-link-section-d69891743e2190\">28<\/a><\/sup>, mNeongreen with a linker (GSSGTELNFKEWQKAFTDMM) and a drug-inducible superdegron (LQCEICGFTCRQKGNLLRHIKLH)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Jan, M. et al. Reversible ON- and OFF-switch chimeric antigen receptors controlled by lenalidomide. Sci. Transl. Med. 13, eabb6295 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR44\" id=\"ref-link-section-d69891743e2194\">44<\/a><\/sup>, were used to tag <i>ACTB<\/i>, <i>LMNB1<\/i>, <i>NOLC1<\/i>, <i>RNF2<\/i> and <i>TP53<\/i> genes, and to insert into the <i>HEK3<\/i> site. We chose <i>ACTB<\/i>, <i>LMNB1<\/i>, <i>NOLC1<\/i> and <i>RNF2<\/i> because they have been successfully edited in other publications<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149\u2013157 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR12\" id=\"ref-link-section-d69891743e2230\">12<\/a><\/sup> and <i>TP53<\/i> for its relevance in health and disease. 
<i>ACTB<\/i>, <i>LMNB1<\/i>, <i>NOLC1<\/i> and <i>TP53<\/i> were tagged at their N termini; an in-frame, internal fusion was made for <i>RNF2<\/i>. For the <i>ACTB<\/i>, <i>LMNB1<\/i> and <i>TP53<\/i> targets, two independent pegRNAs were used that target both the forward and reverse strands (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM3\">6<\/a>). Because we decided to make in-frame fusions, the position of the insert sequence was shifted up to 6\u2009nt downstream on the reverse transcriptase template relative to the nick. Together, this resulted in nine target sites.<\/p>\n<p>For the His-6 tag and the glycine-rich linker, all possible codon combinations were generated in silico. For the remaining, longer tags, all possible codon variations were generated using only the top two most frequent human codons. MinsePIE was used to predict the insertion efficiencies for the generated codon variants and ten codon variants with both high and low predicted insertion rates were included in the final library. The codon-optimization webtool from Eurofins Genomics (<a href=\"https:\/\/eurofinsgenomics.eu\/en\/gene-synthesis-molecular-biology\/geneius\/sequence-optimisation\/\">https:\/\/eurofinsgenomics.eu\/en\/gene-synthesis-molecular-biology\/geneius\/sequence-optimisation\/<\/a>) was used to design an additional version of each tag. This resulted in 594 sequences in total (Supplementary Data <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM4\">1<\/a>). 
The oligos for this set of sequences contained spacer (20\u2009nt), improved scaffold (86\u2009nt, gtttaagagctaagctggaaacagcatagcaagtttaaataaggctagtccgttatcaactcgaaagagtggcaccgagtcggtgc<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"Jost, M. et al. Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs. Nat. Biotechnol. 38, 355\u2013364 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR56\" id=\"ref-link-section-d69891743e2280\">56<\/a><\/sup>), PBS (13\u2009nt), insert and HA (34\u2009nt). The oligos were endowed with BsmBI sites, overhangs for cloning and primer binding sites for amplification of the oligo pool, and were ordered from Twist Biosciences.<\/p>\n<h3 id=\"Sec14\">Library cloning<\/h3>\n<p>Set 1 and Set 2: First, a separate, site-specific backbone was cloned for each target site. A gene fragment was ordered containing the protospacer, guide RNA scaffold, parts of the reverse transcriptase template and primer binding site, a stuffer sequence flanked with BsmBI sites for insert library insertion and the T7 terminator motif (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM3\">5<\/a>). Then, 100\u2009ng of the gene fragments was digested with BsaI-HFv2 (NEB) and purified with the Monarch PCR &#038; DNA Cleanup Kit (NEB). The pLentiGuide-BlastR plasmid was digested with BsmBI-v2 (NEB) at 55\u2009\u00b0C for 8\u2009h followed by 20\u2009min of heat inactivation at 80\u2009\u00b0C, and gel purified using the QIAEX II Gel Extraction Kit (Qiagen). The gene fragments were ligated into the backbone using T4 DNA ligase (NEB) and transformed into XL10-Gold Ultracompetent bacteria (Agilent). 
The plasmids were purified with the Qiagen Spin Miniprep Kit.<\/p>\n<p>Second, pegRNA insert libraries were inserted into the site-specific backbones. The insert libraries were synthesized as oligonucleotide pools and amplified using KAPA HiFi HotStart ReadyMix (Roche). Libraries for individual target sites were amplified with separate primers (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3<\/a>). The products were purified using the Monarch PCR &#038; DNA Cleanup Kit, digested with BsmBI-v2 at 55\u2009\u00b0C for 4\u2009h and heat-inactivated at 80\u2009\u00b0C for 20\u2009min alongside 5\u2009\u03bcg of site-specific plasmids. The digested oligos were purified using the Monarch PCR &#038; DNA Cleanup Kit. The vectors were treated with quick CIP (NEB) for 15\u2009min at 37\u2009\u00b0C and then purified using QIAquick PCR Purification Kit (Qiagen). Inserts were ligated into vectors using Golden Gate assembly. A 1:3 molar ratio of insert and vector was mixed with BsmBI-v2 and T4 DNA ligase and incubated in a thermocycler for 30 cycles, alternating between 5\u2009min at 42\u2009\u00b0C and 5\u2009min at 16\u2009\u00b0C and finishing with a heat inactivation step at 60\u2009\u00b0C for 5\u2009min. The ligation products were purified with Monarch PCR &#038; DNA Cleanup Kit and electroporated into MegaX DH10B T1R Electrocomp Cells (Thermo Fisher). The bacteria were grown overnight in liquid culture and plasmid was extracted using the Plasmid Plus Midi Kit. 
The pegRNA sequences are shown in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM3\">6<\/a>.<\/p>\n<p>epegRNA libraries were cloned by first generating a <i>HEK3<\/i> site-specific epegRNA backbone with a stuffer sequence for the insert libraries (as above). The tevopreQ1 sequence was added by PCR (using P42, P43; Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3<\/a>) to the fragment containing the protospacer, guide RNA scaffold, parts of the reverse transcriptase template and primer binding site, a stuffer sequence flanked with BsmBI sites for insert library insertion and the T7 terminator motif. Next, the 379 sequences with strong structure were amplified from the Set 1 oligo pool by PCR and cloned into the epegRNA <i>HEK3<\/i> backbone as described above.<\/p>\n<p>Eighteen-nt inserts and codon variation libraries: pLentiGuide-BlastR plasmid was digested with BsmBI-v2 (NEB) at 55\u2009\u00b0C for 8\u2009h followed by 20\u2009min of heat inactivation at 80\u2009\u00b0C and gel purification of the vector using the QIAEX II Gel Extraction Kit (Qiagen). Amplification, purification, digestion and repurification were performed as described above. The oligo sequences were ligated into pLentiGuide-BlastR using Golden Gate assembly, the ligation product was purified and transformed into bacteria, and the plasmid was extracted after an overnight culture as above.<\/p>\n<h3 id=\"Sec15\">Lentivirus production<\/h3>\n<p>Lentivirus was produced in HEK293FT cells that were transfected with Lipofectamine LTX (Invitrogen). 
First, 5.4\u2009\u03bcg of a lentiviral vector, 5.4\u2009\u03bcg of psPax2 (Addgene 12260) and 1.2\u2009\u03bcg of pMD2.G (Addgene 12259) were mixed in 3\u2009ml of Opti-MEM together with 12\u2009\u03bcl of PLUS reagent and incubated for 5\u2009min at room temperature. Next, 36\u2009\u03bcl of the LTX reagent was added and the mix was incubated for another 30\u2009min at room temperature. Then, 3\u2009ml of the transfection mix was added to 80% confluent cells in 10\u2009ml of DMEM medium in a 10-cm dish. After 48\u2009h the supernatant was collected and stored at 4\u2009\u00b0C. Fresh medium was added to the cells and collected 24\u2009h later. The two collections were kept separate. For virus titration, Lenti-X GoStix Plus (Takara) was used following the manufacturer\u2019s protocol.<\/p>\n<h3 id=\"Sec16\">pegRNA insertion screens in HEK293T cells<\/h3>\n<p>Infection with pegRNA library: Cells were infected with the pegRNA library (separate infections for each target site and library set), aiming at a multiplicity of infection of 0.5 and a guide coverage of >1,000\u00d7. Each screen was performed in three biological replicates and independently infected. To achieve this, 6\u2009\u00d7\u200910<sup>6<\/sup> cells were plated in three wells of a six-well plate and spin-infected for 15\u201330\u2009min at 2,000\u2009r.p.m. Following infection, cells were resuspended and replated at 2\u2009\u00d7\u200910<sup>4<\/sup> cells per cm<sup>2<\/sup>. Cells were cultured for 7\u2009d and selected for pegRNA integration with 10\u2009\u00b5g\u2009ml<sup>\u22121<\/sup> blasticidin.<\/p>\n<p>Transfection with prime editors: HEK293T cells were seeded at a concentration of 6.9\u2009\u00d7\u200910<sup>4<\/sup> cells per cm<sup>2<\/sup> in a 15-cm dish. The next day, the medium was replaced with fresh medium and the cells were transfected using Lipofectamine LTX reagent. 
Then, 72\u2009\u00b5g of PE-Puro or PE-FeLV plasmid was mixed with 8\u2009\u00b5g of pCS2-GFP and 40\u2009\u00b5l of Lipofectamine P3000 (Invitrogen) in 3.2\u2009ml of Opti-MEM (Gibco). In another tube, 40\u2009\u00b5l of Lipofectamine 3000 and 160\u2009\u00b5l of Lipofectamine LTX were mixed in 3.2\u2009ml of Opti-MEM. The solutions were combined, incubated for 30\u2009min at room temperature and then added to the cells. For PE3, an additional 6\u2009\u00b5g of nicking guide RNA was added. For screens with nuclease overexpression, an additional 30\u2009\u00b5g of flap nuclease or eGFP plasmid in the pTwist vectors was added.<\/p>\n<h3 id=\"Sec17\">pegRNA insertion screens in HAP1 and HAP1 <i>\u2206MLH1<\/i> cells<\/h3>\n<p>Infection with pegRNA library: The pegRNA library viruses for all target sites and sets were individually quantified using the Lenti-X GoStix Plus (Takara) kit and then combined into one virus pool. The HAP1 and HAP1 <i>\u2206MLH1<\/i> cells with PiggyBac-integrated PE2 were infected with the virus pool, aiming at a multiplicity of infection of 0.5 and a pegRNA coverage of >1,000\u00d7. Each screen was performed in two biological replicates with separate PiggyBac prime editor clones and independently infected. To achieve this, 6\u2009\u00d7\u200910<sup>6<\/sup> cells were plated in three wells of a six-well plate and spin-infected for 15\u201330\u2009min at 2,000\u2009r.p.m. Following infection, cells were resuspended and replated at 2\u2009\u00d7\u200910<sup>4<\/sup> cells per cm<sup>2<\/sup>. Cells were cultured for 7\u2009d and selected for pegRNA integration with 10\u2009\u00b5g\u2009ml<sup>\u22121<\/sup> blasticidin.<\/p>\n<p>For each replicate, 30\u2009million cells were seeded into five-layer flasks and induced with 1\u2009\u00b5M doxycycline. The cells were split once at day four and the doxycycline was refreshed. 
Finally, cells were collected on day seven post induction.<\/p>\n<h3 id=\"Sec18\">DNA extraction and library preparation for next-generation sequencing<\/h3>\n<p>Genomic DNA extraction and sequencing library preparation for screens were done as described by Allen et al.<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 10\" title=\"Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. \n                https:\/\/doi.org\/10.1038\/nbt.4317\n                \n               (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR10\" id=\"ref-link-section-d69891743e2384\">10<\/a><\/sup>. Briefly, cell pellets were resuspended in TAIL BUFFER A (100\u2009mM Tris-HCl, 5\u2009mM EDTA, 200\u2009mM NaCl) and then mixed with 1 volume of TAIL BUFFER B (100\u2009mM Tris-HCl, 5\u2009mM EDTA, 200\u2009mM NaCl, 0.4% SDS) supplemented with freshly thawed Proteinase K (20\u2009mg\u2009ml<sup>\u22121<\/sup> final). The lysate was incubated overnight at 56\u2009\u00b0C. On the next day, RNase A was added to a final concentration of 10\u2009\u00b5g\u2009ml<sup>\u22121<\/sup> and incubated at 37\u2009\u00b0C for 30\u2009min to 4\u2009h. Then, 1 volume of isopropanol was added and the DNA spooled on a sterile inoculation loop. The DNA was washed three times by dipping it into consecutive 5-ml tubes containing 70% ethanol. The DNA was air-dried for 5\u201310\u2009min and resuspended in TE buffer (pH 8.0).<\/p>\n<p>For each screen, two independent amplicons were generated by PCR using Q5 HotStart High-Fidelity 2X Master Mix (NEB). 
One amplicon was for the targeted locus and one amplicon for the pegRNA locus (primers in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3<\/a>). To maintain high coverage for each sample, 40\u2009\u03bcg of genomic DNA was used as the template and each PCR reaction was run in 50-\u03bcl aliquots containing no more than 5\u2009\u03bcg of genomic DNA. The PCR reactions were column-purified using the QIAquick PCR Purification Kit (Qiagen). Sequencing adapters and barcodes were added with a second round of PCR using the KAPA HiFi HotStart ReadyMix (Roche), primers P3 and P4 (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3<\/a>) and 1\u2009ng of template DNA. Amplicons were purified with Agencourt AMPure XP beads in a 0.7:1 ratio (beads to PCR reaction volume) and quantified with the Quant-iT High-Sensitivity dsDNA Assay Kit (Invitrogen). The amplicons were pooled together and sequenced on the Illumina HiSeq 2500 using HiSeq Rapid SBS Kit v2 (500 cycles, 250 paired-end).<\/p>\n<h3 id=\"Sec19\">Reverse transcription of pegRNA libraries<\/h3>\n<p>Frozen cell pellets containing 4.5\u20136.1\u2009million cells from screens targeting the <i>HEK3<\/i> site in HEK293T cells were washed with 500\u2009\u00b5l of PBS and the RNA was extracted using the mirVana miRNA Isolation Kit (Invitrogen). Then, 8.4\u201316.6\u2009\u00b5g of template RNA split across eight reactions was used for genomic DNA digestion and complementary DNA synthesis with the SuperScript IV VILO Master Mix with ezDNase (Invitrogen). 
For cDNA synthesis, a primer was used that was reverse complementary to the 13-nt PBS with extra nucleotides on the 5\u2032 end (italic) to provide additional base pairing for PCR amplification (<i>ATCGAGTTT<\/i>CAGACTGAGCACG; Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3<\/a>). pegRNAs were amplified from the cDNA mixture by 27 cycles of PCR using KAPA HiFi HotStart ReadyMix (Roche) and primers P39 and P40 (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">3<\/a>). Library preparation and sequencing were performed as described in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Sec18\">DNA extraction and library preparation for next-generation sequencing<\/a> section.<\/p>\n<h3 id=\"Sec20\">Generating read count tables<\/h3>\n<p>Paired forward and reverse reads from Illumina sequencing were merged using PEAR v.0.9.11. Data for the same screen but different sequencing lanes were concatenated. The resulting merged fastq files were processed using a custom R script (read_match_pegRNAs.R, GitHub<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Weller, J. et al. MinsePIE: Modelling insertion efficiency for Prime Insertion Experiments (Version 3.0). Zenodo \n                https:\/\/doi.org\/10.5281\/zenodo.7505816\n                \n               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR45\" id=\"ref-link-section-d69891743e2433\">45<\/a><\/sup>). 
First, DNA sequences were trimmed to contain the 10\u2009nt up- and downstream of the nick site (for target site amplicon) or to contain 15\u2009nt up- and downstream of the nick site (pegRNA amplicon). On average, 98% of reads were matched for the target site amplicon and 84% for the pegRNA amplicon. The trimmed sequences were then matched to each insert in the pegRNA library flanked by 10\u2009nt of target site sequence (for target site amplicon) or flanked by 15\u2009nt of pegRNA plasmid sequence (pegRNA amplicon), requiring 0 mismatches. Adding the flanking sequences ensures that only insertions at the correct locations are considered. On average, 92% of reads were matched to the unedited locus or an insertion for both the target site amplicon and the pegRNA amplicon.<\/p>\n<h3 id=\"Sec21\">Combining replicates<\/h3>\n<p>pegRNAs where any replicate had fewer than 20 reads in the pegRNA amplicon mapping to it were filtered out. Insert counts were normalized to frequencies by dividing the reads for each insert by the number of reads in each screen. Insertion efficiencies were calculated for each replicate and screen by dividing the target insert frequency by the pegRNA insert frequency. (Note: calculating insertion frequencies this way likely underestimates them, as it does not take cells that were not infected with the library into account. In addition, an average of 16% of reads in the pegRNA amplicons did not match to any sequence in the library.) Finally, insertion efficiencies were averaged across replicates. The script used to combine replicates is available on GitHub<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Weller, J. et al. MinsePIE: Modelling insertion efficiency for Prime Insertion Experiments (Version 3.0). 
Zenodo \n                https:\/\/doi.org\/10.5281\/zenodo.7505816\n                \n               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR45\" id=\"ref-link-section-d69891743e2445\">45<\/a><\/sup> as \u2018combine_replicates.R\u2019. The processed read count tables are shown in Supplementary Data <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">2<\/a>.<\/p>\n<h3 id=\"Sec22\">Mutation rates around the insertion site and indel detection<\/h3>\n<p>The fastq reads of the target sites were trimmed by matching a stretch of 10\u2009nt directly upstream of the PBS and 60\u2009nt downstream of the insertion site (<i>CLYBL<\/i>: CTGAATGGTG, CAGAGTTCCA; <i>EMX1<\/i>: GGGCCTGAGT, ATGGGGAGGA; <i>FANCF<\/i>: CCTCATGGAA, AGCACCTGGG; <i>HEK3<\/i>: CCTTGGGGCC, AGCTTTTCCT). The occurrence of library insertions was detected by pattern matching the trimmed reads for library sequences. Indel detection: The trimmed reads were filtered in a series of steps. First, sequences with insertions at the nick site that perfectly match a sequence in the insert libraries were removed (this also means that our method cannot detect single\/double\/triple-nucleotide insertions at the nick site because our library contains all possible singlets\/doublets\/triplets). Second, sequences that contained \u2018N\u2019 were removed. Third, sequences with a perfectly preserved sequence around the cut site were removed. Fourth, sequences that were 83-nt long were removed (83\u2009nt corresponds to the length of a sequence without indels). The remaining sequences were annotated according to the indel type. Scaffold integrations were sequences that contained five or more nucleotides of the scaffold (GCACC) directly downstream of the reverse transcriptase template. 
Mutated insertions were sequences that matched any sequence >10\u2009nt in the library with no more than three mismatches (fuzzyjoin R package v.0.1.6, optimal string alignment method). Duplications were sequences that contained two or more copies of the homology arm sequence. Deletions at the target sites were deletions that overlapped up to 10\u2009nt up- and\/or downstream with the nick site. Other deletions were deletions that did not overlap with the nick site and all remaining sequences are classified as \u2018other\u2019. The scripts used to call mutation rates and indels are available on GitHub<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Weller, J. et al. MinsePIE: Modelling insertion efficiency for Prime Insertion Experiments (Version 3.0). Zenodo \n                https:\/\/doi.org\/10.5281\/zenodo.7505816\n                \n               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR45\" id=\"ref-link-section-d69891743e2472\">45<\/a><\/sup> as \u2018find_mutations.R\u2019.<\/p>\n<p>SNV detection: Going from the outside to the inside of the trimmed sequence (with the nicking site being between the two innermost nucleotides), the occurrence of the four nucleotides was counted at every position. Nonreference nucleotides were classified as mutations with the exception of a nonreference SNP (A) in HEK293T cells for one of three alleles at position +9. The reverse transcriptase template on the pegRNA corresponds to the sequence of the major allele (G).<\/p>\n<h3 id=\"Sec23\">Data analysis and feature generation<\/h3>\n<p>Merging data from Set 1 and Set 2: For each target site and cell line, the insertion rates in Set 2 were multiplied by the ratio of the mean insertion rate of the shared sequences in Set 1 and the mean insertion rate in Set 2. 
For the 140 shared insert sequences, the mean insertion rate between both sets was calculated. Length-normalized insertion rates: Length residuals were calculated by dividing the insertion rate by the median insertion rate for sequences of the same length (for sequences <10\u2009nt) or by dividing sequences into length bins. The length bins consisted of sequences of 10\u201314, 15\u201319, 20\u201324, 25\u201329, 30\u201339, 40\u201349, 50\u201359 and 60\u201369 (sequences with lengths above 30\u2009nt were divided into length bins of 10\u2009nt because there were fewer longer sequences in the library). The melting temperature for the insert sequence was calculated using SeqUtils.MeltingTemp.Tm_NN from biopython. The RNAfold (v.2.4.16) algorithm of the ViennaRNA (v.2.5.0a) package<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Gruber, A. R., Lorenz, R., Bernhart, S. H., Neub\u00f6ck, R. &#038; Hofacker, I. L. The Vienna RNA websuite. Nucleic Acids Res. 36, W70\u2013W74 (2008).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR57\" id=\"ref-link-section-d69891743e2487\">57<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Hofacker, I. L. Vienna RNA secondary structure server. Nucleic Acids Res. 31, 3429\u20133431 (2003).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR58\" id=\"ref-link-section-d69891743e2490\">58<\/a><\/sup> was used to calculate the tendency of insert sequences (alone or in the context of PBS and\/or HA) to form secondary structures. 
The free energy was normalized to the mean and standard deviation (<i>z<\/i> score) of 1,000 random sequences with the same length and in the same context.<\/p>\n<p>The 6-nt and 9-nt insertion data from Choi et al.<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Choi, J. et al. A time-resolved, multi-symbol molecular recorder via sequential genome editing. Nature 608, 98\u2013107 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR42\" id=\"ref-link-section-d69891743e2500\">42<\/a><\/sup> were filtered for sequences with more than 20 sequencing reads for each pegRNA replicate and more than 30 sequencing reads for the plasmid reads, followed by feature calculation as described above. The insertion and plasmid read frequencies were calculated as the fraction of insertion mapping reads in all reads, and the normalized insertion rate as the ratio of insertion read frequency to the plasmid read frequency normalized to the mean and standard deviation of each dataset (<i>z<\/i> score). The data from Kim et al. were filtered to contain target sites with all seven insertions and no other edits, followed by feature calculation as described above. Edit rates were normalized to the mean and standard deviation of editing rates at each target site.<\/p>\n<h3 id=\"Sec24\">Comparison of HAP1 and HAP1 <i>\u2206MLH1<\/i> lines<\/h3>\n<p>To account for screen batch effects for direct comparisons (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#Fig2\">2f<\/a> and Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">2d<\/a>), the mean insertion rates across wild-type and <i>MLH1<\/i> knockout HAP1 cell lines were scaled to be identical for >13-nt sequences that are not affected by MMR. The fold changes of the scaled insertion efficiencies between HAP1 <i>\u2206MLH1<\/i> and HAP1 lines were then calculated for each sequence in the library.<\/p>\n<h3 id=\"Sec25\">Validation of nuclease overexpression with individual pegRNAs<\/h3>\n<p>We chose four different insertions (C, CAG, a BCL6 recognition sequence: TTCTAGGAA and a Myc-tag: GAGCAGAAGCTGATCAGCGAAGAGGACCTC) from our pooled library for validation and cloned them into <i>HEK3<\/i> site-targeting pegRNAs endowed with 25- or 34-nt homology arms. At 1\u2009d before transfection, HEK293T cells were seeded in two 24-well plates at 50,000 cells per well. All transfections were done in replicates and each well was transfected with 500\u2009ng of pCMV_PE2_P2A_PuroR, 150\u2009ng of pTwist nuclease or eGFP overexpression constructs, and 100\u2009ng of pegRNA using Lipofectamine LTX according to the manufacturer\u2019s protocol. Successful transfection 1\u2009d later was confirmed by fluorescence microscopy and 2\u2009\u00b5g\u2009ml<sup>\u22121<\/sup> puromycin was added 1\u2009d later. Cells were collected 5\u2009d post transfection by direct lysis of cell pellets using home-made quick extract buffer (1\u2009mM CaCl<sub>2<\/sub>, 3\u2009mM MgCl<sub>2<\/sub>, 1\u2009mM EDTA, 1% Triton X-100, 10\u2009mM Tris pH 7.5) with freshly added proteinase K (0.2\u2009mg\u2009ml<sup>\u22121<\/sup>) followed by 15\u2009min of incubation at 65\u2009\u00b0C and 20\u2009min of incubation at 95\u2009\u00b0C. Then, 1.5\u2009\u00b5l of the lysate was directly added to 25\u2009\u00b5l of amplicon PCRs. 
Sequencing adapters and barcodes were added by a second round of PCR and the purified products were sequenced on an Illumina MiSeq (300 cycles). Correctly edited reads were identified by pattern matching for the insert sequence flanked by 10\u2009nt of the target site to each end. Unedited sequences were detected by matching the 20\u2009nt of wild-type sequence around the nick site. The insertion rate was calculated by dividing the number of edited reads by the number of wild-type reads.<\/p>\n<h3 id=\"Sec26\">Modeling<\/h3>\n<p>Insertion efficiencies were normalized (<i>z<\/i> score) between screens and replicates by subtracting the corresponding mean insertion efficiency from each individual insertion efficiency and dividing it by the standard deviation of the insertion efficiency. Categorical features were one-hot encoded. Hyperparameters were tuned for each model by evaluating average model performance after fivefold cross-validation using each combination of hyperparameters, then choosing the parameter combination resulting in the best cross-validation performance. The Lasso and Ridge regressions were tested with alpha values of 0, 0.00001, 0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1. The Random Forest regressor was tested with n_estimators of 5, 10, 50, 100, 500 and 1000; max depth of 2, 5, 7, 10 and None; and min_samples_leaf of 1, 5 and 10. The Multilayer perceptron regressor was tested with hidden_layer_size of (10), (100), (100, 10), (1000, 100) and (1000, 100, 10); and alpha of 0.01, 0.1, 0.5 and 1. The gradient boosted tree from XGBoost<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR\/Cas system. 
Cell 155, 1479\u20131491 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR36\" id=\"ref-link-section-d69891743e2562\">36<\/a><\/sup> was tested with n_trees of 1, 5, 10, 50, 100, 500 and 1000; max_depth of 1, 2, 3, 4, 5, 7 and 10; l1_penalty and l2_penalty of 0, 0.001, 0.01, 0.1, 0.5 and 1; colsample of 0.1, 0.3, 0.5, 0.7, 0.9 and 1; gamma of 0.001, 0.01, 0.1, 0.5 and 1; and learning_rate of 0.0001, 0.001, 0.01, 0.1, 0.3 and 0.5. The scikit-learn models were trained using parameters obtained from hyperparameter tuning: Lasso regression was performed with alpha\u2009=\u20090.1; Ridge regression was performed with alpha of 0.01; Random forest had no maximum depth, 1000 estimators and min_samples_leaf of 5; Multilayer perceptron regressor was trained with alpha\u2009=\u20091, 200 maximum iterations at a constant learning rate of 0.001, a hidden layer size of (1000, 100) and \u2018lbfgs\u2019 solver. Gradient boosted tree from XGBoost<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Chen, T. &#038; Guestrin, C. XGBoost. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Krishnapuram, B.et al.) 
785\u2013794 (ACM, 2016).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR59\" id=\"ref-link-section-d69891743e2566\">59<\/a><\/sup> was trained with a minimum loss reduction of 0.1, 100 trees, a learning rate of 0.1, maximum depth of 4, 0.00001 L1 regularization on weights, 0.1 L2 regularization on weights and a subsample ratio of one per column when constructing each tree.<\/p>\n<p>The final model was trained with XGBoost using the features length; normalized secondary structure of the reverse transcriptase template; MMR proficiency; percentage of the nucleotides C, A and T; the number of paired bases between the first 3\u2009nt of the insert and the last 3\u2009nt of the spacer in addition to the first nucleotide of the scaffold; complementarity between the first nucleotide of the insert and the nucleotide at the nicking site; the maximum number of consecutive adenines in the insert; and the intactness of loop1. Features in each set are summarized in Supplementary Tables <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">1<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">2<\/a>.<\/p>\n<p>For training, unique insert sequences were split randomly into training and test sequences at a ratio of 0.7 (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">10a<\/a>). Measurements for different target sites and cell lines were assigned to training and test data based on the grouping of insert sequences. The model was trained and predictions were evaluated using Pearson\u2019s <i>R<\/i> based on the correlation between test data and corresponding predictions. 
SHapley Additive exPlanations (SHAP) values for the model and feature importance for the prediction of specific outcomes were calculated using the SHAP TreeExplainer and explainerModel<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Lundberg, S. &#038; Lee, S.-I. A unified approach to interpreting model predictions. Available at arXiv [cs.AI] \n                https:\/\/doi.org\/10.48550\/arXiv.1705.07874\n                \n               (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR60\" id=\"ref-link-section-d69891743e2588\">60<\/a><\/sup>.<\/p>\n<h3 id=\"Sec27\">Statistics and reproducibility<\/h3>\n<p>The <i>n<\/i> numbers denoted in the figure legends refer to independent experiments that were separately infected with the pegRNA library. Measurements were always taken from distinct samples. No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. Wherever correlations were indicated, Pearson\u2019s <i>R<\/i> was used. The <i>t<\/i>-tests (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">5a,b<\/a>) were performed as two-sided tests. Normal distribution of the underlying data was assumed and no adjustments for multiple comparisons were made.<\/p>\n<h3 id=\"Sec28\">MinsePIE website<\/h3>\n<p>The MinsePIE website uses the MinsePIE package available at <a href=\"https:\/\/github.com\/julianeweller\/MinsePIE\">https:\/\/github.com\/julianeweller\/MinsePIE<\/a> to serve as a user-friendly and interactive way to predict insertion efficiency (Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM1\">10b<\/a>). There are three main modes: standard mode highlights all relevant sequence features, manual mode allows more advanced usage where the user can adjust relevant parameters (for example, mean and s.d. of editing rate) and batch mode allows users to upload a set of sequences for analysis. A table highlighting insert sequences, respective <i>z<\/i> scores and insertion prediction scores is given in each usage mode. For ease of analysis, color codes are used in the table and the following distribution graph to highlight the sequences with the highest insertion efficiency scores. The MinsePIE web application makes use of Vue.js (v.2.6.11), D3.js (v.3.5.17) and agGrid (v.24.1.1) libraries and the Flask framework (v.2.0.2). Genomic data are retrieved via <a href=\"https:\/\/api.genome.ucsc.edu\">https:\/\/api.genome.ucsc.edu<\/a>.<\/p>\n<h3 id=\"Sec29\">Padding of shorter insert sequences<\/h3>\n<p>Three sequences between 12 and 13\u2009nt (an endoplasmic reticulum retention signal, AAGGACGAGCTG; a BRE-TATA element, CCACGCCTATAAA; and a consensus splice motif, TTTTTTTCAGGTT) were chosen for padding. The sequences were padded to 18\u2009nt with all possible nucleotide combinations. MinsePIE was used to predict the insertion rates for these variants at the <i>HEK3<\/i> site. The sequences with highest predicted efficiencies were picked for testing: CAAGGACGAGCTGTCCAC, CCCACGCCTATAAAGGCC and GCTTTTTTTCAGGTTCTC. The padded and original inserts were endowed with a 13-nt PBS and 34-nt reverse transcriptase template and cloned into the pU6-pegRNA-GG-acceptor (Addgene no. 132777) as described previously<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Anzalone, A. V. et al. 
Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149\u2013157 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#ref-CR12\" id=\"ref-link-section-d69891743e2651\">12<\/a><\/sup>. Editing efficiencies were assessed by transient transfection in an arrayed format. To this end, 10,000 HEK293T cells were seeded into a 96-well plate in triplicate. On the following day, 50\u2009ng of pegRNA plasmids and 200\u2009ng of pCMV-PE2-PuroR were transfected using 0.3\u2009\u00b5l of Lipofectamine LTX (Thermo Fisher Scientific) and 0.1\u2009\u00b5l of Plus reagent per well according to the manufacturer\u2019s instructions. After 1\u2009d, 2\u2009\u00b5g\u2009ml<sup>\u22121<\/sup> puromycin was added. Cells were collected 4\u2009d post transfection by direct lysis of cell pellets using home-made quick extract buffer (1\u2009mM CaCl<sub>2<\/sub>, 3\u2009mM MgCl<sub>2<\/sub>, 1\u2009mM EDTA, 1% Triton X-100, 10\u2009mM Tris pH 7.5) with freshly added proteinase K (0.2\u2009mg\u2009ml<sup>\u22121<\/sup>) followed by 10\u2009min of incubation at 65\u2009\u00b0C and 15\u2009min of incubation at 95\u2009\u00b0C. Then, 3\u2009\u00b5l of the lysate was directly added to amplicon PCRs. Sequencing adapters and barcodes were added by a second round of PCR and the purified products were sequenced on an Illumina MiSeq (300 cycles). Correctly edited reads were identified by pattern matching for the insert sequence flanked by 10\u2009nt of the target site to each end. Unedited sequences were detected by matching the 20\u2009nt of wild-type sequence around the nick site. 
The insertion rate was calculated by dividing the number of edited reads by the number of wild-type reads.<\/p>\n<h3 id=\"Sec30\">Software<\/h3>\n<p>The software used comprised BaseSpaceCLI (v.1.4.0); Geneius codon-optimization webtool from Eurofins Genomics (accessed 2022); PEAR (v.0.9.11); Python (v.3.8.10); Python packages: Biopython (v.1.79), more-itertools (v.8.5.0), pandarallel (v.1.6.1), scikit-learn (v.0.24.2), scipy (v.1.5.3), shap (v.0.39.0), statannot (v.0.2.3) and XGBoost (v.1.4.0); R (v.4.0.2); ViennaRNA (v.2.5.0); and R packages: Broom (v.0.7.9), fuzzyjoin (v.0.1.6), ggpointdensity (v.0.1.0), RBioinf (v.1.48.0), reversetranslate (v.1.0.0), ShortRead (v.1.46.0), spgs (v.1.0\u20133), Tidyverse (v.1.3.1) and Viridis (v.0.6.1).<\/p>\n<h3 id=\"Sec31\">Reporting summary<\/h3>\n<p>Further information on research design is available in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-023-01678-y#MOESM2\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n<\/div>\n<\/div><\/div>\n<p><a href=\"https:\/\/www.nature.com\/articles\/s41587-023-01678-y\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><br \/>\n Jonas Koeppel<\/p>\n","protected":false},"excerpt":{"rendered":"<p>MainThe efficient insertion of short DNA sequences into genomes could change the course of biotechnology and medicine1,2. Small insertions can encode protein tags for purification and visualization, or manipulate protein function by altering protein localization, half-life or interaction profiles. 
Integrating sequences for transcription factor binding sites and splicing modulators provides control over gene expression while<\/p>\n","protected":false},"author":1,"featured_media":608712,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28724,4746,536],"tags":[],"class_list":["post-608711","post","type-post","status-publish","format-standard","has-post-thumbnail","category-prediction","category-prime","category-science-nature"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/608711","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=608711"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/608711\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/608712"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=608711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=608711"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=608711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}