Owing to dramatic reductions in DNA sequencing cost and an increase in output that has outpaced Moore's law (1), the compilation of deposited sequences now provides a vast database for identifying proteins and motifs at increasingly high resolution. This compilation is exemplified by the Pfam database: within 14 years of its inception, it has come to include more than 12,000 conserved protein families, some represented by over 100,000 sequences (2). Highly conserved residues have been documented that correspond to core catalytic and active sites or to protein–protein interaction surfaces (3). Programs such as SIFT (Sorting Intolerant From Tolerant) use amino acid conservation to distinguish tolerated from deleterious substitutions (4). However, residues that support the folding and basic structure of a protein may not be as conserved, and thus may not be predicted to be essential by such in silico analyses.

Such residues may control the conformation of a protein only in the context of its own unique polypeptide sequence or within a complex of coevolved interacting partners. Here we describe a unique, highly parallel approach to defining the functional residues of proteins based on their mutability alone. Our method (Mut-seq) takes advantage of the depth afforded by next-generation sequencing to characterize complex pools of mutated genomes and genes for functionality, and in doing so maps coding sequences for residues that show statistically high or low rates of mutability. In brief, we show that a large mutagenized population of viral genomes can be selected for growth fitness and then sequenced as a pool to define which amino acid residues can be changed to one or more other residues, and which cannot tolerate changes at all. The less mutable or nonmutable residues are of special interest because they may play pivotal roles in a protein's activity: as contributors to an enzyme's active site, as essential functional motifs, or as structural elements or interdomain linkers that confer proper protein folding and conformation. These insights into the residues essential for protein function may provide a new dataset useful for developing small-molecule inhibitors of essential proteins, and may also inform efforts to suppress drug resistance that arises through evolved mutations.

Results

Sequencing Mutated Phage and Stringent Filtering of Reads to Identify Single-Base Substitutions

Operationally, Mut-seq involves the following steps: (i) mutagenesis of a genome or gene; (ii) recovery of a bank of mutagenized targets under a positive selection condition; (iii) deep sequencing of the entire bank; and (iv) alignment of sequence reads to identify and quantify the base substitutions within the genomes or genes that represent mutations.

We initially tested Mut-seq on bacteriophage T7 of Escherichia coli, a podophage with a genome size of ∼40 kb, and on JSF7, an uncharacterized Vibrio cholerae podophage. We used the chemical mutagen hydroxylamine (HA), which specifically induces transitions of G·C base pairs to A·T base pairs. HA treatment mutates the phage genome while the DNA is still packaged in the intact virion, before genome internalization and replication in the cell. This specific mode of chemical mutagenesis allowed us to titrate the level of mutagenesis accurately, as well as to provide a signature of induced mutations that separates them from sequencing errors. We generated and sequenced ∼1.5 million randomly mutagenized plaque-forming units derived from a stock of 10 billion plaque-forming units. Because the mutagenized phage particles were recovered after growth on a bacterial host, we expected that only viable, replication-proficient phages would be sequenced. Deep sequencing of the DNA derived from these mutagenized surviving phage progeny allowed us to map and count HA-induced mutations at every G/C position in the T7 genome, and thus to measure mutability across each protein-coding sequence. In each of the four replicates, between 6.9% and 9.5% of the 160–220 million total 50-nt reads were found to contain exactly one single-nucleotide substitution representing a prospective mutation. Stringent filtering was then applied using CASAVA v1.8 quality scores (Q38), which predict 99.98% accuracy for the substitution and the flanking 11 nucleotides, further reducing the pool to only ∼1% of the original reads (Fig. …). This filtering was imposed to remove low-quality reads whose substitutions might otherwise be counted as false-positive mutations.
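The read filtering described above can be illustrated with a short sketch. This is a hypothetical Python example, not the authors' pipeline: it assumes reads are already aligned to the reference without indels and supplied as (start, sequence, quality) tuples, that quality strings use Phred+33 encoding (the Illumina 1.8+ convention), and that "the flanking 11 nucleotides" means 11 nt on each side of the substitution. The names `call_ha_mutation`, `count_mutations`, `phred_accuracy`, and `FLANK` are illustrative only. The Phred relation, accuracy = 1 − 10^(−Q/10), shows why Q38 corresponds to roughly 99.98% per-base accuracy.

```python
"""Hypothetical sketch of a Mut-seq-style read filter (not the published pipeline).

Assumptions: reads are pre-aligned without indels as (ref_start, read_seq, qual),
qualities are Phred+33 encoded, and FLANK nucleotides are checked on EACH side.
"""
from collections import Counter

MIN_Q = 38          # quality threshold cited in the text (CASAVA v1.8, Q38)
FLANK = 11          # assumption: 11 nt on each side of the substitution
PHRED_OFFSET = 33   # assumption: Phred+33 (Illumina 1.8+) encoding

# G:C -> A:T transitions, as they appear when read and reference share a strand.
HA_SIGNATURE = {("G", "A"), ("C", "T")}


def phred_accuracy(q: int) -> float:
    """Probability that a base call is correct: 1 - 10**(-Q/10)."""
    return 1.0 - 10 ** (-q / 10)


def call_ha_mutation(ref: str, start: int, read: str, qual: str):
    """Return (ref_position, ref_base, read_base) if the read passes, else None."""
    ref_window = ref[start:start + len(read)]
    if len(ref_window) != len(read) or len(qual) != len(read):
        return None  # read runs off the reference or quality string is malformed
    mismatches = [i for i, (r, b) in enumerate(zip(ref_window, read)) if r != b]
    if len(mismatches) != 1:
        return None  # keep only reads with exactly one substitution
    i = mismatches[0]
    if (ref_window[i], read[i]) not in HA_SIGNATURE:
        return None  # not a G->A or C->T change, so not an HA-type signature
    lo, hi = i - FLANK, i + FLANK + 1
    if lo < 0 or hi > len(read):
        return None  # assumption: require the whole flanking window inside the read
    scores = [ord(c) - PHRED_OFFSET for c in qual[lo:hi]]
    if min(scores) < MIN_Q:
        return None  # a low-quality base in the window could be a sequencing error
    return start + i, ref_window[i], read[i]


def count_mutations(ref: str, alignments) -> Counter:
    """Tally filtered HA-type substitutions per reference position."""
    counts = Counter()
    for start, read, qual in alignments:
        call = call_ha_mutation(ref, start, read, qual)
        if call is not None:
            counts[call[0]] += 1
    return counts
```

The per-position counts returned by `count_mutations` could then be aggregated over the G/C sites of each coding sequence as one way to approach the per-gene mutability measurements the text describes; the statistical test used to call residues highly or poorly mutable is not reproduced here.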