Available Common Primers (Complete List in Excel)
Name | Description | Sequence |
---|---|---|
3'AOX1 | For Pichia vectors with AOX1 terminator, reverse primer | GCAAATGGCATTCTGACATCC |
5'AOX1 | For Pichia vectors with AOX1 promoter, forward primer | GACTGGTCCAATTGACAAGC |
Alpha-factor | Alpha factor signal sequence, forward primer | TACTATTGCCAGCATTGCTGC |
Amp-R | 5' end of ampicillin resistance gene, reverse primer | ATAATACCGCGCCACATAGC |
CAT-R | 5' end of chloramphenicol resistance gene, reverse primer | GCAACTGACTGAAATGCCTC |
CMV Forward | Human CMV immediate early promoter, forward primer | CGCAAATGGGCGGTAGGCGTG |
CRE-R | 5' end of Cre recombinase, reverse primer | GCAAACGGACAGAAGCATTT |
EF-1a Forward | Human elongation factor-1a promoter, forward primer | TCAAGCCTCAGACAGTGGTTC |
GAL1 | S. cerevisiae GAL1 promoter, forward primer | AATATACCTCTATACTTTAACGTC |
Gal10pro-F | S. cerevisiae GAL10 promoter, forward primer | GGTGGTAATGCCATGTAATATG |
Gal4 N-term | 3' end of Gal4 DNA binding domain, forward primer | GAGTAGTAACAAAGGTCAA |
Gal4-AD | 3' end of Gal4 activation domain, forward primer | AATACCACTACAATGGAT |
GFP-F | 3' end of GFP, forward primer | GGTCCTTCTTGAGTTTGTAAC |
GFP-R | 5' end of GFP, reverse primer | CCATCTAATTCAACAAGAATTGGGACAAC |
IRES-F | 3' end of IRES, forward primer | TGGCTCTCCTCAAGCGTATT |
IRES-R | 5' end of IRES, reverse primer | CCTCACATTGCCAAAAGACG |
LacI-R | 5' end of LacI, reverse primer | GGCATACTCTGCGACATCGT |
LacZ-R | 5' end of LacZ, reverse primer | GACAGTATCGGCCTCAGGAA |
M13 (-21) Forward | In lacZ gene | TGTAAAACGACGGCCAGT |
M13 (-40) | In lacZ gene | GTTTTCCCAGTCACGAC |
M13 Reverse | In lacZ gene | CAGGAAACAGCTATGAC |
M13/pUC Forward | In lacZ gene | CCCAGTCACGACGTTGTAAAACG |
M13/pUC Reverse | In lacZ gene | AGCGGATAACAATTTCACACAGG |
pBAD Forward | For vectors with E. coli araBAD promoter, forward primer | ATGCCATAGCATTTTTATCC |
pBAD Reverse | For vectors with E. coli araBAD promoter, reverse primer | GATTTAATCTGTATCAGG |
T3 | T3 promoter, forward primer | GCAATTAACCCTCACTAAAGG |
T7 | T7 promoter, forward primer | TAATACGACTCACTATAGGG |
T7 Terminal | T7 terminator, reverse primer | GCTAGTTATTGCTCAGCGG |
Frequently Asked Questions
Overview
- I'm a new customer, how do I open an account?
- What are the steps for DNA sequencing?
- Why should I use Quintara Bio?
- How can I test your service?
- What equipment do you use?
Service Information
- What is the price?
- What kind of QC do you offer?
- How do I confirm my order was placed?
- Can I open an account and submit samples if I live outside of the U.S.?
- Hours and holiday schedule?
- Where and when are the pickups?
Samples
- How should I prepare the samples?
- How do I prepare my samples so I will get the best DNA sequencing results?
- What types of DNA templates can be sequenced?
- May I submit 8-strip tubes if I am submitting less than 8 samples?
- Can you sequence very short (~100 bp) PCR products?
- Can you help me if my sequence is GC-rich and difficult to amplify?
- How much DNA and primer should I submit?
- How long do you keep the samples?
- How should I label my 8-strip tubes?
- What do I need to provide for you to perform PCR cleanup on my samples?
- Does my vector have a binding site for a universal primer?
- How do I determine my DNA concentration?
Primers
- What is the primer storage policy?
- What tips do you have for designing primers?
- Can you design and order primers for me?
- Where do I get primers?
- What universal primers do you have?
- What amount of primer is needed?
Results
- How soon can I get the sequencing results?
- How do I get the results?
- How do I view the sequencing data?
- What's the average read length?
- How come the repeats work when there were no changes to the samples or primer?
Troubleshooting
- What should I do about failed reactions?
- What is your repeat policy?
- Why did my reactions fail?
- What does a poly A-T region resulting in poor sequencing mean?
- How can I optimize my reactions?
Other
- Do you still have my template from order #?
- Who should I talk to in regards to my invoice?
- How can I pay for the service?
- I have an order to pick up, not an online order.
- How do I know if my sample was picked up?
- I missed the drop off time; can you pick up my sample?
I'm a new customer, how do I open an account?
On our homepage, click on the Order or Login tab. Then under the log in box, click Register, fill in the new user registration form, then click Register User at the bottom of the page. You will now be able to log into our website, manage your account, and place orders.
What are the steps for DNA sequencing?
If you want to place an order for our DNA sequencing service, first you will need to make sure you have created an account on our website. Once you are logged into your account, you can fill an online order form, or import the order information from an Excel file.
For your sample, you will want to follow our Sample Preparation Guidelines.
Once your samples are prepared, you can ship your order to our facilities on the west or east coast. We also offer free local pickups in the Bay Area, Boston, Colorado, and Wisconsin. Contact us for our local pickup schedule and locations.
Why should I use Quintara Bio?
Quintara Bio is continuously striving to provide the highest quality DNA sequencing service, fastest turnaround times, and overall value for our customers. Our experienced staff and state-of-the-art equipment run 24/7 to provide you with the superior results you deserve and optimize the most complex reactions.
Other companies may offer lower prices for similar sequencing, but they can only do so by cutting corners, using cheap reagents, and offering no Quality Control. All too often mistakes on their part will leave you with the wrong data, or have you wasting time solving their issues by yourself.
How can I test your service?
We offer 5 free reactions to all new customers that want to try our DNA sequencing service. Just set up an online account, submit an order, and experience the quality Quintara Bio provides.
What equipment do you use?
We use ABI 3730xl DNA Analyzers, the Gold Standard for high throughput genetic analysis. Along with our experienced technicians and QC staff, we deliver the highest quality data.
What is the price?
General pricing for our DNA sequencing services can be found on the DNA sequencing service page. We offer many academic and regional discounts; please contact our sales team for a more accurate quote based on your specific order.
What kind of QC do you offer?
We provide two tiers of QC. First, our proprietary analytical software checks all basic parameters of the reactions. Then the QC manager goes through each reaction, troubleshooting any that are not optimal, and you receive a personalized summary of your order. We will work closely with you to optimize any reaction that you need. Visit our Data Sample page to see the value in our service.
How do I confirm my order was placed?
Once you place your online order, you will receive an automated message confirming it. If you would like to double check, feel free to contact us by phone or email and we will be happy to confirm your order.
Can I open an account and submit samples if I live outside the US?
Of course, we serve customers from all over the globe. When you set up your online account, under the State tab, just leave it blank. Ship your samples to our Boston or San Francisco laboratory, and receive your data online.
Hours and holiday schedule?
Our CA Bay Area lab is open 24 hours a day, Monday through Saturday, every week, and our Boston facility processes orders 7 days a week. Here is our holiday schedule through the year.
- New Year's Day
- President's Day
- Memorial Day
- Independence Day
- Labor Day
- Thanksgiving
- Christmas Day
Where and when are the pickups?
Our pickup schedule varies by territory. We offer free sample pickups in the Boston metropolitan area; the San Francisco Bay Area; Denver, Fort Collins, Boulder, and Aurora, CO; Madison, WI; and Indianapolis, IN. Please contact us by email or phone to find out the schedule for your territory.
How should I prepare the samples?
To properly prepare your samples for sequencing, please follow our guideline here.
How do I prepare my samples so I will get the best DNA sequencing results?
To improve the sequencing results of your samples, ensure that all information on the sample sheet is correct, and include any important information, e.g. hairpins or high Tm primers. In the event some reactions fail, providing excess sample can improve the turnaround time for repeat reactions.
What types of DNA templates can be sequenced?
We accept plasmids, purified and unpurified PCR products, cosmids, and BAC samples. We also accept agar plates with bacterial colonies and bacterial cultures, which we purify at our lab.
May I submit 8-strip tubes if I am submitting less than 8 samples?
Of course. We encourage the use of strip tubes over individual tubes regardless of the number of samples.
Can you sequence very short (~100 bp) PCR products?
Yes, however we recommend sequencing PCR products that are at least 200 bp.
Can you help me if my sequence is GC-rich and difficult to amplify?
Our experts will work closely with you to optimize complex GC-rich samples, and troubleshoot using proprietary protocols specifically designed for such cases.
How much DNA and primer should I submit?
It depends on the type of DNA template and the size of the sample; please use our chart for submission guidelines. If you are submitting primers, the recommended concentrations can be found on the same chart.
How long do you keep the samples?
We store all samples for one month after sequencing. We also store customer primers free of charge until completely finished.
How should I label my 8-strip tubes?
Please label each strip with your initials, and write the sample number on the side of each tube. We also recommend labeling the top of the tubes.
What do I need to provide for you to perform PCR cleanup on my samples?
Just submit your unpurified PCR products with primers in separate labeled tubes. We will perform the clean-up, run each sample on a gel, then optimize and sequence each reaction.
Does my vector have a binding site for a universal primer?
Please look at your vector map or the manufacturer’s procedure to see which primer site is present or what primer is recommended for sequencing. We offer 5000+ free primers, check here for a list of our primers.
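If you have your vector sequence as text, a quick check for a universal primer site can be sketched as below. This is a minimal illustration, not a tool we provide: the function names and the toy vector are hypothetical, only exact matches are reported, and matches that span the origin of a circular sequence are not handled. The primer sequences are taken from the table at the top of this page.

```python
# Hypothetical helper names; primer sequences come from the common primer table.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_primer_sites(vector: str, primer: str) -> list[tuple[int, str]]:
    """Return (0-based position, strand) for every exact primer match.

    "+" means the primer matches the given strand as written;
    "-" means it anneals to the given strand and reads the opposite way.
    """
    vector = vector.upper()
    hits = []
    for strand, query in (("+", primer.upper()), ("-", reverse_complement(primer))):
        start = vector.find(query)
        while start != -1:
            hits.append((start, strand))
            start = vector.find(query, start + 1)
    return hits

T7 = "TAATACGACTCACTATAGGG"
M13_REVERSE = "CAGGAAACAGCTATGAC"

# Made-up toy vector (an assumption for illustration): a T7 site on the
# top strand and an M13 Reverse site on the bottom strand.
toy_vector = "GGCC" + T7 + "ATGAAACCC" + reverse_complement(M13_REVERSE) + "TTAA"

t7_hits = find_primer_sites(toy_vector, T7)
m13_hits = find_primer_sites(toy_vector, M13_REVERSE)
print("T7:", t7_hits)
print("M13 Reverse:", m13_hits)
```

A vector map or the manufacturer's documentation remains the more reliable source, since real primer sites can differ by a base or two from the primer sequence.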
How do I determine my DNA concentration?
We recommend using gel electrophoresis or a spectrophotometer. If using a spectrophotometer like a NanoDrop, check the A260/A280 to ensure your sample is not contaminated.
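As a rough illustration of the spectrophotometer approach, the sketch below converts an A260 reading to a double-stranded DNA concentration and computes the A260/A280 ratio. It assumes the commonly used conversion of 50 ng/µL per A260 unit for dsDNA at a 1 cm path length, and treats a ratio of about 1.7 to 2.0 as a rule-of-thumb purity window. The function names, the readings, and that window are assumptions for the example, not our acceptance criteria.

```python
# Commonly cited conversion for double-stranded DNA at 1 cm path length.
DSDNA_NG_PER_UL_PER_A260 = 50.0

def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration in ng/µL from an A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor

def a260_a280_ratio(a260: float, a280: float) -> float:
    """Purity ratio; values well below ~1.8 often suggest protein carryover."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280

def looks_clean(ratio: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Rule-of-thumb window (an assumption, not a lab specification)."""
    return low <= ratio <= high

# Hypothetical readings for one undiluted sample.
conc = dsdna_concentration(a260=2.4)
ratio = a260_a280_ratio(a260=2.4, a280=1.3)
print(f"{conc:.0f} ng/µL, A260/A280 = {ratio:.2f}, clean: {looks_clean(ratio)}")
```

Instruments such as the NanoDrop report these values directly; the arithmetic is shown only to make clear what the numbers mean.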
What is the primer storage policy?
Upon request, we will store all of the primers you send us free of charge for 1 year, for use in future orders.
What tips do you have for designing primers?
Ideally, primers should have a GC content of 40-60% and a Tm of 50-60 °C.
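A quick way to screen a candidate primer against these two guidelines is sketched below. Tm is estimated here with the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough approximation for short oligos and is an assumption of this example; primer design software typically uses nearest-neighbor models and may give different values. The function names are hypothetical.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 100.0 * gc / len(primer)

def wallace_tm(primer: str) -> int:
    """Rough Tm in °C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def within_guidelines(primer: str) -> bool:
    """True if GC content is 40-60% and the estimated Tm is 50-60 °C."""
    return 40.0 <= gc_percent(primer) <= 60.0 and 50 <= wallace_tm(primer) <= 60

T7 = "TAATACGACTCACTATAGGG"  # from the common primer table
print(f"GC = {gc_percent(T7):.0f}%, Tm ~ {wallace_tm(T7)} °C, OK: {within_guidelines(T7)}")
```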
Can you design and order primers for me?
We are happy to work with you to design primers to optimize your sequencing. We will design and synthesize primers for you; just contact us at oligos@quintarabio.com for more info.
Where do I get primers?
We synthesize oligos at our QuintaraBio Cambridge site; please contact oligo@quintarabio.com for your oligo needs.
What universal primers do you have?
We offer 1000+ free primers, check here for a list of our universal primers.
What amount of primer is needed?
We need 1 µL of 5 µM primer for each sequencing reaction.
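If your primer is kept as a more concentrated stock, the dilution to 5 µM follows the usual C1 x V1 = C2 x V2 relationship. The sketch below is a minimal example; the function name, the 100 µM stock concentration, and the 20 µL final volume are assumptions chosen for illustration.

```python
def dilute_primer(stock_um: float, final_volume_ul: float, target_um: float = 5.0):
    """Return (stock volume, water volume) in µL to reach target_um.

    Uses C1 * V1 = C2 * V2, where C1 is the stock concentration.
    """
    if stock_um <= 0 or final_volume_ul <= 0:
        raise ValueError("concentration and volume must be positive")
    if target_um > stock_um:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_um * final_volume_ul / stock_um
    return stock_ul, final_volume_ul - stock_ul

# Hypothetical case: 100 µM stock, 20 µL of working solution
# (about 20 reactions at 1 µL each, ignoring pipetting loss).
stock_ul, water_ul = dilute_primer(stock_um=100.0, final_volume_ul=20.0)
print(f"Mix {stock_ul:.1f} µL stock with {water_ul:.1f} µL water")
```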
How soon can I get the sequencing results?
Once we receive the samples in our lab, the average turnaround is 8 hours to complete the sequencing. It will depend on your location for the exact data delivery time, but in most cases it will be next morning delivery. We also offer an express service if you are in a rush, contact us for more info.
How do I get the results?
Data is automatically emailed once the sequencing is complete, then run through our rigorous QC procedure. You can directly download your data in a secure attachment, and you will receive a custom sequence summary from a member of our QC team.
How do I view the sequencing data?
In order to view chromatogram files, you will need to download a suitable viewer (check here). The sequence .txt file can be read with any available text software such as Microsoft Word or Notepad.
What’s the average read length?
The average read length for the sequencing instrument is 800-1000 bp. For longer reads, ask us about our primer walking service.
How come the repeats work when there were no changes to the samples or primer?
Machine or human error is always possible; however, we often change the conditions of failed reactions based on our interpretation of the original results. Please contact us if you have specific questions regarding a particular order.
What should I do about failed reactions?
There are many reasons a reaction could fail. Our QC team will closely analyze your data and suggest the most probable reason for each failed reaction along with the best course of action: either rerunning the sample or repeating the entire reaction.
What is your repeat policy?
We will optimize samples from orders that didn’t sequence correctly and rerun them later that day. If a reaction needs to be repeated from scratch due to an error on our part, it will be done free of charge. We will ask if you want to rerun any other incomplete reactions.
Why did my reactions fail?
Reactions can fail for many reasons. Our dedicated QC team will examine each failed reaction to determine the most likely cause and relay their findings in your personalized order summary.
What does a poly A-T region resulting in poor sequencing mean?
If a DNA sample contains a homopolymeric region, such as a poly A stretch of 5 or more bases in succession, the enzyme can slip on the strand and produce a bad read. We would then recommend sequencing the DNA in the opposite direction to improve results.
How can I optimize my reactions?
It will largely depend on the reason for the poor results. We will recommend the best options for optimization based on our expertise, and work closely with you to sequence the most complex reactions.
Do you still have my template from order #?
Please email us your specific order numbers, and we will quickly get back to you informing you if we still have the sample. We typically store all samples for one month.
Who should I talk to in regards to my invoice?
Please contact us by phone (1-415-738-2509) or email (info@quintarabio.com) if you have any questions about your invoice.
How can I pay for the service?
We accept payment by credit card or Purchase Order (PO). Include the PO# for your group or institution in account info.
I have an order to pick up, not an online order.
We prefer that customers place their orders online; however, if you normally submit orders offline, contact your regional account manager to arrange a pickup.
How do I know if my sample is picked up?
If you submitted your sample after the pickup time, it may not have been picked up. Please contact our lab and we will gladly check for you if we have received your samples.
I missed the drop off time; can you pick up my sample?
If you have missed your local drop-off time, we may still be able to pick it up for you. Call or email the local manager, and we will try our best to arrange a pick up.
Nanopore (Whole Plasmid Sequencing) FAQ
- Switching from Sanger to nanopore sequencing can be challenging for those not accustomed to using next-generation sequencing technologies. That's why we designed this FAQ to answer questions not only about our service, but also about the technical aspects of the Oxford Nanopore (ONT) platform.
- This resource is divided into Service/Logistics and Technical sections.
- If your question isn't covered in the FAQ, please reach out to us directly at nanopore@quintarabio.com.
Q: What is the cost of whole plasmid sequencing?
The cost is $15 per DNA sample and $20 per colony sample.
Q: What is the expected turnaround time?
For samples deposited in our dropboxes by 5 PM, you should expect sequencing results the following morning for DNA samples. Colony samples typically require an extra day for processing due to an additional miniprep step.
Q: Does Quintara process samples on the weekends?
All our sequencing services are available 7 days a week at our Boston facility and 6 days a week (excluding Sundays) at our California facility.
Q: What types of samples can I submit for nanopore sequencing?
We accept plasmid, RCA, amplicon (2kb+), and colony samples. For optimal results, we highly recommend submitting samples as either purified plasmids or colonies. These sample types have the highest reaction success rate. Colony samples are cultured and miniprepped prior to sequencing.
Q: What are the requirements for submitting plasmid samples?
For high-copy plasmids, we kindly request a minimum volume of 5 µL with DNA concentrations ranging between 100-200 ng/µL. We recommend submitting at least double the minimum volume so that we have enough DNA to repeat any reactions if necessary.
For low-copy plasmids, please provide us with the highest DNA concentration you can obtain from a miniprep with a minimum volume of 10 µL. This ensures we have the best chance of accurately sequencing your sample despite the lower amount of starting material.
Q: What are the requirements for submitting linear DNA samples (RCA/PCR)?
Please provide us with 10-20 µL of your undiluted sample. Giving us 20 µL ensures we have enough material to accurately sequence your DNA and perform repeats if necessary.
Prior to submission, please verify the quality of your PCR/RCA products by running them on a gel.
Though magnetic beads or column-based purification is not a mandatory step, we strongly encourage purifying your samples before submission as it significantly reduces the likelihood of reaction failure.
High concentrations are a must when sequencing linear DNA. We don’t recommend submitting your sample if you notice a dim band or multiple bands on the gel.
Q: What are the requirements for submitting colony/bacterial samples?
We accept colony samples in various formats: glycerol stock, agar plate, suspension in water/media, and bacterial culture.
To ensure that we accurately and efficiently process your agar plate orders, please carefully follow the instructions below:
- Specify in Order Notes: Clearly indicate in your order notes that you are submitting multiple colonies from one or more agar plates. For example, if you have an agar plate labeled "Plate A" and wish to sequence 4 colonies from it, your note might say, "Sequence 4 colonies from Plate A."
- Match Reactions to Colonies: The number of reactions you request should correspond to the number of colonies you want sequenced. This alignment is crucial for ensuring that we sequence each colony as intended.
- Naming Conventions: When naming your reactions, incorporate the agar plate name along with a unique identifier for each colony. This method helps us precisely identify which colony corresponds to which sequencing reaction. For instance:
- Plate A_colony1
- Plate A_colony2
- Plate A_colony3
- Plate A_colony4
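For larger orders, reaction names and the matching order note can be generated rather than typed by hand. The sketch below simply reproduces the naming pattern and example note shown above; the function names are hypothetical, and the output is meant to be pasted into the order form or an import spreadsheet.

```python
def colony_reaction_names(plate: str, n_colonies: int) -> list[str]:
    """One reaction name per colony, in the form '<plate>_colony<N>'."""
    if n_colonies < 1:
        raise ValueError("at least one colony is required")
    return [f"{plate}_colony{i}" for i in range(1, n_colonies + 1)]

def order_note(plate: str, n_colonies: int) -> str:
    """Order note stating how many colonies to sequence from a plate."""
    return f"Sequence {n_colonies} colonies from {plate}."

names = colony_reaction_names("Plate A", 4)
note = order_note("Plate A", 4)
print(note)
for name in names:
    print(name)
```

Generating both from the same inputs keeps the number of reactions equal to the number of colonies requested.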
Q: Are colony samples sequenced directly?
We use your colony/bacterial sample to grow a culture followed by a miniprep prior to sequencing. Therefore, colony samples typically have a 2-3 day turnaround time.
While sequencing of colony samples using quicker methods is possible, these alternatives result in a higher failure rate and lower quality data. Purified plasmid DNA consistently yields the most accurate and reliable sequencing data. By starting with purified plasmid, we significantly reduce the risk of sequencing errors and reaction failures, ensuring that your data is both high-quality and dependable.
Q: Can I receive my leftover minipreps from the colony order?
Yes, we offer the option to return the remaining miniprep samples after sequencing. There is a delivery fee of $10 per order for this service. If you're interested in having your samples returned, please check the appropriate box on the order submission form.
Q: What if I don’t have enough DNA to meet the minimum sample requirement?
If your sample is a high-copy plasmid with a clean gel band and minimal genomic DNA contamination, concentrations as low as 20 ng/µL may still sequence well.
While nanopore sequencing can often succeed even at lower DNA concentrations, adhering to our recommended guidelines maximizes the chances of a successful reaction outcome. Please note that Quintara cannot be held responsible for reaction failures attributable to insufficient sample quality or quantity.
Q: What kind of tubes should I use for sample submission?
For sample submission, we highly recommend 8-well strip tubes or a 96-well PCR plate for those with 48 or more samples. The use of 1.5mL microfuge tubes is strongly discouraged due to the increased processing time they require and the higher risk of sample mix-ups, which could lead to delays in receiving your results.
Q: How do I place an order?
After logging into your Quintara account, please find the black panel located on the left side of the main dashboard. You should see “Whole Plasmid Sequencing” as an option under the “Nanopore” section. Click this to access the order form.
In the order form, please indicate the sample type, and fill in the mandatory sections (sample name, copy number for plasmids, amplicon size for PCR products, and culture type/antibiotics for colony samples). Other sections are optional (concentration, notes).
Q: Who do I contact if I have questions about my order?
If you have any questions or need further assistance with your order, our technical support team is here to help. Please reach out to us at nanopore@quintarabio.com for prompt and detailed support.
Q: What is Quintara’s sequencing rerun policy?
If you encounter any issues with your sample results, do not hesitate to contact us at nanopore@quintarabio.com to discuss a rerun. For non-RCA samples, we offer one reaction repeat free of charge for any results that don't meet your expectations. Currently there is a basecaller problem with RCA samples that results in high reaction failure rates (currently being investigated by ONT). We therefore only resequence RCA samples on a case-by-case basis.
Q: When should I choose nanopore over Sanger sequencing?
Nanopore sequencing is the preferred choice in several scenarios:
- Large, Complex Regions: If you're sequencing a region that requires three or more primers to cover effectively.
- Challenging Sequences: For regions of high GC content or strong secondary structures that are difficult for Sanger sequencing to navigate.
More details are available in the technical questions section of the FAQ.
Q: When should I choose Sanger sequencing over nanopore?
While nanopore sequencing offers many advantages, there are instances where Sanger sequencing may still be more suitable:
- Small Regions: For sequencing small regions of interest, Sanger sequencing can be more cost-effective.
- Homopolymer Regions: Sanger sequencing is objectively better at resolving long homopolymer stretches (8bp or more), such as polyA tails, with higher accuracy.
Q: Does Quintara offer any discounts?
We do offer discounts based on order volume. Please email us at nanopore@quintarabio.com for further information!
Q: What is the maximum DNA length that can be effectively sequenced using this technology?
Our sequencing workflow is optimally designed to handle plasmids up to 30 kb in size.
For plasmids larger than 30 kb, it's important to note that while sequencing can still be performed, problems often occur during assembly of the consensus sequence. However, individual FASTQ reads can still be aligned to your reference map. This enables you to nanopore sequence larger plasmids, although with some considerations for data interpretation.
Q: Do you accept nanopore projects beyond whole plasmid sequencing?
Absolutely! Our team is actively working on developing new nanopore services to expand the range of applications and projects we support.
For custom projects that leverage the unique capabilities of the nanopore sequencing platform, we encourage you to contact us at nanopore@quintarabio.com to discuss your project ideas and how we can tailor our services to meet your needs.
Q: How can I provide feedback on your services?
We highly value your input. For any feedback or suggestions, please email us at nanopore@quintarabio.com. Your insights help us serve you better.
Q: Can I have the raw reads associated with my order?
Certainly! This is not a typical request, so please email us directly at nanopore@quintarabio.com to receive your raw reads in FASTQ file format.
Q: Can I also have the FAST5/POD5 files associated with my order?
We can certainly provide those as well, although this is an even less common request. Please reach out to us at nanopore@quintarabio.com.
Q: How does whole plasmid (nanopore) sequencing work?
Whole plasmid sequencing is performed by tiny protein channels known as nanopores. As a DNA strand passes through a nanopore, it disrupts an electrical current. These current disruptions are picked up by a sensor and translated into a DNA sequence. Each DNA strand generates a read representing the sequence of a single molecule in your sample. By sequencing numerous molecules and pooling the data, we can reconstruct the complete sequence of your plasmid.
Q: What kinds of data are included in my results?
Your sequencing results are organized into folders, each dedicated to a specific type of data.
Here's what you can expect:
- Chromatogram (ab1) Files: For those accustomed to Sanger sequencing, these files simulate nanopore data in a Sanger-like trace format for familiar analysis.
- FASTA Files: These text-based files contain your sequence data in a universally recognized format, ready for alignment with your reference sequence.
- Genbank Files: Enhanced with annotations from the pLannotate library, these files detail plasmid features. Tools like Snapgene Viewer allow you to visualize the elements on your plasmid map or perform direct sequence alignments.
- Per Base Breakdown: This CSV data file offers detailed base-by-base information about your sequence. A separate, supplementary CSV file highlights any low-confidence bases that may require your attention.
- QC Reports: A comprehensive PDF report that includes essential quality control information about your sample (e.g. sample purity).
Q: What sequence files can I use to align directly against my reference sequence?
For aligning sequence data, you can utilize the following files:
- Chromatogram (ab1) Files: These offer a Sanger-like representation of your nanopore data.
- FASTA Files: A standard text-based format containing the raw sequence data, ready for direct alignment tasks.
- Genbank Files: Annotated with plasmid features, these files not only provide sequence data but also contextual information about the plasmid, which can be particularly useful for visualization and alignment in tools like Snapgene Viewer.
Q: Which sequence file SHOULD I use for alignment purposes?
All three file types (ab1, FASTA, and Genbank) are suitable for alignment tasks. However, we prefer the ab1 file since it also includes detailed per-base information. For example, positions of low confidence are shown as heterozygous (mixed) peaks, akin to Sanger mixed peaks, and low coverage positions appear as shorter peaks with heights proportional to coverage depth.
It's important to note that the ab1 format is artificially generated, since nanopore sequencing does not produce actual chromatogram data. Consequently, not all alignment software supports this file type. If you encounter compatibility issues, we suggest using Snapgene Viewer, which is available for free download.
Q: Why are your sequence files labeled with the word “contig”? What does this mean?
The term "contig" refers to a consensus sequence file generated by our whole plasmid sequencing pipeline. This label signifies that the sequence you're viewing is constructed from numerous reads aligned and merged to form a single, consensus sequence.
When your sample contains multiple DNA species, our pipeline may produce several contigs. These are systematically numbered to differentiate between them.
By labeling these sequences as "contigs," we aim to provide a clear indication of their origin.
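If a sample yields several contigs, listing each one with its length is a quick first check. The sketch below reads FASTA-formatted text with the Python standard library only; the header names in the example (such as `sampleA_contig_1`) are made up for illustration and may not match the exact labels in your result files.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Hypothetical two-contig result; with a real file, use open(path) instead.
example = io.StringIO(
    ">sampleA_contig_1\nACGTACGT\nACGT\n"
    ">sampleA_contig_2\nGGCC\n"
)
contigs = list(read_fasta(example))
for name, seq in contigs:
    print(f"{name}: {len(seq)} bp")
```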
Q: What are the advantages of nanopore over Sanger sequencing?
Nanopore sequencing offers a range of advantages that address some of the limitations encountered with traditional Sanger sequencing. Key advantages include:
- Lower Failure Rate: Nanopore sequencing exhibits a significantly reduced failure rate for plasmid samples.
- No Primers Required: This technology is primer-free, simplifying the preparation process and reducing the overall cost and time involved in sequencing.
- Cost-Effective for Unknown Plasmids: Since primers aren’t required, nanopore sequencing is an effective way to verify unknown plasmids.
- Complex Sequence Compatibility: It excels at sequencing through challenging regions such as GC-rich sequences, repetitive areas (excluding homopolymers), and strong secondary structures like hairpins and inverted terminal repeats (ITRs), where Sanger sequencing often struggles.
- Sample Quality Insights: Nanopore technology can also provide valuable sample quality information, including the percentage of E. coli genomic DNA contamination, sample purity (identification of other DNA species), and multimerization.
Q: What are the disadvantages of nanopore compared to Sanger sequencing?
While nanopore sequencing offers numerous advantages, it also has a few specific limitations when compared to traditional Sanger sequencing:
- Long Homopolymers: Nanopore sequencing faces challenges in accurately sequencing long homopolymer regions (sequences of the same nucleotide repeated more than 8 times). These regions can sometimes result in insertion-deletion (indel) artifacts.
- Artifacts in Methylated GATC Regions: Occasional sequencing artifacts can occur in GATC sequences, caused by dam methylation.
We expect these issues to become less relevant as the technology progresses.
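To anticipate where these two limitations might show up, you can scan your reference sequence for long homopolymer runs and GATC sites before reviewing results. The sketch below is one way to do this with regular expressions; the function names, the default threshold of 8 bases, and the toy sequence are assumptions for illustration. A flagged position is only a place to look more carefully, not a sign that an artifact is present.

```python
import re

def homopolymer_runs(seq: str, min_run: int = 8):
    """Return (0-based start, base, length) for runs of at least min_run bases."""
    # ([ACGT]) captures one base; \1{n,} requires it to repeat n more times.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return [(m.start(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(seq.upper())]

def gatc_sites(seq: str):
    """Return 0-based start positions of GATC (dam methylation) sites.

    GATC is its own reverse complement, so scanning one strand is enough.
    """
    return [m.start() for m in re.finditer("GATC", seq.upper())]

# Made-up toy sequence: a 10-base poly A run, one GATC site, a short poly T run.
toy = "ACGT" + "A" * 10 + "CCGATCGG" + "T" * 5

runs = homopolymer_runs(toy)
sites = gatc_sites(toy)
print("Homopolymers:", runs)
print("GATC sites:", sites)
```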
Q: What is multimerization?
Multimerization refers to the process whereby copies of the same plasmid within a bacterial cell undergo recombination, leading to the formation of concatenated plasmids. These can range from dimers and trimers to tetramers and beyond, resulting in a portion or all of the sample consisting of these multi-copy plasmids. This phenomenon came to light once long-read sequencing became more widely used.
Multimers are undetectable using Sanger sequencing. They're also hard to identify with gel electrophoresis since digesting a multimer would cause it to resemble its monomer form. For a deeper understanding of plasmid multimerization, check out the Addgene article: Plasmids 101: Dimers and Multimers (addgene.org)
Q: How accurate are the sequencing results?
Outside of dam methylated and homopolymer regions, nanopore sequencing is very accurate; we almost never see any artifacts provided there is sufficient depth of coverage.
If you encounter any unexpected or questionable results, or suspect a mis-assembly of your plasmid sequence, we encourage you to contact us directly at nanopore@quintarabio.com. We will perform a deeper analysis of the results to determine whether there are any potential artifacts present in the data.
Q: How much coverage do I get per sample?
We aim for 500+ reads per sample, but this can vary greatly depending on sample quality, plasmid copy-number, sample type, and even what samples from other customers are being run alongside yours. If the sample quality is good, we can typically achieve accurate results with fewer than 200 reads.
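Read count relates to depth of coverage through simple arithmetic: average depth is roughly the total number of sequenced bases divided by the plasmid length. The sketch below illustrates this under stated assumptions. The function name, the 4,000 bp mean read length, and the 5,000 bp plasmid are made-up values, and real depth varies along the plasmid and with read quality.

```python
def estimated_depth(n_reads: int, mean_read_length_bp: float, plasmid_length_bp: int) -> float:
    """Approximate average depth of coverage: total bases / plasmid length."""
    if plasmid_length_bp <= 0:
        raise ValueError("plasmid length must be positive")
    return n_reads * mean_read_length_bp / plasmid_length_bp

# Hypothetical 5 kb plasmid with 4 kb average reads.
depth_at_500 = estimated_depth(500, 4000, 5000)
depth_at_200 = estimated_depth(200, 4000, 5000)
print(f"~{depth_at_500:.0f}x at 500 reads, ~{depth_at_200:.0f}x at 200 reads")
```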
Q: IMPORTANT: HOW DO I ANALYZE THE PDF REPORT?
We strongly urge all our clients to examine the report file that comes with each sequencing reaction. The information contained within could save you countless hours of time.
The report is broken down into several sections:
Section 1 - Sample Name and Processing Date:
This section displays the sample name along with the date on which the data was processed.
Section 2 - Read Count Table:
This table contains information about the total number of reads obtained from the sequencing reaction as well as the percentage of host genomic DNA in the sample.
In the example table below, a total of 1,673 reads were obtained for this sample, corresponding to 6,314,277 bp of sequence data. This latter number is the sum of the lengths of all reads.
For this example, reads originating from the host genome comprise 0.48% of all reads and 0.73% of all bp sequenced.
When estimating the amount of genomic contamination within your sample, we recommend using the percentage of total bases rather than the percentage of total reads.
Note that shorter (highly fragmented) E. coli genomic reads may not be properly categorized. This is because setting more aggressive filtering conditions could cause actual plasmid reads to be wrongly categorized as genomic DNA. We advise that you treat the genomic contamination values as an underestimate.
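The two percentages in the Read Count Table can be reproduced from a list of read lengths plus a per-read host flag. The sketch below is illustrative only: the `Read` tuple and its `is_host` flag are hypothetical stand-ins for whatever classification the pipeline actually performs.

```python
from typing import NamedTuple

class Read(NamedTuple):
    length: int     # read length in bp
    is_host: bool   # hypothetical flag: True if classified as host genomic DNA

def contamination_summary(reads):
    """Return read/base totals and host percentages by read count and by bases."""
    total_reads = len(reads)
    total_bp = sum(r.length for r in reads)           # total bp = sum of read lengths
    host_reads = sum(1 for r in reads if r.is_host)
    host_bp = sum(r.length for r in reads if r.is_host)
    return {
        "total_reads": total_reads,
        "total_bp": total_bp,
        "host_pct_reads": 100.0 * host_reads / total_reads if total_reads else 0.0,
        "host_pct_bases": 100.0 * host_bp / total_bp if total_bp else 0.0,
    }

# Toy input: nine 5 kb plasmid reads and one 15 kb host read.
reads = [Read(5000, False)] * 9 + [Read(15000, True)]
summary = contamination_summary(reads)
```

In this toy input, the single host read is 10% of reads but 25% of bases, which shows how far the two figures can diverge.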
Section 3 - Contig Details Table:
This table contains information about each consensus (contig) sequence generated by the assembly pipeline. Details include sample name, sequence length, the percentage and number of reads that map to that particular contig, and whether the contig is circular.
In the example below, there is one consensus sequence with a length of 5431 bp. 1442 reads mapped to it. The percentage of bases mapped is 95% and the consensus represents a circular DNA (plasmid).
You’ll notice that 95% of bases mapped to the consensus. In other words, 5% didn't map. The missing 5% are likely E. coli genomic reads that weren't properly filtered out, or they could be some unknown DNA.
Although the assembly software is able to classify and filter out most genomic DNA reads, there is a minimum read-length cutoff to ensure that reads from non-genomic sources aren’t accidentally misclassified as genomic DNA. As a result, a certain percentage of shorter genomic DNA reads aren’t properly categorized. Of course there may also be some reads that are truly of an unknown origin.
If your sample consists of one species, you should expect to see a single contig displayed on this table. If multiple contigs are listed, there is a good chance that your sample contains multiple DNA species.
For a pure plasmid sample, you should expect the mapping percentage to be greater than 80%. Percentages below that tend to correlate with high levels of genomic DNA in the sample.
If coverage depth is inadequate, sometimes the assembly pipeline will miscategorize a circular DNA as linear.
It should be noted that even if multiple species are present in your sample, the assembly software may not detect all of them, or the assembly could fail as a result. It’s important to scrutinize the read-length histogram to figure out if multiple species are indeed present (details on this in a later section).
Section 4 - Coverage Map:
The coverage map is a graph that plots the coverage depth (the number of reads that map to each base of your consensus sequence) on the Y-axis and the base position of the consensus sequence on the X-axis. You should look for any sudden, sharp changes in read depth as that’s a possible indication that there are several related species within your sample.
The base position axis on the graph is relative to your sequence file, NOT your reference map! For example, if you see a sharp drop in read depth from positions 2000-3000, you should open the GenBank or FASTA sequence file and analyze the sequence from base positions 2000-3000. DO NOT open your own reference map and look at positions 2000-3000.
The coverage map is a great tool for figuring out if the contaminant species within your sample are derived from or related to the consensus sequence. For example, if you have a 10 kb plasmid and somehow 20% of the plasmid population lost 2kb of the insert during the culture process, you should see a sharp 20% drop in coverage depth spanning 2 kb around the region of the sequence where the deletion occurred.
Any low confidence positions will be marked with an orange “x” on the coverage map.
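The kind of sharp coverage change described in this section can also be looked for programmatically when per-position depth values are available. The following is a minimal sketch, not the method used to build the report: the function name, the 15% drop threshold, and the 50 bp minimum span are all assumptions chosen for illustration, and wrap-around at the origin of a circular sequence is not handled.

```python
from statistics import median

def find_coverage_drops(depth, min_drop_fraction=0.15, min_length=50):
    """Return 0-based, half-open (start, end) spans whose depth falls more than
    min_drop_fraction below the median depth for at least min_length bases."""
    baseline = median(depth)
    threshold = baseline * (1.0 - min_drop_fraction)
    spans, start = [], None
    for pos, d in enumerate(depth):
        if d < threshold and start is None:
            start = pos                       # a low-coverage run begins
        elif d >= threshold and start is not None:
            if pos - start >= min_length:     # keep only runs that are long enough
                spans.append((start, pos))
            start = None
    if start is not None and len(depth) - start >= min_length:
        spans.append((start, len(depth)))     # run extends to the end of the sequence
    return spans

# Toy version of the example above: a 10 kb plasmid at 1000x depth where
# 20% of the population lost a 2 kb stretch.
depth = [1000] * 10000
depth[4000:6000] = [800] * 2000
drops = find_coverage_drops(depth)
```

Here `drops` holds a single span covering the 2 kb region with reduced depth.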
Section 5 - Read-Length Distribution:
The read-length distribution displays the number of reads of each read-length used for the contig assembly.
This is by far the most important section of the entire PDF report. You should at the very least check the read-length distribution for all of your samples.
Disclaimer: The read-length distribution is only useful for circular DNA samples. You should not use it to draw conclusions about sample purity of linear DNA (RCA/PCR). This is because any linear DNA will be fragmented during the library prep process, resulting in a wide range of peaks in the read-length distribution. This makes it indistinguishable from indicators of potential contamination.
For clean plasmid samples, you should see either a single tall peak or two adjacent tall peaks in the graph. Examples below indicate pure samples:
The read-length(s) of the tall peak(s) should be roughly equal to the expected size of your plasmid.
If you come across multiple non-adjacent tall peaks in the read-length distribution, one of two scenarios likely applies to your sample. The first is somewhat benign while the second requires your immediate attention.
Scenario 1: Some degree of multimerization is present in your sample.
If you see a tall peak corresponding to the expected plasmid size and additional tall peaks corresponding to a multiple of your expected plasmid size, your sample most likely contains multimers. An example is shown below:
This phenomenon is quite common and shouldn’t really affect your experiments.
Scenario 2: One or more DNA contaminants are present in your sample.
If there are multiple tall peaks and scenario 1 doesn’t apply, then it’s almost certain that the sample contains multiple DNA species. Below is an example:
You can see that there are three tall peak clusters: one at ~9 kb, one at ~10 kb, and one at ~12 kb. This indicates that there are three species of those particular sizes in the sample.
You can ignore the tall peak at 1000 bp, as very short reads (1 kb or less) are byproducts of the library prep and should not be factored into the analysis.
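The two scenarios above amount to a simple decision rule on peak sizes, sketched below. This is a rough illustration under stated assumptions: the function name and the 5% size tolerance are hypothetical, and peak sizes are assumed to have been read off the histogram by eye.

```python
def classify_peaks(peak_sizes_bp, expected_bp, tolerance=0.05, min_read_bp=1000):
    """Label each tall peak as a monomer, a multimer, a possible contaminant,
    or a short-read peak to ignore. tolerance is an assumed relative size error."""
    labels = {}
    for size in peak_sizes_bp:
        if size <= min_read_bp:
            labels[size] = "ignore (library prep byproduct)"
            continue
        multiple = round(size / expected_bp)          # nearest whole multiple
        target = multiple * expected_bp
        if multiple >= 1 and abs(size - target) <= tolerance * target:
            labels[size] = "monomer" if multiple == 1 else f"multimer (x{multiple})"
        else:
            labels[size] = "possible contaminant"
    return labels

labels = classify_peaks([1000, 5400, 10900, 7500], expected_bp=5431)
```

With an expected size of 5431 bp, the 5400 bp peak is labeled a monomer, the 10900 bp peak a dimer, and the 7500 bp peak a possible contaminant, while the 1000 bp peak is ignored.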
Q: Can I use the read-length distribution to analyze non-plasmid samples?
Nope. This is because any linear DNA will be fragmented during the library prep process, and this is captured as a wide range of peaks in the distribution, which is indistinguishable from indicators of potential contamination. The read-length histogram is only valid for determining sample purity for CIRCULAR DNA (i.e. plasmids, but circular viral DNA may work as well).
Q: I noticed there are sometimes two adjacent tall peaks in my read-length distribution. Is this an indication of contamination?
Your sample is fine. In reality, most of the reads from your sample will lose a couple bases from the ends during the library prep process. As a result, read lengths typically fall within a size range. The read-length distribution generated in the report may have different bin widths from sample to sample and this causes the plasmid peak to appear as either a single peak (if the histogram bin width is large), or two adjacent peaks (if the bin width is small). However, please make sure the two peaks are NOT separated by one or more bin widths. Otherwise that is very likely an indication that your sample contains two different plasmids of similar sizes.
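The "not separated by one or more bin widths" rule can be expressed as a check on histogram bin indices. This is a minimal sketch under assumptions: `counts` is a hypothetical list of per-bin read counts, a bin counts as "tall" if it reaches half the height of the tallest bin, and bins covering reads of 1 kb or less are assumed to have been excluded beforehand.

```python
def tall_bins(counts, min_fraction=0.5):
    """Indices of bins whose count is at least min_fraction of the tallest bin."""
    top = max(counts)
    return [i for i, c in enumerate(counts) if c >= min_fraction * top]

def peaks_are_adjacent(counts, min_fraction=0.5):
    """True if all tall bins sit side by side with no gap between them."""
    bins = tall_bins(counts, min_fraction)
    return all(b - a == 1 for a, b in zip(bins, bins[1:]))

clean = peaks_are_adjacent([2, 3, 400, 380, 5, 1])    # two neighboring tall bins
suspect = peaks_are_adjacent([2, 400, 5, 380, 1])     # tall bins separated by a gap
```

The first histogram passes the check, while the second has a gap between its tall bins and would warrant a closer look.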
Q: What is considered a high level of genomic DNA contamination?
We’ve noticed that the extent of genomic DNA contamination can depend on the process used to purify the plasmids. Cesium chloride-based purification methods produce DNA with the lowest levels of E. coli genomic contamination (~0%). Silica column preps (typically minipreps) also produce relatively clean preps (0.1-2% genomic DNA contamination). Anion exchange preps typically have higher levels of E. coli contamination (1-20%), though this could be due to longer culture times during scale-up.
Copy number also impacts the level of genomic contamination. Low-copy plasmids tend to have very high levels of genomic contamination, while high-copy plasmids tend to have much lower levels.
Most plasmid samples fall within the 0-20% genomic contamination range. If your sample is higher, that may be a cause for concern.
Q: I received more than one sequence (contig) for my sample. What does this mean?
If your sample is supposed to be a plasmid, then there's probably some other contaminant present, likely another plasmid species. We recommend analyzing the read-length distribution in the PDF report to check for peaks corresponding to each consensus sequence. If each consensus has a corresponding peak in the histogram, that's a strong indication that a contaminant DNA species is present.
If your sample is linear DNA (RCA/PCR), it’s more likely that the additional sequences you received are assembly artifacts generated during downstream processing. For linear DNA, assembly is much more challenging and oftentimes will generate multiple sequence contigs even if the sample contains only one species. We advise that you ignore all the contigs that don’t match your reference sequence. Typically, the longest contig is the correct one. At present it's not possible to draw any conclusions about sample purity for linear DNA samples.
Q: I see multiple peaks in my read-length distribution. Does that mean there may be contaminants?
If there are peaks with sizes corresponding to double, triple, or even quadruple the expected size of your plasmid, your sample is likely affected by multimerization.
If your sample is not a purified plasmid, the read-length distribution can’t be used for purity analysis.
If you see two adjacent peaks with sizes roughly corresponding to the expected size of your plasmid, that’s a sign your sample has no issues.
Otherwise, yes. Your sample probably is contaminated.
Q: How reliable and accurate is the information displayed in the read-length distribution?
These results are highly reliable, virtually on par with gel electrophoresis for determining DNA length. Here’s how it’s measured: the DNA strand is guided by a motor protein when passing through the nanopore and always travels at a constant rate of 400 bp/second. Taking this information and the time it takes for the DNA strand to fully pass through the nanopore (which is measured by the sensor), we can accurately determine the length of the DNA strand/read.
There are two main caveats:
The sample MUST be a circular plasmid.
Even if the sample IS a circular plasmid, there can’t be too many peaks in the read-length distribution. Too many peaks indicate that a lot of highly fragmented genomic DNA is present that wasn't properly filtered out. This usually correlates with a high percentage of genomic DNA in the Read Count Table of the PDF report.
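The length estimate described in this answer is a rate-times-time calculation. As a small worked sketch (the function name is hypothetical, and the 400 bp/second rate is simply the figure quoted above):

```python
TRANSLOCATION_RATE_BP_PER_S = 400  # rate quoted in the answer above

def estimated_read_length_bp(transit_time_s, rate_bp_per_s=TRANSLOCATION_RATE_BP_PER_S):
    """Estimate read length as translocation rate multiplied by transit time."""
    return rate_bp_per_s * transit_time_s

length_bp = estimated_read_length_bp(13.5)  # a strand that takes 13.5 s to pass through
```

At 400 bp/second, a strand that takes 13.5 seconds to pass through the pore corresponds to a read of about 5400 bp.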
Q: Can I sequence my sample if I know it contains a mixture of plasmids, amplicons, etc?
This is not recommended as the assembly workflow is geared towards clonal samples. Sometimes, the assembly software will correctly identify some if not all of the species within a mixture, but the reaction is more likely to fail due to the inability to reconcile all the disparate sequences originating from different sources. If you're troubleshooting to figure out if something is wrong with your sample, the read-length distribution would still be very useful even if your reaction failed.
Q: My plasmid contains very large repeats (1 kb+). Would this affect contig assembly?
Assembly works fine for repeats less than 1 kb. However, for larger repeats, especially very large ones (4 kb+), problems may occur during the assembly process. If your sample contains a large repeat, please reach out to us at nanopore@quintarabio.com so that we can apply special conditions to your reactions during the assembly step.
Q: The length of my consensus sequence doesn’t match up with the tall peak’s corresponding read length in the distribution. What does this discrepancy mean?
This section applies only to plasmid samples.
Assuming your sample is a plasmid, you should first check if the peak in the read-length distribution is a multiple of the size of your consensus sequence. If so, then most likely the majority or all of the DNA in your sample has multimerized.
Otherwise, there are two points to consider:
The consensus sequence is derived from a complex computational process that assembles together many reads. The process is not entirely foolproof. Assembly errors do happen from time to time, though they are typically quite rare for clonal plasmid samples.
On the other hand, the read-length distribution is derived from a precise measurement made during the sequencing process and is highly reliable. The read-length distribution is not determined by the assembly process.
Given these two points, if there is a discrepancy between the consensus sequence and read-length distribution, you should always trust the read-length distribution. You can also reach out to us and ask for a re-assembly of your data. Sometimes, just reprocessing your data will produce the correct consensus sequence.
Q: Why are there ambiguous (non-ATCG) bases in my ab1 (chromatogram) file?
When designing the chromatogram file, we made the conscious choice to mark any heterozygous/non-clonal peaks with an ambiguous base so that it’s easier to catch during sequence alignment.
A heterozygous peak occurs when a significant proportion of reads map to two or more unique nucleotides at a specific position in the consensus sequence. For example, if 70% of reads map to a C and 30% of reads map to a T at position 2182 of the consensus sequence, you will see a “mixed” peak consisting of a taller C (blue) peak and a shorter T (red) peak. You will also notice the base is called as a “Y” instead of a “C”. The “Y” allows for easier detection during analysis and represents the ambiguity of a C + T.
Below is a reference table for ambiguous bases:
Ambiguous Base | Meaning |
---|---|
R | A or G |
Y | C or T |
S | G or C |
W | A or T |
K | G or T |
M | A or C |
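The mapping from a mixed position to an ambiguous base can be sketched from per-base read counts. This is an illustration only: the 20% minor-fraction threshold is an assumption (the threshold actually used for the ab1 files is not stated here), the function name is hypothetical, and mixtures of three or more bases are not handled.

```python
# Two-base ambiguity codes from the table above.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def call_base(counts, minor_fraction=0.2):
    """counts maps each of G, A, T, C to its read count at one position.
    Returns an ambiguity code if the second most common base reaches
    minor_fraction of all reads (assumed threshold), else the top base."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, _), (second_base, second_n) = ranked[0], ranked[1]
    if total and second_n / total >= minor_fraction:
        return IUPAC_TWO_BASE[frozenset((top_base, second_base))]
    return top_base

mixed = call_base({"G": 0, "A": 0, "T": 30, "C": 70})   # the 70% C / 30% T example
clean = call_base({"G": 0, "A": 2, "T": 0, "C": 98})
```

The 70% C / 30% T position from the example above is called as Y, while a position with only 2% of reads disagreeing stays a plain C.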
Q: Why are there sometimes multiple ab1 files associated with a single contig? Should there be any overlap between the ab1 files?
Chromatogram (ab1) files are designed for Sanger sequencing, which generates reads up to only 1000 bp in length. Due to this inherent limitation in the ab1 file format, we must divide longer sequences, such as full plasmid sequences, into smaller segments. This division is necessary to accommodate the file type's read-length restriction. Consequently, each segment should be distinct, without overlap in the traces. If you observe overlapping between nanopore ab1 traces, that indicates the presence of an insertion in your contig sequence relative to your reference.
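The splitting described above can be pictured as cutting the contig into consecutive, non-overlapping windows. The sketch below assumes a 1000 bp window purely for illustration; the segment size actually used when generating the ab1 files may differ, and the function name is hypothetical.

```python
def split_into_segments(seq_length, max_len=1000):
    """Return 0-based, half-open (start, end) windows that tile the sequence
    without overlap, each at most max_len bases long."""
    return [(start, min(start + max_len, seq_length))
            for start in range(0, seq_length, max_len)]

segments = split_into_segments(5431)
```

For a 5431 bp contig this gives six segments, the last one covering positions 5000 to 5431, and no base appears in more than one segment.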
Q: I notice mismatches/low confidence positions outside of homopolymer regions. Are these real or artifactual?
If the coverage depth is low (<100), the mismatches could be real or artifactual; it's difficult to tell.
For linear DNA samples, artifacts may occur at the ends of the consensus sequence even with high coverage depth.
For plasmids, there is a frequent occurrence of a particular type of artifact:
If mismatches are occurring on or immediately adjacent to the “GATC” sequence pattern, these are almost always artifacts caused by dam methylation. These artifacts follow a very distinct pattern that makes them easy to spot:
The C base at the end may be miscalled as a T or there may be a mixture of C + T (indicated as an ambiguous Y base in the ab1 file).
Example of an artifact in the last position:
The G base at the beginning of the “GATC” pattern may be miscalled as an A base or there may be a mixture of G + A (indicated as an ambiguous R base in the ab1 file). If the base that comes before the G is an A, that may also be affected.
Example of an artifact in the first position:
Example of artifacts in the first position + preceding A base:
Lastly, an example of all artifacts present in the same “GATC” stretch:
Note that dam methylation artifacts do not affect PCR/RCA samples as they aren’t methylated.
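To triage mismatches against the GATC pattern described above, one can list the positions where dam artifacts are expected and intersect them with the observed mismatches. This is a minimal sketch with hypothetical function names; it uses 0-based positions, flags only the first G, the last C, and a preceding A, and does not handle a GATC site that spans the origin of a circular sequence.

```python
import re

def dam_artifact_positions(consensus):
    """0-based positions where dam methylation artifacts are expected:
    the G and C of each GATC site, plus an A directly before the G."""
    seq = consensus.upper()
    flagged = set()
    for m in re.finditer("GATC", seq):
        g = m.start()
        flagged.add(g)            # G at the start of GATC
        flagged.add(g + 3)        # C at the end of GATC
        if g > 0 and seq[g - 1] == "A":
            flagged.add(g - 1)    # A immediately preceding the G
    return flagged

def likely_dam_artifacts(consensus, mismatch_positions):
    """Return the mismatch positions that coincide with expected artifact sites."""
    suspects = dam_artifact_positions(consensus)
    return sorted(p for p in mismatch_positions if p in suspects)

consensus = "TTAGATCGGTTGATCAA"
hits = likely_dam_artifacts(consensus, [2, 6, 9, 14])
```

Mismatches at positions 2, 6, and 14 fall on expected artifact sites, while the mismatch at position 9 does not and deserves a closer look.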
Q: How do I analyze the per-base CSV data?
Most of the information found in these CSV files is already incorporated into the ab1 file type, so if you’re using ab1 files for sequence alignment, there isn’t really much of a reason to check the per-base data.
The “per_base_details” CSV file contains a table with each row corresponding to a specific base of the contig sequence.
For example:
Below is a description of the information found in each column (numbered from 1-13):
- pos: indicates the base position within the contig sequence.
- base: identifies the nucleotide base that is most likely to be correct.
- depth: the number of reads that map to the base position
- match: the number of reads that match the likely correct base from column 2.
- vaf: refers to the “Variant Allele Frequency” - the proportion of total reads that map to the likely correct base. This is calculated by dividing the match value (column 4) by the depth value (column 3).
- G: the number of reads that map to the G base.
- A: the number of reads that map to the A base.
- T: the number of reads that map to the T base.
- C: the number of reads that map to the C base.
- ins: the number of reads that map to a point insertion.
- del: the number of reads that map to a point deletion.
- homo: indicates whether the base position is part of a homopolymer stretch and identifies the homopolymer base.
- confidence: marks any low-confidence base position with the word “low”.
A separate “low_confidence_bases” CSV lists all the base positions marked as low confidence in column 13 of the “per_base_details” CSV.
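For anyone who does want to work with the per-base data directly, the columns described above can be read with a standard CSV parser. The sketch below assumes the header names match the column names listed above exactly (the real files may differ), and the two sample rows are made up for illustration.

```python
import csv
import io

# Made-up rows using the 13 columns described above (header spelling is assumed).
SAMPLE = """pos,base,depth,match,vaf,G,A,T,C,ins,del,homo,confidence
1,A,500,498,0.996,1,498,1,0,0,0,,
2,C,500,350,0.700,0,0,150,350,0,0,,low
"""

def low_confidence_rows(handle):
    """Return (pos, base, depth, vaf) for rows marked "low" in the confidence column.
    vaf is recomputed as match / depth, as described for column 5."""
    rows = []
    for row in csv.DictReader(handle):
        depth, match = int(row["depth"]), int(row["match"])
        vaf = match / depth if depth else 0.0
        if row["confidence"].strip().lower() == "low":
            rows.append((int(row["pos"]), row["base"], depth, round(vaf, 3)))
    return rows

flagged = low_confidence_rows(io.StringIO(SAMPLE))
```

To run this on a real file, pass an open file handle (for example `open(path, newline="")`) in place of the `io.StringIO` wrapper.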
Chromatogram Viewers
For Windows and Mac OS
- ApE, A Plasmid Editor, allows alignment with GenBank files
- Finch TV (Geospiza)
- Sequencher (Genecodes)
- Staden Package, a free open source genomic analysis package
Windows Specific
Mac OS Specific
- 4Peaks, supports Mac OS X 10.3 or above
Primer Design
- Primer-BLAST, an online tool for designing primers against a DNA template
- PrimerX, an automated design of mutagenic primers for site-directed mutagenesis based on DNA or protein sequence
Utilities
- 7-Zip, to decompress zip files on Mac OS and Windows
Other Tools
- Oligo Calculator, to calculate Tm, GC content, and Molecular Weight of your oligo
- PrimerBank, a primer database for human and mouse genes