I am new here and interested in running the deblur plugin on my Illumina sequencing data. However, I am a little confused about the set-up. So far I've created an environment with Python 3.8 and NumPy, but I am unsure how to install the deblur plugin and how to proceed from there. I am new to programming and am feeling slightly lost in the support documentation. If anyone has some straightforward advice, I'd really appreciate it!
Apologies for my ignorance!
The q2-deblur plugin (and deblur itself) get installed as part of the base qiime2 environment. So, you'll want to follow the normal QIIME 2 installation instructions (as appropriate for your system). If you have a Mac or Linux machine, my recommendation would be a native installation. If you want a virtual box, you might want to wait a few weeks since the new virtual boxes will be out soon.
Once you have the environment installed (and activated!), try typing
Into the command line. You should see the deblur plugin show up without having to do any additional installation.
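If you'd rather script that check, here is a minimal Python sketch. It assumes that `qiime --help` lists each installed plugin at the start of its own line, which is how the command list is usually laid out; `plugin_listed` is a hypothetical helper name, not part of QIIME 2.

```python
import shutil
import subprocess


def plugin_listed(help_text: str, plugin: str = "deblur") -> bool:
    """Return True if a line of the help text starts with the plugin name."""
    for line in help_text.splitlines():
        words = line.split()
        if words and words[0] == plugin:
            return True
    return False


if __name__ == "__main__":
    # Assumption: the QIIME 2 environment is activated, so `qiime` is on PATH.
    if shutil.which("qiime") is None:
        print("qiime not found on PATH; is the environment activated?")
    else:
        result = subprocess.run(
            ["qiime", "--help"], capture_output=True, text=True
        )
        print("deblur available:", plugin_listed(result.stdout))
```

If this reports that `qiime` is not found, the environment is most likely not activated in the current terminal.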
I am going to refer you to a couple of tutorial sections here.
Okay, so I had a look at both of the awesome tutorials you brought to my attention in your previous post; however, I am not actually 100% certain which one pertains to my data... I am working with paired reads that are barcoded. Each read (forward and reverse) provides half a barcode, and the 200bp gene sequence on read1, I think, is enough for my analysis.
So I am considering using my read1 as a single read - but let me know if you think this would not be the correct approach. Also, let me know if anything is unclear or if I can give more information!
I think then maybe we need to take a step back, and that's okay!
When you got your data, how many files did you get? Was it two files (forward and reverse)? Three files (forward, reverse, and index)? Four files (one forward, one reverse, forward index, reverse index)? Two files per sample? Have you imported the data into QIIME 2?
Was there any clean up done to the data before you got it other than splitting it? If you had it sequenced through a facility, you might need to reach out to check these details. (Talk to your sequencing facility, make friends with your sequencing facility).
If the primers haven't been removed, you'll need to use cutadapt to trim them.
Which hypervariable region did you target and how long are your reads? If your reads are too short, then you want to process your forward reads as single end. If they're long enough to join, then you probably want paired end. It sounds like your data is 2x200, which isn't a common read length (usually we see 2x100, 2x150, 2x250 or 2x300). It's not impossible, but it's not common.
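To make the "long enough to join" question concrete, here is a small Python sketch of the arithmetic. The amplicon lengths and the 20-base minimum overlap are illustrative assumptions, not values from your data or a QIIME 2 default; plug in your own numbers (after primer removal, if that applies).

```python
def expected_overlap(read_length: int, amplicon_length: int) -> int:
    """Bases covered by both reads; a negative value means a gap between them."""
    return 2 * read_length - amplicon_length


def suggest_mode(read_length: int, amplicon_length: int, min_overlap: int = 20) -> str:
    """Suggest 'paired' if the reads overlap by at least min_overlap bases.

    min_overlap=20 is an assumed threshold for illustration only.
    """
    if expected_overlap(read_length, amplicon_length) >= min_overlap:
        return "paired"
    return "single"


# Hypothetical examples for 2x200 reads:
print(suggest_mode(200, 300))  # overlap of 100 bases
print(suggest_mode(200, 450))  # gap of 50 bases
```

With 2x200 reads, a 300 bp amplicon leaves 100 overlapping bases, while a 450 bp amplicon leaves a 50 bp gap, so only the forward reads would be usable as single-end data.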
Thank you for your continued support! I am excited to dive further into the project and so thankful to have your help!
Okay, so, right now, I am working with 2 fasta files. I have not yet attempted to import them into QIIME 2 because I was hesitant about what specifications I should make to do so. As I understand, QIIME deals with data using 'artifacts', and these artifacts understand certain ways of interacting with data. Right now I am unclear how to have QIIME interpret my data properly.
And I do think there has been some 'cleaning'. The files I am working with have been dereplicated and sorted. How might this change what I do when importing the files? Also, what is the reason for trimming the barcode primers, if present? Sorry if these questions are very 'low-level', I am very new to all this!
For more on the specifics of my data, I've included screenshots of what the files look like when opened in TextEdit (Mac). The first is for the AD domain target region. As you can see here, the annotations suggest the location tag (where the bacteria were sampled) and the read length. The second file is for the KS domain target region.
I have some good news and some bad news... with a heavy caveat.
So, the bad news first:
QIIME 2 will not support applying deblur to your data because the data no longer has quality information. I'm not entirely sure if deblur will work as a base algorithm (although it's within the qiime2 environment). Typically, dereplicated sequences don't go into denoising, since quality information is an important part of the process.
There are absolutely ways to move forward with this data, as long as you also have the table that associates counts of dereplicated sequences to your samples; I'm happy to talk you through this if it's the best route.
Your other option (if you can manage it) would be to reach out to the person who provided the data and see if they have fastq files for you. Your sequencing provider might have access to them, or the person who did dereplication for you. This would be my recommendation if you can get access to the data. I think having raw data makes it easier to know what you did and how that affects your results.
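If you're ever unsure which kind of file you've been handed, the first character is a quick tell: FASTA records start with `>` and carry no quality scores, while FASTQ records start with `@` and include a quality line per read. A minimal Python sketch of that check follows; `sniff_format` is a hypothetical helper, and it only looks at the first non-empty line, so it is not a full validator.

```python
import gzip
from pathlib import Path


def sniff_format(path) -> str:
    """Guess 'fasta', 'fastq', or 'unknown' from the first non-empty line."""
    path = Path(path)
    # Transparently handle gzip-compressed files such as reads.fastq.gz.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            return "unknown"
    return "unknown"
```

Calling `sniff_format` on one of the two files you have now should return "fasta", which is why the quality-based steps aren't available for them.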
Yay! Your next step will be figuring out whether or not you need to demultiplex the files.
Usually if they're demultiplexed (I'm assuming Illumina 2x???), you'll have 2 files per sample labeled R1 and R2. In that case, you use the manifest format to import them. I think there's a plugin that helps you build a manifest, but you can also just add it to your metadata file.
If you received 3 files (R1, R2, and R3 or I), you need to use the EMP demultiplexing. The moving pictures tutorial has an example.
If it's two files, you may need to go through cutadapt; I think the tutorial is in the tutorials section of the forum, although I can never remember.
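For the already-demultiplexed case (two files per sample), here is a rough Python sketch that writes a paired-end manifest. The column names follow the V2 paired-end manifest format; the `_R1`/`_R2` tags and the `.fastq.gz` extension are assumptions about how your files are named, and `build_manifest` is a hypothetical helper, so adjust to match your data.

```python
import csv
from pathlib import Path


def build_manifest(fastq_dir, manifest_path, r1_tag="_R1", r2_tag="_R2"):
    """Write a tab-separated paired-end manifest and return its rows."""
    fastq_dir = Path(fastq_dir).resolve()  # manifest needs absolute paths
    rows = []
    for fwd in sorted(fastq_dir.glob(f"*{r1_tag}*.fastq.gz")):
        # Assumption: the sample ID is everything before the first "_R1".
        sample_id = fwd.name.split(r1_tag)[0]
        rev = fwd.with_name(fwd.name.replace(r1_tag, r2_tag, 1))
        if not rev.exists():
            raise FileNotFoundError(f"no reverse read file for {fwd.name}")
        rows.append((sample_id, str(fwd), str(rev)))
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        writer.writerows(rows)
    return rows
```

The resulting file can then be passed to `qiime tools import` with `--type 'SampleData[PairedEndSequencesWithQuality]'` and `--input-format PairedEndFastqManifestPhred33V2`, assuming Phred33 quality encoding, which is typical for current Illumina data.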
Once your files are converted into an artifact and demultiplexed, I recommend removing the primers using cutadapt. This step helps make your ASVs more uniform. I would recommend using the trim-paired function if you've got paired end reads, and trim-single if you don't. I tend to use the defaults, only specifying the primer pair. Occasionally, you may want to discard the untrimmed reads (--discard-untrimmed) if, for some reason, you need to demultiplex or there's some other related quality issue. Without knowing the data, it's hard to advise about whether you should discard untrimmed reads; I tend to just test this and see. Or, you can always verify with the people upstream of you!
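As a sketch of what that call can look like, the Python below assembles the q2-cutadapt command line for either case. The file names and primer sequences are placeholders, not your real primers, and `cutadapt_command` is a hypothetical helper. Within QIIME 2 the discard option is spelled `--p-discard-untrimmed`.

```python
import shlex


def cutadapt_command(demux_qza, trimmed_qza, forward_primer,
                     reverse_primer=None, discard_untrimmed=False):
    """Build a `qiime cutadapt` command; trim-paired if a reverse primer is given."""
    if reverse_primer is None:
        cmd = ["qiime", "cutadapt", "trim-single",
               "--i-demultiplexed-sequences", demux_qza,
               "--p-front", forward_primer]
    else:
        cmd = ["qiime", "cutadapt", "trim-paired",
               "--i-demultiplexed-sequences", demux_qza,
               "--p-front-f", forward_primer,
               "--p-front-r", reverse_primer]
    if discard_untrimmed:
        cmd.append("--p-discard-untrimmed")
    cmd += ["--o-trimmed-sequences", trimmed_qza]
    return cmd


if __name__ == "__main__":
    # Placeholder primers and file names, for illustration only.
    cmd = cutadapt_command("demux.qza", "trimmed.qza",
                           "ACGTACGTAC", "TGCATGCATG",
                           discard_untrimmed=True)
    print(shlex.join(cmd))
```

Printing the command first, rather than running it straight away, makes it easy to compare runs with and without discarding untrimmed reads.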
At this point, you'll have reads that are ready for denoising with the tutorials I linked above.
If you're not sure about read overlaps (or really reads in any steps), I suggest summarizing your reads (qiime demux summarize for demultiplexed reads; qiime feature-table summarize applies once you have a feature table) so you can see what you're analyzing and check if it makes sense.
Thank you so much for getting back to me - and don’t fret about the slight delay!
Okay, great. I think I follow.
So, from what I understand, I essentially need to create QIIME 2 artifacts of my data (demultiplexed fastq files) using the manifest format as you suggest; then, import these artifacts into QIIME 2 so that I can first clean the data up (i.e., trimming) and run it through deblur.