Denoising and trimming already demultiplexed paired end data

Hello. I am new to using Qiime2 but attended a Qiime2 workshop in June. I successfully downloaded my demultiplexed sequences into Qiime2 from the sequencing facility website using the following command where “699” was the name of my flowcell:
wget -nH --cut-dirs=1 -r -np -A 'fq.gz,fastq.gz,txt' --accept-regex '(flowcell|Undetermined)' http://illumina.bioinfo.ucr.edu/illumina_runs/699/
Now I don’t know how to proceed with the denoising and trimming step because (1) the command to download my data (provided by the sequencing facility) is different from the one specified in the tutorial and (2) my data is already demultiplexed…

Thanks for any advice that you can provide,
Lisa

Hi @Lisa_Crummett!

No worries about this --- everyone's data is different, and as such, is fetched in its own special way. All that matters is that you ultimately wind up with your raw data in a local directory on the machine that you will be running QIIME 2 on.

Have you had a chance to look at our Importing Tutorial? We have two relevant sections here: Importing Casava 1.8 paired-end demultiplexed data and Importing generic Fastq data. I took a quick peek at the link you provided, and your data doesn't appear to be in the Casava 1.8 format (but I could be wrong here), so it probably makes sense for you to start with the second option and create a Fastq Manifest File. This is basically a phonebook that maps each sample identifier + read direction to a filename. When you import this, QIIME 2 will massage your data to get things set up for all of your downstream steps (e.g. denoising/trimming). Give the Fastq manifest format a shot and let us know if you get stuck. Thanks!

Having just gone through these steps recently myself, I thought I’d share a quick tip. You can quickly find the absolute paths to the files that you need in the manifest file with:

find /path/to/your/folder/containing/fastqs -type f -name '*.fastq.gz'

That lists the absolute paths of all samples that you can easily copy & paste into your csv manifest file. Beats doing them one by one.

I’m sure it wouldn’t be too hard to script the manifest creation process from the files in a target folder, but this is easy as well.
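For anyone who wants to try scripting it, here is a minimal sketch of building the manifest from a folder. It is only an illustration under a few assumptions: `make_manifest` is a hypothetical helper name (not a QIIME 2 command), the files are assumed to be named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` (adjust the suffixes to whatever your facility uses), and the header is the three-column CSV manifest header used by the Fastq manifest formats discussed in this thread.

```shell
# Sketch: print a paired-end Fastq manifest (CSV) for one folder of reads.
# Assumed (hypothetical) naming: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
make_manifest() {
  local dir fwd rev sample
  # Resolve the folder to an absolute path, since the manifest needs absolute paths.
  dir="$(cd -- "$1" >/dev/null && pwd)" || return 1
  echo "sample-id,absolute-filepath,direction"
  for fwd in "$dir"/*_R1.fastq.gz; do
    [ -e "$fwd" ] || continue                 # glob matched nothing
    sample="$(basename "$fwd" _R1.fastq.gz)"  # strip folder and suffix
    rev="$dir/${sample}_R2.fastq.gz"
    if [ ! -e "$rev" ]; then
      echo "skipping $sample: no reverse read found" >&2
      continue
    fi
    echo "$sample,$fwd,forward"
    echo "$sample,$rev,reverse"
  done
}
```

Usage would look something like `make_manifest /path/to/your/folder/containing/fastqs > manifest.csv`. Samples with a missing reverse file are skipped with a warning on stderr rather than written as half a pair.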

On another related note, is it possible to create a fastq variant detection script in the event that you don’t know where the inherited files have come from?

Hi @Mehrbod_Estaki, thanks for posting that tip!

Regarding a fastq variant detection script — there are many one-liner scripts floating around on forums like biostars and seqanswers — I would suggest looking there. If you have some ideas about implementing variant detection in QIIME 2, please start a new thread in Ideas and Suggestions, that way we can get a discussion rolling there. Thanks!
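In the common case where "variant" means the quality-score offset (Phred33 vs. Phred64, as in the manifest format names), the usual one-liners boil down to looking at the range of characters on the quality lines. The following is a rough heuristic sketch, not a QIIME 2 feature: `detect_phred_offset` is a hypothetical helper name, the input is assumed to be a gzipped FASTQ file with plain four-line records, and the thresholds (characters below ASCII 59 only occur in Phred33, characters above ASCII 74 are taken as a sign of Phred64) are rules of thumb that can be wrong for unusual data.

```shell
# Heuristic sketch: guess the quality-score offset of a gzipped FASTQ file.
# detect_phred_offset is a hypothetical helper, not a QIIME 2 command.
detect_phred_offset() {
  gzip -cd -- "$1" | LC_ALL=C awk '
    BEGIN {
      # Map each printable ASCII character to its numeric code.
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
      min = 127; max = 0
    }
    NR % 4 == 0 {
      # Every fourth line of a four-line record is the quality string.
      n = length($0)
      for (i = 1; i <= n; i++) {
        ch = substr($0, i, 1)
        if (!(ch in ord)) continue   # ignore stray characters such as \r
        c = ord[ch]
        if (c < min) min = c
        if (c > max) max = c
      }
    }
    END {
      if (max == 0)      print "no-quality-lines"
      else if (min < 59) print "Phred33"
      else if (max > 74) print "Phred64"
      else               print "ambiguous"
    }'
}
```

Something like `detect_phred_offset sample_R1.fastq.gz` prints a single word. An "ambiguous" result means every quality character fell in the range shared by both encodings, so the file alone cannot settle it.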

Great! Good to know that there may be a way of doing this. I have no suggestions as to how this can be done, as it’s outside my capabilities, but maybe I’ll post it as a suggestion as you recommended and hope it will generate some momentum for others to take on the task. I know lots of novice users in our department who either don’t know about the existence of the different variants at all, or who inherit older files that they don’t know how to trace back to the original sequencing machines/platforms.

Hello Mathew. I wish that I could chat with you or another Qiime2 tech over the phone… I have spent hours just trying to import my paired end data by using the command in the “Importing generic Fastq data” tutorial and I think the problem lies with Qiime2 not “seeing” my fasta files, which are stored in a folder embedded in a shared folder that I created between Qiime2 and my windows machine. My manifest is also stored in that same folder. I tried using various commands of the sort:

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path se-33-manifest.csv \
  --output-path paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

I get the following error message:

Exception: No transformation from <class 'q2_types.per_sample_sequences._format.PairedEndFastqManifestPhred33'> to <class 'q2_types.per_sample_sequences._format.SingleLanePerSampleSingleEndFastqDirFmt'>

I have also tried putting in the path to my fasta file [SequencesWithQuality] instead of "SampleData[SequencesWithQuality]", but that doesn’t work either…

Also, when I try to cd into this shared folder, it says that such a folder or directory does not exist and yet that folder is clearly present when I click on the folders icon on the right hand side in Qiime2.

Can you please advise… would it be possible to do “teamviewer” with someone over on your end?

Thank you,
Lisa

An off-topic reply has been split into a new topic: DADA2 denoise-paired Exit Code -9

Please keep replies on-topic in the future.

Note to anyone following along: @Lisa_Crummett and I coordinated via DM today and sorted out the import issues by updating the manifest file (mainly, Excel was adding extra columns to the CSV) and tweaking the import command to reflect paired-end sequences:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path se-33-manifest.csv \
  --output-path paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33
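
Since extra columns added by a spreadsheet program were the main culprit here, a quick sanity check before importing may save some time. This is a minimal sketch that assumes the three-column CSV manifest (`sample-id,absolute-filepath,direction`) and does not handle quoted fields; `check_manifest_columns` is a hypothetical helper name, not part of QIIME 2.

```shell
# Sketch: report manifest lines that do not have exactly 3 comma-separated fields.
# check_manifest_columns is a hypothetical helper, not a QIIME 2 command.
check_manifest_columns() {
  awk -F',' '
    NF != 3 {
      printf "line %d has %d fields: %s\n", NR, NF, $0
      bad = 1
    }
    END { exit (bad ? 1 : 0) }   # non-zero exit status if any line was flagged
  ' "$1"
}
```

Running `check_manifest_columns se-33-manifest.csv` prints nothing and exits with status 0 when every line has three fields. Trailing commas left behind by a spreadsheet export show up as lines with four or more fields.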
