Hey,
I have downloaded all the raw American Gut Project data and want to start analysis on the data. I saw that I got many barcode files, and their corresponding reverse and forward fastq files.
If I do analysis on the fastq files without adding the barcode files, what will happen to the results? Will they be inaccurate or will they be missing information?
Any insights on this would be helpful.
Thank you in advance,
Nadav
Hello, and welcome to the forum!
It is not entirely clear what you mean, so let's clarify a few things first.
- Barcodes are short, unique DNA sequences attached to amplicons so that different samples can be pooled into one tube for sequencing. Afterwards, the reads are split by barcode to recover which sample each read came from.
- You don't need barcodes if your sequences are already demultiplexed (separated by sample).
So, if you mean that you downloaded many fastq files, one forward and one reverse for each sample, then you don't need barcodes.
If you have one forward and one reverse file that each contain many samples, then you need to demultiplex them (separate the samples). For that, the barcodes are required.
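To make the idea of demultiplexing concrete, here is a minimal Python sketch. It is not how QIIME 2 or any specific plugin implements it. It assumes the barcode reads are listed in the same order as the sequence reads, uses exact barcode matching only (real tools may also correct barcode errors), and all names and toy sequences below are made up for illustration.

```python
from collections import defaultdict


def demultiplex(reads, barcode_reads, barcode_to_sample):
    """Group reads by sample using the barcode read at the same position.

    reads and barcode_reads are parallel lists of (read_id, sequence) tuples.
    barcode_to_sample maps a barcode sequence to a sample ID.
    """
    if len(reads) != len(barcode_reads):
        raise ValueError("reads and barcode reads differ in length")

    per_sample = defaultdict(list)
    for (read_id, seq), (bc_id, bc_seq) in zip(reads, barcode_reads):
        # The two files must describe the same reads in the same order.
        if read_id != bc_id:
            raise ValueError(f"read order mismatch: {read_id} vs {bc_id}")
        # Reads whose barcode is not in the mapping cannot be assigned.
        sample = barcode_to_sample.get(bc_seq, "unassigned")
        per_sample[sample].append((read_id, seq))
    return dict(per_sample)


# Toy data, invented for this example.
reads = [("r1", "ACGTACGT"), ("r2", "TTGACCAA"), ("r3", "GGCATTAC")]
barcode_reads = [("r1", "AAAC"), ("r2", "GGGT"), ("r3", "AAAC")]
barcode_to_sample = {"AAAC": "sample_A", "GGGT": "sample_B"}

demuxed = demultiplex(reads, barcode_reads, barcode_to_sample)
```

Without the barcode file there is no way to build `barcode_to_sample` or to look up which sample a read belongs to, so every read in the run would be treated as if it came from a single sample.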
Thank you for the fast response!
I will clarify what I mean in my question. I have a single forward file and a single reverse file for each run (from what I understand, they contain multiple samples), plus a barcode file for those files.
I have multiple directories like this, each with different fastq files, and I want to analyze all of that data.
The problem is that I don't know whether I can analyze all the data together using qiime tools import with a manifest file and still get accurate results, or whether I need to use the barcode file in an intermediate step (I am not sure if that is how you say it).
If I do need to use the barcode file, how do I do it?
Thank you very much
That means that you have multiplexed reads.
So, does each directory correspond to a different run? For example, you have N sequencing runs and N folders, each with a barcode file and forward and reverse sequences?
If my assumptions above are correct, you need to do the following.
Note that if you want to merge data from different runs, the commands should be identical for each run!
- Import each folder/run separately into QIIME 2, with or without a manifest file.
- Demultiplex each run. For that you need the barcodes. Check the QIIME 2 plugins that can demultiplex data (cutadapt or another one).
- Remove primers from each run.
- Run DADA2 on each run.
Merging data
- When you are done, merge the feature tables and representative sequences from all runs (if needed).
- From here you can proceed with any other analyses on the merged data (or on each run separately if you don't need to merge).
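To illustrate what merging feature tables amounts to, here is a small Python sketch. It is only a conceptual model, not the QIIME 2 implementation: each per-run table is assumed to be a plain dictionary of the form {sample_id: {feature_id: count}}, sample IDs are assumed to be unique across runs, and the function name and toy counts are hypothetical.

```python
def merge_feature_tables(tables):
    """Merge per-run tables of the form {sample_id: {feature_id: count}}."""
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            # Assumption: a sample belongs to exactly one run.
            if sample_id in merged:
                raise ValueError(
                    f"sample {sample_id!r} appears in more than one run"
                )
            merged[sample_id] = dict(counts)

    # Fill features that a sample lacks with zero, so that every
    # sample ends up with the same set of features.
    all_features = sorted({f for counts in merged.values() for f in counts})
    return {
        sample_id: {f: counts.get(f, 0) for f in all_features}
        for sample_id, counts in merged.items()
    }


# Toy tables, invented for this example; features are keyed by sequence.
run1 = {"sample_A": {"ACGT": 10, "TTGA": 3}}
run2 = {"sample_B": {"ACGT": 4, "GGCA": 7}}

merged = merge_feature_tables([run1, run2])
```

In this sketch the feature "ACGT" lines up across the two runs only because it has exactly the same identifier in both. That is the reason for using identical commands on every run: if the reads were trimmed or processed differently, the same organism could end up as different features in different runs.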
Yes, your assumptions are correct!
Thank you very much for your help,
Take care!