I am very new to bioinformatics and I am analysing datasets from different studies. I am collecting these datasets from NCBI using the SRA Toolkit. My reads are paired-end and I have managed to split them, so I have the forward and reverse files.
I am going through the 'Importing files tutorial' and I am thinking I can just replicate the line of code given under the demultiplexed Casava data type. But, to be honest, I have no idea what to do.
Any suggestions on how I can go about this?
Also, I have created my metadata table in Excel and saved it as a TSV document. I became a bit confused when reading about validation in the metadata tutorial. Is this the line I will need to use?
I think your best bet for importing the paired-end SRA reads is a manifest. Your file names do not match the specific Casava format, so rather than trying to make that work, you'll find the manifest easier.
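In case it helps, here is a rough sketch of how a manifest could be generated instead of typed by hand. It assumes things that are not stated in this thread: that the split reads are gzipped and named `<run>_1.fastq.gz` (forward) and `<run>_2.fastq.gz` (reverse), and that you are on a QIIME 2 release that accepts the tab-separated `PairedEndFastqManifestPhred33V2` manifest layout. The function name `make_manifest` and the paths in the usage line are hypothetical.

```shell
# Write a tab-separated paired-end manifest for every <run>_1/<run>_2 pair
# found in a directory. Usage: make_manifest READS_DIR MANIFEST_PATH
# Assumption: files are named <run>_1.fastq.gz and <run>_2.fastq.gz.
make_manifest() {
    reads_dir="$1"
    manifest="$2"

    # The manifest needs absolute paths, so resolve the directory first.
    abs_dir="$(cd "$reads_dir" && pwd)" || return 1

    # Header row used by the V2 paired-end manifest layout.
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$manifest"

    for fwd in "$abs_dir"/*_1.fastq.gz; do
        [ -e "$fwd" ] || continue            # nothing matched the pattern
        rev="${fwd%_1.fastq.gz}_2.fastq.gz"  # expected reverse-read file
        if [ ! -e "$rev" ]; then
            echo "skipping $fwd: no matching reverse file" >&2
            continue
        fi
        sample="$(basename "$fwd" _1.fastq.gz)"
        printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev" >> "$manifest"
    done
}
```

You would call it as something like `make_manifest reads manifest.tsv`. The run accession is used as the sample ID here, so those IDs would need to match the ones in your metadata table.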
This is just a notice to let you know there's new functionality coming soon. It doesn't affect you now, but in the future you will be able to validate your metadata on the command line rather than having to use Keemei in Google Sheets, which is great for anyone who can't upload their metadata to Google.
Thank you for your quick response.
I have created a manifest table in Excel and saved it as a tab-delimited file (txt) containing all the filepaths for my reads.
You’re close! The QIIME command is warning you that it can’t find your arguments. You can fix this by breaking the command across lines with a \ character at the end of each line. So, your command should look like:
Another piece of (entirely unsolicited) advice: if you replace the spaces ( ) in your file paths with dashes (-) or underscores (_), it will make it easier to navigate on your Linux system. The shell treats an unquoted space as a separator between arguments rather than as part of a filepath. It won't affect the import, but it will make things harder later.
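If there are a lot of files to rename, something along these lines might save some clicking. It is only a sketch: the function name `rename_spaces` is made up for illustration, it only touches entries directly inside the one directory you give it, and it does not check whether the new name already exists.

```shell
# Replace every space with an underscore in the names of the entries
# directly inside a directory. Usage: rename_spaces DIRECTORY
rename_spaces() {
    dir="$1"
    for path in "$dir"/*' '*; do
        [ -e "$path" ] || continue           # no names with spaces found
        name="$(basename "$path")"
        new_name="$(printf '%s' "$name" | tr ' ' '_')"
        # Quoting "$path" keeps the space inside a single argument.
        mv -- "$path" "$dir/$new_name"
    done
}
```

After renaming, the paths in the manifest need to be updated to match, otherwise the import will not find the files.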
Thank you Justine, I have taken your advice and changed the file paths, and it seems to have worked. However, I am now getting a different error, and I am not sure what it means.
It looks like both you and I missed something. The semantic type SampleData[SequencesWithQuality] is for forward (single-end) reads only. I think you need SampleData[PairedEndSequencesWithQuality].
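Putting the pieces together, the import might look roughly like the sketch below. The wrapper function `import_paired_end` and the file names in the usage line are hypothetical. The format name is also an assumption: `PairedEndFastqManifestPhred33V2` expects a tab-separated manifest and Phred 33 quality scores, while older QIIME 2 releases used a comma-separated manifest with `PairedEndFastqManifestPhred33`, so please check which one your version supports.

```shell
# Import the paired-end reads listed in a manifest as a QIIME 2 artifact.
# Usage: import_paired_end MANIFEST_PATH OUTPUT_QZA
# Each trailing backslash must be the last character on its line, so the
# shell reads all of the lines as one command.
import_paired_end() {
    manifest="$1"
    output="$2"
    qiime tools import \
        --type 'SampleData[PairedEndSequencesWithQuality]' \
        --input-path "$manifest" \
        --input-format PairedEndFastqManifestPhred33V2 \
        --output-path "$output"
}
```

For example, `import_paired_end manifest.tsv paired-end-demux.qza`. The single quotes around the type keep the shell from trying to interpret the square brackets.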