That all sounds good.
Of note, q2-sidle recommends retaining only sequences that have no more than 3 ambiguous bases, whereas RESCRIPt's default is to remove sequences that contain 5 or more ambiguous bases. That is, by default RESCRIPt allows sequences with 4 or fewer ambiguous bases.
So, you can either add the following command after your rescript cull-seqs command:
sidle filter-degenerate-sequences --p-max-degen 3 ...
or simply rerun the following RESCRIPt command, which is equivalent to the above q2-sidle command:
rescript cull-seqs --p-num-degenerates 4 ...
Yes, the 4 is correct, as we are removing sequences with 4 or more ambiguous bases (i.e. allowing a maximum of 3).
This may not solve the memory issue though.
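To make the off-by-one between the two thresholds concrete, here is a minimal Python sketch. It is not the code used by either plugin: the function names are hypothetical, and it assumes that "ambiguous" means any IUPAC degenerate nucleotide code. It only shows that "keep at most 3" and "remove 4 or more" select the same sequences.

```python
# Illustrative sketch only; NOT the implementation used by q2-sidle or RESCRIPt.
# Assumption: "ambiguous" means any IUPAC degenerate nucleotide code.
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def count_ambiguous(seq: str) -> int:
    """Count the ambiguous (degenerate) bases in a sequence."""
    return sum(base in IUPAC_AMBIGUOUS for base in seq.upper())


def keep_by_max_degen(seq: str, max_degen: int = 3) -> bool:
    """q2-sidle-style rule: keep sequences with at most `max_degen` ambiguous bases."""
    return count_ambiguous(seq) <= max_degen


def keep_by_num_degenerates(seq: str, num_degenerates: int = 4) -> bool:
    """RESCRIPt-style rule: remove sequences with `num_degenerates` or more ambiguous bases."""
    return count_ambiguous(seq) < num_degenerates


if __name__ == "__main__":
    # Toy sequences with 0 to 6 ambiguous bases appended.
    for n in range(7):
        seq = "ACGTACGT" + "N" * n
        print(n, keep_by_max_degen(seq), keep_by_num_degenerates(seq))
```

Both rules agree for every count: sequences with 0 to 3 ambiguous bases are kept, and those with 4 or more are dropped. With the RESCRIPt default of 5, a sequence with exactly 4 ambiguous bases would still be kept.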