Feature-classifier classify-sklearn takes a long time with the Silva 132 database?

khaknasheen · November 30, 2018, 12:43am

Dear Sir,

I got an error while using the feature classifier: [Errno 28] No space left on device. The classifier was trained on the Silva 132 database. The classifier132.qza file is 196483 KB and the rep sequences file is 424 KB. The system runs Linux and has the following filesystems:

Filesystem           Size  Used Avail Use% Mounted on
/dev/sda5            101G   66G   30G  69% /
tmpfs                190G  252K  190G   1% /dev/shm
/dev/sda1            477M  142M  310M  32% /boot
/dev/mapper/vg-vgoo  7.3T  6.9T   21G 100% /home
/dev/sda3             58G   43G   12G  79% /tmp
/dev/sdd1            939G  192G  700G  22% /biosoft
/dev/sdd2            6.3T  4.5T  1.5T  75% /data
/dev/sde1            7.2T  5.2T  1.7T  76% /data1
/dev/sdf1            7.2T  3.7T  3.2T  54% /data2
/dev/sdg1            7.2T  6.7T  152G  98% /data3
/dev/sdh1            7.2T  5.4T  1.5T  80% /data4

Of the directories above, I ran the command in /home and /biosoft, but still got the error [Errno 28] No space left on device. I then read the forum questions and answers to solve this problem, and got an idea from a reply by forum leader thermokarst in the topic "No Space Left on Device for qiime feature-classifier". I ran his commands in the same directory that holds my feature table and feature sequences. The commands were:

export TMPDIR='/data'
echo $TMPDIR
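
For readers who hit the same [Errno 28], here is a minimal sketch (not from the thread) of checking free space before pointing TMPDIR at a directory. The helper names `free_kb` and `use_tmpdir_if_room` and the size threshold are hypothetical, and the sketch assumes a POSIX `df` and `awk`.

```shell
# Print the available space, in KB, on the filesystem that holds directory $1.
# -P forces one POSIX-format line per filesystem; column 4 is "Available".
free_kb() {
    df -Pk "$1" 2>/dev/null | awk 'NR==2 {print $4}'
}

# Export TMPDIR=$1 only if that filesystem has at least $2 KB free.
# Returns non-zero (and leaves TMPDIR untouched) otherwise.
use_tmpdir_if_room() {
    dir=$1
    need_kb=$2
    avail_kb=$(free_kb "$dir")
    if [ "${avail_kb:-0}" -ge "$need_kb" ]; then
        export TMPDIR="$dir"
        echo "TMPDIR=$TMPDIR (${avail_kb} KB free)"
    else
        echo "only ${avail_kb:-0} KB free on $dir" >&2
        return 1
    fi
}
```

For example, `use_tmpdir_if_room /data 10485760` would accept /data from the listing above only if at least 10 GB (an arbitrary figure chosen for illustration) is free.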

Then I used the following classifier commands

qiime feature-classifier classify-sklearn \
--i-classifier classifier132.qza \
--i-reads rep-seqs270224.qza \
--o-classification taxonomyrep-seqs270224.qza

I no longer get any error, but the command has been running for a long time. I started it at 1:00 am, it is now 8:30 am, and it is still running. I am sorry, but I don't know what to do; I don't even know whether the commands I used are right or wrong. If they are right, how long should this take? If you could look at my directories and suggest a way to solve this problem, I would be thankful. The commands are also attached here:

Yours Sincerely

thermokarst · November 30, 2018, 9:52pm

Please continue to wait; these things can take a while. If you have the capability, you can set the n_jobs parameter (`--p-n-jobs` on the command line) to a number greater than one to run multiple parallel jobs.
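
As a sketch of what that could look like with the file names from the original post: the wrapper function `classify` and the `QIIME` override variable are hypothetical conveniences, while `--p-n-jobs` is the command-line form of the n_jobs parameter. The job count to use depends on the machine, and memory use tends to rise with the number of jobs, so treat any particular value as an assumption.

```shell
# QIIME can be overridden (e.g. QIIME=echo) to print the command instead of running it.
QIIME="${QIIME:-qiime}"

# Run classify-sklearn with $1 parallel jobs on the files named in the thread.
classify() {
    n_jobs=$1
    "$QIIME" feature-classifier classify-sklearn \
        --i-classifier classifier132.qza \
        --i-reads rep-seqs270224.qza \
        --p-n-jobs "$n_jobs" \
        --o-classification taxonomyrep-seqs270224.qza
}
```

Calling `classify 4` would then request four parallel jobs.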

I recently ran this command on ~50,000 representative sequences against a SILVA database, and it took 3 hours of wall-clock time with 16 concurrent jobs (so roughly 48 CPU-hours in total).
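
The arithmetic behind that figure, as a small sketch. The function names are made up for illustration, and the wall-clock estimate assumes the work divides evenly across jobs, which real runs only approximate.

```shell
# Total CPU-hours = wall-clock hours x number of concurrent jobs.
cpu_hours() {
    echo $(( $1 * $2 ))
}

# Idealised wall-clock hours for a given CPU-hour budget and job count
# (integer division; assumes perfect scaling).
wall_hours() {
    echo $(( $1 / $2 ))
}

cpu_hours 3 16    # prints 48
wall_hours 48 4   # prints 12
```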

khaknasheen · December 1, 2018, 11:52pm

Sir,
It worked. The job started at 1:00 am and finished at 9:30 am. The commands were right, and I now have the taxonomy file.

Thanks
