python - ipcluster on Sun Grid Engine returns only rank 0 -


I set up IPython Parallel's ipcluster to use Sun Grid Engine, and things seem to work fine:

ipcluster start -n 100 --profile=sge

2016-07-15 14:47:09.749 [IPClusterStart] Starting ipcluster [daemon=False]
2016-07-15 14:47:09.751 [IPClusterStart] Creating pid file: /home/username/.ipython/profile_sge/pid/ipcluster.pid
2016-07-15 14:47:09.751 [IPClusterStart] Starting Controller with SGEControllerLauncher
2016-07-15 14:47:09.789 [IPClusterStart] Job submitted with job id: u'6354583'
2016-07-15 14:47:10.790 [IPClusterStart] Starting 100 Engines with SGEEngineSetLauncher
2016-07-15 14:47:10.826 [IPClusterStart] Job submitted with job id: u'6354584'
2016-07-15 14:47:40.856 [IPClusterStart] Engines appear to have started

I then connect from the notebook using

import ipyparallel as ipp
rc = ipp.Client(profile='sge')

but when I use the parallel magic

%%px
from mpi4py import MPI

comm = MPI.COMM_WORLD
nprocs = comm.Get_size()
rank = comm.Get_rank()

print('I am #{} of {} and run on {}'.format(rank, nprocs, MPI.Get_processor_name()))

all processes return rank 0:

[stdout:0] I am #0 of 1 and run on compute-8-13.local
[stdout:1] I am #0 of 1 and run on compute-8-13.local
[stdout:2] I am #0 of 1 and run on compute-3-3.local
[stdout:3] I am #0 of 1 and run on compute-3-3.local
[stdout:4] I am #0 of 1 and run on compute-3-3.local
...

Here are my setup scripts:


  • ipcluster_config.py:

    c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
    c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
    c.SlurmEngineSetLauncher.batch_template_file = '/home/username/.ipython/profile_sge/sge.engine.template'
    c.SlurmControllerLauncher.batch_template_file = '/home/username/.ipython/profile_sge/sge.controller.template'
  • ipcontroller_config.py:

    c.HubFactory.ip = '*'
  • sge.controller.template:

    #!/bin/sh
    #$ -S /bin/sh
    #$ -pe orte 1
    #$ -q sthc.q
    #$ -cwd
    #$ -N ipyparallel_controller
    #$ -o ipyparallel_controller.log
    #$ -e ipyparallel_controller.err
    module load gcc/5.3/openmpi
    source activate parallel
    ipcontroller --profile-dir={profile_dir}
  • sge.engine.template:

    #!/bin/sh
    #$ -S /bin/sh
    #$ -pe orte {n}
    #$ -q sthc.q
    #$ -cwd
    #$ -N ipyparallel_engines
    #$ -o ipyparallel_engines.log
    #$ -e ipyparallel_engines.err
    module load gcc/5.3/openmpi
    source activate parallel
    mpiexec -n {n} ipengine --profile-dir={profile_dir} --timeout=30
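The {n} and {profile_dir} fields in these templates are filled in by the launcher before the script is submitted. As a minimal sketch, assuming Python str.format-style substitution (which is how ipyparallel's batch launchers expand these placeholders; the template text below is abridged from sge.engine.template):

```python
# Sketch: expand the template placeholders the way a batch launcher
# would before handing the script to qsub (abridged template).
template = """#$ -pe orte {n}
mpiexec -n {n} ipengine --profile-dir={profile_dir} --timeout=30
"""

script = template.format(n=100,
                         profile_dir='/home/username/.ipython/profile_sge')
print(script)
# #$ -pe orte 100
# mpiexec -n 100 ipengine --profile-dir=/home/username/.ipython/profile_sge --timeout=30
```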

I found the solution/bug myself:

In ipcluster_config.py, I forgot to rename the Slurm launcher class names to SGE; it should be

c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
c.SGEEngineSetLauncher.batch_template_file = '/home/username/.ipython/profile_sge/sge.engine.template'
c.SGEControllerLauncher.batch_template_file = '/home/username/.ipython/profile_sge/sge.controller.template'

The mistake led ipcluster to fall back to some kind of default SGE template, which submitted 100 separate jobs instead of one job with 100 processes.
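That explains the symptom: every separate mpiexec (or bare ipengine) launch forms its own MPI COMM_WORLD, so 100 one-process jobs each see rank 0 in a world of size 1. A toy model of this, runnable without MPI installed (simulate_launch is a hypothetical helper for illustration, not part of ipyparallel):

```python
# Toy model: MPI ranks are assigned only within a single launch,
# never across separately submitted jobs.
def simulate_launch(n_jobs, procs_per_job):
    """Return (rank, world_size) pairs as MPI would assign them."""
    return [(rank, procs_per_job)
            for _job in range(n_jobs)
            for rank in range(procs_per_job)]

# One job with 4 processes: ranks 0..3 share a world of size 4.
print(simulate_launch(1, 4))   # [(0, 4), (1, 4), (2, 4), (3, 4)]

# Four separate one-process jobs: every engine is rank 0 of 1.
print(simulate_launch(4, 1))   # [(0, 1), (0, 1), (0, 1), (0, 1)]
```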

Now the output is as desired:

[stdout:0] I am #5 of 100 and run on compute-5-17.local
[stdout:1] I am #9 of 100 and run on compute-5-17.local
[stdout:2] I am #1 of 100 and run on compute-5-17.local
[stdout:3] I am #7 of 100 and run on compute-5-17.local
[stdout:4] I am #2 of 100 and run on compute-5-17.local
...
