python - Multiprocessing for calculating eigenvalues


I'm generating 100 random integer matrices of size 1000x1000. I'm using the multiprocessing module to calculate the eigenvalues of the 100 matrices.

The code is given below:

    import timeit
    import numpy as np
    import multiprocessing as mp

    def caleigen():
        s, u = np.linalg.eigh(a)

    def multiprocess(processes):
        pool = mp.Pool(processes=processes)
        # start timing here as I don't want to include the time taken to initialize the processes
        start = timeit.default_timer()
        results = [pool.apply_async(caleigen, args=())]
        stop = timeit.default_timer()
        print('process:', processes, stop - start)
        results = [p.get() for p in results]
        results.sort()  # to sort the results

    if __name__ == "__main__":
        global a
        a = []
        for i in range(0, 100):
            a.append(np.random.randint(1, 100, size=(1000, 1000)))

        # print execution time without multiprocessing
        start = timeit.default_timer()
        caleigen()
        stop = timeit.default_timer()
        print(stop - start)

        # with 1 process
        multiprocess(1)
        # with 2 processes
        multiprocess(2)
        # with 3 processes
        multiprocess(3)
        # with 4 processes
        multiprocess(4)

The output:

    0.510247945786
    ('process:', 1, 5.1021575927734375e-05)
    ('process:', 2, 5.698204040527344e-05)
    ('process:', 3, 8.320808410644531e-05)
    ('process:', 4, 7.200241088867188e-05)

Another iteration showed this output:

    69.7296020985
    ('process:', 1, 0.0009050369262695312)
    ('process:', 2, 0.023727893829345703)
    ('process:', 3, 0.0003509521484375)
    ('process:', 4, 0.057518959045410156)

My questions are these:

  1. Why doesn't the execution time decrease as the number of processes increases? Am I using the multiprocessing module correctly?
  2. Am I calculating the execution time correctly?

I have edited the code as suggested in the comments below. I want the serial and multiprocessing functions to find the eigenvalues of the same list of 100 matrices. The edited code is:

    import numpy as np
    import time
    from multiprocessing import Pool

    a = []
    for i in range(0, 100):
        a.append(np.random.randint(1, 100, size=(1000, 1000)))

    def serial(z):
        result = []
        start_time = time.time()
        for i in range(0, 100):
            result.append(np.linalg.eigh(z[i]))  # calculate eigenvalues and append to result list
        end_time = time.time()
        print("single process took:", end_time - start_time, "seconds")

    def caleigen(c):
        result = []
        result.append(np.linalg.eigh(c))  # calculate eigenvalues and append to result list
        return result

    def mp(x, z):
        start_time = time.time()
        with Pool(processes=x) as pool:  # start a pool of x workers
            result = pool.map_async(caleigen, z)  # distribute the work to the workers
            result = result.get()  # collect the result from the MapResult object
        end_time = time.time()
        print("multiprocessing took:", end_time - start_time, "seconds")

    if __name__ == "__main__":
        serial(a)
        mp(1, a)
        mp(2, a)
        mp(3, a)
        mp(4, a)

There is no reduction in time as the number of processes increases. Where am I going wrong? Does multiprocessing divide the list into chunks for the processes, or do I have to do the division myself?

You're not using the multiprocessing module correctly. As @dopstar pointed out, you're not dividing the task. There is only one task for the process pool, so no matter how many workers you assign, only one gets the job. As for your second question, I didn't use timeit to measure the process time precisely. I used the time module to get a crude sense of how fast things are. It serves the purpose most of the time, though. If I understand what you're trying to do correctly, this should be the single-process version of your code:

    import numpy as np
    import time

    result = []
    start_time = time.time()
    for i in range(100):
        a = np.random.randint(1, 100, size=(1000, 1000))  # generate random matrix
        result.append(np.linalg.eigh(a))                  # calculate eigenvalues and append to result list
    end_time = time.time()
    print("single process took:", end_time - start_time, "seconds")
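
If you do want a tighter measurement of just the eigh call itself, the timeit module can do that too. Here is a minimal sketch, where the matrix size, the repeat count, and the number of calls per run are arbitrary choices:

    import timeit

    # time only np.linalg.eigh, excluding matrix generation; the setup runs once per repeat
    setup = "import numpy as np; a = np.random.randint(1, 100, size=(1000, 1000))"
    times = timeit.repeat("np.linalg.eigh(a)", setup=setup, repeat=3, number=5)
    print("best time per eigh call:", min(times) / 5, "seconds")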

The single-process version took 15.27 seconds on my computer. Below is the multiprocess version, which took 0.46 seconds on my computer. I also included the single-process version for comparison. (The single-process version has to be enclosed in the if block as well and placed after the multiprocess version.) Because you repeat the calculation 100 times, it'd be a lot easier to create a pool of workers and let them take on unfinished tasks automatically than to manually start each process and specify what each process should do. Here in my code, the argument to the caleigen call is merely there to keep track of how many times the task has been executed. Finally, map_async is faster than apply_async, its downside being that it consumes slightly more memory and takes only one argument for the function call. The reason for using map_async and not map is that in this case the order in which results are returned does not matter, and map_async is faster than map.

    from multiprocessing import Pool
    import numpy as np
    import time

    def caleigen(x):  # define the work for each worker
        a = np.random.randint(1, 100, size=(1000, 1000))
        s, u = np.linalg.eigh(a)
        return s, u

    if __name__ == "__main__":
        start_time = time.time()
        with Pool(processes=4) as pool:  # start a pool of 4 workers
            result = pool.map_async(caleigen, range(100))  # distribute the work to the workers
            result = result.get()  # collect the result from the MapResult object
        end_time = time.time()
        print("multiprocessing took:", end_time - start_time, "seconds")

        # run the single process version for comparison; this has to be within the if block as well
        result = []
        start_time = time.time()
        for i in range(100):
            a = np.random.randint(1, 100, size=(1000, 1000))  # generate random matrix
            result.append(np.linalg.eigh(a))                  # calculate eigenvalues and append to result list
        end_time = time.time()
        print("single process took:", end_time - start_time, "seconds")
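
As for your follow-up question about whether multiprocessing divides the list into chunks: pool.map and map_async do chop the iterable into chunks for the workers on their own, and the chunksize argument lets you tune that, so you don't have to split the list yourself. Here is a minimal sketch that runs the pool over a pre-built list of matrices, where the chunksize value of 5 is just an illustrative choice:

    from multiprocessing import Pool
    import numpy as np

    def caleigen(m):
        return np.linalg.eigh(m)  # eigenvalues and eigenvectors of one matrix

    if __name__ == "__main__":
        a = [np.random.randint(1, 100, size=(1000, 1000)) for _ in range(100)]
        with Pool(processes=4) as pool:
            # map splits the 100 matrices into chunks of 5 and sends each chunk to a worker
            results = pool.map(caleigen, a, chunksize=5)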
