python - Multiprocessing for calculating eigenvalues -
I'm generating 100 random integer matrices of size 1000x1000, and I'm using the multiprocessing module to calculate the eigenvalues of the 100 matrices.
The code is given below:
import timeit
import numpy as np
import multiprocessing as mp

def calEigen():
    S, U = np.linalg.eigh(a)

def multiprocess(processes):
    pool = mp.Pool(processes=processes)
    # Start timing here; I don't want to include the time taken to initialize the processes
    start = timeit.default_timer()
    results = [pool.apply_async(calEigen, args=())]
    stop = timeit.default_timer()
    print("process:", processes, stop - start)
    results = [p.get() for p in results]
    results.sort()  # sort the results

if __name__ == "__main__":
    a = []
    for i in range(0, 100):
        a.append(np.random.randint(1, 100, size=(1000, 1000)))
    # Print the execution time without multiprocessing
    start = timeit.default_timer()
    calEigen()
    stop = timeit.default_timer()
    print(stop - start)
    multiprocess(1)  # with 1 process
    multiprocess(2)  # with 2 processes
    multiprocess(3)  # with 3 processes
    multiprocess(4)  # with 4 processes
The output:
0.510247945786
('process:', 1, 5.1021575927734375e-05)
('process:', 2, 5.698204040527344e-05)
('process:', 3, 8.320808410644531e-05)
('process:', 4, 7.200241088867188e-05)
Another iteration showed this output:
69.7296020985
('process:', 1, 0.0009050369262695312)
('process:', 2, 0.023727893829345703)
('process:', 3, 0.0003509521484375)
('process:', 4, 0.057518959045410156)
My questions are these:
- Why doesn't the execution time reduce as the number of processes increases? Am I using the multiprocessing module correctly?
- Am I calculating the execution time correctly?
I have edited the code as suggested in the comments below. I want the serial and multiprocessing functions to find the eigenvalues of the same list of 100 matrices. The edited code is:
import numpy as np
import time
from multiprocessing import Pool

a = []
for i in range(0, 100):
    a.append(np.random.randint(1, 100, size=(1000, 1000)))

def serial(z):
    result = []
    start_time = time.time()
    for i in range(0, 100):
        result.append(np.linalg.eigh(z[i]))  # calculate eigenvalues and append to result list
    end_time = time.time()
    print("single process took :", end_time - start_time, "seconds")

def calEigen(c):
    result = []
    result.append(np.linalg.eigh(c))  # calculate eigenvalues and append to result list
    return result

def mp(x, z):
    start_time = time.time()
    with Pool(processes=x) as pool:  # start a pool of x workers
        result = pool.map_async(calEigen, z)  # distribute the work to the workers
        result = result.get()  # collect the results from the MapResult object
    end_time = time.time()
    print("multiprocessing took:", end_time - start_time, "seconds")

if __name__ == "__main__":
    serial(a)
    mp(1, a)
    mp(2, a)
    mp(3, a)
    mp(4, a)
There is no reduction in time as the number of processes increases. Where am I going wrong? Does multiprocessing divide the list into chunks for the processes, or do I have to do the division myself?
You're not using the multiprocessing module correctly. As @dopstar pointed out, you're not dividing your task. There is only one task for the process pool, so no matter how many workers you assign, only one of them gets the job. As for your second question, you didn't use timeit to measure the process time precisely. I just use the time module to get a crude sense of how fast things are; it serves the purpose most of the time, though. If I understand what you're trying to do correctly, this should be the single-process version of your code:
import numpy as np
import time

result = []
start_time = time.time()
for i in range(100):
    a = np.random.randint(1, 100, size=(1000, 1000))  # generate a random matrix
    result.append(np.linalg.eigh(a))  # calculate eigenvalues and append to result list
end_time = time.time()
print("single process took :", end_time - start_time, "seconds")
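(As an aside on the timing question: timeit.default_timer is a drop-in, higher-resolution alternative to time.time() for wall-clock measurements like this. A minimal sketch, scaled down to 10 matrices of 100x100 purely so it runs quickly:)

```python
import timeit

import numpy as np

start = timeit.default_timer()
# same eigendecomposition loop as above, just smaller
result = [np.linalg.eigh(np.random.randint(1, 100, size=(100, 100)))
          for _ in range(10)]
stop = timeit.default_timer()
print("took:", stop - start, "seconds")
```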
The single process version took 15.27 seconds on my computer. Below is the multiprocess version, which took only 0.46 seconds on my computer. I also included the single process version for comparison. (The single process version has to be enclosed in the if block as well, and placed after the multiprocess version.) Because you repeat the calculation 100 times, it'd be a lot easier to create a pool of workers and let them take on unfinished tasks automatically than to manually start each process and specify what each process should do. Here in my code, the argument to the calEigen call merely keeps track of how many times the task has been executed. Finally, map_async is faster than apply_async, the downsides being that it consumes slightly more memory and that the function call can take only one argument. The reason for using map_async and not map is that in this case the order in which results are returned does not matter, and map_async is faster than map.
from multiprocessing import Pool
import numpy as np
import time

def calEigen(x):  # define the work for each worker
    a = np.random.randint(1, 100, size=(1000, 1000))
    s, u = np.linalg.eigh(a)
    return s, u

if __name__ == "__main__":
    start_time = time.time()
    with Pool(processes=4) as pool:  # start a pool of 4 workers
        result = pool.map_async(calEigen, range(100))  # distribute the work to the workers
        result = result.get()  # collect the results from the MapResult object
    end_time = time.time()
    print("multiprocessing took:", end_time - start_time, "seconds")

    # Run the single process version for comparison. This has to be within the if block as well.
    result = []
    start_time = time.time()
    for i in range(100):
        a = np.random.randint(1, 100, size=(1000, 1000))  # generate a random matrix
        result.append(np.linalg.eigh(a))  # calculate eigenvalues and append to result list
    end_time = time.time()
    print("single process took :", end_time - start_time, "seconds")