Python - multiprocessing module and distinct psycopg2 connections
I am puzzled by the behavior of some multiprocessing code that uses psycopg2 to make queries in parallel to a Postgres db.
Essentially, I am making the same query (with different params) against various partitions of a larger table. I am using multiprocessing.Pool to fork off a separate query for each one.
My multiprocessing call looks like this:
    pool = Pool(processes=num_procs)
    results = pool.map(run_sql, params_list)
My run_sql code looks like this:

    def run_sql(zip2):
        conn = get_connection()
        curs = conn.cursor()
        print "conn: %s curs:%s pid=%s" % (id(conn), id(curs), os.getpid())
        ...
        curs.execute(qry)
        records = curs.fetchall()

    def get_connection():
        ...
        conn = psycopg2.connect(user=db_user, host=db_host, dbname=db_name, password=db_pwd)
        return conn
So my expectation is that each process would get a separate db connection via the call to get_connection(), and that print id(conn) would display a distinct value. However, that doesn't seem to be the case, and I am at a loss to explain it. Even print id(curs) is the same. Only print os.getpid() shows a difference. Does it somehow use the same connection for each forked process?
    conn: 4614554592 curs:4605160432 pid=46802
    conn: 4614554592 curs:4605160432 pid=46808
    conn: 4614554592 curs:4605160432 pid=46810
    conn: 4614554592 curs:4605160432 pid=46784
    conn: 4614554592 curs:4605160432 pid=46811
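I think I've figured this out. The answer lies in the fact that multiprocessing in Python is shared-nothing: when a worker is forked, the entire memory space is copied, functions and all. Hence each process, even though its pid is different, starts with a memory space that is a copy of the others, and the address of the connection within that memory space ends up being the same. Since id() in CPython is just the object's memory address, it is only unique within a single process, so equal values in different processes do not mean the connection is shared. It's the same reason why declaring a global connection pool, as I did initially, was useless: each process ended up with its own connection pool, with only one connection active at a time.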
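To see this effect without a database, here is a minimal sketch in Python 3 (the code above is Python 2). FakeConnection, report and collect are hypothetical names made up for this illustration, and the "fork" start method is assumed, so it needs a Unix-like OS. Each child creates its own object after the fork and reports its pid together with id() of that object.

```python
import multiprocessing as mp
import os


class FakeConnection(object):
    """Hypothetical stand-in for a psycopg2 connection (no database needed)."""


def report(queue):
    # Runs in the child process: the object is created after the fork,
    # so every child owns a separate object.
    conn = FakeConnection()
    queue.put((os.getpid(), id(conn)))


def collect(num_procs=4):
    # Assumption: the "fork" start method is available (Unix-like OS).
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=report, args=(queue,)) for _ in range(num_procs)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    for pid, conn_id in collect():
        print("pid=%s id(conn)=%s" % (pid, conn_id))
```

The pids will differ, while the id values will often (though not necessarily) coincide, because each child starts from the same memory layout and allocates in the same pattern. With real psycopg2 connections, conn.get_backend_pid() returns the pid of the server-side backend handling that connection, which is a more reliable way to tell connections apart than id().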