{"id":336,"date":"2017-01-25T05:35:40","date_gmt":"2017-01-24T21:35:40","guid":{"rendered":"http:\/\/vinta.ws\/code\/?p=336"},"modified":"2026-03-17T01:27:18","modified_gmt":"2026-03-16T17:27:18","slug":"parallel-tasks-in-python-concurrent-futures","status":"publish","type":"post","link":"https:\/\/vinta.ws\/code\/parallel-tasks-in-python-concurrent-futures.html","title":{"rendered":"Simple Parallel Tasks in Python: concurrent.futures"},"content":{"rendered":"<p>TL;DR: <code>concurrent.futures<\/code> is well suited to Embarrassingly Parallel tasks. You can write concurrent code with a simple <code>for<\/code> loop.<\/p>\n<p><code>executor.map()<\/code> runs the same function multiple times with different parameters, while <code>executor.submit()<\/code> accepts any function with arbitrary parameters.<\/p>\n<h2>Install<\/h2>\n<p><code>concurrent.futures<\/code> is part of the standard library in Python 3.2+. If you're using an older version of Python, you need to install the <code>futures<\/code> package.<\/p>\n<pre class=\"line-numbers\"><code class=\"language-bash\">$ pip install futures<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"https:\/\/docs.python.org\/3\/library\/concurrent.futures.html\">https:\/\/docs.python.org\/3\/library\/concurrent.futures.html<\/a><\/p>\n<h2><code>executor.map()<\/code><\/h2>\n<p>Use <code>ProcessPoolExecutor<\/code> for CPU-intensive tasks and <code>ThreadPoolExecutor<\/code> for network operations or other I\/O-bound work. The <code>ProcessPoolExecutor<\/code> uses the <code>multiprocessing<\/code> module, which sidesteps the GIL (Global Interpreter Lock), but it also means that only picklable objects can be executed and returned.<\/p>\n<p>In Python 3.5+, <code>executor.map()<\/code> accepts an optional <code>chunksize<\/code> argument. For very long iterables, using a large value for <code>chunksize<\/code> can significantly improve performance compared to the default size of <code>1<\/code>. 
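With <code>ProcessPoolExecutor<\/code>, a larger <code>chunksize<\/code> batches items into fewer inter-process messages, which cuts down on pickling and round-trip overhead.<\/p>\n<p>A minimal sketch of a CPU-bound task (the <code>is_prime<\/code> function and the <code>chunksize<\/code> value here are illustrative, not part of the original example):<\/p>\n<pre class=\"line-numbers\"><code class=\"language-python\">from concurrent.futures import ProcessPoolExecutor\n\ndef is_prime(n):\n    if n in (0, 1):\n        return False\n    for i in range(2, int(n ** 0.5) + 1):\n        if n % i == 0:\n            return False\n    return True\n\nif __name__ == '__main__':\n    # chunksize=100 sends work to the worker processes in batches of 100\n    # instead of one item per round trip\n    with ProcessPoolExecutor() as executor:\n        results = list(executor.map(is_prime, range(10000), chunksize=100))\n    print(sum(results))  # the number of primes below 10000<\/code><\/pre>\n<p>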
With <code>ThreadPoolExecutor<\/code>, <code>chunksize<\/code> has no effect.<\/p>\n<pre class=\"line-numbers\"><code class=\"language-python\">from concurrent.futures import ThreadPoolExecutor\nimport time\n\nimport requests\n\ndef fetch(a):\n    url = 'http:\/\/httpbin.org\/get?a={0}'.format(a)\n    r = requests.get(url)\n    result = r.json()['args']\n    return result\n\nstart = time.time()\n\n# if max_workers is None, ThreadPoolExecutor defaults to the number of processors multiplied by 5\nwith ThreadPoolExecutor(max_workers=None) as executor:\n    for result in executor.map(fetch, range(42)):\n        print('response: {0}'.format(result))\n\nprint('time: {0}'.format(time.time() - start))<\/code><\/pre>\n<p>Try changing <code>max_workers<\/code> to <code>1<\/code> and observe the difference.<\/p>\n<p>ref:<br \/>\n<a href=\"https:\/\/docs.python.org\/3\/library\/concurrent.futures.html#module-concurrent.futures\">https:\/\/docs.python.org\/3\/library\/concurrent.futures.html#module-concurrent.futures<\/a><br \/>\n<a href=\"https:\/\/www.blog.pythonlibrary.org\/2016\/08\/03\/python-3-concurrency-the-concurrent-futures-module\/\">https:\/\/www.blog.pythonlibrary.org\/2016\/08\/03\/python-3-concurrency-the-concurrent-futures-module\/<\/a><br \/>\n<a href=\"https:\/\/masnun.com\/2016\/03\/29\/python-a-quick-introduction-to-the-concurrent-futures-module.html\">https:\/\/masnun.com\/2016\/03\/29\/python-a-quick-introduction-to-the-concurrent-futures-module.html<\/a><br \/>\n<a href=\"https:\/\/dongwm.com\/post\/use-concurrent-futures\/\">https:\/\/dongwm.com\/post\/use-concurrent-futures\/<\/a><\/p>\n<h2><code>executor.submit()<\/code><\/h2>\n<p><code>executor.submit()<\/code> returns a <code>Future<\/code> object. 
A <code>Future<\/code> is an object that encapsulates the asynchronous execution of a function that will finish (or raise an exception) at some point in the future.<\/p>\n<p>The main difference between <code>map<\/code> and <code>as_completed<\/code> is that <code>map<\/code> returns results in the order of the input iterable, while <code>as_completed<\/code> yields each future as soon as it completes, in whatever order they finish. Also, iterating over a <code>map()<\/code> yields the results themselves; iterating over <code>as_completed(futures)<\/code> yields <code>Future<\/code> objects.<\/p>\n<pre class=\"line-numbers\"><code class=\"language-python\">from concurrent.futures import ThreadPoolExecutor, as_completed\nimport time\n\nimport requests\n\ndef fetch(url, timeout):\n    r = requests.get(url, timeout=timeout)\n    data = r.json()['args']\n    return data\n\nstart = time.time()\n\nwith ThreadPoolExecutor(max_workers=20) as executor:\n    futures = {}\n    for i in range(42):\n        url = 'https:\/\/httpbin.org\/get?i={0}'.format(i)\n        future = executor.submit(fetch, url, 60)\n        futures[future] = url\n\n    for future in as_completed(futures):\n        url = futures[future]\n        try:\n            data = future.result()\n        except Exception as exc:\n            print(exc)\n        else:\n            print('fetch {0}, get {1}'.format(url, data))\n\nprint('time: {0}'.format(time.time() - start))<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"https:\/\/docs.python.org\/3\/library\/concurrent.futures.html#future-objects\">https:\/\/docs.python.org\/3\/library\/concurrent.futures.html#future-objects<\/a><\/p>\n<h2>Discussion<\/h2>\n<p>ref:<br \/>\n<a href=\"https:\/\/news.ycombinator.com\/item?id=16737129\">https:\/\/news.ycombinator.com\/item?id=16737129<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>concurrent.futures is well suited to Embarrassingly Parallel tasks. 
You could write concurrent code with a simple for loop.<\/p>\n","protected":false},"author":1,"featured_media":337,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[69,49,2],"class_list":["post-336","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-about-python","tag-concurrency","tag-io","tag-python"],"_links":{"self":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/336","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/comments?post=336"}],"version-history":[{"count":0,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/336\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media\/337"}],"wp:attachment":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media?parent=336"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/categories?post=336"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/tags?post=336"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}