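This post introduces the concepts of synchronous programs, asynchronous programs, and coroutines. The practical part gives sample code and an asynchronous program that downloads images over the network. Async suits I/O-bound work: network-based or file I/O-based tasks.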

Concepts

What is a synchronous program?

A synchronous program is executed one step at a time. Even with conditional branching, loops and function calls, you can still think about the code in terms of taking one execution step at a time. When each step is complete, the program moves on to the next one.

In short, a synchronous program runs step by step. Two common forms are batch-processing programs and command-line programs.

What is an asynchronous program?

Asynchronous programming, or async for short, is a feature of many modern languages that allows a program to juggle multiple operations without waiting or getting hung up on any one of them. It’s a smart way to efficiently handle tasks like network or file I/O, where most of the program’s time is spent waiting for a task to finish.
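A minimal sketch of "juggling multiple operations without waiting", assuming Python 3.7+ (for asyncio.run). Two simulated I/O waits, modeled with asyncio.sleep, are in flight at the same time, so the total time is close to the longest wait rather than the sum of both. The function fake_io is a hypothetical stand-in for real network or file I/O.

```python
import asyncio
import time

async def fake_io(name, delay):
    # Simulated I/O: asyncio.sleep hands control back to the event loop while "waiting"
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    # Both waits run concurrently on one event loop
    results = await asyncio.gather(fake_io("a", 0.2), fake_io("b", 0.3))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")  # about 0.3 s, not 0.5 s
```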

Async is mainly used for network and file I/O. Some examples of tasks that work well with async:

  • Web scraping.
  • Network services (e.g., a web server or framework).
  • Programs that coordinate results from multiple sources that take a long time to return values (for instance, simultaneous database queries).
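For the last case, coordinating several slow sources, here is a minimal sketch assuming Python 3.7+. The query coroutine is a hypothetical stand-in for a database query or API call, simulated with asyncio.sleep. asyncio.gather returns results in the order the queries were passed in, while asyncio.as_completed yields them in the order they finish.

```python
import asyncio

async def query(source, delay, value):
    # Hypothetical stand-in for a slow database query or API call
    await asyncio.sleep(delay)
    return source, value

def make_queries():
    # Three sources that take different amounts of time to answer
    return [query("db1", 0.3, 10), query("db2", 0.1, 20), query("db3", 0.2, 30)]

async def collect_in_order():
    # All queries run at once; results come back in argument order
    return await asyncio.gather(*make_queries())

async def collect_as_ready():
    # Each result is handled as soon as its query finishes
    order = []
    for next_done in asyncio.as_completed(make_queries()):
        source, _ = await next_done
        order.append(source)
    return order

rows = asyncio.run(collect_in_order())
arrival = asyncio.run(collect_as_ready())
print(rows)     # [('db1', 10), ('db2', 20), ('db3', 30)]
print(arrival)  # ['db2', 'db3', 'db1']
```

Either way the total wait is about as long as the slowest source, not the sum of all three.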

What is a coroutine?

A coroutine in Python is a function or method that can pause its execution and resume at a later point. Any task that needs to run asynchronously must be a coroutine. You define a coroutine with async def. Coroutines are awaitable, and they cannot be executed by simply calling the function: the call only returns a coroutine object, whose body runs once it is awaited or scheduled on an event loop.
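A small sketch of the last point, assuming Python 3.7+ (for asyncio.run). The coroutine greet and the calls list are hypothetical and exist only to show when the body actually executes.

```python
import asyncio
import inspect

calls = []

async def greet(name):
    # This body runs only when the coroutine is awaited or scheduled
    calls.append(name)
    return "hello " + name

coro = greet("py")                        # creates a coroutine object; nothing runs yet
print(inspect.iscoroutine(coro), calls)   # True []

result = asyncio.run(coro)                # the event loop drives the coroutine to completion
print(result, calls)                      # hello py ['py']
```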

In Python, asynchronous code is written as coroutines, which an event loop (such as asyncio's) runs.

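A coroutine by itself does not make code run faster, nor does it add asynchronous capability; it only makes code built on an existing asynchronous mechanism (non-blocking operations driven by an event loop) easier to write.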
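The claim above can be checked with a small experiment, assuming Python 3.7+. Both coroutines below wait 0.2 s, but only the one that awaits a non-blocking operation lets the others run in the meantime. This is also why calling a blocking library such as requests inside a coroutine does not by itself make downloads concurrent. The names blocking_wait, cooperative_wait and timed are hypothetical.

```python
import asyncio
import time

async def blocking_wait():
    time.sleep(0.2)           # blocks the whole event loop; no other coroutine can run

async def cooperative_wait():
    await asyncio.sleep(0.2)  # suspends only this coroutine; the loop runs the others

async def timed(factory, n=3):
    # Run n copies concurrently and measure the wall-clock time
    start = time.perf_counter()
    await asyncio.gather(*(factory() for _ in range(n)))
    return time.perf_counter() - start

blocking = asyncio.run(timed(blocking_wait))        # about 0.6 s: waits run one after another
cooperative = asyncio.run(timed(cooperative_wait))  # about 0.2 s: waits overlap
print(f"blocking {blocking:.2f}s, cooperative {cooperative:.2f}s")
```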

Async support differs between Python versions. This post uses Python 3.6.

Examples

Summing numbers, implemented synchronously:

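import time

def sleep():
    # time.sleep blocks: the program does nothing else for 1 second
    time.sleep(1)

def compute_sum(name, numbers):
    total = 0
    for number in numbers:
        sleep()
        total += number
    print('Task {}: Sum = {}\n'.format(name, total))

starttime = time.time()

# The two tasks run strictly one after the other
compute_sum('A', [1, 2])
compute_sum('B', [1, 2, 3])

# Roughly 5 sec: 2 + 3 one-second sleeps in sequence
print("Time: {:.2f} sec".format(time.time() - starttime))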

The same idea implemented asynchronously:

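import asyncio
import time

async def sleep():
    # Report the elapsed time, then yield to the event loop for 1 second
    print(f'Time: {time.time() - start:.2f}')
    await asyncio.sleep(1)

async def compute_sum(name, numbers):
    total = 0
    for number in numbers:
        print(f'Task {name}: Computing {total}+{number}')
        await sleep()  # while this task waits, the other task can run
        total += number
    print(f'Task {name}: Sum = {total}\n')

start = time.time()

# Python 3.6 style: get the event loop explicitly and schedule both coroutines as tasks
# (newer Python versions favor asyncio.run() instead)
loop = asyncio.get_event_loop()
tasks = [
    loop.create_task(compute_sum("A", [1, 2])),
    loop.create_task(compute_sum("B", [1, 2, 3])),
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

end = time.time()
print(f'Time: {end-start:.2f} sec')
# Time: 3.01 sec
# About 3 sec instead of 5: the waits of A and B overlap, so B's three waits dominate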
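On Python 3.7 and later, a minimal sketch of the same program can let asyncio.run manage the loop and asyncio.gather schedule the tasks. The name compute_sum (to avoid shadowing the built-in sum) and passing start as an argument instead of a global are choices made for this sketch, not part of the original code.

```python
import asyncio
import time

async def sleep(start):
    # Report the elapsed time, then yield to the event loop for 1 second
    print(f'Time: {time.time() - start:.2f}')
    await asyncio.sleep(1)

async def compute_sum(name, numbers, start):
    total = 0
    for number in numbers:
        print(f'Task {name}: Computing {total}+{number}')
        await sleep(start)
        total += number
    print(f'Task {name}: Sum = {total}\n')
    return total

async def main():
    start = time.time()
    # gather wraps both coroutines in tasks and waits for both to finish
    totals = await asyncio.gather(compute_sum("A", [1, 2], start),
                                  compute_sum("B", [1, 2, 3], start))
    print(f'Time: {time.time() - start:.2f} sec')  # about 3 sec
    return totals

totals = asyncio.run(main())
```

Because compute_sum returns its total, gather also collects the results, in argument order.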

A real-world example: downloading images asynchronously.

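import os
import ast
import asyncio
import time

import requests

download_dir = "bg_dir2"

def is_image(imgname):
    # Same filter as before: a short suffix ending in 'g'/'G' (jpg, png, jpeg, ...)
    image_suffix = imgname.split(".")[-1]
    return len(image_suffix) < 5 and image_suffix[-1:] in ('g', 'G')

def download(url, path):
    # requests is blocking, so this function is run in a worker thread (see asy_down)
    resp = requests.get(url, allow_redirects=True, timeout=30)
    resp.raise_for_status()
    with open(path, 'wb') as f:
        f.write(resp.content)

async def asy_down(lst):
    os.makedirs(download_dir, exist_ok=True)
    loop = asyncio.get_event_loop()
    jobs = []
    for ll in lst:
        # Skip non-string entries (e.g. NaN from a pandas column)
        if not isinstance(ll, str):
            continue
        # Each entry is a string holding a Python list of URLs, e.g. "['http://.../a.jpg']"
        for url in ast.literal_eval(ll):
            imgname = url.split('/')[-1]
            path = os.path.join(download_dir, imgname)
            if is_image(imgname) and not os.path.exists(path):
                # Hand the blocking download to the default thread pool so downloads overlap
                jobs.append(loop.run_in_executor(None, download, url, path))
    # Wait for all downloads; return_exceptions=True keeps one failure from hiding the rest
    results = await asyncio.gather(*jobs, return_exceptions=True)
    failed = [r for r in results if isinstance(r, Exception)]
    print("{} downloads, {} failed".format(len(results), len(failed)))

# ids is not defined in the original post; it is assumed to be an iterable of strings
# as described above (for example a column read with pandas). Placeholder:
ids = []

start = time.time()

loop = asyncio.get_event_loop()
loop.run_until_complete(asy_down(ids))
loop.close()

print("async down images time: {} seconds".format(time.time() - start))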
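When many images are downloaded at once, a common refinement is to cap how many downloads are in flight with asyncio.Semaphore. The sketch below assumes Python 3.7+ and simulates each download with asyncio.sleep so it runs without network access; fake_download, download_all and MAX_CONCURRENT are hypothetical names, and a real version would replace the sleep with an asynchronous HTTP client or a run_in_executor call.

```python
import asyncio

MAX_CONCURRENT = 3  # hypothetical cap on simultaneous downloads

async def fake_download(url, sem, stats):
    # Hypothetical stand-in for a real download; it never touches the network
    async with sem:  # at most MAX_CONCURRENT coroutines get past this line at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.05)  # simulated network wait
        stats["active"] -= 1
        return url.split('/')[-1]  # file name, derived the same way as imgname above

async def download_all(urls):
    # Create the semaphore inside the running event loop
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    names = await asyncio.gather(*(fake_download(u, sem, stats) for u in urls))
    return names, stats["peak"]

urls = ["http://example.com/img{}.jpg".format(i) for i in range(10)]
names, peak = asyncio.run(download_all(urls))
print(names[:2], peak)  # ['img0.jpg', 'img1.jpg'] 3
```

The peak counter confirms that no more than three simulated downloads ever overlap, while all ten still complete.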