Some Thoughts on Synchronous/Asynchronous vs. Blocking/Non-blocking

For a long time, the concepts of synchronous, asynchronous, blocking, and non-blocking confused me. I used to simply assume that synchronous equals blocking and asynchronous equals non-blocking. Only after recently finishing Chapter 6 of UNIX Network Programming (Volume 1) and going through a large amount of material online did I finally gain a reasonably clear understanding of the question.

Before discussing the differences among these four concepts, we first need to fix the context of the discussion: network IO on Linux. A network IO operation (take read as an example) typically proceeds in two phases. In the first phase, we wait for the data to arrive from the network and to be copied into a buffer inside the kernel (waiting for the data to be ready). In the second phase, the data is copied from the kernel buffer into the application process's buffer in user space (copying the data from the kernel to the process).

1. Blocking/Non-blocking IO
According to a highly-upvoted Zhihu answer, blocking and non-blocking describe the state of a program while it waits for the result of a call (a message or a return value). A blocking call suspends the current thread until the result is returned; the calling thread returns only after it has obtained the result. A non-blocking call does not block the current thread when the result cannot be obtained immediately.

On Linux, a socket file descriptor is in blocking mode by default. In this mode, even if the socket has received no data at all, our read call will block there and not return until data arrives.

If we use fcntl to set the socket's file descriptor to non-blocking mode, then when the socket has received no data, our read call returns an error immediately. At that point the program knows that nothing can be read from this socket for now, so it might as well go do something else and call read again later. When an application process repeatedly calls read on a non-blocking file descriptor in a loop, we call this polling.
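The pattern can be sketched in C roughly as follows (my own sketch, not from the post or its references); the socket descriptor sockfd is assumed to have been created and connected elsewhere.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Switch an existing socket descriptor to non-blocking mode. */
static int set_nonblocking(int sockfd)
{
    int flags = fcntl(sockfd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
}

/* Poll the socket with read(): the call returns immediately with
 * EAGAIN/EWOULDBLOCK when no data has arrived yet. */
static void poll_socket(int sockfd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(sockfd, buf, sizeof(buf));
        if (n >= 0) {
            /* n > 0: got data; n == 0: peer closed the connection. */
            break;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* No data yet: go do something else, then try again. */
            usleep(100 * 1000);
            continue;
        }
        perror("read");   /* some other error */
        break;
    }
}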

Now recall the two phases of network IO. The real difference between blocking and non-blocking lies in the first phase, while waiting for the data; in the second phase there is no difference at all. The program must still wait while the kernel copies the received data into the process buffer. In other words, non-blocking does not mean no blocking whatsoever; it merely means the call does not sit there waiting when the result is not immediately available.

2. Synchronous/Asynchronous IO
For these two terms, POSIX actually gives official definitions.
A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes;
An asynchronous I/O operation does not cause the requesting process to be blocked;
By these definitions, both blocking IO and non-blocking IO are in fact synchronous IO, because both of them block in the second phase, while the data is being copied.

3. IO Multiplexing
At this point some people may ask: what kind of IO does IO multiplexing count as? Unlike the IBM article on this topic, I personally think IO multiplexing is blocking, synchronous IO.

Unlike traditional blocking IO, IO multiplexing can block on multiple socket file descriptors at once. The IO multiplexing functions (select, poll, epoll) return only when any one of those sockets has data to read (or when the call times out). The process can then handle the readable socket file descriptors one by one.
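As an illustration, here is a minimal C sketch of this behavior (my own, not from the referenced articles), assuming sock1 and sock2 are socket descriptors created elsewhere: select blocks on both of them at once, and the subsequent read still waits while the kernel copies the data into the user buffer, which is exactly why IO multiplexing is still synchronous IO.

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

static void multiplex(int sock1, int sock2)
{
    char buf[4096];
    fd_set readfds;
    int maxfd = sock1 > sock2 ? sock1 : sock2;

    for (;;) {
        FD_ZERO(&readfds);
        FD_SET(sock1, &readfds);
        FD_SET(sock2, &readfds);

        /* Phase 1: block until at least one socket is readable (no timeout). */
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) == -1) {
            perror("select");
            return;
        }

        /* Phase 2: read() still waits while the kernel copies the data
         * into buf, so this is still synchronous IO. */
        if (FD_ISSET(sock1, &readfds))
            read(sock1, buf, sizeof(buf));
        if (FD_ISSET(sock2, &readfds))
            read(sock2, buf, sizeof(buf));
    }
}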

4. True Asynchronous IO
The only example I know of here is Linux AIO. Unfortunately, with my limited knowledge I have no real development experience with Linux AIO, so I will not go into detail here; interested readers can try it out for themselves.

References
https://www.zhihu.com/question/19732473/answer/20851256
http://lifeofzjs.com/blog/2014/03/29/sycron-vs-block/
http://blog.csdn.net/historyasamirror/article/details/5778378

How to set processor affinity on Linux using taskset

Today's computers typically have multiple CPU cores, and a process or thread can be executed on any of those cores (as determined by OS scheduling). Performance optimization on such multi-core architectures is therefore an important concern.

Processor affinity, or CPU pinning, is an important technique for this purpose. It enables the binding and unbinding of a process or a thread to a CPU or a range of CPUs, so that the process or thread executes only on the designated CPU(s) rather than on any available CPU.

Processor affinity takes advantage of the fact that remnants of a process that ran on a given processor (for example, data in the cache) may remain in that processor's state after another process has run there. It can therefore effectively reduce cache misses. Also, when two processes communicate intensively via shared memory, scheduling both of them on cores in the same NUMA domain can speed them up.

Now, let's see how to set processor affinity on Linux. There are several approaches. To set the CPU affinity of a process, you can use the taskset program or the sched_setaffinity system call; to set the CPU affinity of a thread, you can use pthread_setaffinity_np or pthread_attr_setaffinity_np. In this article, I want to introduce the usage of taskset.
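Before moving on to taskset, here is a minimal C sketch (my own illustration, not from the original article) of the sched_setaffinity approach mentioned above: it pins the calling process to CPU cores 0 and 4. Note that _GNU_SOURCE is needed for the CPU_* macros on glibc.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(0, &mask);   /* allow CPU core 0 */
    CPU_SET(4, &mask);   /* allow CPU core 4 */

    /* A pid of 0 means "the calling process". */
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}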

taskset is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new COMMAND with a given CPU affinity.

Read the CPU Affinity of a Running Process

To retrieve the CPU affinity of a process, you can use the following command.
taskset -p [PID]

For example, to check the CPU affinity of the process with PID 1141:
$ taskset -p 1141
pid 1141's current affinity mask: ffffffff

The returned value ffffffff is a hexadecimal bitmask, corresponding to 1111 1111 1111 1111 1111 1111 1111 1111 in binary. Each bit in the bitmask corresponds to a CPU core, and a bit value of 1 means that the process is allowed to run on that core. Therefore, in the above example, process 1141 can be executed on CPU cores 0-31.

You may find the bitmask a little hard to read. Don't worry: taskset can also show the CPU affinity as a list of processors instead of a bitmask, using the "-c" option. An example is given as follows.
$ taskset -cp 1141
pid 1141's current affinity list: 0-31

Pin a Running Process to Particular CPU Core(s)

You can also use taskset to pin a running process to particular CPU core(s). The command formats are given as follows.
taskset -p [CORE-LIST] [PID]
taskset -cp [CORE-LIST] [PID]

For example, to assign process 1141 to CPU cores 0 and 4 (the mask 0x11 is 10001 in binary, i.e. bits 0 and 4 are set):
$ taskset -p 0x11 1141

Launch a Program on Specific CPU Cores

taskset also allows us to launch a program on specific CPU cores. The command format is given as follows.
taskset [COREMASK] [EXECUTABLE]

For example, to launch a ping program (destination 8.8.8.8) on CPU core 0, use the following command.
$ taskset 0x1 ping 8.8.8.8

References
https://en.wikipedia.org/wiki/Processor_affinity
http://xmodulo.com/run-program-process-specific-cpu-cores-linux.html
https://linux.die.net/man/1/taskset

How many physical and logical CPU cores in your computer

Physical cores are just that, physical cores within the CPU.

Logical cores refer to the ability of a single physical core to run two or more threads simultaneously (through hyper-threading).

We can get the number of physical and logical CPU cores using the lscpu command on Linux, as follows.

$ lscpu
Architecture:           x86_64
CPU op-mode(s):         32-bit, 64-bit
Byte Order:             Little Endian
CPU(s):                 32
On-line CPU(s) list:    0-31
Thread(s) per core:     2
Core(s) per socket:     8
Socket(s):              2

In the above example, the computer has 2 CPU sockets.
Socket(s):              2

Each CPU socket has 8 physical cores. Hence, the computer has 16 physical cores in total.
Core(s) per socket:     8

Each physical CPU core can run 2 threads.
Thread(s) per core:     2

Each of these hardware threads appears to the operating system as a logical core. The total number of logical cores = CPU sockets × physical cores per socket × threads per physical core. Therefore, the computer has 2 × 8 × 2 = 32 logical cores in total.
CPU(s):                 32
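As a side note, a program can query this count itself. The following minimal C sketch (my own addition, assuming glibc's _SC_NPROCESSORS_ONLN extension to sysconf) prints the number of online logical CPUs.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Number of logical CPUs currently online (32 on the machine above). */
    long logical = sysconf(_SC_NPROCESSORS_ONLN);
    printf("online logical CPUs: %ld\n", logical);
    return 0;
}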

References
https://unix.stackexchange.com/questions/88283/so-what-are-logical-cpu-cores-as-opposed-to-physical-cpu-cores
https://en.wikipedia.org/wiki/Hyper-threading

How to Write a Program's Usage Message

When developing a program, writing a usage message is a very important step; without it, users who get the program have no idea how to use it. I used to write usage messages however I pleased, until today, when I read the article linked in the references below and realized that many of my habits were completely wrong. Below are the key points I have summarized for writing a program's usage message.

1. Each command-line argument in the usage message must be a single word without spaces. If an argument consists of multiple words, join them with underscores into one word. For example, do not write number of generations; write number_of_generations instead.

2. Do not wrap an argument in parentheses (), square brackets [], angle brackets <>, curly braces {} or quotation marks "". For example, do not write an argument as [number_of_generations] or "number_of_generations". Square brackets and angle brackets have special meanings, and the other symbols are simply ugly.

3. Arguments do not need to be separated by commas; a single space is enough. For example, do not write usage: myprog input_file, output_file

4. Do not use symbolic characters inside an argument. For example, do not write number_of_generations as #generations.

5. If the meaning of an argument needs to be explained, do so on new lines below the first line. Here is an example.
usage: myprog infile outfile
      infile: the input file
      outfile: the output file

6. Optional arguments should be enclosed in square brackets []. Here are two examples.
usage: chess [-strength r]
      -strength r: playing strength in approximate rating (800-3000)

usage: average n1 [n2 …]
      n1, n2, etc.: numbers between 1 and 10
      Maximum length of the list: 100

7. The program name should not be hard-coded in the source; use argv[0] instead, which is more flexible.

8. The usage message should be printed to stderr rather than stdout (see the sketch below).
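To make the last two points concrete, here is a minimal C sketch (my own, not taken from the referenced article): the program name comes from argv[0] and the usage text goes to stderr.

#include <stdio.h>
#include <stdlib.h>

/* Print the usage message to stderr and exit; the program name is taken
 * from argv[0] instead of being hard-coded. */
static void usage(const char *progname)
{
    fprintf(stderr, "usage: %s infile outfile\n", progname);
    fprintf(stderr, "      infile: the input file\n");
    fprintf(stderr, "      outfile: the output file\n");
    exit(1);
}

int main(int argc, char *argv[])
{
    if (argc != 3)
        usage(argv[0]);
    /* ... normal processing of argv[1] and argv[2] ... */
    return 0;
}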

References
http://courses.cms.caltech.edu/cs11/material/general/usage.html

Populate a file with random data in Linux

On Linux we can use /dev/random or /dev/urandom to generate pseudorandom data. /dev/random may block, while /dev/urandom does not. To control the file size, we use the head -c N command to take only the first N bytes.

To write N bytes worth of random data into myfile, we can use the following command:
head -c N </dev/urandom > myfile

Note that we can also use the K, M, or G suffixes for N.
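For comparison, here is a minimal C sketch (my own addition, not from the referenced posts) that does the same thing programmatically: it reads bytes from /dev/urandom and writes them into a file named myfile. The 1 MiB size is just a placeholder.

#include <stdio.h>

int main(void)
{
    const size_t total = 1024 * 1024;   /* placeholder size: 1 MiB */
    char buf[4096];
    size_t left = total;
    FILE *src = fopen("/dev/urandom", "rb");
    FILE *dst = fopen("myfile", "wb");

    if (src == NULL || dst == NULL) {
        perror("fopen");
        return 1;
    }
    while (left > 0) {
        /* Copy at most sizeof(buf) bytes per iteration. */
        size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
        if (fread(buf, 1, chunk, src) != chunk ||
            fwrite(buf, 1, chunk, dst) != chunk) {
            perror("copying random data");
            return 1;
        }
        left -= chunk;
    }
    fclose(dst);
    fclose(src);
    return 0;
}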

References
https://en.wikipedia.org/wiki//dev/random
https://unix.stackexchange.com/questions/33629/how-can-i-populate-a-file-with-random-data