Glusterfs NFS Module Source Code Analysis (Final Part): The RPC Implementation of the NFS Protocol and NFS Protocol Contents

My Sina Weibo: http://weibo.com/freshairbrucewoo

Feel free to get in touch and exchange ideas so we can all improve together.

6. The RPC Implementation of the NFS Protocol

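Because the port the NFS server listens on is not fixed at startup, the NFS server registers its port with the RPC service (the portmapper), and clients learn the server's listening port through an RPC request. This section walks through the whole RPC processing path. Suppose a client's RPC request has arrived at the server. From the earlier analysis of NFS protocol initialization, we know that all data read/write events are handled in nfs_rpcsvc_conn_data_handler. Since this is request data sent by the client, the EPOLLIN branch runs, and that handling lives in nfs_rpcsvc_conn_data_poll_in, which is implemented as follows: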

int nfs_rpcsvc_conn_data_poll_in (rpcsvc_conn_t *conn)
{
        ssize_t         dataread = -1;
        size_t          readsize = 0;
        char            *readaddr = NULL;
        int             ret = -1;

        readaddr = nfs_rpcsvc_record_read_addr (&conn->rstate); // address in the RPC record where the next bytes go
        readsize = nfs_rpcsvc_record_read_size (&conn->rstate); // number of bytes the record still needs at this stage
        dataread = nfs_rpcsvc_socket_read (conn->sockfd, readaddr, readsize); // read the record data from the socket

        if (dataread > 0)
                ret = nfs_rpcsvc_record_update_state (conn, dataread); // advance the record state with what was read

        return ret;
}
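The read address and size tracked by nfs_rpcsvc_record_read_addr and nfs_rpcsvc_record_read_size follow ONC RPC record marking over TCP (RFC 5531, Record Marking Standard): each fragment starts with a 4-byte big-endian word whose top bit marks the last fragment of a record and whose low 31 bits give the fragment length. Below is a minimal C sketch of decoding and encoding that word; the names frag_hdr_t, decode_frag_hdr and encode_frag_hdr are hypothetical and not taken from GlusterFS.

```c
#include <assert.h>
#include <stdint.h>

#define RPC_LAST_FRAG_BIT 0x80000000u   /* top bit: last fragment of the record */
#define RPC_FRAG_LEN_MASK 0x7fffffffu   /* low 31 bits: fragment length */

/* Hypothetical decoded form of a record-marking header. */
typedef struct {
    int      last;     /* 1 if this fragment ends the record */
    uint32_t length;   /* payload bytes that follow the header */
} frag_hdr_t;

/* Decodes the 4 header bytes, which are in network (big-endian) order. */
frag_hdr_t decode_frag_hdr(const unsigned char buf[4])
{
    uint32_t word = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                    ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    frag_hdr_t h;
    h.last = (word & RPC_LAST_FRAG_BIT) != 0;
    h.length = word & RPC_FRAG_LEN_MASK;
    return h;
}

/* Encodes a header for a fragment of the given length. */
void encode_frag_hdr(unsigned char buf[4], int last, uint32_t length)
{
    assert(length <= RPC_FRAG_LEN_MASK);   /* the length must fit in 31 bits */
    uint32_t word = length | (last ? RPC_LAST_FRAG_BIT : 0u);
    buf[0] = (unsigned char)(word >> 24);
    buf[1] = (unsigned char)(word >> 16);
    buf[2] = (unsigned char)(word >> 8);
    buf[3] = (unsigned char)word;
}
```

For example, the header bytes 80 00 00 28 describe the last fragment of a record, carrying 40 bytes.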

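nfs_rpcsvc_conn_data_poll_in first uses the receive stage stored in the RPC record state to decide what to read next. There are two kinds of data: the fragment header and the RPC message proper. The length of the RPC message is given in the header, so reception normally goes header first, then the RPC call message; anything else is treated as an erroneous message. The function reads that many bytes from the socket into the record state structure and then hands them to nfs_rpcsvc_record_update_state, which drives the rest of the RPC processing: XDR (External Data Representation) decoding, finding the procedure the message calls, and executing it. Its implementation is as follows: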

int nfs_rpcsvc_record_update_state (rpcsvc_conn_t *conn, ssize_t dataread)
{
        rpcsvc_record_state_t   *rs = NULL;
        rpcsvc_t                *svc = NULL;

        rs = &conn->rstate;

        if (nfs_rpcsvc_record_readfraghdr(rs)) // is the record state expecting a fragment header?
                dataread = nfs_rpcsvc_record_update_fraghdr (rs, dataread); // consume the fragment header

        if (nfs_rpcsvc_record_readfrag(rs)) { // is it now expecting fragment data?
                if ((dataread > 0) && (nfs_rpcsvc_record_vectored (rs))) { // is the fragment read as vectored data?
                        dataread = nfs_rpcsvc_handle_vectored_frag (conn, dataread); // handle the vectored fragment data
                } else if (dataread > 0) {
                        dataread = nfs_rpcsvc_record_update_frag (rs, dataread); // append the bytes to the record's fragment data
                }
        }

        // Back to expecting a header and the last fragment has arrived: the record is complete.
        if ((nfs_rpcsvc_record_readfraghdr(rs)) && (rs->islastfrag)) {
                nfs_rpcsvc_handle_rpc_call (conn); // process the RPC call
                svc = nfs_rpcsvc_conn_rpcsvc (conn); // get the rpcsvc instance that owns this connection
                nfs_rpcsvc_record_init (rs, svc->ctx->iobuf_pool); // reset the record state for the next record
        }

        return 0;
}
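The header-then-fragment logic of nfs_rpcsvc_record_update_state can be modelled as a small state machine that accepts bytes in arbitrarily sized chunks, as they come off a non-blocking socket, and delivers a record only once its last fragment is complete. The sketch below assumes a fixed 4096-byte maximum record and a single in-memory buffer; all names are hypothetical. GlusterFS itself reads into iobufs and handles vectored fragments separately, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RECORD 4096   /* assumed upper bound on a reassembled record */

typedef void (*record_cb)(const unsigned char *rec, size_t len, void *arg);

/* Hypothetical reassembly state, analogous in spirit to rpcsvc_record_state_t. */
typedef struct {
    int           in_header;   /* 1 while collecting a 4-byte fragment header */
    unsigned char hdr[4];
    size_t        hdr_got;     /* header bytes collected so far */
    uint32_t      frag_left;   /* payload bytes still missing from this fragment */
    int           last_frag;   /* current fragment carries the last-fragment bit */
    unsigned char rec[MAX_RECORD];
    size_t        rec_len;     /* bytes of the record assembled so far */
} rec_state_t;

void rec_init(rec_state_t *rs)
{
    memset(rs, 0, sizeof *rs);
    rs->in_header = 1;
}

/* Feeds len bytes (e.g. one read() result) and calls cb once per complete
   record. Returns 0, or -1 if a record would exceed MAX_RECORD. */
int rec_feed(rec_state_t *rs, const unsigned char *data, size_t len,
             record_cb cb, void *arg)
{
    assert(rs != NULL && cb != NULL);
    while (len > 0) {
        if (rs->in_header) {
            size_t n = 4 - rs->hdr_got;
            if (n > len)
                n = len;
            memcpy(rs->hdr + rs->hdr_got, data, n);
            rs->hdr_got += n;
            data += n;
            len -= n;
            if (rs->hdr_got < 4)
                break;                              /* wait for more header bytes */
            uint32_t word = ((uint32_t)rs->hdr[0] << 24) | ((uint32_t)rs->hdr[1] << 16) |
                            ((uint32_t)rs->hdr[2] << 8)  |  (uint32_t)rs->hdr[3];
            rs->last_frag = (word & 0x80000000u) != 0;
            rs->frag_left = word & 0x7fffffffu;
            if (rs->frag_left > MAX_RECORD - rs->rec_len)
                return -1;
            rs->hdr_got = 0;
            rs->in_header = 0;                      /* header done: read the body */
        } else {
            size_t n = rs->frag_left < len ? rs->frag_left : len;
            memcpy(rs->rec + rs->rec_len, data, n);
            rs->rec_len += n;
            rs->frag_left -= (uint32_t)n;
            data += n;
            len -= n;
        }
        if (!rs->in_header && rs->frag_left == 0) {
            rs->in_header = 1;                      /* next: another fragment header */
            if (rs->last_frag) {
                cb(rs->rec, rs->rec_len, arg);      /* whole record: dispatch it */
                rs->rec_len = 0;
            }
        }
    }
    return 0;
}

/* Example callback: counts records and keeps a copy of the last one. */
typedef struct {
    int           records;
    size_t        last_len;
    unsigned char last[MAX_RECORD];
} collector_t;

void collect(const unsigned char *rec, size_t len, void *arg)
{
    collector_t *c = arg;
    c->records++;
    c->last_len = len;
    memcpy(c->last, rec, len);
}
```

Feeding the same stream one byte at a time or in a single call produces the same record, which is the property a reader driven by short, non-blocking reads needs.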

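nfs_rpcsvc_record_update_state first reads the header of the protocol message and, once the header has been read, updates the RPC record state; it then uses the updated state to keep reading what follows the header. The data arrives in two phases. The first thing to arrive is normally a header, which records the length of the data to read next, that is, the length of the RPC call message proper, so the next data to arrive is the message itself. Each kind is handled differently: handling a header mainly initializes and prepares for receiving the message proper (for example, its length and type). A header alone never triggers the RPC call handler, because the call can only be processed once the RPC call message has been received. Next we look at nfs_rpcsvc_handle_rpc_call, the core of RPC call handling, which is implemented as follows: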

int nfs_rpcsvc_handle_rpc_call (rpcsvc_conn_t *conn)
{
        rpcsvc_actor_t          *actor = NULL;
        rpcsvc_request_t        *req = NULL;
        int                     ret = -1;

        req = nfs_rpcsvc_request_create (conn); // allocate and fill an RPC request object

        if (!nfs_rpcsvc_request_accepted (req)) { // was the request accepted?
                /* error handling omitted in this excerpt */
        }

        actor = nfs_rpcsvc_program_actor (req); // look up the descriptor of the requested procedure

        if ((actor) && (actor->actor)) {
                THIS = nfs_rpcsvc_request_actorxl (req); // set THIS to the xlator associated with the request's program
                nfs_rpcsvc_conn_ref (conn);              // take a reference on the connection
                ret = actor->actor (req);                // invoke the procedure
        }

        return ret;
}
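nfs_rpcsvc_handle_rpc_call takes a reference on the connection before invoking the actor. A plausible reason is that NFS actors usually reply asynchronously, from the callback of a file operation wound down the xlator graph, so the connection must stay valid even if the socket is closed in the meantime. The sketch below illustrates that pin-then-release pattern with hypothetical types (conn_t, request_t and the functions are not GlusterFS code). It is single-threaded; a real server would protect the counter with a lock or atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical connection and request types. */
typedef struct conn {
    int refcount;
} conn_t;

typedef struct request {
    conn_t *conn;
} request_t;

conn_t *conn_new(void)
{
    conn_t *c = calloc(1, sizeof *c);
    if (c)
        c->refcount = 1;          /* reference held by the poll loop */
    return c;
}

conn_t *conn_ref(conn_t *c)
{
    c->refcount++;
    return c;
}

/* Drops one reference; frees the connection and returns 1 when it was the last. */
int conn_unref(conn_t *c)
{
    assert(c->refcount > 0);
    if (--c->refcount > 0)
        return 0;
    free(c);
    return 1;
}

/* Dispatch: pin the connection before the actor runs, since the actor may
   send its reply later, after this function has returned. */
void dispatch(request_t *req, conn_t *c)
{
    req->conn = conn_ref(c);
    /* ... the actor would be invoked here ... */
}

/* Deferred reply path: once the reply has been sent, release the pin. */
int reply_done(request_t *req)
{
    int freed = conn_unref(req->conn);
    req->conn = NULL;
    return freed;
}
```

If the poll loop drops its reference because the client disconnected, the connection survives until reply_done releases the last reference.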

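nfs_rpcsvc_handle_rpc_call first creates an RPC request object from the connection state object, then uses the request to obtain the descriptor of the RPC procedure being called, and finally uses that descriptor to execute the specific remote call. Now let's look at how the request object is created from the connection state object; nfs_rpcsvc_request_create is implemented as follows: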

rpcsvc_request_t * nfs_rpcsvc_request_create (rpcsvc_conn_t *conn)
{
        char                    *msgbuf = NULL;
        struct rpc_msg          rpcmsg;
        struct iovec            progmsg;        /* RPC Program payload */
        rpcsvc_request_t        *req = NULL;
        int                     ret = -1;
        rpcsvc_program_t        *program = NULL;

        /* return-value checks on ret are omitted in this excerpt */
        nfs_rpcsvc_alloc_request (conn, req); // take a request object from the memory pool and zero it
        msgbuf = iobuf_ptr (conn->rstate.activeiob); // buffer of the active iobuf holding the received record

        // decode the XDR-encoded call into an rpc_msg plus the program payload
        ret = nfs_xdr_to_rpc_call (msgbuf, conn->rstate.recordsize, &rpcmsg,
                                   &progmsg, req->cred.authdata, req->verf.authdata);
        nfs_rpcsvc_request_init (conn, &rpcmsg, progmsg, req); // initialize the request from the decoded message

        if (nfs_rpc_call_rpcvers (&rpcmsg) != 2) { // only RPC version 2 is supported
                /* error handling omitted in this excerpt */
        }

        ret = __nfs_rpcsvc_program_actor (req, &program); // find the program matching the program and version numbers
        req->program = program;

        ret = nfs_rpcsvc_authenticate (req); // run the authentication check
        if (ret == RPCSVC_AUTH_REJECT) {     // rejected by authentication?
                /* error handling omitted in this excerpt */
        }

        return req;
}
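nfs_xdr_to_rpc_call decodes the fixed part of an ONC RPC call message. Per RFC 5531, a call starts with the xid, the message type (0 for CALL), the RPC version (which must be 2, matching the nfs_rpc_call_rpcvers check above), and the program, version and procedure numbers, followed by the credential and verifier, each encoded as a flavor, a body length of at most 400 bytes and the body padded to a multiple of 4. The sketch below decodes those fields by hand from a reassembled record (without record-marking headers) and reports where the program arguments begin. It is not the GlusterFS decoder, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RPC_CALL        0    /* msg_type for a call */
#define RPC_VERSION     2    /* the only ONC RPC version */
#define MAX_AUTH_BYTES  400  /* limit on credential/verifier bodies */

/* Hypothetical decoded view of an RPC call header. */
typedef struct {
    uint32_t xid, mtype, rpcvers, prog, vers, proc;
    uint32_t cred_flavor, verf_flavor;
    size_t   args_offset;   /* where the program-specific arguments start */
} call_hdr_t;

/* Reads one big-endian 32-bit word at *off and advances *off. */
static int get_u32(const unsigned char *buf, size_t len, size_t *off, uint32_t *out)
{
    assert(*off <= len);
    if (len - *off < 4)
        return -1;
    const unsigned char *p = buf + *off;
    *out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    *off += 4;
    return 0;
}

/* Skips an opaque_auth: flavor, body length, body padded to 4 bytes. */
static int skip_auth(const unsigned char *buf, size_t len, size_t *off, uint32_t *flavor)
{
    uint32_t blen;
    if (get_u32(buf, len, off, flavor) || get_u32(buf, len, off, &blen))
        return -1;
    size_t padded = ((size_t)blen + 3) & ~(size_t)3;
    if (blen > MAX_AUTH_BYTES || len - *off < padded)
        return -1;
    *off += padded;
    return 0;
}

/* Returns 0 on success, -1 if truncated or malformed, -2 if the message
   is not an RPC version 2 call. */
int decode_call_hdr(const unsigned char *buf, size_t len, call_hdr_t *h)
{
    size_t off = 0;

    if (get_u32(buf, len, &off, &h->xid) || get_u32(buf, len, &off, &h->mtype) ||
        get_u32(buf, len, &off, &h->rpcvers))
        return -1;
    if (h->mtype != RPC_CALL || h->rpcvers != RPC_VERSION)
        return -2;
    if (get_u32(buf, len, &off, &h->prog) || get_u32(buf, len, &off, &h->vers) ||
        get_u32(buf, len, &off, &h->proc))
        return -1;
    if (skip_auth(buf, len, &off, &h->cred_flavor) ||
        skip_auth(buf, len, &off, &h->verf_flavor))
        return -1;
    h->args_offset = off;
    return 0;
}
```

A NULL call to NFS version 3 (program 100003, procedure 0) with empty AUTH_NONE credential and verifier decodes to args_offset 40. A real server answers an RPC version other than 2 with a MSG_DENIED reply carrying RPC_MISMATCH rather than dropping it silently.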

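nfs_rpcsvc_request_create thus yields the descriptor of the RPC program with the correct version. From that program descriptor, the descriptor of the specific remote procedure is then obtained by the following function: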

rpcsvc_actor_t * nfs_rpcsvc_program_actor (rpcsvc_request_t *req)
{
        int                     err = SYSTEM_ERR;
        rpcsvc_actor_t          *actor = NULL;

        /* the bounds check on procnum and the code that records err are omitted in this excerpt */
        actor = &req->program->actors[req->procnum]; // index the program's actor table by procedure number

        return actor;
}
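nfs_rpcsvc_program_actor indexes the actor table directly with req->procnum; the bounds check is not part of the excerpt above. The sketch below shows the kind of lookup that __nfs_rpcsvc_program_actor and nfs_rpcsvc_program_actor perform between them, assuming a flat array of registered programs, and maps each failure to the matching RFC 5531 accept_stat value (PROG_UNAVAIL, PROG_MISMATCH, PROC_UNAVAIL). The types, find_actor and the demo table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* accept_stat values defined by RFC 5531 */
enum accept_stat {
    SUCCESS = 0, PROG_UNAVAIL = 1, PROG_MISMATCH = 2,
    PROC_UNAVAIL = 3, GARBAGE_ARGS = 4, SYSTEM_ERR = 5
};

typedef int (*actor_fn)(void *req);

/* Hypothetical actor and program descriptors. */
typedef struct {
    const char *procname;
    actor_fn    fn;
} actor_t;

typedef struct {
    uint32_t       prognum;    /* e.g. 100003 for NFS */
    uint32_t       progver;
    const actor_t *actors;
    size_t         numactors;
} program_t;

/* Finds the actor for (prog, vers, proc). On failure returns NULL and
   sets *err to the accept_stat the reply should carry. */
const actor_t *find_actor(const program_t *progs, size_t nprogs,
                          uint32_t prog, uint32_t vers, uint32_t proc,
                          enum accept_stat *err)
{
    int prog_found = 0;

    assert(err != NULL);
    for (size_t i = 0; i < nprogs; i++) {
        if (progs[i].prognum != prog)
            continue;
        prog_found = 1;
        if (progs[i].progver != vers)
            continue;                          /* another version may match */
        if (proc >= progs[i].numactors || progs[i].actors[proc].fn == NULL) {
            *err = PROC_UNAVAIL;               /* out of range or no handler */
            return NULL;
        }
        *err = SUCCESS;
        return &progs[i].actors[proc];
    }
    *err = prog_found ? PROG_MISMATCH : PROG_UNAVAIL;
    return NULL;
}

/* Demo table: NFSv3 procedure 0 (NULL) has a handler, procedure 1 (GETATTR)
   is deliberately left without one. */
int nfs3_null_actor(void *req)
{
    (void)req;
    return 0;
}

static const actor_t demo_nfs3_actors[] = {
    { "NULL", nfs3_null_actor },
    { "GETATTR", NULL },
};

static const program_t demo_programs[] = {
    { 100003, 3, demo_nfs3_actors, 2 },
};
```

Distinguishing PROG_MISMATCH from PROG_UNAVAIL lets the client tell "this program exists, but not in that version" apart from "no such program".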

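The actor returned by nfs_rpcsvc_program_actor goes back to the caller, which then executes the remote procedure call. At this point a complete RPC call, and with it one NFS request, has been served, and the NFS server waits for the next request. The path has plenty of twists and turns and takes quite a long detour. The following figure describes the whole process: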

Appendix 1: NFS Protocol Family

The NFS protocol suite includes the following protocols:
MNTv1	Mount protocol version 1, for NFS version 2
MNTv3	Mount protocol version 3, for NFS version 3
NFS2	Sun Network File system version 2
NFS3	Sun Network File system version 3
NFSv4	Sun Network File system version 4
NLMv4	Network Lock Manager version 4
NSMv1	Network Status Monitor protocol

MNTv1: ftp://ftp.rfc-editor.org/in-notes/rfc1094.txt
    The Mount protocol version 1 for NFS version 2 (MNTv1) is separate from, but related to, the NFS protocol. It provides operating system specific services to get the NFS off the ground: it looks up server path names, validates user identity, and checks access permissions. Clients use the Mount protocol to get the first file handle, which allows them entry into a remote filesystem.
    The Mount protocol is kept separate from the NFS protocol to make it easy to plug in new access checking and validation methods without changing the NFS server protocol.
    Notice that the protocol definition implies stateful servers because the server maintains a list of clients' mount requests. The Mount list information is not critical for the correct functioning of either the client or the server. It is intended for advisory use only, for example, to warn possible clients when a server is going down.
    Version one of the Mount protocol is used with version two of the NFS protocol. The only information communicated between these two protocols is the "fhandle" structure. The header structure is as follows:

Field	Octets
Directory Path Length	1-4
Directory Path Name	5-N

Directory Path Length: The directory path length.
Directory Path Name: The directory path name.
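The length-then-name layout above is the XDR encoding of the dirpath string (at most MNTPATHLEN = 1024 bytes in RFC 1094): a 4-byte big-endian length followed by the bytes, zero-padded to a multiple of 4 as required by XDR (RFC 4506). The padding does not appear in the table. Below is a minimal encode/decode sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MNTPATHLEN 1024   /* maximum dirpath length in RFC 1094 */

/* Encodes path as an XDR string; returns bytes written, or 0 on error. */
size_t xdr_encode_dirpath(const char *path, unsigned char *out, size_t outlen)
{
    assert(path != NULL && out != NULL);
    size_t n = strlen(path);
    size_t padded = (n + 3) & ~(size_t)3;
    if (n > MNTPATHLEN || outlen < 4 + padded)
        return 0;
    out[0] = (unsigned char)(n >> 24);           /* length, big-endian */
    out[1] = (unsigned char)(n >> 16);
    out[2] = (unsigned char)(n >> 8);
    out[3] = (unsigned char)n;
    memcpy(out + 4, path, n);                    /* the path bytes */
    memset(out + 4 + n, 0, padded - n);          /* zero padding to a 4-byte boundary */
    return 4 + padded;
}

/* Decodes an XDR string into a NUL-terminated path; returns bytes
   consumed, or 0 on error. */
size_t xdr_decode_dirpath(const unsigned char *in, size_t inlen,
                          char *path, size_t pathsize)
{
    if (inlen < 4)
        return 0;
    uint32_t n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                 ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    size_t padded = ((size_t)n + 3) & ~(size_t)3;
    if (n > MNTPATHLEN || inlen - 4 < padded || pathsize <= n)
        return 0;
    memcpy(path, in + 4, n);
    path[n] = '\0';
    return 4 + padded;
}
```

The 7-byte path "/export" therefore occupies 12 octets on the wire: 4 for the length, 7 for the name and 1 pad byte. The other length-plus-data layouts in this appendix (the MNTv3 directory path, the NLM cookie, the NSM host name) use the same length-prefixed, padded XDR form.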

MNTv3: ftp://ftp.rfc-editor.org/in-notes/rfc1813.txt
    The supporting Mount protocol version 3 for NFS version 3 protocol performs the operating system-specific functions that allow clients to attach remote directory trees to a point within the local file system. The Mount process also allows the server to grant remote access privileges to a restricted set of clients via export control.
    The Lock Manager provides support for file locking when used in the NFS environment. The Network Lock Manager (NLM) protocol isolates the inherently stateful aspects of file locking into a separate protocol. A complete description of the above protocols and their implementation is to be found in [X/OpenNFS].
    The normative text is the description of the RPC procedures and arguments and results, which defines the over-the-wire protocol, and the semantics of those procedures. The material describing implementation practice aids the understanding of the protocol specification and describes some possible implementation issues and solutions. It is not possible to describe all implementations and the UNIX operating system implementation of the NFS version 3 protocol is most often used to provide examples. The structure of the protocol is as follows.

Field	Octets
Directory Path Length	1-4
Directory Path Name	5-N

Directory Path Length: The directory path length.
Directory Path Name: The directory path name.

NFS2: ftp://ftp.rfc-editor.org/in-notes/rfc1094.txt
The Sun Network File system (NFS version 2) protocol provides transparent remote access to shared files across networks. The NFS protocol is designed to be portable across different machines, operating systems, network architectures, and transport protocols. This portability is achieved through the use of Remote Procedure Call (RPC) primitives built on top of an eXternal Data Representation (XDR). Implementations already exist for a variety of machines, from personal computers to supercomputers.
The supporting Mount protocol allows the server to hand out remote access privileges to a restricted set of clients. It performs the operating system-specific functions that allow clients, for example, to attach remote directory trees to a local file system. The protocol header is as follows:

Field	Octets
File info/Directory info	1-N

File info/Directory info: The file info or directory info.

NFS3: ftp://ftp.rfc-editor.org/in-notes/rfc1813.txt
Version 3 of the NFS protocol addresses new requirements. For instance, the need to support larger files and file systems has prompted extensions to allow 64-bit file sizes and offsets. The revision enhances security by adding support for an access check to be done on the server. Performance modifications are of three types:

1 The number of over-the-wire packets for a given set of file operations is reduced by returning file attributes on every operation, thus decreasing the number of calls to get modified attributes.

2 The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns.

3 Limitations on transfer sizes have been relaxed.
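Point 2 can be made concrete. In RFC 1813 each WRITE carries a stable_how level (UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2), and WRITE and COMMIT replies return an 8-byte write verifier. If the verifier returned by COMMIT differs from the one returned by an unstable WRITE, the server has changed state in between (for example by rebooting) and the client must send the data again. The sketch below shows only that client-side check; pending_write_t and must_resend are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFS3_WRITEVERFSIZE 8

enum stable_how { UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2 };

/* Hypothetical client-side record of a write the server may not have
   made stable yet. */
typedef struct {
    uint64_t        offset;                    /* v3 allows 64-bit offsets */
    uint32_t        count;                     /* bytes written */
    enum stable_how stable;                    /* level requested in the WRITE call */
    unsigned char   verf[NFS3_WRITEVERFSIZE];  /* verifier from the WRITE reply */
} pending_write_t;

/* Returns 1 if the data must be written again because the COMMIT
   verifier differs from the one the WRITE reply returned. */
int must_resend(const pending_write_t *w,
                const unsigned char commit_verf[NFS3_WRITEVERFSIZE])
{
    assert(w != NULL && commit_verf != NULL);
    return memcmp(w->verf, commit_verf, NFS3_WRITEVERFSIZE) != 0;
}
```

Until a matching COMMIT verifier is seen, the client has to keep the unstable data so it can resend it.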

The ability to support multiple versions of a protocol in RPC will allow implementors of the NFS version 3 protocol to define clients and servers that provide backward compatibility with the existing installed base of NFS version 2 protocol implementations.
The extensions described here represent an evolution of the existing NFS protocol, and most of the design features of the earlier NFS protocol persist. The protocol header structure is as follows:

Field	Octets
Object info/File info/Directory info Length	1-4
Object info/File info/Directory info Name	5-N

Object info/File info/Directory info Length: The information length in octets.
Object info/File info/Directory info Name: The information value (string).

NFSv4: ftp://ftp.rfc-editor.org/in-notes/rfc3010.txt
NFS (Network File System) version 4 is a distributed file system protocol based on NFS protocol versions 2 [RFC1094] and 3 [RFC1813]. Unlike earlier versions, the NFS version 4 protocol supports traditional file access while integrating support for file locking and the mount protocol. In addition, support for strong security (and its negotiation), compound operations, client caching, and internationalization have been added. Attention has also been applied to making NFS version 4 operate well in an Internet environment.
The goals of the NFS version 4 revision are as follows:

· Improved access and good performance on the Internet.

· Strong security with negotiation built into the protocol.

· Good cross-platform interoperability.

· Designed for protocol extensions.

    The general file system model used for the NFS version 4 protocol is the same as previous versions. The server file system is hierarchical with the regular files contained within being treated as opaque byte streams. In a slight departure, file and directory names are encoded with UTF-8 to deal with the basics of internationalization.
    A separate protocol to provide for the initial mapping between path name and filehandle is no longer required. Instead of using the older MOUNT protocol for this mapping, the server provides a ROOT filehandle that represents the logical root or top of the file system tree provided by the server.
    The protocol header is as follows:

Field	Octets
Tag Length	1-4
Tag (length given by Tag Length)	5-N
Minor Version	N+1-N+4
Operation Argument	N+5-N+8

Tag Length: The length in bytes of the tag.
Tag: Defined by the implementor.
Minor Version: Each minor version number will correspond to an RFC. Minor version zero corresponds to NFSv4.
Operation Argument: The operation to be executed by the protocol.
Operation Argument Values: The operation argument value can be one of the following:

Value	Name
3	ACCESS
4	CLOSE
5	COMMIT
6	CREATE
7	DELEGPURGE
8	DELEGRETURN
9	GETATTR
10	GETFH
11	LINK
12	LOCK
13	LOCKT
14	LOCKU
15	LOOKUP
16	LOOKUPP
17	NVERIFY
18	OPEN
19	OPENATTR
20	OPEN_CONFIRM
21	OPEN_DOWNGRADE
22	PUTFH
23	PUTPUBFH
24	PUTROOTFH
25	READ
26	READDIR
27	READLINK
28	REMOVE
29	RENAME
30	RENEW
31	RESTOREFH
32	SAVEFH
33	SECINFO
34	SETATTR
35	SETCLIENTID
36	SETCLIENTID_CONFIRM
37	VERIFY
38	WRITE
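The tag / minor version / operation layout above corresponds to COMPOUND4args in RFC 3010: an XDR string tag, a 32-bit minorversion, then a variable-length array of operations. On the wire the operation list is therefore preceded by a 4-byte element count that the simplified table does not show. As a sketch that assumes only argument-less operations, the following encodes the common PUTROOTFH + GETFH compound, which fetches the root filehandle in place of the MOUNT step of earlier versions; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OP_GETFH = 10, OP_PUTROOTFH = 24 };   /* operation numbers from the table above */

/* Writes v big-endian at out[off] and returns the new offset. */
static size_t put_u32(unsigned char *out, size_t off, uint32_t v)
{
    out[off]     = (unsigned char)(v >> 24);
    out[off + 1] = (unsigned char)(v >> 16);
    out[off + 2] = (unsigned char)(v >> 8);
    out[off + 3] = (unsigned char)v;
    return off + 4;
}

/* Encodes COMPOUND4args whose operations take no arguments.
   Returns the encoded length, or 0 if out is too small. */
size_t encode_compound(const char *tag, uint32_t minorversion,
                       const uint32_t *ops, size_t nops,
                       unsigned char *out, size_t outlen)
{
    assert(tag != NULL && (nops == 0 || ops != NULL));
    size_t taglen = strlen(tag);
    size_t padded = (taglen + 3) & ~(size_t)3;        /* XDR pads to 4 bytes */
    size_t need = 4 + padded + 4 + 4 + 4 * nops;
    if (outlen < need)
        return 0;

    size_t off = put_u32(out, 0, (uint32_t)taglen);   /* tag length */
    memcpy(out + off, tag, taglen);                    /* tag bytes */
    memset(out + off + taglen, 0, padded - taglen);    /* tag padding */
    off += padded;
    off = put_u32(out, off, minorversion);             /* minor version */
    off = put_u32(out, off, (uint32_t)nops);           /* argarray element count */
    for (size_t i = 0; i < nops; i++)
        off = put_u32(out, off, ops[i]);               /* argop: opnum only */
    return off;
}
```

With tag "t1" and minor version 0 the result is 24 bytes: tag length plus padded tag (8), minorversion (4), operation count (4) and two opcodes (8).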

NLMv4: ftp://ftp.rfc-editor.org/in-notes/rfc1813.txt
Since NFS versions 2 and 3 are stateless, an additional Network Lock Manager (NLM) protocol is required to support locking of NFS-mounted files. As a result of the changes in version 3 of the NFS protocol, version 4 of the NLM protocol is required.
In NLM version 4, almost all the names in the protocol have been changed to include a version number. The procedures in the NLM version 4 protocol are semantically the same as those in the NLM version 3 protocol. The only semantic difference is the addition of a NULL procedure that can be used to test for server responsiveness.
The structure of the NLMv4 heading is as follows:

Field	Octets
Cookie Length	1-4
Cookie	5-N

Cookie Length: The cookie length.
Cookie: The cookie string itself.

NSMv1: http://www.opengroup.org/onlinepubs/009629799/chap11.htm
    The Network Status Monitor (NSM) protocol is related to, but separate from, the Network Lock Manager (NLM) protocol. The NLM uses the NSM (Network Status Monitor Protocol V1) to enable it to recover from crashes of either the client or server host. To do this, the NSM and NLM protocols on both the client and server hosts must cooperate.
    The NSM is a service that provides applications with information on the status of network hosts. Each NSM keeps track of its own "state" and notifies any interested party of a change in this state to any other NSM upon request. The state is merely a number which increases monotonically each time the state of the host changes; an even number indicates the host is down, while an odd number indicates the host is up.
    Applications register the network hosts they are interested in with the local NSM. If one of these hosts crashes, the NSM on the crashed host, after a reboot, will notify the NSM on the local host that the state changed. The local NSM can then, in turn, notify the interested application of this state change.
    The NSM is used heavily by the Network Lock Manager (NLM). The local NLM registers with the local NSM all server hosts on which the NLM has currently active locks. In parallel, the NLM on the remote (server) host registers all of its client hosts with its local NSM. If the server host crashes and reboots, the server NSM will inform the NSM on the client hosts of this event. The local NLM can then take steps to re-establish the locks when the server is rebooted. Low-end systems that do not run an NSM, due to memory or speed constraints, are restricted to using non-monitored locks.
The structure of the protocol is as follows:

Field	Octets
Name Length	1-4
Mon Name/Host Name	5-N

Name Length: The mon name or host name length.
Mon Name: The name of the host to be monitored by the NSM.
Host Name: The host name.
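The NSM rules described above (a monotonically increasing state number, odd meaning up and even meaning down, and notification of registered peers when it changes) can be sketched as simple bookkeeping. Everything below is hypothetical and omits the RPC transport entirely; nsm_notify only decides whether a monitored host's state changed, which is the point at which an interested application such as the NLM would be told so it can re-establish locks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MONITORED 8

/* Odd state numbers mean "up", even ones mean "down". */
int nsm_host_is_up(uint32_t state)
{
    return (state & 1u) != 0;
}

/* Advances a host's own state number monotonically to the next value
   with the requested parity (up = 1 for odd, 0 for even). */
uint32_t nsm_next_state(uint32_t state, int up)
{
    uint32_t next = state + 1;
    if (nsm_host_is_up(next) != (up != 0))
        next++;
    return next;
}

/* Hypothetical table of hosts registered with the local NSM. */
typedef struct {
    char     name[64];
    uint32_t last_state;
} monitored_t;

typedef struct {
    monitored_t hosts[MAX_MONITORED];
    size_t      n;
} nsm_t;

/* Registers a host to monitor. Returns 0, or -1 if the table is full
   or the name is too long. */
int nsm_monitor(nsm_t *nsm, const char *host, uint32_t state)
{
    assert(nsm->n <= MAX_MONITORED);
    if (nsm->n == MAX_MONITORED || strlen(host) >= sizeof nsm->hosts[0].name)
        return -1;
    memcpy(nsm->hosts[nsm->n].name, host, strlen(host) + 1);
    nsm->hosts[nsm->n].last_state = state;
    nsm->n++;
    return 0;
}

/* Handles a state-change notification from a rebooted peer. Returns 1 if
   the host is monitored and its state number changed. */
int nsm_notify(nsm_t *nsm, const char *host, uint32_t new_state)
{
    for (size_t i = 0; i < nsm->n; i++) {
        if (strcmp(nsm->hosts[i].name, host) != 0)
            continue;
        if (nsm->hosts[i].last_state == new_state)
            return 0;
        nsm->hosts[i].last_state = new_state;
        return 1;
    }
    return 0;
}
```

A host that crashes and reboots moves from an odd number through the next even one to the next odd one, so a peer that only sees the new odd value still detects the change.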

posted @ 2012-05-09 22:42 薔薇理想人生
