(翻譯 gafferongames) Packet Fragmentation and Reassembly 數(shù)據(jù)包分片與重組
https://gafferongames.com/post/packet_fragmentation_and_reassembly/
Now we are ready to start putting interesting things in our packets and sending them over the network, but immediately we run into an interesting question: how big should our packets be?
To answer this question properly we need a bit of background about how packets are actually sent over the Internet.
現(xiàn)在我們已經(jīng)準(zhǔn)備好在數(shù)據(jù)包中放入有趣的內(nèi)容并通過網(wǎng)絡(luò)發(fā)送它們,但我們立刻遇到了一個(gè)有趣的問題:我們的數(shù)據(jù)包應(yīng)該有多大?
要正確回答這個(gè)問題,我們需要先了解一些關(guān)于數(shù)據(jù)包在互聯(lián)網(wǎng)上是如何實(shí)際發(fā)送的背景知識(shí)。
Background
Perhaps the most important thing to understand about the internet is that there’s no direct connection between the source and destination IP address. What actually happens is that packets hop from one computer to another to reach their destination.
Each computer along this route enforces a maximum packet size called the maximum transmission unit, or MTU. According to the IP standard, if any computer receives a packet larger than its MTU, it has the option of a) fragmenting that packet, or b) dropping the packet.
So here’s how this usually goes down. People write a multiplayer game where the average packet size is quite small, let’s say a few hundred bytes, but every now and then, when a lot of stuff is happening in their game and a burst of packet loss occurs, packets get a lot larger than usual, going above MTU for the route, and suddenly all packets start getting dropped!
Just last year (2015) I was talking with Alex Austin at Indiecade about networking in his game Sub Rosa. He had this strange networking bug he couldn’t reproduce. For some reason, players would randomly get disconnected from the game, but only when a bunch of stuff was going on. It was extremely rare and he was unable to reproduce it. Alex told me looking at the logs it seemed like packets just stopped getting through.
This sounded exactly like an MTU issue to me, and sure enough, when Alex limited his maximum packet size to a reasonable value the bug went away.
也許關(guān)于互聯(lián)網(wǎng)最重要的一點(diǎn)是:源 IP 地址與目標(biāo) IP 地址之間沒有直接連接。實(shí)際上,數(shù)據(jù)包是通過一臺(tái)接一臺(tái)的計(jì)算機(jī)跳躍(hop)傳輸,最終到達(dá)目的地。
這些路徑中的每臺(tái)計(jì)算機(jī)都會(huì)強(qiáng)制執(zhí)行一個(gè)最大數(shù)據(jù)包大小的限制,叫做 最大傳輸單元(MTU, Maximum Transmission Unit)。
根據(jù) IP 協(xié)議的標(biāo)準(zhǔn),如果某臺(tái)計(jì)算機(jī)接收到一個(gè)超過其 MTU 的數(shù)據(jù)包,它可以選擇:
a) 對(duì)該數(shù)據(jù)包進(jìn)行分片(fragmentation),或者
b) 直接丟棄(drop)該數(shù)據(jù)包。
這通常是如何出問題的:開發(fā)者寫了一個(gè)多人游戲,平時(shí)的數(shù)據(jù)包都很小,比如幾百字節(jié)。但當(dāng)游戲中發(fā)生大量事件,尤其是在突發(fā)丟包的情況下,數(shù)據(jù)包突然變得比平時(shí)大,超過了傳輸路徑上的 MTU,結(jié)果所有的數(shù)據(jù)包都開始被丟棄!
就在去年(2015 年),我在 Indiecade 上和 Alex Austin 聊他開發(fā)的游戲《Sub Rosa》的網(wǎng)絡(luò)問題。他遇到了一個(gè)奇怪的 bug,無法復(fù)現(xiàn):有時(shí)候,玩家會(huì)隨機(jī)斷線,而且只會(huì)在游戲中發(fā)生大量事件的時(shí)候才出現(xiàn)。這個(gè)問題極其罕見,而且 Alex 無法定位原因。他告訴我,從日志上看,好像是數(shù)據(jù)包突然完全發(fā)不過去了。
這聽起來完全就是一個(gè) MTU 問題,果不其然,當(dāng) Alex 把最大數(shù)據(jù)包大小限制在一個(gè)合理的范圍內(nèi)后,這個(gè) bug 就徹底消失了。
MTU in the real world
So what’s a reasonable maximum packet size?
On the Internet today (2016, IPv4) the real-world MTU is 1500 bytes.
Give or take a few bytes for UDP/IP packet header and you’ll find that the typical number before packets start to get dropped or fragmented is somewhere around 1472.
那么,什么才算是一個(gè)合理的最大數(shù)據(jù)包大小呢?
在當(dāng)今的互聯(lián)網(wǎng)環(huán)境中(2016 年,IPv4),現(xiàn)實(shí)中的 MTU(最大傳輸單元)通常是 1500 字節(jié)。
考慮到 UDP/IP 協(xié)議頭會(huì)占用一些字節(jié),一般來說,在大約 1472 字節(jié)左右的數(shù)據(jù)包大小以上,就有可能發(fā)生丟包或分片。
Why 1500? That’s the default MTU for MacOS X. It’s also the default MTU on Windows. So now we have an upper bound for your packet size assuming you actually care about packets getting through to Windows and Mac boxes without IP level fragmentation or a chance of being dropped: 1472 bytes.
為什么是 1500?因?yàn)?1500 字節(jié)是 macOS X 的默認(rèn) MTU,同時(shí)也是 Windows 的默認(rèn) MTU。
所以,如果你希望你的數(shù)據(jù)包能送達(dá) Windows 和 Mac 設(shè)備,而不被 IP 層分片、也不至于被丟棄,那么你就得遵守這個(gè)上限。
也就是說,你的數(shù)據(jù)包大小上限應(yīng)該設(shè)為 1472 字節(jié)(扣除 UDP/IP 協(xié)議頭后的有效載荷大?。?。
So what’s the lower bound? Unfortunately, for the routers in between your computer and the destination, the IPv4 standard says 576. Does this mean we have to limit our packets to 400 bytes or less? In practice, not really.
MacOS X lets me set MTU values in the range 1280 to 1500, so considering packet header overhead, my first guess for a conservative lower bound on the IPv4 Internet today would be 1200 bytes. Moving forward, in IPv6 this is also a good value, as any packet of 1280 bytes or less is guaranteed to get passed on without IP level fragmentation.
This lines up with numbers that I’ve seen throughout my career. In my experience games rarely try anything complicated like attempting to discover path MTU, they just assume a reasonably conservative MTU and roll with that, something like 1000 to 1200 bytes of payload data. If a packet larger than this needs to be sent, it’s split up into fragments by the game protocol and re-assembled on the other side.
And that’s exactly what I’m going to show you how to do in this article.
那下限是多少?
不幸的是,對(duì)于你電腦和目標(biāo)之間的路由器來說,IPv4 標(biāo)準(zhǔn)要求最小 MTU 是 576 字節(jié)。這是否意味著我們的數(shù)據(jù)包必須限制在 400 字節(jié)或更少?實(shí)際上,并不需要那么保守。
在 macOS X 上,我可以設(shè)置的 MTU 范圍是 1280 到 1500 字節(jié),所以考慮到協(xié)議頭的開銷,我對(duì)現(xiàn)代 IPv4 網(wǎng)絡(luò)上保守下限的第一估計(jì)是 1200 字節(jié)左右。
向前看,在 IPv6 中,這個(gè)數(shù)值同樣適用——因?yàn)槿魏涡∮诘扔?1280 字節(jié)的包都能保證在不被 IP 層分片的情況下傳輸。
這個(gè)估值也符合我職業(yè)生涯中觀察到的情況。以我的經(jīng)驗(yàn)來看,大多數(shù)游戲都不會(huì)搞什么復(fù)雜的路徑 MTU 發(fā)現(xiàn),而是直接假設(shè)一個(gè)相對(duì)保守的 MTU 值,比如 1000 到 1200 字節(jié)的有效負(fù)載數(shù)據(jù)。
如果需要發(fā)送更大的包,那就由游戲協(xié)議本身進(jìn)行分片,然后在接收端重組。
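To make these numbers concrete, here is a minimal sketch of the kind of compile-time constants a game protocol might settle on at this point. The names are illustrative assumptions, not code from this article:

// Illustrative constants only, based on the conservative numbers discussed above.
const int RealWorldMTU     = 1500;  // typical default MTU on Windows and MacOS X (IPv4)
const int UDPIPHeaderBytes = 28;    // 20 byte IPv4 header + 8 byte UDP header
const int SafePayloadBytes = 1472;  // 1500 - 28: upper bound before fragmentation or drops
const int ConservativeMTU  = 1200;  // conservative per-packet payload budget most games assume

static_assert( ConservativeMTU <= SafePayloadBytes, "conservative MTU must fit within the real-world MTU" );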
Fragment Packet Structure
Let’s get started with implementation.
The first thing we need to decide is how we’re going to represent fragment packets over the network so they are distinct from non-fragmented packets.
Ideally, we would like fragmented and non-fragmented packets to be compatible with the existing packet structure we’ve already built, with as little overhead as possible in the common case when we are sending packets smaller than MTU.
現(xiàn)在我們開始進(jìn)入實(shí)現(xiàn)部分。
首先要決定的是:我們?nèi)绾卧诰W(wǎng)絡(luò)上傳輸分片數(shù)據(jù)包,以便它們能和普通(未分片)數(shù)據(jù)包區(qū)分開來。
理想情況下,我們希望分片數(shù)據(jù)包與我們已有的數(shù)據(jù)包結(jié)構(gòu)兼容,并且在大多數(shù)情況下(即發(fā)送的包小于 MTU,不需要分片)時(shí)盡量減少額外開銷。也就是說,普通數(shù)據(jù)包應(yīng)該保持簡(jiǎn)單高效,而分片數(shù)據(jù)包則只在必要時(shí)增加額外信息。
Here’s the packet structure from the previous article:
[protocol id] (64 bits)   // not actually sent, but used to calc crc32
[crc32] (32 bits)
[packet type] (2 bits for 3 distinct packet types)
(variable length packet data according to packet type)
[end of packet serialize check] (32 bits)
In our protocol we have three packet types: A, B and C.
Let’s make one of these packet types generate really large packets:
static const int MaxItems = 4096 * 4;

struct TestPacketB : public Packet
{
    int numItems;
    int items[MaxItems];

    TestPacketB() : Packet( TEST_PACKET_B )
    {
        numItems = random_int( 0, MaxItems );
        for ( int i = 0; i < numItems; ++i )
            items[i] = random_int( -100, +100 );
    }

    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_int( stream, numItems, 0, MaxItems );
        for ( int i = 0; i < numItems; ++i )
        {
            serialize_int( stream, items[i], -100, +100 );
        }
        return true;
    }
};
This may seem somewhat contrived but these situations really do occur. For example, if you have a strategy where you send all un-acked events from server to client and you hit a burst of packet loss, you can easily end up with packets larger than MTU, even though your average packet size is quite small.
Another common case is delta encoded snapshots in a first person shooter. Here packet size is proportional to the amount of state changed between the baseline and current snapshots for each client. If there are a lot of differences between the snapshots the delta packet is large and there’s nothing you can do about it except break it up into fragments and re-assemble them on the other side.
Getting back to packet structure. It’s fairly common to add a sequence number at the header of each packet. This is just a packet number that increases with each packet sent. I like to use 16 bits for sequence numbers even though they wrap around in about 15 minutes @ 60 packets-per-second, because it’s extremely unlikely that a packet will be delivered 15 minutes late.
這看起來可能有些“人為制造”的復(fù)雜情況,但這種情況在現(xiàn)實(shí)中確實(shí)會(huì)發(fā)生。
比如說,如果你的服務(wù)器向客戶端發(fā)送所有尚未確認(rèn)(un-acked)的事件,而恰好遇到一波突發(fā)的數(shù)據(jù)包丟失,那么就可能在一次傳輸中堆積大量事件,導(dǎo)致單個(gè)數(shù)據(jù)包大小超過 MTU。哪怕平時(shí)你的平均數(shù)據(jù)包非常小,這種情況也可能突然出現(xiàn)。
另一個(gè)常見的例子是:第一人稱射擊游戲中的“增量編碼快照(delta encoded snapshots)”。在這種機(jī)制中,數(shù)據(jù)包大小取決于客戶端當(dāng)前快照與基準(zhǔn)快照之間的狀態(tài)差異。
如果兩次快照差異很大,增量數(shù)據(jù)包就會(huì)變得很大,這時(shí)候你別無選擇,只能把數(shù)據(jù)包分片,然后在客戶端重新組裝(re-assemble)。
回到數(shù)據(jù)包結(jié)構(gòu),在設(shè)計(jì)數(shù)據(jù)包結(jié)構(gòu)時(shí),一個(gè)非常常見的做法是:在每個(gè)數(shù)據(jù)包頭部加一個(gè)序列號(hào)(sequence number)。
這個(gè)序列號(hào)只是一個(gè)簡(jiǎn)單的遞增數(shù)字,每發(fā)送一個(gè)數(shù)據(jù)包就加 1。
我個(gè)人喜歡用 16 位的序列號(hào)。雖然這意味著在大約 15 分鐘內(nèi)(假設(shè)每秒發(fā)送 60 個(gè)數(shù)據(jù)包)就會(huì)循環(huán)一次,但這完全可以接受——因?yàn)閹缀醪豢赡苡袛?shù)據(jù)包延遲長達(dá) 15 分鐘才送達(dá)。
Sequence numbers are useful for a bunch of things like acks, reliability and detecting and discarding out of order packets. In our case, we’re going to use the sequence number to identify which packet a fragment belongs to:
序列號(hào)在很多方面都非常有用,比如用于確認(rèn)(ack)、可靠性(reliability),以及檢測(cè)和丟棄亂序的數(shù)據(jù)包。
在我們的情況下,我們將使用序列號(hào)來標(biāo)識(shí)每個(gè)分片屬于哪個(gè)原始數(shù)據(jù)包:
[protocol id] (64 bits)   // not actually sent, but used to calc crc32
[crc32] (32 bits)
[sequence] (16 bits)
[packet type] (2 bits)
(variable length packet data according to packet type)
[end of packet serialize check] (32 bits)
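A practical companion to 16 bit sequence numbers, not shown in the article at this point, is a wrap-aware comparison so that "newer than" still behaves correctly when the counter rolls over. A minimal sketch of the usual test:

#include <stdint.h>

// Returns true if s1 is more recent than s2, taking 16 bit wrap-around into account.
// For example, sequence 1 is considered newer than sequence 65535.
inline bool sequence_greater_than( uint16_t s1, uint16_t s2 )
{
    return ( ( s1 > s2 ) && ( s1 - s2 <= 32768 ) ) ||
           ( ( s1 < s2 ) && ( s2 - s1 >  32768 ) );
}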
Here’s the interesting part. Sure we could just add a bit is_fragment to the header, but then in the common case of non-fragmented packets you’re wasting one bit that is always set to zero.
What I do instead is add a special fragment packet type:
enum TestPacketTypes
{
    PACKET_FRAGMENT = 0,    // RESERVED
    TEST_PACKET_A,
    TEST_PACKET_B,
    TEST_PACKET_C,
    TEST_PACKET_NUM_TYPES
};
And it just happens to be free because four packet types fit into 2 bits. Now when a packet is read, if the packet type is zero we know it’s a fragment packet, otherwise we run through the ordinary, non-fragmented read packet codepath.
Let’s design what this fragment packet looks like. We’ll allow a maximum of 256 fragments per-packet and have a fragment size of 1024 bytes. This gives a maximum packet size of 256k that we can send through this system, which should be enough for anybody, but please don’t quote me on this.
With a small fixed size header, UDP header and IP header a fragment packet will be well under the conservative MTU value of 1200. Plus, with 256 max fragments per-packet we can represent a fragment id in the range [0,255] and the total number of fragments per-packet [1,256] with 8 bits.
而且這幾乎是“免費(fèi)的”,因?yàn)樗姆N數(shù)據(jù)包類型剛好可以用 2 位來表示?,F(xiàn)在當(dāng)我們讀取一個(gè)數(shù)據(jù)包時(shí),如果數(shù)據(jù)包類型是 0,我們就知道它是一個(gè)分片數(shù)據(jù)包;否則,就走普通的非分片數(shù)據(jù)包讀取路徑。
現(xiàn)在來設(shè)計(jì)一下這個(gè)分片數(shù)據(jù)包的結(jié)構(gòu)。我們?cè)试S每個(gè)完整數(shù)據(jù)包最多有 256 個(gè)分片,并將每個(gè)分片的大小設(shè)為 1024 字節(jié)。這意味著我們可以通過該系統(tǒng)發(fā)送最大為 256KB 的數(shù)據(jù)包,這個(gè)大小對(duì)任何人來說應(yīng)該都足夠了(不過別引用我這句話)。
加上一個(gè)小的固定大小的頭部、UDP 頭和 IP 頭后,每個(gè)分片數(shù)據(jù)包的大小仍然可以保持在保守的 MTU 限值 1200 字節(jié)以內(nèi)。而且,由于每個(gè)完整數(shù)據(jù)包最多有 256 個(gè)分片,我們可以使用 8 位來表示分片 ID(范圍為 [0,255])以及每個(gè)數(shù)據(jù)包的總分片數(shù)(范圍為 [1,256])。
[protocol id] (64 bits)   // not actually sent, but used to calc crc32
[crc32] (32 bits)
[sequence] (16 bits)
[packet type = 0] (2 bits)
[fragment id] (8 bits)
[num fragments] (8 bits)
[pad zero bits to nearest byte index]
<fragment data>
Notice that we pad bits up to the next byte before writing out the fragment data. Why do this? Two reasons: 1) it’s faster to copy fragment data into the packet via memcpy than bitpacking each byte, and 2) we can now save a small amount of bandwidth by inferring the fragment size by subtracting the start of the fragment data from the total size of the packet.
注意,在寫入分片數(shù)據(jù)之前,我們會(huì)把位填充到下一個(gè)字節(jié)邊界。這么做有兩個(gè)原因:1)用 memcpy 把分片數(shù)據(jù)復(fù)制進(jìn)數(shù)據(jù)包,比逐字節(jié)位打包要快;2)分片大小不必顯式寫入數(shù)據(jù)包,而是用數(shù)據(jù)包總大小減去分片數(shù)據(jù)的起始位置推算出來,從而節(jié)省一點(diǎn)帶寬。
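To make the alignment step concrete, here is a rough sketch of how the fragment header might be serialized, written in the style of the serialize_int helper used in TestPacketB above. The serialize_bits, serialize_align and GetBytesProcessed names are assumptions about that stream API, so treat this as an illustration rather than the article's actual implementation:

// Sketch only: assumes bitpacker-style helpers (serialize_bits, serialize_align)
// in the same family as the serialize_int used earlier.
template <typename Stream>
bool SerializeFragmentHeader( Stream & stream, int & sequence, int & fragmentId, int & numFragments )
{
    serialize_bits( stream, sequence, 16 );

    int packetType = 0;                         // packet type 0 is reserved for fragments
    serialize_bits( stream, packetType, 2 );

    serialize_bits( stream, fragmentId, 8 );
    serialize_bits( stream, numFragments, 8 );

    serialize_align( stream );                  // pad zero bits up to the next byte boundary

    // The fragment data that follows is copied with memcpy rather than bitpacked,
    // and its size is never sent explicitly; on read it is inferred as
    //     fragmentBytes = totalPacketBytes - stream.GetBytesProcessed();
    return true;
}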
Sending Packet Fragments
Sending packet fragments is easy. For any packet larger than conservative MTU, simply calculate how many 1024 byte fragments it needs to be split into, and send those fragment packets over the network. Fire and forget!
發(fā)送數(shù)據(jù)包分片很簡(jiǎn)單。對(duì)于任何大于保守 MTU 的數(shù)據(jù)包,只需要計(jì)算它需要拆分成多少個(gè) 1024 字節(jié)的分片,然后將這些分片數(shù)據(jù)包通過網(wǎng)絡(luò)發(fā)送出去。發(fā)射出去,忘掉它!
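As a rough illustration of the send side, the splitting loop might look like the sketch below. WriteFragmentHeader and SendPacketOverNetwork are hypothetical placeholders standing in for the header serialization and socket code; only the splitting arithmetic follows the article:

#include <stdint.h>
#include <string.h>

const int FragmentSize = 1024;      // fragment payload size used in this article
const int MaxFragments = 256;       // maximum fragments per packet (256k max packet size)

// Hypothetical helpers assumed to exist elsewhere in the protocol code.
// WriteFragmentHeader writes the fragment header shown above (encoding num fragments - 1
// so the [1,256] range fits in 8 bits) and returns the number of header bytes written.
int  WriteFragmentHeader( uint8_t * buffer, uint16_t sequence, int fragmentId, int numFragments );
void SendPacketOverNetwork( const uint8_t * data, int bytes );

// Split a serialized packet into 1024 byte fragments and send each one. Fire and forget!
void SendFragmentedPacket( uint16_t sequence, const uint8_t * packetData, int packetBytes )
{
    const int numFragments = ( packetBytes + FragmentSize - 1 ) / FragmentSize;
    if ( numFragments > MaxFragments )
        return;                                         // larger than 256k: not supported

    for ( int i = 0; i < numFragments; ++i )
    {
        const int offset = i * FragmentSize;
        const int remaining = packetBytes - offset;
        const int fragmentBytes = remaining < FragmentSize ? remaining : FragmentSize;

        uint8_t fragmentPacket[FragmentSize + 64];      // fragment data plus a small header
        const int headerBytes = WriteFragmentHeader( fragmentPacket, sequence, i, numFragments );
        memcpy( fragmentPacket + headerBytes, packetData + offset, fragmentBytes );

        SendPacketOverNetwork( fragmentPacket, headerBytes + fragmentBytes );
    }
}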
One consequence of this is that if any fragment of that packet is lost then the entire packet is lost. It follows that if you have packet loss then sending a 256k packet as 256 fragments is not a very good idea, because the probability of dropping a packet increases significantly as the number of fragments increases. Not quite linearly, but in an interesting way that you can read more about here.
In short, to calculate the probability of losing a packet, you must calculate the probability of all fragments being delivered successfully and subtract that from one, giving you the probability that at least one fragment was dropped.
這種方式的一個(gè)后果是:如果該數(shù)據(jù)包的任何一個(gè)分片丟失,那么整個(gè)數(shù)據(jù)包都會(huì)丟失。因此,如果發(fā)生了數(shù)據(jù)包丟失,將一個(gè) 256KB 的數(shù)據(jù)包拆分成 256 個(gè)分片并發(fā)送并不是一個(gè)很好的選擇,因?yàn)殡S著分片數(shù)量的增加,丟包的概率顯著增加。這個(gè)增加并不是完全線性的,但有一種有趣的方式可以理解,具體內(nèi)容可以在這里進(jìn)一步閱讀。
簡(jiǎn)而言之,要計(jì)算丟失數(shù)據(jù)包的概率,你必須首先計(jì)算所有分片都成功傳輸?shù)母怕?,然后?1 中減去這個(gè)值,得到至少有一個(gè)分片丟失的概率。
1 - ( probability_of_fragment_being_delivered ^ num_fragments )
For example, if we send a non-fragmented packet over the network with 1% packet loss, there is naturally a 1/100 chance the packet will be dropped.
As the number of fragments increases, packet loss is amplified:
分片越多,整體包丟失概率越大
- Two fragments: 1 - (99/100) ^ 2 = 2%
- Ten fragments: 1 - (99/100) ^ 10 = 9.6%
- 100 fragments: 1 - (99/100) ^ 100 = 63.4%
- 256 fragments: 1 - (99/100) ^ 256 = 92.4%
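These percentages are easy to sanity check with a few lines of throwaway code (mine, not the article's):

#include <stdio.h>
#include <math.h>

// Probability that at least one fragment of a packet is lost, assuming 1% packet loss
// and independent delivery of each fragment.
int main()
{
    const double deliveryProbability = 0.99;
    const int fragmentCounts[] = { 1, 2, 10, 100, 256 };
    for ( int i = 0; i < (int) ( sizeof( fragmentCounts ) / sizeof( fragmentCounts[0] ) ); ++i )
    {
        const int numFragments = fragmentCounts[i];
        const double packetLossProbability = 1.0 - pow( deliveryProbability, numFragments );
        printf( "%3d fragments: %.1f%% chance of losing the packet\n",
                numFragments, packetLossProbability * 100.0 );
    }
    return 0;
}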
So I recommend you take it easy with the number of fragments. It’s best to use this strategy only for packets in the 2-4 fragment range, and only for time critical data that doesn’t matter too much if it gets dropped. It’s definitely not a good idea to fire down a bunch of reliable-ordered events in a huge packet and rely on packet fragmentation and reassembly to save your ass.
Another typical use case for large packets is when a client initially joins a game. Here you usually want to send a large block of data down reliably to that client, for example, representing the initial state of the world for late join. Whatever you do, don’t send that block of data down using the fragmentation and re-assembly technique in this article.
Instead, check out the technique in the next article which handles packet loss by resending fragments until they are all received.
因此,我建議你不要輕易增加分片的數(shù)量。最好只在 2-4 個(gè)分片的范圍內(nèi)使用這種策略,并且只用于時(shí)間敏感、即使丟包也不太要緊的數(shù)據(jù)。如果你把一大堆可靠有序事件塞進(jìn)一個(gè)巨大的數(shù)據(jù)包,指望數(shù)據(jù)包分片和重組來救場(chǎng),那絕對(duì)不是一個(gè)好主意。
另一個(gè)常見的大數(shù)據(jù)包使用場(chǎng)景是客戶端首次加入游戲時(shí)。在這種情況下,你通常希望可靠地向該客戶端發(fā)送一大塊數(shù)據(jù),例如表示游戲世界的初始狀態(tài),以便支持中途加入(late join)。無論如何,千萬不要使用本文中的分片和重組技術(shù)來傳輸這種數(shù)據(jù)塊。
相反,建議查看下一篇文章中的技術(shù):該方法通過重新發(fā)送分片,直到所有分片都成功接收,來處理數(shù)據(jù)包丟失問題。
Receiving Packet Fragments
It’s time to implement the code that receives and processes packet fragments. This is a bit tricky because we have to be particularly careful of somebody trying to attack us with malicious packets.
Here’s a list of all the ways I can think of to attack the protocol:
- Try to send out of bound fragment ids trying to get you to trash memory. eg: send fragments [0,255] in a packet that has just two fragments.
- Send packet n with some maximum fragment count of say 2, and then send more fragment packets belonging to the same packet n but with a maximum fragment count of 256, hoping that you don’t realize I widened the maximum number of fragments after the first fragment you received, and you trash memory.
- Send really large fragment packets with fragment data larger than 1k, hoping to get you to trash memory as you try to copy that fragment data into the data structure, or to blow your memory budget trying to allocate fragments.
- Continually send fragments of maximum size (256/256 fragments) in the hope that I can make you allocate a bunch of memory and crash you out. Let’s say you have a sliding window of 256 packets, 256 fragments per-packet max, and each fragment is 1k. That’s 64MB per client.
- Can I fragment the heap with a bunch of funny sized fragment packets sent over and over? Perhaps the server shares a common allocator across clients and I can make allocations fail for other clients in the game because the heap becomes fragmented.
Aside from these concerns, implementation is reasonably straightforward: store received fragments somewhere and when all fragments arrive for a packet, reassemble them into the original packet and return that to the user.
接收數(shù)據(jù)包分片
現(xiàn)在是實(shí)現(xiàn)接收和處理數(shù)據(jù)包分片代碼的時(shí)候了。這有點(diǎn)棘手,因?yàn)槲覀冃枰貏e小心有人通過惡意數(shù)據(jù)包攻擊我們。
以下是我能想到的所有可能攻擊協(xié)議的方式:
- 嘗試發(fā)送越界的分片 ID,試圖破壞你的內(nèi)存。例如:在一個(gè)實(shí)際只有兩個(gè)分片的數(shù)據(jù)包里,發(fā)送 [0,255] 范圍的分片。
- 先發(fā)送數(shù)據(jù)包 n 的分片,聲明最大分片數(shù)為 2;然后再發(fā)送屬于同一個(gè)數(shù)據(jù)包 n 的更多分片,但這次聲明最大分片數(shù)為 256,指望你沒有發(fā)現(xiàn)攻擊者在你收到第一個(gè)分片之后擴(kuò)大了分片數(shù)上限,從而破壞內(nèi)存。
- 發(fā)送非常大的分片數(shù)據(jù)包,其中分片數(shù)據(jù)超過 1k,試圖讓你在把分片數(shù)據(jù)復(fù)制到數(shù)據(jù)結(jié)構(gòu)時(shí)破壞內(nèi)存,或者在分配分片時(shí)耗盡內(nèi)存預(yù)算。
- 不斷發(fā)送最大規(guī)模的分片(每個(gè)數(shù)據(jù)包 256/256 個(gè)分片),希望迫使你分配大量?jī)?nèi)存并導(dǎo)致崩潰。假設(shè)你有一個(gè) 256 個(gè)包的滑動(dòng)窗口,每個(gè)數(shù)據(jù)包最多 256 個(gè)分片,每個(gè)分片 1k,這就相當(dāng)于每個(gè)客戶端占用 64 MB 內(nèi)存。
- 能否通過反復(fù)發(fā)送各種奇怪大小的分片數(shù)據(jù)包來讓堆內(nèi)存碎片化?也許服務(wù)器在多個(gè)客戶端之間共享同一個(gè)分配器,我就可以通過讓堆碎片化,使游戲中其他客戶端的內(nèi)存分配失敗。
除了這些安全考慮之外,實(shí)現(xiàn)本身還是比較直接的:我們只需要將接收到的分片存儲(chǔ)在某個(gè)地方,當(dāng)所有分片都接收完畢后,將它們重新組裝成原始數(shù)據(jù)包,并返回給用戶。
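To tie the list above to code, here is a hedged sketch of the kind of sanity checks the fragment read path might run before storing anything. The constants match the 1024 byte / 256 fragment limits used in this article, but the function itself is my illustration, not the article's implementation:

const int MaxFragmentSize       = 1024;     // maximum bytes of fragment data per fragment
const int MaxFragmentsPerPacket = 256;      // maximum fragments per packet

// Reject obviously malicious or corrupt fragments before touching any memory.
// 'expectedNumFragments' is the fragment count recorded when the first fragment of this
// packet arrived, or 0 if no fragment has been seen for this sequence number yet.
bool ValidateFragment( int fragmentId, int numFragments, int fragmentBytes, int expectedNumFragments )
{
    if ( numFragments < 1 || numFragments > MaxFragmentsPerPacket )
        return false;                                   // bogus fragment count
    if ( fragmentId < 0 || fragmentId >= numFragments )
        return false;                                   // out of bounds fragment id
    if ( fragmentBytes < 1 || fragmentBytes > MaxFragmentSize )
        return false;                                   // empty or oversized fragment data
    if ( expectedNumFragments != 0 && expectedNumFragments != numFragments )
        return false;                                   // fragment count changed mid-packet
    return true;
}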
Data Structure on Receiver Side
The first thing we need is some way to store fragments before they are reassembled. My favorite data structure is something I call a sequence buffer:
我們首先需要一種方法來存儲(chǔ)分片,直到它們被重新組裝。我最喜歡的數(shù)據(jù)結(jié)構(gòu)是我稱之為“序列緩沖區(qū)”(sequence buffer)。
const int MaxEntries = 256;

struct SequenceBuffer
{
    uint32_t sequence[MaxEntries];
    Entry entries[MaxEntries];
};
Indexing into the arrays is performed with modulo arithmetic, giving us a fast O(1) lookup of entries by sequence number:
對(duì)數(shù)組的索引操作是通過模運(yùn)算進(jìn)行的,這使得我們可以根據(jù)序列號(hào)快速地以 O(1) 時(shí)間復(fù)雜度查找條目。
const int index = sequence % MaxEntries;
A sentinel value of 0xFFFFFFFF is used to represent empty entries. This value cannot possibly occur with 16 bit sequence numbers, thus providing us with a fast test to see if an entry exists for a given sequence number, without an additional branch to test if that entry exists.
This data structure is used as follows. When the first fragment of a new packet comes in, the sequence number is mapped to an entry in the sequence buffer. If an entry doesn’t exist, it’s added and the fragment data is stored in there, along with information about the fragment, eg. how many fragments there are, how many fragments have been received so far, and so on.
Each time a new fragment arrives, it looks up the entry by the packet sequence number. When an entry already exists, the fragment data is stored and number of fragments received is incremented. Eventually, once the number of fragments received matches the number of fragments in the packet, the packet is reassembled and delivered to the user.
Since it’s possible for old entries to stick around (potentially with allocated blocks), great care must be taken to clean up any stale entries when inserting new entries in the sequence buffer. These stale entries correspond to packets that didn’t receive all fragments.
And that’s basically it at a high level. For further details on this approach please refer to the example source code for this article.
一個(gè)哨兵值 0xFFFFFFFF 被用來表示空條目。由于 16 位序列號(hào)無法出現(xiàn)該值,這提供了一種快速的方式來檢查某個(gè)序列號(hào)的條目是否存在,而無需額外的分支來判斷該條目是否存在。
這個(gè)數(shù)據(jù)結(jié)構(gòu)的使用方式如下:當(dāng)一個(gè)新數(shù)據(jù)包的第一個(gè)分片到達(dá)時(shí),序列號(hào)會(huì)映射到序列緩沖區(qū)中的一個(gè)條目。如果該條目不存在,就會(huì)被添加,并且將分片數(shù)據(jù)存儲(chǔ)在該條目中,此外,還會(huì)存儲(chǔ)分片的信息,比如總共有多少個(gè)分片,已接收多少個(gè)分片,等等。
每次新分片到達(dá)時(shí),它會(huì)根據(jù)數(shù)據(jù)包的序列號(hào)查找該條目。如果條目已經(jīng)存在,分片數(shù)據(jù)會(huì)被存儲(chǔ),并且已接收的分片數(shù)量會(huì)遞增。最終,一旦接收到的分片數(shù)量與數(shù)據(jù)包中的總分片數(shù)匹配,數(shù)據(jù)包就會(huì)被重新組裝并交付給用戶。
由于舊的條目有可能仍然存在(可能帶有已分配的內(nèi)存塊),在向序列緩沖區(qū)插入新條目時(shí),必須小心地清理這些過期條目。這些過期條目對(duì)應(yīng)的是那些沒有收齊全部分片的數(shù)據(jù)包。
總體來說,整個(gè)過程就是這樣。對(duì)于這種方法的進(jìn)一步細(xì)節(jié),請(qǐng)參考本文提供的示例源代碼。
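Here is a stripped-down sketch of how lookup and insertion might work with the 0xFFFFFFFF sentinel. The Entry contents are simplified and the cleanup of stale fragment allocations is only noted in a comment, so refer to the example source code for the full version:

#include <stdint.h>

const int MaxEntries = 256;

struct Entry
{
    int numFragments;           // total fragments expected for this packet
    int receivedFragments;      // how many fragments have arrived so far
    // ... per-fragment data storage omitted in this sketch
};

struct SequenceBuffer
{
    uint32_t sequence[MaxEntries];
    Entry entries[MaxEntries];

    SequenceBuffer()
    {
        for ( int i = 0; i < MaxEntries; ++i )
            sequence[i] = 0xFFFFFFFF;                   // sentinel: this slot is empty
    }

    Entry * Find( uint16_t packetSequence )
    {
        const int index = packetSequence % MaxEntries;
        return sequence[index] == packetSequence ? &entries[index] : nullptr;
    }

    Entry * Insert( uint16_t packetSequence )
    {
        const int index = packetSequence % MaxEntries;
        // A real implementation must clean up any stale entry already occupying this
        // slot (including freeing its fragment allocations) before reusing it.
        sequence[index] = packetSequence;
        entries[index].numFragments = 0;
        entries[index].receivedFragments = 0;
        return &entries[index];
    }
};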
Test Driven Development
One thing I’d like to close this article out on.
Writing a custom UDP network protocol is hard. It’s so hard that even though I’ve done this from scratch at least 10 times, each time I still manage to fuck it up in new and exciting ways. You’d think eventually I’d learn, but this stuff is complicated. You can’t just write low-level netcode and expect it to just work.
You have to test it!
測(cè)試驅(qū)動(dòng)開發(fā)
在文章的結(jié)尾,我想強(qiáng)調(diào)一件事。
編寫一個(gè)自定義的 UDP 網(wǎng)絡(luò)協(xié)議非常困難。難到什么程度呢?就算我已經(jīng)從零開始寫過至少十次,每次還是能以一種新奇又令人抓狂的方式把它搞砸。你可能會(huì)想,我總該學(xué)乖了吧?但事實(shí)是,這玩意兒真的很復(fù)雜。你不能指望寫出一套底層網(wǎng)絡(luò)代碼,它就能“正常工作”。
你必須測(cè)試它!
My strategy when testing low-level netcode is as follows:
- Code defensively. Assert everywhere. These asserts will fire and they’ll be important clues you need when something goes wrong.
- Add functional tests and make sure stuff is working as you are writing it. Put your code through its paces at a basic level as you write it and make sure it’s working as you build it up. Think hard about the essential cases that need to be properly handled and add tests that cover them.
- But just adding a bunch of functional tests is not enough. There are of course cases you didn’t think of! Now you have to get really mean. I call this soak testing, and I’ve never, not even once, coded a network protocol that didn’t subsequently have problems found in it by soak testing.
- When soak testing, just loop forever and do a mix of random stuff that puts your system through its paces, eg. random length packets in this case with a huge amount of packet loss, out of order and duplicates through a packet simulator. Your soak test passes when it runs overnight and doesn’t hang or assert.
- If you find anything wrong with soak testing, you may need to go back and add detailed logs to the soak test to work out how you got to the failure case. Once you know what’s going on, stop. Don’t just fix it immediately and run the soak test again.
- Instead, add a unit test that reproduces the problem you are trying to fix, verify your test reproduces the problem, and that the problem goes away with your fix. Only after this, go back to the soak test and make sure it runs overnight. This way the unit tests document the correct behavior of your system and can quickly be run in future to make sure you don’t break this thing moving forward when you make other changes.
- Add a bunch of logs. High level errors and info logs showing an overview of what is going on, but also low-level warnings and debug logs that show what went wrong after the fact. You’re going to need these logs to diagnose issues that don’t occur on your machine. Make sure the log level can be adjusted dynamically.
- Implement network simulators and make sure code handles the worst possible network conditions imaginable: 99% packet loss, 10 seconds of latency and +/- several seconds of jitter. Again, you’ll be surprised how much this uncovers. Testing is the time when you want to uncover and fix issues with bad network conditions, not the night before your open beta.
- Implement fuzz tests where appropriate to make sure your protocol doesn’t crash when processing random packets. Leave fuzz tests running overnight to feel confident that your code is reasonably secure against malicious packets and doesn’t crash.
- Surprisingly, I’ve consistently found issues that only show up when I loop the set of unit tests over and over. Perhaps these issues are caused by different random numbers in tests, especially with the network simulator being driven by random numbers. This is a great way to take a rare test that fails once every few days and make it fail every time. So before you congratulate yourself on your tests passing 100%, add a mode where your unit tests can be looped easily, to uncover such errors.
- Test simultaneously on multiple platforms. I’ve never written a low-level library that worked first time on MacOS, Windows and Linux. There are always interesting compiler specific issues and crashes. Test on multiple platforms as you develop, otherwise it’s pretty painful fixing all these at the end.
- Think about how people can attack the protocol. Implement code to defend against these attacks. Add functional tests that mimic these attacks and make sure that your code handles them correctly.
我在測(cè)試底層網(wǎng)絡(luò)代碼時(shí)的策略如下:
防御式編寫代碼。到處加斷言。這些斷言會(huì)觸發(fā),而且當(dāng)出問題時(shí),它們會(huì)成為你需要的重要線索。
添加功能性測(cè)試,并在編寫代碼時(shí)確保功能正常。邊寫邊在基本層面上驗(yàn)證你的代碼是否能運(yùn)行,并確保它在構(gòu)建過程中保持正常。認(rèn)真思考那些必須正確處理的關(guān)鍵情況,并為它們編寫覆蓋測(cè)試。
但僅僅添加一堆功能測(cè)試還不夠。一定有你沒想到的情況!這時(shí)你需要變得“狠一點(diǎn)”。我稱之為浸泡測(cè)試(soak testing),我從來沒有寫過一個(gè)網(wǎng)絡(luò)協(xié)議,之后沒有通過浸泡測(cè)試發(fā)現(xiàn)問題的。
進(jìn)行浸泡測(cè)試時(shí),就讓程序無限循環(huán),混合執(zhí)行各種隨機(jī)操作,對(duì)你的系統(tǒng)進(jìn)行壓力測(cè)試,例如在這種情況下,用一個(gè)數(shù)據(jù)包模擬器模擬大量丟包、亂序和重復(fù)的數(shù)據(jù)包,以及隨機(jī)長度的包。當(dāng)它可以通宵運(yùn)行且不會(huì)卡住或觸發(fā)斷言時(shí),說明你的浸泡測(cè)試通過了。
如果你在浸泡測(cè)試中發(fā)現(xiàn)了問題,你可能需要回去給浸泡測(cè)試添加詳細(xì)日志,以便弄清楚是如何走到這個(gè)失敗狀態(tài)的。一旦弄清楚了原因,先停下來:不要立刻修復(fù)然后直接重跑浸泡測(cè)試。
相反,先添加一個(gè)能重現(xiàn)你要修復(fù)問題的單元測(cè)試,確認(rèn)你的測(cè)試確實(shí)能復(fù)現(xiàn)該問題,并且在你修復(fù)之后問題會(huì)消失。只有在這之后,再去跑你的浸泡測(cè)試,確保它能通宵跑完。這樣,單元測(cè)試就記錄了系統(tǒng)的正確行為,未來你在做其他修改時(shí)可以快速運(yùn)行這些測(cè)試,確保沒有引入新問題。
加一堆日志。包括高層級(jí)的錯(cuò)誤信息、展示系統(tǒng)概況的斷言信息,也包括低層級(jí)的警告和調(diào)試日志,用于在事后定位問題。你會(huì)需要這些日志來診斷那些在你機(jī)器上無法重現(xiàn)的問題。確保日志等級(jí)可以動(dòng)態(tài)調(diào)整。
實(shí)現(xiàn)網(wǎng)絡(luò)模擬器,并確保代碼可以處理你能想象到的最糟糕的網(wǎng)絡(luò)條件。比如 99% 的丟包,10 秒的延遲,以及 ± 幾秒的抖動(dòng)。你會(huì)驚訝地發(fā)現(xiàn)這能揭示出許多問題。測(cè)試階段就是發(fā)現(xiàn)并修復(fù)這些網(wǎng)絡(luò)劣化帶來的問題的最好時(shí)機(jī),而不是在公開測(cè)試前一晚才修修補(bǔ)補(bǔ)。
在合適的地方實(shí)現(xiàn)模糊測(cè)試(fuzz test),以確保你的協(xié)議在處理隨機(jī)數(shù)據(jù)包時(shí)不會(huì)崩潰。把模糊測(cè)試放著通宵跑,以增強(qiáng)信心:你的代碼在面對(duì)惡意數(shù)據(jù)包時(shí)也不會(huì)崩。
出人意料地,我經(jīng)常發(fā)現(xiàn)一些問題,只有在把一組單元測(cè)試循環(huán)執(zhí)行多次時(shí)才會(huì)暴露出來,也許這是因?yàn)闇y(cè)試?yán)锸褂昧瞬煌碾S機(jī)數(shù),尤其是當(dāng)網(wǎng)絡(luò)模擬器是由隨機(jī)數(shù)驅(qū)動(dòng)的時(shí)候。這是一個(gè)很好的方法,可以讓那些每幾天才偶發(fā)一次的測(cè)試失敗變成每次都失敗。因此,在你為測(cè)試 100% 通過而慶祝之前,加入一個(gè)可以輕松循環(huán)執(zhí)行單元測(cè)試的模式,用來揭露這類錯(cuò)誤。
在多個(gè)平臺(tái)上同時(shí)測(cè)試。我從沒寫過一個(gè)底層庫,能第一次就跑在 macOS、Windows 和 Linux 上??倳?huì)有一些編譯器特有的問題和崩潰。開發(fā)過程中就要在多個(gè)平臺(tái)上測(cè)試,否則最后再統(tǒng)一修復(fù)會(huì)非常痛苦。
思考人們可能會(huì)如何攻擊你的協(xié)議。實(shí)現(xiàn)防御這些攻擊的代碼。添加模擬這些攻擊的功能測(cè)試,并確保你的代碼能夠正確處理它們。
This is my process and it seems to work pretty well. If you are writing a low-level network protocol, the rest of your game depends on this code working correctly. You need to be absolutely sure it works before you build on it, otherwise it’s basically a stack of cards.
In my experience, game networking is hard enough without having suspicions that your low-level network protocol has bugs that only show up under extreme network conditions. That’s exactly where you need to be able to trust your code works correctly. So test it!
這是我的流程,而且效果相當(dāng)不錯(cuò)。如果你正在編寫一個(gè)底層的網(wǎng)絡(luò)協(xié)議,那你游戲的其他部分都依賴于這段代碼能否正常工作。你必須絕對(duì)確定它是可靠的,再在此基礎(chǔ)上繼續(xù)構(gòu)建。否則,它就像是一座紙牌塔,隨時(shí)可能倒塌。
根據(jù)我的經(jīng)驗(yàn),游戲網(wǎng)絡(luò)開發(fā)本身就已經(jīng)夠難了,如果你還要擔(dān)心底層網(wǎng)絡(luò)協(xié)議在極端網(wǎng)絡(luò)條件下才會(huì)暴露出 bug,那就更麻煩了。而恰恰是在這種情況下,你必須能信賴你的代碼是穩(wěn)定的。
所以——測(cè)試它!
