This document was written in about 2009. Since then, client-server
communications options for the web have developed and may now be a viable
transport, though still with some tradeoffs and overhead vs. plain VIP.
VOS Infrastructure Protocol
VIP stands for VOS Infrastructure Protocol, and is a transport-layer protocol
similar to TCP, but specifically provides for low latency sending of small
messages while simultaneously also allowing for other types of data transfer
(such as large file download/upload) over the same logical connection. This
attempts to provide quick communication for wide-area interactive use, as well
as improving local network utilization in high-traffic situations. (Or what you
may be thinking of as a "bandwidth" problem, even when more stricly speaking,
simple bandwidth
capacity
is not the actual or only problem.)
VIP is not specific to VOS (and thus may be useful for other applications) but was developed because VOS demanded specific features from the transport layer that were not found in any other available protocol.
VIP was designed and implemented by Peter Amstutz.
VIP is data (payload) format agnostic. VOS uses it by serializing VOS Messages (the contents of which are defined by object types, e.g. the [[A3DL]] object types for 3D graphical objects; see the Manual) or [[Compact_Object_Description]]s.
VIP is written in standard C++ and consists of approximately 2,000 lines of
code. It depends on VUtil from VOS (for reference counting and a few other
small utilities), boost_threads, the STL, and the operating system's (Linux or
Windows) IP sockets implementation.
Problem statement
Why not use TCP?
- Head of line problem: large downloads take time, and data sent after the download is "stuck behind it" until the download finishes sending; unsuitable for interactive sessions
- Send every byte problem: once data has been committed to the OS to be sent, no way to revoke sending it (for example, because the unsent data becomes out of date and it would be more efficient to only send the most recent data)
- No priorities: can't ask TCP to send data out of order because there is some new data that should preempt the data already in the send buffer
- TCP doesn't care about latency: it will happily delay sending out
your packets (Nagle algorithm), delay sending ACKs (cumulative ACKs) and (by
design, to maximize average throughput) will happily fill up the buffers of
every router between you and the remote host (which are often needlessly large;
this is "bufferbloat"), which has the side effect of significantly
increasing the effective latency of each individual packet. This can be
especially bad over wireless networks and last-mile residential connections
(cable/DSL).
- We did try it (the TCP VOS protocol is called VOP and is still supported in VOS through the LocalSocketSiteExtension) and encountered all these problems!
These problems may also apply to any connection that is routed through a network link that forces all data through a single serial stream, such as any TCP-based tunnel (e.g. an HTTP tunnel), a point-to-point radio or satellite link, a modem, etc.
TCP Workarounds
- Multiple TCP connections: seems obvious, but actually very hard to get right; requires a handshake to determine that connection X and Y are logically associated; each TCP connection competes with the others for bandwidth, no ability to prioritize
- TCP+UDP: similar to multiple TCP connections, but actually much worse, if one starts sending a significant amount of data on TCP it will run roughshod over the UDP packets and add a lot of latency
- TCP slicing: send a little bit of data for channel 1, a little bit for channel 2, etc (similar to processor timeslicing employed by an OS) -- still can't take back something you've already passed to the TCP stack to send; can't ask that new data be sent before old data; a packet drop in channel 1 still delays sending any data on channel 2
All of these approaches are still using TCP, with the associated problems as
described above.
Alternatives:
- SCTP (Stream Control Transmission Protocol): pros: supports multiple in-order streams, sits at the IP level, IETF standard. Cons: not yet widely deployed; because it's an IP-level protocol it requires operating system support. Unlikely to be supported by firewalls and NAT anytime soon.
- BEEP (Blocks Extensible Exchange Protocol): meta-protocol for message-oriented connections. Uses the TCP slicing strategy discussed above (with the attendant strengths and weaknesses).
- HTTP/XML-RPC/SOAP/etc: These are all based on a single (verbose) stream over TCP
Desired VIP Features:
- Have a single connection with multiple simultaneous prioritized channels
- Support different sending algorithms for different channels (with different performance
characteristics) within the logical connection
- Be able to cancel messages that have been passed to the transport layer but not yet sent/acknowledged, and replace them with updated data
- Model the network capacity between the local and remote hosts, and schedule packet sending so as not to overwhelm network buffers
How does it work?
- Startup: three-way handshake. Initial sequence numbers for all channels are established.
- The connection is responsible for maintaining current RTT estimate, count of
in-flight data, congestion window
- The connection is also responsible for connection teardown
- The connection is alive as long as there is traffic; if there is no traffic
for 30 seconds, the peer starts sending pings every 4 seconds until it gets a
reply or another 30 seconds have elapsed, at which point we haven't heard anything
from the peer for 60 seconds and the connection times out and ends.
- Each VIP packet consists of the following (see the code sketch after this list):
- a control method (synchronize, data, ping, reset)
- packet id
- most recently received packet id
- sequence of blocks
- packet ids are simply a serial number used to compute round trip time
- blocks correspond to different sub-protocols used in VIP
- the first byte of a block is a protocol identifier
- the rest of the block contains protocol-specific information and payload data which is passed to that protocol's handler
- a sub-protocol defines a particular packing and sending strategy
- referred to as "type of service" in the VIP API
- controls when outgoing packets are sent (or re-sent)
- maintains queue of data to be sent via this sub-protocol
- also determines what order to deliver newly received data to the application
- when constructing a new packet, connection asks each sub-protocol for data to send based on fixed sub-protocol priority, in-flight/congestion window sizes and special scheduling decisions
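As a rough illustration of the packet layout just described, here is a minimal C++ sketch; the type names, field widths and enum values are assumptions made for illustration, not the actual VIP wire format:

<pre>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the packet layout described above; names and field
// widths are illustrative assumptions, not VIP's actual wire format.
enum class ControlMethod : uint8_t { Synchronize, Data, Ping, Reset };

struct Block {
    uint8_t protocolId;          // first byte: identifies the sub-protocol
    std::vector<uint8_t> body;   // protocol-specific header + payload, passed
                                 // to that sub-protocol's handler
};

struct Packet {
    ControlMethod method;        // synchronize, data, ping or reset
    uint32_t packetId;           // serial number, used to compute round trip time
    uint32_t lastReceivedId;     // most recently received packet id
    std::vector<Block> blocks;   // one block per sub-protocol with data to send
};
</pre>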
Low-Latency sub-protocol
- provides reliable, in-order delivery of small messages (255 bytes or less)
- single ordered channel
- designed to send small messages efficiently
- inspired by reading about the Quake 2 & Quake 3 protocols
- 255 byte limit allows for several efficiencies
- messages are guaranteed to be delivered whole; a message is never split across packet boundaries
- well suited for small amounts of rapidly changing data, such as updating
the position and orientation of a 3D game object
- for comparison, with binary packing a 4x4 homogeneous matrix of doubles would take 128 bytes, while an XYZ vector of floats takes a mere 12 bytes, so the 255-byte upper limit is quite sufficient for the intended uses
- sends on a heartbeat; every N milliseconds (default 100ms) it sends out all messages accumulated from the application since the last heartbeat, as well as any necessary resends
- permits application to cancel outgoing messages
- every message has unsigned 32 bit message id
- if a message is sent bearing the same id as a message which is in the sending queue (waiting to be sent or waiting to be acknowledged), the previous message is deleted (see the sketch after this list)
- allows the application to push updates down to the network layer whenever convenient; the protocol ensures that only the most recent data is ever sent
- also ensures that in the event of packet loss, only the most recent data is retransmitted
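The replace-by-id rule can be pictured with a small sketch (hypothetical names and a deliberately simplified queue, not the actual VIP code; in particular, the real protocol also tracks unacknowledged messages so they can be resent):

<pre>
#include <cstdint>
#include <map>
#include <vector>

// Sketch of the low-latency queue's replace-by-id rule: queueing a message
// with an id that is already pending discards the old payload, so only the
// most recent data for that id is ever transmitted.
class LowLatencyQueue {
public:
    void send(uint32_t messageId, std::vector<uint8_t> payload) {
        // The 255-byte limit lets a message length fit in a single byte.
        if (payload.size() > 255) return;            // reject oversized messages
        pending_[messageId] = std::move(payload);    // overwrite any stale copy
    }

    // Called on each heartbeat (default every 100 ms): everything accumulated
    // since the last tick goes out together in one batch.
    std::map<uint32_t, std::vector<uint8_t>> takeForHeartbeat() {
        std::map<uint32_t, std::vector<uint8_t>> batch;
        batch.swap(pending_);
        return batch;
    }

private:
    std::map<uint32_t, std::vector<uint8_t>> pending_;
};
</pre>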
Standard sub-protocol
- provides reliable, in-order byte stream over multiple simultaneous channels
- congestion control, loss recovery algorithms based on TCP best practices
- maximum of 16 parallel channels
- a single packet may contain data from several channels
- channel 0 gets priority (half of the available packet space); the remaining space is divided evenly among the other channels with data to send (see the sketch after this list)
- performance profile is similar to TCP
- called the "standard" protocol because this is what most traffic will actually use
Other sub-protocols
- Currently only the two sub-protocols described above are implemented, but there are plans for at least two additional ones
- Unreliable sub-protocol
- would send a single packet immediately, with no guarantees about delivery or ordering
- would basically be a "trapdoor" into the UDP layer, but within the VIP connection management framework
- Streaming sub-protocol
- intended for sending fixed-rate data such as audio or video streams
- would provide in-order delivery and be based on standard sub-protocol
- acknowledgements would include amount of data receiver has buffered
- sender could use this information about the peer buffer state to make decisions about flow control, such as increasing sending rate if the receiver is being starved
- could optionally allow for zero-buffering mode, for real-time streams like live videoconferencing
- could also optionally permit packet loss, if the application is able to handle jumps in data sequence
- could make packet size guarantees to the application layer (so that packets could fall cleanly on data block boundaries)
Packet Scheduling
- different types of service are scheduled according to priority
- background:
- the network has a certain bandwidth and queuing capacity
- if data enters the network at or below the bandwidth, no queuing occurs
- if too much data comes at once (bottleneck) then queuing occurs
- queuing is bad because
- packets get delayed, or
- packets get dropped
- so a packet will travel through the network fastest if it experiences no queuing between the endpoints
- low latency packets want to go from end to end in the least possible time
- VIP explicitly models queuing in order to avoid creating congestion
- tracks a count of "in-flight" data (bytes which may still be in the network)
- the standard protocol computes a congestion window
- the congestion window is effectively the queuing capacity of the network from our perspective
- we can compute the network throughput, so we can compute how long it should take for a congestion window's worth of data to "drain out"
- to make low latency packets actually low latency, the network must not be congested
- thus, the standard protocol must stop sending for some time prior to sending the low latency packet, in order to let the network buffers "drain out"
- because the low latency protocol has a heartbeat, the standard protocol sends in between heartbeats
- VIP computes "CANSEND", which is the estimate of how much data we are allowed to send such that it will have been transmitted completely through the network by the time we are ready to send a low latency packet (see the sketch after this list)
- end result: VIP makes it possible to have a large-scale transfer take full advantage of available bandwidth, but without having a negative impact on the latency of packets scheduled with the low-latency protocol
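One way to picture the CANSEND estimate (a hedged sketch using assumed names and a simplified formula; the actual calculation in VIP may differ):

<pre>
#include <algorithm>
#include <cstddef>

// Illustrative sketch of the idea behind CANSEND: estimate how many bytes of
// bulk (standard sub-protocol) data can be handed to the network and still
// drain out completely before the next low-latency heartbeat is due. The
// names and exact formula here are assumptions, not VIP's actual code.
size_t cansendEstimate(double throughputBytesPerSec,  // modeled network throughput
                       double secsUntilHeartbeat,     // time until the next low-latency send
                       size_t inFlightBytes,          // bytes possibly still in the network
                       size_t congestionWindow)       // modeled queuing capacity
{
    // Bytes the network can drain before the heartbeat fires.
    double drainable = throughputBytesPerSec * secsUntilHeartbeat;

    // Bytes already in the pipe count against that budget, and we never
    // exceed the congestion window regardless of the time budget.
    double budget = drainable - static_cast<double>(inFlightBytes);
    double window = static_cast<double>(congestionWindow) -
                    static_cast<double>(inFlightBytes);

    return static_cast<size_t>(std::max(0.0, std::min(budget, window)));
}
</pre>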
VIP Performance Test Results
- interested in two parameters: throughput and latency
- test setup
- transferring 1 megabyte of data from my laptop on a residential cable modem to interreality.org on commercial colocation hosting
- test control is sending 1 megabyte of data using TCP (via "netcat") and looking at ICMP pingtime
- VIP sends 1 megabyte of data via standard sub-protocol, while low-latency protocol sends ping messages
- normal ICMP ping time between hosts is about 50 milliseconds
- results
- transfer time of 1 MB with TCP: 35 seconds
- ICMP ping time during transfer: 1000 milliseconds
- transfer time of 1 MB with VIP: 45 seconds
- VIP ping time during transfer: 200 milliseconds
- conclusion: a 30% increase in transfer time results in a fivefold improvement in latency, easily the difference between usable and unusable in a real-time app
VIP Implementation
- written in portable C++ and tested on Linux, Windows
- mostly independent of VOS, although it relies on libvutil for reference counting implementation (so as to be easily compatible with other interreality.org software) and byte packing
- multithreaded and threadsafe, allows for multiple connections and multiple listen sockets (uses boost::threads)
- callback based, so the application is notified when connection and data-receive events occur (see the sketch after this list)
- also supports a "message sent" event (to know when some amount of data that was scheduled to be sent has been fully ACK'd by the other side)
- fairly stable and robust so far, but has not been fully security audited or tested against garbage packets
- since it is an application-level protocol, suspending the application causes it to stop responding to pings (unlike a kernel-level service)
- similarly, if the application terminates unexpectedly the connected peers aren't sent any kind of clean shutdown and instead just have to time out
- LGPL license
- Comes as part of VOS download. See http://interreality.org/download for source code download links.
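For a rough picture of what callback-driven use looks like, here is a hypothetical sketch; the class and method names are invented for illustration and are not VIP's real API (see the source in the VOS download for that):

<pre>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical callback interface; invented names, for illustration only.
struct ConnectionCallbacks {
    // Invoked when the three-way handshake completes.
    virtual void onConnected() { std::cout << "connected\n"; }

    // Invoked when a sub-protocol delivers data in application order.
    virtual void onDataReceived(int channel, const std::vector<uint8_t>& data) {
        std::cout << "got " << data.size() << " bytes on channel " << channel << "\n";
    }

    // Invoked once previously queued data has been fully ACK'd by the peer
    // (the "message sent" event mentioned above).
    virtual void onMessageSent(uint32_t messageId) {
        std::cout << "message " << messageId << " fully acknowledged\n";
    }

    virtual ~ConnectionCallbacks() = default;
};
</pre>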
Discussion and requests (won't go into the final doc)
Request: from looking at ter'angreal, I assume there is a way to interact with vip from vos - as in "get me the value of this property, but via standard subprotocol"; "send this message via low-latency subprotocol". If I'm right, please document the APIs to do that. If not, please point to why not and how to achieve the desired effect instead :-) -- Lalo
--> Yeah, there's not much at the application level (Property class) like that, but you can set the "priority" of a Message object to Normal, LowLatency or Bulk. I expect that individual [MetaObject] classes like Property will expose that to applications when appropriate, (or we could add a method to the MetaObject class to set the priority for all messages sent by that object...). I expect more features of VIP to be exposed at the VOS and application (MetaObject) levels as needed in the future as well. -reed
--> The property class has a "setPriority" method on it which determines what priority the property read/write/update messages will be sent with.
You're right, I will add a section describing how VIP fits into VOS. -- Pete