.\"/*
.\" * Copyright (c) 2004 MontaVista Software, Inc.
.\" *
.\" * All rights reserved.
.\" *
.\" * Author: Steven Dake (sdake@mvista.com)
.\" *
.\" * This software licensed under BSD license, the text of which follows:
.\" *
.\" * Redistribution and use in source and binary forms, with or without
.\" * modification, are permitted provided that the following conditions are met:
.\" *
.\" * - Redistributions of source code must retain the above copyright notice,
.\" * this list of conditions and the following disclaimer.
.\" * - Redistributions in binary form must reproduce the above copyright notice,
.\" * this list of conditions and the following disclaimer in the documentation
.\" * and/or other materials provided with the distribution.
.\" * - Neither the name of the MontaVista Software, Inc. nor the names of its
.\" * contributors may be used to endorse or promote products derived from this
.\" * software without specific prior written permission.
.\" *
.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
.\" * THE POSSIBILITY OF SUCH DAMAGE.
.\" */
.TH EVS_OVERVIEW 8 2004-08-31 "openais Man Page" "Openais Programmer's Manual"
.SH OVERVIEW
The EVS library is delivered with the openais project. This library is used
to create distributed applications that operate properly during partitions, merges,
and faults.
.PP
The library provides a mechanism to:
.IP \(bu 3
manage multiple instances of the EVS library in one application through a handle abstraction
.IP \(bu 3
deliver messages
.IP \(bu 3
deliver configuration changes
.IP \(bu 3
join one or more groups
.IP \(bu 3
leave one or more groups
.IP \(bu 3
send messages to one or more groups
.IP \(bu 3
send messages to currently joined groups
.PP
The EVS library implements a messaging model known as Extended Virtual Synchrony.
This model allows one sender to transmit to many receivers using standard UDP/IP.
UDP/IP is unreliable and unordered, so the EVS library applies ordering and reliability
to messages. Hardware multicast is used so that a message sent to two or more receivers
is transmitted once rather than duplicated for each receiver. Erroneous messages are
corrected automatically by the library.
.PP
The EVS library provides certain guarantees. These guarantees are related to
message delivery and configuration change delivery.
.SH DEFINITIONS
.TP
.B multicast
A multicast occurs when a network interface card sends a UDP packet to multiple
receivers simultaneously.
.TP
.B processor
A processor is the entity that executes the extended virtual synchrony algorithms.
.TP
.B configuration
A configuration is the current description of the processors executing the extended
virtual synchrony algorithm.
.TP
.B configuration change
A configuration change occurs when a new configuration is delivered.
.TP
.B partition
A partition occurs when a configuration splits into two or more configurations, or
a processor fails or is stopped and leaves the configuration.
.TP
.B merge
A merge occurs when two or more configurations join into a larger new configuration. When
a new processor starts up, it is treated as a configuration with only one processor
and a merge occurs.
.TP
.B fifo ordering
A message is FIFO ordered when one sender and one receiver agree on the order of the
messages sent.
.TP
.B agreed ordering
A message is AGREED ordered when all processors agree on the order of the messages sent.
.TP
.B safe ordering
A message is SAFE ordered when all processors agree on the order of messages sent and
those messages are not delivered until all processors have a copy of the message to
deliver.
.TP
.B virtual synchrony
Virtual synchrony is obtained when all processors agree on the order of messages
sent and configuration changes sent for each new configuration.
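.PP
The difference between agreed and safe ordering defined above can be sketched in C.
This is a minimal model under stated assumptions, not the library's implementation:
the received_up_to array and both function names are hypothetical, and each entry is
assumed to hold the highest sequence number up to which a processor has every message.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: received_up_to[i] is the highest sequence number such
 * that processor i holds every message numbered 1..received_up_to[i]. */

/* Agreed ordering: the local processor may deliver message seq as soon as it
 * holds seq and every message ordered before it. */
static int agreed_deliverable(unsigned local_received_up_to, unsigned seq)
{
    return seq <= local_received_up_to;
}

/* Safe ordering: message seq may be delivered only once every processor in
 * the configuration holds a copy of it. */
static int safe_deliverable(const unsigned *received_up_to, size_t nprocs,
                            unsigned seq)
{
    size_t i;

    assert(nprocs > 0);
    for (i = 0; i < nprocs; i++) {
        if (received_up_to[i] < seq) {
            return 0;   /* at least one processor is still missing seq */
        }
    }
    return 1;
}
```
.fi
.PP
In this model a message can be agreed deliverable on one processor while it is
not yet safe deliverable, because another processor has not received it.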
.SH USING VIRTUAL SYNCHRONY
The virtual synchrony messaging model has many benefits for developing distributed
applications. Applications designed using replication benefit the most. Applications
that must be able to partition and merge also benefit from the virtual synchrony messaging
model.
.PP
All applications receive a copy of transmitted messages even if there are errors on the
transmission media. This allows optimizations when every processor must receive a copy
of the message for replication.
.PP
All messages are ordered according to agreed ordering, which avoids race conditions.
Consider a lock service implemented over several processors. Two
requests occur at the same time on two separate processors. The requests are placed in
the same order for every processor and then delivered. All processors therefore
receive request A before request B and can reject request B. Any type of creation or
deletion of a shared data structure can benefit from this mechanism.
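.PP
The lock service example above can be sketched in C as follows. This is a minimal
illustration and not part of the EVS library: the lock_state and lock_request
structures and the apply_requests function are hypothetical names, and the array of
requests stands in for messages delivered in agreed order.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock state kept by every processor (not part of the EVS API). */
struct lock_state {
    int held;           /* nonzero once some request has been granted */
    unsigned owner;     /* processor id of the current holder */
};

/* Hypothetical lock request as it would arrive in a delivered message. */
struct lock_request {
    unsigned requester; /* processor id that asked for the lock */
};

/* Apply requests in delivery order and return the number of rejections.
 * Because every processor sees the same order, every processor computes the
 * same owner and rejects the same requests. */
static size_t apply_requests(struct lock_state *st,
                             const struct lock_request *reqs, size_t n)
{
    size_t i, rejected = 0;

    assert(st != NULL);
    for (i = 0; i < n; i++) {
        if (st->held) {
            rejected++;                     /* request B: already granted */
        } else {
            st->held = 1;                   /* request A: first in order wins */
            st->owner = reqs[i].requester;
        }
    }
    return rejected;
}
```
.fi
.PP
Two processors that run apply_requests over the same ordered requests end with the
same owner, without exchanging any further messages.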
.PP
Self delivery ensures that messages that are sent by a processor are also delivered back
to that processor. This allows the processor sending the message to execute logic when
the message is self delivered according to agreed ordering and the virtual synchrony rules.
It also permits all logic to be placed in one message handler instead of two separate places.
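.PP
The single-handler pattern that self delivery permits can be sketched as follows.
This is an assumption-laden model rather than EVS code: the stream structure stands
in for the agreed-order message stream, and the names send_add, deliver, and
deliver_all are hypothetical.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 16

/* Hypothetical stand-in for the agreed-order message stream. */
struct stream {
    int values[QUEUE_MAX];
    unsigned senders[QUEUE_MAX];
    size_t count;
};

/* Hypothetical replicated state held by one processor. */
struct replica {
    unsigned self;      /* this processor's id */
    long total;         /* replicated value */
    size_t own_seen;    /* how many of our own messages came back */
};

/* Sending only hands the message to the stream; no local state changes. */
static int send_add(struct stream *s, unsigned sender, int value)
{
    if (s->count == QUEUE_MAX) {
        return -1;      /* stream model is full */
    }
    s->senders[s->count] = sender;
    s->values[s->count] = value;
    s->count++;
    return 0;
}

/* The single handler: runs for every delivered message, including our own. */
static void deliver(struct replica *r, unsigned sender, int value)
{
    r->total += value;
    if (sender == r->self) {
        r->own_seen++;
    }
}

/* Deliver the whole stream, in order, to one replica. */
static void deliver_all(struct replica *r, const struct stream *s)
{
    size_t i;

    assert(s->count <= QUEUE_MAX);
    for (i = 0; i < s->count; i++) {
        deliver(r, s->senders[i], s->values[i]);
    }
}
```
.fi
.PP
The sender updates its state only inside deliver, so the sending processor and every
other processor run the same code path in the same order.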
.PP
Virtual Synchrony allows the current configuration to be used to make decisions in partitions
and merges. Since the configuration is sent in the stream of messages to the application,
the application can alter its behavior based upon the configuration changes.
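.PP
One way an application might react to configuration changes is sketched below. The
majority rule shown here is only one possible policy chosen for illustration, and the
app structure, its expected_members field, and on_config_change are hypothetical
names that do not come from the EVS library.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application state; the policy below is one possible choice. */
struct app {
    size_t expected_members;  /* assumed full cluster size, set by the app */
    size_t current_members;   /* size of the most recent configuration */
    int writable;             /* nonzero while updates are accepted */
};

/* Called when a configuration change arrives in the message stream.
 * member_count is the number of processors in the new configuration. */
static void on_config_change(struct app *a, size_t member_count)
{
    assert(a->expected_members > 0);
    a->current_members = member_count;
    /* Accept updates only while holding a strict majority. */
    a->writable = (2 * member_count > a->expected_members);
}
```
.fi
.PP
With five expected members, a partition that leaves two processors stops accepting
updates, and a merge back to three or more resumes them.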
.SH ARCHITECTURE AND ALGORITHM
The EVS library is a thin IPC interface to the openais executive. The openais executive
provides services for the SA Forum AIS libraries as well as the EVS library.
.PP
The openais executive uses a ring protocol and a membership protocol to send messages
according to the semantics required by extended virtual synchrony. The ring protocol
creates a virtual ring of processors. A token is rotated around the ring of processors.
When the token is possessed by a processor, that processor may multicast messages to
other processors in the system.
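.PP
The token rotation described above can be modelled in a few lines of C. This is a
simplified sketch and not the executive's code: the ring structure and the functions
may_multicast and pass_token are hypothetical, and processors are assumed to be
numbered in ring order.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring model: processors are indexed 0..nprocs-1 in ring order. */
struct ring {
    size_t nprocs;
    size_t holder;   /* index of the processor that currently has the token */
};

/* Only the token holder may multicast. */
static int may_multicast(const struct ring *r, size_t proc)
{
    return proc == r->holder;
}

/* Pass the token to the next processor, wrapping at the end of the ring. */
static void pass_token(struct ring *r)
{
    assert(r->nprocs > 0);
    r->holder = (r->holder + 1) % r->nprocs;
}
```
.fi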
.PP
The token is called the ORF token (for ordering, reliability, flow control). The ORF
token orders all messages by increasing a sequence number every time a message is
multicast. In this way, an ordering is placed on all messages that all processors
agree to. The token also contains a retransmission list. If a token is received by
a processor that has not yet received a message it should have, a message sequence
number is added to the retransmission list. A processor that has a copy of the message
then retransmits the message. The ORF token provides configuration-wide flow control
by tracking the number of messages sent and limiting the number of messages that may
be sent by one processor on each possession of the token.
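.PP
The three roles of the ORF token can be illustrated with a small model. The token
layout, the RTR_MAX limit, and the function names below are assumptions made for this
sketch and do not reflect the actual token format used by the openais executive.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

#define RTR_MAX 8

/* Hypothetical token layout; field names are illustrative only. */
struct orf_token {
    unsigned seq;               /* highest sequence number assigned so far */
    unsigned rtr[RTR_MAX];      /* sequence numbers awaiting retransmission */
    size_t rtr_count;
};

/* Ordering and flow control: assign the next sequence numbers to at most
 * max_per_visit of the pending messages. Returns how many were sent and
 * stores the first assigned sequence number in *first_seq. */
static size_t send_with_token(struct orf_token *t, size_t pending,
                              size_t max_per_visit, unsigned *first_seq)
{
    size_t n = pending < max_per_visit ? pending : max_per_visit;

    *first_seq = t->seq + 1;
    t->seq += (unsigned)n;
    return n;
}

/* Reliability: a processor missing message seq records it on the token.
 * Returns 0 on success and -1 when the list is full. */
static int request_retransmit(struct orf_token *t, unsigned seq)
{
    size_t i;

    assert(seq >= 1 && seq <= t->seq);
    for (i = 0; i < t->rtr_count; i++) {
        if (t->rtr[i] == seq) {
            return 0;           /* already requested */
        }
    }
    if (t->rtr_count == RTR_MAX) {
        return -1;
    }
    t->rtr[t->rtr_count++] = seq;
    return 0;
}

/* A processor holding seq retransmits it and clears the request.
 * Returns 1 if a request was pending, 0 otherwise. */
static int satisfy_retransmit(struct orf_token *t, unsigned seq)
{
    size_t i;

    for (i = 0; i < t->rtr_count; i++) {
        if (t->rtr[i] == seq) {
            t->rtr[i] = t->rtr[t->rtr_count - 1];
            t->rtr_count--;
            return 1;
        }
    }
    return 0;
}
```
.fi
.PP
A processor with five pending messages and a limit of three sends three on this
possession of the token and keeps the remaining two for a later rotation.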
.PP
The membership protocol is responsible for ring formation and detecting when a processor
within a ring has failed. If the token fails to make a rotation within a timeout period
known as the token rotation timeout, the membership protocol will form a new ring.
If a new processor starts, it will also form a new ring. Two or more configurations
may be used to form a new ring, allowing many partitions to merge together into one
new configuration.
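.PP
The token rotation timeout check can be sketched as a simple timer comparison. This
is a hypothetical model: the token_watch structure and its functions are illustrative
names, and timestamps are assumed to come from a monotonic clock in milliseconds
supplied by the caller.
.PP
.nf
```c
#include <assert.h>

/* Hypothetical failure detector; times are in milliseconds from any
 * monotonic clock chosen by the caller. */
struct token_watch {
    unsigned long last_seen_ms;  /* when the token last arrived here */
    unsigned long timeout_ms;    /* the token rotation timeout */
};

/* Record the arrival of the token at this processor. */
static void token_arrived(struct token_watch *w, unsigned long now_ms)
{
    w->last_seen_ms = now_ms;
}

/* Nonzero when the token has not come back within the timeout, which is the
 * condition under which the membership protocol forms a new ring. */
static int rotation_timed_out(const struct token_watch *w,
                              unsigned long now_ms)
{
    assert(now_ms >= w->last_seen_ms);
    return now_ms - w->last_seen_ms > w->timeout_ms;
}
```
.fi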
.SH PERFORMANCE
The EVS library obtains 8.5 MB/sec throughput on 100 Mbit network links with
many processors. Larger messages obtain better throughput because the
time to access Ethernet is about the same for a small message as it is for a
larger message. Smaller messages obtain a higher rate of messages per second because
a small message still takes somewhat less time to send than a large one.
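.PP
To put the throughput figure in terms of link capacity, the conversion can be written
out as below. The function name is hypothetical, and the calculation assumes that
1 MB is 1,000,000 bytes and 1 Mbit is 1,000,000 bits, which the text above does not
state.
.PP
.nf
```c
#include <assert.h>

/* Percentage of link capacity used by a given payload throughput.
 * Assumes 1 MB = 1,000,000 bytes and 1 Mbit = 1,000,000 bits. */
static double link_utilization_pct(double mbytes_per_sec, double link_mbit)
{
    assert(link_mbit > 0.0);
    return mbytes_per_sec * 8.0 / link_mbit * 100.0;
}
```
.fi
.PP
Under these unit assumptions, link_utilization_pct(8.5, 100.0) evaluates to about 68,
so the quoted throughput uses roughly two thirds of a 100 Mbit link.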
.PP
Encryption and authentication account for 80% of CPU utilization. Openais
can be built without encryption and authentication for those with no security
requirements who need low CPU utilization. Even without encryption or
authentication, under heavy load, processor utilization can reach 25% on 1.5 GHz
processors.
.PP
The current openais executive supports 16 processors. Support for more processors
is possible by changing defines in the openais executive, but this is untested.
.SH SECURITY
The EVS library encrypts all messages sent over the network using the SOBER-128
stream cipher. The EVS library uses HMAC and SHA1 to authenticate all messages.
The EVS library uses SOBER-128 as a pseudo random number generator. The EVS
library feeds the PRNG using the /dev/random Linux device.
.SH BUGS
This software is not yet production quality, so there may still be some bugs. There
appear to be very few, however, since no unresolved bugs have been reported at this point.
.SH "SEE ALSO"
.BR evs_initialize (3),
.BR evs_finalize (3),
.BR evs_fd_get (3),
.BR evs_dispatch (3),
.BR evs_join (3),
.BR evs_leave (3),
.BR evs_mcast_joined (3),
.BR evs_mcast_groups (3),
.BR evs_membership_get (3)
.PP