.\"/*
.\" * Copyright (c) 2004 MontaVista Software, Inc.
.\" *
.\" * All rights reserved.
.\" *
.\" * Author: Steven Dake (sdake@redhat.com)
.\" *
.\" * This software licensed under BSD license, the text of which follows:
.\" *
.\" * Redistribution and use in source and binary forms, with or without
.\" * modification, are permitted provided that the following conditions are met:
.\" *
.\" * - Redistributions of source code must retain the above copyright notice,
.\" * this list of conditions and the following disclaimer.
.\" * - Redistributions in binary form must reproduce the above copyright notice,
.\" * this list of conditions and the following disclaimer in the documentation
.\" * and/or other materials provided with the distribution.
.\" * - Neither the name of the MontaVista Software, Inc. nor the names of its
.\" * contributors may be used to endorse or promote products derived from this
.\" * software without specific prior written permission.
.\" *
.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
.\" * THE POSSIBILITY OF SUCH DAMAGE.
.\" */
.TH EVS_OVERVIEW 8 2004-08-31 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual"
.SH NAME
evs_overview \- EVS Library Overview
.SH OVERVIEW
The EVS library is delivered with the corosync project. This library is used
to create distributed applications that operate properly during partitions, merges,
and faults.
.PP
The library provides a mechanism to:
.IP \(bu 3
handle multiple instances of the EVS library within one application
.IP \(bu 3
deliver messages
.IP \(bu 3
deliver configuration changes
.IP \(bu 3
join one or more groups
.IP \(bu 3
leave one or more groups
.IP \(bu 3
send messages to one or more groups
.IP \(bu 3
send messages to the currently joined groups
.PP
The EVS library implements a messaging model known as Extended Virtual Synchrony.
This model allows one sender to transmit to many receivers using standard UDP/IP.
Because UDP/IP is unreliable and unordered, the EVS library adds ordering and reliability
to messages. Hardware multicast is used so that the sender transmits each packet once
rather than once per receiver. Lost or erroneous messages are recovered automatically
by the library.
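.PP
As an illustration of how ordering and reliability can be layered over an
unordered transport, the following C sketch buffers packets by sequence number,
delivers them only in sequence order, and reports the gaps that would need
retransmission. It is a minimal sketch under stated assumptions: struct
recv_window and the window_* functions are hypothetical names, the window size
is arbitrary, and sequence numbers are assumed to be assigned elsewhere (in
EVS, by the token described under ARCHITECTURE AND ALGORITHM). It is not the
corosync implementation.
.nf
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WINDOW 64   /* hypothetical receive window size */

/* Hypothetical receive buffer: slot (seq % WINDOW) holds message seq. */
struct recv_window {
    unsigned int next_seq;   /* lowest sequence number not yet delivered */
    bool present[WINDOW];    /* whether the slot currently holds a message */
    int payload[WINDOW];     /* stand-in for message contents */
};

void window_init(struct recv_window *w, unsigned int first_seq)
{
    assert(w != NULL);
    w->next_seq = first_seq;
    for (size_t i = 0; i < WINDOW; i++)
        w->present[i] = false;
}

/* Store a packet that may arrive out of order or more than once.
 * Returns false for duplicates and for packets outside the window. */
bool window_receive(struct recv_window *w, unsigned int seq, int payload)
{
    if (seq < w->next_seq || seq >= w->next_seq + WINDOW)
        return false;
    size_t slot = seq % WINDOW;
    if (w->present[slot])
        return false;
    w->present[slot] = true;
    w->payload[slot] = payload;
    return true;
}

/* Deliver messages in sequence order, stopping at the first gap.
 * Returns how many payloads were copied into out[]. */
size_t window_deliver(struct recv_window *w, int *out, size_t out_cap)
{
    size_t n = 0;
    while (n < out_cap && w->present[w->next_seq % WINDOW]) {
        size_t slot = w->next_seq % WINDOW;
        out[n++] = w->payload[slot];
        w->present[slot] = false;
        w->next_seq++;
    }
    return n;
}

/* List the missing sequence numbers below high_seq: the ones a
 * receiver would ask to have retransmitted. */
size_t window_missing(const struct recv_window *w, unsigned int high_seq,
                      unsigned int *out, size_t out_cap)
{
    size_t n = 0;
    for (unsigned int s = w->next_seq;
         s < high_seq && s < w->next_seq + WINDOW && n < out_cap; s++)
        if (!w->present[s % WINDOW])
            out[n++] = s;
    return n;
}
```
.fi
.PP
A receiver that gets sequence 3 before sequence 2 holds 3 back and reports 2 as
missing; once 2 arrives, window_deliver() releases both, in order.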
.PP
The EVS library provides certain guarantees, which relate to the delivery of
messages and to the delivery of configuration changes.
.SH DEFINITIONS
.TP
.B multicast
A multicast occurs when a network interface card sends a UDP packet to multiple
receivers simultaneously.
.TP
.B processor
A processor is the entity that executes the extended virtual synchrony algorithms.
.TP
.B configuration
A configuration is the current description of the processors executing the extended
virtual synchrony algorithm.
.TP
.B configuration change
A configuration change occurs when a new configuration is delivered.
.TP
.B partition
A partition occurs when a configuration splits into two or more configurations, or
when a processor fails or is stopped and leaves the configuration.
.TP
.B merge
A merge occurs when two or more configurations join into a larger, new configuration.
When a new processor starts up, it is treated as a configuration with only one
processor, and a merge occurs.
.TP
.B fifo ordering
A message is FIFO ordered when each receiver delivers the messages from a given
sender in the order in which that sender sent them.
.TP
.B agreed ordering
A message is AGREED ordered when all processors agree on the order of the messages sent.
.TP
.B safe ordering
A message is SAFE ordered when it is AGREED ordered and, in addition, is not
delivered until every processor has a copy of it.
.TP
.B virtual synchrony
Virtual synchrony is obtained when all processors agree on the order of the messages
and configuration changes delivered in each new configuration.
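.PP
The FIFO and AGREED definitions can be made concrete as checks over delivery
logs. This hypothetical C sketch identifies each message by its sender and a
per-sender counter, which is an assumption made for illustration and not an
EVS message format. is_fifo() checks one processor's log; is_agreed() checks
that two processors deliver the messages they have in common in the same
relative order.
.nf
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SENDERS 16   /* hypothetical bound on sender ids */

/* Hypothetical message identity: sender plus per-sender counter. */
struct msg_id {
    unsigned int sender;
    unsigned int counter;   /* grows with each message this sender sends */
};

/* FIFO: in one log, each sender's messages appear in sending order. */
bool is_fifo(const struct msg_id *log, size_t n)
{
    bool seen[MAX_SENDERS] = { false };
    unsigned int last[MAX_SENDERS] = { 0 };
    for (size_t i = 0; i < n; i++) {
        unsigned int s = log[i].sender;
        assert(s < MAX_SENDERS);
        if (seen[s] && log[i].counter <= last[s])
            return false;
        seen[s] = true;
        last[s] = log[i].counter;
    }
    return true;
}

static bool same_id(struct msg_id x, struct msg_id y)
{
    return x.sender == y.sender && x.counter == y.counter;
}

static bool contains(const struct msg_id *log, size_t n, struct msg_id m)
{
    for (size_t i = 0; i < n; i++)
        if (same_id(log[i], m))
            return true;
    return false;
}

/* AGREED: messages delivered by both processors appear in the same
 * relative order in both logs (logs restricted to common messages). */
bool is_agreed(const struct msg_id *a, size_t na,
               const struct msg_id *b, size_t nb)
{
    size_t i = 0, j = 0;
    for (;;) {
        while (i < na && !contains(b, nb, a[i]))
            i++;
        while (j < nb && !contains(a, na, b[j]))
            j++;
        if (i == na || j == nb)
            return i == na && j == nb;
        if (!same_id(a[i], b[j]))
            return false;
        i++;
        j++;
    }
}
```
.fi
.PP
For example, the logs (0,1) (1,1) (0,2) and (1,1) (0,1) (0,2) are each FIFO
ordered, but together they are not AGREED ordered, because the two processors
deliver (0,1) and (1,1) in different orders.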
.SH USING VIRTUAL SYNCHRONY
The virtual synchrony messaging model has many benefits for developing distributed
applications. Applications designed around replication benefit the most. Applications
that must be able to partition and merge also benefit from the virtual synchrony messaging
model.
.PP
All applications receive a copy of transmitted messages even if there are errors on the
transmission media. This permits optimizations when every processor must receive a copy
of each message for replication.
.PP
All messages are delivered according to agreed ordering, which avoids race conditions.
Consider a lock service implemented over several processors, where two processors
request the same lock at the same time. Both requests are delivered to every processor
in the same order, so if request A is ordered before request B, every processor grants
the lock to A and rejects B. Any creation or deletion of a shared data structure can
benefit from this mechanism.
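.PP
A minimal sketch of the lock service example, assuming a hypothetical request
format and a fixed-size lock table. Each processor keeps its own replica and
applies delivered requests with lock_apply(); in a real application that call
would sit in the message delivery callback invoked through evs_dispatch(3).
Because every replica sees the requests in the same agreed order and the
function is deterministic, all replicas grant and reject the same requests
without exchanging further messages.
.nf
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8   /* hypothetical lock table size */

/* Hypothetical request format for a replicated lock service. */
enum lock_op { LOCK_ACQUIRE, LOCK_RELEASE };

struct lock_request {
    enum lock_op op;
    unsigned int nodeid;   /* processor that sent the request */
    char name[32];         /* name of the lock */
};

/* One processor's replica of the lock table. */
struct lock_table {
    char name[MAX_LOCKS][32];
    unsigned int owner[MAX_LOCKS];
    bool held[MAX_LOCKS];
};

void lock_table_init(struct lock_table *t)
{
    memset(t, 0, sizeof *t);
}

/* Return the slot holding this lock, else (if create) a free slot, else -1. */
static int find_slot(const struct lock_table *t, const char *name, bool create)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_LOCKS; i++) {
        if (t->held[i] && strcmp(t->name[i], name) == 0)
            return i;
        if (!t->held[i] && free_slot < 0)
            free_slot = i;
    }
    return create ? free_slot : -1;
}

/* Apply one delivered request; returns true if it was granted.
 * Every replica calls this with the same requests in the same order. */
bool lock_apply(struct lock_table *t, const struct lock_request *r)
{
    assert(t != NULL && r != NULL);
    int slot = find_slot(t, r->name, r->op == LOCK_ACQUIRE);
    if (r->op == LOCK_ACQUIRE) {
        if (slot < 0 || t->held[slot])
            return false;  /* already held by someone, or table full */
        strncpy(t->name[slot], r->name, sizeof t->name[slot] - 1);
        t->name[slot][sizeof t->name[slot] - 1] = 0;
        t->owner[slot] = r->nodeid;
        t->held[slot] = true;
        return true;
    }
    /* LOCK_RELEASE: only the current owner may release. */
    if (slot < 0 || t->owner[slot] != r->nodeid)
        return false;
    t->held[slot] = false;
    return true;
}
```
.fi
.PP
If request A (node 1) is ordered before request B (node 2) for the same lock,
every replica returns true for A and false for B.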
.PP
Self delivery ensures that messages sent by a processor are also delivered back
to that processor. This allows the sending processor to execute its logic when the
message is self delivered, according to agreed ordering and the virtual synchrony rules.
It also permits all logic to be placed in one message handler instead of two separate places.
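.PP
The following sketch models self delivery with a hypothetical in-memory stream
standing in for the EVS multicast; none of these types or functions are part of
the EVS API. send_add() only queues a request, and the single deliver() handler
updates state on every processor, including the sender. The own_completed
counter shows how a sender can run completion logic when its own message comes
back.
.nf
```c
#include <assert.h>
#include <stddef.h>

#define NPROC 3    /* processors in this toy model */
#define QLEN 16    /* capacity of the simulated stream */

/* Hypothetical application state replicated on every processor. */
struct replica {
    unsigned int nodeid;
    long counter;                 /* replicated value */
    unsigned int own_completed;   /* own requests seen delivered back */
};

/* Stand-in for the ordered multicast stream (not an EVS type). */
struct stream {
    unsigned int from[QLEN];
    long delta[QLEN];
    size_t len;
};

/* Sending only queues the request; the sender's state is untouched. */
void send_add(struct stream *s, unsigned int from, long delta)
{
    assert(s->len < QLEN);
    s->from[s->len] = from;
    s->delta[s->len] = delta;
    s->len++;
}

/* The one message handler, used for remote and self-delivered
 * messages alike. */
void deliver(struct replica *r, unsigned int from, long delta)
{
    r->counter += delta;
    if (from == r->nodeid)
        r->own_completed++;       /* completion logic for own request */
}

/* Deliver every queued message to every replica, in the same order. */
void flush(struct stream *s, struct replica *reps, size_t n)
{
    for (size_t i = 0; i < s->len; i++)
        for (size_t p = 0; p < n; p++)
            deliver(&reps[p], s->from[i], s->delta[i]);
    s->len = 0;
}
```
.fi
.PP
Because the sender applies its own update only when it is delivered back, its
replica can never run ahead of the others or apply updates in a different order.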
.PP
Virtual synchrony allows the current configuration to be used to make decisions during
partitions and merges. Since configuration changes are delivered to the application in
the same stream as messages, the application can alter its behavior based upon them.
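.PP
One possible use of configuration changes is a majority rule: a configuration
that contains more than half of the processors keeps accepting updates, and a
minority partition switches to read-only. The sketch below assumes a
hypothetical fixed CLUSTER_SIZE and a simplified handler. The actual EVS
configuration change callback registered with evs_initialize(3) also reports
which processors left and joined; take its exact signature from that page.
.nf
```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_SIZE 5   /* hypothetical number of configured processors */
#define MAX_MEMBERS 16

enum app_mode { MODE_READ_ONLY, MODE_READ_WRITE };

/* Hypothetical application state driven by configuration changes. */
struct app_state {
    enum app_mode mode;
    unsigned int members[MAX_MEMBERS];   /* current configuration */
    size_t member_count;
};

/* Simplified configuration change handler: record the new membership
 * and accept writes only with a strict majority of CLUSTER_SIZE. */
void on_confchg(struct app_state *st, const unsigned int *member_list,
                size_t member_count)
{
    assert(member_count <= MAX_MEMBERS);
    for (size_t i = 0; i < member_count; i++)
        st->members[i] = member_list[i];
    st->member_count = member_count;
    st->mode = (2 * member_count > CLUSTER_SIZE) ? MODE_READ_WRITE
                                                 : MODE_READ_ONLY;
}
```
.fi
.PP
If a five-processor cluster splits into partitions of three and two, the
three-processor side stays read-write and the two-processor side becomes
read-only. Because the change arrives in the same ordered stream as messages,
processors that move into the same configuration switch mode at the same point
in that stream.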
.SH ARCHITECTURE AND ALGORITHM
The EVS library is a thin IPC interface to the corosync executive. The corosync executive
provides services for the SA Forum AIS libraries as well as the EVS library.
.PP
The corosync executive uses a ring protocol and a membership protocol to send messages
according to the semantics required by extended virtual synchrony. The ring protocol
creates a virtual ring of processors, around which a token is rotated. Only the
processor that currently possesses the token may multicast messages to the other
processors in the system.
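.PP
The rule that only the token holder multicasts can be modeled with a short
simulation. In this hypothetical C sketch the token visits each processor in
ring order; the current holder sends everything it has queued, then forwards
the token. It deliberately omits sequence numbers, retransmission, and flow
control, and struct ring_proc and rotate_token() are illustrative names only.
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical processor record: an id and a count of queued messages. */
struct ring_proc {
    unsigned int nodeid;
    unsigned int queued;
};

/* Pass the token around the ring for the given number of rotations.
 * A processor multicasts only while it holds the token, so at most one
 * processor transmits at a time. The sender of each multicast is
 * recorded in order[]; the return value is the number of multicasts. */
size_t rotate_token(struct ring_proc *ring, size_t n, unsigned int rotations,
                    unsigned int *order, size_t order_cap)
{
    assert(n > 0);
    size_t sent = 0;
    for (unsigned int r = 0; r < rotations; r++) {
        for (size_t holder = 0; holder < n; holder++) {
            /* ring[holder] possesses the token during this step */
            while (ring[holder].queued > 0 && sent < order_cap) {
                order[sent++] = ring[holder].nodeid;
                ring[holder].queued--;
            }
            /* the token now moves on to ring[(holder + 1) % n] */
        }
    }
    return sent;
}
```
.fi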
.PP
The token is called the ORF token (for ordering, reliability, and flow control). The ORF
token orders all messages by carrying a sequence number that is incremented every time
a message is multicast, so every message receives a position in a single order that
all processors agree on. The token also contains a retransmission list. If a processor
receives the token and has not yet received a message it should have, it adds that
message's sequence number to the retransmission list, and a processor that has a copy
of the message then retransmits it. The ORF token provides configuration-wide flow
control by tracking the number of messages sent and limiting the number of messages
that one processor may send on each possession of the token.
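.PP
The three roles of the ORF token can be sketched as one token-handling function.
This is a simplified model, not the corosync token format: struct orf_token,
struct processor, on_token() and the window field are hypothetical, all
processors are simulated in one address space, and a multicast is modeled as
marking the message as received on every processor. On each possession a
processor retransmits requested messages it holds, adds any sequence numbers it
is missing to the retransmission list, and then sends at most window new
messages, each stamped with the next sequence number.
.nf
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RTR_MAX 16   /* hypothetical retransmission list capacity */
#define MAX_SEQ 64   /* highest sequence number this model tracks */

/* Simplified, hypothetical ORF token. */
struct orf_token {
    unsigned int seq;            /* last sequence number assigned (ordering) */
    unsigned int rtr[RTR_MAX];   /* sequence numbers to retransmit (reliability) */
    size_t rtr_len;
    unsigned int window;         /* new messages per possession (flow control) */
};

/* One processor: which messages it holds and how many it wants to send. */
struct processor {
    bool have[MAX_SEQ + 1];      /* have[s] is true once message s arrived */
    unsigned int backlog;
};

static void rtr_add(struct orf_token *t, unsigned int seq)
{
    for (size_t i = 0; i < t->rtr_len; i++)
        if (t->rtr[i] == seq)
            return;              /* already requested */
    assert(t->rtr_len < RTR_MAX);
    t->rtr[t->rtr_len++] = seq;
}

/* Handle the token at processor p; all[0..n-1] are every processor.
 * Returns the number of new messages p sent. */
unsigned int on_token(struct orf_token *t, struct processor *p,
                      struct processor *all, size_t n)
{
    /* Reliability: retransmit requested messages that p holds. */
    size_t keep = 0;
    for (size_t i = 0; i < t->rtr_len; i++) {
        unsigned int s = t->rtr[i];
        if (p->have[s]) {
            for (size_t k = 0; k < n; k++)
                all[k].have[s] = true;
        } else {
            t->rtr[keep++] = s;  /* still unserved; keep it on the list */
        }
    }
    t->rtr_len = keep;

    /* Reliability: request anything p is missing up to the token's seq. */
    for (unsigned int s = 1; s <= t->seq; s++)
        if (!p->have[s])
            rtr_add(t, s);

    /* Ordering and flow control: stamp each new message with the next
     * sequence number, sending at most t->window per possession. */
    unsigned int sent = 0;
    while (p->backlog > 0 && sent < t->window) {
        assert(t->seq < MAX_SEQ);
        t->seq++;
        for (size_t k = 0; k < n; k++)
            all[k].have[t->seq] = true;
        p->backlog--;
        sent++;
    }
    return sent;
}
```
.fi
.PP
With window set to 2 and three messages queued, a processor sends two on this
possession and the third waits for the next rotation. If another processor
misses message 2, it places 2 on the retransmission list, and the next holder
that has message 2 retransmits it and removes it from the list.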
.PP
The membership protocol is responsible for ring formation and for detecting when a
processor within a ring has failed. If the token fails to complete a rotation within a
timeout period known as the token rotation timeout, the membership protocol forms a new
ring. A newly started processor also forms a new ring. Two or more configurations
may be used to form a new ring, allowing many partitions to merge into one
new configuration.
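.PP
The two membership triggers described above, token loss and merging, can be
sketched as follows. The timeout check and the union of member lists are
hypothetical simplifications: the real membership protocol runs an agreement
among the processors before a new ring is installed, and the timeout value is
illustrative, not a corosync default. MAX_MEMBERS is set to 16 to mirror the
processor limit given under PERFORMANCE.
.nf
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MEMBERS 16

/* Hypothetical token-loss detector. */
struct token_monitor {
    unsigned long last_token_ms;   /* time the token was last seen */
    unsigned long timeout_ms;      /* token rotation timeout */
};

void token_seen(struct token_monitor *m, unsigned long now_ms)
{
    m->last_token_ms = now_ms;
}

/* True if the token missed its rotation deadline, i.e. the membership
 * protocol should start forming a new ring. */
bool token_lost(const struct token_monitor *m, unsigned long now_ms)
{
    return now_ms - m->last_token_ms > m->timeout_ms;
}

/* Merge two configurations (sorted member id lists) into their sorted
 * union, dropping duplicates. Returns the size of the new configuration. */
size_t merge_configs(const unsigned int *a, size_t na,
                     const unsigned int *b, size_t nb, unsigned int *out)
{
    size_t i = 0, j = 0, n = 0;
    while (i < na || j < nb) {
        unsigned int next;
        if (j == nb || (i < na && a[i] < b[j])) {
            next = a[i++];
        } else if (i == na || b[j] < a[i]) {
            next = b[j++];
        } else {
            next = a[i];           /* same processor in both lists */
            i++;
            j++;
        }
        assert(n < MAX_MEMBERS);
        out[n++] = next;
    }
    return n;
}
```
.fi
.PP
A newly started processor is a one-member configuration, so merging {1, 3}
with {2} yields the new configuration {1, 2, 3}.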
.SH PERFORMANCE
The EVS library obtains 8.5 MB/sec throughput on 100 Mbit network links with
many processors. Larger messages obtain better throughput because the
time to access Ethernet is about the same for a small message as it is for a
larger message. Smaller messages obtain a higher message rate (messages per second),
because a smaller message takes less time to transmit.
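.PP
The throughput figures can be checked with a little arithmetic, and the
message-size tradeoff with a simple cost model. The sketch assumes that MB
means 10^6 bytes and that each message costs a fixed access time plus its
transmission time; the access time is a parameter of this hypothetical model,
not a measured value. Under that reading, 8.5 MB/sec is 68 Mbit/s, or about
68% of a 100 Mbit link.
.nf
```c
#include <assert.h>

/* Fraction of a link's raw bit rate used by a payload throughput.
 * Assumes MB means 10^6 bytes. */
double link_utilization(double bytes_per_sec, double link_bits_per_sec)
{
    assert(link_bits_per_sec > 0);
    return bytes_per_sec * 8.0 / link_bits_per_sec;
}

/* Hypothetical cost model: fixed access time plus transmission time. */
static double time_per_msg(double msg_bytes, double access_sec,
                           double link_bits_per_sec)
{
    return access_sec + msg_bytes * 8.0 / link_bits_per_sec;
}

/* Payload bytes per second under the model. */
double model_throughput(double msg_bytes, double access_sec,
                        double link_bits_per_sec)
{
    return msg_bytes / time_per_msg(msg_bytes, access_sec, link_bits_per_sec);
}

/* Messages per second under the model. */
double model_msgs_per_sec(double msg_bytes, double access_sec,
                          double link_bits_per_sec)
{
    return 1.0 / time_per_msg(msg_bytes, access_sec, link_bits_per_sec);
}
```
.fi
.PP
For any fixed access cost, model_throughput() grows with message size while
model_msgs_per_sec() shrinks, which matches the behavior described above.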
.PP
Encryption and authentication account for 80% of CPU utilization. The corosync
executive can be built without encryption and authentication for deployments with no
security requirements and low CPU utilization requirements. Even without encryption or
authentication, processor utilization under heavy load can reach 25% on 1.5 GHz
processors.
.PP
The current corosync executive supports 16 processors. Support for more processors
may be possible by changing defines in the corosync executive, but this is untested.
.SH SECURITY
The EVS library encrypts all messages sent over the network using the SOBER-128
stream cipher. The EVS library uses HMAC with SHA1 to authenticate all messages.
The EVS library uses SOBER-128 as a pseudo random number generator (PRNG), which it
seeds from the Linux /dev/random device.
.SH BUGS
This software is not yet production quality, so some bugs may remain. However,
few are expected, since no new bugs are currently being reported.
.SH "SEE ALSO"
.BR evs_initialize (3),
.BR evs_finalize (3),
.BR evs_fd_get (3),
.BR evs_dispatch (3),
.BR evs_join (3),
.BR evs_leave (3),
.BR evs_mcast_joined (3),
.BR evs_mcast_groups (3),
.BR evs_membership_get (3),
.BR evs_context_get (3),
.BR evs_context_set (3)
.PP