Application Interface Specification Quickstart Guide
----------------------------------------------------

***
All cryptographic software in this package is subject to the following legal
notice:
This package includes publicly available encryption source code which,
together with object code resulting from the compiling of publicly
available source code, may be exported from the United States under License
Exception TSU pursuant to 15 C.F.R. Section 740.13(e).
***

This corosync package is broken into four parts. The exec directory contains
all of the code responsible for serving the APIs. The lib directory contains
the APIs to which the user may link. The test directory contains some simple
test programs which exercise the APIs. The conf directory contains example
configuration files which can be copied directly onto the target system.

The API implements SA Forum APIs for Cluster Membership (CLM), Availability
Management Framework (AMF), Checkpointing (CKPT), and Eventing (EVT).
The API also contains an extended virtual synchrony (EVS) API which can be
used in distributed applications.

Configuring the corosync executive:
-----------------------------------
The corosync executive will automatically determine cluster membership by
communicating on a specified multicast address and port.

The directory conf contains the file corosync.conf:

totem {
    bindnetaddr: 192.168.1.0
    mcastaddr: 226.94.1.1
    mcastport: 5405
}
logging {
    logoutput: file
    logoutput: stderr
    logoutput: syslog
    logfile: /tmp/ais
    debug: on
    timestamp: on
}
timeout {
    token: 200
    token_retransmit: 50
    hold: 30
    retransmits_before_loss: 4
    join: 100
    consensus: 200
    merge: 200
    downcheck: 1000
    fail_recv_const: 250
}

The totem section contains three values. All three values must be set
or the corosync executive will exit with an error.
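
As a rough illustration of the rule above, the following Python sketch checks
a configuration text for the three required totem values. The parser is a
simplified stand-in written for the example file in this guide, not the parser
used by the corosync executive, and the names parse_sections and
missing_totem_keys are hypothetical.

```python
REQUIRED_TOTEM_KEYS = ("bindnetaddr", "mcastaddr", "mcastport")


def parse_sections(text):
    """Parse 'name { key: value ... }' blocks into {name: [(key, value), ...]}.

    Simplified assumption: one 'key: value' pair per line, no nesting.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            current = line[:-1].strip()
            sections[current] = []
        elif line == "}":
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            sections[current].append((key.strip(), value.strip()))
    return sections


def missing_totem_keys(text):
    """Return the required totem keys that are absent from the config text."""
    totem = dict(parse_sections(text).get("totem", []))
    return [key for key in REQUIRED_TOTEM_KEYS if key not in totem]


example = """totem {
    bindnetaddr: 192.168.1.0
    mcastaddr: 226.94.1.1
}
"""
print(missing_totem_keys(example))  # ['mcastport']
```

A key-value list is kept per section, rather than a plain dictionary, because
a key such as logoutput may appear more than once in the same section.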

bindnetaddr specifies the address which the corosync executive should bind to.
This address should always end in zero. If the local interface that traffic
should be routed over is 192.168.5.92, set bindnetaddr to 192.168.5.0.
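
The bindnetaddr value can be derived from the interface address. The sketch
below assumes a /24 network (netmask 255.255.255.0), which is what "ends in
zero" corresponds to in the example above; other netmasks are not covered by
this guide. The function name suggest_bindnetaddr is hypothetical.

```python
import ipaddress


def suggest_bindnetaddr(interface_address, prefix_length=24):
    """Return the network address for an interface address.

    prefix_length=24 is an assumption made for this sketch.
    """
    iface = ipaddress.ip_interface(f"{interface_address}/{prefix_length}")
    return str(iface.network.network_address)


print(suggest_bindnetaddr("192.168.5.92"))  # 192.168.5.0
```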

mcastaddr is a multicast address. The default should work but you may have
a different network configuration. Avoid 224.x.x.x because this is a "config"
multicast address.

mcastport specifies the UDP port number. It is possible to use the same
multicast address on a network with the corosync services configured for
different UDP ports.
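
The two checks described above can be sketched in Python as follows. This is
only an illustration of the advice in this guide, not validation performed by
corosync itself; the function name check_mcast is hypothetical.

```python
import ipaddress


def check_mcast(mcastaddr, mcastport):
    """Return a list of problems found with the multicast address and port."""
    problems = []
    addr = ipaddress.ip_address(mcastaddr)
    if not addr.is_multicast:
        problems.append(f"{mcastaddr} is not a multicast address")
    elif str(addr).startswith("224."):
        # This guide advises against the 224.x.x.x range.
        problems.append(f"{mcastaddr} is in 224.x.x.x, which should be avoided")
    if not 1 <= int(mcastport) <= 65535:
        problems.append(f"{mcastport} is not a valid UDP port number")
    return problems


print(check_mcast("226.94.1.1", 5405))  # []
```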

The values in the logging section are optional. If they are not set, the
system defaults to logging to syslog and stderr with timestamping and debug
enabled.

It is possible to select three destinations for logs: file, stderr, and
syslog. One or more may be selected at the same time. If file is selected as
a destination, the file name must be specified via the logfile option or the
corosync executive will exit.

The debug option prints out internal debugging information during runtime
which may be helpful for developers.

The timestamp option prints the date and time on each log message.
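
The logfile rule can be expressed as a small check. The sketch below takes the
logoutput values and the logfile value as plain Python arguments; it does not
read corosync.conf, and the name check_logging is hypothetical.

```python
VALID_DESTINATIONS = {"file", "stderr", "syslog"}


def check_logging(logoutputs, logfile=None):
    """Return a list of problems with the selected log destinations."""
    problems = []
    for dest in logoutputs:
        if dest not in VALID_DESTINATIONS:
            problems.append(f"unknown logoutput destination: {dest}")
    if "file" in logoutputs and not logfile:
        problems.append("logoutput 'file' selected but no logfile given")
    return problems


print(check_logging(["file", "stderr", "syslog"], logfile="/tmp/ais"))  # []
print(check_logging(["file"]))
```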

The timeout section contains nine values. This section is not normally
needed; it exists to override the program defaults when fine tuning for a
given networking/processor combination or when debugging. Be careful to use
the same timeout values on each of the nodes in the cluster or unpredictable
results may occur.

All timeout values except fail_recv_const are in milliseconds.
fail_recv_const is a message count. Until the man page is done you'll have to
check the code and the totem spec for the function and usage of the timeouts.
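
Because the timeout values must match on every node, it can help to compare
them before starting the cluster. The sketch below assumes the timeout
sections have already been collected into one dictionary per node (how they
are collected is left open); the name timeout_mismatches and the node names
are hypothetical.

```python
def timeout_mismatches(node_timeouts):
    """Return {timeout_name: {node: value}} for values that differ.

    node_timeouts maps a node name to its {timeout_name: value} dictionary.
    A timeout missing on one node is reported as None for that node, since
    it would fall back to the program default there.
    """
    names = set()
    for timeouts in node_timeouts.values():
        names.update(timeouts)
    mismatches = {}
    for name in sorted(names):
        values = {node: t.get(name) for node, t in node_timeouts.items()}
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches


nodes = {
    "node1": {"token": 200, "join": 100},
    "node2": {"token": 300, "join": 100},
}
print(timeout_mismatches(nodes))  # {'token': {'node1': 200, 'node2': 300}}
```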

The directory conf also contains the file amf.conf, which specifies the
failover groups, service units, components, and policies to be used by the
AMF. The configuration file matches the testamf1-6 programs in the test
directory and can be copied directly.

These two files should be placed in the /etc/ais directory.

A few notes about the config files:
1) Do not use DOS style line termination. This breaks the parser.
2) Do not have blank lines in the amf.conf file. This breaks the amf
   configuration file parser.

We are working on fixing these bugs, but for the moment, it is easy to simply
avoid them.
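
Both problems are easy to detect before copying a file into /etc/ais. The
following sketch inspects the raw bytes of a config file; the function name
config_file_problems and the forbid_blank_lines switch are hypothetical, with
the switch meant for amf.conf only.

```python
def config_file_problems(data, forbid_blank_lines=False):
    """Return a list of problems found in the raw bytes of a config file."""
    problems = []
    if b"\r" in data:
        problems.append("DOS style line termination found")
    if forbid_blank_lines:
        for number, line in enumerate(data.splitlines(), start=1):
            if not line.strip():
                problems.append(f"blank line at line {number}")
    return problems


# Typical use (path taken from this guide):
#   with open("/etc/ais/amf.conf", "rb") as f:
#       print(config_file_problems(f.read(), forbid_blank_lines=True))
print(config_file_problems(b"a: 1\n\nb: 2\n", forbid_blank_lines=True))
```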

Building corosync
-----------------
corosync requires GCC, LD, and a Linux 2.4/2.6 kernel. corosync has been
tested on Debian Sarge (i386), Red Hat 9 (i386), Fedora Core 2 (i386), Fedora
Core 4 (i386, x86_64) and MontaVista Carrier Grade Edition 3.1 (i386, x86_64,
classic ppc, ppc970, xscale).

Compile corosync by running make in the root directory. Make can also be run
in the individual directories. Nothing is installed by make. If installation
is desired, the files must be copied manually.

Configure Host
--------------
For security reasons, the corosync executive only allows a process that has
the EGID/GID of "ais" to connect to it. To make development easier, it is
recommended to create an "ais" user with the "ais" group.

[root@slickdeal root]# adduser ais -g ais

Set the ais user's password:

[root@slickdeal root]# passwd ais
Changing password for user ais.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Generate a private key
----------------------
corosync uses cryptographic techniques to ensure authenticity and privacy of
messages. A private key must be generated and shared by all processors for
correct operation.

First generate the key on one of the nodes:

unix# exec/keygen
Opencorosync Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Writing corosync key to /etc/ais/authkey.

After this is complete, a private key will be in the file /etc/ais/authkey.
This private key must be copied to every processor that will be a member of
the cluster. If the private key isn't the same for every node, those nodes
with nonmatching private keys will not be able to join the same configuration.

Copy the key to some transportable storage or use ssh to transmit the key
from node to node. Then install the key with the command:

unix# install -D --group=0 --owner=0 --mode=0400 /path_to_authkey/authkey /etc/ais/authkey

If the message "invalid digest" appears, the keys are not the same on each
node.
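
One way to confirm that the key is identical on every node, without printing
the key itself, is to compare a digest of the file on each node. The sketch
below is not part of corosync; SHA-256 and the function names are choices made
for illustration. The mode check mirrors the --mode=0400 option of the install
command above.

```python
import hashlib
import os
import stat


def authkey_fingerprint(path):
    """Return the SHA-256 hex digest of the key file at path."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def authkey_mode_ok(path):
    """Return True if the file's permission bits are exactly 0400."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o400


# Typical use, run as root on each node (path taken from this guide):
#   print(authkey_fingerprint("/etc/ais/authkey"))
#   print(authkey_mode_ok("/etc/ais/authkey"))
```

If the fingerprints printed on two nodes differ, the keys differ and the
nodes will not be able to join the same configuration.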

Run the corosync executive
--------------------------
Get one or more nodes and run the corosync executive on each node. A list of
node IPs should be logged when the nodes join a configuration. Run the
aisexec program only after following the previous directions.

A final note on permissions:
It is not absolutely required that the corosync executive runs as root. If
it runs as root, it schedules at the highest round robin realtime
priority and locks all of its pages into RAM in case a swap would cause a
delay in the real-time nature of the protocol. The warning "not
able to lock pages" is simply a warning and can be ignored if you choose
to run as a non-root user.

The ais user/group is required because applications are authenticated
against the ais user and group. If an application (or library) is not running
as root or ais, then the application cannot connect to the ais executive.

Please read SECURITY to understand the threat model assumed by corosync
and the techniques corosync uses to overcome these threats.

Before running any of the test programs
---------------------------------------
The corosync executive will ensure security by only allowing the ais group (or
uid root) to connect to the service. Switch to the ais group before
running any applications linked to the ais APIs, or the applications will
not be authenticated and won't be able to access services.

[sdake@slickdeal sdake]$ su ais
Password:
[ais@slickdeal sdake]$ id
uid=501(ais) gid=502(ais) groups=502(ais)
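
The rule described above can be modelled in a few lines of Python, which may
be handy as a pre-flight check in a test script. This is a sketch of the rule
as stated in this guide (effective group ais, or uid root), not the
executive's actual authentication code; the function names are hypothetical.

```python
import grp
import os


def may_connect(euid, egid, ais_gid):
    """Model of the rule: uid root, or effective group equal to ais."""
    return euid == 0 or egid == ais_gid


def current_process_may_connect(group_name="ais"):
    """Apply may_connect to the current process."""
    try:
        ais_gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this host, so only root qualifies.
        return os.geteuid() == 0
    return may_connect(os.geteuid(), os.getegid(), ais_gid)


# Using the ids from the session above: uid=501(ais) gid=502(ais)
print(may_connect(501, 502, 502))  # True
```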

Try out the corosync CLM functionality
--------------------------------------
After aisexec is running, su to the ais user.

Run test/testclm on one node. Then kill and add nodes. This will cause
callbacks to be called in the testclm application which will print out
the node state changes. The testclm program will not print any output
after it is started and has printed the current configuration until nodes
are added to or deleted from the configuration by starting and stopping
aisexec on other nodes.

Killing aisexec on the node that testclm is connected to will cause the
API to return error codes indicating the system has failed.

Try out the corosync AMF functionality
--------------------------------------
After aisexec is running, su to the ais user.

The test/testamf{1-6} programs implement three separate service units (SU).
SU #1 consists of testamf1, testamf2. SU #2 consists of testamf3, testamf4.
SU #3 consists of testamf5, testamf6. The active and backup directives
in amf.conf define how many SUs become active and how many
become standby in the service group (SG).

To test the corosync AMF, run testamf3 and testamf4 on one node. Both
components are placed in service and become active. Then run testamf1.
Nothing appears to happen, because testamf1 is not placed in service (and
made standby) until testamf2 is registered. Running testamf2 will show
a variety of state changes. testamf1 will match these state changes.
testamf2 is special because it reports an error, and later cancels
the error, causing the entire SU to go out of service, then back in
service. This behavior is expected by the corosync specification and the
code in testamf2.c can be read for a clearer understanding of what
is happening.
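
The registration behavior in the walkthrough above can be pictured with a toy
model in which a service unit is placed in service only once all of its
components have registered. This is a deliberate simplification for
illustration and does not reproduce the AMF state machine (no error reports,
healthchecks, or active/standby assignment); the names SERVICE_UNITS and
in_service_units are hypothetical.

```python
# Component layout as described in this guide.
SERVICE_UNITS = {
    "SU1": ["testamf1", "testamf2"],
    "SU2": ["testamf3", "testamf4"],
    "SU3": ["testamf5", "testamf6"],
}


def in_service_units(registered):
    """Return the SUs whose components have all registered (toy model)."""
    return [
        su
        for su, components in SERVICE_UNITS.items()
        if all(component in registered for component in components)
    ]


running = {"testamf3", "testamf4", "testamf1"}
print(in_service_units(running))                 # ['SU2']
print(in_service_units(running | {"testamf2"}))  # ['SU1', 'SU2']
```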

Pressing ctrl-z to suspend the task (which causes the healthcheck to
time out) on a component will cause the remaining component to go
out of service. If ctrl-z is pressed on the active SU, the standby
SU will become active. Pressing ctrl-c on these tests behaves the same way,
as does a crash.

Try out the corosync CKPT functionality
---------------------------------------
su to the ais user.

Run testckpt. This will execute various checkpoint API operations.
Run ckptbench. This will execute non-threaded write benchmarks.
Run ckptbenchth. This will execute threaded write benchmarks.

The benchmark configuration (how many threads to run, how many writes
per benchmark run, and data write size) is specified in the ckptbench.c
and ckptbenchth.c programs.

Two node clusters should approach 8.5 MB/sec on 100 mbit networks for
larger checkpoint sizes with encryption and authentication. If you are not
seeing these results, please report to the mailing list.
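
To put the 8.5 MB/sec figure in context, the short calculation below compares
it with the raw capacity of a 100 mbit link. It assumes 1 MB = 1,000,000
bytes and ignores Ethernet, IP, and UDP framing overhead, so the real ceiling
is somewhat lower than the figure computed here.

```python
def link_capacity_mbytes_per_sec(link_mbit_per_sec):
    """Raw link capacity in MB/sec (8 bits per byte, no protocol overhead)."""
    return link_mbit_per_sec / 8


capacity = link_capacity_mbytes_per_sec(100)  # 12.5 MB/sec
utilization = 8.5 / capacity

print(f"raw capacity: {capacity} MB/sec")
print(f"expected utilization: {utilization:.0%}")  # 68%
```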

Try out the corosync EVT functionality
--------------------------------------
su to the ais user.

Run testevt. This will execute various eventing API operations.

Try out the corosync EVS functionality
--------------------------------------
su to the ais user.

Run testevs. This will generate multicast messages and self deliver them.
Run evsbench. This will display the benchmark performance of the evs service.

Write your own applications
---------------------------
Without real applications, finding the hard bugs will be difficult. Please
port or write apps and let us know of the progress!

Contribute!
-----------
Code, examples, documentation, bug reports, and testing are all appreciated.
Read the TODO or ask on the mailing lists for ways to contribute.