@@ -1,62 +1,106 @@
-Application Interface Specification TODO list
+The openais standards based cluster framework TODO list
+Last Updated: May 26, 2006
+
+P1 items are to be implemented before Wilson release. P2 items may be
+implemented in Wilson if interested parties provide patches. P3 items are
+targeted for Humphrey.

 Generic Items
 -------------
-* EVT, DLOCK, MSG APIs functionality need to be developed.
-* Error checking on parameters could use improvement.
-* Allow AIS Executive to configure cluster name.
-* Compliance testing of return values would be helpful.
-* Support B.01.01 version of spec (currently support A.01.01 version).
-* Consider implementing SOCK_SEQPACKET for the AF_UNIX family of sockets on
- Linux. This would save an extra system call every time an operation must
- be done from the API.
-* There are lots of TODO's in the code that need attention.
-
-Group Messaging Interface
--------------------------
-* Very important: implement full EVS semantics when holes occur in
- delivery messages after a configuration change but before the new
- configuration is delivered.
-* Very important: block new messages from being multicast until recovery
- of each service has completed after a configuration change. This could be
- done with a "plug" in the token which "stops" any GMI_PRIO_MED or GMI_PRIO_LOW
- messages from being multicast until all members of the configuration have
- unplugged the token. Then queued messages in MED or LOW priority can be
- sent, ensuring correct partition operation.
-* Implement error creation config file to test GMI since all my lossy
- hardware has been fixed.
-* Add secrecy/authentication to group messaging interface.
-* Add support for multiple rings with gateway ring to ring for added scalability
- in LANs.
-* Add support for multiple rings with gateway tuned to long haul networks for
- added scalability in WANs. Look at spread.org as a design.
-* Add support for low delivery-time delay FIFO messages.
-* Add support for SAFE ordering.
-* Add support for encryption/authentication using Helix. nonce will start
- at zero and increment for every message sent or rotation on the ring. Group
- key produced using group key generation protocol.
-
-Cluster Membership
-------------------
-* Make timeout on SaClmClusterNodeGet work. Currently the timeout is 5
- seconds, but the spec requires the timeout to be specified in the API call.
+* P1 32/64 bit cross-endian must be working.
+* P2 Itemize any changes required for AIS B.02.01 in the TODO list.
+* P2 doxygen-ize the include and lib directories.
+* P2 There are many TODOs in the code that need attention.
+* P2 Implement static voting virtual synchrony filter.
+* P3 Integrate with rgmanager system.
+* P3 doxygen-ize the exec directory.
+* P3 Advanced Synchronization Engine needed to synchronize data without
+ long blocking delays during configuration changes.

-Availability Management Framework
----------------------------------
-* Very Important: Implement configuration change support. This includes partitions.
-* Currently the executive can record and manage only one component service
- instance per component. As a result, one component cannot act as standby/active
- for two other components in the system (AIS Spec page 74 Figure 16. Example of
- n+1 redundancy model).
- - Fix to follow spec.
-* If a user of the AMF library doesn't respond with saAmfResponse, the state
- of the application will never change. Fix by adding timeouts to readiness state
- and ha state changes to force saAmfResponse state changes triggered.
-* Implement resource proxy functions. (currently dummy functions)
-* Implement pending operations. (currently dummy functions)
-* Implement NWAY and NWAYACTIVE redundancy models.
+Totem
+-----
+* P1 Test scalability to 128 processors.
+* P1 Implement safe message delivery (for both Wilson and Picacho).
+* P1 Add mechanism to reenable a redundant ring after that ring has been
+ declared faulty and then repaired by the system administration.
+* P1 In redundant ring configuration disallow binding to localhost interface.
+* P2 Flush totem messages during RECOVERY state that are in the
+ new_message_queue.
+* P2 Turn totem layer into an LCR component.
+* P3 Implement the totem multiring protocol.
+
+YKD Virtual Synchrony Filter
+----------------------------
+* P2 Scale to 128 processors.
+
+LCR
+---
+* P2 Finish live component replacement services.

 Checkpointing
 -------------
-* Very Important: Implement configuration change support. This includes partitions.
-* Implement expiration times on checkpoints.
+* P1 32/64 bit cross-endian must be working.
+* P1 There is a bug in that iteration doesn't work under heavy load that
+ must be fixed.
+* P1 Creation/deletion of checkpoints should be done using SAFE messaging.
+* P1 The checkpoint unlink operation doesn't work as per specification. The
+ specification says that a checkpoint unlink should then allow a new
+ checkpoint with the same name to be created. All cluster members using
+ the old checkpoint continue to use the old checkpoint until they all
+ stop using the checkpoint, at which point the checkpoint should be removed
+ from the cluster.
+* P2 Conformance testing via SAF-TEST needs to reach 100%.
+
+Distributed Locking Service
+---------------------------
+* P1 32/64 bit cross-endian must be working.
+* P1 Distributed locking needs configuration change support.
+* P1 Provide kernel-based DLM service handler for distributed locking.
+* P2 Conformance testing via SAF-TEST needs to reach 100%.
+
+Messaging Service
+-----------------
+* P2 32/64 bit cross-endian must be working.
+* P2 Finish implementation.
+* P2 Conformance testing via SAF-TEST needs to reach 100%.
+
+Availability Management Framework
+---------------------------------
+* P1 32/64 bit cross-endian must be working.
+* P1 Finish next generation state machine design.
+* P1 Implement next generation state machine design.
+* P2 Conformance testing via SAF-TEST needs to reach 100%.
+
+Logging Service
+---------------
+* P2 32/64 bit cross-endian must be working.
+* P2 Design and implement.
+* P3 Conformance testing via SAF-TEST needs to reach 100%.
+
+IMMS
+----
+* P3 32/64 bit cross-endian must be working.
+* P3 Design and implement morphing the configuration object database.
+* P3 Conformance testing via SAF-TEST needs to reach 100%.
+
+Cluster Membership
+------------------
+* P1 32/64 bit cross-endian must be working.
+
+Eventing
+--------
+* P1 32/64 bit cross-endian must be working.
+
+Closed Process Groups
+---------------------
+* P1 32/64 bit cross-endian must be working.
+
+Extended Virtual Synchrony Passthrough Interface
+------------------------------------------------
+* P1 32/64 bit cross-endian must be working.
+* P1 Modify man pages to match new API semantics.
+* P1 Add mechanism to get redundant ring information from EVS interface.
+
+IPC
+---
+* There are no TODOs for the IPC system.
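
The P1 checkpoint unlink item above describes deferred removal: unlink frees the name at once, while existing users keep the old checkpoint until the last of them closes it. A minimal single-process sketch of that bookkeeping follows. It is not openais code. Every name in it (ckpt_open, ckpt_unlink, ckpt_close, struct ckpt, the fixed-size table) is hypothetical, and it leaves out the cluster-wide agreement a real service would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model; none of these names come from openais. */
#define CKPT_MAX 8
#define CKPT_NAME_LEN 32

struct ckpt {
    char name[CKPT_NAME_LEN];
    int open_count;  /* users currently holding the checkpoint open */
    bool unlinked;   /* set once unlink has been requested */
    bool in_use;     /* table slot is allocated */
};

static struct ckpt ckpt_table[CKPT_MAX];

/* Return the checkpoint still reachable by name, or NULL. */
static struct ckpt *ckpt_lookup(const char *name)
{
    for (size_t i = 0; i < CKPT_MAX; i++) {
        struct ckpt *c = &ckpt_table[i];
        if (c->in_use && !c->unlinked && strcmp(c->name, name) == 0)
            return c;
    }
    return NULL;
}

/* Open by name, creating the checkpoint if no linked one has that name. */
struct ckpt *ckpt_open(const char *name)
{
    struct ckpt *c;

    if (strlen(name) >= CKPT_NAME_LEN)
        return NULL; /* name too long for this sketch */

    c = ckpt_lookup(name);
    if (c == NULL) {
        for (size_t i = 0; i < CKPT_MAX && c == NULL; i++) {
            if (!ckpt_table[i].in_use)
                c = &ckpt_table[i];
        }
        if (c == NULL)
            return NULL; /* table full */
        memset(c, 0, sizeof(*c));
        strcpy(c->name, name); /* length checked above */
        c->in_use = true;
    }
    c->open_count++;
    return c;
}

/* Unlink: the name becomes free immediately; the slot is released
 * now only if nobody has the checkpoint open. Returns 0 or -1. */
int ckpt_unlink(const char *name)
{
    struct ckpt *c = ckpt_lookup(name);

    if (c == NULL)
        return -1; /* no linked checkpoint with that name */
    c->unlinked = true;
    if (c->open_count == 0)
        c->in_use = false;
    return 0;
}

/* Close: the last close of an unlinked checkpoint releases its slot. */
void ckpt_close(struct ckpt *c)
{
    assert(c != NULL && c->open_count > 0);
    c->open_count--;
    if (c->open_count == 0 && c->unlinked)
        c->in_use = false;
}
```

In this model, opening "db", unlinking it, and opening "db" again yields two distinct entries. The first stays allocated for its existing users and is released only when the last of them calls ckpt_close.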