What are the best practices to build a resilient system in AWS?

We can follow these best practices to build a resilient system in AWS:


  • Backup: We need a fast, reliable backup and restore strategy for our data, and the backup and restore process should be automated.
  • Reboot: Nodes in AWS can crash and be restarted or replaced at any time, so we should build processes and threads that resume automatically when a node reboots.
  • Re-sync: After a restart, the system should be able to re-sync itself by reloading messages from queues.
  • Images: We need to maintain pre-configured and pre-optimized virtual machine images (AMIs, in AWS) so that we can restore the system quickly. These images should also be configured to restart processes automatically on reboot.
  • In-memory sessions: Wherever possible, we should minimize the use of in-memory sessions and stateful user context, since that state is lost when a node fails.
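
To make the Reboot, Re-sync and In-memory sessions points concrete, here is a minimal Python sketch of a worker that keeps its progress outside process memory and resumes from a queue after a restart. It is an illustration under assumptions, not an AWS implementation: `DurableQueue`, `Worker` and `demo` are hypothetical names, the queue is an in-process stand-in for a managed queue such as Amazon SQS, and the checkpoint file stands in for a durable store such as S3 or DynamoDB.

```python
import json
import os
import tempfile


class DurableQueue:
    """Hypothetical stand-in for a durable message queue.

    Messages stay available after a consumer crashes, so a restarted
    consumer can reload everything it has not yet processed.
    """

    def __init__(self, messages):
        self._messages = list(messages)

    def read_from(self, offset):
        # Return (index, message) pairs starting at the given offset.
        return list(enumerate(self._messages))[offset:]


class Worker:
    """Processes messages and checkpoints progress outside process memory."""

    def __init__(self, queue, checkpoint_path):
        self.queue = queue
        self.checkpoint_path = checkpoint_path
        self.offset = 0  # index of the next message to process
        self.total = 0   # running result derived from processed messages
        self._load_checkpoint()

    def _load_checkpoint(self):
        # On start (or restart), resume from the last saved state if any.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                state = json.load(f)
            self.offset = state["offset"]
            self.total = state["total"]

    def _save_checkpoint(self):
        # Write to a temporary file and rename it, so a crash in the
        # middle of a write never leaves a half-written checkpoint.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"offset": self.offset, "total": self.total}, f)
        os.replace(tmp_path, self.checkpoint_path)

    def run(self, max_messages=None):
        # Process pending messages; max_messages lets the demo simulate
        # a crash partway through the queue.
        processed = 0
        for index, message in self.queue.read_from(self.offset):
            if max_messages is not None and processed >= max_messages:
                break
            self.total += message["amount"]
            self.offset = index + 1
            self._save_checkpoint()
            processed += 1
        return processed


def demo():
    queue = DurableQueue([{"amount": n} for n in (5, 10, 20, 40)])
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        first = Worker(queue, path)
        first.run(max_messages=2)     # simulated crash after two messages
        second = Worker(queue, path)  # replacement node starts up
        second.run()                  # re-syncs from the saved offset
        return first.total, second.offset, second.total
```

Because the only state the worker needs lives in the checkpoint and the queue, a replacement node can pick up where the failed one stopped. In a real deployment we would also have to decide how to handle a message that is processed but not yet checkpointed when the node dies, for example by making the processing idempotent.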

