
Why an AWS SSM Automation Document Was Stuck in the "InProgress" State for 12 Hours and Finally Timed Out

Updated: May 17, 2020

This was an interesting troubleshooting session with the AWS support team. My SSM Automation document was supposed to finish in 8 hours, but it sat stuck at the same stage for another 4 hours before timing out (by the way, the document had 9 stages).

It was a really interesting issue to look into, so I started troubleshooting on my side.

All the shell commands lived in that one stage: I was creating a few directories, populating them, making tar files, and shipping those to AWS S3, and the last shell command was shutdown.
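To make the shape of that stage concrete, here is a minimal sketch of what such a step could look like, written as a Python dict that mirrors the Automation document structure. The step name, paths, and bucket are hypothetical, and I am assuming the stage used aws:runCommand with the AWS-RunShellScript document; the post itself does not say.

```python
import json

# Hypothetical commands: the real paths and bucket are not given in the post.
commands = [
    "mkdir -p /tmp/export/logs /tmp/export/data",
    "tar -czf /tmp/export.tar.gz -C /tmp export",
    "aws s3 cp /tmp/export.tar.gz s3://example-bucket/export.tar.gz",
    "shutdown -h now",  # last command: powers off the host the SSM agent runs on
]

# One step that runs everything, shutdown included (assumed to be aws:runCommand).
problem_step = {
    "name": "ArchiveAndShutdown",  # hypothetical step name
    "action": "aws:runCommand",
    "inputs": {
        "DocumentName": "AWS-RunShellScript",
        "InstanceIds": ["{{ InstanceId }}"],
        "Parameters": {"commands": commands},
    },
}

print(json.dumps(problem_step, indent=2))
```

The thing to notice is that the shutdown sits at the end of the same command list that the agent is expected to report on.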

After looking into the details, I was surprised to find that every command had executed successfully, including the shutdown command.

After debugging on my end, I talked to AWS support, and after 2 days we figured out that there was a race condition between the AWS SSM agent and my OS shutdown command.

Once the instance shut down, the SSM agent could no longer return a response to the AWS SSM service, so the service was never able to update the state of the stage.


That is why the automation document stayed stuck at the same stage: the AWS SSM service never got a response back from the SSM agent, because the operating system shutdown raced the agent and won.


The fix suggested by AWS support was to separate the shutdown from the rest of the commands. So we added a new stage, StoppingInstance, that uses aws:executeAwsApi.
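As a rough sketch of that split, the helper below builds a two-step document: one step for the shell work and a separate StoppingInstance step that stops the instance through the EC2 StopInstances API. Only the StoppingInstance name and the aws:executeAwsApi action come from the fix described above; the function name, the ArchiveAndUpload step name, the InstanceId parameter, and the example commands are my assumptions, not the actual document.

```python
import json


def build_document(commands):
    """Return an Automation document (schema 0.3) as a dict.

    The shell step must not contain a shutdown; stopping the instance is
    handled by a separate step so the agent can report back first.
    """
    # Guard: refuse command lists that still shut the host down themselves.
    for cmd in commands:
        if cmd.split()[:1] == ["shutdown"]:
            raise ValueError("move shutdown into its own step")

    return {
        "schemaVersion": "0.3",
        "parameters": {"InstanceId": {"type": "String"}},
        "mainSteps": [
            {
                "name": "ArchiveAndUpload",  # hypothetical step name
                "action": "aws:runCommand",
                "inputs": {
                    "DocumentName": "AWS-RunShellScript",
                    "InstanceIds": ["{{ InstanceId }}"],
                    "Parameters": {"commands": commands},
                },
            },
            {
                "name": "StoppingInstance",
                "action": "aws:executeAwsApi",
                "inputs": {
                    "Service": "ec2",
                    "Api": "StopInstances",
                    "InstanceIds": ["{{ InstanceId }}"],
                },
            },
        ],
    }


# Hypothetical commands, with no shutdown at the end this time.
doc = build_document([
    "tar -czf /tmp/export.tar.gz -C /tmp export",
    "aws s3 cp /tmp/export.tar.gz s3://example-bucket/export.tar.gz",
])
print(json.dumps(doc, indent=2))
```

With this layout the shell step can finish and report its status while the instance is still up, and the stop request goes through the EC2 API instead of depending on an agent running on a host that is powering off.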


Thanks to AWS support for helping figure out this issue. Cheers :beers:

