scalr lost track of all our servers

dan

18 Jul, 2016 02:49 PM

All our EC2 servers in all our farms were "lost" in Scalr this morning. We saw lots of messages like:

Server eb90a37b-07c8-48fc-bca3-af85f597e968 (ec2) found in Scalr but not found in the cloud. Terminating.

Unable to read LoadAverage value from server #6: 54.247.0.215: Wrong crypto key!

Message: Role 'lb-nginx64-ubuntu1204' scaling up (Min: 0 < 2). Starting new instance. ServerID = 8e8c08bf-7eb3-424b-8c1c-476edf209070.

However, all the previous instances were still running in EC2, and Scalr started from scratch, allocating all new instances. This caused our main application to stop running completely. We have since recovered, but we would like to know what happened and how Scalr is going to prevent something like this from happening again.

  1. Posted by Nir Ben-Dor on 18 Jul, 2016 02:54 PM

    Same here

  2. Posted by Nir Ben-Dor on 18 Jul, 2016 02:54 PM

    How did you guys recover?

  3. Posted by dan on 18 Jul, 2016 02:55 PM

    We started all new instances and killed off the abandoned instances in EC2.

  4. Posted by marc on 18 Jul, 2016 04:10 PM

    Hi all,

    We are presently investigating this issue and will follow up with more info shortly.

    Many thanks,
    Wm. Marc O'Brien
    Scalr Technical Support

  5. Posted by dan on 18 Jul, 2016 05:15 PM

    Please do not change our instances. We are in the process of recovering from the error and do not want any more unexpected changes to the farms.

  6. Posted by marc on 18 Jul, 2016 05:39 PM

    Hi Dan,

    Thank you for the follow up. We will circle back once this incident is fully resolved.

    Many thanks,
    Wm. Marc O'Brien
    Scalr Technical Support

  7. Posted by dan on 18 Jul, 2016 05:57 PM

    There is a single server that we would like to have restored to its farm:

    farm 2666
    role 39216
    ec2 instance to re-insert into role: i-809f506f

    You can leave the currently running instance in the farm; we will handle it after the requested instance is re-inserted into the role.

  8. Posted by marc on 18 Jul, 2016 06:03 PM

    Hi Dan,

    Thank you for the info. This will be forwarded to Engineering.

    Many thanks,
    Wm. Marc O'Brien
    Scalr Technical Support

  9. Posted by dan on 18 Jul, 2016 07:18 PM

    Here is another single server that we would like to have restored to its farm:

    farm 2666
    role 27557
    ec2 instance to re-insert into role: i-d3814cbc

  10. Posted by marc on 18 Jul, 2016 07:22 PM

    Hi Dan,

    Thank you for the follow up. Engineering has been notified of this request as well.

    Many thanks,
    Wm. Marc O'Brien
    Scalr Technical Support

  11. Posted by Michael Lochead (Support Staff) on 18 Jul, 2016 11:37 PM

    Hi Dan,

    As communicated, Scalr experienced a critical error this morning as detailed here: http://support.scalr.net/discussions/problems/26464-hosted-scalr-se...

    At this point, we have completed automated recovery of all impacted Servers. If you are experiencing any remaining impact to your service, please re-open this ticket, or create a new one and we will put you in direct contact with our engineering team to resolve the outstanding issues.

    Thank you for your patience and collaboration in helping to address this issue. We will send a final report on the problem soon.

    Best,
    Michael

  12. Michael Lochead closed this discussion on 18 Jul, 2016 11:37 PM.