MySQL won't start

Jim Hill

13 Sep, 2016 06:49 AM

For roughly the eighth time, MySQL has stopped doing backups, so I terminated the slave, but the replacement won't start. We are currently running with no DB slave in our production environment. Please help.

  1. Posted by Marat Komarov (Support Staff) on 13 Sep, 2016 08:24 AM

    Hello,

    Database backups are failing because another database backup is running in parallel. A backup requires exclusive access to the database.

    As for the Slave not starting up, please launch a new Slave server and increase the HostUp timeout so we can investigate.

    Regards,
    Marat
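
Marat's point is that two backups cannot safely overlap. A minimal sketch of that idea in Python, assuming a POSIX host and a hypothetical lock file named `mysql-backup.lock` (Scalr's actual mechanism is not described in this thread): each backup takes a non-blocking exclusive `flock`, and a second backup that finds the lock already held skips its run instead of failing against a busy database.

```python
import fcntl
import os
import tempfile

# Hypothetical lock-file location; not a Scalr path.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "mysql-backup.lock")


def try_acquire_backup_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    fh = open(path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


def release_backup_lock(fh):
    """Drop the lock and close the file."""
    fcntl.flock(fh, fcntl.LOCK_UN)
    fh.close()


def run_exclusive_backup(do_backup, path=LOCK_PATH):
    """Run do_backup() only if no other backup holds the lock; return True if it ran."""
    fh = try_acquire_backup_lock(path)
    if fh is None:
        return False  # another backup is already running, so skip this one
    try:
        do_backup()
        return True
    finally:
        # Release even if do_backup() raises, so later backups are not blocked.
        release_backup_lock(fh)
```

Here `do_backup` stands in for whatever actually performs the dump or snapshot. Because `flock` locks are released when the holding process exits, a crashed backup does not leave a stale lock behind, unlike a plain "lock file exists" check.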

  2. Posted by Jim Hill on 13 Sep, 2016 04:57 PM

    Once it hangs, it never does another incremental or dump backup until I terminate the slave and start a new one. This has happened close to ten times.

    Where do I increase the HostUp timeout? There's no orchestration for this role.

  3. Posted by Jim Hill on 13 Sep, 2016 05:10 PM

    I set the HostUp timeout to 10000 seconds. You should be able to get in there now.

  4. Posted by Jim Hill on 13 Sep, 2016 05:45 PM

    The server has been failing to come up for 35 minutes now. Please try to investigate before it times out.

  5. Posted by Marat Komarov (Support Staff) on 13 Sep, 2016 05:56 PM

    The cause of the Slave failure: the MySQL data volume's filesystem is corrupted.

    root@ec2-54-153-24-22:/mnt/dbstorage/mysql-data/recon_truncated# ls -la
    ls: cannot access tag_sku.frm: Input/output error
    ls: cannot access ProductInventory.MYD: Input/output error
    ls: cannot access ProductInventory.frm: Input/output error
    ls: cannot access tag.frm: Input/output error
    ls: cannot access sku_image.frm: Input/output error
    ls: cannot access note.frm: Input/output error
    ls: cannot access gift_card.frm: Input/output error
    ls: cannot access tag_group.frm: Input/output error
    ls: cannot access hierarchy_user.frm: Input/output error
    ls: cannot access hierarchy_department.frm: Input/output error
    ls: cannot access UserOrderApprover.frm: Input/output error
    ls: cannot access ProductInventory.MYI: Input/output error
    

    dmesg

    [  366.081281] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909318, block=3637250
    [  366.631419] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909324, block=3637250
    [  366.633364] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909322, block=3637250
    [  366.634806] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909320, block=3637250
    [  366.635935] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909321, block=3637250
    [  366.637412] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909316, block=3637250
    [  366.638983] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909317, block=3637250
    [  366.641936] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909319, block=3637250
    [  366.643761] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909314, block=3637250
    [  366.645601] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909313, block=3637250
    [  366.646794] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909315, block=3637250
    [  366.648035] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909323, block=3637250
    [ 1684.878756] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909318, block=3637250
    [ 1685.470181] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909324, block=3637250
    [ 1685.475694] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909322, block=3637250
    [ 1685.480629] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909320, block=3637250
    [ 1685.483409] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909321, block=3637250
    [ 1685.485718] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909316, block=3637250
    [ 1685.491023] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909317, block=3637250
    [ 1685.495427] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909319, block=3637250
    [ 1685.502001] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909314, block=3637250
    [ 1685.505454] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909313, block=3637250
    [ 1685.510330] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909315, block=3637250
    [ 1685.515325] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909323, block=3637250
    

    You need ext3 recovery tools. Once you have it working, create a new data bundle and launch a new Slave.

    The alternative recovery scenario is to launch a new MySQL farm role and restore the database from the latest backup.

    Regards,
    Marat
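
The dmesg lines in the reply above all follow one pattern. A small Python sketch (the function name `group_inode_errors` is hypothetical, not part of any Scalr or MySQL tooling) that groups these `EXT3-fs error` lines by device and block shows how concentrated the damage is:

```python
import re
from collections import defaultdict

# Matches kernel log lines such as:
# [  366.081281] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=909318, block=3637250
EXT3_INODE_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<device>[^)]+)\): ext3_get_inode_loc: "
    r"unable to read inode block - inode=(?P<inode>\d+), block=(?P<block>\d+)"
)


def group_inode_errors(dmesg_text):
    """Map (device, block) to the sorted list of distinct inodes reported unreadable."""
    groups = defaultdict(set)
    for line in dmesg_text.splitlines():
        match = EXT3_INODE_ERROR.search(line)
        if match:
            key = (match.group("device"), int(match.group("block")))
            # A set collapses the repeated reports (e.g. at 366 s and again at 1684 s).
            groups[key].add(int(match.group("inode")))
    return {key: sorted(inodes) for key, inodes in groups.items()}
```

Applied to the log above, every entry maps to device `dm-0` and block 3637250, with twelve distinct inodes (909313 through 909324), the same count as the twelve unreadable files in the `ls` output. That suggests a single unreadable inode-table block rather than damage spread across the volume, though a file-system check is still needed to assess the full extent. The input can be `dmesg` output saved to a file; reading the kernel log may require root.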

  6. Posted by Jim Hill on 13 Sep, 2016 07:20 PM

    The reason we use Scalr is that it's an automated solution. I don't know how to do the things you're asking, and we cannot afford any more downtime in our production environment.

  7. Posted by Igor Savchenko (Support Staff) on 13 Sep, 2016 07:26 PM

    Hi Jim,

    In this case, you have a corrupted file system, and we have very little control over it because snapshots and EBS volumes are handled by Amazon. We've never before seen an EBS volume created from a snapshot have a corrupted FS, so most likely the problem is with the original volume (the one the snapshot was created from), but we have NO control over the snapshot creation process. There is nothing we can automate here.

    My advice would be to create a new Farm with fresh volumes and import the data into it. If you cannot or don't want to do this, you should check the FS on the master machine (run fsck), but this will cause some downtime. Once fsck has fixed the FS, create a new data bundle (snapshot) and then start a new slave.

    Regards,
    Igor

  8. Posted by Jim Hill on 13 Sep, 2016 08:33 PM

    The master is successfully doing snapshots and dumps. Why can't we just generate a new slave that will replicate from the healthy master?

  9. Posted by Igor Savchenko (Support Staff) on 13 Sep, 2016 08:41 PM

    The EBS snapshots are not successful. Snapshots created from the master DB volume have a corrupted FS on them, so every time Scalr tries to create a new volume for a slave from a snapshot taken on the master, it fails due to the corrupted FS.

    Hope that makes sense.

  10. Posted by Jim Hill on 13 Sep, 2016 08:48 PM

    It's showing green.

  11. Posted by Igor Savchenko (Support Staff) on 13 Sep, 2016 08:55 PM

    The snapshot API call and operation are successful, but the content of the snapshot is broken, and it's impossible to validate this via the AWS API. You can try to create one more data bundle (snapshot) and start a slave, but if that fails, I would recommend checking the FS on the master volume.

    Regards,
    Igor
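
Since the AWS API reports a snapshot as successful regardless of its contents, one way to check a snapshot is to actually read it. A rough Python sketch, assuming a volume created from the snapshot has been attached and mounted somewhere (`find_unreadable_files` is a hypothetical helper, not a Scalr or AWS feature): it walks the directory tree and records every file whose metadata or contents raise an `OSError`, which is how the `Input/output error` entries in the `ls -la` output above would surface.

```python
import os


def find_unreadable_files(root):
    """Walk root and return (path, error) pairs for entries that raise OSError.

    Roughly mirrors the `ls -la` check in this thread: a file whose metadata
    or contents cannot be read is reported instead of failing silently.
    """
    problems = []

    def record(err):
        # os.walk calls this when a directory itself cannot be listed.
        problems.append((err.filename, err))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.stat(path)  # fails with EIO when the inode block is unreadable
                with open(path, "rb") as fh:
                    # Read in 1 MiB chunks so large table files are not held in memory.
                    while fh.read(1 << 20):
                        pass
            except OSError as err:
                problems.append((path, err))
    return problems
```

For example, `find_unreadable_files("/mnt/dbstorage/mysql-data")` on the broken slave would likely report the `.frm`, `.MYD`, and `.MYI` files that `ls` could not access. Reading every file end to end is slow on a large data directory, and it does not replace `fsck`: it only detects damage that surfaces as read errors.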

  12. Posted by Jim Hill on 13 Sep, 2016 09:12 PM

    I don't know how to do that.

  13. Posted by Igor Savchenko (Support Staff) on 13 Sep, 2016 09:35 PM

    Let me check with engineering on how we can help in this situation. I'll get back to you with some options ASAP.

  14. Posted by Jim Hill on 13 Sep, 2016 09:38 PM

    Thanks for your help.

  15. Posted by Marat Komarov (Support Staff) on 14 Sep, 2016 05:00 PM

    We've corrected the RAID on the Master, created a new data bundle, and launched a new Slave.

    From now on, Slaves should start normally.

    Regards,
    Marat

  16. Posted by Jim Hill on 14 Sep, 2016 05:46 PM

    Awesome! Thanks so much!

  17. Posted by marc on 16 Sep, 2016 07:26 PM

    Hi Jim,

    Touching base on this ticket to confirm whether you are ready to close it. Let us know if any issues persist or if you have any questions.

    Many thanks,
    Wm. Marc O'Brien
    Scalr Technical Support

  18. marc closed this discussion on 30 Sep, 2016 07:38 PM.