
Internal Heartbeat Timeouts Cause a Controller Failover

Applies To

 

Product(s): T540

Product Version(s): Tintri OS v1.4, OS v2.0.2.1 (and earlier).

Symptoms

 

Under certain conditions, the heartbeat messages between the primary and secondary controllers may time out, causing the secondary controller to incorrectly initiate a failover.

 

  • The standby controller takes over and the previously active controller is rebooted unexpectedly.
  • Example of an email alert triggered during a controller failover:
LOG-HAMON-0042: Controller X is taking over.

 

  • Example entries found in the support.log file:
HAMon[3654]: LOG-THRIFTCLIENT-0002: Thrift transport exception type #1 (open() timed out); failed to establish connection [addr=tt-peer-controller./port=30300]. 
HAMon[3654]: LOG-HAMONDISKFENCE-0005: Disk fencing unable to ping node (current seqNum=7696581). 
HAMon[3654]: LOG-HAMONDISKFENCE-0001: Shooting node #1... 
HAMon[3654]: LOG-HAMONPLATFORMAPPLIANCE-0011: Node0 power cycling peer!
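
As a rough aid for checking a log for the entries above, the following Python sketch scans lines of text for the four log codes shown in the example. It is only an illustration: the function names are hypothetical, the matching is a plain substring search, and it assumes the codes appear in a real support.log exactly as they do in the excerpt above.

```python
# Log codes taken from the example support.log entries in this article.
SIGNATURES = (
    "LOG-THRIFTCLIENT-0002",
    "LOG-HAMONDISKFENCE-0005",
    "LOG-HAMONDISKFENCE-0001",
    "LOG-HAMONPLATFORMAPPLIANCE-0011",
)


def find_failover_signatures(lines):
    """Return (line_number, code) for every line containing a known code."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for code in SIGNATURES:
            if code in line:
                hits.append((number, code))
    return hits


def matches_known_sequence(lines):
    """Return True if all four codes occur in the order shown in the article.

    Other lines (or repeated codes) may appear in between; only the
    relative order of the four codes is checked.
    """
    found = iter(code for _, code in find_failover_signatures(lines))
    # `code in found` advances the iterator, so this is a subsequence test.
    return all(code in found for code in SIGNATURES)
```

To try it against a collected log, read the file into a list of lines (for example with `open(path).read().splitlines()`, where `path` is wherever the support.log was saved) and pass that list to `matches_known_sequence`. A match only suggests that the log resembles the example; it does not confirm the root cause.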

Resolution

 

This issue is fixed in Tintri OS v2.0.2.2 and later. Upgrade to Tintri OS v2.0.2.2 or later at your earliest convenience.
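
The version check implied above can be sketched in Python as follows. This is a minimal illustration, not a Tintri tool: the function names are hypothetical, and it assumes that version strings are dotted numbers (optionally prefixed, as in "v2.0.2.1") and that every release numbered below 2.0.2.2 is affected.

```python
import re

# First release containing the fix, per the Resolution section.
FIXED_IN = (2, 0, 2, 2)


def parse_version(text):
    """Extract the first dotted number in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version_text):
    """Return True if the version is older than the first fixed release."""
    return parse_version(version_text) < FIXED_IN
```

For example, `is_affected("Tintri OS v2.0.2.1")` evaluates to `True`, while `is_affected("v2.0.2.2")` evaluates to `False`.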

Comments
T445 is a single controller model so failover won't happen?
Posted 05:01, 17 Sep 2014
Good catch, thank you Takizawa-San. I have corrected this.
Posted 08:50, 20 Nov 2014
Last modified
10:54, 22 Apr 2017
