Repointing vCenter PSC after PSC Node Failure

A colleague recently asked me what the reason was for configuring two PSCs in our management environment, given that in a multi-PSC deployment they are not fully redundant without a load balancer sitting in front of them. I explained that there is replication between both PSCs and that, in the event of the primary PSC failing, all services could be restored on the second PSC with a simple CLI command. Of course, this was all theory, as I had never had reason to complete that operation. But I was curious how easy the procedure was and in what scenarios the failover could be completed successfully. So I spun up a small VC environment in my lab with the intention of simulating the loss of a PSC.

But in researching the topic, I found that I had many more questions that I needed to answer or clear up for myself.

  • How could I tell which PSC was the master?
  • How can I tell the replication relationships between PSCs?
  • What exactly was replicating between the PSCs besides SSO data?
  • What is the interval between PSC-to-PSC replication?

Easy one first:

Which PSC is my master PSC? It was pretty simple for me to tell which was the master, as I had just deployed the VMs. But if you are unable to tell, you simply need to check these advanced settings of your vCenter:

  • config.vpxd.sso.admin.uri
  • config.vpxd.sso.groupcheck.uri
  • config.vpxd.sso.sts.uri
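
Each of these settings holds a URI that points at the PSC vCenter is currently using. As a small sketch, the host name can be pulled out of such a value with plain shell parameter expansion. The function name and the sample URI below are my own illustrations, not values taken from a real system.

```shell
#!/bin/sh
# Extract the host part of an SSO endpoint URI, such as the values shown in
# the vCenter advanced settings above. The sample URI is hypothetical.
psc_host_from_uri() {
    uri=$1
    rest=${uri#*://}                 # drop the scheme, e.g. "https://"
    hostport=${rest%%/*}             # drop everything after the first slash
    printf '%s\n' "${hostport%%:*}"  # drop an optional ":port"
}

# prints testpsc01.buildnet.local
psc_host_from_uri "https://testpsc01.buildnet.local/sts/STSService/vsphere.local"
```

If all three settings resolve to the same host, that host is the PSC your vCenter is pointed at.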

How can I tell how many PSCs are present, and what is the replication status between my primary PSC and the other PSCs deployed in my environment? The /usr/lib/vmware-vmdir/bin folder is packed full of utilities to help you out. I used the replication admin tool, vdcrepadmin, to get the detail below.

/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartnerstatus -h localhost -u administrator
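
If you want to check this from a script rather than by eye, something like the sketch below could summarise the output. It assumes the command prints a "Partner: <host>" line, a "Host available: Yes/No" line and a "Partner is N changes behind." line for each partner. That layout is an assumption on my part, so compare it with what your own PSC prints before relying on it. The function name is hypothetical.

```shell
#!/bin/sh
# Summarise showpartnerstatus output read from stdin.
# ASSUMED input layout, repeated per partner:
#   Partner: <host>
#   Host available:   Yes|No
#   Partner is <N> changes behind.
# Prints one line per partner and returns 1 if any partner is
# unavailable or not at 0 changes behind, otherwise 0.
check_partner_status() {
    awk '
        function report() {
            if (host == "") return
            printf "%s available=%s behind=%s\n", host, avail, behind
            if (avail != "Yes" || behind != "0") bad = 1
        }
        /^Partner:/                         { report(); host = $2; avail = "unknown"; behind = "unknown" }
        /^Host available:/                  { avail = $3 }
        /^Partner is [0-9]+ changes behind/ { behind = $3 }
        END                                 { report(); exit bad + 0 }
    '
}

# Usage on a PSC (not executed here):
#   /usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartnerstatus -h localhost -u administrator | check_partner_status
```

A non-zero return code is a cheap signal that a partner is down or lagging.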

/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showservers -h localhost -u administrator
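
The showservers output can be reduced to a plain list of host names with a one-line filter. This sketch assumes each server is printed as a directory DN that begins with cn=<hostname>, followed by further components. Treat that format as an assumption and check it against your own output. The function name is hypothetical.

```shell
#!/bin/sh
# Reduce showservers output (read from stdin) to one host name per line.
# ASSUMED input: one DN per line whose first component is cn=<hostname>, e.g.
#   cn=testpsc01.buildnet.local,cn=Servers,cn=Default-First-Site,cn=Sites,...
list_psc_hosts() {
    # Print only lines that start with "cn=", keeping the first component's value.
    sed -n 's/^cn=\([^,]*\),.*/\1/p'
}

# Usage on a PSC (not executed here):
#   /usr/lib/vmware-vmdir/bin/vdcrepadmin -f showservers -h localhost -u administrator | list_psc_hosts
```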

What is the interval between PSC-to-PSC replication? The replication interval between two PSCs is 30 seconds. However, under certain conditions it can take longer than that for all PSCs to fully synchronize. Ref: KB2113115

What exactly is being replicated between the PSCs?

  • VMware Appliance Management Service (only in Appliance-based PSC)
  • VMware License Service
  • VMware Component Manager
  • VMware Identity Management Service
  • VMware HTTP Reverse Proxy
  • VMware Service Control Agent
  • VMware Security Token Service
  • VMware Common Logging Service
  • VMware Syslog Health Service
  • VMware Authentication Framework
  • VMware Certificate Service
  • VMware Directory Service

Now that some of the basics were known, I simulated an outage of the primary PSC. Needless to say, I could no longer log in to vCenter. When I ran the showservers and showpartnerstatus commands on the secondary, they confirmed that the server was down and no replication was taking place.

So, to change the secondary PSC to the master PSC, I ran this simple command on the VCSA appliance.

cmsso-util repoint --repoint-psc testpsc02.buildnet.local
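
Since this command changes which PSC vCenter depends on, it can be worth wrapping it in a small guard when scripting it. The sketch below is one way to do that. The function name and the DRY_RUN switch are my own conventions, not part of cmsso-util. It only checks that the target looks like a host name, and it prints the command instead of running it when DRY_RUN=1.

```shell
#!/bin/sh
# Guarded wrapper around the repoint command shown above.
# DRY_RUN=1 (a hypothetical convention of this sketch) prints the command
# instead of running it.
repoint_vcenter() {
    target=$1
    case $target in
        ''|*[!A-Za-z0-9.-]*)
            # Reject an empty target or one with characters outside a host name.
            echo "usage: repoint_vcenter <psc-fqdn>" >&2
            return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cmsso-util repoint --repoint-psc $target"
    else
        cmsso-util repoint --repoint-psc "$target"
    fi
}

# Usage on the VCSA (not executed here):
#   DRY_RUN=1 repoint_vcenter testpsc02.buildnet.local
```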

The whole process took less than 10 minutes. When I successfully logged back in to vCenter and checked the advanced settings, I could see that PSC02 was now the primary.

To get back to a fully redundant solution, I wanted to deploy a new PSC to make sure I had a second copy of the SSO, certificate, and licensing information, etc. Before I could do that, I needed to remove the stale PSC01 record from the system. It was still listed in the Nodes section but marked as unknown.

To remove the node, I ran this command from the shell on PSC02.

cmsso-util unregister --node-pnid testpsc01.buildnet.local --username administrator@vsphere.local

Running showservers and showpartnerstatus returned empty results, as the PSC was now acting alone with no replication partner available. I then reran the install for an external PSC and opted to join an existing SSO domain. Once deployed, everything looked good again.

Lastly, I simulated another outage (I disconnected the NIC) on the primary PSC (remember, PSC02 is currently the master PSC). I failed over the services to the secondary node (PSC01) and then brought the failed server back online. Replication started up again with no issues.
