cluster – Networks and rants

Changing IP of HA WLC controller.

September 6, 2016 Nils Magnus Eliassen

Today I changed the IP address of one of our HA wireless controllers. Since the company I work for was bought last year, we have to change the IP addresses of our systems to fit into our new and bigger network.

Changing the IP address isn’t a big thing, and you can do it without any downtime on the APs if you run FlexConnect. If you run in local mode you will be looking at a short downtime. If you want to do it with minimal downtime you need a third controller that can host your APs while the HA cluster is down.

If you have a third controller that can host your APs, you have to make sure that the mobility groups are configured and working towards your HA pair. You can check this under Controller -> Mobility Management -> Mobility Groups. This menu lists all your mobility groups. If the mobility group towards the controller is up, the status Up is shown on the right side of the page. You also have to check this on the HA controller.
[Screenshot: mobility group check]

If the mobility group is up and running, the next thing is to change the primary controller for the APs. This is a very easy task, but it’s time consuming if you don’t have Cisco Prime (luckily, I have it). From Cisco Prime you can just send out a template to all the APs and make them move to another primary controller. You can also do it manually: open an AP, choose High Availability, and configure the third controller as Primary. Within the next few minutes all the APs should have moved to the new controller.
[Screenshot: High Availability setting on the AP]

When there are no connected APs left, we can start the work to re-IP the HA controller. The first thing we have to do is break the HA cluster, because we are not able to change the IP without breaking it. Breaking the cluster also triggers a restart, so if you don’t have another controller for your APs, be ready for some downtime!

To disable the cluster you go to Controller -> Redundancy -> Global Configuration. In the lower part of the page you have the option to Disable or Enable the cluster. Set the drop down to Disabled and press Apply in the top right corner. The controller will then ask you if you are sure about breaking the cluster and that the controller will restart. Accept this and wait for a few minutes.
[Screenshot: disabling the cluster]

After a few minutes the WLC will boot up again on the same IP address as before. Then go to the Interface menu to change the management IP address.

Change the IP Address, Netmask and Gateway to the new values and press Apply. You will now lose the connection and need to reconnect on the new IP. It’s very important to enter the correct values so you don’t lose contact (or you could use the integrated service port if you have a 550x).
[Screenshot: management interface]

The next interface you need to change is the redundancy management interface. Its IP should be in the same subnet as the management IP, so unless the new management IP is in the same subnet as the previous one, you need to change this IP as well. It also needs to match the Redundancy mgmt IP in Controller -> Redundancy -> Global Configuration.
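The same-subnet rule is easy to sanity-check before you press Apply. Below is a minimal Python sketch using the standard `ipaddress` module. The function name and the example addresses are made up for illustration and have nothing to do with the WLC itself.

```python
import ipaddress


def same_subnet(mgmt_ip, netmask, redundancy_ip):
    """Return True if redundancy_ip lies in the subnet given by mgmt_ip/netmask."""
    # ip_interface accepts "address/netmask" and exposes the enclosing network.
    network = ipaddress.ip_interface(f"{mgmt_ip}/{netmask}").network
    return ipaddress.ip_address(redundancy_ip) in network


if __name__ == "__main__":
    # Hypothetical addresses, not taken from the controllers in this post.
    print(same_subnet("10.20.30.10", "255.255.255.0", "10.20.30.11"))
    print(same_subnet("10.20.30.10", "255.255.255.0", "10.20.31.11"))
```

Run it with the planned management IP, netmask and redundancy management IP. If it returns False, the redundancy management IP needs to be changed too.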

The last thing you need to do on this controller is go back to Controller -> Redundancy -> Global Configuration, change the Redundancy mgmt IPs and enable the cluster again.

You should now be finished with the first controller. The IP for the WLC HA is now active, and if you want to move the APs to the new controller you can do that now. You should be able to connect to the remaining controller on the old management IP address. Repeat the steps for changing the Redundancy mgmt IP and Redundancy port IP there. When this is done you only need to enable the cluster on this controller as well, and the HA should be working again as before.

When you have enabled HA on the second controller you can go to Monitor -> Redundancy -> Summary. There you will be able to see if the HA cluster is running successfully again.

Your cluster should now be working correctly. If you have questions or feedback, please leave a comment!

Secondary node locked when committing

April 10, 2016 Nils Magnus Eliassen

The other day I ran into a problem with one of my SRX clusters when I was running a commit. The commit was not able to complete and I got the following error:

{primary:node0}[edit]
srx1400# commit
node1:
error: configuration database modified
node0:
error: remote lock-configuration failed on node1

The reason for this error is some uncommitted configuration on the secondary node. Earlier the same day I changed the primary for redundancy-group 0, and I guess that I didn’t commit all the config on node1 before changing to node0.
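If you collect commit output in a script, it can be handy to pick out which node is reporting the modified database. The Python sketch below only parses text shaped like the output above. The function name is made up, and the assumption that every node section starts with a line like `node1:` is based on this single example.

```python
def nodes_with_modified_database(commit_output):
    """Return node names whose section reports 'configuration database modified'."""
    current_node = None
    affected = []
    for raw_line in commit_output.splitlines():
        line = raw_line.strip()
        # Assumed section header format, e.g. "node1:".
        if line.startswith("node") and line.endswith(":"):
            current_node = line[:-1]
        elif "configuration database modified" in line and current_node:
            if current_node not in affected:
                affected.append(current_node)
    return affected


# The commit output from this post, used as sample input.
SAMPLE = """{primary:node0}[edit]
srx1400# commit
node1:
error: configuration database modified
node0:
error: remote lock-configuration failed on node1
"""

if __name__ == "__main__":
    print(nodes_with_modified_database(SAMPLE))
```

For the sample above, the function points at node1, which is the node that needs the rollback.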

To solve this I had to go into the secondary node (node1) and roll back the uncommitted configuration. Normally you can use OOB to connect to the secondary node, but I don’t have it at this location, so I had to connect to the secondary node through the primary node. This is done with the following command on branch devices (SRX650 and below): request routing-engine login node 1
On high-end devices like the one I’m working on (SRX1400 and above) you use: rlogin -T node1

{secondary:node1}% rlogin -T node1
root@srx1400>
--- JUNOS 11.4R9.4 built 2013-08-22 06:24:21 UTC
{secondary:node1}
root@srx1400> configure
warning: Clustering enabled; using private edit
error: shared configuration database modified

Please temporarily use 'configure shared' to commit
outstanding changes in the shared database, exit,
and return to configuration mode using 'configure'

As you can see from the error I have to use configure shared to be able to edit the configuration.

root@srx1400> configure shared
Entering configuration mode
The configuration has been changed but not committed

Before entering the rollback command you can check the uncommitted configuration by running show | compare. This will display all the uncommitted configuration.

{secondary:node1}[edit]
root@srx1400# show | compare
[edit access profile unos clientjunos]
- pap-password "$9$2V4GDikP5T3fTrvLXwsz36C0B"; ## SECRET-DATA
+ pap-password "$9$jhHP5QF/CA09AxdsYGUp0BRyl"; ## SECRET-DATA

Now you can roll back the uncommitted config, check that there is no uncommitted config left, and exit configuration mode.

{secondary:node1}[edit]
root@srx1400# rollback
load complete

{secondary:node1}[edit]
root@srx1400# show | compare

{secondary:node1}[edit]
root@srx1400# exit
Exiting configuration mode

{secondary:node1}
root@srx1400>

Now you can close the session and try to commit the configuration from the primary node again. It worked for me! 🙂

As a note, I also know that a lot of people have had success using just the command commit synchronize force on the primary node, but it does not work for everyone.

Could not connect to node1 : No route to host

November 25, 2015 (updated December 9, 2015) Nils Magnus Eliassen

Today I had some issues when working on an SRX650. We had to replace the Services and Routing Engine a few days ago. When I was supposed to get the cluster back online I got the following error message when trying to run a few of the commands on the device:

Could not connect to node1 : No route to host

I got this error when typing show interface ge-0/0/2. I entered the command on node1 itself, so I felt it was a bit strange that node1 could not connect to node1.

The firewall also showed that it was in hold mode:

{hold:node1}

So it was not showing as secondary or primary. It kept this status all the time and didn’t try to go to any other mode while the issue was occurring.
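If you capture session output, the cluster state can be read straight from the prompt banner. The Python sketch below pulls the state and node out of a banner such as `{hold:node1}`. The function name and the regular expression are my own illustration, and I have only matched them against the banners shown in these posts (primary, secondary and hold).

```python
import re

# Matches a banner like "{hold:node1}" or "{primary:node0}[edit]".
PROMPT_RE = re.compile(r"\{(?P<state>[a-z-]+):(?P<node>node\d+)\}")


def parse_cluster_prompt(prompt):
    """Return (state, node) from a chassis cluster banner, or None if absent."""
    match = PROMPT_RE.search(prompt)
    if match is None:
        return None
    return match.group("state"), match.group("node")


if __name__ == "__main__":
    for banner in ("{hold:node1}", "{primary:node0}[edit]", "root@srx1400>"):
        print(banner, parse_cluster_prompt(banner))
```

A node that keeps returning `hold` over several checks is the symptom described above.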

The reason for my issues was that I had not deleted all of the default config from the new Services and Routing Engine card that we got. My config was not correct for all the cluster ports, since some of the ports in the cluster are dedicated to cluster services (on the SRX650 these are ge-0/0/0 (fxp0) and ge-0/0/1 (control plane)). These ports must not be configured as network ports, and that was the reason for my issues. When I deleted the config and set a root authentication password, everything connected. When I did a commit from the primary node, the config was correct on both devices and everything connected successfully.
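A quick way to spot leftover default config is to scan the configuration in set format (for example the output of show configuration | display set) for interface statements on the reserved ports. The Python sketch below does that. The function name and sample lines are made up, and the default port list is an assumption for the SRX650 (ge-0/0/0 as fxp0 and ge-0/0/1 as the control link), so adjust it for your platform.

```python
def reserved_ports_in_config(set_lines, reserved=("ge-0/0/0", "ge-0/0/1")):
    """Return the 'set interfaces ...' lines that configure a reserved cluster port."""
    hits = []
    for line in set_lines:
        words = line.split()
        # Only lines of the form "set interfaces <name> ..." are of interest.
        if len(words) >= 3 and words[:2] == ["set", "interfaces"] and words[2] in reserved:
            hits.append(line.strip())
    return hits


# Hypothetical config lines for illustration only.
SAMPLE_CONFIG = [
    "set interfaces ge-0/0/0 unit 0 family inet address 192.168.1.1/24",
    "set interfaces ge-0/0/2 unit 0 family inet address 10.0.0.1/24",
    "set chassis cluster reth-count 4",
]

if __name__ == "__main__":
    for hit in reserved_ports_in_config(SAMPLE_CONFIG):
        print("delete candidate:", hit)
```

Any line it prints is a candidate for deletion before you try to bring the cluster up.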

During my search on the internet I read that some people also forgot to set the reth-count and got the same error. The command to set the number of reth interfaces is:

set chassis cluster reth-count 4

A great source for more information is the following chapter of the book “Juniper SRX Series” written by Brad Woodberg and Rob Cameron.

http://chimera.labs.oreilly.com/books/1234000001633/ch07.html#activating_juniper_services_redundancy