Limit device traffic to only one MX uplink

Hi all

The Meraki MX devices give you an easy way to automatically use 2 uplinks. It works seamlessly, but some configuration that is possible on other Cisco devices is hard to do.

One of those is denying specific devices access over just 1 of the uplinks. Let’s say that WAN 1 is a fiber connection. You have enough capacity to send and receive all kinds of traffic. WAN 2 on the other hand is a satellite connection. The 2 big drawbacks with satellite are latency and speed, and sometimes also the cost per MB transferred. Often the guaranteed bandwidth on a satellite connection can be as low as 64 kb/s. That doesn’t leave much bandwidth for other devices.
wanmeraki

Then the big question is: how do you limit a device to only use WAN 1? This could be a device that syncs data every hour and generates traffic, or a whole subnet of guest wifi users. You don’t want these devices to use your costly satellite connection. You most likely need it for business critical applications.

I spent a few good hours trying to find a solution to this. I asked Meraki and various forums for help. I was always told that traffic shaping should help etc. But the only thing it does is give me a preferred uplink; it never stops the traffic from going out over WAN 2 when WAN 1 is down.

The solution I came up with was to turn off NAT on the uplink that should be blocked. All the devices behind the Meraki that should be blocked need to be in a separate VLAN. In version 15 you can exclude specific VLANs from the NAT policy on the uplinks. The traffic will then stop since there is no return route for the traffic (as long as you don’t add a static route).

So in the above example we want VLAN 20 to only have access over WAN 1 and not WAN 2. You start by finding the network you want to change in the Meraki Dashboard. Then go to Security & SD-WAN and Addressing & VLANs on the left side. At the bottom of the page you have NAT exceptions where you can choose to disable NAT on the different uplinks. In my screenshot I have excepted the Crew network from the NAT policy. With this config the devices on the Crew VLAN can’t use WAN 2/Uplink 2.
NAT meraki

Backup and restore config of Mobility Express.

Hi all

Lately I have been working with the Mobility Express AP’s from Cisco. One of the important things to do when you set up new equipment is to have a backup and restore policy for the config. I chose the easy way out using tftp; it’s the quickest and easiest way to transfer files as long as you have the tftp server secured. The other option you have is ftp.

transfer upload mode tftp
Sets the mode to tftp, you can also choose ftp but then you need to add in username and password too.

transfer upload datatype config
Choose config as the information to store on the server

transfer encrypt enable
Turns on encryption for the file

transfer encrypt set-key supersecret
Gives the encryption a password

transfer upload serverip 10.10.10.10
Gives the ME the IP of the server where the config will be stored

transfer upload filename MEconfig.cfg
Filename for the config.

transfer upload start
Start the upload.

transfer upload mode tftp
transfer upload datatype config
transfer encrypt enable
transfer encrypt set-key supersecret
transfer upload serverip 10.10.10.10
transfer upload filename MEconfig.cfg
transfer upload start

You should then get the following output.

Mode……………………………………… TFTP
TFTP Server IP…………………………….. 10.10.10.10
TFTP Path………………………………….
TFTP Filename……………………………… MEconfig.cfg
Data Type…………………………………. Config File
Encryption………………………………… Enabled

Are you sure you want to start? (y/N) y

File transfer operation completed successfully.

So far you have done the backup. Then comes the second most important thing: doing the restore. It’s almost the same, but you swap out upload with download.

transfer download datatype config
transfer download mode tftp
transfer encrypt enable
transfer encrypt set-key supersecret
transfer download serverip 10.10.10.10
transfer download filename MEconfig.cfg
transfer download start

After the commands have been entered you should see the following output.

Mode............................................. TFTP
Data Type........................................ Config
TFTP Server IP................................... 10.10.10.10
TFTP Packet Timeout.............................. 6
TFTP Max Retries................................. 10
TFTP Path........................................
TFTP Filename.................................... MEconfig.cfg
Encrypt/Decrypt Flag............................. Enabled

Warning: Downloading configuration will cause the controller to reset...

This may take some time.
Are you sure you want to start? (y/N) y

TFTP Config transfer starting.

TFTP receive complete... updating configuration.

CCO Username & Password will NOT be imported. Please Re-Configure the Credentials 'transfer download ap-images cco-username '
'transfer download ap-images cco-password ' after bootup for Image Download

TFTP receive complete... storing in flash.

Sync config to peers.

System being reset.
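
Since the backup and restore sequences differ only in the word upload or download, the command list can be generated instead of typed. This is just a minimal sketch: the function name transfer_commands and its parameters are my own invention, and it only builds the text lines shown above, it does not talk to the controller.

```python
def transfer_commands(direction, server_ip, filename, key=None):
    """Build the Mobility Express CLI lines for a config backup or restore.

    direction is "upload" (backup) or "download" (restore). The function
    name and parameters are hypothetical helpers, not part of any Cisco tool.
    """
    if direction not in ("upload", "download"):
        raise ValueError("direction must be 'upload' or 'download'")
    commands = [
        f"transfer {direction} mode tftp",
        f"transfer {direction} datatype config",
    ]
    if key:
        # Encryption lines are the same for both directions.
        commands.append("transfer encrypt enable")
        commands.append(f"transfer encrypt set-key {key}")
    commands += [
        f"transfer {direction} serverip {server_ip}",
        f"transfer {direction} filename {filename}",
        f"transfer {direction} start",
    ]
    return commands


for line in transfer_commands("upload", "10.10.10.10", "MEconfig.cfg", key="supersecret"):
    print(line)
```

Calling it with "download" instead gives the restore sequence, with the mode and datatype lines in the opposite order from the listing above.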

 

Using python and telnet

So here we go with my first test of python. After reading a few blogs on python and watching some videos I created my first script that actually does something on a switch. It’s not actually super useful but it’s something!

I have an Ubuntu machine that I connect to using SSH. On this computer I have used nano since it’s the only default editing tool I know how to use on a linux device (I really hate vi for text editing). When I get a little bit further along I’m going to set up my notepad++ client to automatically upload my scripts, since I like doing text configs on my windows machine a little bit better.

I’ll try to go through the script almost line by line. I found the example on the python website and in a youtube video.

import getpass 
import sys 
import telnetlib

The first part imports modules to make the programming easier. In short, it saves me a lot of time since I don’t have to write my own way of using the telnet protocol.

host = "10.10.10.30" 
user = raw_input("Username: ") 
password = getpass.getpass()

The second part is handling the connection to the device. The first line creates a variable called host. This is the IP or dns name of the device you are connecting to.
The second line creates the variable called user. It gets its value from an input when you run the script. You enter it when Username: is displayed.
The last line in this section creates the variable password. This uses the imported module getpass so the text is hidden while you type it.

tn = telnetlib.Telnet(host)

This part tells the python script to connect to the device with the IP address from the previous section, using the host variable.

tn.read_until("Username: ")
tn.write(user + "\n")
if password: 
    tn.read_until("Password: ") 
    tn.write(password + "\n")

Now to the login part of the script. It first skips the MOTD or whatever is shown before the login prompt. The script continues reading until it sees Username:
When it reaches Username: it will send the user variable that you entered in the previous section. This is ended by a \n to signal that the script should press enter. The script will then read until Password: shows up and send the variable password ended with a \n. Pretty much the same as user.

tn.write("enable\n") 
tn.write("Cisco\n") 
tn.write("conf t\n") 
tn.write("vlan 20\n") 
tn.write("name guest\n") 
tn.write("vlan 100\n") 
tn.write("name production\n") 
tn.write("end\n") 
tn.write("exit\n")

This part should be familiar to most cisco engineers. You can see the different commands in each line ended by \n to simulate the press of the enter key. It basically sends out what you would type in the command window.

print tn.read_all()

At the end it prints everything that was received over the telnet session.

The complete script will then be this:

import getpass 
import sys 
import telnetlib 
host = "10.10.10.30" 
user = raw_input("Username: ") 
password = getpass.getpass() 
tn = telnetlib.Telnet(host) 
tn.read_until("Username: ") 
tn.write(user + "\n") 
if password: 
  tn.read_until("Password: ") 
  tn.write(password + "\n") 
tn.write("enable\n") 
tn.write("Cisco\n") 
tn.write("conf t\n") 
tn.write("vlan 20\n") 
tn.write("name guest\n") 
tn.write("vlan 100\n") 
tn.write("name production\n") 
tn.write("end\n") 
tn.write("exit\n")
print tn.read_all()

I have also attached a screenshot from the Linux server when I’m running the script

telnetcreatevlan
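
A natural next step is to stop hard coding one tn.write per VLAN. The sketch below is only an idea for that: vlan_commands is a made-up helper name, it just builds the same lines as the script above from a dictionary, and it is written so it also runs on newer python versions than the one I used.

```python
def vlan_commands(vlans, enable_password):
    """Return the lines to send after login, each ended by a newline.

    vlans maps VLAN id -> VLAN name. This helper is hypothetical and only
    produces text; sending it over telnet is left to the script above.
    """
    lines = ["enable", enable_password, "conf t"]
    for vlan_id, name in sorted(vlans.items()):
        lines.append("vlan {}".format(vlan_id))
        lines.append("name {}".format(name))
    lines += ["end", "exit"]
    # The trailing \n simulates pressing the enter key after each command.
    return [line + "\n" for line in lines]


for line in vlan_commands({20: "guest", 100: "production"}, "Cisco"):
    print(repr(line))
    # In the script above this would be: tn.write(line)
```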

Cisco and python programming

I have decided I want to try to program cisco switches and devices using python. At the moment my programming skills are limited to simple if and else. I now have a plan to configure a linux server with python on it to do all my scripting.

The plan is to upload the different scripts I create here as I go along. Hopefully it will be useful for others too in the end and not just for me. I will use various youtube videos and pages to learn the different things about python and will link them from my upcoming posts as I go along.

If you have something you want me to write about or create a script for, it’s greatly appreciated!

Rebooting a switch in a stack

During some recent switch replacement work I noticed that not all my stacks had the correct IOS version, or wait. The correct thing to say would be that one of the switches did not have the correct IOS version. The reason for this was that I upgraded to the correct IOS before I created the stack and then connected the second switch to the stack. When the second switch got connected the stack was left with 2 IOS versions.

To solve this issue I used the archive download-sw command to download the new OS to only that switch. To do this I first ran show version to find the stack number of the switch.
iosshversion

From the show version I could get the stack number of the switch that needs the IOS upgrade. Be aware that the screenshot shows the same IOS version on all switches, so there is no difference in the screenshot. In this blog post I wanted to upgrade switch 2.

archive download-sw /destination-system 2 tftp://1.1.1.1/IOS.tar

To complete the upgrade without rebooting the whole stack you enter the command

reload slot 2

This will only reboot the switch that has the stack number specified.
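
If you have many stacks, finding the odd member out can be scripted once you have the versions from show version. The sketch below assumes you have already collected them into a dictionary by hand or by some other script; the function name and the version strings are made up for the example, only the archive download-sw syntax is taken from above.

```python
def download_commands(member_versions, target_version, image_url):
    """Return one archive download-sw line per stack member that is not
    running target_version.

    member_versions maps stack member number -> IOS version string.
    Hypothetical helper: it only builds command text.
    """
    commands = []
    for member in sorted(member_versions):
        if member_versions[member] != target_version:
            commands.append(
                f"archive download-sw /destination-system {member} {image_url}"
            )
    return commands


# Example versions are invented for illustration.
versions = {1: "15.0(2)SE11", 2: "15.0(2)SE5"}
for line in download_commands(versions, "15.0(2)SE11", "tftp://1.1.1.1/IOS.tar"):
    print(line)
```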

Changing IP of HA WLC controller.

Today I changed the IP of one of our HA Wireless Controllers. Since the company I work for was bought last year we have to change the IPs of our systems to fit into our new and bigger network.

Changing the IP address isn’t a big thing and you can do it without any downtime on the AP’s if you run flexconnect. If you run in local mode you will be looking at a short downtime. If you want to do it with a minimum of downtime you need a third controller that can host your AP’s while the HA cluster is down.

If you have a third controller that can host your AP’s you have to make sure that the mobility groups are configured and working to your HA. You can check this in the following menu: Controller -> Mobility Management -> Mobility Groups. In this menu all your mobility groups are listed. If the mobility group towards the controller is Up you should see it on the right side of the page. You also have to check this on the HA controller.
MobilityGroupCheck

If the mobility group is up and running then the next thing is to change the primary controller for the AP’s. This is a very easy task but it’s time consuming if you don’t have Cisco Prime (luckily I have that). From Cisco Prime you can just send out a template to all the AP’s and make them move to another primary controller. If you want to do it manually you can do that too. Then you first have to open an AP and choose High Availability. Then you configure the third controller as Primary. Within the next few minutes all the AP’s should have moved to the new controller.
HASettingAP

When there are no connected AP’s left we can start the work to re-IP the HA controller. The first thing we have to do is to break the HA cluster. We are not able to change the IP without breaking the cluster. When breaking the cluster there will also be a restart, so if you don’t have another controller for your AP’s, be ready for some downtime!

To disable the cluster you go to  Controller -> Redundancy -> Global Configuration. In the lower part of the page you have the option to Disable or Enable the cluster. Set the drop down to Disabled and press Apply in the top right corner. The controller will then ask you if you are sure about breaking the cluster and that the controller will restart. Accept this and wait for a few minutes.
DisableCluster

The WLC will after a few minutes boot up again on the same IP address as before. Then you should go to the Interface menu to change the management IP address.
InterfaceOverview1.jpg

Change the IP Address, Netmask and Gateway to the new values and press Apply. You will now lose connection and need to connect on the new IP. It’s very important to enter the correct IP’s so you don’t lose contact (or you could use the integrated service port if you have a 550x).
ManagementInterface

The next interface you need to change is the redundancy management IP address. This IP should be in the same subnet as the management IP. So unless you change the IP to something in the same subnet as your previous IP you need to change this IP also. This IP also needs to match the Redundancy mgmt IP in Controller -> Redundancy -> Global Configuration.
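
Before pressing Apply it can be worth double checking that the new redundancy management IP really is in the same subnet as the new management IP. Here is a small sketch of that check using python’s ipaddress module. The function name and the example addresses are made up, and it only checks the subnet rule described above, nothing else about the WLC config.

```python
import ipaddress


def in_management_subnet(mgmt_ip, netmask, candidate_ip):
    """Return True if candidate_ip is inside the subnet of mgmt_ip/netmask."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{mgmt_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(candidate_ip) in network


# Invented example addresses: new management IP and two candidate
# redundancy management IPs.
print(in_management_subnet("10.20.0.10", "255.255.255.0", "10.20.0.11"))
print(in_management_subnet("10.20.0.10", "255.255.255.0", "10.10.0.11"))
```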

The last thing you need to do on this controller is go back to Controller -> Redundancy -> Global Configuration, change the Redundancy mgmt IPs and enable the cluster again.

You should now be finished with the first controller. The IP for the WLC HA is now active and if you want to move the AP’s to the new controller you can do that now. You should be able to connect to the remaining controller on the old management IP address. You should repeat the steps for changing the Redundancy mgmt IP and Redundancy port IP. When this is done you only need to enable the cluster on this controller also and the HA should be working again as before.

When you have enabled HA on the second controller you can go to Monitor -> Redundancy -> Summary. There you will be able to see if the HA cluster is running successfully again.

Your cluster should now be working correctly. If you have questions or feedback please leave a comment!

Configuring a VPN tunnel from a VRF

In the company where I work we deliver some of our products using boats. Since most of our customers are in remote locations we use a supplier that has good coverage in those locations. The issue then becomes that the same supplier has a high cost on the bandwidth and doesn’t have good coverage in the areas where our factories are. To reduce cost and ensure good coverage close to our factories we have a wireless network that the boats connect to when they arrive. I have added a picture with a simple diagram showing the solution.
Boat network

At the moment we have Juniper SSG550M in a central location as our VPN hub. We have just recently started to buy Cisco routers instead of Juniper firewalls for the boats. So I had to configure the Cisco routers so they would automatically switch between the 2 connections and always try to choose our wireless connection first (the connection close to our factories).

I did this with the help of BGP and gave the expensive connection a longer AS path than our wireless connection at the factories. The VPN is a VTI/route-based tunnel.

I will first start with the configuration of the Cisco router. In the first section here I am configuring the settings of the VPN tunnel:

crypto isakmp policy 10
encr aes 256
authentication pre-share
group 14
lifetime 3600
crypto isakmp invalid-spi-recovery
crypto isakmp keepalive 10
!
crypto ipsec transform-set aes256-sha esp-aes 256 esp-sha-hmac
mode tunnel
!
!
crypto ipsec profile boat-vpn
set transform-set aes256-sha
set pfs group14

With all the options set I can build the tunnel itself. The first tunnel is the one over the wireless at the factories. I have put the connection into a separate vrf to avoid conflicts between the two connections. I also want all the internet traffic to go over the “expensive connection”. Since the boats visit different factories I only have a dynamic IP at the boats. Every time they arrive at a factory they will receive a new IP, so the tunnel is configured with aggressive mode and identified by the fqdn. Also remember to use another password than supersecret 🙂

ip vrf factorywireless
crypto isakmp peer address 192.168.2.1 vrf factorywireless
set aggressive-mode password supersecret
set aggressive-mode client-endpoint fqdn boat.example.com

For the second and primary connection I will use the default router instance. This is the connection that will have coverage most of the time and is where the internet traffic will be running. This connection is also using aggressive mode.

crypto isakmp peer address 8.8.8.8
set aggressive-mode password supersecret
set aggressive-mode client-endpoint fqdn boat-dialup.example.com

The interfaces for the tunnels are configured pretty straightforwardly as normal VTI interfaces. The only difference is that the tunnel that connects from the factorywireless vrf has a tunnel vrf line for that.

interface Tunnel1
description Tunnel over ICE
ip address 10.0.111.6 255.255.255.252
tunnel source FastEthernet4
tunnel mode ipsec ipv4
tunnel destination 8.8.8.8
tunnel protection ipsec profile boat-vpn
!
interface Tunnel105
description Tunnel over Wireless at factories
ip address 10.0.111.2 255.255.255.252
tunnel source Vlan110
tunnel mode ipsec ipv4
tunnel vrf factorywireless
tunnel destination 192.168.2.1
tunnel protection ipsec profile boat-vpn

The last thing we need to do on the Cisco router is to configure the BGP. This is to make sure the traffic is routed on the correct path. You can see that I have added route map prepend-internet  where I have configured 4 extra prepends to the AS path. I only configure the AS path on an outgoing basis so you will see the same amount of prepends on the Netscreen. The prepend is only configured on the traffic going over the expensive internet connection.

router bgp 64501
bgp log-neighbor-changes
network 10.2.1.0 mask 255.255.255.192
neighbor 10.0.111.1 remote-as 64500
neighbor 10.0.111.1 route-map prepend-internet out
neighbor 10.0.111.5 remote-as 64500
!
route-map prepend-internet permit 10
 set as-path prepend 64501 64501 64501 64501
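
To show why the prepend steers the traffic, here is a tiny model of the one BGP rule that matters here: when everything else is equal, the route with the shortest AS path wins. This is a deliberately simplified sketch. Real BGP has more tie breakers (weight, local preference and so on) that are ignored, and the function and route names are made up.

```python
def preferred_route(routes):
    """Pick the route name with the shortest AS path.

    routes maps a route name -> list of AS numbers in the AS path.
    Simplified model: only AS path length is compared.
    """
    return min(routes, key=lambda name: len(routes[name]))


# How the hub would see the boat network over both tunnels:
# the expensive link carries the 4 extra prepends from the route-map.
routes = {
    "factory wireless": [64501],
    "expensive connection": [64501] + [64501] * 4,
}
print(preferred_route(routes))

# When the boat leaves the factory the wireless route disappears
# and the prepended route is the only one left.
del routes["factory wireless"]
print(preferred_route(routes))
```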

 

 

That completes the configuration of the Cisco router. We will now start on the configuration on the SSG550M. I will start with the configuration of the VPN proposal. It’s important that these match the Cisco device that we tested with before.

 set ike p1-proposal "vpn-boats-phase1" preshare group14 esp aes256 sha-1 second 3600
 set ike p2-proposal vpn-boats-phase-2 group14 esp aes256 sha-1 second 3600 
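
Because a mismatch between the two sides is the classic reason a tunnel never comes up, I find it useful to write the settings down side by side and compare them. The sketch below does that with two dictionaries that I transcribed by hand from the phase 1 configs in this post. The key names are my own, and the hash is left out on purpose since the Cisco policy above does not set it explicitly.

```python
def mismatches(side_a, side_b):
    """Return {setting: (value_a, value_b)} for every setting that differs."""
    keys = sorted(set(side_a) | set(side_b))
    return {
        key: (side_a.get(key), side_b.get(key))
        for key in keys
        if side_a.get(key) != side_b.get(key)
    }


# Hand-transcribed from "crypto isakmp policy 10" on the Cisco router.
cisco_phase1 = {
    "auth": "pre-share",
    "encryption": "aes256",
    "dh_group": 14,
    "lifetime_seconds": 3600,
}
# Hand-transcribed from "set ike p1-proposal" on the SSG550M.
ssg_phase1 = {
    "auth": "pre-share",
    "encryption": "aes256",
    "dh_group": 14,
    "lifetime_seconds": 3600,
}
print(mismatches(cisco_phase1, ssg_phase1))
```

An empty result means the listed settings agree. Anything else shows the setting name with the value from each side.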

Then we will create the connection for the VPN tunnels. We will start on the factory wireless connection. Since we never know what IP address the tunnel is coming from this will be an aggressive tunnel. Remember to type the fqdn for the connection correctly in the first line and choose the correct interface. The tunnel interface that you bind the connection to is also important to remember since you will create it in the next section.

set ike gateway "vpn-boats-fb4" address 0.0.0.0 id "boat.example.com" Aggr outgoing-interface "redundant1" preshare "supersecret" proposal "vpn-boats-phase1"
 set vpn vpn-boats gateway vpn-boats replay proposal vpn-boats-phase-2 
 set vpn vpn-boats bind interface tunnel.1
 set vpn vpn-boats monitor optimized rekey

The second connection is almost the same but it contains NAT traversal and is using another incoming interface. The NAT traversal is enabled since I don’t get a public IP on the boat towards the internet.

 set ike gateway "vpn-boats-cellular" address 0.0.0.0 id "boat-dialup.example.com" Aggr outgoing-interface "redundant2" preshare "supersecret" proposal "vpn-boats-phase1"
 set ike gateway vpn-boats-cellular nat-traversal
 set vpn vpn-boats-cellular gateway vpn-boats-cellular replay proposal vpn-boats-phase-2 
 set vpn vpn-boats-cellular bind interface tunnel.2
 set vpn vpn-boats-cellular monitor optimized rekey
 unset vpn vpn-boats-cellular dscp-mark 

The last thing needed before getting the VPN tunnel up is creating the tunnel interfaces. Remember to choose the address that you are peering with in BGP and the tunnel number you bound in the previous section.

 set interface tunnel.1 zone vpn-boats
 set interface tunnel.1 ip 10.0.111.1/30
set interface tunnel.1 protocol bgp
set interface tunnel.1 protocol ping
 set interface tunnel.2 zone vpn-boats
set interface tunnel.2 ip 10.0.111.5/30
set interface tunnel.2 protocol bgp
set interface tunnel.2 protocol ping

Now your tunnel should be up and running and you can do a ping test to verify the connection between them. We will now start on the final part, the BGP configuration. I am expecting that the basic BGP config on the device itself is already done, so I won’t include all the BGP configuration. Only the important part 🙂

I’m beginning with creating the route-map to prepend the traffic over the VPN. The route map will be named internet-prepend. The AS number on the local router is 64500.

set vrouter trust-vr
 set route-map name internet-prepend permit 1
 set match ip 20 10
 set as-path 12
 exit
set protocol bgp 64500
 set as-path-access-list 12 permit "64500 64500"

Then I will start configuring the neighbor connections. The first will be the BGP session going over the internet, which has the prepend enabled. The rest of the configuration is straightforward.

set neighbor 10.0.111.6 remote-as 64501 local-ip 10.0.111.5/30
set neighbor 10.0.111.6 activate
set neighbor 10.0.111.6 force-reconnect
set neighbor 10.0.111.6 nhself-enable
set neighbor 10.0.111.6 reject-default-route
set neighbor 10.0.111.6 enable
set neighbor 10.0.111.6 route-map internet-prepend out

Then it’s the last BGP connection. It’s almost the same as the previous one except for the prepend.

set neighbor 10.0.111.2 remote-as 64501 local-ip 10.0.111.1/30
set neighbor 10.0.111.2 activate
set neighbor 10.0.111.2 force-reconnect
set neighbor 10.0.111.2 nhself-enable
set neighbor 10.0.111.2 reject-default-route
set neighbor 10.0.111.2 enable

That is all. If you have any questions or comments you can leave one in the comments section below.

Downgrading from Lightweight AP to Autonomous AP

Hi all

Today I did something new to me. I downgraded a lightweight AP to a standalone AP remotely. I have previously used the old method with a console cable and a TFTP server on the local network (until yesterday I only knew about this method). Today I did not have that possibility, so I took advantage of the following command:

config ap tftp-downgrade tftp-server-ip-address filename access-point-name

You enter the command on the controller and it then starts the downgrade. But there is one thing that bothers me: you can’t monitor the process. My AP’s are located on boats with a controller in the datacenter. They have been running flexconnect but I want to turn them into standalone AP’s since that works better for the solution I have on the boats. The problem is that the boats have low bandwidth and they lose the connection from time to time. The TFTP downgrade seemed fine with up to 30 seconds of downtime during the transfer, but I got some issues when it was above that. So how do I monitor the downgrade of the AP?

The solution was to log into the AP by SSH and check that the AP contained an update folder in the flash. The command to do this is:

dir flash:

If the folder update shows up in the list, a download should be in progress. Normally the folder is automatically deleted if the upgrade fails, but I have seen that this does not always happen. You can then check the files inside the update folder and look for a change in the filesize. If you type “dir flash:update/<version-folder>” you should see the filesize of, normally, the last file change every second. I have added an example below where you can see the filesize of 8005.img is 627200.

boat-wl-01#dir flash:update/ap1g2-k9w7-mx.153-3.JC
Directory of flash:update/ap1g2-k9w7-mx.153-3.JC/

22 -rwx 123464 May 12 2016 10:14:52 +00:00 ap1g2-k9w7-mx.153-3.JC
 24 drwx 64 May 12 2016 10:14:52 +00:00 html
 253 -rwx 9029888 May 12 2016 10:45:05 +00:00 ap1g2-k9w7-xx.153-3.JC
 254 -rwx 627200 May 12 2016 10:46:58 +00:00 8005.img

31808000 bytes total (8753152 bytes free)

If you look at the text below you can also see that the size is changing for the file 8005.img. In the example below the filesize is 833536.

boat-wl-01#dir flash:update/ap1g2-k9w7-mx.153-3.JC
Directory of flash:update/ap1g2-k9w7-mx.153-3.JC/

22 -rwx 123464 May 12 2016 10:14:52 +00:00 ap1g2-k9w7-mx.153-3.JC
 24 drwx 64 May 12 2016 10:14:52 +00:00 html
 253 -rwx 9029888 May 12 2016 10:45:05 +00:00 ap1g2-k9w7-xx.153-3.JC
 254 -rwx 833536 May 12 2016 10:47:39 +00:00 8005.img

31808000 bytes total (8546816 bytes free)
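
Comparing two dir listings by eye works, but it is also easy to script. The sketch below parses two saved outputs and reports which files grew between them. It is only a minimal example: the regular expression is written against the output format shown above and may need adjusting for other AP models, and the function names are my own.

```python
import re

# Matches lines like: "254 -rwx 627200 May 12 2016 10:46:58 +00:00 8005.img"
DIR_LINE = re.compile(r"^\s*\d+\s+[-d][-rwx]{3}\s+(\d+)\s+.*\s(\S+)$")


def file_sizes(dir_output):
    """Return {filename: size in bytes} parsed from 'dir flash:...' output."""
    sizes = {}
    for line in dir_output.splitlines():
        match = DIR_LINE.match(line)
        if match:
            sizes[match.group(2)] = int(match.group(1))
    return sizes


def growing_files(before, after):
    """Return {filename: bytes added} for files that grew between listings."""
    old, new = file_sizes(before), file_sizes(after)
    return {
        name: new[name] - old[name]
        for name in new
        if name in old and new[name] > old[name]
    }


BEFORE = """\
 253 -rwx 9029888 May 12 2016 10:45:05 +00:00 ap1g2-k9w7-xx.153-3.JC
 254 -rwx 627200 May 12 2016 10:46:58 +00:00 8005.img
31808000 bytes total (8753152 bytes free)
"""
AFTER = """\
 253 -rwx 9029888 May 12 2016 10:45:05 +00:00 ap1g2-k9w7-xx.153-3.JC
 254 -rwx 833536 May 12 2016 10:47:39 +00:00 8005.img
31808000 bytes total (8546816 bytes free)
"""
print(growing_files(BEFORE, AFTER))
```

If the result is empty for a couple of listings in a row, the transfer has most likely stalled or finished.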

 

Examples of issues I got when testing other ways of downgrading:

I tried to do the downgrade directly from the AP while downgrading from the WLC at the same time (I thought the downgrade from the WLC had failed). It then gave me the following error:

boat-wl-01#archive download-sw tftp://172.17.76.231/ap1g2-k9w7-tar.153-3.JC.tar
Unable to create temp dir "flash:/update"
Download image failed, notify controller!!! From:8.0.121.0 to 8.2.100.0, FailureCode:7

Obviously that won’t work while another download is running. The command itself should work, but I preferred doing it from the controller. I just found it easier that way.

If you try several times to downgrade from the controller you will see the following message in the event log of the lightweight AP:

*May 12 10:18:09.351: lwapp_image_proc: encounter flash problem, retry here
*May 12 10:18:09.351: lwapp_image_proc: encounter flash problem, retry here
*May 12 10:18:09.351: lwapp_image_proc: encounter flash problem, retry here

 

Problems with NSM after schema upgrade.

The other day we upgraded the schema on our NSM server from 327 to 329. After the upgrade the devices were not able to connect to our NSM anymore. In the deviceDaemon I got the following error:

[Notice] [3078149840-connectionMgr.c:2329] SSH Protocol is not enabled -- DeviceBroker is not ready for incoming device connection.
[Notice] [3078149840-connectionMgr.c:2318] Incoming TCP connection from SSH, device ip x.x.x.x
[Notice] [3078149840-connectionMgr.c:2329] SSH Protocol is not enabled -- DeviceBroker is not ready for incoming device connection.
[Notice] [3078149840-connectionMgr.c:2318] Incoming TCP connection from SSH, device ip x.x.x.x
[Notice] [3078149840-connectionMgr.c:2329] SSH Protocol is not enabled -- DeviceBroker is not ready for incoming device connection.
[Notice] [3078149840-connectionMgr.c:2318] Incoming TCP connection from SSH, device ip x.x.x.x
[Notice] [3078149840-connectionMgr.c:2329] SSH Protocol is not enabled -- DeviceBroker is not ready for incoming device connection.
[Notice] [3078149840-connectionMgr.c:2318] Incoming TCP connection from SSH, device ip x.x.x.x
[Notice] [3078149840-connectionMgr.c:2329] SSH Protocol is not enabled -- DeviceBroker is not ready for incoming device connection.

I didn’t know that a simple schema upgrade could do something to the NSM that would stop the devices from connecting, so I ended up contacting JTAC support. When I got a support engineer and explained the issue to him he found another error message in the guiDaemon. The error was “DC not connected”.

After a while of troubleshooting the engineer discovered that the issue was the RSA key that is responsible for the communication between the NSM services and the guiDaemon and devDaemon. The engineer then navigated to devSvr.cfg under /usr/netscreen/DevSvr/var and deleted the RSA keys (ourRsaPrivateKey and theirRsaPublicKey).

After that all the devices in some magical way connected again!

Secondary node locked when committing

The other day I got a problem with one of my SRX clusters when I was running a commit. The commit was not able to complete and I got the following error:

{primary:node0}[edit]
srx1400# commit
node1:
error: configuration database modified
node0:
error: remote lock-configuration failed on node1

The reason for this error is some uncommitted configuration on the secondary node. Earlier the same day I changed the primary for redundancy-group 0 and I guess that I didn’t commit all the config on node1 before changing to node0.

To solve this I had to go into the secondary node (node1) and roll back the uncommitted configuration. Normally you can use OOB to connect to the secondary node but I don’t have it at this location. So I have to connect to the secondary node through the primary node. This is done with the following command on branch devices (SRX650 and below): request routing-engine login node 1
On High end devices like the one I’m working on (SRX1400 and above) you use: rlogin -T node1

{secondary:node1}% rlogin -T node1
root@srx1400>
--- JUNOS 11.4R9.4 built 2013-08-22 06:24:21 UTC
{secondary:node1}
root@srx1400> configure
warning: Clustering enabled; using private edit
error: shared configuration database modified

Please temporarily use 'configure shared' to commit
outstanding changes in the shared database, exit,
and return to configuration mode using 'configure'

As you can see from the error I have to use configure shared to be able to edit the configuration.

root@srx1400> configure shared
Entering configuration mode
The configuration has been changed but not committed

Before entering the rollback command you can check the uncommitted configuration by running show | compare. This will display all the uncommitted configuration.

{secondary:node1}[edit]
root@srx1400# show | compare
[edit access profile unos clientjunos]
- pap-password "$9$2V4GDikP5T3fTrvLXwsz36C0B"; ## SECRET-DATA
+ pap-password "$9$jhHP5QF/CA09AxdsYGUp0BRyl"; ## SECRET-DATA

Now you can roll back the uncommitted config, check that there is no uncommitted config left and exit the configuration mode.

{secondary:node1}[edit]
root@srx1400# rollback
load complete

{secondary:node1}[edit]
root@srx1400# show | compare

{secondary:node1}[edit]
root@srx1400# exit
Exiting configuration mode

{secondary:node1}
root@srx1400>

Now you can close the session and try to commit the configuration from the primary node again. It worked for me! 🙂

As a note, I also know that a lot of people have had success using just the command commit synchronize force on the primary node, but it does not work for everyone.