NSX on AWS (Or Google Cloud) – Part 3

Finally, let’s stretch our vSphere clusters from AWS to Google using the L2VPN feature of NSX and Ravello! This method also applies between a physical site (a branch office, your laptop…) and a cloud infrastructure.

What we will achieve:

img_54e26bb887aaf

Site A is going to be Google Cloud, site B AWS. The VLANs we chose are slightly different from the diagram above, but what we are trying to achieve is essentially the same.

Site A does not use network virtualisation; site B has NSX 6.1.2 and VXLANs configured. I tried 6.1.4 unsuccessfully because of a bug with self-signed certificates; no doubt 6.2 will fix that.

We will show that VM1 on this diagram can ping VM7, following this data path:

VM1 (Google) > VLAN > L2VPN Client > L2VPN Server > Logical switch (VXLAN) > VM7 (AWS)

In our setup, the “real” VLAN is VLAN501 (tagged with dot1q 501) and the VXLAN is VXLAN5000.
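As a quick aside, a dot1q tag is just a 4-byte field inserted into the Ethernet header. The sketch below (plain Python, purely illustrative) builds an 802.1Q tag for VLAN 501 and extracts the 12-bit VLAN ID back out:

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame

def build_dot1q_tag(vlan_id, priority=0):
    """Build the 4-byte 802.1Q tag: TPID + TCI (priority | 12-bit VID)."""
    tci = (priority << 13) | (vlan_id & 0x0FFF)
    return struct.pack("!HH", TPID_8021Q, tci)

def parse_vlan_id(tag):
    """Extract the 12-bit VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    assert tpid == TPID_8021Q, "not an 802.1Q tag"
    return tci & 0x0FFF

tag = build_dot1q_tag(501)
print(parse_vlan_id(tag))  # 501
```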

From parts 1 and 2 your site B should be up and running.

Our 2 cloud applications:

Screen Shot 2015-08-10 at 1.07.45 AM

Site B’s topology is:

Screen Shot 2015-08-10 at 1.03.33 AM

 

Screen Shot 2015-08-10 at 1.39.46 AM

At the top infrastructure level, we have the following:

NS1 (CentOS) is running named and NTPd (192.168.1.101/24)

vCenter 5.5 U2 appliance (192.168.1.100/24)

Jumphost is a Windows 7 VM running RDP on an elastic IP (192.168.1.5/24)

ESX MGMT is our management cluster for NSX manager and the controllers (192.168.1.11/24)

ESX PROD simulates a compute cluster (192.168.1.12/24)

The following nested/non-nested objects are managed by our vCenter server:

DC (DataCenter)

> MGMT (Cluster)

>> esx1.tomlab.com (Host)

>>> NSX Manager (VM) (192.168.1.200/24)

>>> NSX Controller1 (VM) (192.168.1.160/24)

> PRD (Cluster)

>> esx2.tomlab.com (Host)

>>> Server1 (VM) (192.168.4.2/24) on VXLAN5000 (logical switch)

And finally, network-wise:

– 1 Standard switch (1 uplink: vnic1 on esx1 and esx2), no VLAN configured (192.168.1.x)

– 1 VDS with 1 trunk port group (1 uplink: vnic2 on esx1 and esx2), VLANs 500-505 configured

 

Site A is much simpler and simulates a small branch infrastructure:

Screen Shot 2015-08-10 at 1.03.58 AM

 

Screen Shot 2015-08-10 at 1.43.31 AM

 

Jumphost is a Windows 7 VM running RDP on an elastic IP (192.168.2.5/24)

Remote ESX is an ESXi server (192.168.2.11/24)

vCenter 5.5 U2 appliance (192.168.2.100/24)

The following nested/non-nested objects are managed by our vCenter server:

DC2 (Datacenter)

> Cluster1 (Cluster)

>> esx1.tomlab2.com (Host)

>>> server1 (VM) (192.168.4.1/24)

Finally, network-wise:

esx1’s first nic is on a Standard switch, no VLAN configured, access to 192.168.2.0/24

esx1’s second nic is on a Distributed switch, with 2 port groups configured: 1 access port group on VLAN 501, and 1 trunk port group (VLANs 500-505).

server1 is connected to the access port group on VLAN501.
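Note that server1 here (192.168.4.1/24) and Server1 on site B (192.168.4.2/24) sit in the same /24: that is the whole point of the L2VPN, as the stretched segment lets them talk without any routing. A quick sanity check with Python’s stdlib ipaddress module:

```python
import ipaddress

# The two VMs that the stretched L2 segment should connect
site_a_vm = ipaddress.ip_interface("192.168.4.1/24")  # server1, VLAN501 (Google)
site_b_vm = ipaddress.ip_interface("192.168.4.2/24")  # Server1, VXLAN5000 (AWS)

# Same subnet => no router needed, only the L2VPN bridge between the sites
same_segment = site_a_vm.network == site_b_vm.network
print(same_segment)  # True
```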

L2 VPNs connect trunk ports together, hence the presence of trunk port groups at both sites. The creation of these ports requires special parameters; there is some great documentation specifically about that here.

Let’s get started:

For SiteB (L2VPN server):

1) In the second part of this series, we deployed a logical switch. We will now connect one interface (LIF) of an edge gateway to this logical switch, and a second LIF of the same edge gateway to our uplink network (192.168.1.x/24).

Follow the steps to create and configure your edge gateway (here are screenshots of the less standard parameters). It will be our L2VPN server and reside on our production cluster in the AWS cloud:

Screen Shot 2015-08-09 at 8.17.22 PM

Screen Shot 2015-08-09 at 8.17.46 PM

Screen Shot 2015-08-09 at 8.18.22 PM

Let’s just configure the uplink for now:

Screen Shot 2015-08-09 at 8.18.44 PM

Screen Shot 2015-08-09 at 8.19.14 PM

Once the VM is deployed, power it on and edit its second interface. The interface type must be set to trunk and connected to your trunk port on the VDS. Then you need to add a sub interface on the trunk with a TunnelID. This will make your VXLAN (Logical Switch) a part of the trunk:

Screen Shot 2015-08-09 at 9.01.16 PM

Screen Shot 2015-08-09 at 9.00.09 PM

The next step is to generate a CSR and self-sign our certificate: head to the certificates section below your edge interface configuration, click Actions > “Generate CSR”, fill in the basic information, and once the CSR is generated, click Actions > “Self-sign certificate” and enter a validity period in days.

Screen Shot 2015-08-09 at 8.52.27 PM

Screen Shot 2015-08-09 at 8.53.03 PM

 

Then head to Manage > VPN, choose L2VPN, and enter your server settings:

Screen Shot 2015-08-09 at 9.01.51 PM

Once done, we need to add the peer site (site A, the client). Choose a password here (you will reuse it during the site A configuration later) and select the interfaces that you want to stretch to the remote site:

Screen Shot 2015-08-09 at 9.03.14 PM

 

Then just click on the “enable” button and publish your changes.

We have chosen 192.168.1.250 as the listener IP, so we need to configure DNAT in the Ravello web UI (elastic IP > 192.168.1.250:443). To do this, add a secondary IP (192.168.1.250) on your ESX PROD host and add a service on port 443, bound to an elastic IP:

Screen Shot 2015-08-10 at 11.03.32 am

 

Screen Shot 2015-08-10 at 11.03.43 am
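The DNAT rule above is conceptually just a TCP relay: anything hitting the elastic IP on 443 gets shovelled to 192.168.1.250:443 inside the application. A minimal sketch of that idea in Python (illustrative only; this is not how Ravello implements it):

```python
import socket
import threading

def forward(src, dst):
    """Shovel bytes from one socket to the other until EOF."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)

def serve_dnat(listen_addr, target_addr):
    """Accept on listen_addr and relay each connection to target_addr."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(listen_addr)
    server.listen(5)
    while True:
        client, _ = server.accept()
        backend = socket.create_connection(target_addr)
        # Relay both directions concurrently
        threading.Thread(target=forward, args=(client, backend), daemon=True).start()
        threading.Thread(target=forward, args=(backend, client), daemon=True).start()

# e.g. serve_dnat(("0.0.0.0", 443), ("192.168.1.250", 443))
```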

 

This will make the L2VPN service on the edge gateway listen for incoming connections on port 443. Test that everything works by issuing:

telnet <elastic IP> 443

If you get a reply, proceed to the next step; if not, review your settings.
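If you don’t have a telnet client handy on the jump host, the same reachability check can be scripted with Python’s stdlib (replace the placeholder address with your actual elastic IP):

```python
import socket

def listener_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace with your actual elastic IP:
# print(listener_reachable("203.0.113.10", 443))
```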

Refer to this site if you need more help.

Now for SiteA:

We first need to deploy the NSX L2VPN client OVF, available with every NSX release on VMware’s download site. Just do that from your jump host using either the vSphere client or the web UI.

You will have to fill out a few configuration settings:

The setup network screen sets the uplink (192.168.2.0/24) and the trunk port that we are going to use. The trunk port MUST be different from the access port your VM is connected to, but MUST include that port’s VLAN in its trunked VLANs list.

In other words, we will use 3 port groups for this setup: 1 “VM network” (192.168.2.0/24), 1 “VLAN501” port group configured as an access port on VLAN 501 (our Linux server is connected to that one), and 1 “TRUNK” port group configured as a trunk for VLANs 500-505. On the “setup network” screen, use 192.168.2.0/24 as your public source and TRUNK as your trunk source.
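The rule above (the trunk must carry the access port’s VLAN) is easy to sanity-check. Here is a toy model of the two port group types, purely illustrative:

```python
def port_group_accepts(frame_vlan, pg_type, pg_vlans):
    """Toy model: does a port group pass a frame tagged with frame_vlan?

    - an access port group passes only its single VLAN (and strips the tag)
    - a trunk port group passes any VLAN in its allowed range (tag kept)
    """
    if pg_type == "access":
        return frame_vlan == pg_vlans
    if pg_type == "trunk":
        return frame_vlan in pg_vlans
    raise ValueError("unknown port group type")

# Site A's port groups from the text:
access_pg = ("access", 501)             # VLAN501 port group (server1 lives here)
trunk_pg = ("trunk", range(500, 506))   # TRUNK port group, VLANs 500-505

# A frame from server1 tagged 501 must be passed by BOTH port groups:
print(port_group_accepts(501, *access_pg))  # True
print(port_group_accepts(501, *trunk_pg))   # True
```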

This shows the trunk port configuration on site A:
Screen Shot 2015-08-09 at 10.55.05 PM

Screen Shot 2015-08-09 at 10.30.42 PM

 

For the server address, enter your elastic IP, and for the rest, just match your server settings.

Once the appliance runs, the connection should be automatically established between your 2 sites:

Screen Shot 2015-08-10 at 12.49.44 AM

If your tunnel is not established, try the edge l2vpn troubleshooting commands from this blog.

Otherwise, you are pretty much done and should have connectivity between your Linux machines (if issues persist, it might be due to the security settings of your port groups; check out the aforementioned blog):

Screen Shot 2015-08-10 at 1.01.55 AM

Screen Shot 2015-08-10 at 1.00.57 AM

That’s it… your L2 network is now stretched between Google Cloud and AWS. This opens up quite a lot of possibilities: think disaster recovery, vMotion… NSX can be used as-a-service over any IaaS cloud (given the extra layer Ravello provides, for now!). We will explore them in future posts.

Thanks to Ravello for their support, and let me know if you have questions!