The aim here is to describe the configuration of Sun LDOMs (also known as Oracle VM Server for SPARC) as failover guest domains. As the name suggests, a failover guest domain is a Sun Logical Domain (often abbreviated LDOM) that can be hosted on only one node of a Solaris Cluster at any one time. This implies that, before configuring a failover guest domain, Solaris Cluster must already be installed and configured on a set of control domains.
Oracle VM Server for SPARC provides the ability to split a single physical system into multiple, independent virtual systems. This is achieved by an additional software layer in the firmware, called the hypervisor, which sits between the operating system and the hardware platform. It abstracts the hardware and can expose or hide various resources, allowing the creation of resource partitions that operate as discrete systems, complete with virtual CPU, memory and I/O devices.
While platform virtualization encourages server consolidation, it also turns the physical platform into a large single point of failure. The Solaris Cluster agent for LDoms guest domains aims to mitigate a physical server failure by managing the guest domains. The agent does the following:
- Manages the start, stop and restart of LDoms guest domains.
- Fails over an LDoms guest domain between Solaris Cluster nodes.
- Allows strong positive and negative affinities between LDoms guest domains across Solaris Cluster nodes.
- Allows different failover techniques, i.e. Stop/Failover/Start or warm migration.
- Provides a plugin probe facility for the fault monitor.
Since the term “migration” has now been introduced, it is important to emphasize that migration on its own does not provide high availability. The significant difference between the two is that migration, without Solaris Cluster and the HA-LDOM agent, requires human intervention to be initiated, whereas high availability means automated detection of hardware and software failures, so that services can be started on another cluster node without human intervention.
Below is a description of the architecture we want to deploy.
As you can see, we have two CoolThreads servers (we used Sun Fire T5440) connected to shared storage through a SAN infrastructure. A service and control domain (the primary domain) was configured on each server. We won't cover the installation of Oracle VM Server for SPARC (see the Oracle VM Server for SPARC Administration Guide on oracle.com) or the installation of Solaris Cluster (see the Solaris Cluster installation guide, also on oracle.com); we'll only describe the failover guest domain configuration.
Also, this configuration allows neither live nor warm migration; only cold migration is possible (the main reason is that ZFS cannot be used as a global file system). If you're not familiar with LDOM migration, I highly recommend reading the Oracle VM Server for SPARC Administration Guide on oracle.com, which describes the migration process very well.
1. Create the zpool that will hold the LDOM virtual disks
After Oracle VM Server for SPARC and Solaris Cluster have been installed and configured, a ZFS storage pool (zpool) is created on a Solaris Cluster DID device (the shared LUN provided by the storage was identified as d14 by Solaris Cluster). I found it better to partition the shared LUN and allocate all available space to slice 6.
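The pool creation command itself is not shown in this post. A minimal sketch, assuming the pool is named zpool_ldom1 (the name used in step 2) and that slice 6 of DID device d14 is addressed as /dev/did/dsk/d14s6, could look like this:

```shell
# Confirm which shared LUN sits behind DID device d14.
cldevice list -v d14

# Assumption: slice 6 of d14 holds all the space; the pool name is taken from step 2.
zpool create zpool_ldom1 /dev/did/dsk/d14s6

# The pool should now be listed on this node.
zpool list zpool_ldom1
```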
2. Create the resource group and add your zpool as a resource
A node list is provided when the resource group is created, in order to set a preference: when the resource group is brought online, the Resource Group Manager tries to bring it up on the first node of that list (in our case, ldom1-rg will be brought online on node1).
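Creating the group with its node list could look like the following. This is a sketch that assumes the cluster nodes are named node1 and node2, as in the rest of this post, and that the HAStoragePlus resource type has not been registered yet:

```shell
# Register the resource type (needed only once per cluster).
clrt register SUNW.HAStoragePlus

# Create the failover resource group; node1 is listed first, so it is the preferred node.
clrg create -n node1,node2 ldom1-rg
```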
# clrs create -g ldom1-rg -t HAStoragePlus -p Zpools=zpool_ldom1 ldom1-fs-rs
# clrg online -M ldom1-rg
Check that the resource group was brought online and that the zpool was imported and mounted on node1 (you can also verify that it isn't mounted on node2 :) )
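One possible way to run that check, assuming the pool is mounted at its default mount point /zpool_ldom1:

```shell
# Resource group and resource state as seen by the cluster.
clrg status ldom1-rg
clrs status -g ldom1-rg

# On node1 the pool should be imported and its file system mounted.
zpool list zpool_ldom1
df -h /zpool_ldom1

# On node2 the same "zpool list" command is expected to fail,
# because the pool is not imported there.
```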
3. Configure the Service and Control Domain (here it's the primary domain)
Keep in mind that both service and control domains must provide the same services (same vswitch, same vdsdev and so on). The commands below must therefore be run on Node1 and on Node2, and every time you reconfigure one of the two domains, you have to apply the same change manually on the other one.
#ldm add-vswitch net-dev=nxge2 primary-vsw1 primary
#ldm add-vdiskserver primary-vds0 primary
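A simple way to check that the two primary domains stay in sync is to list their services on each node and compare the outputs by hand. The output file name below is only an illustration:

```shell
# Run on node1 and on node2: lists the virtual switches, disk servers and
# console concentrators exported by the primary domain.
ldm list-services primary > /var/tmp/services.$(hostname).txt

# Copy one file to the other node and compare the two with diff.
# Some differences (MAC addresses, for example) are expected.
```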
4. Create and configure your LDOM's Virtual Disks on Node1
Two virtual disks were created:
#zfs create zpool_ldom1/ldom1
#cd /zpool_ldom1/ldom1
#date && mkfile -v 100G disk0 && date
#date && mkfile -v 100G disk1 && date
After that, the vdsdev entries corresponding to those two virtual disks were configured (again, on both primary domains). Doing so on the node where ldom1-rg was running (Node1) wasn't a problem, but for Node2, ldom1-rg first has to be switched to that node; otherwise you'll get an error, because the path must exist when the vdsdev is added.
#ldm add-vdsdev /zpool_ldom1/ldom1/disk0 ldom1-vol0@primary-vds0
#ldm add-vdsdev /zpool_ldom1/ldom1/disk1 ldom1-vol1@primary-vds0
#clrg switch -n node2 ldom1-rg
On Node2:
#ldm add-vdsdev /zpool_ldom1/ldom1/disk0 ldom1-vol0@primary-vds0
#ldm add-vdsdev /zpool_ldom1/ldom1/disk1 ldom1-vol1@primary-vds0
#clrg switch -n node1 ldom1-rg
5. Create and configure the LDOM
The LDOM itself must be configured on one node only. There is no need to repeat the configuration on the other cluster nodes.
On Node 1:
#ldm add-domain ldom1
#ldm add-vcpu 8 ldom1
#ldm add-memory 16G ldom1
#ldm add-vdisk vdisk0 ldom1-vol0@primary-vds0 ldom1
#ldm add-vdisk vdisk1 ldom1-vol1@primary-vds0 ldom1
#ldm add-vnet vnet0 primary-vsw0 ldom1
#ldm add-vnet vnet1 primary-vsw1 ldom1
#ldm set-variable auto-boot\?=false ldom1
#ldm set-variable local-mac-address\?=true ldom1
#ldm set-variable boot-device=/virtual-devices@100/channel-devices@200/disk@0 ldom1
#ldm bind-domain ldom1
#ldm start ldom1
After that, connect to the console (#telnet localhost vconsport, where vconsport is the console port of ldom1) and proceed with a standard Solaris 10 OS installation.
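The console port is shown in the CONS column of the ldm ls output. As a small sketch, the hypothetical helper below (console_port is not an ldm feature, just an awk one-liner) extracts that column for a given domain. It assumes the default column layout of ldm ls and a bound or active domain:

```shell
# Print the console port (4th column, CONS) of the named domain,
# reading "ldm ls" output from standard input.
console_port() {
    awk -v dom="$1" '$1 == dom { print $4 }'
}

# Typical use on the control domain:
#   telnet localhost "$(ldm ls | console_port ldom1)"
```

With the listings shown in step 8 of this post, the helper prints 5001 for ldom1.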
6. Perform a test Migration (without LDOM-HA)
We chose a non-interactive migration, so that we are not prompted for the target machine's password. As explained in the introduction, we have to perform a cold migration.
So we'll first stop and unbind the logical domain, then fail over the zpool that contains the LDOM's virtual disks, and finally start the migration.
On Node1:
#ldm stop ldom1
#ldm unbind ldom1
#clrg switch -n node2 ldom1-rg
#ldm migrate-domain -p /pfile ldom1 node2
The -p option takes a file name as an argument. The specified file contains the superuser
password for the target machine.
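Since this file holds a root password in plain text, it should be readable by its owner only; as far as I remember, the administration guide recommends mode 400 or 600. A minimal sketch of a helper that creates such a file follows (make_pfile is a hypothetical name, and /pfile is simply the path used in this post):

```shell
# Read the password from the first line of standard input and store it
# in the given file, readable by its owner only.
make_pfile() {
    pfile=$1
    (
        umask 077                      # create the file without group/other access
        IFS= read -r pw || exit 1      # first line of stdin is the password
        printf '%s\n' "$pw" > "$pfile"
    ) && chmod 400 "$pfile"
}

# Typical use as root: run "make_pfile /pfile", type the password, press Enter.
```

Reading from standard input keeps the password out of the command line and out of the shell history.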
On Node2:
#ldm bind ldom1
#ldm start ldom1
Check that LDOM is booting without any errors.
Perform any other tests you want on ldom1, then shut it down and bring it back to node1 (the commands below are run on Node2):
#ldm stop ldom1
#ldm unbind ldom1
#clrg switch -n node1 ldom1-rg
#ldm migrate-domain -p /pfile ldom1 node1
7. Create the SUNW.ldom resource for the guest Domain
Now it's time to automate the detection of hardware and software failures, so that the LDOM can be started on another cluster node without human intervention.
We'll achieve that by creating the following resource in ldom1-rg:
# clrs create -g ldom1-rg -t SUNW.ldom -p Domain_name=ldom1 \
> -p password_file=/pfile -p Resource_dependencies=ldom1-fs-rs \
> -p Migration_type=NORMAL ldom1-ldom-rs
The first property (-p) is simply the domain name. The password file is the same one used to test the migration (step 6). Resource_dependencies is set to the resource responsible for the zpool that hosts the virtual disks; with this setting we're sure that the ldom1 domain won't be started unless the zpool is imported and mounted on the node. The last property specifies the type of migration to use (NORMAL for cold migration).
Test the migration of the resource group:
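A resource of type SUNW.ldom can only be created once that resource type is registered in the cluster. A short sketch of the registration and of a check once the resource exists, assuming the agent package is already installed on both nodes:

```shell
# Register the failover guest domain resource type (needed only once per cluster).
clrt register SUNW.ldom

# Once ldom1-ldom-rs has been created, inspect its properties and its state.
clrs show -v ldom1-ldom-rs
clrs status ldom1-ldom-rs
```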
# clrg switch -n node2 ldom1-rg
8. Maintenance of the Failover Guest Logical Domain
One thing to note is that SUNW.ldom will bring the system back up even if you performed a clean shutdown from the operating system (such as init 5). This behavior is normal, because the agent is designed to always keep the guest domain up and running. So, in order to perform LDOM operations that require downtime, the best thing to do is to suspend the resource group that owns the domain. In the example below, I modify the hostid of a guest domain that is under SUNW.ldom monitoring.
# ldm set-dom hostid=82a85641 ldom1
LDom ldom1 must be stopped before the hostid can be modified
# ldm ls
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 32992M 11% 19d 4h 4m
ldom1 active -t---- 5001 4 16G 25% 1m
# clrg suspend ldom1-rg
# ldm ls
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 32992M 3.2% 19d 4h 5m
ldom1 active -t---- 5001 4 16G 25% 1m
# ldm stop ldom1
LDom ldom1 stopped
# ldm ls
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 32992M 8.6% 19d 4h 5m
ldom1 bound ------ 5001 4 16G
# ldm set-dom hostid=82a85641 ldom1
# ldm ls
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 32992M 7.8% 19d 4h 5m
ldom1 bound ------ 5001 4 16G
# clrg resume ldom1-rg
# ldm ls
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 32992M 9.0% 19d 4h 7m
ldom1 active -n---- 5001 4 16G 28% 1m
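The sequence above (suspend, stop, change, resume) can be wrapped in a small script so that the resume step is not forgotten. This is only a sketch: run and offline_change are hypothetical helper names, and DRY_RUN defaults to 1 so that the commands are printed instead of executed.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 really runs it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# offline_change <resource-group> <domain> <command...>
# Suspends the resource group, stops the domain, runs the command,
# then resumes the resource group whether or not the change succeeded.
offline_change() {
    rg=$1
    dom=$2
    shift 2
    run clrg suspend "$rg" || return 1
    run ldm stop "$dom" && run "$@"
    rc=$?
    run clrg resume "$rg"
    return $rc
}

offline_change ldom1-rg ldom1 ldm set-dom hostid=82a85641 ldom1
```

In dry-run mode this prints the four commands in the order used in the transcript above, which makes it easy to review the plan before setting DRY_RUN=0.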
DONE.
I hope this will help many people deploy highly available LDOMs.
Below are links to the documentation I used.
Feel free to post any comments.
References:
Oracle Solaris Cluster Essentials, 2010, Prentice Hall
http://hub.opensolaris.org/bin/view/Project+ha-xvm/WebHome
http://download.oracle.com/docs/cd/E23120_01/index.html
http://www.oracle.com/technetwork/server-storage/solaris-cluster/overview/index.html?ssSourceSiteId=ocomen

