
2011/09/02

Oracle VM for Sparc with Solaris Cluster 3.3 (Failover guest domain)

The aim here is to describe the configuration of Sun LDOMs (also known as Oracle VM for SPARC) as failover guest domains. As the name suggests, a failover guest domain is a Sun Logical Domain (often abbreviated LDOM) which can be hosted on only one node of a Solaris Cluster at any one time. This implies that, in order to configure a failover guest domain, Solaris Cluster must already be installed and configured on a set of control domains.

Oracle VM for SPARC provides the ability to split a single physical system into multiple, independent virtual systems. This is achieved by an additional software layer in the firmware, interposed between the operating system and the hardware platform, called the hypervisor. It abstracts the hardware and can expose or hide various resources, allowing for the creation of resource partitions that can operate as discrete systems, complete with virtual CPU, memory and I/O devices.

While the benefit of platform virtualization encourages server consolidation, it introduces a new problem: the physical platform becomes a single point of failure. The Solaris Cluster agent for LDoms guest domains mitigates a physical server failure by managing the guest domains. The agent (delivered as the SUNW.ldom resource type, as the quick check after the list shows) performs the following:
  • Manage the start/stop and restart of LDoms guest domains.
  • Fail over an LDoms guest domain between Solaris Cluster nodes.
  • Allow for strong positive and negative affinities between LDoms guest domains across Solaris Cluster nodes.
  • Allow for different failover techniques, e.g. stop/failover/start or warm migration.
  • Provide a plug-in probe facility for the fault monitor.
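
For reference, the agent is delivered as the SUNW.ldom resource type. A quick way to see which resource types are already registered on a cluster node is clrt(1CL); SUNW.ldom will only appear in this output once it has been registered (which we do in step 7):

# clrt list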

Since the term “migration” has just been introduced, it is important to emphasize that migration on its own does not represent high availability. The significant difference between the two [migration and high availability] is that migration, without Sun Cluster and the HA-LDOM agent, requires human intervention to be initiated, whereas high availability provides automated failure detection in hardware and software so that services can be started on another cluster node without human intervention.

Below is a description of the architecture we will deploy.





As you can see, there are two CoolThreads servers (we used Sun Fire T5440s) connected to shared storage through a SAN infrastructure. A service and control domain (the primary domain) was configured on each server. We won't cover the installation of Oracle VM for SPARC (see the Oracle VM Server for SPARC Administration Guide on oracle.com) or the installation of Solaris Cluster (see the Solaris Cluster Installation Guide, also on oracle.com); we'll just describe the failover guest domain configuration.
Also, this configuration allows neither a live nor a warm migration; only a cold migration is possible here (the main reason being that ZFS cannot be used as a global file system). If you're not familiar with LDOM migration, I highly recommend reading the Oracle VM Server for SPARC Administration Guide on oracle.com, where the migration process is very well described.

1. Create the Zpool which will serve as a receptacle for the LDOM's virtual disks

After Oracle VM for SPARC and Solaris Cluster have been installed and configured, a ZFS storage pool (zpool) is created on a Solaris Cluster DID device (the shared LUN provided by the storage array was identified as d14 on Solaris Cluster). I found it better to partition the shared LUN and allocate all available space to slice 6.

# zpool create zpool_ldom1 /dev/did/dsk/d14s6
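
If you need to confirm which DID device corresponds to the shared LUN, the standard cluster device commands can be used from either node; a minimal check (d14 is the device used in this example):

# cldevice list -v
# cldevice show d14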

2. Create the Resource Group and add your Zpool as a resource

A node list is provided at creation time in order to set a preference: when the resource group is brought online, the Resource Group Manager will try to bring the RG up on the first node of this node list (in our case, ldom1-rg will be brought online on node1).

# clrg create -n node1,node2 ldom1-rg
# clrs create -g ldom1-rg -t HAStoragePlus -p Zpools=zpool_ldom1 ldom1-fs-rs
# clrg online -M ldom1-rg

Check and verify that the resource group was brought online and that the zpool was imported and mounted on node1 (you can also verify that it isn't mounted on node2 :) )

#zpool list
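
The cluster status subcommands give the same information from either node; a quick verification:

# clrg status ldom1-rg
# clrs status ldom1-fs-rs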

3. Configure the Service and Control Domain (here it's the primary domain)


Keep in mind that both service and control domains should have the same configuration in terms of services provided (same vswitch, same vdsdev, ...). So the commands below should be run on both Node1 and Node2, and every time you reconfigure one of the SC domains, manually do the same on the other one.

#ldm add-vswitch net-dev=nxge0 primary-vsw0 primary
#ldm add-vswitch net-dev=nxge2 primary-vsw1 primary
#ldm add-vdiskserver primary-vds0 primary
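
To confirm that both control domains ended up with an identical set of services, you can list the services on each node and compare the output; a minimal check:

#ldm list-services primary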

4. Create and configure your LDOM's Virtual Disks on Node1

Below, two virtual disks were created:

#zfs create zpool_ldom1/ldom1
#cd /zpool_ldom1/ldom1
#date && mkfile -v 100G disk0 && date
#date && mkfile -v 100G disk1 && date
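
The backing files now live inside the failover zpool, so they will move together with ldom1-rg; a quick look to confirm:

#zfs list -r zpool_ldom1
#ls -lh /zpool_ldom1/ldom1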

After that, the virtual disk server devices (vdsdev) corresponding to those two virtual disks were configured, again on both primary domains. Doing so on the node where ldom1-rg is running (here Node1) isn't a problem, but for Node2, ldom1-rg must first be switched over to that node (otherwise you'll get an error, since the path must exist when adding the vdsdev).

On Node1:
#ldm add-vdsdev /zpool_ldom1/ldom1/disk0 ldom1-vol0@primary-vds0
#ldm add-vdsdev /zpool_ldom1/ldom1/disk1 ldom1-vol1@primary-vds0
#clrg switch -n node2 ldom1-rg

On Node2:
#ldm add-vdsdev /zpool_ldom1/ldom1/disk0 ldom1-vol0@primary-vds0
#ldm add-vdsdev /zpool_ldom1/ldom1/disk1 ldom1-vol1@primary-vds0
#clrg switch -n node1 ldom1-rg

5. Create and Configure Failover Guest Domain (ldom1)

Now it's time to create and configure the LDOM.
The configuration of the LDOM itself must be performed on one node only; there is no need to repeat it on the other cluster nodes.


On Node 1:

#ldm add-domain ldom1
#ldm add-vcpu 8 ldom1
#ldm add-memory 16G ldom1
#ldm add-vdisk vdisk0 ldom1-vol0@primary-vds0 ldom1
#ldm add-vdisk vdisk1 ldom1-vol1@primary-vds0 ldom1
#ldm add-vnet vnet0 primary-vsw0 ldom1
#ldm add-vnet vnet1 primary-vsw1 ldom1
#ldm set-variable auto-boot\?=false ldom1
#ldm set-variable local-mac-address\?=true ldom1
#ldm set-variable boot-device=/virtual-devices@100/channel-devices@200/disk@0 ldom1
#ldm bind-domain ldom1
#ldm start ldom1

After that, connect to the console (# telnet localhost vconsport, where vconsport is the console port assigned to the domain) and proceed with a classic Solaris 10 OS installation.
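
The console port can be read from the CONS column of the ldm output on the control domain, for example:

# ldm list ldom1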

6. Perform a test Migration (without LDOM-HA)


We chose a non-interactive migration (so that we are not prompted for the target machine's password). As explained in the introduction, we have to perform a cold migration:
we first stop and unbind the logical domain, then switch over the zpool which contains the LDOM's virtual disks, and finally start the migration.
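
The migration command below reads the target machine's root password from a file. The original listing doesn't show how this file is created; a minimal sketch, assuming the /pfile path used in the commands that follow (replace rootpassword with the real password, keep the permissions restrictive, and create the file on both nodes):

# echo "rootpassword" > /pfile
# chmod 400 /pfile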

On Node1:
#ldm stop ldom1
#ldm unbind ldom1
#clrg switch -n node2 ldom1-rg
#ldm migrate-domain -p /pfile ldom1 node2
The -p option takes a file name as an argument. The specified file contains the superuser
password for the target machine. 


On Node2:
#ldm bind ldom1
#ldm start ldom1
Check that the LDOM boots without any errors.
Perform any other tests you want on ldom1, then shut it down and bring it back to node1:
#ldm stop ldom1
#ldm unbind ldom1
#clrg switch -n node1 ldom1-rg
#ldm migrate-domain -p /pfile ldom1 node1

7. Create the SUNW.ldom resource for the guest Domain

Now it's time to automate failure detection in hardware and software so that the LDOM can be started on another cluster node without human intervention.
We'll achieve that by creating the following resource in ldom1-rg.
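
If the SUNW.ldom resource type hasn't been registered on the cluster yet, register it first (a standard Solaris Cluster step that the listing below assumes has already been done):

# clrt register SUNW.ldom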

# clrs create -g ldom1-rg -t SUNW.ldom -p Domain_name=ldom1 \
> -p password_file=/pfile -p Resource_dependencies=ldom1-fs-rs \
> -p Migration_type=NORMAL ldom1-ldom-rs

The first property here is simply the domain name; the password file is the same one used to test the migration (step 6); Resource_dependencies is set to the resource responsible for the zpool which hosts the virtual disks, which ensures that the ldom1 domain won't be started if the zpool isn't imported and mounted on the node. The last property specifies the type of migration to use (NORMAL for a cold migration).
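
Note that if the back-end devices were later placed on a global file system (or on shared raw devices), the same resource could be switched to warm/live migration simply by changing this property; a sketch of what that would look like:

# clrs set -p Migration_type=MIGRATE ldom1-ldom-rs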

Test the migration of the Resource Group
# clrg switch -n node2 ldom1-rg
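
After the switch, confirm that both the storage and the domain came up on node2; a quick check on the new node:

# clrs status -g ldom1-rg
# ldm list ldom1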

8. Maintenance of the Failover Guest Logical Domain

One thing to note is that SUNW.ldom will bring the system back up even if you performed a clean shutdown from the operating system (such as an init 5). This behavior is normal: the agent always tries to keep the guest domain up and running. So, in order to perform LDOM operations which require downtime, the best thing to do is to suspend the resource group which owns the domain. In the example below, I modify the hostid of a guest domain which is under SUNW.ldom monitoring.


# ldm set-dom hostid=82a85641 ldom1
LDom ldom1 must be stopped before the hostid can be modified
# ldm ls
NAME      STATE   FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary   active  -n-cv-  SP      16    32992M    11%  19d 4h 4m
ldom1     active  -t----  5001    4     16G       25%  1m
# clrg suspend ldom1-rg
# ldm ls
NAME      STATE   FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary   active  -n-cv-  SP      16    32992M   3.2%  19d 4h 5m
ldom1     active  -t----  5001    4     16G       25%  1m
# ldm stop ldom1
LDom ldom1 stopped
# ldm ls
NAME      STATE   FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary   active  -n-cv-  SP      16    32992M   8.6%  19d 4h 5m
ldom1     bound   ------  5001    4     16G            
# ldm set-dom hostid=82a85641 ldom1
# ldm ls
NAME      STATE   FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary   active  -n-cv-  SP      16    32992M   7.8%  19d 4h 5m
ldom1     bound   ------  5001    4     16G            
# clrg resume ldom1-rg       
# ldm ls
NAME      STATE   FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary   active  -n-cv-  SP      16    32992M   9.0%  19d 4h 7m
ldom1     active  -n----  5001    4     16G       28%  1m



DONE.

Hope this will help many people deploy highly available LDOMs.
Below are links to the documentation I used.
Feel free to post any comments.

References:
Oracle Solaris Cluster Essentials, 2010, Prentice Hall
http://hub.opensolaris.org/bin/view/Project+ha-xvm/WebHome
http://download.oracle.com/docs/cd/E23120_01/index.html
http://www.oracle.com/technetwork/server-storage/solaris-cluster/overview/index.html?ssSourceSiteId=ocomen


56 comments:

  1. Hi Steve, it's a great article with clear and easy-to-follow procedures :). My company is embarking on a huge project that will likely use LDOMs with an HA setup, and your article gives me a very good foundation on what's going to happen.

    Hence, does it mean that if we want to use the live migration feature of LDOMs, we cannot use ZFS as the global file system within the control domain? Do you know any way we can work around it?

    Thanks.
    LD

    Replies
    1. Hi, Thanks for the compliments. In fact, ZFS is not supported under Solaris Cluster as a global filesystem. If you want to use Live Migration, then the supported method is to use either UFS or VxFS (and put your LDOM backend disk devices on it).

    2. Hi Steve,

      Ok thanks for the clarification. I did some read up and here is my understanding on the concept but not sure if they are valid.

      - Live Migration between guest domains will work on ZFS without any constraints if Solaris Cluster is not installed. But there are some pre-requisites for Live Migration that must be met, as defined in Chapter 9 at http://docs.oracle.com/cd/E23120_01/html/821-2854/migratingldoms.html#scrolltoc.

      - If Sun Cluster is installed, then the backend disks devices for the guest domains had to be deposited on a Global file system(which is "accessible" by all nodes in the cluster concurrently) for Live Migration to work. The constraint with ZFS here is that currently it is supported as a HA Local File System only, hence Live Migration will not work cause the backend disk devices are not found on a shared filesystem.

      But if we really need to use the Live Migration feature, then the backend disk devices have to be placed on a UFS- or VXFS-formatted shared filesystem.

      Please correct me if I am wrong.

  2. Hi Steve,

    Ok thanks for the clarification. I did some read up and understood the following but not sure if they are correct. Really appreciate if you can cross check to see if we are on the same page.

    - Solaris Cluster is not required for Live Migration to work. But there are certain
    pre-requisites that must be met as indicated in Chapter 9 of the Oracle VM Server for SPARC 2.1 Administration Guide for Live migration to work.

    - For Live Migration to work with Solaris Cluster, only image files can be used as backend disk devices and they must reside on a Global FileSystem (a filesystem which is "accessible" by multiple nodes simultaneously). Since ZFS is only supported as an HA local filesystem, it cannot be used as a Global FileSystem, hence we have to resort to a VXFS- or UFS-formatted filesystem instead.

    This also means other types of backend disk devices like VXFS/SVM volumes or raw devices cannot be used as a repository for the LDOM guest OS(s).

    Lastly, I am also quite curious on how you backup and recover the guest and control LDOMs at your site. Care to share your backup and recovery strategy as well?
    Millions of thanks.

    Replies
    1. Hi,
      About your first point, you're correct. Live Migration is a LDOM feature, Solaris Cluster just helps to automate this Migration (No manual Intervention needed).
      For the second one, I'll say that only image files are officially supported by Oracle as backend disk devices and they must reside on a Global FileSystem. In fact, setting a raw LUN in shared storage as the backend device will work (and often provides better disk performance), but it's not documented.
      Concerning backup, it all depends on your design. IMHO, the easiest backup strategy is to use your existing backup infrastructure (as with physical servers). Then you can look for optimization, for example by using the ZFS snapshot feature (if your backend disk resides on ZFS) or advanced storage features (like SnapView for EMC if your backend devices reside on a SAN LUN). Also, don't forget to back up your guest domains' configuration by using "ldm list-constraints -x > .xml" from the control domain.

    2. Hi Steve,

      Thanks for the advice and I really appreciate that. Seems like I still have to rely on EMC Networker for backup and recovery. :-(

  3. I've been told by Oracle backline that we can pass the Solaris Cluster DIDs directly to the guest domains and do live migration that way. So, we don't 'need' to use a PxFS/Global FS to be the backing store for the LDOM virtual disks. Simply add the /dev/did/dsk/dNs2 to the vds as vdsdev, set the vdsdev as the vdisk in the HA LDOM and then let the HA-LDOM data service do the migration.

    I just tested this manually and it works like a charm. Having issues with the SUNW.ldom agent however (got a case open with Oracle for that).

    Replies
    1. Hi,

      Thanks for sharing the method to use Solaris Cluster DIDs instead of image files. May I know what are the benefits of using the Solaris Cluster DIDs over image files? Which will be easier to manage and maintain potentially?

    2. IMHO, you'll get better performance by using DID devices (in terms of disk I/O).
      Keep in mind that with image files, the guest domain's filesystems are built on image files which sit on a primary domain's filesystem: too many layers for disk I/O.
      With devices (DID, dsk, emcpower...), your guest filesystem is built directly on the device.
      I did the same but with emcpower devices instead of DID devices (I used it for an Oracle RDBMS which generated many I/Os) and it also worked very well (but I didn't use Live Migration because I wasn't aware that it was supported).
      With image files you just get better flexibility (mostly for backup and cloning).
      The procedure is still the same, the only differences are:
      1. When adding your vdsdev, specify the raw device as backend device:
      for example
      ldm add-vdsdev /dev/dsk/emcpowerxc ldom1-vol0@primary-vds0 (for emcpower device)
      or
      ldm add-vdsdev /dev/did/dsk/dxs2 ldom1-vol0@primary-vds0 (for DID device, The first anonymous guy could confirm :) )
      2. If you're going for Live Migration change the Property Migration_type of the SUNW.ldom resource to MIGRATE (instead of NORMAL)

    3. The default Migration_type is MIGRATE in Cluster 4. I've just finished rolling out two clusters with Solaris 11 in the CD and am building out a bunch of ldoms (solaris 11 and solaris 10 u10), configured with the ldom data service.

      There was an issue with the stop_timeout value for the ldom resource, in that the live migration takes more than 2 minutes and the cluster was triggering a restart of the ldom because the stop_timeout (def. 300) was not enough. Oracle support asked me to change that to 1200 and then the live migrations were being done by the cluster with a clrg switch command.

      Only one disk related issue I've encountered so far is with a second zpool in one of my ldoms which failed during live migration and had to be hard-started by cluster on another node. At that point, the second zpool didn't get imported into the guest OS. After manually importing the pool in, subsequent live migrations worked very nicely.

  4. Ok Great Steve! Thanks for the detailed answer. Based on all the information that I gathered and understood, I would like to use the following setup but do you think it is feasible or does it differ from the industrial standard on how things are done?

    * Solaris Cluster 3.3 Update 5 will be installed on Oracle Solaris 10 with ZFS root pool on each cluster node
    * OS image file will be stored on each guest domain ZFS pool but for the virtual disk(s) used to store data, a complete LUN in the form of /dev/did/dsk/dXXs2 will be exported to the guest domains.
    As such two virtual disk servers will be created - the primary vds to store the OS zpools which contain the image files and a second vds to hold the data LUNs.
    * For networking, each logical domain will have
    - one primary virtual network switch with 2 network ports aggregated from 2 different physical switches
    - one backup LAN virtual network switch that comprises only one port, which is shared across all logical domains for OS backup
    * The rest of the configuration is the same as what was described in Steve's blog, except that migration_type is set to MIGRATE

    Replies
    1. Hi, I assume that when you said "OS image file will be stored on each guest domain ZFS pool" you meant "OS image file will be stored on each primary domain ZFS pool".
      About your configuration, you won't be able to use MIGRATE as migration_type if the guest OS image files are sitting on a ZFS pool. ZFS isn't a global filesystem, so a zpool will be active on one node at a time. If you're planning to use image files and MIGRATE as migration_type, then you should use either UFS or VXFS (as a global filesystem).
      Apart from that, IMHO your design seems correct.

  5. Hi Steve, Thanks for the excellent documentation. I have a query on the HA-LDOM setup. As we use the HA-LDOM agent, how do we control applications/databases which are installed on the failover LDOM? Should applications/databases be configured with SMF?

    Replies
    1. Hi, Thanks for the compliments.
      I must admit that i don't understand your question. If it's about the relation between the services(Applications & Databases) which are running in the Failover Guest Domain and The Cluster Software, i'll say that there's no direct relation using HA-LDOM. In fact, using HA-LDOM, Solaris Cluster software controls the virtualized entity without having any presence within the virtualized entity (so with HA-LDOM, you can view LDOM as the "application" that the Solaris Cluster is managing).

      In contrast, you can install Solaris Cluster on two guest LDOMs and cluster these LDOMs (as you could with two physical systems, but this isn't HA-LDOM). Let me know if that's what you wanted to know.

  6. Hi Steve,

    Thanks for your response & update. Yes It's about relation between the services(Applications & Databases) which are running in the Failover Guest Domain and The Cluster Software.

    In Failover Guest Domain(HALdom) setup, is there any way to configure resource in cluster that it would trigger failover RG if Applications/Databases got some issue?

    I assume that can't be done with Failover Guest Domain(HALdom) setup. That can be done in Solaris Cluster on 02 guests LDOM only.

    Am I correct?

    Replies
    1. I'm using the smf and init scripts right now to control automated start-ups and shutdowns of the apps (including Oracle DB).

      In the HA Zone agent, there was an option to write an scsmf or sczsh probe to check for specific applications (or in fact inspect anything in the non-global zone) via the cluster framework. I didn't find references to such things in the ldom data service. It is logical for that to be the case since the ldom (VM) is really a self-contained entity and cluster is merely managing the state of this entity (this is called a black-box approach, iinm).

    2. Hi Anonymous :), you're correct. As Dwai said, HALDOM is a black box approach.

  7. Hi,

    Just curious: can the SUNW.ldom resource and resource group creation (I mean all the Solaris Cluster steps) be deferred to a later stage instead of performing them at the beginning or middle? Thanks

  8. Thanks Dwai & steve I agree with your updates :)

  9. Hi,

    Just would like to clarify: can the cluster steps (clrg/clrs) be deferred to a later stage? Meaning, only after I have tested all the SAN/network connectivity and after the guest LDOMs' OS are all set up, can the guest LDOMs then be put under cluster management? Thanks.

  10. What do you think about installing Sun Cluster in each guest domain instead of the control domain? Are there any disadvantages?

  11. Hi, Can we reboot the Control Domain without affecting guest domains? Why?

    Replies
    1. In this case, rebooting the control domain will affect the guest domains that are running on it: due to the cluster mechanism, the clustered services (the guest LDOMs and their back-end devices) will automatically migrate to the other cluster node.

  12. Hi Steve
    Great blog on fail-over, have u tried it for 4.1? and RAC CL ZONES?

    Replies
    1. Thanks Michael, I haven't yet tried exactly the same with SC 4.1, but I'm currently configuring failover zones with Solaris 11 11/11 and SC 4.0. Planning to post about that very soon.

  13. hi Steve,
    thanks for this helpful document.
    I'm not familiar with OVM and I have some questions:
    - Must the cluster configuration be done before the LDOM configuration, or after?
    - How can I migrate my Solaris 8 and Solaris 9 servers into guest domains?
    - Do you have a procedure for installing and configuring a cluster within LDOMs?

    thanks for cooperation.

    best regards...

    Replies
    1. Hello Kamal,

      - Cluster configuration is done every time you need to configure an LDOM in HA mode; I guess your question is more related to the installation and first configuration. You can choose to install the cluster framework before or after the LDOM configuration (I mentioned in the post that the cluster is responsible for the automated failure detection in hardware and software so that services can be started on another cluster node without human intervention), meaning that you can configure OVM/LDOM migration without the cluster (the HA cluster function just helps to automate the process).
      - Solaris 8/9 aren't supported on sun4v servers (OVM for SPARC runs on sun4v servers). IMHO, you can install a Solaris 10 box on OVM and then create Solaris 8/9 zones/containers on top of it.
      - For a procedure on how to install and configure a cluster within LDOMs, I don't have a step-by-step procedure, but you can check this one (there isn't a big difference from doing the same with physical nodes): http://docs.oracle.com/cd/E19575-01/821-0259/ggvtu/index.html

  14. dear Steve,

    Thanks for your prompt reply.
    Can we use the P2V utility to bring our Solaris 8/9 machines onto OVM, or is it better to create a Solaris 10 guest under the LDOM and make Solaris 8 and 9 zones?

    best regards...

    Replies
    1. IMHO, the best way is to first create your guest Solaris 10 under LDOM and then start a P2v (from your Solaris 8/9 physical systems to Zones on the Solaris 10 LDOM that you built).

  15. We have two T3-1 blades with 4 LDOMs in each. The LUNs are shared.
    We are able to migrate the guest LDOMs manually from one blade to another.
    We want to make sure that, in case one blade goes down, the guest LDOMs on that blade migrate automatically to the other blade.

    This did not happen when one of our blades went down. How do we accomplish this?

    Replies
    1. Hi,

      All you need is to install Solaris Cluster on your service and control domains and then configure it to automate the migration of your LDOMs (basically, you'll create a resource group for each LDOM and, as resources for this RG, at least one HAStoragePlus and one SUNW.ldom resource).

  16. Hi

    I have 4 guest domains with ZFS file systems. Can I create a cluster using this ZFS file system? The OS is Solaris 11 and Sun Cluster is 4.1.

    Please help on this.
    I saw that ZFS will not support a global file system, so how can I create a cluster between two guest domains?
    Thanks
    Kesava

    Replies
    1. Hello,
      Sorry, could you explain what you mean by "4 guest domain with zfs fs"?
      Also, keep in mind that the cluster resource is based on zpool (not zfs dataset - i.e : clrs create -g ldom1-rg -t HAStoragePlus -p Zpools=zpool_ldom1 ldom1-fs-rs-).
      Meaning that your failover strategy is mostly based on the zpool import/export (not individual zfs datasets...)

  17. Hi Steve

    Thanks for your quick response.
    I have 2 SPARC T5 servers, and on each server I have created 4 LDOMs. I have to configure a cluster; how can I do this?
    Example:
    primary1: ldom1
    primary2: ldom1

    I have to configure the cluster from the primary1 LDOM to the primary2 LDOM, please help on this.
    I have installed the cluster software on each LDOM; is that correct?
    Thanks
    Kesava

  18. As per your update, can I create zpools inside the LDOMs? Is that correct, or do I have to create the zpools in the primary domain?

  19. Hi Steve

    Please share the steps for configuring Sun Cluster on 2 guest domains (LDOMs); I am working on this now, so could you provide them ASAP? Everything here is on Solaris 11.

    Thanks
    Kesava

  20. Hi Steve

    I am waiting for your update ??

    Thanks
    Kesava

    Replies
    1. Hello Kesava, sorry, I'm just reading your questions. Anyway, configuring the cluster on the guest domains isn't really different from what you can do with two physical servers. You can easily do that by following this guide from Oracle: http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-147-install-2node-cluster-1395587.html
      The main challenge for you in this configuration is to make sure that the shared disks are correctly assigned among the nodes (especially if they're multipath devices).

  21. Dear Steve,

    Thanks for sharing the document, it's very easy to understand. I will follow the same.

    Thanks
    Kesava

  22. hi Steve,

    I got a problem setting the stack value for the Oracle user; how do I make it permanent?
    I am able to set all the remaining values, but not the stack value (min, max).

    Thanks
    Kesava

  23. Hi Steve,

    Could you please help me configure the quorum device on 2 LDOMs from different hosts? I have assigned a shared LUN to the 2 physical servers but I am not able to add this LUN to the guest domains.

    So, please help me with how to add a shared LUN to guest domains in a cluster environment.

    Thanks
    Kesava

  24. Hi Steve,

    How can I map shared LUNs directly to a guest domain? Please provide the info if you have any idea about the above 2 requests.

    Thanks
    Kesava

    Replies
    1. Hello Kesava,

      Tell me, what multipath software are you using? (solaris mpxio, emc powerpath...)

  25. Steve,

    I have a question on the LDOM and ZFS filesystem.

    Suppose I share a LUN between the source and target T4 servers, assign this LUN as a vdisk to the LDOM created on the source system, and create a ZFS pool on that vdisk.

    Will I then be able to migrate this LDOM from the source server to the target server on an as-needed basis, and will the ZFS pool be seen on the target system, given that the underlying physical disk/LUN is the same and visible on both nodes?

    I am asking because ZFS is more a volume manager than a clustered file system, so will a vdisk with ZFS be allowed to migrate and be made available on the target server?

    The doc below does show a neat example but I am still looking for clarifications and further insights if this is something that would happen or not.

    http://www.oracle.com/technetwork/server-storage/vm/ovm-sparc-livemigration-1522412.pdf


    Appreciate your inputs

    Thanks in advance,

    Rohan

    Replies
    1. Hello Rohan, IMO, as long as the same vdsdev is created on both nodes, you should be able to. You can see in step 4 of this post that at a certain point I perform a manual migration of the LDOM (using Solaris Cluster) to have it running on the second node (the one you're referring to as the target node) and then create the required vdsdev on that node (see below).
      You can do the same and add the vdisk afterwards (only on one node). To summarize: the live migration will migrate your guest LDOM configuration, but you normally have to put the configuration related to the primary domains in place on both nodes yourself. I haven't read the whole doc, but check the picture on p. 17. I hope I've been able to provide some clarification.

      On Node1:
      #ldm add-vdsdev //zpool_ldom1//ldom1/disk0 ldom1-vol0@primary-vds0
      #ldm add-vdsdev //zpool_ldom1//ldom1/disk1 ldom1-vol1@primary-vds0
      #clrg switch -n node2 ldom1-rg

      On Node2:
      #ldm add-vdsdev //zpool_ldom1//ldom1/disk0 ldom1-vol0@primary-vds0
      #ldm add-vdsdev //zpool_ldom1//ldom1/disk1 ldom1-vol1@primary-vds0
      #clrg switch -n node1 ldom1-rg

  26. Hi Steve,

    We are using Solaris 10 and cluster version 3.3u2 and ldm version 3.1.1.2.2.
    We are using raw disks (LUNs) for the guest domains.
    So while configuring the failover LDOM, what do we have to take care of with respect to raw disks?

    Thanks,
    Harish

    Replies
    1. Hi Harish,
      The main thing you should take care of is that the raw disk to use is available on both Primary Domains.
      Then you must make sure of the name consistency across the Primary Domains Configuration and IMO, the best way to do that is to use the cldev (cluster device name) representing your raw disk as vdsdev.
      So, that'll be during the step 4 of this post, and you should have something like (can use "cldev list -v" for cluster device name details):

      On Node1:
      #ldm add-vdsdev /dev/did/dsk/dxxs2 ldom1-vol0@primary-vds0
      #clrg switch -n node2 ldom1-rg

      On Node2:
      #ldm add-vdsdev /dev/did/dsk/dxxs2 ldom1-vol0@primary-vds0
      #clrg switch -n node1 ldom1-rg

    2. thanks Steve, let me try this.
      will update you within 2-3 days

  27. Hi Steve,

    I am able to do the live migration with the migrate-domain option.
    However, when I try to do the same under Sun Cluster 3.3 and create the resource, I get an error like "invalid password file".
    Even when I tried point #5 as you specified, I got the message below. Please advise on this.

    "Failed to establish connection with ldmd(1m) on target: node2
    Check that the 'ldmd' service is enabled on the target machine and
    that the version supports Domain Migration. Check that the 'xmpp_enabled'
    and 'incoming_migration_enabled' properties of the 'ldmd' service on
    the target machine are set to 'true' using svccfg(1M)."

  28. hey Steve,
    It's done, I created that file on both nodes :)

    Thanks
    Harish

  29. The error is also seen when an incorrect destination root password is supplied.

  30. Reviving this thread! Comparing this to vSphere HA/DRS, what would happen if you had a host failure and the target server didn't have enough resources? Would LDOMs be created on the target and just not booted?

    Can you have a cluster with more than 2 nodes? Let's say I had a cluster with 4 T7-2s and one host failed. What is the logic behind where the LDOMs are booted up on the surviving hosts?
