Oracle VM Server for SPARC Physical-to-Virtual: Conversion of Physical Systems to LDOM in Clustered Control Domains

In my previous post, I detailed the configuration of Sun LDoms (now called Oracle VM Server for SPARC) as failover guest domains. In this post, I'm going to describe the migration of physical servers (running Solaris 10) into the clustered control domains configured previously. To achieve these migrations, I used the Oracle VM Server for SPARC P2V tool (ldmp2v).
The Oracle VM Server for SPARC P2V Tool automatically converts an existing physical system to a virtual system that runs in a logical domain on a chip multithreading (CMT) system. The source system can be any of the following:
  •  Any sun4u SPARC-based system that runs at least the Solaris 8 OS
  •  Any sun4v system that runs the Oracle Solaris 10 OS, but does not run in a logical domain

The migration of physical systems to clustered control domains is performed through four phases, which are described below:

Collection --> Preparation --> Conversion --> Clustering
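At a glance, the first three phases map onto the three ldmp2v subcommands. The outline below is only a condensed sketch of the steps detailed in the rest of this post; host names and paths are the examples used here, not fixed values.

```shell
# Phase 1 - Collection, run on the physical source system:
#   ldmp2v collect -d /mnt/$(hostname) -x /data
# Phase 2 - Preparation, run on one clustered control domain:
#   ldmp2v prepare -b file -d /mnt/system1 -m /:15g system1
# Phase 3 - Conversion, same control domain, booting a Solaris 10 ISO:
#   ldmp2v convert -i sol-10-u9-ga-sparc-dvd.iso -d /mnt/system1 system1
# Phase 4 - Clustering: replicate the vds devices on the remaining nodes
#   and place the domain under SUNW.ldom control (no ldmp2v involved).
```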

Collection Phase:

During the collection phase, we'll back up and archive the source system. For this step, we've decided to separate the collection of the system image from the collection of the database backup.
The collection of the system image is performed through the ldmp2v tool. The Oracle VM Server for SPARC P2V Tool package must be installed and configured only on the control domain of the target system; it does not need to be installed on the source system. Instead, the /usr/sbin/ldmp2v script was copied from the target system to the source system. The ldmp2v command creates a backup of all mounted UFS file systems, except those excluded with the -x option. Below are the commands to run on the source system (the system we're migrating).

/*Archive_System is an NFS server*/
# mount Archive_System:/share /mnt  
# mkdir /mnt/$(hostname)

Start the collection phase (/data contains the database's data files, so this file system was excluded and will be restored separately).

/*Backup of System using ldmp2v*/
# ldmp2v collect -d /mnt/$(hostname) -x /data
/*Backup of Database's Data File*/
# tar cvf /mnt/data.tar /data/*             

Preparation Phase:

This phase takes place on only one of the target control domains. The aim here is to create the target system from the data gathered during the collection phase.

1. Create the Zpool which will serve as Receptacle for the Guest Domain

We identified the DID device that will be used to create the zpool (d11) and formatted it, allocating the entire space of the device to slice 6.

# zpool create zpool_system1 /dev/did/dsk/d11s6

2. Create the cluster resource group for this server and add the zpool as a resource

# clrg create -n node1,node2 system1-rg
# clrs create -g system1-rg -t HAStoragePlus  -p Zpools=zpool_system1 system1-fs-rs
# clrg online -M system1-rg
# zpool list
 rpool          278G   12.8G  265G   4%  ONLINE  -
 zpool_system1  99.5G  79.5K  99.5G  0%  ONLINE  /

3. Create the /etc/ldmp2v.conf file and configure the following properties

# cat /etc/ldmp2v.conf 
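The contents of the file are not shown in the post. The sketch below is an illustrative example of the properties ldmp2v reads from /etc/ldmp2v.conf; the values are assumptions that must match your own control domain services, and the ldmp2v.conf.sample file shipped with the package remains the authoritative reference.

```shell
VDS="primary-vds0"        # virtual disk service to attach guest disks to
VSW="primary-vsw0"        # virtual switch for the guest's network
VCC="primary-vcc0"        # virtual console concentrator
BACKEND_TYPE="file"       # disk backend type: "zvol", "file" or "disk"
BACKEND_SPARSE="no"       # create non-sparse backends
BACKEND_PREFIX="/zpool_system1/"   # where file/zvol backends are created
BOOT_TIMEOUT=10           # seconds 'ldmp2v convert' waits for the guest OS
```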

4. Start the Restoration

The file system image is restored to one or more virtual disks.

# mount Archive_System:/share /mnt
# ldmp2v prepare -vvv -b file -d /mnt/system1 -m /:15g system1
Available VCPUs: 234
Available memory: 218624 MB
Creating vdisks ...
Resizing partitions ...
Resize /
Partition(s) on disk /dev/dsk/c1t0d0 were resized, adjusting disksize ...
Creating vdisk system1-disk0 ...
Creating volume system1-vol0@primary-vds0 (75653 MB)...
Creating file //zpool_system1//system1/disk0 ...
Creating VTOC on /dev/rdsk/c6d0s2 (disk0) ...
Creating file systems ...
Creating UFS file system on /dev/rdsk/c6d0s0 ...
Creating UFS file system on /dev/rdsk/c6d0s4 ...
Creating UFS file system on /dev/rdsk/c6d0s6 ...
Creating UFS file system on /dev/rdsk/c6d0s5 ...
Creating UFS file system on /dev/rdsk/c6d0s3 ...
Populating file systems ...
Mounting /var/run/ldmp2v/system1 ...
Mounting /var/run/ldmp2v/system1/app ...
Mounting /var/run/ldmp2v/system1/export/home ...
Mounting /var/run/ldmp2v/system1/opt ...
Mounting /var/run/ldmp2v/system1/var ...
Extracting Flash archive /mnt/system1/system1.flar to /var/run/ldmp2v/system1 ...
41999934 blocks
Modifying guest OS image ...
Modifying SVM configuration ...
Modifying /etc/vfstab ...
Modifying network interfaces ...
Modifying /devices ...
Modifying /dev ...
Creating disk device links ...
Cleaning /var/fm/fmd ...
Modifying platform specific services ...
Modifying /etc/path_to_inst ...
Unmounting file systems ...
Unmounting /var/run/ldmp2v/system1/var ...
Unmounting /var/run/ldmp2v/system1/opt ...
Unmounting /var/run/ldmp2v/system1/export/home ...
Unmounting /var/run/ldmp2v/system1/app ...
Unmounting /var/run/ldmp2v/system1 ...
Creating domain ...
Attaching vdisks to domain system1 ...
Attaching volume system1-vol0@primary-vds0 as vdisk disk0 ...
Setting boot-device to disk0:a

Conversion Phase:

During the conversion phase, the logical domain uses the Solaris upgrade process to upgrade to the Oracle Solaris 10 OS. The upgrade operation removes all existing packages and installs the Oracle Solaris 10 sun4v packages, which automatically performs a sun4u-to-sun4v conversion. The convert phase can use an Oracle Solaris DVD ISO image or a network installation image; here, we used a DVD ISO image.
Before starting the conversion, it's better to shut down the source system (this prevents duplicate IP addresses).

# ldmp2v convert -i /export/home/ldoms/iso/sol-10-u9-ga-sparc-dvd.iso  -d /mnt/system1/ -v system1
LDom system1 started
Waiting for Solaris to come up ...
ldmp2v: ERROR: Timeout waiting for Solaris to come up. 
The Solaris install image /export/home/ldoms/iso/sol-10-u9-ga-sparc-dvd.iso cannot be booted.

For unknown reasons, I got the error highlighted above (maybe the BOOT_TIMEOUT value set in /etc/ldmp2v.conf was too short). Nevertheless, the error could be ignored because the OS did start. That can be checked by running ldm ls and connecting to the console with telnet (see below).

# ldm ls
primary          active     -n-cv-  SP      16    32992M   3.5%  16d 21h 18m
system1          active     -t----  5000    2     8G        50%  19s

# telnet localhost 5002
Connected to localhost.
Escape character is '^]'.

Connecting to console "system1" in group "system1" ....
Press ~? for control options ..
Skipped interface vnet0
Reading ZFS config: * done.
Setting up Java. Please wait...
Serial console, reverting to text install
Beginning system identification...
Searching for configuration file(s)...
Search complete.
Discovering additional network configuration...

Select a Language

   0. English
   1. Brazilian Portuguese
   2. French
   3. German
   4. Italian
   5. Japanese
   6. Korean
   7. Simplified Chinese
   8. Spanish
   9. Swedish
  10. Traditional Chinese

Please make a choice (0 - 10), or press h or ? for help: 0

Note – The answers to the sysid questions are only used for the duration of the upgrade process; this data is not applied to the existing OS image on disk. The fastest and simplest way to run the conversion is to select Non-networked. The root password that you specify does not need to match the root password of the source system. The system's original identity is preserved by the upgrade and takes effect after the post-upgrade reboot. The time required to perform the upgrade depends on the Solaris software cluster (software group) that is installed on the original system.

- Solaris Interactive Installation --------------------------------------------
This system is upgradable, so there are two ways to install the Solaris software.
The Upgrade option updates the Solaris software to the new release, saving
as many modifications to the previous version of Solaris software as
possible. Back up the system before using the Upgrade option.
The Initial option overwrites the system disks with the new version of
Solaris software. This option allows you to preserve any existing file
systems. Back up any modifications made to the previous version of Solaris
software before starting the Initial option.
After you select an option and complete the tasks that follow, a summary of
your actions will be displayed.
F2_Upgrade F3_Go Back F4_Initial F5_Exit F6_Help

Clustering Phase:

During the clustering phase, we'll configure the control domains of the other cluster nodes. As described in my previous post, both control domains should have the same services configured.
The virtual switch and virtual disk server were created during the installation/configuration of the control domains. So, the first thing to do at this stage is to switch over the resource group that contains the zpool and create the vds-dev entries required by the other cluster nodes.
On the node that owns both the zpool and the domain system1, stop the domain and perform the switchover of the zpool.

Node 1:

# zpool list
rpool            278G  12.8G   265G     4%  ONLINE  -
zpool_system1    99.5G  79.5K  99.5G    0%  ONLINE  /
# ldm ls
primary          active     -n-cv-  SP      16    32992M   2.9%  17d 3h 37m
system1          active     -n----  5002    2     8G       1.9%  2h 35m
# ldm stop system1
# ldm ls -l primary
primary          active     -n-cv-  SP      16    32992M   3.6%  17d 4h 26m

Solaris running

    NAME             VOLUME              OPTIONS   MPGROUP   DEVICE
    primary-vds0     system1-vol0                            //zpool_system1//system1/disk0
                     system1-solarisdvd  ro                  /export/.../sol-10-u9-ga-sparc-dvd.iso
# clrg switch -n node2 system1-rg

Node 2:

# zpool list
rpool            278G  12.8G   265G     4%  ONLINE  -
zpool_system1    99.5G  79.5K  99.5G    0%  ONLINE  /
# ldm add-vdsdev //zpool_system1//system1/disk0 system1-vol0@primary-vds0
# ldm add-vdsdev options=ro /export/home/ldoms/iso/sol-10-u9-ga-sparc-dvd.iso \
system1-solarisdvd@primary-vds0
# ldm ls -l primary
primary          active     -n-cv-  SP      16    32992M   3.6%  17d 4h 26m

Solaris running

    NAME             VOLUME              OPTIONS   MPGROUP   DEVICE
    primary-vds0     system1-vol0                            //zpool_system1//system1/disk0
                     system1-solarisdvd  ro                  /export/.../sol-10-u9-ga-sparc-dvd.iso
# clrg switch -n node1 system1-rg

It's now time to create the SUNW.ldom resource. This resource automates failure detection in hardware and software so that the LDom can be started on another cluster node without human intervention.
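Note that creating a resource of type SUNW.ldom assumes that the resource type is already registered on the cluster; if it is not, register it once first. This is a standard Solaris Cluster step that the post does not show.

```shell
/* One-time, cluster-wide registration of the HA-LDOM resource type */
# clrt register SUNW.ldom
```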

# clrs create -g system1-rg -t SUNW.ldom -p Domain_name=system1 \
-p password_file=/migrate/noninteractive \
-p Resource_dependencies=system1-fs-rs \
-p Migration_type=NORMAL system1-ldom-rs

The parameters provided were explained in my previous post.
Here we are: the migrated system is up and running in the clustered control domains.
A failover can be initiated by running "clrg switch -n nodex system1-rg", but keep in mind that Migration_type was set to NORMAL. This means that the guest OS must be down (at the OBP prompt; don't stop the domain itself, because the SUNW.ldom agent will restart it if you do) when a failover is initiated.

Oracle Solaris Cluster Essentials, 2010, Prentice Hall


Oracle VM for SPARC with Solaris Cluster 3.3 (Failover Guest Domain)

The aim here is to describe the configuration of Sun LDoms (also named Oracle VM for SPARC) as failover guest domains. As the name suggests, a failover guest domain is a Sun Logical Domain (often abbreviated LDom) that can be hosted on only one of the nodes of a Solaris Cluster at any one time. This implies that, in order to configure a failover guest domain, we should already have Solaris Cluster installed and configured on a set of control domains.

Oracle VM for SPARC provides the ability to split a single physical system into multiple, independent virtual systems. This is achieved by an additional software layer in the firmware, interposed between the operating system and the hardware platform, called the hypervisor. It abstracts the hardware and can expose or hide various resources, allowing for the creation of resource partitions that can operate as discrete systems, complete with virtual CPU, memory, and I/O devices.

While the benefit of platform virtualization encourages server consolidation, it introduces the problem that the physical platform becomes a huge single point of failure. In this regard, the Solaris Cluster agent for LDoms guest domains aims to mitigate a physical server failure by managing the guest domains. The agent performs the following:
  • Manages the start, stop, and restart of LDoms guest domains.
  • Fails over an LDoms guest domain between Solaris Cluster nodes.
  • Allows for strong positive and negative affinities between LDoms guest domains across Solaris Cluster nodes.
  • Allows for different failover techniques, i.e., stop/failover/start and warm migration.
  • Provides a plug-in probe facility for the fault monitor.

At this point, as the term "migration" is being introduced, it is important to emphasize that migration on its own does not represent high availability. The significant difference between the two is that migration, without Solaris Cluster and the HA-LDom agent, requires human intervention to initiate it, whereas high availability provides automated failure detection in hardware and software so that services can be started on another cluster node without human intervention.

Below is a description of the architecture we're going to deploy.

As you can see, we have two CoolThreads servers (we used Sun Fire T5440s) connected to shared storage through a SAN infrastructure. A service and control domain (the primary domain) was configured on each. We won't cover the installation of Oracle VM Server for SPARC (see the Oracle VM Server for SPARC Administration Guide on oracle.com) or the installation of Solaris Cluster (again, the installation guide is available on oracle.com); we'll just describe the failover guest domain configuration.
Also, this configuration allows neither a live nor a warm migration; only a cold migration is possible in such a configuration (the main reason is that ZFS cannot be used as a global file system). If you're not familiar with LDom migration, I highly recommend reading the Oracle VM Server for SPARC Administration Guide on oracle.com; the migration process is very well described in this document.

1. Create the Zpool which will serve as Receptacle for the LDOM's Virtual Disks

After the Oracle VM Server for SPARC and Solaris Cluster installation/configuration, a ZFS storage pool (zpool) is created on a Solaris Cluster DID device (the shared LUN provided by the storage array was identified as d14 by Solaris Cluster). We found it better to partition the shared LUN and allocate all available space to slice 6.

# zpool create zpool_ldom1 /dev/did/dsk/d14s6

2. Create Resource Group and add your Zpool as resource

The node list is provided during creation in order to set a preference: when the resource group is brought online, the Resource Group Manager will try to bring it up on the first node in the list (in our case, ldom1-rg will be brought online on node1).

# clrg create –n node1,node2 ldom1-rg
# clrs create -g ldom1-rg -t HAStoragePlus -p Zpools=zpool_ldom1 ldom1-fs-rs
# clrg online -M ldom1-rg

Check that the resource group was brought online and that the zpool was imported and mounted on node1 (you can also verify that it wasn't mounted on node2 :) ).

#zpool list

3. Configure the Service and Control Domain (here it's the primary domain)

Keep in mind that both service and control domains should have the same configuration in terms of services provided (same vswitch, same vdsdev, and so on). So, the commands below should be run on both Node1 and Node2, and every time you perform a reconfiguration on one of the control domains, manually do the same on the other.

#ldm add-vswitch net-dev=nxge0 primary-vsw0 primary
#ldm add-vswitch net-dev=nxge2 primary-vsw1 primary
#ldm add-vdiskserver primary-vds0 primary
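Because this synchronization between control domains is manual, a small wrapper can help keep both nodes identical. This is only a sketch, and it assumes passwordless root ssh between the cluster nodes, which is an assumption and not part of the original setup.

```shell
#!/bin/sh
# Sketch: replay the same control-domain service configuration on every
# node. Assumes root ssh trust between the nodes (site-specific).
for node in node1 node2; do
  ssh root@"$node" '
    ldm add-vswitch net-dev=nxge0 primary-vsw0 primary
    ldm add-vswitch net-dev=nxge2 primary-vsw1 primary
    ldm add-vdiskserver primary-vds0 primary
  '
done
```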

4. Create and configure your LDOM's Virtual Disks on Node1

Below, two virtual disks were created:

#zfs create zpool_ldom1/ldom1
#cd /zpool_ldom1/ldom1
#date && mkfile -v 100G disk0 && date
#date && mkfile -v 100G disk1 && date

After that, the vds-dev entries corresponding to those two virtual disks were configured (again, on both primary domains). Doing so on the node where ldom1-rg was running (Node1) wasn't a problem, but for Node2, ldom1-rg first had to be switched over to that node (otherwise you'll get an error: the backend path must exist when adding the vds-dev).

On Node1:
#ldm add-vdsdev //zpool_ldom1//ldom1/disk0 ldom1-vol0@primary-vds0
#ldm add-vdsdev //zpool_ldom1//ldom1/disk1 ldom1-vol1@primary-vds0
#clrg switch -n node2 ldom1-rg

On Node2:
#ldm add-vdsdev //zpool_ldom1//ldom1/disk0 ldom1-vol0@primary-vds0
#ldm add-vdsdev //zpool_ldom1//ldom1/disk1 ldom1-vol1@primary-vds0
#clrg switch -n node1 ldom1-rg
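The backend paths and volume names above follow a simple convention. The snippet below is plain shell string assembly (no Solaris commands), shown only to make that convention explicit so both nodes end up with identical vds-dev definitions; the doubled slashes in the post resolve to the same path.

```shell
# Derive the vds-dev backend path and volume name for a disk index.
ZPOOL=zpool_ldom1
DOMAIN=ldom1
VDS=primary-vds0
N=0
BACKEND="/${ZPOOL}/${DOMAIN}/disk${N}"   # file created earlier with mkfile
VOLUME="${DOMAIN}-vol${N}@${VDS}"        # volume name exported by the vds
echo "ldm add-vdsdev ${BACKEND} ${VOLUME}"
```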

5. Create and Configure Failover Guest Domain (ldom1)

Now it's time to create and configure the LDom.
Configuration of the LDom itself must be performed on one node only;
there is no need to repeat it on the other cluster nodes.

On Node 1:

#ldm add-domain ldom1
#ldm add-vcpu 8 ldom1
#ldm add-memory 16G ldom1
#ldm add-vdisk vdisk0 ldom1-vol0@primary-vds0 ldom1
#ldm add-vdisk vdisk1 ldom1-vol1@primary-vds0 ldom1
#ldm add-vnet vnet0 primary-vsw0 ldom1
#ldm add-vnet vnet1 primary-vsw1 ldom1
#ldm set-variable auto-boot\?=false ldom1
#ldm set-variable local-mac-address\?=true ldom1
#ldm set-variable boot-device=/virtual-devices@100/channel-devices@200/disk@0 ldom1
#ldm bind-domain ldom1
#ldm start ldom1

After that, connect to the console (# telnet localhost vconsport, where vconsport is the console port shown by ldm ls) and proceed with a classic Solaris 10 OS installation.

6. Perform a test Migration (without LDOM-HA)

We chose to use a non-interactive migration (to perform the migration without being prompted for the target machine's password). As explained in the introduction, we have to perform a cold migration.
So we'll first stop and unbind the logical domain, then trigger a failover of the zpool that contains the LDom's virtual disks, and finally start the migration.

On Node1:
#ldm stop ldom1
#ldm unbind ldom1
#clrg switch -n node2 ldom1-rg
#ldm migrate-domain -p /pfile ldom1 node2
The -p option takes a file name as an argument; the specified file contains the superuser password for the target machine.

On Node2:
#ldm bind ldom1
#ldm start ldom1
Check that the LDom boots without any errors.
Perform any other tests you want on ldom1, then shut it down and bring it back to node1:
#ldm stop ldom1
#ldm unbind ldom1
#clrg switch -n node1 ldom1-rg
#ldm migrate-domain -p /pfile ldom1 node1

7. Create the SUNW.ldom resource for the guest Domain

Now it's time to automate failure detection in hardware and software so that the LDom can be started on another cluster node without human intervention.
We'll achieve that by creating the following resource in ldom1-rg:

#  clrs create -g ldom1-rg -t SUNW.ldom -p Domain_name=ldom1 \
> -p password_file=/pfile -p Resource_dependencies=ldom1-fs-rs \
> -p Migration_type=NORMAL ldom1-ldom-rs

The first property (Domain_name) is simply the domain name. The password file used here is the same one used to test the migration (step 6). Resource_dependencies is set to the resource responsible for the zpool that hosts the virtual disks; with this setting, we're sure that the ldom1 domain won't be started if the zpool isn't imported and mounted on a node. The last property specifies the type of migration to use (NORMAL for cold migration).

Test the migration of the Resource Group
# clrg switch -n node2 ldom1-rg

8. Maintenance of the Failover Guest Logical Domain

One thing to note is that SUNW.ldom will bring the system back up even if you performed a clean shutdown from the operating system (such as an init 5). This behavior is normal, because the agent is configured to always try to keep the guest domain up and running. So, in order to perform LDom operations that require downtime, the best thing to do is to suspend the resource group that owns the domain. In the example below, I modify the hostid of a guest domain that is under SUNW.ldom monitoring.

# ldm set-dom hostid=82a85641 ldom1
LDom ldom1 must be stopped before the hostid can be modified
# ldm ls
primary   active  -n-cv-  SP      16    32992M    11%  19d 4h 4m
ldom1     active  -t----  5001    4     16G       25%  1m
# clrg suspend ldom1-rg
# ldm ls
primary   active  -n-cv-  SP      16    32992M   3.2%  19d 4h 5m
ldom1     active  -t----  5001    4     16G       25%  1m
# ldm stop ldom1
LDom ldom1 stopped
# ldm ls
primary   active  -n-cv-  SP      16    32992M   8.6%  19d 4h 5m
ldom1     bound   ------  5001    4     16G            
# ldm set-dom hostid=82a85641 ldom1
# ldm ls
primary   active  -n-cv-  SP      16    32992M   7.8%  19d 4h 5m
ldom1     bound   ------  5001    4     16G            
# clrg resume ldom1-rg       
# ldm ls
primary   active  -n-cv-  SP      16    32992M   9.0%  19d 4h 7m
ldom1     active  -n----  5001    4     16G       28%  1m
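The suspend/stop/modify/resume sequence above lends itself to a small wrapper script. The sketch below only illustrates the pattern (the resource group, domain name, and maintenance command are parameters; error handling is omitted).

```shell
#!/bin/sh
# Sketch: perform a maintenance action on a guest domain that is under
# SUNW.ldom monitoring, without the agent restarting it mid-operation.
RG=$1        # e.g. ldom1-rg
DOMAIN=$2    # e.g. ldom1
shift 2
clrg suspend "$RG"    # keep the RGM from restarting the domain
ldm stop "$DOMAIN"    # domain must be stopped (bound) for e.g. set-dom
"$@"                  # maintenance command, e.g. ldm set-dom hostid=... ldom1
clrg resume "$RG"     # monitoring resumes; the agent brings the domain back up
```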


I hope this will help many people deploy highly available LDoms.
Below are the references I used.
Feel free to post any comments.

Oracle Solaris Cluster Essentials, 2010, Prentice Hall