Using compression to supercharge your Linux virtual image with btrfs and zram

As Linux distributions grow in size along with the resource demands of modern software, virtual images have grown both in memory and in disk space. Here are a couple of ways to save memory and disk space when running a modern Linux distribution such as Oracle Linux 6.3 (this also applies to other similar distros such as RHEL 6.3 or CentOS 6.3). I am using VirtualBox for this image.

1. The first rule I use is to always do a “minimal” install and allocate only 4GB to the first disk.
2. I enabled the public yum repository under /etc/yum.repos.d and chose the UEK repository (see the sketch below).
3. Ran “yum update” to ensure the UEK kernel is up to date.
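
For reference, steps 2 and 3 boil down to something like this (the repo file name and URL are from memory, so double-check them against Oracle's public-yum page):

    $ cd /etc/yum.repos.d
    $ wget http://public-yum.oracle.com/public-yum-ol6.repo
    # edit public-yum-ol6.repo and set enabled=1 for the UEK repository section
    $ yum update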

Now the good stuff…

Transparent filesystem compression with btrfs

  • Create/add a second disk with a dynamic size of 20GB
  • Install the btrfs-progs
    $ yum install btrfs-progs
  • Partition the 2nd disk with fdisk and then format it as btrfs. Yes, btrfs is supported in Oracle Linux 6.3.
    $ mkfs.btrfs /dev/sdb1
  • And this is the most important step: mount the btrfs filesystem with compress=zlib. Note that you can also use other compression algorithms.
    $ mount -t btrfs /dev/sdb1 /app -o compress=zlib
  • Add a permanent entry in /etc/fstab so that this disk is mounted on reboot (the trailing "0 0" are the standard dump and fsck-order fields)
    $ echo "/dev/sdb1 /app btrfs compress=zlib 0 0" >> /etc/fstab
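  • To confirm that the compression option took effect and to keep an eye on space usage, a quick check along these lines works (a small sketch; actual savings depend on your data):
    $ mount | grep /app            # should list compress=zlib among the mount options
    $ btrfs filesystem df /app     # btrfs view of data/metadata allocation
    $ df -h /app                   # regular view of free space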

Supercharge the swap space with zram

  • The UEK kernel already has zram compiled in, so you just need to enable it.
  • Note that the amount of memory/space you allocate to /dev/zramX will depend on your workload and the memory allocated to the VM. In this case I have 4GB allocated to the VM and I am giving half of it to zram. Modify this simple sample script to your liking; it is a good idea to add it to /etc/rc.local (I would remove the “echo” lines, etc.). If you google, you will find much more intelligent versions of this script that allocate the sizes automatically.
    #!/bin/sh
    # Give half of the VM's 4GB of RAM (2GB, in bytes) to the compressed swap device
    SIZE=2147483648
    DEVICE=zram0
    echo "============== BEFORE ==============="
    free
    # Size the device, then format and enable it as high-priority swap
    echo $SIZE > /sys/block/$DEVICE/disksize
    mkswap /dev/$DEVICE
    swapon -p 100 /dev/$DEVICE
    echo "============== AFTER ==============="
    free
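  • Once the script has run, you can sanity-check that the compressed swap is active and see how well it compresses (a quick sketch; the sysfs attribute names vary a bit between kernel versions):
    $ swapon -s                                # /dev/zram0 should be listed with priority 100
    $ cat /sys/block/zram0/orig_data_size      # bytes written to the swap device
    $ cat /sys/block/zram0/compr_data_size     # bytes actually stored after compression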

And there you have it: two simple ways to minimize disk space and maximize your RAM. To give you an idea, the virtual image that I used to run with 8GB of RAM allocated to the VM now flies with 6GB and occupies 60% less disk space. Obviously your mileage will vary, but these two features have done wonders for my image ;-)

iPhone 4S Display Yellow Tint Problem

The one on the left is my wife’s iPhone 4S and on the right is mine, both from AT&T. I think hers is the correct tint whereas mine is screwed up. I also compared mine with my older iPhone 4, and again mine seemed less crisp and too yellow. There is something definitely “different” in the manufacturing process. Maybe “different” is a kind word and it should be “wrong”.

(Photos: DSC_2272, DSC_2273)

Installing FreeBSD 8.x via pxelinux and ISO image

The Problem: My old laptop, which happens to be my router, has a broken CD-ROM drive and a BIOS that does not support USB booting, but it does support network/PXE booting. All the documents out there describe how to install FreeBSD via PXE using the boot/pxeboot file or disk images. I want to use my existing pxelinux-based environment to do the install.

Solution: Syslinux/pxelinux can actually read ISO images, so the FreeBSD bootonly ISO file can be used.

Ingredients:

  • isc-dhcpd
  • tftpd-hpa
  • vsftpd (optional)
  • syslinux (pxelinux)
  • FreeBSD bootonly iso file
  • FreeBSD disk1 iso file (optional)

Recipe: Set up dhcpd.conf for tftp as described on several blogs and docs. Here is the relevant section from mine:

subnet 192.168.2.0 netmask 255.255.255.192 {
     range dynamic-bootp 192.168.2.30 192.168.2.60; # dynamic or bootp
     next-server 192.168.2.1;  # TFTP server address
     filename "gpxelinux.0";   # PXE boot loader filename
}

Edit the pxelinux.cfg/default menu and add the following section.

label fbsd
  menu label FreeBSD ISO Install
  kernel memdisk
  initrd images/freebsd/FreeBSD-8.2-RELEASE-i386-bootonly.iso.gz
  append iso raw

Note that I gzipped the ISO image and had to use the “raw” option.
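
For completeness, the files need to end up under the tftp root along these lines (a sketch assuming /var/lib/tftpboot as the tftp root and memdisk coming from the syslinux package; adjust paths to your setup):

$ cp /usr/share/syslinux/memdisk /var/lib/tftpboot/
$ mkdir -p /var/lib/tftpboot/images/freebsd
$ gzip -c FreeBSD-8.2-RELEASE-i386-bootonly.iso > /var/lib/tftpboot/images/freebsd/FreeBSD-8.2-RELEASE-i386-bootonly.iso.gz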

And that’s it, folks. Very simple. No hanky-panky with mdconfig, extracting files, editing loader.conf, etc. I installed using local ftp, but you can use the other options too.

Improving Directory Import Rate Through ZFS Caching

As we all know, importing data into the directory database is the first step in building a directory service. Importing is an equally important step in recovering from a directory disaster, such as an inadvertent corruption of the database due to hardware failure or a buggy application. In that scenario, a nightly binary database backup or an archived ldif can save the day. Furthermore, if your directory has a large number of entries (tens of millions), the import process can be time consuming. It is therefore very important to fine-tune the import process in order to reduce initialization and recovery time.

Most import tuning recommendations have focused on the write capabilities of the disk subsystem. Undeniably, that is the most important ingredient of the import process. However, the input to the import process is an ldif file which is used to initialize and (re)build the directory database, and as demonstrated by our recent performance testing effort, the location of that ldif file is also very important. I will mainly concentrate on ZFS in this post as, time and again, it has proven to be the ideal filesystem for the Directory. Note that in some cases even the smallest gain in import rate can save hours, especially if your ldif file has tens of millions of entries.

Generally speaking, there are a few gotchas that need to be kept in mind for the import process. The first is to ensure that you have separate partitions for your database, logs and transaction logs (this is actually true for any filesystem). For ZFS this translates into separate pools. Similarly, it is recommended to place the ldif file on a pool that is not being used for any other purpose during the import. This maximizes the read I/O for that pool without having to share it with any other process. In ZFS, the Adaptive Replacement Cache (ARC) plays an important role in the import process, as seen in the table below. The ZFS caches can be controlled via the primarycache and secondarycache properties set with the zfs set command. This excellent blog explains these caches in detail. To understand and prove the effectiveness of these caches, we ran a few import tests on a SunFire X4150 system with ldif files of 3 million and 10 million entries each. The ldif files were generated using the telco.template via make-ldif. Details about the hardware, OS and ZFS configuration and other useful commands are listed in the Appendix.

Dataset      primarycache (6GB)   secondarycache   Time Taken (sec)   Import Rate (entries/sec)
3 Million    all                  all                887              3382.19
             metadata             metadata          1144              2622.38
             metadata             none              1140              2631.58
             none                 none              1877              1598.3
             all                  none               909              3300.33
10 Million   all                  all               3026              3304.69
             metadata             metadata          3724              2685.29
             metadata             none              3710              2695.42
             none                 none              7945              1258.65
             all                  none              3016              3315.65

The table shows the results of various combinations of primarycache and secondarycache on the ldifpool only. The db pool, where the directory database is created, always had primarycache and secondarycache set to all. The astute reader will observe from the Appendix that the ZFS Intent Log (ZIL) is actually configured on flash memory. This did not skew our results, as we are concerned with the ldifpool (where the ldif file resides).

So going back to the table, as expected the primarycache (ARC in DRAM) is the key catalyst in read performance. Disabling it causes a catastrophic drop in the import rate, primarily because prefetching also gets disabled and many more reads have to go directly to disk. Charts of iostat -xc data showed this very clearly: the disks were a lot busier reading when the primarycache was set to none for the 3 million entry ldif import.

So far I have concentrated on the primarycache (ARC). What about the secondarycache (L2ARC)? Typically the secondarycache is utilized optimally when backed by a flash memory device. We did have a flash memory device (Sun Flash F20) added to the ldifpool; however, our reads were sequential and by design the L2ARC does not cache sequential data. So for this particular use case the secondarycache did not come into play, as is evident from the results in the table. Perhaps if we had limited the size of the ARC to 1GB or less, the prefetches would have “spilled” over to the L2ARC and it would have contributed more.
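
If you want to watch the ARC and L2ARC behavior for yourself while an import is running, kstat is a quick way to do it (a rough sketch; statistic names can differ slightly between Solaris updates):

zm1 # kstat -p zfs:0:arcstats:size zfs:0:arcstats:hits zfs:0:arcstats:misses
zm1 # kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses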

Finally, a disclaimer: since the intent of this exercise was to show the effect of the ZFS caches, the import rate results in the table are for comparison only and are not a benchmark. I would also like to thank the colleagues who helped me with this blog: Brad Diggs, Pedro Vazquez, Ludovic Poitou, Arnaud Lacour, Mark Craig, Fabio Pistolesi, Nick Wooler and Jerome Arnou.

Appendix

zm1 # uname -a
SunOS zm1 5.10 Generic_141445-09 i86pc i386 i86pc

zm1 # cat /etc/release
	              Solaris 10 10/09 s10x_u8wos_08a X86
	       Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
	                 Use is subject to license terms.
	                 Assembled 16 September 2009

zm1 # cat /etc/system | grep -i zfs
* Limit ZFS ARC to 6 GB
set zfs:zfs_arc_max = 0x180000000
set zfs:zfs_mdcomp_disable = 1
set zfs:zfs_nocacheflush = 1

zm1 # zfs set primarycache=all ldifpool
zm1 # zfs set secondarycache=all ldifpool

zm1 # echo "::memstat" | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     189405               739    2%
ZFS File Data               52657               205    1%
Anon                       184176               719    2%
Exec and libs                4624                18    0%
Page cache                   7575                29    0%
Free (cachelist)             3068                11    0%
Free (freelist)           7944877             31034   95%

Total                     8386382             32759
Physical                  8177488             31943

NOTE: The system had three ZFS pools. The “db” pool stores the directory database and is striped across 6 SATA disks with the ZIL on flash memory. The “ldifpool” pool is where the ldif file, transaction logs and access logs were located. The transaction and access logs are not used during the import, so that pool was effectively dedicated to the ldif file.
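
For reference, the pools described above would have been created along these lines (a rough sketch using the device names that appear in the zpool output below):

zm1 # zpool create db c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 log c2t0d0
zm1 # zpool create ldifpool c0t7d0 cache c2t3d0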

zm1 # zfs get all ldifpool | grep cache
ldifpool  primarycache          none                   local
ldifpool  secondarycache        none                   local

zm1 # zpool list
NAME     SIZE  USED   AVAIL    CAP  HEALTH  ALTROOT
db       816G  2.25G  814G     0%   ONLINE  -
ldifpool 136G  93.0G  43.0G    68%  ONLINE  -
rpool    136G  75.6G  60.4G    55%  ONLINE  -

zm1 # zpool status -v
   pool: db
	state: ONLINE
	scrub: none requested
	config:

	        NAME        STATE     READ WRITE CKSUM
	        db          ONLINE       0     0     0
	          c0t1d0    ONLINE       0     0     0
	          c0t2d0    ONLINE       0     0     0
	          c0t3d0    ONLINE       0     0     0
	          c0t4d0    ONLINE       0     0     0
	          c0t5d0    ONLINE       0     0     0
	          c0t6d0    ONLINE       0     0     0
	        logs
	          c2t0d0    ONLINE       0     0     0

	errors: No known data errors

	pool: ldifpool
	state: ONLINE
	scrub: none requested
	config:

	        NAME        STATE     READ WRITE CKSUM
	        ldifpool    ONLINE       0     0     0
	          c0t7d0    ONLINE       0     0     0
	        cache
	          c2t3d0    ONLINE       0     0     0

	errors: No known data errors

	pool: rpool
	state: ONLINE
	scrub: none requested
	config:

	        NAME        STATE     READ WRITE CKSUM
	        rpool       ONLINE       0     0     0
	          c0t0d0s0  ONLINE       0     0     0

	errors: No known data errors

ds@dsee1$ du -h telco_*
48G   telco_10M.ldif
14G   telco_3M.ldif

ds@dsee1$ grep cache dse.ldif | grep size
nsslapd-dn-cachememsize: 104857600
nsslapd-dbcachesize: 104857600
nsslapd-import-cachesize: 2147483648
nsslapd-cachesize: -1
nsslapd-cachememsize: 1073741824

SAMLv2 Account Mapping with OpenSSO and Transient Federation

By default OpenSSO uses Persistent Federation for account linking between an IDP and SP when SAMLv2 is used.  This means two things from the point of view of an LDAP administrator.

1.  Ideally, the data stores on both the IDP and the SP should have the OpenSSO schema.
2.  The user entry should also be writable by the Bind DN defined in the Data Store.
 
To recap, for persistent federation OpenSSO writes two attributes, namely sun-fm-saml2-nameid-infokey and sun-fm-saml2-nameid-info, to the user's entry.  The sun-fm-saml2-nameid-infokey acts as the Opaque Handle.  It holds a uniquely generated random key that is common between the IDP and SP so that the two accounts can be linked.  By the way, instead of using these two attributes, you can also specify your own.  This can be done under Configuration->Global->SAMLv2 Service Configuration.
 
To achieve this linking, a first-time user authenticates to the IDP and then to the SP.  This way the user manually provides the link between the two accounts.  Once this link is established (by writing the above two attributes to the user's entry in each repository), the user no longer has to provide credentials at the SP.  This is what federating an account is all about.
 
There are, however, scenarios where one or both of the (LDAP) repositories that hold the user entry are read-only and/or no schema modification is allowed.  This mandates the use of Transient Federation, which does not write anything back to the user repositories, thus eliminating the need to add custom schema and also allowing the use of a read-only repository.
 
To use transient federation, all you have to do is pass NameIDFormat=transient as a query parameter to the federation end point servlets.  For example:

http://myidp.wfoo.net:8080/opensso/idpssoinit?NameIDFormat=transient&spEntityId=...

However, by default, transient federation account mapping on the SP side maps to the anonymous user, as OpenSSO needs a physical object to create a session (this is not entirely true, but that is a topic for another day).  That means there is a many-to-one mapping from the IDP to the SP.  If you are passing in attributes or some other information, this is not very desirable.
 
To overcome the issue of anonymous mapping you need an alternate way to link the two disparate accounts together, which is the responsibility of the Account Mapper.  The OpenSSO engineers have already thought about this and added out-of-the-box functionality to the account mapper to support these scenarios.

Below are two ways of doing it without any customization to the account mapper.  Both require user repositories and obviously require a common attribute (and value) that links the accounts together (we would like to read your mind and provide a mind mapper, but that is not possible with today's technology).  Also, both methods use transient federation so that nothing is written to the data store (user repository).
 

Method 1

On the hosted IDP

1.  Click on Federation->IDP name->Assertion Content
2.  Modify (delete and re-add) the "transient" name ID format mapping as follows:

urn:oasis:names:tc:SAML:2.0:nameid-format:transient=<myattribute>
     For example:  urn:oasis:names:tc:SAML:2.0:nameid-format:transient=uid
 

On the hosted SP

1.  Click on Federation->SP name->Assertion Processing
2.  In the account mapper, check "Use name ID as user ID"
 
*** Note: the above method requires Express Build 8 or later on the SP side.
 

Method 2

 

On the hosted IDP

1.  Click on Federation->IDP name->Assertion Processing
2.  In the Attribute Map, add idpattribute=spattribute
     For example: uid=uid
 

On the hosted SP

1.  Click on Federation->SP name->Assertion Processing
2.  Check Auto Federation and provide the attribute name specified in step 2 above, for example uid.
 
*** Make sure that NameIDFormat=transient is used as a query parameter to either the idpssoinit or spssoinit servlet.
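
For example, the SP-initiated flavor would look something like this (a sketch from memory; the host name is made up and the remaining query parameters are elided just like in the IDP-initiated example earlier, so check the exact parameter names against your deployment):

http://mysp.wfoo.net:8080/opensso/spssoinit?NameIDFormat=transient&idpEntityID=...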
 

Installing Oracle WebLogic 10.3.1 (11gR1) on Mac OS X

Yes, there are quite a few blogs on this, but none of them is as complete as I would like it to be, so I am documenting my experience here.

1) First, download the bits from Oracle. The key is to download the "Generic" Package Installer from here.

2) Before starting the install, you have to trick the installer into thinking that the local JDK is the generic Sun JDK. If you skip this step, the installer will not accept the default Mac OS X JDK and will complain that it is invalid.

  $ cd /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
  $ sudo mkdir -p jre/lib
  $ cd jre/lib
  $ sudo touch rt.jar
  $ sudo touch core.jar

3) Now install WLS using the following command. The installation is pretty straightforward.

  $ java -Xmx1024m -Dos.name=unix -jar wls1031_generic.jar

4) The next challenge is to overcome the java.lang.OutOfMemory error that occurs when you try to access the console at http://localhost:7001/console, which makes the server hang.

To recover from this error you actually have to kill the JVM. To fix it, first edit the user_projects/domains/mydomain/bin/setDomainEnv.sh script and change the line "if [ "${JAVA_VENDOR}" = "Unknown" ] ; then" to "if [ "${JAVA_VENDOR}" = "Sun" ] ; then"

5) One last thing that is recommended is to set USER_MEM_ARGS="-Xms256m -Xmx512m -XX:MaxPermSize=128m" in the startWebLogic.sh script. I added this as the first line.
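
For clarity, here is roughly what the edits from steps 4 and 5 look like (a sketch; the domain path matches the "mydomain" domain used above):

  # In user_projects/domains/mydomain/bin/setDomainEnv.sh, change:
  #   if [ "${JAVA_VENDOR}" = "Unknown" ] ; then
  # to:
  #   if [ "${JAVA_VENDOR}" = "Sun" ] ; then

  # In the startWebLogic.sh script (per step 5), added as the first line:
  USER_MEM_ARGS="-Xms256m -Xmx512m -XX:MaxPermSize=128m"
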
6) Finally, start the server:

  $ cd user_projects/domains/mydomain && ./startWebLogic.sh

Shrinking Windows Disk: A huge challenge!

Usually I don’t blog about Windows issues since I don’t use it. However, I recently bought a laptop for my father. It had Windows Vista Ultimate Home on it. The hard disk is 250GB in size, so I wanted to partition it into smaller ones so that I could have a “D” drive for data and possibly slap Ubuntu on it too. The problem was that there was one cluster of data right at the end of the partition that could not be moved by defrag or any of the commercially available defragmenters out there.

So first I did the usual tricks, i.e., set the page file to zero, disabled System Restore, disabled dumps, disabled hibernation, and deleted all the related files.

Then I ran defrag. Still no joy. Then I ran defrag.exe from the command line with the -w switch. Still no joy.

Then I downloaded a commercial utility called O&O Defrag. This utility still did not move the file(s), but it did help identify the name, which was “$Extend/$UsnJrnl…”

Further research revealed that this journal file was actually being used by the Windows indexing service, so naturally I disabled the indexing service. This did release/delete some of the journal files, but a small cluster of them still remained, and I could not figure out what application was using them.

Then I attempted to run the “fsutil usn deletejournal /D C:” command from an Administrator command prompt. I would always get “Access Denied”.

So I downloaded PE Builder and created a Windows XP SP3 BartPE disk. I booted from the disk and ran “fsutil usn deletejournal /D C:” again. This time the command worked since the journal was not open in any process.
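
For reference, the two fsutil commands involved (queryjournal is a handy extra for confirming whether a change journal is still active before you try to delete it):

fsutil usn queryjournal C:
fsutil usn deletejournal /D C: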

I rebooted and ran a free defrag utility called Auslogics Disk Defrag. Everything now consolidated to my liking and I was able to resize the partition to my heart's content!

OpenSSO 8 & SAML v2 AttributeStatement

A very useful and essential feature of OpenSSO is attribute mapping.  This enables you to send additional attributes in the SAMLv2 assertion/response to the Service Provider.  Once the attribute mapping is defined (either from the GUI under the entity's “Assertion Processing” tab or in the metadata itself), the map is sent as a name-value pair to the Service Provider.  Also keep in mind that the mapping can and should be defined on the remote service provider, so that if your hosted IDP is shared amongst multiple SPs, each can have its own mapping.  For example, here the map was defined from the GUI as USERID=employeeNumber for one of the remote SPs.

<saml:AttributeStatement>
  <saml:Attribute Name="USERID">
    <saml:AttributeValue xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">121898</saml:AttributeValue>
    <saml:AttributeValue xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">007</saml:AttributeValue>
  </saml:Attribute>
</saml:AttributeStatement>
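
Incidentally, if you prefer to define the mapping in the metadata rather than the GUI, it lives in the remote SP's extended metadata as roughly the following fragment (a sketch from memory; the surrounding EntityConfig elements are omitted):

<Attribute name="attributeMap">
    <Value>USERID=employeeNumber</Value>
</Attribute>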

Once the Service Provider receives the assertion and has been configured to look for the attribute name USERID, it will grab the value and do whatever it needs to do.  One real-life example is the SalesForce.com CRM.  In OpenSSO 8 Express Build 8, there is a wizard that makes it easy to configure federation with SalesForce.com and produces a map definition automatically.

One problem that I ran into (not related to the product, phew…) was that no matter how many maps I defined, I could not see them in the assertion.  As a matter of fact, I could not even see the <saml:AttributeStatement> tag.  It turns out that earlier I had changed the Authentication->Core setting from Profile=required to Profile=ignored.  Reverting back to Profile=required fixed the issue and the assertion started to include the attributes.

OpenSSO Agent 3.0 and SSL Termination

While working with a customer whose J2EE 3.0 agent (deployed on Tomcat 6 in non-SSL mode) sat behind an SSL-enabled load balancer, i.e., SSL was terminated at the load balancer, we had to set these two properties in the centralized configuration of the agent under the “Advanced” tab to make it work properly; otherwise the agent was redirecting everything to http and port 80, which the load balancer was blocking.  Also note that we created the policies with a resource of “https” (as desired).

Identity Suites Essentials OpenSSO Tutorials

We have just made an internal suite of tutorials available to the public at http://wikis.sun.com/display/ISE

These tutorials include step-by-step installation and configuration information for both OpenSSO and Identity Manager.  They also include common post-installation use cases that we have seen deployed at various customers.
 
I intend to continue contributing additional tutorials/use cases to the OpenSSO tutorial, so keep watching it.