Java performance on Niagara processors (UltraSPARC T1/T2/T3)

This problem has been known for as long as the Niagara processor itself, but a lot of Java applications still suffer from it.
Some JVM parameters vary depending on the machine type: for example, the JVM will select different garbage collectors for different types of machines.
At startup the JVM detects that a Niagara system has a lot of "cpu's", concludes that this is a server, and, if no garbage collector is set explicitly, uses the parallel GC. This garbage collector creates a number of GC threads comparable to the number of "cpu's" (the correct formula today looks like: (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8)). So you will get really a lot of GC threads per JVM on your Niagara system. This isn't a big problem as long as you run only a few JVMs. But if your application uses a lot of JVMs, or you use your Niagara server to consolidate several Java applications, this overhead becomes significant. The last time I saw this problem was on a T5440 with just 74 JVMs, and it led to periodic hangs of the whole server. Yes, this is a bug. But a lot of Java applications on Niagara servers are running rather old JVMs right now, and some of them are experiencing performance problems... The solution is really simple: you can set the number of GC threads by hand:

-XX:ParallelGCThreads=N 

where N = 4 is a good value for most applications.
This can not only solve performance problems with some apps, but also free up some server resources.
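
To make this concrete (my arithmetic, using the formula above; a fully populated T5440 has 256 hardware threads, and app.jar is just a placeholder):

# default on 256 "cpu's": 3 + (256 * 5) / 8 = 163 GC threads in a single JVM
java -jar app.jar

# capped by hand: 4 GC threads
java -XX:ParallelGCThreads=4 -jar app.jar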

network load balancing with Solaris

I know that everybody has forgotten about this blog, but I'll try to liven it up.
A few days ago at the Moscow OpenSolaris User Group meeting there were two presentations about networking. The first one was about Crossbow and everything around it, and the second one was about network load balancing... with Linux and FreeBSD, and nothing about Solaris.
So, since nobody knew anything about network load balancing with Solaris, I decided to write about it.
There are some solutions for network load balancing in FreeBSD and Linux, and really nothing like them exists in Solaris 10.
But time doesn't stand still: Solaris 11 Express already exists and OpenSolaris already doesn't 🙂
Support for ILB (Integrated Load Balancer) appeared in OpenSolaris, and now it is included in Solaris 11 Express.
So let’s try to configure Solaris ILB.
First of all, check that ILB is installed, and if not, install it.

root@solaris:~# pkg search ilbadm
INDEX      ACTION VALUE           PACKAGE
basename   file   usr/sbin/ilbadm pkg:/service/network/load-balancer/ilb@0.5.11-0.151.0.1
root@solaris:~# pkg install pkg:/service/network/load-balancer/ilb@0.5.11-0.151.0.1
               Packages to install:     1
           Create boot environment:    No
               Services to restart:     1
DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                  1/1       11/11      0.2/0.2

PHASE                                        ACTIONS
Install Phase                                  38/38 

PHASE                                          ITEMS
Package State Update Phase                       1/1 
Image State Update Phase                         2/2 

PHASE                                          ITEMS
Reading Existing Index                           8/8 
Indexing Packages                                1/1
root@solaris:~#

Now ILB is installed and we can configure load balancing.
First of all, let's start the ILB daemon. But before that we need to enable IP forwarding. If you forget to do so, ilbadm greets you with a not-so-helpful error:

root@solaris:~# ilbadm show-rule
ilbadm: socket() failed
root@solaris:~#

So let's enable IP forwarding and start the ILB daemon.

root@solaris:~# svcadm enable svc:/network/ipv4-forwarding
root@solaris:~# svcadm enable svc:/network/loadbalancer/ilb:default 

ILB supports three modes:
– DSR
– Full NAT
– Half NAT

In DSR (Direct Server Return) mode, ILB balances the incoming requests across the back-end servers and lets the return traffic bypass the load balancer, going from the servers directly to the client.
NAT-based load balancing rewrites IP header information and handles both the request and the response traffic.
In full NAT mode, ILB rewrites both the source and the destination IP addresses, making it appear to the back-end servers that all connections originate at the load balancer. Clients also receive reply packets from a preconfigured IP range.
In half NAT mode, ILB rewrites only the destination IP address.

I used three servers to try ILB. Two of them acted as back-end servers running nginx; their IPs are 192.168.57.102 and 192.168.57.103. ILB ran on the third server, with IP 192.168.57.101.

First of all I configured a server group that will handle the HTTP traffic:

root@solaris:~# ilbadm create-servergroup -s servers=192.168.57.102:80,192.168.57.103:80 websg
root@solaris:/tmp# ilbadm show-sg
SGNAME         SERVERID            MINPORT MAXPORT IP_ADDRESS
websg          _websg.0            80      80      192.168.57.102
websg          _websg.1            80      80      192.168.57.103

Then I configured a balancing rule. Let's try full NAT first.

root@solaris:/tmp# ilbadm create-rule -e -i vip=192.168.56.101,port=80 -m lbalg=rr,type=NAT,proxy-src=192.168.56.101 -o servergroup=websg webrule
root@solaris:/tmp# ilbadm show-rule
RULENAME            STATUS LBALG       TYPE    PROTOCOL VIP         PORT
webrule             E      roundrobin  NAT     TCP  192.168.56.101  80

And that's all!
HTTP traffic to 192.168.56.101 (the virtual IP) will now be load balanced across the two servers using the round robin algorithm, and clients will receive replies from IP 192.168.56.101.
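To verify, fetch the VIP a few times from a client machine: with round robin, consecutive requests land on alternating back ends, so if each nginx instance serves a page identifying itself you will see the responses alternate (a sketch, assuming curl on the client):

client$ for i in 1 2 3 4; do curl -s http://192.168.56.101/; done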
Now let's change the ILB mode from full NAT to half NAT.

ilbadm delete-rule -a
ilbadm create-rule -e -i vip=192.168.56.101,port=80 -m lbalg=rr,type=h -o servergroup=websg webrule
root@solaris:/tmp# ilbadm show-rule
RULENAME            STATUS LBALG       TYPE    PROTOCOL VIP         PORT
webrule             E      roundrobin  HALF-NAT TCP 192.168.56.101  80

It is really easy to configure, and of course there are a lot of options. For example, you can select the load balancing algorithm. Moreover, ILB offers an optional server monitoring feature that provides server health checks. Other great options are session persistence and connection draining.
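For example, a TCP health check can be created and attached to a rule roughly like this (a sketch based on the documented ilbadm syntax; hc1 and the timing values are arbitrary, and the old webrule has to be deleted first):

root@solaris:/tmp# ilbadm create-healthcheck -h hc-test=tcp,hc-timeout=3,hc-count=3,hc-interval=10 hc1
root@solaris:/tmp# ilbadm create-rule -e -i vip=192.168.56.101,port=80 -m lbalg=rr,type=HALF-NAT -h hc-name=hc1,hc-port=ANY -o servergroup=websg webrule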
More details about ILB can be found on the wiki.
I'm not really an expert in load balancers, but ILB looks much more useful and functional than its FreeBSD or Linux analogs.

UFS internals

I wrote about fsdb usage for VxFS some time ago. But questions about UFS come up rather often now, in spite of the file system's age and a rather good manual. So I decided to write about fsdb for UFS.

When you use fsdb you must be very careful: one mistake and all your data is lost.
Run fsdb with write permissions:

fsdb -o w /dev/rdsk/_your_file_system_ 

Fsdb syntax is rather unusual and similar to adb. All commands start with “:”.
For example, the

:ls 

command.
This will also work:

:ls /

But you must remember that / is the root of the file system you opened with fsdb: if you open, for example, the var file system, / will be the root of var, not the root file system.
:ls has only two options: -l lists files with their inode numbers, and -R produces a recursive listing.
You can use the :cd command to change the current directory.
The very useful :base command changes the numeral base used by fsdb from hexadecimal (the default) to, for example, decimal:

:base=0t10 

fsdb uses the concept of a "dot": first you select an object to work with, which gives the "dot" the address of that object, and all following commands operate on the current value of the "dot".
So if you decide to do something with inode 5, you must select it first.
It can be done like this:

5:inode 

And after this you can display info about this inode:

 ?i 

Or you can combine these two commands into one:

5:inode?i

/dev/rdsk/c1t3d0s0 > 5:inode?i
i#: 5 md: d---rwxr-xr-x uid: 0 gid: 3
ln: 4 bs: 2 sz : c_flags : 0 200

db#0: 2fc
accessed: Wed Apr 29 16:24:50 2009
modified: Wed Feb 25 13:40:05 2009
created : Wed Feb 25 13:40:05 2009

I think the output is rather logical and easy to understand. I'd like to look more closely at the value of the db field. DB is a direct block: the db entries point to the actual file data. I hope everybody remembers that a UFS inode contains 12 direct block pointers and 3 indirect block pointers. An IB (indirect block) is a block that holds as many as 2048 links to other blocks, and no data at all. Only the first indirect block holds links directly to data blocks; if that isn't enough, the second one is used. It is known as the double indirect block: it holds 2048 links to indirect blocks, each of which holds links to data blocks. The third one is the triple indirect block, and I think you can work out what it contains yourself.
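As a quick sanity check of those numbers: assuming an 8 KB block size (each indirect block then holds 2048 four-byte block addresses), the maximum file size works out to roughly:

12 * 8 KB        =  96 KB   (direct blocks)
2048 * 8 KB      =  16 MB   (single indirect)
2048^2 * 8 KB    =  32 GB   (double indirect)
2048^3 * 8 KB    =  64 TB   (triple indirect)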
Going back to my output, we see that the inode contains only one direct block (number zero), at block 2fc.
But I digress.
Whether an inode is a file or a directory can easily be determined from the md (mode) field: if it contains the d flag, it is a directory; if not, a file.
If it turns out to be a directory, :ls will show its contents.

As everybody remembers, a directory in UFS is nothing but an array that maps inode numbers to file names.
After you select an inode that turns out to be a directory, you can list and modify these entries.
Actually, :ls already shows them for you, but in a different order. You can list them in on-disk order like this:

0:dir?d
1:dir?d
2:dir?d

If you are too lazy to type the next command (3:dir?d), you can just press Enter and :dir?d will be executed for the next entry.
If you are too lazy even to press Enter 20 times, you can display 20 entries from block 0 of inode 2 using this command sequence:

2:ino; 0:db:block,20?d 

Or just

308:fragment,20?d 

If you decide that some entry (let it be the 5th) must link not to inode 22 but, for example, to inode 66, you can change it yourself with this command:

5:dir?d=42 

because 42 in hex is 66 in decimal.
Note that the file name will stay the same. To change the name as well:

5:dir:nm="test"

I think now you can do everything you want with directories. Let's move on to files.
Everything is pretty much the same.

/dev/rdsk/c1t3d0s0 > :ls -l /etc/passwd
/etc:
i#: a317 passwd
/dev/rdsk/c1t3d0s0 > 0xa317:inode?i
i#: a317 md: ----rw-r--r-- uid: 0 gid: 3
ln: 1 bs: 2 sz : c_flags : 0 395

db#0: 6a8db
accessed: Wed Apr 29 16:20:06 2009
modified: Mon Apr 27 11:59:48 2009
created : Mon Apr 27 11:59:48 2009

/dev/rdsk/c1t3d0s0 > 0:db:block,100/c

And we have the contents of /etc/passwd on the screen. Now the question is: how can we change it?
Easily!
It can be done in several ways.
To fill some part of the file with zeros:

6a889:fragment,4=fill=0x0 

Or just write some data at any address:

1aa22400=0xffff 

If you want to write some text, it can be done with this command:

1aa36c00="root" 

So, what is the easiest way to remove an inode on an unmounted file system? Of course, the clri command 🙂

vxfs debugging

Some days ago I had a good weekend trying to restore a VxFS file system after a disaster. And I can say it is rather easy.
So,

 fsdb -F vxfs /dev/dsk/c5t4849544143484920443630303136303330303033d0s6 

The superblock usually lives at address 8192, so we can read it:

> 8192B.p S
super-block at 00000008.0000
magic a501fcf5 version 6
ctime 1162462873 656465 (Thu Nov 2 13:21:13 2006 MSD)
log_version 11 logstart 0 logend 0
bsize 1024 size 10176000 dsize 10176000 ninode 0 nau 0
defiextsize 0 oilbsize 0 immedlen 96 ndaddr 10
aufirst 0 emap 0 imap 0 iextop 0 istart 0
bstart 0 femap 0 fimap 0 fiextop 0 fistart 0 fbstart 0
nindir 2048 aulen 32768 auimlen 0 auemlen 8
auilen 0 aupad 0 aublocks 32768 maxtier 15
inopb 4 inopau 0 ndiripau 0 iaddrlen 8 bshift 10
inoshift 2 bmask fffffc00 boffmask 3ff checksum ea56180b
free 10156420 ifree 0
efree 0 0 1 2 1 1 1 2 0 2 1 0 1 1 1 1 2 2 1 2 2 1 1 0 0 0 0 0 0 0 0 0
flags 0 mod 0 clean 5a
time 1162468635 951515 (Thu Nov 2 14:57:15 2006 MSD)
oltext[0] 32 oltext[1] 19458 oltsize 1
iauimlen 1 iausize 4 dinosize 256
checksum2 0 checksum3 0
fsetquotaction 0
fsetquotahardlimit 0 fsetquotasoftlimit 0
log_gen 2
fs_metadevid 0 fs_metablkno 0 fs_metatype 0
fs_bsoffset 0 fs_bsdevid 0 fs_bssize 0

An alternative superblock can be found in inode 33 of fileset 1. To see how to look at it, read on 🙂

VxFS uses filesets. Each fileset has its own inode list and data blocks and can even be mounted separately; this is how checkpoints are implemented, for example.

> listfset
fset index fset name
 1 ATTRIBUTE
 999 UNNAMED

The listfset command shows all filesets. I have no checkpoints, so I have only two. You can see information about a particular fileset with the fset command:

> 999fset
fset header structure at 0x0000000a.0000
fsh_fsindex 999 fsh_fsetname "UNNAMED"
fsh_version 5 fsh_checksum 0xcffdaf6b
fsh_time 1162465202 992960 (Thu Nov 2 14:00:02 2006 MSD)
fsh_ninode 32 fsh_nau 1 fsh_old_ilesize 0 fsh_eopdata 0
fsh_fsextop 0x0 fsh_dflags 0x311 fsh_quota 0 fsh_maxinode 4294967295
fsh_ilistino[65 97] fsh_iauino 64 fsh_lctino 0
fsh_uquotino 69 fsh_gquotino 70
fsh_attr_ninode 256 fsh_attr_nau 1 fsh_attr_eopdata 0
fsh_attr_ilistino[67 99] fsh_attr_iauino 66 fsh_attr_lctino 68
fsh_features 0x1
fsh_previx 0 fsh_nextix 0
fsh_ctime 1162462873 656465 (Thu Nov 2 13:21:13 2006 MSD)
fsh_mtime 1162462909 466010 (Thu Nov 2 13:21:49 2006 MSD)

Here we can see a lot of useful information, and the meaning of the variables is easy to understand. Detailed info can be found in man inode_vxfs and fs_vxfs. Typically an inode is 256 bytes, but it can also be 512 (version > 5). The "ATTRIBUTE" fileset is a structural fileset: it contains the inode lists, is used internally by the file system, and always has index 1. I found only two inodes in this fileset that reference the inode list file: 65 and 97 (fsh_ilistino). The IFILT type confirms that this inode is an inode list file. I think the fixextsize/fsindex field tells us which fileset this inode list belongs to. We can look at this inode:

> 1fset.65i
inode structure at 0x00000018.0100
type IFILT mode 4000000777 nlink 1 uid 0 gid 0 size 8192
atime 1162462873 656465 (Thu Nov 2 13:21:13 2006 MSD)
mtime 1162462873 656465 (Thu Nov 2 13:21:13 2006 MSD)
ctime 1162462873 656465 (Thu Nov 2 13:21:13 2006 MSD)
aflags 0 orgtype 1 eopflags 0 eopdata 0
fixextsize/fsindex 999 rdev/reserve/dotdot/matchino 97
blocks 8 gen 15307 version 0 0 iattrino 0
dotdot 0 inattrino 0
de: 16680 0 0 0 0 0 0 0 0 0
des: 8 0 0 0 0 0 0 0 0 0
ie: 0 0
ies: 0

The de field lists direct extents, and des their sizes in file system blocks. So

> 16680b.p6I - will print info for the first six inodes.
.....
type IFDIR mode 40755 nlink 5 uid 0 gid 1 size 96
atime 1162462924 488018 (Thu Nov 2 13:22:04 2006 MSD)
mtime 1162462955 527947 (Thu Nov 2 13:22:35 2006 MSD)
ctime 1162462955 527947 (Thu Nov 2 13:22:35 2006 MSD)
aflags 0 orgtype 2 eopflags 0 eopdata 0
fixextsize/fsindex 0 rdev/reserve/dotdot/matchino 2
blocks 0 gen 4682 version 0 6 iattrino 0
dotdot 2 inattrino 0
inode structure at 0x00004129.0100
type IFDIR mode 40755 nlink 2 uid 0 gid 1 size 96
atime 1162462925 824823 (Thu Nov 2 13:22:05 2006 MSD)
mtime 1162462925 824823 (Thu Nov 2 13:22:05 2006 MSD)
ctime 1162462925 824823 (Thu Nov 2 13:22:05 2006 MSD)
aflags 0 orgtype 2 eopflags 0 eopdata 0
fixextsize/fsindex 0 rdev/reserve/dotdot/matchino 2
blocks 0 gen 21988 version 0 2 iattrino 0
dotdot 2 inattrino 0

The first two inodes are reserved and not used. Inode 2 is the root directory of the file system, and inode 3 is the lost+found directory. You can also read inode information through the fileset directly:

> 999fset.13i
inode structure at 0x0000412b.0100
type IFREG mode 100644 nlink 1 uid 0 gid 1 size 38
atime 1162462955 528044 (Thu Nov 2 13:22:35 2006 MSD)
mtime 1162468630 420948 (Thu Nov 2 14:57:10 2006 MSD)
ctime 1162468630 420948 (Thu Nov 2 14:57:10 2006 MSD)
aflags 0 orgtype 1 eopflags 0 eopdata 0
fixextsize/fsindex 0 rdev/reserve/dotdot/matchino 0
blocks 1 gen 17493 version 0 10 iattrino 0
dotdot 4 inattrino 0
de: 16715 0 0 0 0 0 0 0 0 0
des: 1 0 0 0 0 0 0 0 0 0
ie: 0 0
ies: 0

As the block size (bsize in the superblock) is 1024, we can read this file by reading one block starting at block 16715.

> 16715b.p1024c 

and we see our file. The c means print as characters, but you can output hex, for example, by just changing c to x. So you can read all the information about the file system and see where files are actually located.
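For example, the same block in hex is just the format letter changed:

> 16715b.p1024x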

After this basic info you can play with VxFS yourself 🙂
Good luck!

What processes are swapped? (w vmstat field)

There are some fields in vmstat output that are hard to interpret, and I think "w" is one of them. Yes, from the man page we know that it is "the number of swapped out lightweight processes (LWPs) that are waiting for processing resources to finish." But what ought we to do when it is not zero?
In most cases it means that some time ago (or maybe right now?) the server was low on memory. But what can we do if we have solved that problem and the value still doesn't change?

bash-3.2# vmstat 5
 kthr      memory            page            disk          faults     cpu
r b w   swap  free  re  mf pi po fr de sr s6 sd sd sd   in   sy   cs us sy id
0 0 84 5041144 290400 1  6  0  0  0 1176 0 0  0  0  7 4231  164  838  0 2 98
0 0 84 5041136 290456 0  0  0  0  0 376 0  0  0  0  0 4228  148  575  0 2 98
0 0 84 5041136 290456 0  0  0  0  0  0  0  0  0  0  0 4232  162  579  0 2 98
0 0 84 5041128 290448 0  0  0  0  0  0  0  0  0  0  0 4227  149  637  0 2 98

First of all, we can see which processes the swapped-out LWPs belong to:

echo "::walk thread thr |::print kthread_t t_schedflag|::grep .==0 |::eval p_pidp->pid_id" | mdb -k

Or the command names:

echo "::walk thread thr |::print kthread_t t_schedflag|::grep .==0 |::eval p_user.u_comm" | mdb -k

So now we know which processes still have swapped-out LWPs. If this number in vmstat doesn't grow, in most cases it isn't a problem, but for a lot of people it looks alarming. To bring a process's data back into memory from swap, we must make the process do something, for example by sending SIGHUP to it, or just by trussing it. truss needs some data about the process, and that data will be paged back in from swap.
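For example (the PID is hypothetical; take a real one from the mdb output above):

# attaching truss pages the process back in from swap
truss -p 4242
# or send it a signal it handles
kill -HUP 4242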

Konsole tab names by script. Ssh wrapper

I have a lot of small scripts that are very useful to me, but the one famous among my friends is an ssh wrapper for Konsole.
At work I usually have a lot of open ssh sessions to different servers. My work system is a Linux notebook with Konsole as the default terminal emulator. So some years ago I wrote a small ssh wrapper that names each Konsole tab after the last argument to ssh (the hostname).

antony@amaliya:$ cat /usr/local/bin/ssh
#!/bin/bash
# Konsole wrapper around ssh to rename tabs
# anton@pavlenko.net

REAL_SSH=/usr/bin/ssh
DCOP=/opt/kde/bin/dcop

if [ ! -z "$KONSOLE_DCOP_SESSION" ]
then

 # Use the last argument (the host name) as the title
 for arg in "$@"; do
 NEW_TITLE="$arg"
 done

 OLD_TITLE=`$DCOP "$KONSOLE_DCOP_SESSION" sessionName`
 $DCOP "$KONSOLE_DCOP_SESSION" renameSession "$NEW_TITLE"

 function restore_title() {
 $DCOP "$KONSOLE_DCOP_SESSION" renameSession "$OLD_TITLE"
 }

 # If SSH is interrupted (CTRL-C), restore the old title
 trap "restore_title" SIGINT
 $REAL_SSH "$@"
 restore_title

else
 $REAL_SSH "$@"
fi

So if you have the same problem, use this wrapper!
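Usage is transparent, as long as /usr/local/bin comes before /usr/bin in your PATH (the host name here is made up):

antony@amaliya:$ ssh root@webserver01   # the tab title becomes root@webserver01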

resource controls and zones

Nowadays Solaris Zones and resource controls are becoming more and more popular. But when you use them, the output of the usual Solaris commands becomes confusing. For example, if you use the zone.max-swap rctl, then swap -l and other tools run inside the zone will return swap info for the global zone, without any of the restrictions. The same goes for the nlwps resource controls and others.
So how can we get the real amount of swap or LWPs used, and not only from the global zone?
kstat can help.
There is a zone_caps class in kstat, so kstat -c zone_caps -n swapresv_zone_1 will show all the info about zone.max-swap for the zone with ID 1.

To list the usage of all resource caps:

kstat -p caps:::usage
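
To compare a single zone's usage against its configured cap, you can also query the individual statistics (a sketch; zone ID 1 as in the swapresv example above, with the value statistic holding the cap):

kstat -p caps::swapresv_zone_1:usage
kstat -p caps::swapresv_zone_1:value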