ReVirt/UML installation howto
July 2006

Note: For purposes of this howto, I used the following guest and host
IP addresses/hostnames, and guest MAC address:

  Host:   141.212.108.157   covirt8.eecs.umich.edu
  Guest:  141.212.108.107   uml8.eecs.umich.edu      00:09:09:09:09:08

Adjust according to your network info.

Notation: Stuff that you should type at a host prompt is shown with
'host%'. Most things other than compiling need to be done as root.

I. Linux installation

  A. The first step is to install Linux (RHEL 3 WS).
     - Newer updates to RHEL 3 WS have conflicts between the C runtime
       and the ReVirt kernel, so start with an older one. I used
       RHEL 3 WS Update 2.
     - Accept defaults during the installation. Fill in host networking
       info as appropriate.
     - Make sure to leave enough unpartitioned HD space for the guest
       disk image, say at least 10 GB.

  B. If setting up RHN updates, skip the following packages to avoid
     the C runtime problem:
       kernel* gcc* glibc* libgcc* nptl* cpp* libf2c* libgnat* libobjc*

II. Obtain ReVirt, compile, and set up host kernel

  A. Download and decompress the tarball.

  B. Compile everything:
       host% cd uml-kd
       host% ./build.sh all
     Note: I think ReVirt only compiles correctly on 2.4-based
     distributions, so make sure you compile on a 2.4-based machine.
     (The machine you are installing on is a safe bet.)

  C. Copy the host kernel into /boot and add an entry in grub.conf:
       host% cp revirt-skas-2.4.18/arch/i386/boot/bzImage \
                /boot/vmlinuz-2.4.18-revirt
       host% vi /boot/grub/grub.conf

     --- Example grub.conf entry ---
     title ReVirt 2.4.18 (2.4.18-revirt)
         root (hd0,0)
         kernel /boot/vmlinuz-2.4.18-revirt ro root=/dev/hda1 hdc=ide-scsi
     --- End example grub.conf entry ---

  D. Reboot into the revirt kernel.

  E. Build/install the ReVirt version of tunctl:
       host% cd uml-kd/tunctl
       host% make
       host% make install

  F. Build/install host kernel modules:
       host% cd ../revirt-skas-2.4.18
       host% make modules
       host% make modules_install

  G. Compile the e1000 network card driver (or the driver for whatever
     network card you have):
       host% tar -zxvf e1000-4.4.19.tar.gz
       host% cd e1000-4.4.19/src
       host% make install

  H. Reboot again. Now the network should work fine.

III. Obtain root filesystem for guest and prepare disk image

  A. Create a root filesystem yourself or download one from somewhere.
     See the uml website (http://user-mode-linux.sourceforge.net) for
     more information on creating/downloading filesystems. I downloaded
     a RedHat 9 based root filesystem from:
       http://www.stearns.org/uml-root/
     The filename is:
       root_fs.rh-9-full.pristine.20030724.bz2

  B. If you downloaded a filesystem, bunzip2 it.

  C. Copy the _filesystem_ image into a _disk_ image. This will allow
     for multiple partitions on the guest's disk.

     1. Make a big empty file of the desired guest disk size. Use
        "standard" disk geometry of 16 heads, 63 sectors/track, and
        512 bytes/sector. This means each cylinder is
        16*63*512=516096 bytes, or approximately 500KB. Decide how big
        you want your guest disk image to be (make sure it's at least
        big enough to hold your root fs plus some swap space) and
        calculate the appropriate number of cylinders. I'll use a 3 GB
        guest disk, so that's approximately 6000 cylinders. So,
          host% dd if=/dev/zero of=guest_disk_image bs=516096c count=6000

     2. Attach the new file to a loopback device:
          host% losetup /dev/loop0 guest_disk_image

     3. Partition the disk image. I decided to make a 1.5 GB partition
        for the guest root filesystem and a 512 MB partition for guest
        swap, and leave the rest unpartitioned.
          host% fdisk -C6000 -S63 -H16 /dev/loop0
        (The arguments to fdisk specify the disk geometry: cylinders,
        sectors/track, and heads.)

--- Example fdisk session ---
The number of cylinders for this disk is set to 6000.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): o
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 6000.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p

Disk /dev/loop0: 3096 MB, 3096576000 bytes
16 heads, 63 sectors/track, 6000 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

      Device Boot    Start       End    Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-6000, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-6000, default 6000): +1500M

Command (m for help): p

Disk /dev/loop0: 3096 MB, 3096576000 bytes
16 heads, 63 sectors/track, 6000 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

      Device Boot    Start       End    Blocks   Id  System
/dev/loop0p1             1      2907   1465096+  83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (2908-6000, default 2908):
Using default value 2908
Last cylinder or +size or +sizeM or +sizeK (2908-6000, default 6000): +512M

Command (m for help): p

Disk /dev/loop0: 3096 MB, 3096576000 bytes
16 heads, 63 sectors/track, 6000 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

      Device Boot    Start       End    Blocks   Id  System
/dev/loop0p1             1      2907   1465096+  83  Linux
/dev/loop0p2          2908      3900    500472   83  Linux

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 82
Changed system type of partition 2 to 82 (Linux swap)

Command (m for help): p

Disk /dev/loop0: 3096 MB, 3096576000 bytes
16 heads, 63 sectors/track, 6000 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

      Device Boot    Start       End    Blocks   Id  System
/dev/loop0p1             1      2907   1465096+  83  Linux
/dev/loop0p2          2908      3900    500472   82  Linux swap

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 22: Invalid argument.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
--- End example fdisk session ---

     4. Detach the loop device, and attach it again.
          host% losetup -d /dev/loop0
          host% losetup /dev/loop0 guest_disk_image

     5. View the partition table (the u is to display units in sectors
        instead of cylinders):
          host% fdisk -lu /dev/loop0

--- Example partition table for guest disk ---
Disk /dev/loop0: 3096 MB, 3096576000 bytes
16 heads, 63 sectors/track, 6000 cylinders, total 6048000 sectors
Units = sectors of 1 * 512 = 512 bytes

      Device Boot    Start       End    Blocks   Id  System
/dev/loop0p1            63   2930255   1465096+  83  Linux
/dev/loop0p2       2930256   3931199    500472   82  Linux swap
--- End example partition table for guest disk ---

        You will need some of these values in the next few steps.

     6. Detach the loop device again:
          host% losetup -d /dev/loop0

     7. Now reattach _only the first partition of the image_ to the
        loop device. To do this, you need to specify the byte offset at
        which the partition starts, because the partition table is at
        the beginning of the disk image. In my example, the start
        sector was 63, and each sector is 512 bytes, so that means the
        byte offset is 63*512=32256.
          host% losetup -o32256 /dev/loop0 guest_disk_image

     8. Format the partition. Using an ext3 filesystem should be fine.
        The last argument to mkfs.ext3 is the number of blocks in the
        partition (1 KB blocks, matching -b1024), which you can get
        from the Blocks column of your partition table (1465096 in my
        example).
          host% mkfs.ext3 -b1024 /dev/loop0 1465096

     9. Mount the newly-created filesystem (create the mount point
        with mkdir first if it does not exist):
          host% mount /dev/loop0 /mnt/dummy0

    10. Copy the files from your root fs image into the new filesystem.
        First, attach the root fs image (example filename root_fs) to
        another loopback device:
          host% losetup /dev/loop1 root_fs
        Mount it somewhere:
          host% mount /dev/loop1 /mnt/dummy1
        Copy all the files into your new fs on the partitioned disk
        image:
          host% cd /mnt/dummy1
          host% cp -a * /mnt/dummy0
        We don't need the plain filesystem image anymore, so unmount
        and detach it:
          host% cd /mnt
          host% umount /mnt/dummy1
          host% losetup -d /dev/loop1

    11. Now you need to edit some files in the guest fs, in the
        /mnt/dummy0 directory:
          host% cd /mnt/dummy0
        Note: exactly which files you need to edit depends on what
        distribution your guest fs is based on. This is for a RedHat 9
        based fs.

        a. Guest /etc/resolv.conf should look the same as the host's
           resolv.conf.

        b. Guest /etc/hosts should have, at minimum, lines for
           loopback, guest, and host:

--- Example guest hosts file ---
127.0.0.1        localhost
141.212.108.107  uml8.eecs.umich.edu     uml8
141.212.108.157  covirt8.eecs.umich.edu  covirt8
--- End example guest hosts file ---

        c. Guest /etc/sysconfig/network should have the guest hostname
           and gateway:

--- Example guest /etc/sysconfig/network file ---
NETWORKING=yes
HOSTNAME=uml8.eecs.umich.edu
FORWARD_IPV4=false
GATEWAY=141.212.108.157
--- End example guest /etc/sysconfig/network file ---

        d. Guest /etc/sysconfig/network-scripts/ifcfg-eth0 needs to set
           up a static IP and some other info for the guest eth0
           interface. (Note: even though HWADDR is a valid field in
           this file, make sure you do NOT put the MAC address here!
           It seems to mess up the ifup script when you boot the
           guest.)
--- Example guest /etc/sysconfig/network-scripts/ifcfg-eth0 file ---
DEVICE=eth0
BOOTPROTO=static
ONBOOT=YES
IPADDR=141.212.108.107
GATEWAY=141.212.108.157
NETMASK=255.255.255.0
NETWORK=141.212.108.0
BROADCAST=141.212.108.255
--- End example guest /etc/sysconfig/network-scripts/ifcfg-eth0 file ---

        e. Guest /etc/sysconfig/static-routes should route everything
           through the host:

--- Example guest /etc/sysconfig/static-routes file ---
any host 141.212.108.157 dev eth0
any net 0.0.0.0 netmask 0.0.0.0 gw 141.212.108.157
--- End example guest /etc/sysconfig/static-routes file ---

        f. Guest /etc/inittab needs to have the ttys commented out.
           Open up that file and either comment out or delete the
           lines at the end that look something like this:
             1:2345:respawn:/sbin/mingetty tty1
             2:2345:respawn:/sbin/mingetty tty2
             3:2345:respawn:/sbin/mingetty tty3
             4:2345:respawn:/sbin/mingetty tty4
             5:2345:respawn:/sbin/mingetty tty5
             6:2345:respawn:/sbin/mingetty tty6
           I think it is actually ok to leave the tty1 line, but
           comment out the rest. Otherwise, when you boot the guest
           you may get messages like:
             INIT: Id "0" respawning too fast: disabled for 5 minutes
             INIT: Id "2" respawning too fast: disabled for 5 minutes
             INIT: Id "c" respawning too fast: disabled for 5 minutes
             INIT: Id "1" respawning too fast: disabled for 5 minutes

    12. Now we need to set up the ubd devices for the guest. This has
        to be done with the host chroot'ed so that the devices are
        created in the guest /dev, not the host /dev. You may be able
        to skip this step if your guest fs already has the required
        nodes (mine didn't). If you cd into the guest's /dev and see
        devices like ubda, ubda1, ubda2, then you don't need to do
        this. If instead of those you see a directory called ubd with
        devices named 0 through 7 in it, then you do need to do this.
        The ubd/0 through ubd/7 devices are an old way of naming ubd
        devices. There is a helpful script called makeUBDdev.sh for
        creating the device nodes.
          host% cd /
          host% chroot /mnt/dummy0
          host-chroot% cd /dev     <--- so this is really /mnt/dummy0/dev
          host-chroot% wget http://www.linode.com/~caker/uml/makeUBDdev.sh
          host-chroot% chmod a+x makeUBDdev.sh
          host-chroot% ./makeUBDdev.sh
        You should now have the nodes named /dev/ubda, /dev/ubda1, and
        so forth.
          host-chroot% rm makeUBDdev.sh
          host-chroot% exit        <--- exits the chroot

    13. The guest /etc/fstab may need to be edited to reflect the
        correct ubd device nodes.

--- Example guest /etc/fstab file ---
/dev/ubda1   /          ext3    defaults    1 1
/proc        /proc      proc    defaults
devpts       /dev/pts   devpts  mode=0622   0 0
/dev/ubda2   none       swap    sw
--- End example guest /etc/fstab file ---

    14. Now do a make modules and make modules_install for the guest
        kernel. In the default revirt guest .config, no components are
        set up as modules, but you might as well do this anyway.
          host% cd uml-kd/uml-2.4.20
          host% make ARCH=um modules
        The install step needs to target the mounted guest filesystem
        rather than the host's /lib/modules, for example with the
        kernel's INSTALL_MOD_PATH variable:
          host% make ARCH=um modules_install INSTALL_MOD_PATH=/mnt/dummy0

    15. Unmount and detach the guest root filesystem partition:
          host% umount /mnt/dummy0
          host% losetup -d /dev/loop0

    16. Now set up some swap space for the guest. From the partition
        table of your disk image (which was printed all the way back
        in step III.C.5), look at the number of blocks of the
        partition that you intended for swap. For my example, it is
        500472. Attach the disk image to the loopback device, using
        the byte offset of your intended swap partition (for my
        example it is 2930256*512=1500291072):
          host% losetup -o1500291072 /dev/loop0 guest_disk_image
        Then make a swap space of the same size as the partition:
          host% mkswap /dev/loop0 500472
        Done. Detach the loop device:
          host% losetup -d /dev/loop0

IV. Setting up the host dbraw device

  First, create a partition "sufficiently larger than" your disk image.
  I used a 10 GB partition for my 3 GB disk; it might not need to be
  that much bigger, though. Suppose the partition is /dev/hda3.
  Bind a raw device to /dev/hda3:
    host% raw /dev/raw/raw3 /dev/hda3

  Zero out the first 128 MB of the partition, which holds the dbraw
  disk map:
    host% dd if=/dev/zero of=/dev/raw/raw3 bs=1M count=128

  Set the ReVirt and dbraw environment variables in .bashrc:

    # For the dbraw disk
    export DBRAW_MAJOR=3
    export DBRAW_MINOR=3
    export DBRAW_NUM_GUEST_BLOCKS=6048000

    # For revirt
    export HOSTIP=141.212.108.157
    export VMIP=141.212.108.107
    export UMLHDA=/dev/raw/dbraw3
    export UMLHDA_ROOT=/dev/ubda1
    export UMLMEM=256M
    export VMMAC=00:09:09:09:09:08

  (Note about DBRAW_NUM_GUEST_BLOCKS: dbraw uses a block size of 512
  bytes, so this is simply the number of bytes in your entire guest
  disk image, divided by 512. In my example, 3096576000/512=6048000.)

  Don't forget to source .bashrc now.

  cd into uml-kd/revirt-skas-2.4.18/uml and do:
    host% ./insmoduml
    host% ./dbrawctl -s -j $DBRAW_MAJOR -n $DBRAW_MINOR
    host% ./dbrawctl -i $DBRAW_NUM_GUEST_BLOCKS -n $DBRAW_MINOR
    host% dd if=guest_disk_image of=/dev/raw/dbraw3 bs=1M
    host% ./dbrawctl -f -n $DBRAW_MINOR

  Ok, now try netrun.
    host% cd ../../uml-2.4.20
    host% ./netrun.sh
  Make sure the guest boots up and you can ping/ssh to it from the
  host, and from another machine. ssh to the guest and halt it.

  If that works, try netlog and replay.
    host% ./netlog.sh
  ssh to the guest and halt it. Then:
    host% ./replay.sh

Some miscellaneous notes:

  - The first time I tried to ssh to my guest, I got the error
      Read from remote host : Connection reset by peer
      Connection to closed.
    immediately after entering the password. I guess there can be
    multiple reasons for this, but the problem for me was that the
    guest /etc/shadow had the password set to expire, and it already
    had expired. It may be worthwhile to make sure this is not the
    case before you copy the root filesystem to the dbraw partition.
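
  - The byte offsets and block counts used in steps III.C.1, III.C.7,
    III.C.16 and in DBRAW_NUM_GUEST_BLOCKS all follow from the disk
    geometry, so they can be recomputed for a different image size.
    The following is only a sketch of that arithmetic, not part of
    ReVirt: the function names are made up for illustration, and it
    assumes a shell with 64-bit arithmetic (the image size in bytes
    does not fit in 32 bits).

```shell
#!/bin/sh
# Geometry assumed throughout this howto.
HEADS=16
SECTORS_PER_TRACK=63
SECTOR_BYTES=512

# Bytes in one cylinder: heads * sectors/track * bytes/sector.
cylinder_bytes() {
    echo $((HEADS * SECTORS_PER_TRACK * SECTOR_BYTES))
}

# Total image size in bytes for a given number of cylinders.
image_bytes() {
    echo $(($1 * $(cylinder_bytes)))
}

# Byte offset for "losetup -o", given a start sector from "fdisk -lu".
partition_offset() {
    echo $(($1 * SECTOR_BYTES))
}

# Number of 512-byte blocks in the whole image (DBRAW_NUM_GUEST_BLOCKS).
dbraw_blocks() {
    echo $(($(image_bytes "$1") / SECTOR_BYTES))
}

# Values for the 6000-cylinder example disk in this howto.
echo "cylinder size:          $(cylinder_bytes) bytes"
echo "image size:             $(image_bytes 6000) bytes"
echo "root partition offset:  $(partition_offset 63)"
echo "swap partition offset:  $(partition_offset 2930256)"
echo "DBRAW_NUM_GUEST_BLOCKS: $(dbraw_blocks 6000)"
```

    For the example disk this reproduces the numbers used above:
    516096, 3096576000, 32256, 1500291072, and 6048000. The start
    sectors (63 and 2930256) still have to be read from your own
    "fdisk -lu" output.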