  _________________________________________________________________

                      Virtual Environment HOWTO

   (C) SWsoft Pte Ltd., 2000

   Denis Lunev den@asp-linux.com
   Yuri Pudgorodsky yur@asp-linux.com
   Alexander Tormasov tor@asp-linux.com

   v0.0, 21 July 2000
  _________________________________________________________________

   This document came about to satisfy the ever-increasing need to know
   how to make your own virtual environment.
  _________________________________________________________________

   1. Copyright/Distribution

   2. Introduction
      * 2.1 Project goal
      * 2.2 Knowledge Required
      * 2.3 Other sources of information
      * 2.4 Feedback
      * 2.5 Revision History

   3. How it works
      * 3.1 Virtual Environments
      * 3.2 Filesystem
      * 3.3 Quality of Service

   4. Availability
      * 4.1 Download by FTP
      * 4.2 Anonymous CVS

   5. Kernel Configuration

   6. Installing support utilities

   7. Filesystem setup
      * 7.1 VEFS
      * 7.2 Setting up chroot tree
      * 7.3 Sample chroot fs setup script

   8. Running virtual environment
      * 8.1 Starting virtual environment
      * 8.2 envutil
      * 8.3 Runtime access of virtual environment
      * 8.4 Configuring virtual environment
      * 8.5 Stopping virtual environment

   9. Comparison with other similar solutions
      * 9.1 Implementation strategies
      * 9.2 Comparison Table

   10. Acknowledgements
  _________________________________________________________________

   1. Copyright/Distribution

   This document is Copyright (c) 2000 by SWsoft Pte Ltd. A verbatim
   copy may be reproduced or distributed in any medium, physical or
   electronic, without permission of the author. Translations are
   similarly permitted without express permission, provided they include
   a notice stating who translated them. Commercial redistribution is
   allowed and encouraged; however, please notify SWsoft Pte Ltd of any
   such distributions. Excerpts from the document may be used without
   prior consent, provided that the derivative work contains either a
   verbatim copy or a pointer to a verbatim copy.
   Permission is granted to make and distribute verbatim copies of this
   document provided the copyright notice and this permission notice are
   preserved on all copies.

   In short, we wish to promote dissemination of this information
   through as many channels as possible. However, we do wish to retain
   copyright on this HOWTO document, and would like to be notified of
   any plans to redistribute it.
  _________________________________________________________________

   2. Introduction

   2.1 Project goal

   The goal of this project is to develop, maintain and deploy software
   solutions and policies that support a scalable, highly available
   platform for Application Service Providing (ASP). This support is
   transparent to the user and does not require modification of existing
   applications.

   The purpose of a virtual environment is to allow a single machine to
   serve a number of instances of a normal operating system with minimal
   overhead. This includes starting system services from the 'init'
   process, and of course an individual IP address for each running
   virtual environment. The kernel multiplexes between them (switching
   very fast) in the background, so to the user it appears as if there
   is more than one server machine.

   2.2 Knowledge Required

   Enabling your system for virtual environment support is not all that
   difficult. It just requires some basic knowledge about general
   aspects of UNIX system configuration, and Linux system configuration
   in particular. You will also need to patch, compile and install a
   custom Linux kernel.

   This document is not a primer on how to fully configure a Linux
   machine. You should seek that knowledge elsewhere if you are not yet
   able to fine-tune your Linux system. Remember that configuring a VE
   itself (from the inside) requires the same knowledge as configuring a
   simple Linux box. Here we describe only the additions to the
   configuration procedure that deal with VE-specific information.
   In order to understand this HOWTO document it is assumed that you are
   familiar with the following:

     * Compiling a Linux kernel - Linux Kernel HOWTO
     * Setting up and configuring network devices - Linux Networking
       HOWTO
     * The IP aliasing feature - IP Alias mini-HOWTO
     * Various network packages like Apache
     * Setting up DNS - DNS HOWTO
     * Basic system administration - Linux System Administrator's Guide

   If you are uncertain of how to proceed with any of the above, it is
   STRONGLY recommended that you use the links provided to familiarize
   yourself with all the packages. We will NOT reply to mail regarding
   any of the above. Please direct your questions to the author of the
   appropriate HOWTO.

   2.3 Other sources of information

     * Linux Virtual Server Project
     * IP QoS Efforts
     * An API for Linux QoS Support
     * Linux Soft Real Time project
     * QLinux: A QoS enhanced Linux Kernel for Multimedia Computing
     * ALTQ: Alternate Queueing
     * Free Network Project (Freenet) - a peer-to-peer, decentralized
       network designed to allow the distribution of information in an
       efficient manner without censorship
     * VMware - a computer emulator for Linux, Windows and FreeBSD
     * Ensim ServerXchange
     * A short bibliography on QoS and traffic control in IP networks
     * References on CBQ (Class-Based Queueing)
     * Packet Scheduling Links
     * FreeVSD - a project to provide Linux virtualization in user space

   2.4 Feedback

   This document will expand as packages are updated and source or
   configuration modifications change. If any portions of this document
   are unclear, please feel free to email us with your suggestions or
   questions. So that we do not have to search through the entire HOWTO,
   please make certain that all comments are as specific as possible and
   mention the section where the uncertainty lies. It is important that
   all mail be addressed with VIRTSERVICES HOWTO in the subject line.
   This will greatly decrease response time.
   Please note that the examples are just that, examples, and should not
   be copied verbatim. You may have to insert your own values. If you
   are having trouble, send us mail. Include all the pertinent
   configuration files and the error messages you get when installing,
   and we will look them over and reply with our suggestions.

   2.5 Revision History

   V0.1 Initial version
  _________________________________________________________________

   3. How it works

   3.1 Virtual Environments

   Virtual environments (VEs) are a set of "OS inside OS" instances:
   full-featured Linux boxes multiplied inside a single hardware unit.
   Each box can run virtually any Linux application (except those
   working with specific hardware), has its own file system root, and
   effectively shares the hardware resources (memory, CPU, disk, etc.).
   Efficient CPU and memory (RAM) sharing is achieved because all
   processes inside the VEs are standard Linux processes handled by a
   single Linux kernel.

   Recent investigations and experiments evaluating the number of
   processes that can run effectively under Linux show that it can be at
   least 10,000. Our evaluations show that the number of VEs inside a
   typical hardware box can reach 1000, depending on the heaviness of
   each environment (more typically up to 500). This scalability is
   achieved by the chosen single-kernel implementation model.

   The traditional multi-kernel approach (first implemented, probably,
   in the IBM OS/390 operating system) fully emulates the hardware
   inside one process of the mother OS and then allows the user to
   install any OS inside it (at the moment the best-known solutions of
   this type are provided by VMware and Ensim). In a VE we use another
   approach: all processes inside the computer share the same single
   kernel and the other computer resources, including the file system.
   Another extremely important feature of VEs is the real efficiency of
   disk data sharing.
   All files shared between different VE roots are the same from the
   file cache point of view and appear only once both on disk and in RAM
   (just as code is shared between processes running the same
   executable).

   Security is another advantage of our approach. Users from different
   VEs are completely isolated from each other and cannot disturb each
   other.

   Technically, the implementation of VEs is based on a set of
   additional kernel syscalls. The main idea behind the implementation
   is to mark all valuable objects (like entries in the process table)
   with a VE id and to patch the searching algorithms to hide details
   about other virtual environments. This is quite an easy and
   straightforward approach that minimizes the size of the patches
   required to the kernel. It can even slightly increase overall system
   performance, as required resources will be found faster.

   3.2 Filesystem

   To minimize problems with shared files, a "copy-on-write" concept is
   used. There is a set of common OS files visible to every VE as if
   they were in its own local file system. The same hierarchical (but
   initially empty) directory structure exists in each VE. When a user
   process inside a VE tries to open a file from a common place, the OS
   first tries to open the VE's own copy of the file (if it exists in
   the VE's directory structure) and, failing that, the common file. If
   the open mode is read-only, no additional operations are performed;
   if it is write, the OS first copies the file into the VE's local file
   system and then opens the copy for writing.

   3.3 Quality of Service

   Resource limits
          For now, please look at the UserBeanCounter patch. It is
          applied, but remains to be described. In general, when a VE is
          started up, a luid is assigned to its starting process, i.e.
          'init'. For more information, see the URL supplied.

   Fair scheduling
          To be implemented using some available two-level scheduler;
          see the bibliography.

   Network bandwidth
          To be implemented using a shaper; see the bibliography.
  _________________________________________________________________

   4.
   Availability

   4.1 Download by FTP

   The most up-to-date release of Virtual Environments for Linux is
   available for download free of charge at:

     * ftp://ftp.asplinux.com.sg/pub/aspcomplete/ (Singapore)
     * ftp://ftp.asp-linux.com/pub/aspcomplete/ (USA)
     * ftp://ftp.asplinux.ru/pub/aspcomplete/ (Russia)
     * ftp://ftp.asp-linux.co.kr/pub/aspcomplete/ (Korea)

   4.2 Anonymous CVS

   We provide anonymous CVS access to srv.asplinux.ru using

      CVSROOT=:pserver:anoncvs@srv.asplinux.ru:/home/cvs
      password: anoncvs
      module: ASPcomplete

   Here you may check out the following modules:

     * ASPcomplete/linux - the current version of the full kernel
       sources with all patches applied. (Note: it contains the complete
       Linux kernel sources of the appropriate version with the VE
       patches applied - about 110 MB in total.)
     * ASPcomplete/docs - documentation
     * ASPcomplete/utils - utilities

   Please use the following commands to check the sources out:

      export CVSROOT=:pserver:anoncvs@srv.asplinux.ru:/home/cvs
      cvs login            (complete the login with the "anoncvs" password)
      cvs -z9 co ASPcomplete
      cvs logout
  _________________________________________________________________

   5. Kernel Configuration

   Virtual environment support requires you to build your own custom
   kernel. If you are not familiar with this, see the Linux Kernel
   HOWTO.

   The virtual environment patch is created against one particular
   version of the Linux kernel, for example the 2.4.0-test6 kernel. You
   can get this kernel from the central Linux kernel repository or other
   well-known sources.

   Unpack the kernel sources:

      bzip2 -dc linux-2.4.0-XXX.tar.bz2 | tar xvf -

   Also, if you want to enhance reliability, you may wish to get and
   install the Reiserfs journalled filesystem. Please consult the
   Reiserfs home site before doing so.
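   The usual next step is to apply the VE patch on top of the unpacked
   vanilla tree. The sketch below exercises the patch(1) workflow on a
   stand-in file and a stand-in diff, purely for illustration; the file
   contents, the ve.diff name and the scratch directory are all invented
   here. With the real tree you would run the same two patch commands
   inside the unpacked kernel directory, feeding it whatever patch file
   the release provides.

```shell
set -e
WORK=$(mktemp -d)
cd "$WORK"

# Stand-in source tree with one file (in real life: the unpacked kernel).
mkdir -p linux/kernel
printf 'old line\n' > linux/kernel/sched.c

# Stand-in unified diff (the real VE patch is of course much larger).
cat > ve.diff <<'EOF'
--- linux/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1 +1 @@
-old line
+new line
EOF

cd linux
# Always dry-run first: if any hunk fails to apply, nothing is touched.
patch -p1 --dry-run < ../ve.diff
# The dry run succeeded, so apply the patch for real.
patch -p1 < ../ve.diff
grep new kernel/sched.c
```

   The -p1 option strips the leading "linux/" path component from the
   diff, so the commands must be run from inside the source tree.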
   Run

      make menuconfig

   or

      make xconfig

   or

      make config

   or

      cp sample.config .config
      make oldconfig

   The following options must be set as shown below in order to get a
   working system:

      CONFIG_EXPERIMENTAL=y
      CONFIG_VE=y
      CONFIG_VE_NET=y
      CONFIG_VE_IRQ=y
      CONFIG_VE_LINK=y
      CONFIG_VE_SIGIO=y
      CONFIG_IP_ALIAS=y
      CONFIG_USER_RESOURCE=y
      CONFIG_USER_RESOURCE_PROC=y

   Build your kernel, update lilo.conf and reboot the system. For any
   problems, see the Linux Kernel HOWTO.
  _________________________________________________________________

   6. Installing support utilities

   Also make sure that

      /usr/include/linux
      /usr/include/asm

   point to your freshly patched kernel sources

      include/linux
      include/asm

   This is required to build the VE utilities correctly. On a freshly
   installed Linux system it is slightly easier to maintain a symlink
   /usr/src/linux pointing to the sources of your current kernel, as the
   links above are set by default to

      /usr/src/linux/include/linux
      /usr/src/linux/include/asm

   Unpack the utilities and build them:

      tar zxf utils.tgz
      cd utils
      make

   Put the envutil, ve-start, ve-stop and veroot-setup executables into
   some system directory in your PATH, e.g.

      cp envutil ve-start ve-stop veroot-setup /sbin
  _________________________________________________________________

   7. Filesystem setup

   Each virtual environment should get its own directory structure.
   Since we are using chroot, you will need duplicate copies of the
   shared libraries, binaries, configuration files, etc. By default we
   use /ve_root/VEID for each virtual environment that has been created.

   7.1 VEFS

   Special filesystem support (VEFS) for a union mount of a read-only
   template with a read-write private per-VE tree is currently under
   development. Until it is ready, you need to prepare an isolated
   chroot tree for each VE using regular UNIX facilities. We have
   prepared a sample shell script, veroot-setup, that simplifies this
   process. Please look at it carefully before running it and make sure
   you understand what it does.
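   As a rough illustration of what such a script has to do, the sketch
   below builds a private tree for one VE by copying a reference root
   file system. The VE_ROOT and REF locations here are temporary
   stand-ins created just for the demonstration (in real use the target
   would be /ve_root and the source a fully populated template tree),
   and the exact steps of the shipped veroot-setup script may differ.

```shell
set -e
VEID=1
SCRATCH=$(mktemp -d)
VE_ROOT=$SCRATCH/ve_root       # stand-in for /ve_root
REF=$SCRATCH/ref               # stand-in for a populated reference root fs

# Build a tiny reference tree so the example can run anywhere.
mkdir -p "$REF/etc" "$REF/bin" "$REF/sbin" "$REF/lib"
printf '127.0.0.1 localhost\n' > "$REF/etc/hosts"

# Private copy for this VE, preserving modes, owners and symlinks.
mkdir -p "$VE_ROOT/$VEID"
cp -a "$REF/." "$VE_ROOT/$VEID/"

# Per-VE mount point for proc and a world-writable scratch directory.
mkdir -p "$VE_ROOT/$VEID/proc"
mkdir -p "$VE_ROOT/$VEID/tmp"
chmod 1777 "$VE_ROOT/$VEID/tmp"

ls "$VE_ROOT/$VEID"
```

   Repeating the copy step per VEID gives each environment its own
   isolated tree; this is exactly the duplication that the planned VEFS
   union mount is intended to remove.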
   7.2 Setting up chroot tree

     * Changing important configuration files
     * Removing unneeded cron entries
     * Fixing inittab
     * Setting the hostname and /etc/hosts

   7.3 Sample chroot fs setup script

   A sample chroot fs setup script, veroot-setup, is included in the
   utils-X.X.tar.gz package.
  _________________________________________________________________

   8. Running virtual environment

   8.1 Starting virtual environment

   Use

      ve-start veid [count]

   to start up the desired number of virtual environments, beginning
   from veid. By default one VE is started. This utility does the
   following (in order):

     * initializes (if required) the private part of the VE directory
       structure
     * sets up an IP alias on the basis of the VEID
     * chroots into /ve_root/VEID
     * invokes the envutil program, which in turn starts a new process,
       marks it with the supplied VEID and assigns it PID 1. This
       remarkable new process then executes 'init', just as on a
       standalone Unix system.

   8.2 envutil

   This utility allows you to create, kill or enter a virtual
   environment, or to modify the access mode for hardware devices. It
   should be used with one of the options below:

      --create veid IP [--bash]
      --kill veid
      --enter veid
      --perms veid (--chr | --blk) major minor mask

   --create creates a virtual environment together with its dedicated IP
   address, which will be used as the VE's identity for the outside
   world. An IP alias on some interface should be set up beforehand.
   --bash forces invocation of /ve_root/VEID/bin/bash instead of init;
   this can be useful for debugging.

   --kill gently shuts down a VE.

   --enter enters a VE. This option is a backdoor for the host root
   only, and is currently used for administrative purposes, for example
   statistics gathering.

   --perms modifies the access mode for hardware device nodes. The rules
   are checked in the order given below. Zero minor and major values are
   accepted.
   If a specific device is required to be opened, the kernel tries to
   find an access mode using the following (major,minor) pairs:

     + the (major,minor) of the device
     + (major,0)
     + (0,0)

   Every such search succeeds, since by default access to all hardware
   devices is denied by a kernel-specified (0,0) rule.

   8.3 Runtime access of virtual environment

   A virtual environment can be accessed either by the host
   administrator using envutil --enter, or from the network using
   standard services such as http, telnet, ssh, etc.

   8.4 Configuring virtual environment

   Each virtual environment should be configured in the same manner as a
   standalone Linux host. Please refer to the Linux System
   Administrator's Guide. Our startup scripts perform some basic
   administrative actions, so they should be run.

   8.5 Stopping virtual environment

   Use ve-stop or envutil --kill.
  _________________________________________________________________

   9. Comparison with other similar solutions

   The following systems are more or less similar to ours:

     * VMware virtual machine
     * Ensim ServerXchange
     * User Mode Linux
     * FreeVSD
     * FreeBSD jail

   9.1 Implementation strategies

   Basically, virtual server support can be implemented either in user
   space or inside the OS kernel. Apache, for example, has support for
   virtual httpd servers. This approach has both advantages and
   disadvantages. First of all, it is portable and can be used on top of
   any OS. It is relatively easy to implement and debug. On the other
   hand, it is impossible to give the end user full administrative
   access to such a server.

   If the system is implemented inside the OS kernel, two different
   strategies are available:

     * a single kernel
     * multiple kernels (one for each environment)

   At first glance a multiple-kernel solution, again, looks somewhat
   easier to implement and maintain. But this is questionable and
   requires thorough examination. The major advantage of this approach
   is the ability to run the OS of your choice inside a virtual server.
   On the other hand, single-kernel solutions are more efficient,
   because:

     * They introduce less additional cost, since they share the
       hardware directly rather than in two stages: they pass through
       the filesystem level to the hardware level only once, and access
       to other resources, including networking, is likewise given
       directly rather than in two stages.
     * Single-kernel solutions share resources more efficiently when one
       environment is less loaded than another. This point is much more
       important than the previous one. With multiple-kernel solutions
       the kernels do not know about each other, cannot compare the
       resource usage of their counterparts (memory pressure, CPU load,
       etc.) and cannot share resources in a way that gives more
       resources to those who need them more. Moreover, resource
       management is almost impossible with multiple-kernel solutions
       for the same reason: the kernels have limited knowledge. With a
       single-kernel solution any necessary resource management policy
       may be implemented.
     * Single-kernel solutions may benefit from cache sharing
       (filesystem cache, route cache, etc.).
     * A single-kernel solution may benefit from sharing application
       code in memory.

   9.2 Comparison Table

   ASPbeta
        Project goal:         complete ASP platform
        Implementation:       single kernel
        Isolation level:      Linux capabilities, namespace separation
        Performance loss:     very low; gains possible in places
        Resource sharing:     all
        Scalability:          high
        QoS support:          by the OS: resource limits, some additions
                              planned
        Fault tolerance:      fault isolation
        Administration tools: yes

   ServerXchange
        Project goal:         complete ASP platform
        Implementation:       multiple kernels
        Isolation level:      unknown, virtual hardware(?)
        Performance loss:     unknown
        Resource sharing:     unknown
        Scalability:          medium
        QoS support:          yes
        Fault tolerance:      fault isolation
        Administration tools: yes

   VMware
        Project goal:         virtual computer suitable for running a
                              foreign OS
        Implementation:       multiple kernels
        Isolation level:      virtual hardware
        Performance loss:     low for CPU-bound, high for I/O
        Resource sharing:     no
        Scalability:          low
        QoS support:          no
        Fault tolerance:      fault isolation
        Administration tools: limited

   FreeBSD jail
        Project goal:         sandbox for dangerous applications
        Implementation:       single kernel
        Isolation level:      access control filters
        Performance loss:     very low
        Resource sharing:     all except disk
        Scalability:          high
        QoS support:          by the OS
        Fault tolerance:      fault isolation
        Administration tools: none public; available from third-party
                              providers

   FreeVSD
        Project goal:         ISP platform
        Implementation:       user-space tools
        Isolation level:      substitution of user-level utilities
        Performance loss:     low
        Resource sharing:     all except disk
        Scalability:          high
        QoS support:          no
        Fault tolerance:      no
        Administration tools: becoming non-public

   UML (User Mode Linux)
        Project goal:         kernel debugging tool
        Implementation:       multiple kernels
        Isolation level:      syscall redirection
        Performance loss:     significant
        Resource sharing:     no
        Scalability:          no
        QoS support:          no
        Fault tolerance:      fault isolation
        Administration tools: no
  _________________________________________________________________

   10. Acknowledgements

   Thanks to Alexey Kuznetsov and Andrey V. Savochkin for helping to
   make this document and for their contribution to the whole idea.
   Special thanks to Brian Ackerman for providing the SGML template used
   to write this document with his nice Virtual Services HOWTO.