Index: linux/ChangeLog
diff -uN /dev/null linux/ChangeLog:1.7
--- /dev/null	Mon Aug 14 20:39:30 2000
+++ linux/ChangeLog	Mon Aug 14 20:05:54 2000
@@ -0,0 +1,28 @@
+2000-08-14  Yuri Pudgorodsky
+
+	* 0.4.2 release
+	* Merged reiserfs 3.6.12
+
+2000-08-14  Denis Lunev
+
+	* Fixed pty/pts allocation OOPS
+
+2000-08-12  Yuri Pudgorodsky
+
+	* Prepared for 0.4.2: 2.4.0-test6 port
+	* UNIX98 pty works
+
+2000-08-07  Yuri Pudgorodsky
+
+	* v0.4.1: 2.4.0-test6-pre6 port
+	* SOCK_PACKET traffic has been isolated within VE
+	  (tcpdump now works inside VE as expected)
+	* Fixed fragmented IP traffic inside VE.
+	* user beancounter IV-0007.fix2
+	* CONFIG_NETFILTER, CONFIG_KHTTPD compile fixes
+	* misc cleanup
+	* IP route work in progress (CONFIG_VE_ROUTE)
+
+2000-07-25  Yuri Pudgorodsky
+
+	* v0.4: initial public release
Index: linux/Makefile
diff -u linux/Makefile:1.1.1.7 ASPcomplete/linux/Makefile:1.9
--- linux/Makefile:1.1.1.7	Mon Aug 14 19:41:09 2000
+++ linux/Makefile	Sat Aug 12 19:36:41 2000
@@ -1,7 +1,10 @@
 VERSION = 2
 PATCHLEVEL = 4
 SUBLEVEL = 0
-EXTRAVERSION = -test6
+EXTRAVERSION-y = smp
+EXTRAVERSION- = up
+EXTRAVERSION-n = up
+EXTRAVERSION = -test6-ve0.4.2-$(EXTRAVERSION-$(CONFIG_SMP))
 
 KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)
Index: linux/Documentation/Configure.help
diff -u linux/Documentation/Configure.help:1.1.1.10 ASPcomplete/linux/Documentation/Configure.help:1.9
--- linux/Documentation/Configure.help:1.1.1.10	Mon Aug 14 19:43:27 2000
+++ linux/Documentation/Configure.help	Sat Aug 12 22:02:28 2000
@@ -14221,6 +14221,32 @@
   better 32 MB RAM to avoid excessive linking time. This is only useful
   for kernel hackers. If unsure, say N.
 
+Virtual Environment Support (EXPERIMENTAL)
+CONFIG_VE
+
+  This option is experimental. It adds support for virtual Linux
+  environments running on the original box, with a fully supported
+  virtual network driver, a tty subsystem, and configurable access to
+  hardware and other resources.
+
+Advanced resource control
+CONFIG_USER_RESOURCE
+  This patch provides accounting and allows configuring limits on users'
+  consumption of exhaustible system resources.
+  The most important resource controlled by this patch is unswappable memory
+  (either mlock'ed or used by internal kernel structures and buffers).
+  The main goal of this patch is to protect processes
+  from running short of important resources because of accidental
+  misbehavior of processes or malicious activity aiming to ``kill'' the system.
+  It is worth mentioning that resource limits configured by setrlimit(2)
+  do not give an acceptable level of protection, because they cover only a
+  small fraction of resources and work on a per-process basis. Per-process
+  accounting doesn't prevent malicious users from spawning a lot of
+  resource-consuming processes.
+
+Report resource usage in /proc
+CONFIG_USER_RESOURCE_PROC
+  Allows a system administrator to inspect resource accounts and limits.
+
 Magic System Request Key support
 CONFIG_MAGIC_SYSRQ
   If you say Y here, you will have some control over the system even
Index: linux/Documentation/ve.txt
diff -uN /dev/null linux/Documentation/ve.txt:1.4
--- /dev/null	Mon Aug 14 20:39:31 2000
+++ linux/Documentation/ve.txt	Sat Aug 12 22:02:29 2000
@@ -0,0 +1,76 @@
+		VE Overview
+		--------------
+	(C) SWSoft Pte Ltd., 2000, http://www.sw.com.sg/
+
+Virtual environments (VE's) are, from the userspace point of view, a set of
+"OSes inside the OS": full-featured Linux boxes multiplied within a single
+hardware unit.
+Each box can run virtually any Linux application (except ones working with
+specific hardware), has its own file system root, and effectively shares
+the hardware resources (memory, CPU, disk, etc).
+
+From the kernel point of view, each VE is an isolated set of processes
+starting from its own 'init'. The current release should properly implement
+namespace separation for all userspace-visible objects, including
+restrictions preventing the VE superuser from accessing the global
+namespace and from changing global parameters.
+
+VE is integrated with A. Savotchkin's user beancounters
+patch. It provides accounting and allows configuring limits on users'
+consumption of exhaustible system resources. The most important resource
+controlled by this patch is unswappable memory (either mlock'ed or used by
+internal kernel structures and buffers). Per-process accounting doesn't
+prevent malicious users from spawning a lot of resource-consuming processes.
+
+Changed kernel subsystems:
+- process table searching:
+  for_each_task is replaced with for_each_task_all and for_each_task_ve
+  to ensure appropriate separation (illustrated in the sketch below);
+  find_task_by_pid is similarly replaced with find_task_by_pid_ve and
+  find_task_by_pid_all
+- orphans are now reparented to the per-environment init, i.e. the first
+  process spawned inside the VE (which, by the way, also has PID=1)
+- sys_reboot shuts down the VE gracefully
+- device access is additionally regulated on a per-VE basis
+- /proc is made read-only inside a VE
+- a BSD pty driver is supplied for each VE
+- only the loopback and the specified IP addresses are accessible inside a
+  VE; an additional parameter is added to ip_route_output and some
+  structures to allow VE-based routing (being developed); lookup procedures
+  for UNIX and IPv4 sockets take an added VE mark parameter
+- the console is not accessible from inside a VE
+- SysVIPC ids have been multiplexed
+- sysctl is read-only inside a VE, except for the uts_name-related entries
+- SIGIO never sends signals across a VE boundary
+- chroot is secured
+
+The following options should have the status listed below in order to get a
+kernel with the VE system enabled:
+  CONFIG_VE                   y
+  CONFIG_VE_NET               y
+  CONFIG_VE_IRQ               y
+  CONFIG_VE_LINK              y
+  CONFIG_VE_SIGIO             y
+  CONFIG_UNIX98_PTYS          n
+  CONFIG_IP_ALIAS             y
+  CONFIG_USER_RESOURCE        y
+  CONFIG_USER_RESOURCE_PROC   y
+
+We recommend that you periodically check
+  ftp://ftp.asplinux.com.sg/pub/aspcomplete/
+for newer versions.
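
The separation primitives named in ve.txt above can be pictured with a
minimal sketch. This is an illustration, not the patch's actual
definitions: it assumes the envid pointer this patch adds to struct
task_struct (the fs/proc changes below dereference it as p->envid->veid)
and it ignores the pid hash that a real find_task_by_pid would use.

	/* Walk every task in the system, as the old for_each_task did. */
	#define for_each_task_all(p) \
		for (p = &init_task; (p = p->next_task) != &init_task; )

	/* Walk only the tasks belonging to the caller's environment. */
	#define for_each_task_ve(p) \
		for_each_task_all(p) \
			if (p->envid == current->envid)

	/* Callers hold tasklist_lock, as at the patched call sites below. */
	static inline struct task_struct *find_task_by_pid_ve(int pid)
	{
		struct task_struct *p;

		for_each_task_ve(p)
			if (p->pid == pid)
				return p;
		return NULL;
	}

Call sites then pick the _all variants for genuinely global operations
(swapoff, TLB flushes, tty hangup) and the _ve variants where only the
caller's environment may be visible (ptrace, /proc, sysrq).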
+
+More information about the system is available at
+  ftp://ftp.asplinux.com.sg/pub/aspcomplete/VE-HOWTO.html
+
+Mirrors:
+  ftp.asplinux.ru (Russia)
+  ftp.asp-linux.com (USA)
+  ftp.asplinux.co.kr (Korea)
+
+All information related to the user beancounters patch is available at
+  http://www.asplinux.com.sg/install/ubpatch.shtml
+
+Authors:
+  Denis Lunev
+  Yuri Pudgorodsky
+  Alexander Tormasov
Index: linux/arch/alpha/kernel/ptrace.c
diff -u linux/arch/alpha/kernel/ptrace.c:1.1.1.4 ASPcomplete/linux/arch/alpha/kernel/ptrace.c:1.2
--- linux/arch/alpha/kernel/ptrace.c:1.1.1.4	Mon Aug 14 19:41:53 2000
+++ linux/arch/alpha/kernel/ptrace.c	Sun Jun 18 13:04:03 2000
@@ -257,7 +257,7 @@
 		goto out_notsk;
 	ret = -ESRCH;
 	read_lock(&tasklist_lock);
-	child = find_task_by_pid(pid);
+	child = find_task_by_pid_ve(pid);
 	if (child)
 		get_task_struct(child);
 	read_unlock(&tasklist_lock);
Index: linux/arch/arm/kernel/ptrace.c
diff -u linux/arch/arm/kernel/ptrace.c:1.1.1.5 ASPcomplete/linux/arch/arm/kernel/ptrace.c:1.3
--- linux/arch/arm/kernel/ptrace.c:1.1.1.5	Mon Aug 14 19:41:43 2000
+++ linux/arch/arm/kernel/ptrace.c	Mon Jul 17 15:19:43 2000
@@ -560,7 +560,7 @@
 	}
 	ret = -ESRCH;
 	read_lock(&tasklist_lock);
-	child = find_task_by_pid(pid);
+	child = find_task_by_pid_ve(pid);
 	if (child)
 		get_task_struct(child);
 	read_unlock(&tasklist_lock);
Index: linux/arch/i386/config.in
diff -u linux/arch/i386/config.in:1.1.1.6 ASPcomplete/linux/arch/i386/config.in:1.9
--- linux/arch/i386/config.in:1.1.1.6	Mon Aug 14 19:41:45 2000
+++ linux/arch/i386/config.in	Sat Aug 12 22:02:29 2000
@@ -341,6 +341,24 @@
 source drivers/usb/Config.in
 
 mainmenu_option next_comment
+comment 'Resource management'
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+  bool 'Virtual Environment support (EXPERIMENTAL)' CONFIG_VE
+  if [ "$CONFIG_VE" = "y" ]; then
+    bool 'SIGIO patch' CONFIG_VE_SIGIO
+    bool 'IRQ patch' CONFIG_VE_IRQ
+    bool 'NET patch' CONFIG_VE_NET
+    bool 'VE link driver' CONFIG_VE_LINK
+    bool 'VE route (Work In Progress)' CONFIG_VE_ROUTE
+  fi
+  bool 'Per user resource management (EXPERIMENTAL)' CONFIG_USER_RESOURCE
+  if [ "$CONFIG_USER_RESOURCE" = "y" -a "$CONFIG_PROC_FS" = "y" ]; then
+    bool '  Report resource usage in /proc' CONFIG_USER_RESOURCE_PROC
+  fi
+fi
+endmenu
+
+mainmenu_option next_comment
 comment 'Kernel hacking'
 
 #bool 'Debug kmalloc/kfree' CONFIG_DEBUG_MALLOC
Index: linux/arch/i386/kernel/entry.S
diff -u linux/arch/i386/kernel/entry.S:1.1.1.8 ASPcomplete/linux/arch/i386/kernel/entry.S:1.11
--- linux/arch/i386/kernel/entry.S:1.1.1.8	Mon Aug 14 19:41:47 2000
+++ linux/arch/i386/kernel/entry.S	Sat Aug 12 22:02:29 2000
@@ -649,6 +649,21 @@
  * entries. Don't panic if you notice that this hasn't
  * been shrunk every time we add a new system call.
*/ + .long SYMBOL_NAME(sys_getluid) /* 220 */ + .long SYMBOL_NAME(sys_setluid) + .long SYMBOL_NAME(sys_setublimit) +#ifdef CONFIG_VE + .long SYMBOL_NAME(sys_env_create) /* 223 */ + .long SYMBOL_NAME(sys_mark_env_to_down) + .long SYMBOL_NAME(sys_setdevperms) + .long SYMBOL_NAME(sys_ve_conf_request) +#else + .long SYMBOL_NAME(sys_ni_syscall) /* 223 */ + .long SYMBOL_NAME(sys_ni_syscall) + .long SYMBOL_NAME(sys_ni_syscall) + .long SYMBOL_NAME(sys_ni_syscall) +#endif + .rept NR_syscalls-219 .long SYMBOL_NAME(sys_ni_syscall) .endr Index: linux/arch/i386/kernel/ptrace.c diff -u linux/arch/i386/kernel/ptrace.c:1.1.1.5 ASPcomplete/linux/arch/i386/kernel/ptrace.c:1.4 --- linux/arch/i386/kernel/ptrace.c:1.1.1.5 Mon Aug 14 19:41:46 2000 +++ linux/arch/i386/kernel/ptrace.c Thu Jul 20 19:10:10 2000 @@ -152,7 +152,7 @@ } ret = -ESRCH; read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); if (child) get_task_struct(child); read_unlock(&tasklist_lock); Index: linux/arch/i386/kernel/vm86.c diff -u linux/arch/i386/kernel/vm86.c:1.1.1.6 ASPcomplete/linux/arch/i386/kernel/vm86.c:1.4 --- linux/arch/i386/kernel/vm86.c:1.1.1.6 Mon Aug 14 19:41:47 2000 +++ linux/arch/i386/kernel/vm86.c Mon Jul 17 15:19:43 2000 @@ -602,7 +602,7 @@ int ret = 0; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if ((p == tsk) && (p->sig)) { ret = 1; break; Index: linux/arch/i386/mm/fault.c diff -u linux/arch/i386/mm/fault.c:1.1.1.3 ASPcomplete/linux/arch/i386/mm/fault.c:1.3 --- linux/arch/i386/mm/fault.c:1.1.1.3 Mon Aug 14 19:41:46 2000 +++ linux/arch/i386/mm/fault.c Sat Aug 12 22:02:29 2000 @@ -205,6 +205,7 @@ goto out_of_memory; } +normal_return: /* * Did it hit the DOS screen memory VA from vm86 mode? */ @@ -293,10 +294,28 @@ * us unable to handle the page fault gracefully. 
*/ out_of_memory: + + if (error_code & 4) { + struct task_struct *worst; + read_lock(&tasklist_lock); + worst = select_worst_task(); + printk(KERN_ERR "VM: killing process %s\n", worst->comm); + if (worst != current) { + force_sig(SIGKILL, worst); + worst->policy = SCHED_FIFO; + worst->rt_priority = 1000; + current->policy |= SCHED_YIELD; + read_unlock(&tasklist_lock); + schedule(); + goto normal_return; + } else { + read_unlock(&tasklist_lock); + up(&mm->mmap_sem); + do_exit(SIGKILL); + } + /* Never reached */ + } up(&mm->mmap_sem); - printk("VM: killing process %s\n", tsk->comm); - if (error_code & 4) - do_exit(SIGKILL); goto no_context; do_sigbus: Index: linux/arch/i386/mm/init.c diff -u linux/arch/i386/mm/init.c:1.1.1.4 ASPcomplete/linux/arch/i386/mm/init.c:1.3 --- linux/arch/i386/mm/init.c:1.1.1.4 Mon Aug 14 19:41:46 2000 +++ linux/arch/i386/mm/init.c Sat Aug 12 19:36:41 2000 @@ -133,7 +133,7 @@ return (pte_t *) pmd_page(*pmd) + offset; } -pte_t *get_pte_slow(pmd_t *pmd, unsigned long offset) +pte_t *get_pte_slow(struct user_beancounter *bc, pmd_t *pmd, unsigned long offset) { unsigned long pte; @@ -145,9 +145,11 @@ return (pte_t *)pte + offset; } set_pmd(pmd, __pmd(_PAGE_TABLE + __pa(get_bad_pte_table()))); + uncharge_kmem(bc, PAGE_SIZE); return NULL; } free_page(pte); + uncharge_kmem(bc, PAGE_SIZE); if (pmd_bad(*pmd)) { __handle_bad_pmd(pmd); return NULL; @@ -163,9 +165,9 @@ if(pgd_quicklist) free_pgd_slow(get_pgd_fast()), freed++; if(pmd_quicklist) - free_pmd_slow(get_pmd_fast()), freed++; + free_pmd_slow_kernel(get_pmd_fast()), freed++; if(pte_quicklist) - free_pte_slow(get_pte_fast()), freed++; + free_pte_slow_kernel(get_pte_fast()), freed++; } while(pgtable_cache_size > low); } return freed; Index: linux/arch/ia64/ia32/sys_ia32.c diff -u linux/arch/ia64/ia32/sys_ia32.c:1.1.1.7 ASPcomplete/linux/arch/ia64/ia32/sys_ia32.c:1.5 --- linux/arch/ia64/ia32/sys_ia32.c:1.1.1.7 Mon Aug 14 19:41:47 2000 +++ linux/arch/ia64/ia32/sys_ia32.c Mon Aug 7 19:07:59 2000 @@ -2390,7 +2390,7 @@ ret = -ESRCH; read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); read_unlock(&tasklist_lock); if (!child) goto out; Index: linux/arch/ia64/kernel/perfmon.c diff -u linux/arch/ia64/kernel/perfmon.c:1.1.1.3 ASPcomplete/linux/arch/ia64/kernel/perfmon.c:1.2 --- linux/arch/ia64/kernel/perfmon.c:1.1.1.3 Mon Aug 14 19:41:48 2000 +++ linux/arch/ia64/kernel/perfmon.c Sun Jun 18 13:04:03 2000 @@ -112,7 +112,7 @@ * IPI to take care of MP systems. See blurb above. */ lock_kernel(); - for_each_task(p) { + for_each_task_all(p) { regs = (struct pt_regs *) (((char *)p) + IA64_STK_OFFSET) -1 ; ia64_psr(regs)->pp = 1; } @@ -182,7 +182,7 @@ * IPI to take care of MP systems. See blurb above. 
*/ lock_kernel(); - for_each_task(p) { + for_each_task_all(p) { regs = (struct pt_regs *) (((char *)p) + IA64_STK_OFFSET) - 1; ia64_psr(regs)->pp = 0; } Index: linux/arch/ia64/kernel/ptrace.c diff -u linux/arch/ia64/kernel/ptrace.c:1.1.1.5 ASPcomplete/linux/arch/ia64/kernel/ptrace.c:1.4 --- linux/arch/ia64/kernel/ptrace.c:1.1.1.5 Mon Aug 14 19:41:47 2000 +++ linux/arch/ia64/kernel/ptrace.c Mon Aug 7 19:08:00 2000 @@ -533,7 +533,7 @@ struct task_struct *p; read_lock(&tasklist_lock); { - for_each_task(p) { + for_each_task_ve(p) { if (p->mm == mm && p->state != TASK_RUNNING) sync_kernel_register_backing_store(p, 0, make_writable); } @@ -952,7 +952,7 @@ ret = -ESRCH; read_lock(&tasklist_lock); { - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); if (child) get_task_struct(child); } Index: linux/arch/ia64/mm/tlb.c diff -u linux/arch/ia64/mm/tlb.c:1.1.1.5 ASPcomplete/linux/arch/ia64/mm/tlb.c:1.4 --- linux/arch/ia64/mm/tlb.c:1.1.1.5 Mon Aug 14 19:41:47 2000 +++ linux/arch/ia64/mm/tlb.c Mon Aug 7 19:08:00 2000 @@ -142,7 +142,7 @@ ia64_next_context = (1UL << IA64_HW_CONTEXT_BITS) + 1; read_lock(&tasklist_lock); - for_each_task (task) { + for_each_task_all (task) { if (task->mm == mm) continue; flush_tlb_mm(mm); Index: linux/arch/m68k/atari/stram.c diff -u linux/arch/m68k/atari/stram.c:1.1.1.5 ASPcomplete/linux/arch/m68k/atari/stram.c:1.6 --- linux/arch/m68k/atari/stram.c:1.1.1.5 Mon Aug 14 19:41:49 2000 +++ linux/arch/m68k/atari/stram.c Sat Aug 12 19:36:41 2000 @@ -866,7 +866,7 @@ while( map[i] ) { read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (unswap_process( p->mm, SWP_ENTRY( stram_swap_type, i ), entry, 1 )) { read_unlock(&tasklist_lock); @@ -936,7 +936,7 @@ if (page_map) { page = (unsigned long) page_address(page_map); read_lock(&tasklist_lock); - for_each_task(p) + for_each_task_all(p) unswap_process(p->mm, entry, page /* , 0 */); read_unlock(&tasklist_lock); Index: linux/arch/m68k/kernel/ptrace.c diff -u linux/arch/m68k/kernel/ptrace.c:1.1.1.3 ASPcomplete/linux/arch/m68k/kernel/ptrace.c:1.2 --- linux/arch/m68k/kernel/ptrace.c:1.1.1.3 Mon Aug 14 19:41:49 2000 +++ linux/arch/m68k/kernel/ptrace.c Sun Jun 18 13:04:03 2000 @@ -106,7 +106,7 @@ } ret = -ESRCH; read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); read_unlock(&tasklist_lock); /* FIXME!!! 
*/ if (!child) goto out; Index: linux/arch/mips/kernel/ptrace.c diff -u linux/arch/mips/kernel/ptrace.c:1.1.1.4 ASPcomplete/linux/arch/mips/kernel/ptrace.c:1.3 --- linux/arch/mips/kernel/ptrace.c:1.1.1.4 Mon Aug 14 19:41:52 2000 +++ linux/arch/mips/kernel/ptrace.c Mon Jul 17 15:19:43 2000 @@ -50,7 +50,7 @@ } res = -ESRCH; read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); if (child) get_task_struct(child); read_unlock(&tasklist_lock); Index: linux/arch/mips/kernel/sysirix.c diff -u linux/arch/mips/kernel/sysirix.c:1.1.1.5 ASPcomplete/linux/arch/mips/kernel/sysirix.c:1.4 --- linux/arch/mips/kernel/sysirix.c:1.1.1.5 Mon Aug 14 19:41:52 2000 +++ linux/arch/mips/kernel/sysirix.c Wed Jul 19 15:12:09 2000 @@ -105,7 +105,7 @@ printk("irix_prctl[%s:%d]: Wants PR_ISBLOCKED\n", current->comm, current->pid); read_lock(&tasklist_lock); - task = find_task_by_pid(regs->regs[base + 5]); + task = find_task_by_pid_all(regs->regs[base + 5]); error = -ESRCH; if (error) error = (task->run_list.next != NULL); @@ -289,7 +289,7 @@ if (retval) break; read_lock(&tasklist_lock); - p = find_task_by_pid(pid); + p = find_task_by_pid_all(pid); if (!p) { read_unlock(&tasklist_lock); retval = -ESRCH; Index: linux/arch/mips64/kernel/ptrace.c diff -u linux/arch/mips64/kernel/ptrace.c:1.1.1.4 ASPcomplete/linux/arch/mips64/kernel/ptrace.c:1.4 --- linux/arch/mips64/kernel/ptrace.c:1.1.1.4 Mon Aug 14 19:41:57 2000 +++ linux/arch/mips64/kernel/ptrace.c Wed Jul 19 15:13:07 2000 @@ -50,7 +50,7 @@ } ret = -ESRCH; read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); if (child) get_task_struct(child); read_unlock(&tasklist_lock); @@ -315,7 +315,7 @@ } ret = -ESRCH; read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); if (child) get_task_struct(child); read_unlock(&tasklist_lock); Index: linux/arch/ppc/kernel/idle.c diff -u linux/arch/ppc/kernel/idle.c:1.1.1.4 ASPcomplete/linux/arch/ppc/kernel/idle.c:1.3 --- linux/arch/ppc/kernel/idle.c:1.1.1.4 Mon Aug 14 19:41:45 2000 +++ linux/arch/ppc/kernel/idle.c Mon Jul 17 15:19:43 2000 @@ -120,7 +120,7 @@ if (!reclaim_ptr->v) continue; valid = 0; - for_each_task(p) + for_each_task_all(p) { if ( current->need_resched ) goto out; Index: linux/arch/ppc/kernel/ppc_htab.c diff -u linux/arch/ppc/kernel/ppc_htab.c:1.1.1.4 ASPcomplete/linux/arch/ppc/kernel/ppc_htab.c:1.2 --- linux/arch/ppc/kernel/ppc_htab.c:1.1.1.4 Mon Aug 14 19:41:45 2000 +++ linux/arch/ppc/kernel/ppc_htab.c Sun Jun 18 13:04:03 2000 @@ -163,7 +163,7 @@ { /* make sure someone is using this context/vsid */ valid = 0; - for_each_task(p) + for_each_task_all(p) { if (p->mm && (ptr->vsid >> 4) == p->mm->context) { Index: linux/arch/ppc/kernel/ptrace.c diff -u linux/arch/ppc/kernel/ptrace.c:1.1.1.4 ASPcomplete/linux/arch/ppc/kernel/ptrace.c:1.2 --- linux/arch/ppc/kernel/ptrace.c:1.1.1.4 Mon Aug 14 19:41:44 2000 +++ linux/arch/ppc/kernel/ptrace.c Sun Jun 18 13:04:03 2000 @@ -98,7 +98,7 @@ } ret = -ESRCH; read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); if (child) get_task_struct(child); read_unlock(&tasklist_lock); Index: linux/arch/ppc/mm/fault.c diff -u linux/arch/ppc/mm/fault.c:1.1.1.4 ASPcomplete/linux/arch/ppc/mm/fault.c:1.3 --- linux/arch/ppc/mm/fault.c:1.1.1.4 Mon Aug 14 19:41:43 2000 +++ linux/arch/ppc/mm/fault.c Sat Aug 12 22:02:29 2000 @@ -157,6 +157,7 @@ goto out_of_memory; } +normal_return: up(&mm->mmap_sem); /* * keep track of tlb+htab misses that are 
good addrs but @@ -189,9 +190,24 @@ */ out_of_memory: up(&mm->mmap_sem); - printk("VM: killing process %s\n", current->comm); - if (user_mode(regs)) - do_exit(SIGKILL); + if (user_mode(regs)) { + struct task_struct *worst; + read_lock(&tasklist_lock); + worst = select_worst_task(); + printk(KERN_ERR "VM: killing process %s\n", worst->comm); + if (worst != current) { + force_sig(SIGKILL, worst); + worst->policy = SCHED_FIFO; + worst->rt_priority = 1000; + current->policy |= SCHED_YIELD; + read_unlock(&tasklist_lock); + schedule(); + goto normal_return; + } else { + read_unlock(&tasklist_lock); + do_exit(SIGKILL); + } + } bad_page_fault(regs, address); return; Index: linux/arch/ppc/mm/init.c diff -u linux/arch/ppc/mm/init.c:1.1.1.6 ASPcomplete/linux/arch/ppc/mm/init.c:1.4 --- linux/arch/ppc/mm/init.c:1.1.1.6 Mon Aug 14 19:41:43 2000 +++ linux/arch/ppc/mm/init.c Sat Aug 12 19:36:41 2000 @@ -298,7 +298,7 @@ printk(" %3s", "CPU"); #endif /* CONFIG_SMP */ printk("\n"); - for_each_task(p) + for_each_task_all(p) { printk("%-8.8s %3d %8ld %8ld %8ld %c%08lx %08lx ", p->comm,p->pid, @@ -591,7 +591,7 @@ printk(KERN_DEBUG "mmu_context_overflow\n"); read_lock(&tasklist_lock); - for_each_task(tsk) { + for_each_task_all(tsk) { if (tsk->mm) tsk->mm->context = NO_CONTEXT; } Index: linux/arch/s390/kernel/ptrace.c diff -u linux/arch/s390/kernel/ptrace.c:1.1.1.3 ASPcomplete/linux/arch/s390/kernel/ptrace.c:1.2 --- linux/arch/s390/kernel/ptrace.c:1.1.1.3 Mon Aug 14 19:41:52 2000 +++ linux/arch/s390/kernel/ptrace.c Sun Jun 18 13:04:03 2000 @@ -218,7 +218,7 @@ } ret = -ESRCH; read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); read_unlock(&tasklist_lock); if (!child) goto out; Index: linux/arch/sh/kernel/ptrace.c diff -u linux/arch/sh/kernel/ptrace.c:1.1.1.4 ASPcomplete/linux/arch/sh/kernel/ptrace.c:1.2 --- linux/arch/sh/kernel/ptrace.c:1.1.1.4 Mon Aug 14 19:41:41 2000 +++ linux/arch/sh/kernel/ptrace.c Sun Jun 18 13:04:03 2000 @@ -161,7 +161,7 @@ } ret = -ESRCH; read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); if (child) get_task_struct(child); read_unlock(&tasklist_lock); Index: linux/arch/sparc/kernel/ptrace.c diff -u linux/arch/sparc/kernel/ptrace.c:1.1.1.4 ASPcomplete/linux/arch/sparc/kernel/ptrace.c:1.3 --- linux/arch/sparc/kernel/ptrace.c:1.1.1.4 Mon Aug 14 19:41:54 2000 +++ linux/arch/sparc/kernel/ptrace.c Fri Jun 30 22:49:09 2000 @@ -305,7 +305,7 @@ goto out; } #endif - if(!(child = find_task_by_pid(pid))) { + if(!(child = find_task_by_pid_ve(pid))) { pt_error_return(regs, ESRCH); goto out; } Index: linux/arch/sparc/mm/fault.c diff -u linux/arch/sparc/mm/fault.c:1.1.1.3 ASPcomplete/linux/arch/sparc/mm/fault.c:1.3 --- linux/arch/sparc/mm/fault.c:1.1.1.3 Mon Aug 14 19:41:54 2000 +++ linux/arch/sparc/mm/fault.c Sat Aug 12 22:02:29 2000 @@ -261,6 +261,7 @@ default: goto out_of_memory; } +normal_return: up(&mm->mmap_sem); return; @@ -325,9 +326,24 @@ */ out_of_memory: up(&mm->mmap_sem); - printk("VM: killing process %s\n", tsk->comm); - if (from_user) - do_exit(SIGKILL); + if (from_user) { + struct task_struct *worst; + read_lock(&tasklist_lock); + worst = select_worst_task(); + printk(KERN_ERR "VM: killing process %s\n", worst->comm); + if (worst != current) { + force_sig(SIGKILL, worst); + worst->policy = SCHED_FIFO; + worst->rt_priority = 1000; + current->policy |= SCHED_YIELD; + read_unlock(&tasklist_lock); + schedule(); + goto normal_return; + } else { + read_unlock(&tasklist_lock); + do_exit(SIGKILL); + } + } goto 
no_context; do_sigbus: Index: linux/arch/sparc/mm/srmmu.c diff -u linux/arch/sparc/mm/srmmu.c:1.1.1.8 ASPcomplete/linux/arch/sparc/mm/srmmu.c:1.6 --- linux/arch/sparc/mm/srmmu.c:1.1.1.8 Mon Aug 14 19:41:54 2000 +++ linux/arch/sparc/mm/srmmu.c Sat Aug 12 19:36:41 2000 @@ -487,7 +487,7 @@ struct task_struct * p; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (!p->mm) continue; *srmmu_pgd_offset(p->mm,address) = entry; Index: linux/arch/sparc64/kernel/ptrace.c diff -u linux/arch/sparc64/kernel/ptrace.c:1.1.1.4 ASPcomplete/linux/arch/sparc64/kernel/ptrace.c:1.3 --- linux/arch/sparc64/kernel/ptrace.c:1.1.1.4 Mon Aug 14 19:41:56 2000 +++ linux/arch/sparc64/kernel/ptrace.c Fri Jun 30 22:49:09 2000 @@ -156,7 +156,7 @@ } #endif read_lock(&tasklist_lock); - child = find_task_by_pid(pid); + child = find_task_by_pid_ve(pid); read_unlock(&tasklist_lock); if(!child) { Index: linux/arch/sparc64/kernel/setup.c diff -u linux/arch/sparc64/kernel/setup.c:1.1.1.4 ASPcomplete/linux/arch/sparc64/kernel/setup.c:1.3 --- linux/arch/sparc64/kernel/setup.c:1.1.1.4 Mon Aug 14 19:41:56 2000 +++ linux/arch/sparc64/kernel/setup.c Mon Aug 7 19:08:00 2000 @@ -140,7 +140,7 @@ pmd_t *pmdp; pte_t *ptep; - for_each_task(p) { + for_each_task_all(p) { mm = p->mm; if (CTX_HWBITS(mm->context) == ctx) break; Index: linux/arch/sparc64/mm/fault.c diff -u linux/arch/sparc64/mm/fault.c:1.1.1.4 ASPcomplete/linux/arch/sparc64/mm/fault.c:1.4 --- linux/arch/sparc64/mm/fault.c:1.1.1.4 Mon Aug 14 19:41:55 2000 +++ linux/arch/sparc64/mm/fault.c Sat Aug 12 22:02:29 2000 @@ -308,6 +308,7 @@ goto out_of_memory; } +normal_return: up(&mm->mmap_sem); goto fault_done; @@ -329,9 +330,24 @@ */ out_of_memory: up(&mm->mmap_sem); - printk("VM: killing process %s\n", current->comm); - if (!(regs->tstate & TSTATE_PRIV)) - do_exit(SIGKILL); + if (!(regs->tstate & TSTATE_PRIV)) { + struct task_struct *worst; + read_lock(&tasklist_lock); + worst = select_worst_task(); + printk(KERN_ERR "VM: killing process %s\n", worst->comm); + if (worst != current) { + force_sig(SIGKILL, worst); + worst->policy = SCHED_FIFO; + worst->rt_priority = 1000; + current->policy |= SCHED_YIELD; + read_unlock(&tasklist_lock); + schedule(); + goto normal_return; + } else { + read_unlock(&tasklist_lock); + do_exit(SIGKILL); + } + } goto handle_kernel_fault; do_sigbus: Index: linux/drivers/char/pty.c diff -u linux/drivers/char/pty.c:1.1.1.4 ASPcomplete/linux/drivers/char/pty.c:1.4 --- linux/drivers/char/pty.c:1.1.1.4 Mon Aug 14 19:42:34 2000 +++ linux/drivers/char/pty.c Tue Aug 1 15:30:45 2000 @@ -29,14 +29,20 @@ #define BUILDING_PTY_C 1 #include +#ifndef CONFIG_VE struct pty_struct { int magic; wait_queue_head_t open_wait; }; +#endif #define PTY_MAGIC 0x5001 + +#ifndef CONFIG_VE +static +#endif +struct tty_driver pty_driver, pty_slave_driver; -static struct tty_driver pty_driver, pty_slave_driver; static int pty_refcount; /* Note: one set of tables for BSD and one for Unix98 */ @@ -98,6 +104,12 @@ tty_unregister_devfs (&tty->link->driver, MINOR (tty->device)); tty_vhangup(tty->link); } + if ((tty->driver.subtype == PTY_TYPE_MASTER) && + (test_bit(TTY_BEANCOUNTER_CHARGED, &tty->flags))) + { + uncharge_pty(tty->charged_bc); + clear_bit(TTY_BEANCOUNTER_CHARGED, &tty->flags); + } } /* @@ -317,6 +329,16 @@ line = MINOR(tty->device) - tty->driver.minor_start; if ((line < 0) || (line >= NR_PTYS)) goto out; + + if ((tty->driver.subtype == PTY_TYPE_MASTER) && + (!test_bit(TTY_BEANCOUNTER_CHARGED, &tty->flags))) + { + if (charge_pty(tty->charged_bc)) + goto out; + 
set_bit(TTY_BEANCOUNTER_CHARGED, &tty->flags);
+	}
+	/* for all further failures uncharge is done in pty_close() */
+
 	pty = (struct pty_struct *)(tty->driver.driver_state) + line;
 	tty->driver_data = pty;
Index: linux/drivers/char/sysrq.c
diff -u linux/drivers/char/sysrq.c:1.1.1.4 ASPcomplete/linux/drivers/char/sysrq.c:1.4
--- linux/drivers/char/sysrq.c:1.1.1.4	Mon Aug 14 19:42:35 2000
+++ linux/drivers/char/sysrq.c	Mon Aug 14 20:05:54 2000
@@ -44,7 +44,7 @@
 {
 	struct task_struct *p;
 
-	for_each_task(p) {
+	for_each_task_ve(p) {
 		if (p->mm) {	/* Not swapper nor kernel thread */
 			if (p->pid == 1 && even_init)	/* Ugly hack to kill init */
 				p->pid = 0x8000;
Index: linux/drivers/char/tty_io.c
diff -u linux/drivers/char/tty_io.c:1.1.1.7 ASPcomplete/linux/drivers/char/tty_io.c:1.9
--- linux/drivers/char/tty_io.c:1.1.1.7	Mon Aug 14 19:42:36 2000
+++ linux/drivers/char/tty_io.c	Fri Aug 11 20:51:59 2000
@@ -127,7 +127,8 @@
  * redirect is the pseudo-tty that console output
  * is redirected to if asked by TIOCCONS.
  */
-struct tty_struct * redirect;
+struct tty_struct * redirect = NULL;
+rwlock_t tty_driver_guard = RW_LOCK_UNLOCKED;
 
 static void initialize_tty_struct(struct tty_struct *tty);
@@ -330,20 +331,36 @@
 {
 	int major, minor;
 	struct tty_driver *p;
-
+
 	minor = MINOR(device);
 	major = MAJOR(device);
 
+	read_lock( &tty_driver_guard );
 	for (p = tty_drivers; p; p = p->next) {
 		if (p->major != major)
 			continue;
 		if (minor < p->minor_start)
 			continue;
 		if (minor >= p->minor_start + p->num)
+			continue;
+#ifdef CONFIG_VE
+		if( in_interrupt_correct() )
+			break;
+		if( major!=PTY_MASTER_MAJOR && major!=PTY_SLAVE_MAJOR
+#ifdef CONFIG_UNIX98_PTYS
+		    && (major<UNIX98_PTY_MASTER_MAJOR || major>UNIX98_PTY_MASTER_MAJOR+UNIX98_NR_MAJORS-1) &&
+		       (major<UNIX98_PTY_SLAVE_MAJOR || major>UNIX98_PTY_SLAVE_MAJOR+UNIX98_NR_MAJORS-1)
+#endif
+		  ) break;
+		if( is_super_ve(p) && is_super_ve(current) )
+			break;
+		if( !check_ve_strict(p,current) )
 			continue;
-		return p;
+#endif
+		break;
 	}
-	return NULL;
+	read_unlock( &tty_driver_guard );
+	return p;
 }
@@ -502,7 +519,7 @@
 	}
 
 	read_lock(&tasklist_lock);
-	for_each_task(p) {
+	for_each_task_all(p) {
 		if ((tty->session > 0) && (p->session == tty->session) &&
 		    p->leader) {
 			send_sig(SIGHUP,p,1);
@@ -600,7 +617,7 @@
 	tty->pgrp = -1;
 
 	read_lock(&tasklist_lock);
-	for_each_task(p)
+	for_each_task_all(p)
 		if (p->session == current->session)
 			p->tty = NULL;
 	read_unlock(&tasklist_lock);
@@ -843,6 +860,11 @@
 	tp = o_tp = NULL;
 	ltp = o_ltp = NULL;
 
+	if (driver->type == TTY_DRIVER_TYPE_PTY) {
+		if (charge_kmem(current->login_bc, PAGE_SIZE, 1))
+			goto fail_no_mem;
+		get_beancounter(current->login_bc);
+	}
 	tty = alloc_tty_struct();
 	if(!tty)
 		goto fail_no_mem;
@@ -852,6 +874,10 @@
 	tp_loc = &driver->termios[idx];
 	if (!*tp_loc) {
+		if (driver->type == TTY_DRIVER_TYPE_PTY)
+			if (charge_kmem(current->login_bc,
+					sizeof(struct termios), 1))
+				goto free_mem_out;
 		tp = (struct termios *) kmalloc(sizeof(struct termios),
 						GFP_KERNEL);
 		if (!tp)
@@ -869,16 +895,23 @@
 	}
 
 	if (driver->type == TTY_DRIVER_TYPE_PTY) {
+		if (charge_kmem(current->login_bc, PAGE_SIZE, 1))
+			goto free_mem_out;
 		o_tty = alloc_tty_struct();
 		if (!o_tty)
 			goto free_mem_out;
 		initialize_tty_struct(o_tty);
+		tty->charged_bc = current->login_bc;
+		o_tty->charged_bc = current->login_bc;
 		o_tty->device = (kdev_t) MKDEV(driver->other->major,
 					driver->other->minor_start + idx);
 		o_tty->driver = *driver->other;
 
 		o_tp_loc = &driver->other->termios[idx];
 		if (!*o_tp_loc) {
+			if (charge_kmem(current->login_bc,
+					sizeof(struct termios), 1))
+				goto free_mem_out;
 			o_tp = (struct termios *)
 				kmalloc(sizeof(struct termios), GFP_KERNEL);
 			if (!o_tp)
@@
-994,9 +1027,23 @@ kfree(ltp); if (tp) kfree(tp); + if (driver->type == TTY_DRIVER_TYPE_PTY) { + if (o_tp) + uncharge_kmem(current->login_bc, + sizeof(struct termios)); + if (o_tty) + uncharge_kmem(current->login_bc, PAGE_SIZE); + if (tp) + uncharge_kmem(current->login_bc, + sizeof(struct termios)); + } free_tty_struct(tty); fail_no_mem: + if (driver->type == TTY_DRIVER_TYPE_PTY) { + uncharge_kmem(current->login_bc, PAGE_SIZE); + put_beancounter(current->login_bc); + } retval = -ENOMEM; goto end_init; @@ -1015,6 +1062,9 @@ { struct tty_struct *o_tty; struct termios *tp; + /* We don't check that tty is pty, because otherwise + * tty->charged_bc is NULL */ + struct user_beancounter *bc = tty->charged_bc; if ((o_tty = tty->link) != NULL) { o_tty->driver.table[idx] = NULL; @@ -1022,10 +1072,12 @@ tp = o_tty->driver.termios[idx]; o_tty->driver.termios[idx] = NULL; kfree(tp); + uncharge_kmem(bc, sizeof(struct termios)); } o_tty->magic = 0; (*o_tty->driver.refcount)--; free_tty_struct(o_tty); + uncharge_kmem(bc, PAGE_SIZE); } tty->driver.table[idx] = NULL; @@ -1033,10 +1085,13 @@ tp = tty->driver.termios[idx]; tty->driver.termios[idx] = NULL; kfree(tp); + uncharge_kmem(bc, sizeof(struct termios)); } tty->magic = 0; (*tty->driver.refcount)--; free_tty_struct(tty); + uncharge_kmem(bc, PAGE_SIZE); + put_beancounter(bc); } /* @@ -1228,7 +1283,7 @@ struct task_struct *p; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (p->tty == tty || (o_tty && p->tty == o_tty)) p->tty = NULL; } @@ -1307,12 +1362,20 @@ #ifdef CONFIG_VT if (device == CONSOLE_DEV) { extern int fg_console; +#ifdef CONFIG_VE + if( !is_super_ve(current) ) + return -ENODEV; +#endif device = MKDEV(TTY_MAJOR, fg_console + 1); noctty = 1; } #endif if (device == SYSCONS_DEV) { struct console *c = console_drivers; +#ifdef CONFIG_VE + if( !is_super_ve(current) ) + return -ENODEV; +#endif while(c && !c->device) c = c->next; if (!c) @@ -1568,7 +1631,7 @@ struct task_struct *p; read_lock(&tasklist_lock); - for_each_task(p) + for_each_task_all(p) if (p->tty == tty) p->tty = NULL; read_unlock(&tasklist_lock); @@ -1837,7 +1900,8 @@ if (tty->driver.flush_buffer) tty->driver.flush_buffer(tty); read_lock(&tasklist_lock); - for_each_task(p) { + + for_each_task_all(p){ if ((p->tty == tty) || ((session > 0) && (p->session == session))) send_sig(SIGKILL, p, 1); @@ -2061,10 +2125,12 @@ if (!driver->put_char) driver->put_char = tty_default_put_char; + write_lock_irq( &tty_driver_guard ); driver->prev = 0; driver->next = tty_drivers; if (tty_drivers) tty_drivers->prev = driver; tty_drivers = driver; + write_unlock_irq( &tty_driver_guard ); if ( !(driver->flags & TTY_DRIVER_NO_DEVFS) ) { for(i = 0; i < driver->num; i++) @@ -2088,12 +2154,14 @@ if (*driver->refcount) return -EBUSY; + read_lock( &tty_driver_guard ); for (p = tty_drivers; p; p = p->next) { if (p == driver) found++; else if (p->major == driver->major) othername = p->name; } + read_unlock( &tty_driver_guard ); if (!found) return -ENOENT; @@ -2105,6 +2173,7 @@ } else devfs_register_chrdev(driver->major, othername, &tty_fops); + write_lock_irq( &tty_driver_guard ); if (driver->prev) driver->prev->next = driver->next; else @@ -2112,6 +2181,7 @@ if (driver->next) driver->next->prev = driver->prev; + write_unlock_irq( &tty_driver_guard ); /* * Free the termios and termios_locked structures because Index: linux/fs/binfmt_elf.c diff -u linux/fs/binfmt_elf.c:1.1.1.7 ASPcomplete/linux/fs/binfmt_elf.c:1.4 --- linux/fs/binfmt_elf.c:1.1.1.7 Mon Aug 14 19:41:10 2000 +++ 
linux/fs/binfmt_elf.c Mon Aug 7 19:08:00 2000 @@ -1201,11 +1201,11 @@ pte_t *pte; pgd = pgd_offset(vma->vm_mm, addr); - pmd = pmd_alloc(pgd, addr); + pmd = pmd_alloc(vma->vm_mm->beancounter, pgd, addr); if (!pmd) goto end_coredump; - pte = pte_alloc(pmd, addr); + pte = pte_alloc(vma->vm_mm->beancounter, pmd, addr); if (!pte) goto end_coredump; if (!pte_present(*pte) && Index: linux/fs/block_dev.c diff -u linux/fs/block_dev.c:1.1.1.6 ASPcomplete/linux/fs/block_dev.c:1.4 --- linux/fs/block_dev.c:1.1.1.6 Mon Aug 14 19:41:09 2000 +++ linux/fs/block_dev.c Sat Aug 12 19:36:41 2000 @@ -460,6 +460,14 @@ { const struct block_device_operations *ret = NULL; +#ifdef CONFIG_VE + int pmask = get_device_perms_ve( current->envid->veid, + S_IFBLK, + MKDEV(major,0) ); + if( (pmask&S_IRWXO)!=S_IRWXO ) + return NULL; +#endif + /* major 0 is used for non-device mounts */ if (major && major < MAX_BLKDEV) { #ifdef CONFIG_KMOD @@ -576,8 +584,7 @@ int ret = -ENODEV; kdev_t rdev = to_kdev_t(bdev->bd_dev); /* this should become bdev */ down(&bdev->bd_sem); - if (!bdev->bd_op) - bdev->bd_op = get_blkfops(MAJOR(rdev)); + bdev->bd_op = get_blkfops(MAJOR(rdev)); if (bdev->bd_op) { /* * This crockload is due to bad choice of ->open() type. @@ -615,8 +622,7 @@ struct block_device *bdev = inode->i_bdev; down(&bdev->bd_sem); lock_kernel(); - if (!bdev->bd_op) - bdev->bd_op = get_blkfops(MAJOR(inode->i_rdev)); + bdev->bd_op = get_blkfops(MAJOR(inode->i_rdev)); if (bdev->bd_op) { ret = 0; if (bdev->bd_op->open) Index: linux/fs/devices.c diff -u linux/fs/devices.c:1.1.1.6 ASPcomplete/linux/fs/devices.c:1.3 --- linux/fs/devices.c:1.1.1.6 Mon Aug 14 19:41:09 2000 +++ linux/fs/devices.c Mon Aug 7 19:08:00 2000 @@ -68,6 +68,13 @@ { struct file_operations *ret = NULL; +#ifdef CONFIG_VE + int pmask = get_device_perms_ve( current->envid->veid, + S_IFCHR, + MKDEV(major,minor) ); + if( (pmask&S_IRWXO)!=S_IRWXO ) + return NULL; +#endif if (!major || major >= MAX_CHRDEV) return NULL; Index: linux/fs/exec.c diff -u linux/fs/exec.c:1.1.1.7 ASPcomplete/linux/fs/exec.c:1.7 --- linux/fs/exec.c:1.1.1.7 Mon Aug 14 19:41:10 2000 +++ linux/fs/exec.c Mon Aug 7 19:08:00 2000 @@ -117,7 +117,7 @@ if (error) goto exit; - file = dentry_open(nd.dentry, nd.mnt, O_RDONLY); + file = dentry_open(nd.dentry, nd.mnt, FMODE_READ, O_RDONLY); error = PTR_ERR(file); if (IS_ERR(file)) goto out; @@ -264,13 +264,13 @@ if (page_count(page) != 1) printk("mem_map disagrees with %p at %08lx\n", page, address); pgd = pgd_offset(tsk->mm, address); - pmd = pmd_alloc(pgd, address); + pmd = pmd_alloc(tsk->mm->beancounter, pgd, address); if (!pmd) { __free_page(page); force_sig(SIGKILL, tsk); return; } - pte = pte_alloc(pmd, address); + pte = pte_alloc(tsk->mm->beancounter, pmd, address); if (!pte) { __free_page(page); force_sig(SIGKILL, tsk); @@ -299,6 +299,10 @@ bprm->loader += stack_base; bprm->exec += stack_base; + mpnt = NULL; + if (!charge_memory(current->mm->beancounter, + STACK_TOP - (PAGE_MASK & (unsigned long) bprm->p), + VM_STACK_FLAGS, 1)) mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); if (!mpnt) return -ENOMEM; @@ -351,7 +355,7 @@ int err = permission(inode, MAY_EXEC); file = ERR_PTR(err); if (!err) { - file = dentry_open(nd.dentry, nd.mnt, O_RDONLY); + file = dentry_open(nd.dentry, nd.mnt, FMODE_READ, O_RDONLY); if (!IS_ERR(file)) { err = deny_write_access(file); if (err) { @@ -398,7 +402,7 @@ return 0; } - mm = mm_alloc(); + mm = mm_alloc(current->login_bc); if (mm) { struct mm_struct *active_mm = current->active_mm; Index: linux/fs/fcntl.c diff -u 
linux/fs/fcntl.c:1.1.1.6 ASPcomplete/linux/fs/fcntl.c:1.5 --- linux/fs/fcntl.c:1.1.1.6 Mon Aug 14 19:41:09 2000 +++ linux/fs/fcntl.c Sat Aug 12 19:36:41 2000 @@ -350,16 +350,24 @@ int pid = fown->pid; read_lock(&tasklist_lock); - if ( (pid > 0) && (p = find_task_by_pid(pid)) ) { + if ( (pid > 0) && (p = find_task_by_pid_all(pid)) ) { +#ifdef CONFIG_VE_SIGIO + if( !check_ve( p, fown )) + goto out; +#endif send_sigio_to_task(p, fown, fa, band); goto out; } - for_each_task(p) { + for_each_task_all(p) { int match = p->pid; if (pid < 0) match = -p->pgrp; if (pid != match) continue; +#ifdef CONFIG_VE_SIGIO + if( !check_ve( p, fown )) + continue; +#endif send_sigio_to_task(p, fown, fa, band); } out: Index: linux/fs/file_table.c diff -u linux/fs/file_table.c:1.1.1.7 ASPcomplete/linux/fs/file_table.c:1.5 --- linux/fs/file_table.c:1.1.1.7 Mon Aug 14 19:41:10 2000 +++ linux/fs/file_table.c Mon Aug 7 19:08:00 2000 @@ -46,6 +46,9 @@ f->f_version = ++event; f->f_uid = current->fsuid; f->f_gid = current->fsgid; +#ifdef CONFIG_VE_SIGIO + f->f_owner.envid = current->envid; +#endif file_list_lock(); list_add(&f->f_list, &anon_list); file_list_unlock(); Index: linux/fs/locks.c diff -u linux/fs/locks.c:1.1.1.5 ASPcomplete/linux/fs/locks.c:1.2 --- linux/fs/locks.c:1.1.1.5 Mon Aug 14 19:41:09 2000 +++ linux/fs/locks.c Tue Aug 1 15:30:45 2000 @@ -118,16 +118,36 @@ static kmem_cache_t *filelock_cache; /* Allocate an empty lock structure. */ -static struct file_lock *locks_alloc_lock(void) +static struct file_lock *locks_alloc_lock(int do_charge) { - struct file_lock *fl; - fl = kmem_cache_alloc(filelock_cache, SLAB_KERNEL); - return fl; + struct file_lock *fl = NULL; + struct user_beancounter *bc = current->login_bc; + if (do_charge) { + get_beancounter(bc); + if (charge_flock(bc, sizeof(struct file_lock))) + goto out_with_put; + } + /* Okay, let's make a new file_lock structure... */ + fl = (struct file_lock *) kmem_cache_alloc(filelock_cache, SLAB_KERNEL); + if (!fl) + goto out_uncharge; + fl->charged_bc = bc; + return fl; + +out_uncharge: + if (do_charge) { + uncharge_flock(bc, sizeof(struct file_lock)); +out_with_put: + put_beancounter(bc); + } + return fl; } /* Free a lock which is not in use. */ -static inline void locks_free_lock(struct file_lock *fl) +static inline void locks_free_lock(struct file_lock *fl, int do_uncharge) { + struct user_beancounter *bc; + if (fl == NULL) { BUG(); return; @@ -142,6 +162,11 @@ if (!list_empty(&fl->fl_link)) panic("Attempting to free lock on active lock list"); + if (do_uncharge) { + bc = fl->charged_bc; + uncharge_flock(bc, sizeof(struct file_lock)); + put_beancounter(bc); + } kmem_cache_free(filelock_cache, fl); } @@ -168,6 +193,7 @@ */ static void locks_copy_lock(struct file_lock *new, struct file_lock *fl) { + new->charged_bc = fl->charged_bc; new->fl_owner = fl->fl_owner; new->fl_pid = fl->fl_pid; new->fl_file = fl->fl_file; @@ -184,7 +210,7 @@ /* Fill in a file_lock structure with an appropriate FLOCK lock. */ static struct file_lock *flock_make_lock(struct file *filp, unsigned int type) { - struct file_lock *fl = locks_alloc_lock(); + struct file_lock *fl = locks_alloc_lock(1); if (fl == NULL) return NULL; @@ -371,7 +397,7 @@ fl->fl_type = F_UNLCK; lock(fl->fl_file, F_SETLK, fl); } - locks_free_lock(fl); + locks_free_lock(fl, 1); } /* Determine if lock sys_fl blocks lock caller_fl. 
Common functionality @@ -517,9 +543,13 @@ size_t count) { struct file_lock *fl; - struct file_lock *new_fl = locks_alloc_lock(); - int error; + struct file_lock *new_fl = locks_alloc_lock(1); + int error = -ENOLCK; + + if (!new_fl) + goto out; + new_fl->charged_bc = current->login_bc; new_fl->fl_owner = current->files; new_fl->fl_pid = current->pid; new_fl->fl_file = filp; @@ -569,7 +599,8 @@ } } unlock_kernel(); - locks_free_lock(new_fl); + locks_free_lock(new_fl, 1); +out: return error; } @@ -594,7 +625,7 @@ error = -ENOLCK; new_fl = flock_make_lock(filp, lock_type); if (!new_fl) - goto out; + goto out_return; } error = 0; @@ -643,12 +674,13 @@ goto repeat; } locks_insert_lock(&inode->i_flock, new_fl); - new_fl = NULL; error = 0; + goto out_return; out: if (new_fl) - locks_free_lock(new_fl); + locks_free_lock(new_fl, 1); +out_return: return error; } @@ -679,10 +711,13 @@ * We may need two file_lock structures for this operation, * so we get them in advance to avoid races. */ - new_fl = locks_alloc_lock(); - new_fl2 = locks_alloc_lock(); + if (caller->fl_type == F_UNLCK) + new_fl = NULL; + else + new_fl = locks_alloc_lock(1); + new_fl2 = locks_alloc_lock(0); error = -ENOLCK; /* "no luck" */ - if (!(new_fl && new_fl2)) + if (!((caller->fl_type == F_UNLCK || new_fl) && new_fl2)) goto out; if (caller->fl_type != F_UNLCK) { @@ -809,9 +844,15 @@ if (!added) { if (caller->fl_type == F_UNLCK) goto out; + error = -ENOLCK; + if (right && (left == right)) + if (charge_flock(current->login_bc, + sizeof(struct file_lock))) + goto out; locks_copy_lock(new_fl, caller); locks_insert_lock(before, new_fl); new_fl = NULL; + error = 0; } if (right) { if (left == right) { @@ -819,10 +860,16 @@ * so we have to use the second new lock (in this * case, even F_UNLCK may fail!). */ + error = -ENOLCK; + if (added) + if (charge_flock(current->login_bc, + sizeof(struct file_lock))) + goto out; locks_copy_lock(new_fl2, right); locks_insert_lock(before, left); left = new_fl2; new_fl2 = NULL; + error = 0; } right->fl_start = caller->fl_end + 1; locks_wake_up_blocks(right, 0); @@ -836,9 +883,9 @@ * Free any unused locks. */ if (new_fl) - locks_free_lock(new_fl); + locks_free_lock(new_fl, 1); if (new_fl2) - locks_free_lock(new_fl2); + locks_free_lock(new_fl2, 0); return error; } @@ -893,10 +940,14 @@ int fcntl_getlk(unsigned int fd, struct flock *l) { struct file *filp; - struct file_lock *fl, *file_lock = locks_alloc_lock(); + struct file_lock *fl, *file_lock = locks_alloc_lock(0); struct flock flock; int error; + error = -ENOLCK; + if (!file_lock) + goto out_return; + error = -EFAULT; if (copy_from_user(&flock, l, sizeof(flock))) goto out; @@ -941,7 +992,8 @@ out_putf: fput(filp); out: - locks_free_lock(file_lock); + locks_free_lock(file_lock, 0); +out_return: return error; } @@ -951,11 +1003,15 @@ int fcntl_setlk(unsigned int fd, unsigned int cmd, struct flock *l) { struct file *filp; - struct file_lock *file_lock = locks_alloc_lock(); + struct file_lock *file_lock = locks_alloc_lock(0); struct flock flock; struct inode *inode; int error; + error = -ENOLCK; + if (!file_lock) + goto out_return; + /* * This might block, so we do it before checking the inode. 
*/ @@ -1040,7 +1096,8 @@ out_putf: fput(filp); out: - locks_free_lock(file_lock); + locks_free_lock(file_lock, 0); +out_return: return error; } Index: linux/fs/namei.c diff -u linux/fs/namei.c:1.1.1.6 ASPcomplete/linux/fs/namei.c:1.8 --- linux/fs/namei.c:1.1.1.6 Mon Aug 14 19:41:09 2000 +++ linux/fs/namei.c Mon Aug 7 19:08:00 2000 @@ -383,6 +383,13 @@ read_unlock(¤t->fs->lock); break; } +#ifdef CONFIG_VE + if (nd->dentry == current->envid->fs_root && + nd->mnt == current->envid->fs_rootmnt) { + read_unlock(¤t->fs->lock); + break; + } +#endif read_unlock(¤t->fs->lock); spin_lock(&dcache_lock); if (nd->dentry != nd->mnt->mnt_root) { @@ -514,10 +521,10 @@ err = -ENOENT; inode = nd->dentry->d_inode; if (!inode) - break; + goto out_dput; err = -ENOTDIR; if (!inode->i_op) - break; + goto out_dput; } else { dput(nd->dentry); nd->dentry = dentry; Index: linux/fs/open.c diff -u linux/fs/open.c:1.1.1.6 ASPcomplete/linux/fs/open.c:1.4 --- linux/fs/open.c:1.1.1.6 Mon Aug 14 19:41:10 2000 +++ linux/fs/open.c Sat Aug 12 19:36:41 2000 @@ -587,12 +587,12 @@ } /* - * Note that while the flag value (low two bits) for sys_open means: + * Note that while the flags value (low two bits) for sys_open means: * 00 - read-only * 01 - write-only * 10 - read-write * 11 - special - * it is changed into + * when it is copied into open_flags, it is changed into * 00 - no permissions needed * 01 - read-permission * 10 - write-permission @@ -602,23 +602,24 @@ */ struct file *filp_open(const char * filename, int flags, int mode) { - int namei_flags, error; + int namei_flags, open_flags, error; struct nameidata nd; namei_flags = flags; - if ((namei_flags+1) & O_ACCMODE) + open_flags = ((flags + 1) & O_ACCMODE); + if (open_flags) namei_flags++; if (namei_flags & O_TRUNC) namei_flags |= 2; error = open_namei(filename, namei_flags, mode, &nd); if (!error) - return dentry_open(nd.dentry, nd.mnt, flags); + return dentry_open(nd.dentry, nd.mnt, open_flags, flags); return ERR_PTR(error); } -struct file *dentry_open(struct dentry *dentry, struct vfsmount *mnt, int flags) +struct file *dentry_open(struct dentry *dentry, struct vfsmount *mnt, int mode, int flags) { struct file * f; struct inode *inode; @@ -629,7 +630,7 @@ if (!f) goto cleanup_dentry; f->f_flags = flags; - f->f_mode = (flags+1) & O_ACCMODE; + f->f_mode = mode; inode = dentry->d_inode; if (f->f_mode & FMODE_WRITE) { error = get_write_access(inode); Index: linux/fs/super.c diff -u linux/fs/super.c:1.1.1.7 ASPcomplete/linux/fs/super.c:1.6 --- linux/fs/super.c:1.1.1.7 Mon Aug 14 19:41:10 2000 +++ linux/fs/super.c Sat Aug 12 19:36:41 2000 @@ -767,6 +767,15 @@ error = -EACCES; if (IS_NODEV(inode)) goto out; +#ifdef CONFIG_VE + { + int pmask = get_device_perms_ve(current->envid->veid, + S_IFBLK, + inode->i_rdev); + if ((pmask & S_IRWXO) != S_IRWXO) + goto out; + } +#endif bdev = inode->i_bdev; bdops = devfs_get_ops ( devfs_get_handle_from_inode (inode) ); if (bdops) bdev->bd_op = bdops; @@ -1569,7 +1578,7 @@ struct fs_struct *fs; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { task_lock(p); fs = p->fs; if (fs) { Index: linux/fs/proc/array.c diff -u linux/fs/proc/array.c:1.1.1.5 ASPcomplete/linux/fs/proc/array.c:1.7 --- linux/fs/proc/array.c:1.1.1.5 Mon Aug 14 19:41:32 2000 +++ linux/fs/proc/array.c Sat Aug 12 22:02:29 2000 @@ -172,6 +172,10 @@ buffer += sprintf(buffer, "%d ", p->groups[g]); buffer += sprintf(buffer, "\n"); + +#ifdef CONFIG_VE + buffer += sprintf(buffer, "envID:\t%d\n", p->envid->veid); +#endif return buffer; } @@ -352,7 +356,7 @@ 
read_unlock(&tasklist_lock); res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \ %lu %lu %lu %lu %lu %ld %ld %ld %ld %ld %ld %lu %lu %ld %lu %lu %lu %lu %lu \ -%lu %lu %lu %lu %lu %lu %lu %lu %d %d\n", +%lu %lu %lu %lu %lu %lu %lu %lu %d %d %d\n", task->pid, task->comm, state, @@ -374,7 +378,11 @@ nice, 0UL /* removed */, task->it_real_value, +#ifndef CONFIG_VE task->start_time, +#else + task->start_time - current->envid->init_entry->start_time, +#endif vsize, mm ? mm->rss : 0, /* you might want to shift this left 3 */ task->rlim ? task->rlim[RLIMIT_RSS].rlim_cur : 0, @@ -395,7 +403,13 @@ task->nswap, task->cnswap, task->exit_signal, - task->processor); + task->processor, +#ifdef CONFIG_VE + task->envid->veid +#else + 0 +#endif + ); if(mm) mmput(mm); return res; Index: linux/fs/proc/base.c diff -u linux/fs/proc/base.c:1.1.1.6 ASPcomplete/linux/fs/proc/base.c:1.4 --- linux/fs/proc/base.c:1.1.1.6 Mon Aug 14 19:41:32 2000 +++ linux/fs/proc/base.c Mon Jul 17 15:19:43 2000 @@ -934,7 +934,7 @@ } read_lock(&tasklist_lock); - task = find_task_by_pid(pid); + task = find_task_by_pid_ve(pid); if (task) get_task_struct(task); read_unlock(&tasklist_lock); @@ -983,7 +983,7 @@ index--; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_ve(p) { int pid = p->pid; if (!pid) continue; Index: linux/fs/proc/inode.c diff -u linux/fs/proc/inode.c:1.1.1.4 ASPcomplete/linux/fs/proc/inode.c:1.4 --- linux/fs/proc/inode.c:1.1.1.4 Mon Aug 14 19:41:32 2000 +++ linux/fs/proc/inode.c Fri Jul 7 22:07:33 2000 @@ -166,6 +166,10 @@ inode->i_op = de->proc_iops; if (de->proc_fops) inode->i_fop = de->proc_fops; +#ifdef CONFIG_VE + if (S_ISREG(de->mode) && !is_super_ve(current)) + inode->i_mode &= ~S_IWUGO; +#endif } } @@ -194,7 +198,7 @@ * Fixup the root inode's nlink value */ read_lock(&tasklist_lock); - for_each_task(p) if (p->pid) root_inode->i_nlink++; + for_each_task_ve(p) if (p->pid) root_inode->i_nlink++; read_unlock(&tasklist_lock); s->s_root = d_alloc_root(root_inode); if (!s->s_root) Index: linux/fs/proc/proc_misc.c diff -u linux/fs/proc/proc_misc.c:1.1.1.6 ASPcomplete/linux/fs/proc/proc_misc.c:1.6 --- linux/fs/proc/proc_misc.c:1.1.1.6 Mon Aug 14 19:41:32 2000 +++ linux/fs/proc/proc_misc.c Mon Aug 7 19:08:00 2000 @@ -97,7 +97,11 @@ unsigned long idle; int len; +#ifndef CONFIG_VE uptime = jiffies; +#else + uptime = jiffies - current->envid->init_entry->start_time; +#endif idle = init_tasks[0]->times.tms_utime + init_tasks[0]->times.tms_stime; /* The formula for the fraction parts really is ((t * 100) / HZ) % 100, but @@ -290,6 +294,9 @@ unsigned int sum = 0, user = 0, nice = 0, system = 0; int major, disk; +#ifdef CONFIG_VE + jif -= current->envid->init_entry->start_time; +#endif for (i = 0 ; i < smp_num_cpus; i++) { int cpu = cpu_logical_map(i), j; Index: linux/fs/proc/proc_tty.c diff -u linux/fs/proc/proc_tty.c:1.1.1.3 ASPcomplete/linux/fs/proc/proc_tty.c:1.2 --- linux/fs/proc/proc_tty.c:1.1.1.3 Mon Aug 14 19:41:32 2000 +++ linux/fs/proc/proc_tty.c Sun Jun 18 13:04:04 2000 @@ -15,6 +15,7 @@ #include extern struct tty_driver *tty_drivers; /* linked list of tty drivers */ +extern rwlock_t tty_driver_guard; extern struct tty_ldisc ldiscs[]; @@ -40,6 +41,7 @@ char range[20], deftype[20]; char *type; + read_lock( &tty_driver_guard ); for (p = tty_drivers; p; p = p->next) { if (p->num > 1) sprintf(range, "%d-%d", p->minor_start, @@ -89,6 +91,7 @@ len = 0; } } + read_unlock( &tty_driver_guard ); if (!p) *eof = 1; if (off >= len+begin) Index: linux/include/asm-alpha/pgalloc.h diff -u 
linux/include/asm-alpha/pgalloc.h:1.1.1.5 ASPcomplete/linux/include/asm-alpha/pgalloc.h:1.4 --- linux/include/asm-alpha/pgalloc.h:1.1.1.5 Mon Aug 14 19:43:37 2000 +++ linux/include/asm-alpha/pgalloc.h Sat Aug 12 19:36:41 2000 @@ -384,7 +384,7 @@ pgd_t *pgd; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (!p->mm) continue; *pgd_offset(p->mm,address) = entry; Index: linux/include/asm-arm/pgalloc.h diff -u linux/include/asm-arm/pgalloc.h:1.1.1.3 ASPcomplete/linux/include/asm-arm/pgalloc.h:1.2 --- linux/include/asm-arm/pgalloc.h:1.1.1.3 Mon Aug 14 19:43:42 2000 +++ linux/include/asm-arm/pgalloc.h Sun Jun 18 13:04:04 2000 @@ -191,7 +191,7 @@ struct task_struct * p; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (!p->mm) continue; *pgd_offset(p->mm,address) = entry; Index: linux/include/asm-arm/proc-armo/cache.h diff -u linux/include/asm-arm/proc-armo/cache.h:1.1.1.4 ASPcomplete/linux/include/asm-arm/proc-armo/cache.h:1.3 --- linux/include/asm-arm/proc-armo/cache.h:1.1.1.4 Mon Aug 14 19:43:43 2000 +++ linux/include/asm-arm/proc-armo/cache.h Mon Aug 7 19:08:00 2000 @@ -31,7 +31,7 @@ struct task_struct *p; cpu_memc_update_all(init_mm.pgd); - for_each_task(p) { + for_each_task_all(p) { if (!p->mm) continue; cpu_memc_update_all(p->mm->pgd); Index: linux/include/asm-i386/pgalloc-2level.h diff -u linux/include/asm-i386/pgalloc-2level.h:1.1.1.3 ASPcomplete/linux/include/asm-i386/pgalloc-2level.h:1.2 --- linux/include/asm-i386/pgalloc-2level.h:1.1.1.3 Mon Aug 14 19:43:38 2000 +++ linux/include/asm-i386/pgalloc-2level.h Tue Jul 25 11:52:10 2000 @@ -12,12 +12,16 @@ extern __inline__ void free_pmd_fast(pmd_t *pmd) { } extern __inline__ void free_pmd_slow(pmd_t *pmd) { } +extern __inline__ void free_pmd_slow_kernel(pmd_t *pmd) { } -extern inline pmd_t * pmd_alloc(pgd_t *pgd, unsigned long address) +extern inline pmd_t * pmd_alloc_kernel(pgd_t *pgd, unsigned long address) { if (!pgd) BUG(); return (pmd_t *) pgd; } +#define pmd_free_kernel(pmd) free_pmd_slow(pmd) +#define pmd_alloc(bc, pgd, address) pmd_alloc_kernel(pgd, address) +#define pmd_free(bc, pmd) pmd_free_kernel(pmd) #endif /* _I386_PGALLOC_2LEVEL_H */ Index: linux/include/asm-i386/pgalloc-3level.h diff -u linux/include/asm-i386/pgalloc-3level.h:1.1.1.3 ASPcomplete/linux/include/asm-i386/pgalloc-3level.h:1.2 --- linux/include/asm-i386/pgalloc-3level.h:1.1.1.3 Mon Aug 14 19:43:39 2000 +++ linux/include/asm-i386/pgalloc-3level.h Tue Jul 25 11:52:10 2000 @@ -37,13 +37,19 @@ pgtable_cache_size++; } -extern __inline__ void free_pmd_slow(pmd_t *pmd) +extern __inline__ void free_pmd_slow_kernel(pmd_t *pmd) { free_page((unsigned long)pmd); } -extern inline pmd_t * pmd_alloc(pgd_t *pgd, unsigned long address) +extern __inline__ void free_pmd_slow(struct user_beancounter *bc, pmd_t *pmd) { + free_page((unsigned long)pmd); + uncharge_kmem(bc, PAGE_SIZE); +} + +extern inline pmd_t * pmd_alloc_kernel(pgd_t *pgd, unsigned long address) +{ if (!pgd) BUG(); address = (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); @@ -64,5 +70,39 @@ } return (pmd_t *)pgd_page(*pgd) + address; } + +extern inline pmd_t * pmd_alloc(struct user_beancounter *bc, pgd_t *pgd, unsigned long address) +{ + if (!pgd) + BUG(); + address = (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); + if (pgd_none(*pgd)) { + pmd_t *page; + + if (charge_kmem(bc, PAGE_SIZE, 0)) + return NULL; + + page = get_pmd_fast(); + if (!page) + page = get_pmd_slow(); + if (page) { + if (pgd_none(*pgd)) { + set_pgd(pgd, __pgd(1 + __pa(page))); + __flush_tlb(); + return 
page + address; + } else { + free_pmd_fast(page); + uncharge_kmem(bc, PAGE_SIZE); + } + } else { + uncharge_kmem(bc, PAGE_SIZE); + return NULL; + } + } + return (pmd_t *)pgd_page(*pgd) + address; +} + +#define pmd_free_kernel free_pmd_slow_kernel +#define pmd_free free_pmd_slow #endif /* _I386_PGALLOC_3LEVEL_H */ Index: linux/include/asm-i386/pgalloc.h diff -u linux/include/asm-i386/pgalloc.h:1.1.1.4 ASPcomplete/linux/include/asm-i386/pgalloc.h:1.4 --- linux/include/asm-i386/pgalloc.h:1.1.1.4 Mon Aug 14 19:43:38 2000 +++ linux/include/asm-i386/pgalloc.h Mon Aug 7 19:08:00 2000 @@ -65,8 +65,28 @@ free_page((unsigned long)pgd); } -extern pte_t *get_pte_slow(pmd_t *pmd, unsigned long address_preadjusted); -extern pte_t *get_pte_kernel_slow(pmd_t *pmd, unsigned long address_preadjusted); +extern __inline__ pgd_t *pgd_alloc(struct user_beancounter *bc) +{ + pgd_t *ret; + + if (charge_kmem(bc, PAGE_SIZE, 1)) + return NULL; + ret = get_pgd_fast(); + if (ret == NULL) + uncharge_kmem(bc, PAGE_SIZE); + return ret; +} + +extern __inline__ void pgd_free(struct user_beancounter *bc, pgd_t *pgd) +{ + free_pgd_slow(pgd); + uncharge_kmem(bc, PAGE_SIZE); +} + +extern pte_t *get_pte_slow(struct user_beancounter *bc, pmd_t *pmd, + unsigned long address_preadjusted); +extern pte_t *get_pte_kernel_slow(pmd_t *pmd, + unsigned long address_preadjusted); extern __inline__ pte_t *get_pte_fast(void) { @@ -80,23 +100,28 @@ return (pte_t *)ret; } -extern __inline__ void free_pte_fast(pte_t *pte) +extern __inline__ void free_pte_fast(struct user_beancounter *bc, pte_t *pte) { *(unsigned long *)pte = (unsigned long) pte_quicklist; pte_quicklist = (unsigned long *) pte; pgtable_cache_size++; + uncharge_kmem(bc, PAGE_SIZE); } -extern __inline__ void free_pte_slow(pte_t *pte) +extern __inline__ void free_pte_slow_kernel(pte_t *pte) { free_page((unsigned long)pte); } -#define pte_free_kernel(pte) free_pte_slow(pte) -#define pte_free(pte) free_pte_slow(pte) -#define pgd_free(pgd) free_pgd_slow(pgd) -#define pgd_alloc() get_pgd_fast() +#define pte_free_kernel(pte) free_pte_slow_kernel(pte) +#define pte_free(bc, pte) free_pte_slow(bc, pte) +extern __inline__ void free_pte_slow(struct user_beancounter *bc, pte_t *pte) +{ + free_page((unsigned long)pte); + uncharge_kmem(bc, PAGE_SIZE); +} + extern inline pte_t * pte_alloc_kernel(pmd_t * pmd, unsigned long address) { if (!pmd) @@ -117,7 +142,7 @@ return (pte_t *) pmd_page(*pmd) + address; } -extern inline pte_t * pte_alloc(pmd_t * pmd, unsigned long address) +extern inline pte_t * pte_alloc(struct user_beancounter *bc, pmd_t * pmd, unsigned long address) { address = (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); @@ -128,10 +153,13 @@ return (pte_t *)pmd_page(*pmd) + address; getnew: { - unsigned long page = (unsigned long) get_pte_fast(); + unsigned long page; + if (charge_kmem(bc, PAGE_SIZE, 0)) + return NULL; + page = (unsigned long) get_pte_fast(); if (!page) - return get_pte_slow(pmd, address); + return get_pte_slow(bc, pmd, address); set_pmd(pmd, __pmd(_PAGE_TABLE + __pa(page))); return (pte_t *)page + address; } @@ -140,16 +168,6 @@ return NULL; } -/* - * allocating and freeing a pmd is trivial: the 1-entry pmd is - * inside the pgd, so has no extra memory associated with it. - * (In the PAE case we free the page.) 
- */ -#define pmd_free(pmd) free_pmd_slow(pmd) - -#define pmd_free_kernel pmd_free -#define pmd_alloc_kernel pmd_alloc - extern int do_check_pgt_cache(int, int); extern inline void set_pgdir(unsigned long address, pgd_t entry) @@ -161,7 +179,7 @@ #endif read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (!p->mm) continue; *pgd_offset(p->mm,address) = entry; Index: linux/include/asm-i386/unistd.h diff -u linux/include/asm-i386/unistd.h:1.1.1.3 ASPcomplete/linux/include/asm-i386/unistd.h:1.6 --- linux/include/asm-i386/unistd.h:1.1.1.3 Mon Aug 14 19:43:38 2000 +++ linux/include/asm-i386/unistd.h Tue Jul 25 11:52:10 2000 @@ -225,6 +225,14 @@ #define __NR_mincore 218 #define __NR_madvise 219 #define __NR_madvise1 219 /* delete when C lib stub is removed */ +#define __NR_getluid 220 +#define __NR_setluid 221 +#define __NR_setublimit 222 +#define __NR_env_create 223 +#define __NR_mark_env_to_down 224 +#define __NR_setdevperms 225 +#define __NR_ve_conf_request 226 + /* user-visible error numbers are in the range -1 - -124: see */ Index: linux/include/asm-mips/pgalloc.h diff -u linux/include/asm-mips/pgalloc.h:1.1.1.3 ASPcomplete/linux/include/asm-mips/pgalloc.h:1.2 --- linux/include/asm-mips/pgalloc.h:1.1.1.3 Mon Aug 14 19:43:40 2000 +++ linux/include/asm-mips/pgalloc.h Sun Jun 18 13:04:04 2000 @@ -195,7 +195,7 @@ #endif read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (!p->mm) continue; *pgd_offset(p->mm,address) = entry; Index: linux/include/asm-ppc/pgalloc.h diff -u linux/include/asm-ppc/pgalloc.h:1.1.1.3 ASPcomplete/linux/include/asm-ppc/pgalloc.h:1.2 --- linux/include/asm-ppc/pgalloc.h:1.1.1.3 Mon Aug 14 19:43:43 2000 +++ linux/include/asm-ppc/pgalloc.h Sun Jun 18 13:04:04 2000 @@ -60,7 +60,7 @@ #endif read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (!p->mm) continue; *pgd_offset(p->mm,address) = entry; Index: linux/include/asm-sh/pgalloc.h diff -u linux/include/asm-sh/pgalloc.h:1.1.1.3 ASPcomplete/linux/include/asm-sh/pgalloc.h:1.2 --- linux/include/asm-sh/pgalloc.h:1.1.1.3 Mon Aug 14 19:43:35 2000 +++ linux/include/asm-sh/pgalloc.h Sun Jun 18 13:04:04 2000 @@ -147,7 +147,7 @@ pgd_t *pgd; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (!p->mm) continue; *pgd_offset(p->mm,address) = entry; Index: linux/include/linux/beancounter.h diff -uN /dev/null linux/include/linux/beancounter.h:1.2 --- /dev/null Mon Aug 14 20:39:33 2000 +++ linux/include/linux/beancounter.h Tue Aug 1 15:30:45 2000 @@ -0,0 +1,202 @@ +#ifndef _LINUX_BEANCOUNTER_H +#define _LINUX_BEANCOUNTER_H + +/* + * Resource list. + */ + +#define UB_KMEMSIZE 0 /* Unswappable kernel memory size including + struct task, page directories, etc. + Still unimplemented are: + socket buffers, files, temporary + space for poll, siginfo + */ +#define UB_LOCKEDPAGES 1 /* Mlock()ed pages. */ +#define UB_TOTVMPAGES 2 /* Address space size in pages. */ +#define UB_SHMPAGES 3 /* IPC SHM segment size. */ +#define UB_ZSHMPAGES 4 /* Anonymous shared memory. */ +#define UB_NUMPROC 5 /* Number of processes. */ +#define UB_RESPAGES 6 /* All resident pages, for swapout guarantee. */ +#define UB_SPCGUARPAGES 7 /* Guarantees for address space allocations. + Only barrier is used, no accounting. + */ +#define UB_OOMGUARPAGES 8 /* Guarantees against OOM kill. + Only limit is used, no accounting. + */ +#define UB_NUMSOCK 9 /* Number of sockets. */ +#define UB_NUMFLOCK 10 /* Number of file locks. */ +#define UB_NUMPTY 11 /* Number of PTYs. 
*/ +#define UB_NUMSIGINFO 12 /* Number of siginfos. */ +#define UB_RESOURCES 13 + + +#ifdef __KERNEL__ + +#include +#include + +/* + * UB_MAXVALUE is essentially LONG_MAX declared in a cross-compiling safe form. + */ +#define UB_MAXVALUE ( (1UL << (sizeof(unsigned long)*8-1)) - 1) + + +/* + * Resource management structures + * Serialization issues: + * beancounter list management is protected via per hash entry lock + * task pointers and luid are set only for current task and only once + * refcount is managed atomically + * value and limit comparison and change are protected by per-node spinlock + */ + +struct user_beancounter +{ + atomic_t ub_refcount; + struct user_beancounter *ub_next; + spinlock_t ub_lock; + uid_t ub_uid; + /* dynamic swap-out priority */ + signed long ub_swp_pri; + /* total weight of resident pages, see mm/kubd.c */ + unsigned long long ub_held_pages; + /* consumed resources */ + unsigned long ub_held[UB_RESOURCES]; + /* maximum amount of consumed resources through the whole lifetime */ + unsigned long ub_maxheld[UB_RESOURCES]; /* Max held resources. */ + /* A barrier over which resource allocations are failed gracefully. + * If the amount of consumed memory is over the barrier further sbrk() + * or mmap() calls fail, the existing processes are not killed. */ + unsigned long ub_barrier[UB_RESOURCES]; + /* hard resource limit */ + unsigned long ub_limit[UB_RESOURCES]; +}; + + +#ifndef CONFIG_USER_RESOURCE + +extern inline void get_beancounter(struct user_beancounter *ub) {;} +extern inline void put_beancounter(struct user_beancounter *ub) {;} + +#define UB_DECLARE_CHARGE(name, dclargs, realargs) \ +extern inline int charge_##name dclargs { return 0; } +#define UB_DECLARE_UNCHARGE(name, dclargs, realargs) \ +extern inline void uncharge_##name dclargs { ; } + +#else + +/* + * NULL beancounter is handled separately for better performance of tasks + * without a beancounter. But I'm not sure how much it gains in real life. 
+ * --SAW + */ + +extern void __put_beancounter(struct user_beancounter *ub); + +extern struct task_struct *select_worst_task(void); + +extern inline void put_beancounter(struct user_beancounter *ub) +{ + if (ub == NULL) + return; + __put_beancounter(ub); +} + +/* + * Create a new beancounter reference + */ + +extern inline void get_beancounter(struct user_beancounter *ub) +{ + if (ub == NULL) + return; +#if 0 + printk("get beancounter %p for %.20s pid %d\n", ub, current->comm, current->pid); +#endif + atomic_inc(&ub->ub_refcount); +} + +#define UB_DECLARE_CHARGE(name, dclargs, realargs) \ +extern int __charge_##name dclargs; \ + \ +extern inline int charge_##name dclargs \ +{ \ + if (ub == NULL) \ + return 0; \ + return __charge_##name realargs; \ +} + +#define UB_DECLARE_UNCHARGE(name, dclargs, realargs) \ +extern void __uncharge_##name dclargs; \ + \ +extern inline void uncharge_##name dclargs \ +{ \ + if (ub == NULL) \ + return; \ + __uncharge_##name realargs; \ +} + +#endif /* CONFIG_USER_RESOURCE */ + + +/* + * Resource charging + * Change user's account and compare against limits + */ + +UB_DECLARE_CHARGE(task, (struct user_beancounter *ub), (ub)) +UB_DECLARE_UNCHARGE(task, (struct user_beancounter *ub), (ub)) +UB_DECLARE_CHARGE(memory, + (struct user_beancounter *ub, unsigned long size, unsigned vm_flags, + int strict), + (ub, size, vm_flags, strict)) +UB_DECLARE_UNCHARGE(memory, + (struct user_beancounter *ub, unsigned long size, unsigned vm_flags), + (ub, size, vm_flags)) +UB_DECLARE_CHARGE(locked_mem, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_UNCHARGE(locked_mem, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_CHARGE(kmem, + (struct user_beancounter *ub, unsigned long size, int strict), + (ub, size, strict)) +UB_DECLARE_UNCHARGE(kmem, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_CHARGE(shmpages, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_UNCHARGE(shmpages, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_CHARGE(sock, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_UNCHARGE(sock, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_CHARGE(flock, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_UNCHARGE(flock, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_CHARGE(pty, + (struct user_beancounter *ub), + (ub)) +UB_DECLARE_UNCHARGE(pty, + (struct user_beancounter *ub), + (ub)) +UB_DECLARE_CHARGE(siginfo, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) +UB_DECLARE_UNCHARGE(siginfo, + (struct user_beancounter *ub, unsigned long size), + (ub, size)) + +#undef UB_DECLARE + +#endif /* __KERNEL__ */ +#endif /* _LINUX_BEANCOUNTER_H */ Index: linux/include/linux/capability.h diff -u linux/include/linux/capability.h:1.1.1.4 ASPcomplete/linux/include/linux/capability.h:1.4 --- linux/include/linux/capability.h:1.1.1.4 Mon Aug 14 19:43:34 2000 +++ linux/include/linux/capability.h Wed Jul 19 15:30:50 2000 @@ -273,6 +273,14 @@ #define CAP_MKNOD 27 +/* Allow access to all information. 
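For reference, each UB_DECLARE_CHARGE() line above generates an inline, NULL-tolerant wrapper around the corresponding __charge_*() function, so call sites never special-case unaccounted tasks. The kmem pair, for instance, expands to approximately this (whitespace added for readability):

	/* Approximate expansion of UB_DECLARE_CHARGE(kmem, ...) with
	 * CONFIG_USER_RESOURCE enabled.
	 */
	extern int __charge_kmem(struct user_beancounter *ub,
				 unsigned long size, int strict);

	extern inline int charge_kmem(struct user_beancounter *ub,
				      unsigned long size, int strict)
	{
		if (ub == NULL)		/* task has no beancounter: */
			return 0;	/* succeed, charge nothing */
		return __charge_kmem(ub, size, strict);
	}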
Otherwise some structures will be + hidden so that different Virtual Environments on the same + node cannot interact */ + +#ifdef CONFIG_VE +#define CAP_SETVEID 28 +#endif + #ifdef __KERNEL__ /* * Bounding set @@ -337,9 +345,15 @@ #define cap_issubset(a,set) (!(cap_t(a) & ~cap_t(set))) #define cap_clear(c) do { cap_t(c) = 0; } while(0) + +#ifndef CONFIG_VE #define cap_set_full(c) do { cap_t(c) = ~0; } while(0) -#define cap_mask(c,mask) do { cap_t(c) &= cap_t(mask); } while(0) +#else +#define cap_set_full(c) \ + do {cap_t(c) = is_super_ve(current) ? ~0 : current->envid->cap_default; } while(0) +#endif +#define cap_mask(c,mask) do { cap_t(c) &= cap_t(mask); } while(0) #define cap_is_fs_cap(c) (CAP_TO_MASK(c) & CAP_FS_MASK) #endif /* __KERNEL__ */ Index: linux/include/linux/fs.h diff -u linux/include/linux/fs.h:1.1.1.11 ASPcomplete/linux/include/linux/fs.h:1.9 --- linux/include/linux/fs.h:1.1.1.11 Mon Aug 14 19:43:32 2000 +++ linux/include/linux/fs.h Sat Aug 12 19:36:41 2000 @@ -462,10 +462,17 @@ __mark_inode_dirty(inode); } +#ifdef CONFIG_VE_SIGIO +struct ve_struct; +#endif + struct fown_struct { int pid; /* pid or -pgrp where SIGIO should be sent */ uid_t uid, euid; /* uid/euid of process setting the owner */ int signum; /* posix.1b rt signal to be delivered on IO */ +#ifdef CONFIG_VE_SIGIO + struct ve_struct *envid; +#endif }; struct file { @@ -511,6 +518,7 @@ */ typedef struct files_struct *fl_owner_t; +struct user_beancounter; struct file_lock { struct file_lock *fl_next; /* singly linked list for this inode */ struct list_head fl_link; /* doubly linked list of all locks */ @@ -531,6 +539,8 @@ union { struct nfs_lock_info nfs_fl; } fl_u; + + struct user_beancounter *charged_bc; }; /* The following constant reflects the upper bound of the file/locking space */ @@ -892,7 +902,7 @@ extern int do_truncate(struct dentry *, loff_t start); extern struct file *filp_open(const char *, int, int); -extern struct file * dentry_open(struct dentry *, struct vfsmount *, int); +extern struct file * dentry_open(struct dentry *, struct vfsmount *, int, int); extern int filp_close(struct file *, fl_owner_t id); extern char * getname(const char *); Index: linux/include/linux/inetdevice.h diff -u linux/include/linux/inetdevice.h:1.1.1.3 ASPcomplete/linux/include/linux/inetdevice.h:1.2 --- linux/include/linux/inetdevice.h:1.1.1.3 Mon Aug 14 19:43:34 2000 +++ linux/include/linux/inetdevice.h Tue Jun 20 19:45:19 2000 @@ -78,7 +78,8 @@ extern void devinet_init(void); extern struct in_device *inetdev_init(struct net_device *dev); extern struct in_device *inetdev_by_index(int); -extern u32 inet_select_addr(const struct net_device *dev, u32 dst, int scope); +extern u32 inet_select_addr(const struct net_device *dev, u32 dst, int scope, + void *envid); extern struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, u32 prefix, u32 mask); extern void inet_forward_change(void); Index: linux/include/linux/interrupt.h diff -u linux/include/linux/interrupt.h:1.1.1.5 ASPcomplete/linux/include/linux/interrupt.h:1.3 --- linux/include/linux/interrupt.h:1.1.1.5 Mon Aug 14 19:43:33 2000 +++ linux/include/linux/interrupt.h Mon Aug 7 19:08:00 2000 @@ -269,4 +269,16 @@ extern int probe_irq_off(unsigned long); /* returns 0 or negative on failure */ extern unsigned int probe_irq_mask(unsigned long); /* returns mask of ISA interrupts */ +/* + * Stuff for correctly detecting the soft irq state of the cpu + */ +#ifdef CONFIG_VE_IRQ +extern atomic_t soft_irq_state[NR_CPUS]; + +#define local_softirq_enter() 
atomic_inc(soft_irq_state+smp_processor_id()) +#define local_softirq_leave() atomic_dec(soft_irq_state+smp_processor_id()) +#define in_interrupt_correct() ({ register int __cpu = smp_processor_id(); \ + ((local_irq_count(__cpu) + atomic_read(soft_irq_state+__cpu))!=0); }) +#endif + #endif Index: linux/include/linux/ipc.h diff -u linux/include/linux/ipc.h:1.1.1.3 ASPcomplete/linux/include/linux/ipc.h:1.2 --- linux/include/linux/ipc.h:1.1.1.3 Mon Aug 14 19:43:32 2000 +++ linux/include/linux/ipc.h Thu Jul 20 19:10:10 2000 @@ -54,6 +54,9 @@ #define IPCMNI 32768 /* <= MAX_INT limit for ipc arrays (including sysctl changes) */ /* used by in-kernel data structures */ +#ifdef CONFIG_VE +struct ve_struct; +#endif struct kern_ipc_perm { key_t key; @@ -63,6 +66,9 @@ gid_t cgid; mode_t mode; unsigned long seq; +#ifdef CONFIG_VE + struct ve_struct *envid; +#endif }; #endif /* __KERNEL__ */ Index: linux/include/linux/mm.h diff -u linux/include/linux/mm.h:1.1.1.6 ASPcomplete/linux/include/linux/mm.h:1.5 --- linux/include/linux/mm.h:1.1.1.6 Mon Aug 14 19:43:32 2000 +++ linux/include/linux/mm.h Mon Aug 7 19:08:00 2000 @@ -10,6 +10,7 @@ #include #include #include +#include extern unsigned long max_mapnr; extern unsigned long num_physpages; @@ -92,6 +93,8 @@ #define VM_DONTCOPY 0x00020000 /* Do not copy this vma on fork */ +#define VM_ANON 0x00040000 /* Anonymous shared memory */ + #define VM_STACK_FLAGS 0x00000177 #define VM_READHINTMASK (VM_SEQ_READ | VM_RAND_READ) @@ -327,6 +330,9 @@ extern struct page * alloc_pages(int gfp_mask, unsigned long order); #endif /* !CONFIG_DISCONTIGMEM */ +#define alloc_page(gfp_mask) \ + alloc_pages(gfp_mask, 0) + #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0) extern unsigned long FASTCALL(__get_free_pages(int gfp_mask, unsigned long order)); @@ -469,7 +475,10 @@ address &= PAGE_MASK; grow = (vma->vm_start - address) >> PAGE_SHIFT; if (vma->vm_end - address > current->rlim[RLIMIT_STACK].rlim_cur || - ((vma->vm_mm->total_vm + grow) << PAGE_SHIFT) > current->rlim[RLIMIT_AS].rlim_cur) + ((vma->vm_mm->total_vm + grow) << PAGE_SHIFT) + > current->rlim[RLIMIT_AS].rlim_cur || + charge_memory(vma->vm_mm->beancounter, vma->vm_start - address, + (vma->vm_flags & VM_LOCKED), 0)) return -ENOMEM; vma->vm_start = address; vma->vm_pgoff -= grow; @@ -506,6 +515,21 @@ #define vmlist_access_unlock(mm) spin_unlock(&mm->page_table_lock) #define vmlist_modify_lock(mm) vmlist_access_lock(mm) #define vmlist_modify_unlock(mm) vmlist_access_unlock(mm) + +/* + * Common MM functions for inclusion in the VFS + * or in other stackable file systems. Some of these + * functions were in linux/mm/ C files. 
+ * + */ +static inline int sync_page(struct page *page) +{ + struct address_space *mapping = page->mapping; + + if (mapping && mapping->a_ops && mapping->a_ops->sync_page) + return mapping->a_ops->sync_page(page); + return 0; +} #endif /* __KERNEL__ */ Index: linux/include/linux/net.h diff -u linux/include/linux/net.h:1.1.1.3 ASPcomplete/linux/include/linux/net.h:1.2 --- linux/include/linux/net.h:1.1.1.3 Mon Aug 14 19:43:32 2000 +++ linux/include/linux/net.h Tue Jul 25 11:52:10 2000 @@ -62,6 +62,7 @@ #define SOCK_ASYNC_WAITDATA 1 #define SOCK_NOSPACE 2 +struct user_beancounter; struct socket { socket_state state; @@ -76,6 +77,7 @@ short type; unsigned char passcred; + struct user_beancounter *beancounter; }; #define SOCK_INODE(S) ((S)->inode) Index: linux/include/linux/netdevice.h diff -u linux/include/linux/netdevice.h:1.1.1.5 ASPcomplete/linux/include/linux/netdevice.h:1.4 --- linux/include/linux/netdevice.h:1.1.1.5 Mon Aug 14 19:43:32 2000 +++ linux/include/linux/netdevice.h Mon Jul 17 15:19:43 2000 @@ -204,6 +204,10 @@ #define NETDEV_BOOT_SETUP_MAX 8 +#ifdef CONFIG_VE_NET +struct ve_struct; +#endif + /* * The DEVICE structure. * Actually, this whole structure is a big mistake. It mixes I/O Index: linux/include/linux/sched.h diff -u linux/include/linux/sched.h:1.1.1.6 ASPcomplete/linux/include/linux/sched.h:1.6 --- linux/include/linux/sched.h:1.1.1.6 Mon Aug 14 19:43:32 2000 +++ linux/include/linux/sched.h Mon Aug 7 19:08:00 2000 @@ -26,6 +26,11 @@ #include #include +#ifdef CONFIG_VE +#include +#include +#endif + /* * cloning flags: */ @@ -190,6 +195,7 @@ /* Number of map areas at which the AVL tree is activated. This is arbitrary. */ #define AVL_MIN_MAP_COUNT 32 +struct user_beancounter; struct mm_struct { struct vm_area_struct * mmap; /* list of VMAs */ struct vm_area_struct * mmap_avl; /* tree of VMAs */ @@ -209,6 +215,7 @@ unsigned long cpu_vm_mask; unsigned long swap_cnt; /* number of pages to swap on next pass */ unsigned long swap_address; + struct user_beancounter * beancounter; /* * This is an architecture-specific pointer: the portable * part of Linux does not know about any segments. 
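With the beancounter pointer embedded in mm_struct, every VM-growth path takes the shape of the expand_stack() hunk above; condensed (example_grow() is a hypothetical caller, not a hunk from this patch):

	/* Condensed form of the expand_stack() change: keep the rlimit
	 * tests, then charge the growth to the mm's beancounter before
	 * committing it.
	 */
	static int example_grow(struct vm_area_struct *vma, unsigned long bytes)
	{
		if (charge_memory(vma->vm_mm->beancounter, bytes,
				  vma->vm_flags & VM_LOCKED, 0))
			return -ENOMEM;	/* over the limit: refuse growth */
		/* ... adjust vma->vm_start and total_vm as before ... */
		return 0;
	}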
@@ -227,7 +234,7 @@ 0, 0, 0, \ 0, 0, 0, 0, \ 0, 0, 0, \ - 0, 0, 0, 0, NULL } + 0, 0, 0, 0, NULL, NULL } struct signal_struct { atomic_t count; @@ -343,7 +350,14 @@ gid_t groups[NGROUPS]; kernel_cap_t cap_effective, cap_inheritable, cap_permitted; int keep_capabilities:1; +/* resource accounting */ struct user_struct *user; + /* Charging object for this task */ + struct user_beancounter *task_bc; + /* The login user id */ + uid_t luid; + /* Charging object corresponding to the login id */ + struct user_beancounter *login_bc; /* limits */ struct rlimit rlim[RLIM_NLIMITS]; unsigned short used_math; @@ -373,6 +387,10 @@ u32 self_exec_id; /* Protection of (de-)allocation: mm, files, fs, tty */ spinlock_t alloc_lock; + +#ifdef CONFIG_VE + struct ve_struct *envid; +#endif }; /* @@ -440,6 +458,9 @@ cap_inheritable: CAP_INIT_INH_SET, \ cap_permitted: CAP_FULL_SET, \ keep_capabilities: 0, \ + task_bc: NULL, \ + luid: (uid_t)-1, \ + login_bc: NULL, \ rlim: INIT_RLIMITS, \ user: INIT_USER, \ comm: "swapper", \ @@ -493,7 +514,10 @@ *p->pidhash_pprev = p->pidhash_next; } -static inline struct task_struct *find_task_by_pid(int pid) +#ifndef CONFIG_VE +#define find_task_by_pid find_task_by_pid_all +#define find_task_by_pid_ve find_task_by_pid_all +static inline struct task_struct *find_task_by_pid_all(int pid) { struct task_struct *p, **htable = &pidhash[pid_hashfn(pid)]; @@ -503,6 +527,72 @@ return p; } +#define do_check_ve_strict(target,owner)1 +#define do_check_ve(target,owner) 1 +#define do_is_super_ve(owner) 1 +#define is_super_ve(owner) 1 +#define check_ve_strict(target,owner) 1 +#define check_ve(target,owner) 1 +#define check_current_ve(arg) 1 + +#else +#include + +#define do_check_ve_strict(target,owner)((owner)==(target)) +#define do_check_ve(target,owner) (do_is_super_ve(owner) || do_check_ve_strict(target,owner)) +#define do_is_super_ve(owner) (!(owner) || !(owner)->veid) +#define is_super_ve(owner) do_is_super_ve((owner)->envid) +#define check_ve_strict(target,owner) do_check_ve_strict((target)->envid,(owner)->envid) +#define check_ve(target,owner) do_check_ve((target)->envid,(owner)->envid) +#define check_current_ve(arg) do_check_current_ve((arg)->envid) +extern struct task_struct *child_reaper; + +static inline int do_check_current_ve( struct ve_struct *target ) +{ +#ifdef CONFIG_VE_IRQ + if( in_interrupt_correct() ) + return -1; +#endif + return do_check_ve(target,current->envid); +} + +static inline struct task_struct *find_task_by_pid_all(int pid) +{ + struct task_struct *p, **htable = &pidhash[pid_hashfn(pid)]; + + if( pid==1 ) + return child_reaper; + for(p = *htable; p && p->pid != pid; p = p->pidhash_next) + ; + + return p; +} +static inline struct task_struct *find_task_by_pid_ve(int pid) +{ + struct task_struct *p, **htable = &pidhash[pid_hashfn(pid)]; + +#ifdef CONFIG_VE_IRQ + if( in_interrupt_correct() ) + return find_task_by_pid_all(pid); +#endif + + if( pid==1 && current->envid ) + return current->envid->init_entry; + for(p = *htable; p && p->pid != pid; p = p->pidhash_next) + ; + + return p ? (check_current_ve(p) ? p : NULL) : NULL; +} + +static inline int is_ve_initialized(void); +static inline int is_ve_initialized(void) +{ + if( in_interrupt_correct() ) + printk("ASSERTION FAILED. Getting current in interrupt\n" ); + return child_reaper && child_reaper->envid ? 1 : 0; +} +#endif + /* per-UID process charging. 
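A sketch of what switching a lookup from find_task_by_pid() to the _ve variant buys (the caller is hypothetical; the pid-1 remapping and the VE filter are exactly the ones defined above):

	/* Illustrative caller: test whether a pid is visible from this VE. */
	static int example_pid_visible(int pid)
	{
		struct task_struct *p;
		int visible;

		read_lock(&tasklist_lock);
		p = find_task_by_pid_ve(pid);	/* pid 1 maps to this VE's init;
						 * tasks in other VEs yield NULL */
		visible = (p != NULL);
		read_unlock(&tasklist_lock);
		return visible;
	}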
*/ extern struct user_struct * alloc_uid(uid_t); extern void free_uid(struct user_struct *); @@ -668,7 +758,7 @@ /* * Routines for handling mm_structs */ -extern struct mm_struct * mm_alloc(void); +extern struct mm_struct * mm_alloc(struct user_beancounter *); extern struct mm_struct * start_lazy_tlb(void); extern void end_lazy_tlb(struct mm_struct *mm); @@ -788,8 +878,18 @@ (p)->p_pptr->p_cptr = p; \ } while (0) -#define for_each_task(p) \ +#ifndef CONFIG_VE +#define for_each_task for_each_task_all +#define for_each_task_ve for_each_task_all +#define for_each_task_all(p) \ + for (p = &init_task ; (p = p->next_task) != &init_task ; ) +#else +#define for_each_task_all(p) \ for (p = &init_task ; (p = p->next_task) != &init_task ; ) +#define for_each_task_ve(p) \ + for (p = &init_task ; (p = p->next_task) != &init_task ; ) \ + if( check_current_ve(p) ) +#endif static inline void del_from_runqueue(struct task_struct * p) Index: linux/include/linux/signal.h diff -u linux/include/linux/signal.h:1.1.1.3 ASPcomplete/linux/include/linux/signal.h:1.2 --- linux/include/linux/signal.h:1.1.1.3 Mon Aug 14 19:43:32 2000 +++ linux/include/linux/signal.h Tue Aug 1 15:30:45 2000 @@ -9,10 +9,12 @@ * Real Time signals may be queued. */ +struct user_beancounter; struct signal_queue { struct signal_queue *next; siginfo_t info; + struct user_beancounter *charged_bc; }; /* Index: linux/include/linux/skbuff.h diff -u linux/include/linux/skbuff.h:1.1.1.5 ASPcomplete/linux/include/linux/skbuff.h:1.3 --- linux/include/linux/skbuff.h:1.1.1.5 Mon Aug 14 19:43:33 2000 +++ linux/include/linux/skbuff.h Mon Jul 17 15:19:43 2000 @@ -57,6 +57,9 @@ spinlock_t lock; }; +#ifdef CONFIG_VE_NET +struct ve_struct; +#endif struct sk_buff { /* These two members must be first. */ struct sk_buff * next; /* Next buffer in list */ @@ -145,6 +148,9 @@ #ifdef CONFIG_NET_SCHED __u32 tc_index; /* traffic control index */ +#endif +#ifdef CONFIG_VE_NET + struct ve_struct *envid; #endif }; Index: linux/include/linux/tty.h diff -u linux/include/linux/tty.h:1.1.1.4 ASPcomplete/linux/include/linux/tty.h:1.2 --- linux/include/linux/tty.h:1.1.1.4 Mon Aug 14 19:43:32 2000 +++ linux/include/linux/tty.h Tue Aug 1 15:30:45 2000 @@ -256,6 +256,7 @@ * the size of this structure, and it needs to be done with care. 
* - TYT, 9/14/92 */ +struct user_beancounter; struct tty_struct { int magic; struct tty_driver driver; @@ -306,6 +307,7 @@ unsigned int canon_column; struct semaphore atomic_read; spinlock_t read_lock; + struct user_beancounter *charged_bc; }; /* tty magic number */ @@ -332,6 +334,7 @@ #define TTY_HW_COOK_IN 15 #define TTY_PTY_LOCK 16 #define TTY_NO_WRITE_SPLIT 17 +#define TTY_BEANCOUNTER_CHARGED 18 #define TTY_WRITE_FLUSH(tty) tty_write_flush((tty)) Index: linux/include/linux/tty_driver.h diff -u linux/include/linux/tty_driver.h:1.1.1.3 ASPcomplete/linux/include/linux/tty_driver.h:1.2 --- linux/include/linux/tty_driver.h:1.1.1.3 Mon Aug 14 19:43:33 2000 +++ linux/include/linux/tty_driver.h Sun Jun 18 13:04:04 2000 @@ -176,6 +176,10 @@ */ struct tty_driver *next; struct tty_driver *prev; + +#ifdef CONFIG_VE + struct ve_struct *envid; +#endif /* CONFIG_VE */ }; /* tty driver magic number */ Index: linux/include/linux/ubhash.h diff -uN /dev/null linux/include/linux/ubhash.h:1.1 --- /dev/null Mon Aug 14 20:39:33 2000 +++ linux/include/linux/ubhash.h Tue Aug 1 15:30:45 2000 @@ -0,0 +1,23 @@ +#ifndef _LINUX_UBHASH_H +#define _LINUX_UBHASH_H + +#ifdef __KERNEL__ + +#include + +#define UB_HASH_SIZE 256 +#define ub_hash_fun(x) ( ( ((x) >> 8) ^ (x) ) & (UB_HASH_SIZE - 1) ) + +struct ub_hash_slot { + spinlock_t ubh_lock; + struct user_beancounter *ubh_beans; +} ub_hash[UB_HASH_SIZE]; + +#define lock_beancounters(slot, flags) \ + spin_lock_irqsave(&slot->ubh_lock, flags) + +#define unlock_beancounters(slot, flags) \ + spin_unlock_irqrestore(&slot->ubh_lock, flags) + +#endif /* __KERNEL__ */ +#endif /* _LINUX_UBHASH_H */ Index: linux/include/linux/utsname.h diff -u linux/include/linux/utsname.h:1.1.1.3 ASPcomplete/linux/include/linux/utsname.h:1.2 --- linux/include/linux/utsname.h:1.1.1.3 Mon Aug 14 19:43:33 2000 +++ linux/include/linux/utsname.h Fri Jul 7 22:07:33 2000 @@ -1,6 +1,8 @@ #ifndef _LINUX_UTSNAME_H #define _LINUX_UTSNAME_H +#include + #define __OLD_UTS_LEN 8 struct oldold_utsname { @@ -30,7 +32,12 @@ char domainname[65]; }; -extern struct new_utsname system_utsname; +#ifndef CONFIG_VE +#define system_utsname _system_utsname +extern struct new_utsname _system_utsname; +#else +extern struct new_utsname _system_utsname; +#endif extern struct rw_semaphore uts_sem; #endif Index: linux/include/linux/ve.h diff -uN /dev/null linux/include/linux/ve.h:1.12 --- /dev/null Mon Aug 14 20:39:33 2000 +++ linux/include/linux/ve.h Fri Aug 11 19:48:57 2000 @@ -0,0 +1,89 @@ +#ifndef _LINUX_VE_H +#define _LINUX_VE_H + +#ifdef CONFIG_VE + +#include +#include +#include + +#define VE_CREATE 1 +#define VE_EXCLUSIVE 2 +#define VE_ENTER 4 +#define VE_TEST 8 + +typedef __u32 envid_t; + +#ifdef __KERNEL__ +struct tty_driver; +struct task_struct; +struct new_utsname; + +struct pty_struct { + int magic; + wait_queue_head_t open_wait; +}; + +struct ve_struct +{ + envid_t veid; + struct task_struct *init_entry; + kernel_cap_t cap_default; + atomic_t pcounter; + struct semaphore init_exit_guard; + struct new_utsname _system_utsname; + +/* VE's root */ + struct vfsmount *fs_rootmnt; + struct dentry *fs_root; + +/* BSD pty's */ + int pty_refcount; + struct tty_driver *pty_driver; + struct tty_driver *pty_slave_driver; + +#ifdef CONFIG_UNIX98_PTYS + struct tty_driver *ptm_driver[UNIX98_NR_MAJORS]; + struct tty_driver *pts_driver[UNIX98_NR_MAJORS]; +#endif + +/* Bound virtual network interface */ + u32 ip; + + struct ve_struct *prev; + struct ve_struct *next; +}; + +#define ve_offsetof(arg) ((void*)offsetof( struct 
ve_struct,_##arg )) + +unsigned char get_device_perms_ve( envid_t envid, int type, kdev_t dev ); +void do_env_cleanup( struct ve_struct *envid ); +void do_mark_env_to_down( struct ve_struct *ptr ); + +#ifdef CONFIG_VE_NET +extern struct ve_struct *get_ve_by_ip(u32 addr); +#endif +#ifdef CONFIG_VE +#define system_utsname (current->envid->_system_utsname) +#endif +#endif /* __KERNEL__ */ + +#define VELINK 12 + +#define VE_CONF_MSG 0x11 +#define VE_REGISTER 0x12 + +struct ve_conf_request +{ + envid_t veid; + ulong id; + int answer; + char data[256]; +}; + +#else /* CONFIG_VE */ + +#define offsetof(arg) (&arg) + +#endif /* CONFIG_VE */ +#endif /* _LINUX_VE_H */ Index: linux/include/net/ip_fib.h diff -u linux/include/net/ip_fib.h:1.1.1.3 ASPcomplete/linux/include/net/ip_fib.h:1.4 --- linux/include/net/ip_fib.h:1.1.1.3 Mon Aug 14 19:43:31 2000 +++ linux/include/net/ip_fib.h Mon Jul 31 12:41:59 2000 @@ -93,6 +93,9 @@ #ifdef CONFIG_IP_MULTIPLE_TABLES struct fib_rule *r; #endif +#ifdef CONFIG_VE_NET + struct ve_struct *envid; +#endif }; @@ -108,13 +111,21 @@ #endif /* CONFIG_IP_ROUTE_MULTIPATH */ +#ifndef CONFIG_VE_NET #define FIB_RES_PREFSRC(res) ((res).fi->fib_prefsrc ? : __fib_res_prefsrc(&res)) +#else +#define FIB_RES_PREFSRC(res) __fib_res_prefsrc(&res) +#endif #define FIB_RES_GW(res) (FIB_RES_NH(res).nh_gw) #define FIB_RES_DEV(res) (FIB_RES_NH(res).nh_dev) #define FIB_RES_OIF(res) (FIB_RES_NH(res).nh_oif) struct fib_table { +#ifdef CONFIG_VE_ROUTE + struct ve_struct *envid; + unsigned char allow_read; +#endif unsigned char tb_id; unsigned tb_stamp; int (*tb_lookup)(struct fib_table *tb, const struct rt_key *key, struct fib_result *res); @@ -203,7 +214,8 @@ extern int inet_rtm_getroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg); extern int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb); extern int fib_validate_source(u32 src, u32 dst, u8 tos, int oif, - struct net_device *dev, u32 *spec_dst, u32 *itag); + struct net_device *dev, u32 *spec_dst, u32 *itag, + void *envid); extern void fib_select_multipath(const struct rt_key *key, struct fib_result *res); /* Exported by fib_semantics.c */ Index: linux/include/net/raw.h diff -u linux/include/net/raw.h:1.1.1.3 ASPcomplete/linux/include/net/raw.h:1.3 --- linux/include/net/raw.h:1.1.1.3 Mon Aug 14 19:43:31 2000 +++ linux/include/net/raw.h Fri Jul 7 22:07:33 2000 @@ -35,7 +35,7 @@ extern struct sock *__raw_v4_lookup(struct sock *sk, unsigned short num, unsigned long raddr, unsigned long laddr, - int dif); + int dif, struct sk_buff *envid); extern struct sock *raw_v4_input(struct sk_buff *skb, struct iphdr *iph, int hash); Index: linux/include/net/route.h diff -u linux/include/net/route.h:1.1.1.4 ASPcomplete/linux/include/net/route.h:1.5 --- linux/include/net/route.h:1.1.1.4 Mon Aug 14 19:43:31 2000 +++ linux/include/net/route.h Mon Jul 17 15:19:43 2000 @@ -50,8 +50,19 @@ #endif __u8 tos; __u8 scope; +#ifdef CONFIG_VE_NET + struct ve_struct *envid; +#endif }; +#ifdef CONFIG_VE +#define ENVID(arg) (arg ? 
(arg)->envid : NULL) +#define ENVIDP(arg) (arg).envid +#else +#define ENVID(arg) NULL +#define ENVIDP(arg) NULL +#endif + struct inet_peer; struct rtable { @@ -99,7 +110,8 @@ u32 src, u8 tos, struct net_device *dev); extern void ip_rt_advice(struct rtable **rp, int advice); extern void rt_cache_flush(int how); -extern int ip_route_output(struct rtable **, u32 dst, u32 src, u32 tos, int oif); +extern int ip_route_output(struct rtable **, u32 dst, u32 src, u32 tos, int oif, + void *envid); extern int ip_route_input(struct sk_buff*, u32 dst, u32 src, u8 tos, struct net_device *devin); extern unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu); extern void ip_rt_update_pmtu(struct dst_entry *dst, unsigned mtu); @@ -111,7 +123,6 @@ extern void ip_rt_get_source(u8 *src, struct rtable *rt); extern int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb); - extern __inline__ void ip_rt_put(struct rtable * rt) { if (rt) @@ -132,17 +143,21 @@ return ip_tos2prio[IPTOS_TOS(tos)>>1]; } -extern __inline__ int ip_route_connect(struct rtable **rp, u32 dst, u32 src, u32 tos, int oif) +extern __inline__ int ip_route_connect(struct rtable **rp, u32 dst, u32 src, u32 tos, int oif, + void *envid) { int err; - err = ip_route_output(rp, dst, src, tos, oif); + + err = ip_route_output(rp, dst, src, tos, oif, envid); if (err || (dst && src)) return err; + dst = (*rp)->rt_dst; src = (*rp)->rt_src; + ip_rt_put(*rp); *rp = NULL; - return ip_route_output(rp, dst, src, tos, oif); + return ip_route_output(rp, dst, src, tos, oif, envid); } extern void rt_bind_peer(struct rtable *rt, int create); Index: linux/include/net/sock.h diff -u linux/include/net/sock.h:1.1.1.5 ASPcomplete/linux/include/net/sock.h:1.5 --- linux/include/net/sock.h:1.1.1.5 Mon Aug 14 19:43:31 2000 +++ linux/include/net/sock.h Mon Aug 7 19:08:00 2000 @@ -396,6 +396,7 @@ int linger2; }; +struct user_beancounter; /* * This structure really needs to be cleaned up. @@ -463,6 +464,10 @@ init_waitqueue_head(&((__sk)->lock.wq)); \ } while(0); +#ifdef CONFIG_VE_NET +struct ve_struct; +#endif + struct sock { /* Socket demultiplex comparisons on incoming packets. */ __u32 daddr; /* Foreign IPv4 addr */ @@ -488,6 +493,7 @@ socket_lock_t lock; /* Synchronizer... */ int rcvbuf; /* Size of receive buffer in bytes */ + int rcvbuf_charged; /* for resource control */ wait_queue_head_t *sleep; /* Sock wait queue */ struct dst_entry *dst_cache; /* Destination cache */ @@ -500,6 +506,7 @@ __u32 saddr; /* Sending source */ unsigned int allocation; /* Allocation mode */ int sndbuf; /* Size of send buffer in bytes */ + int sndbuf_charged; /* for resource control */ struct sock *prev; /* Not all are volatile, but some are, so we might as well say they all are. @@ -639,6 +646,9 @@ /* Identd and reporting IO signals */ struct socket *socket; + /* Accounting */ + struct user_beancounter *beancounter; + /* RPC layer private data */ void *user_data; @@ -651,6 +661,9 @@ int (*backlog_rcv) (struct sock *sk, struct sk_buff *skb); void (*destruct)(struct sock *sk); +#ifdef CONFIG_VE_NET + struct ve_struct *envid; +#endif }; /* The per-socket spinlock must be held here. 
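All routing entry points now carry a VE tag. A hedged sketch of a connect-time caller using the ip_route_connect() signature and ENVID() helper above (the function and its error handling are illustrative, and it assumes CONFIG_VE_NET so that struct sock carries envid):

	/* Illustrative connect-time lookup: pass the socket's VE so only
	 * routes visible inside that VE can be chosen.
	 */
	static int example_connect_route(struct sock *sk, u32 daddr, u32 saddr)
	{
		struct rtable *rt;
		int err;

		err = ip_route_connect(&rt, daddr, saddr, 0 /* tos */,
				       sk->bound_dev_if, ENVID(sk));
		if (err)
			return err;	/* no route visible in this VE */
		ip_rt_put(rt);
		return 0;
	}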
*/ @@ -1100,6 +1113,9 @@ sock_hold(sk); skb->sk = sk; skb->destructor = sock_wfree; +#ifdef CONFIG_VE_NET + skb->envid = sk->envid; +#endif atomic_add(skb->truesize, &sk->wmem_alloc); } @@ -1107,6 +1123,9 @@ { skb->sk = sk; skb->destructor = sock_rfree; +#ifdef CONFIG_VE_NET + skb->envid = sk->envid; +#endif atomic_add(skb->truesize, &sk->rmem_alloc); } @@ -1115,6 +1134,9 @@ sock_hold(sk); skb->sk = sk; skb->destructor = sock_cfree; +#ifdef CONFIG_VE_NET + skb->envid = sk->envid; +#endif } @@ -1277,5 +1299,7 @@ extern __u32 sysctl_wmem_max; extern __u32 sysctl_rmem_max; +extern __u32 sysctl_wmem_default; +extern __u32 sysctl_rmem_default; #endif /* _SOCK_H */ Index: linux/include/net/tcp.h diff -u linux/include/net/tcp.h:1.1.1.4 ASPcomplete/linux/include/net/tcp.h:1.3 --- linux/include/net/tcp.h:1.1.1.4 Mon Aug 14 19:43:31 2000 +++ linux/include/net/tcp.h Fri Jul 7 22:07:33 2000 @@ -106,7 +106,8 @@ unsigned short snum); extern void tcp_bucket_unlock(struct sock *sk); extern int tcp_port_rover; -extern struct sock *tcp_v4_lookup_listener(u32 addr, unsigned short hnum, int dif); +extern struct sock *tcp_v4_lookup_listener(u32 addr, unsigned short hnum, int dif, + struct sk_buff *skb); /* These are AF independent. */ static __inline__ int tcp_bhashfn(__u16 lport) @@ -158,6 +159,9 @@ struct in6_addr v6_daddr; struct in6_addr v6_rcv_saddr; #endif +#ifdef CONFIG_VE + struct ve_struct *envid; +#endif }; extern kmem_cache_t *tcp_timewait_cachep; @@ -197,18 +201,27 @@ #define TCP_V4_ADDR_COOKIE(__name, __saddr, __daddr) \ __u64 __name = (((__u64)(__daddr))<<32)|((__u64)(__saddr)); #endif /* __BIG_ENDIAN */ -#define TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif)\ +#define _TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif)\ (((*((__u64 *)&((__sk)->daddr)))== (__cookie)) && \ ((*((__u32 *)&((__sk)->dport)))== (__ports)) && \ (!((__sk)->bound_dev_if) || ((__sk)->bound_dev_if == (__dif)))) #else /* 32-bit arch */ #define TCP_V4_ADDR_COOKIE(__name, __saddr, __daddr) -#define TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif)\ +#define _TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->daddr == (__saddr)) && \ ((__sk)->rcv_saddr == (__daddr)) && \ ((*((__u32 *)&((__sk)->dport)))== (__ports)) && \ (!((__sk)->bound_dev_if) || ((__sk)->bound_dev_if == (__dif)))) #endif /* 64-bit arch */ + +#ifndef CONFIG_VE_NET +#define TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif)\ + _TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif) +#else +#define TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif, __skb)\ + (_TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif) && \ + check_ve_strict((__sk),(__skb))) +#endif #define TCP_IPV6_MATCH(__sk, __saddr, __daddr, __ports, __dif) \ (((*((__u32 *)&((__sk)->dport)))== (__ports)) && \ Index: linux/init/main.c diff -u linux/init/main.c:1.1.1.7 ASPcomplete/linux/init/main.c:1.7 --- linux/init/main.c:1.1.1.7 Mon Aug 14 19:41:57 2000 +++ linux/init/main.c Mon Aug 7 19:08:00 2000 @@ -88,6 +88,7 @@ extern void init_modules(void); extern void sock_init(void); extern void fork_init(unsigned long); +extern void beancounter_init(unsigned long); extern void mca_init(void); extern void sbus_init(void); extern void ppc_init(void); @@ -113,6 +114,11 @@ extern void dquot_init_hash(void); #endif +#ifdef CONFIG_VE +extern void init_ve_system(void); +extern void postinit_ve_system(void); +#endif + /* * Boot command-line arguments */ @@ -562,6 +568,7 @@ mempages = 
num_physpages; fork_init(mempages); + beancounter_init(mempages); vfs_caches_init(mempages); vma_init(); buffer_init(mempages); @@ -642,6 +649,10 @@ */ child_reaper = current; +#ifdef CONFIG_VE + init_ve_system(); +#endif + #if defined(CONFIG_MTRR) /* Do this after SMP initialization */ /* * We should probably create some architecture-dependent "fixup after @@ -741,6 +752,10 @@ "error %d\n",error); } } +#endif + +#ifdef CONFIG_VE + postinit_ve_system(); #endif } Index: linux/init/version.c diff -u linux/init/version.c:1.1.1.3 ASPcomplete/linux/init/version.c:1.2 --- linux/init/version.c:1.1.1.3 Mon Aug 14 19:41:57 2000 +++ linux/init/version.c Fri Jul 7 22:07:33 2000 @@ -16,7 +16,7 @@ int version_string(LINUX_VERSION_CODE) = 0; -struct new_utsname system_utsname = { +struct new_utsname _system_utsname = { UTS_SYSNAME, UTS_NODENAME, UTS_RELEASE, UTS_VERSION, UTS_MACHINE, UTS_DOMAINNAME }; Index: linux/ipc/msg.c diff -u linux/ipc/msg.c:1.1.1.3 ASPcomplete/linux/ipc/msg.c:1.7 --- linux/ipc/msg.c:1.1.1.3 Mon Aug 14 19:41:35 2000 +++ linux/ipc/msg.c Thu Jul 20 20:14:49 2000 @@ -896,3 +896,18 @@ return len; } #endif + +#ifdef CONFIG_VE +void ve_msg_ipc_cleanup(void) +{ + int i; + + down(&msg_ids.sem); + for (i = 0; i <= msg_ids.max_id; i++) { + if (!msg_lock (i)) + continue; + freeque( i ); + } + up(&msg_ids.sem); +} +#endif Index: linux/ipc/sem.c diff -u linux/ipc/sem.c:1.1.1.3 ASPcomplete/linux/ipc/sem.c:1.7 --- linux/ipc/sem.c:1.1.1.3 Mon Aug 14 19:41:35 2000 +++ linux/ipc/sem.c Thu Jul 20 20:14:49 2000 @@ -1088,3 +1088,18 @@ return len; } #endif + +#ifdef CONFIG_VE +void ve_sem_ipc_cleanup() +{ + int i; + + down(&sem_ids.sem); + for (i = 0; i <= sem_ids.max_id; i++) { + if (!sem_lock (i)) + continue; + freeary( i ); + } + up(&sem_ids.sem); +} +#endif Index: linux/ipc/shm.c diff -u linux/ipc/shm.c:1.1.1.5 ASPcomplete/linux/ipc/shm.c:1.11 --- linux/ipc/shm.c:1.1.1.5 Mon Aug 14 19:41:35 2000 +++ linux/ipc/shm.c Tue Jul 25 11:52:10 2000 @@ -75,6 +75,7 @@ struct shmid_kernel /* private to the kernel */ { struct kern_ipc_perm shm_perm; + struct user_beancounter *beancounter; size_t shm_segsz; unsigned long shm_nattch; unsigned long shm_npages; /* size of segment (pages) */ @@ -684,6 +685,23 @@ /* Now we set them to the real values */ old_dir = shp->shm_dir; old_pages = shp->shm_npages; + if (new_pages > old_pages) { + if (shp->id == zero_id) + error = charge_memory (shp->beancounter, + new_pages - old_pages, VM_ANON, 1); + else + error = charge_shmpages (shp->beancounter, + new_pages - old_pages); + } else { + if (shp->id == zero_id) + uncharge_memory (shp->beancounter, + old_pages - new_pages, VM_ANON); + else + uncharge_shmpages (shp->beancounter, + old_pages - new_pages); + } + if (error) + goto size_out; if (old_dir){ pte_t *swap; int i,j; @@ -753,6 +771,7 @@ struct shmid_kernel *shp; int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT; int id; + int retval; if (namelen > SHM_NAME_LEN) return -ENAMETOOLONG; @@ -763,12 +782,18 @@ if (shm_tot + numpages >= shm_ctlall) return -ENOSPC; + if ((retval = charge_shmpages(current->login_bc, numpages)) != 0) + return retval; + shp = seg_alloc(numpages, namelen ? 
namelen : SHM_FMT_LEN + 1); - if (IS_ERR(shp)) + if (IS_ERR(shp)) { + uncharge_shmpages(current->login_bc, numpages); return PTR_ERR(shp); + } id = shm_addid(shp); if(id == -1) { seg_free(shp, 1); + uncharge_shmpages(current->login_bc, numpages); return -ENOSPC; } shp->shm_perm.key = key; @@ -779,6 +804,7 @@ shp->shm_atim = shp->shm_dtim = 0; shp->shm_ctim = CURRENT_TIME; shp->id = shm_buildid(id,shp->shm_perm.seq); + shp->beancounter = current->login_bc; if (namelen != 0) { shp->shm_namelen = namelen; memcpy (shp->shm_name, name, namelen); @@ -839,6 +865,7 @@ shp = shm_rmid(shmid); shm_unlock(shmid); up(&shm_ids.sem); + uncharge_shmpages(shp->beancounter, shp->shm_npages); seg_free(shp, 1); clear_inode(ino); } @@ -1047,6 +1074,13 @@ if(err) goto out_unlock; if(cmd==SHM_LOCK) + err = charge_locked_mem(shp->beancounter, + shp->shm_npages); + else + uncharge_locked_mem(shp->beancounter, shp->shm_npages); + if(err) + goto out_unlock; + if(cmd==SHM_LOCK) shp->shm_flags |= PRV_LOCKED; else shp->shm_flags &= ~PRV_LOCKED; @@ -1249,7 +1283,7 @@ err = -ENOENT; if (!dentry->d_inode) goto bad_file; - file = dentry_open(dentry, shm_fs_type.kern_mnt, o_flags); + file = dentry_open(dentry, shm_fs_type.kern_mnt, prot, o_flags); err = PTR_ERR(file); if (IS_ERR (file)) goto bad_file1; @@ -1912,3 +1946,24 @@ return; } +#ifdef CONFIG_VE +void ve_shm_ipc_cleanup() +{ + int i; + + down(&shm_ids.sem); + for (i = 0; i <= shm_ids.max_id; i++) { + struct shmid_kernel *shp; + + if (!(shp = shm_lock (i))) + continue; + if (shp->shm_nattch) + printk(KERN_DEBUG "shm_nattch = %ld\n", shp->shm_nattch); + shp = shm_rmid(i); + shm_unlock(i); + seg_free(shp, 1); + uncharge_shmpages (shp->beancounter, shp->shm_npages); + } + up(&shm_ids.sem); +} +#endif Index: linux/ipc/util.c diff -u linux/ipc/util.c:1.1.1.5 ASPcomplete/linux/ipc/util.c:1.8 --- linux/ipc/util.c:1.1.1.5 Mon Aug 14 19:41:35 2000 +++ linux/ipc/util.c Thu Jul 20 20:14:49 2000 @@ -96,6 +96,8 @@ p = ids->entries[id].p; if(p==NULL) continue; + if( !check_ve_strict( p, current ) ) + continue; if (key == p->key) return id; } @@ -160,6 +162,9 @@ new->cuid = new->uid = current->euid; new->gid = new->cgid = current->egid; +#ifdef CONFIG_VE + new->envid = current->envid; +#endif new->seq = ids->seq++; if(ids->seq > ids->seq_max) @@ -334,6 +339,15 @@ } #endif /* __ia64__ */ + +#ifdef CONFIG_VE +void ve_ipc_cleanup(void) +{ + ve_msg_ipc_cleanup(); + ve_sem_ipc_cleanup(); + ve_shm_ipc_cleanup(); +} +#endif #else /* Index: linux/ipc/util.h diff -u linux/ipc/util.h:1.1.1.4 ASPcomplete/linux/ipc/util.h:1.5 --- linux/ipc/util.h:1.1.1.4 Mon Aug 14 19:41:35 2000 +++ linux/ipc/util.h Thu Jul 20 20:14:49 2000 @@ -12,6 +12,12 @@ void msg_init (void); void shm_init (void); +#ifdef CONFIG_VE +void ve_msg_ipc_cleanup(void); +void ve_sem_ipc_cleanup(void); +void ve_shm_ipc_cleanup(void); +#endif + struct ipc_ids { int size; int in_use; @@ -58,6 +64,8 @@ return NULL; out = ids->entries[lid].p; + if(!out || !check_ve_strict(ids->entries[lid].p,current)) + return NULL; return out; } @@ -74,8 +82,12 @@ spin_lock(&ids->ary); out = ids->entries[lid].p; - if(out==NULL) + + if(out==NULL || !check_ve_strict(ids->entries[lid].p,current)) + { spin_unlock(&ids->ary); + return NULL; + } return out; } Index: linux/kernel/Makefile diff -u linux/kernel/Makefile:1.1.1.5 ASPcomplete/linux/kernel/Makefile:1.4 --- linux/kernel/Makefile:1.1.1.5 Mon Aug 14 19:43:25 2000 +++ linux/kernel/Makefile Mon Aug 7 19:08:00 2000 @@ -10,7 +10,7 @@ O_TARGET := kernel.o O_OBJS = sched.o dma.o fork.o exec_domain.o 
panic.o printk.o \ module.o exit.o itimer.o info.o time.o softirq.o resource.o \ - sysctl.o acct.o capability.o ptrace.o timer.o user.o + sysctl.o acct.o capability.o ptrace.o timer.o user.o beancounter.o OX_OBJS += signal.o sys.o @@ -30,6 +30,13 @@ OX_OBJS += pm.o endif +ifeq ($(CONFIG_VE),y) +OX_OBJS += ve.o +endif + CFLAGS_sched.o := $(PROFILING) -fno-omit-frame-pointer + +# debugging: +CFLAGS_beancounter.o := -fno-omit-frame-pointer include $(TOPDIR)/Rules.make Index: linux/kernel/beancounter.c diff -uN /dev/null linux/kernel/beancounter.c:1.4 --- /dev/null Mon Aug 14 20:39:33 2000 +++ linux/kernel/beancounter.c Tue Aug 8 20:30:43 2000 @@ -0,0 +1,859 @@ +/* + * linux/kernel/beancounter.c + * + * Copyright (C) 1998 Alan Cox + * 1998-2000 Andrey V. Savochkin + * + * TODO: + * - more intelligent limit check in mremap(): currently the new size is + * charged and _then_ old size is uncharged + * - limits on number of file descriptors + * - problem: bad pmd page handling + * - think about unserialized accesses to guarantee fields + * - sizeof(struct inode) and omem for sockets + * - think about sizeof(struct inode) charge in general and limits for number + * of files + * + * Changes: + * 1999/08/17 Marcelo Tosatti + * - Set "barrier" and "limit" parts of limits atomically. + * 1999/10/06 Marcelo Tosatti + * - setublimit system call. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#ifdef CONFIG_USER_RESOURCE_PROC +#include +#endif + +/* + * Various debugging stuff + * 1 creation/destruction + * 2 charge operations + * 4 resource limit change + * 8 proc output + */ +#define UB_DEBUG 0 + + + +#ifdef CONFIG_USER_RESOURCE + + + +extern int max_threads; + +static kmem_cache_t *ub_cachep; + +static struct user_beancounter default_beancounter; + +/* + * Per user resource beancounting. Resources are tied to their + * luid. The resource structure itself is tagged both to the process + * and the charging resources (a socket doesn't want to have to search + * for things at irq time for example). Reference counters keep things + * in hand. + * + * The case where a user creates a resource, kills all his processes and + * then starts new ones is correctly handled this way. The refcounters + * will mean the old entry is still around with resource tied to it. 
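Schematically, the lifecycle this comment describes looks as follows (both helpers are hypothetical; the real call sites are in the fork/exit paths elsewhere in this patch):

	/* Pin the parent's login beancounter for a new holder, and drop
	 * it again when the holder goes away.  Both calls are NULL-safe
	 * (see linux/beancounter.h above).
	 */
	void example_attach(struct task_struct *child)
	{
		child->login_bc = current->login_bc;
		get_beancounter(child->login_bc);
	}

	void example_detach(struct task_struct *child)
	{
		put_beancounter(child->login_bc);	/* frees on last ref */
		child->login_bc = NULL;
	}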
+ */ +static struct user_beancounter *get_beancounter_byuid(uid_t uid, int create) +{ + struct user_beancounter *ub, *walkp; + unsigned long flags; + struct ub_hash_slot *slot = &ub_hash[ub_hash_fun(uid)]; + + ub = (struct user_beancounter *)kmem_cache_alloc(ub_cachep, GFP_KERNEL); + if (ub == NULL) + return NULL; + + lock_beancounters(slot, flags); + + walkp = slot->ubh_beans; + while (walkp != NULL && walkp->ub_uid != uid) + walkp = walkp->ub_next; + + if (walkp == NULL) { + if (create) { +#if UB_DEBUG & 1 + printk(KERN_DEBUG "Creating ub %p in slot %p\n", ub, slot); +#endif + memcpy(ub, &default_beancounter, sizeof(*ub)); + ub->ub_next = slot->ubh_beans; + slot->ubh_beans = ub; + ub->ub_uid = uid; + walkp = ub; + ub = NULL; + } + /* if create isn't set, just return NULL */ + } else { + /* beancounter already exists */ + atomic_inc(&walkp->ub_refcount); + } + + if (ub != NULL) + kmem_cache_free(ub_cachep, ub); + unlock_beancounters(slot, flags); + return walkp; +} + + +void __put_beancounter(struct user_beancounter *ub) +{ + struct user_beancounter **ubptr; + unsigned long flags; + struct ub_hash_slot *slot = &ub_hash[ub_hash_fun(ub->ub_uid)]; + +#if UB_DEBUG & 1 + printk(KERN_DEBUG "__put bc %p (cnt %d) for %.20s pid %d cur %08lx cpu %d.\n", + ub, atomic_read(&ub->ub_refcount), + current->comm, current->pid, + (unsigned long)current, smp_processor_id()); +#endif + lock_beancounters(slot, flags); + + if (!atomic_dec_and_test(&ub->ub_refcount)) { + unlock_beancounters(slot, flags); + return; + } + + /* + * Ok its a has bean.... neither the user nor any user + * charged objects exist. + */ + ubptr = &slot->ubh_beans; + + while (*ubptr != NULL) { + if (*ubptr == ub) { + { + int i; + for (i = 0; i < UB_RESOURCES; i++) + if (ub->ub_held[i]) + printk(KERN_DEBUG "Ub %p helds %lu in %d on destroy\n", + ub, ub->ub_held[i], i); + } + *ubptr = ub->ub_next; + kmem_cache_free(ub_cachep, ub); + unlock_beancounters(slot, flags); + return; + } + ubptr = &((*ubptr)->ub_next); + } +#if UB_DEBUG & 1 + printk(KERN_ERR "Invalid beancounter '%p' passed to free.\n", ub); + printk(KERN_DEBUG "Slot %p.\n", slot); +#endif + unlock_beancounters(slot, flags); +} + + +/* + * Generic resource charging stuff + */ + +static int __charge_beancounter_locked(struct user_beancounter *ub, + int resource, unsigned long val, int strict) +{ + int retval; + +#if UB_DEBUG & 2 + printk(KERN_DEBUG "Charging %lu for %d of %p with %lu\n", + val, resource, ub, ub->ub_held[resource]); +#endif + retval = -ENOMEM; + /* ub_value <= UB_MAXVALUE, value <= UB_MAXVALUE, and only one addition + * at the moment is possible so an overflow is impossible. 
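Concretely, the barrier/limit pair behaves like this (a worked example; the numbers are arbitrary):

	/* With ub_barrier[r] == 100, ub_limit[r] == 120, ub_held[r] == 90:
	 *
	 *   __charge_beancounter(ub, r, 15, 1) -> -ENOMEM (105 > barrier, strict)
	 *   __charge_beancounter(ub, r, 15, 0) -> 0       (105 <= limit)
	 *   __charge_beancounter(ub, r, 40, 0) -> -ENOMEM (130 > limit)
	 *
	 * A failed charge leaves ub_held[] unchanged; a successful one
	 * may raise ub_maxheld[].
	 */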
+ */ + ub->ub_held[resource] += val; + if (ub->ub_held[resource] > ub->ub_limit[resource]) + goto out; + if (strict && ub->ub_held[resource] > ub->ub_barrier[resource]) + goto out; + if (ub->ub_maxheld[resource] < ub->ub_held[resource]) + ub->ub_maxheld[resource] = ub->ub_held[resource]; + return 0; + +out: + ub->ub_held[resource] -= val; + return retval; +} + +int __charge_beancounter(struct user_beancounter *ub, + int resource, unsigned long val, int strict) +{ + int retval; + unsigned long flags; + + retval = -EINVAL; + if (val > UB_MAXVALUE) + goto out; + + spin_lock_irqsave(&ub->ub_lock, flags); + retval = __charge_beancounter_locked(ub, resource, val, strict); + spin_unlock_irqrestore(&ub->ub_lock, flags); +out: + return retval; +} + +static inline void __uncharge_beancounter_locked(struct user_beancounter *ub, + int resource, unsigned long val) +{ +#if UB_DEBUG & 2 + printk(KERN_DEBUG "Uncharging %lu for %d of %p with %lu\n", + val, resource, ub, ub->ub_held[resource]); +#endif + if (ub->ub_held[resource] < val) + printk(KERN_ERR "Uncharging %lu for %d of %p with %lu\n", + val, resource, ub, ub->ub_held[resource]); + ub->ub_held[resource] -= val; +} + +void __uncharge_beancounter(struct user_beancounter *ub, + int resource, unsigned long val) +{ + unsigned long flags; + + spin_lock_irqsave(&ub->ub_lock, flags); + __uncharge_beancounter_locked(ub, resource, val); + spin_unlock_irqrestore(&ub->ub_lock, flags); +} + +/* + * Charge resources on a new task creation + * + * User beancounter here is a login_bc of a parent task + * (which will be equal to task_bc of the new task after allocation + * and initialisation). + * Currently only stack+struct task size is charged. + */ +int __charge_task(struct user_beancounter *ub) +{ + int retval; + unsigned long flags; + + spin_lock_irqsave(&ub->ub_lock, flags); + retval = __charge_beancounter_locked(ub, UB_KMEMSIZE, THREAD_SIZE, 1); + if (!retval) { + retval = __charge_beancounter_locked(ub, UB_NUMPROC, 1, 1); + if (retval) + goto fail; + } +out: + spin_unlock_irqrestore(&ub->ub_lock, flags); + return retval; + +fail: + __uncharge_beancounter_locked(ub, UB_KMEMSIZE, THREAD_SIZE); + goto out; +} + +void __uncharge_task(struct user_beancounter *ub) +{ + unsigned long flags; + + spin_lock_irqsave(&ub->ub_lock, flags); + __uncharge_beancounter_locked(ub, UB_KMEMSIZE, THREAD_SIZE); + __uncharge_beancounter_locked(ub, UB_NUMPROC, 1); + spin_unlock_irqrestore(&ub->ub_lock, flags); +} + +/* + * Different memory type accounting + */ +int __charge_memory(struct user_beancounter *ub, unsigned long size, + unsigned vm_flags, int strict) +{ + int retval; + unsigned long flags; + + size >>= PAGE_SHIFT; + + retval = -EINVAL; + if (size > UB_MAXVALUE) + goto out; + + spin_lock_irqsave(&ub->ub_lock, flags); + retval = __charge_beancounter_locked(ub, UB_TOTVMPAGES, size, strict); + if (retval) + goto out_unlock; + if (vm_flags & VM_LOCKED) { + retval = __charge_beancounter_locked(ub, UB_LOCKEDPAGES, + size, strict); + if (retval) + goto totvm_restore; + } + if (vm_flags & VM_ANON) { + retval = __charge_beancounter_locked(ub, UB_ZSHMPAGES, + size, strict); + if (retval) + goto locked_restore; + } +out_unlock: + spin_unlock_irqrestore(&ub->ub_lock, flags); +out: + return retval; + +locked_restore: + __uncharge_beancounter_locked(ub, UB_LOCKEDPAGES, size); +totvm_restore: + __uncharge_beancounter_locked(ub, UB_TOTVMPAGES, size); + goto out_unlock; +} + +void __uncharge_memory(struct user_beancounter *ub, unsigned long size, + unsigned vm_flags) +{ + unsigned long 
flags; + + size >>= PAGE_SHIFT; + + spin_lock_irqsave(&ub->ub_lock, flags); + __uncharge_beancounter_locked(ub, UB_TOTVMPAGES, size); + if (vm_flags & VM_LOCKED) + __uncharge_beancounter_locked(ub, UB_LOCKEDPAGES, size); + if (vm_flags & VM_ANON) + __uncharge_beancounter_locked(ub, UB_ZSHMPAGES, size); + spin_unlock_irqrestore(&ub->ub_lock, flags); +} + + +int __charge_locked_mem(struct user_beancounter *ub, unsigned long size) +{ + return __charge_beancounter(ub, UB_LOCKEDPAGES, size >> PAGE_SHIFT, 1); +} + +void __uncharge_locked_mem(struct user_beancounter *ub, unsigned long size) +{ + __uncharge_beancounter(ub, UB_LOCKEDPAGES, size >> PAGE_SHIFT); +} + +int __charge_kmem(struct user_beancounter *ub, unsigned long size, int strict) +{ + return __charge_beancounter(ub, UB_KMEMSIZE, size, strict); +} + +void __uncharge_kmem(struct user_beancounter *ub, unsigned long size) +{ + __uncharge_beancounter(ub, UB_KMEMSIZE, size); +} + +int __charge_shmpages(struct user_beancounter *ub, unsigned long size) +{ + return __charge_beancounter(ub, UB_SHMPAGES, size, 1); +} + +void __uncharge_shmpages(struct user_beancounter *ub, unsigned long size) +{ + __uncharge_beancounter(ub, UB_SHMPAGES, size); +} + +struct task_struct *select_worst_task(void) +{ + struct task_struct * p; + struct task_struct * worst; + long ub_maxover = 0; + + worst = p = init_task.next_task; + for (; p != &init_task; p = p->next_task) { + struct mm_struct *mm = p->mm; + struct user_beancounter *ub; + long ub_overdraft = 0; /* ub current overdraft */ + + if (!mm) /* don't touch the kernel threads */ + continue; + else + ub = mm->beancounter; + + if (ub) { + ub_overdraft = + ub->ub_held[UB_TOTVMPAGES] + - ub->ub_limit[UB_OOMGUARPAGES]; + if (ub_overdraft < 0) + /* processes without overdraft are not + * preferred over ones without beancounter */ + ub_overdraft = 0; + } + if (ub_overdraft > ub_maxover || + (ub_overdraft == ub_maxover && + mm->total_vm > worst->mm->total_vm) ) { + ub_maxover = ub_overdraft; + worst = p; + } + } + return worst; +} + +/* + * File-related accounting + */ +int __charge_sock(struct user_beancounter *ub, unsigned long size) +{ + int retval; + unsigned long flags; + + spin_lock_irqsave(&ub->ub_lock, flags); + retval = __charge_beancounter_locked(ub, UB_KMEMSIZE, size, 1); + if (!retval) { + retval = __charge_beancounter_locked(ub, UB_NUMSOCK, 1, 1); + if (retval) + goto fail; + } +out: + spin_unlock_irqrestore(&ub->ub_lock, flags); + return retval; + +fail: + __uncharge_beancounter_locked(ub, UB_KMEMSIZE, size); + goto out; +} + +void __uncharge_sock(struct user_beancounter *ub, unsigned long size) +{ + unsigned long flags; + + spin_lock_irqsave(&ub->ub_lock, flags); + __uncharge_beancounter_locked(ub, UB_KMEMSIZE, size); + __uncharge_beancounter_locked(ub, UB_NUMSOCK, 1); + spin_unlock_irqrestore(&ub->ub_lock, flags); +} + +int __charge_flock(struct user_beancounter *ub, unsigned long size) +{ + int retval; + unsigned long flags; + + spin_lock_irqsave(&ub->ub_lock, flags); + retval = __charge_beancounter_locked(ub, UB_KMEMSIZE, size, 1); + if (!retval) { + retval = __charge_beancounter_locked(ub, UB_NUMFLOCK, 1, 1); + if (retval) + goto fail; + } +out: + spin_unlock_irqrestore(&ub->ub_lock, flags); + return retval; + +fail: + __uncharge_beancounter_locked(ub, UB_KMEMSIZE, size); + goto out; +} + +void __uncharge_flock(struct user_beancounter *ub, unsigned long size) +{ + unsigned long flags; + + spin_lock_irqsave(&ub->ub_lock, flags); + __uncharge_beancounter_locked(ub, UB_KMEMSIZE, size); + 
__uncharge_beancounter_locked(ub, UB_NUMFLOCK, 1); + spin_unlock_irqrestore(&ub->ub_lock, flags); +} + +int __charge_pty(struct user_beancounter *ub) +{ + return __charge_beancounter(ub, UB_NUMPTY, 1, 1); +} + +void __uncharge_pty(struct user_beancounter *ub) +{ + __uncharge_beancounter(ub, UB_NUMPTY, 1); +} + +/* + * Accounting for other resources + */ +int __charge_siginfo(struct user_beancounter *ub, unsigned long size) +{ + int retval; + unsigned long flags; + + spin_lock_irqsave(&ub->ub_lock, flags); + retval = __charge_beancounter_locked(ub, UB_KMEMSIZE, size, 1); + if (!retval) { + retval = __charge_beancounter_locked(ub, UB_NUMSIGINFO, 1, 1); + if (retval) + goto fail; + } +out: + spin_unlock_irqrestore(&ub->ub_lock, flags); + return retval; + +fail: + __uncharge_beancounter_locked(ub, UB_KMEMSIZE, size); + goto out; +} + +void __uncharge_siginfo(struct user_beancounter *ub, unsigned long size) +{ + unsigned long flags; + + spin_lock_irqsave(&ub->ub_lock, flags); + __uncharge_beancounter_locked(ub, UB_KMEMSIZE, size); + __uncharge_beancounter_locked(ub, UB_NUMSIGINFO, 1); + spin_unlock_irqrestore(&ub->ub_lock, flags); +} + + +#ifdef CONFIG_USER_RESOURCE_PROC + +#if BITS_PER_LONG == 32 +#define UB_PROC_LINE_SHIFT 6 +#define UB_PROC_LINE_TEXT (5+2+11+1+10+1+10+1+10+1+10) +#else +#define UB_PROC_LINE_SHIFT 7 +#define UB_PROC_LINE_TEXT (10+2+11+1+20+1+20+1+20+1+20) +#endif +#define UB_PROC_LINE_LEN (1 << UB_PROC_LINE_SHIFT) +#define UB_PROC_LINE_SPACES (UB_PROC_LINE_LEN - UB_PROC_LINE_TEXT - 1) + +static const char *ub_rnames[UB_RESOURCES] = { + "kmemsize", + "lockedpages", + "totvmpages", + "ipcshmpages", + "anonshpages", + "numproc", + "rsspages", + "vmspaceguar", + "oomguar", + "numsock", + "numflock", + "numpty", + "numsiginfo", +}; + +static void out_proc_head(char *buf) +{ +#if BITS_PER_LONG == 32 + sprintf(buf, " uid resource held max barrier" + " limit"); +#else + sprintf(buf, " uid resource held" + " max barrier" + " limit"); +#endif + memset(buf + UB_PROC_LINE_TEXT, ' ', UB_PROC_LINE_SPACES); + buf[UB_PROC_LINE_LEN - 1] = '\n'; +} + +static void out_proc_beancounter(char *buf, struct user_beancounter *ub, int r) +{ + if (!r) +#if BITS_PER_LONG == 32 + sprintf(buf, "%5u: %-11s %10lu %10lu %10lu %10lu", + (unsigned)ub->ub_uid, ub_rnames[r], + ub->ub_held[r], ub->ub_maxheld[r], + ub->ub_barrier[r], ub->ub_limit[r]); +#else + sprintf(buf, "%10u: %-11s %20lu %20lu %20lu %20lu", + (unsigned)ub->ub_uid, ub_rnames[r], + ub->ub_held[r], ub->ub_maxheld[r], + ub->ub_barrier[r], ub->ub_limit[r]); +#endif + else +#if BITS_PER_LONG == 32 + sprintf(buf, " %-11s %10lu %10lu %10lu %10lu", + ub_rnames[r], + ub->ub_held[r], ub->ub_maxheld[r], + ub->ub_barrier[r], ub->ub_limit[r]); +#else + sprintf(buf, " %-11s %20lu %20lu %20lu %20lu", + ub_rnames[r], + ub->ub_held[r], ub->ub_maxheld[r], + ub->ub_barrier[r], ub->ub_limit[r]); +#endif + memset(buf + UB_PROC_LINE_TEXT, ' ', UB_PROC_LINE_SPACES); + buf[UB_PROC_LINE_LEN - 1] = '\n'; +} + +static ssize_t ub_proc_read(struct file *file, char *usrbuf, size_t len, + loff_t *poff) +{ + ssize_t retval; + char *buf; + unsigned long flags; + int i, resource; + struct ub_hash_slot *slot; + struct user_beancounter *ub; + size_t n; + int rem, produced, job, tocopy; + const int is_capable = is_super_ve(current) && + (capable(CAP_DAC_OVERRIDE) || capable(CAP_DAC_READ_SEARCH)); + + retval = -ENOBUFS; + buf = (char *)__get_free_page(GFP_KERNEL); + if (buf == NULL) + goto out; + + retval = 0; + if (!is_capable && current->luid == (uid_t)-1) + goto out_free; + if 
(*poff < 0) + goto out_free; +again: + i = 0; + slot = ub_hash; + n = *poff >> UB_PROC_LINE_SHIFT; /* in lines */ + rem = *poff & (UB_PROC_LINE_LEN - 1); /* in bytes */ + produced = 0; + if (!n) { + out_proc_head(buf); + produced += UB_PROC_LINE_LEN; + n++; + } + n--; + while (1) { + lock_beancounters(slot, flags); + for (ub = slot->ubh_beans; ub != NULL && n >= UB_RESOURCES; + ub = ub->ub_next) + if (is_capable || current->luid == ub->ub_uid) + n -= UB_RESOURCES; + if (ub != NULL) + break; + unlock_beancounters(slot, flags); + if (++i >= UB_HASH_SIZE) + goto out_free; + ++slot; + } + rem += n << UB_PROC_LINE_SHIFT; + job = PAGE_SIZE; + if (len < PAGE_SIZE - rem) + job = rem + len; + while (produced < job) { + if (is_capable || current->luid == ub->ub_uid) + for (resource = 0; produced < job && resource < UB_RESOURCES; + resource++, produced += UB_PROC_LINE_LEN) + out_proc_beancounter(buf + produced, ub, resource); + if (produced >= job) + break; + ub = ub->ub_next; +checkub: + if (ub != NULL) + continue; + if (++i >= UB_HASH_SIZE) + break; + unlock_beancounters(slot, flags); + ++slot; + lock_beancounters(slot, flags); + ub = slot->ubh_beans; + goto checkub; + } + unlock_beancounters(slot, flags); +#if UB_DEBUG & 8 + printk(KERN_DEBUG "UB_PROC: produced %d, job %d, rem %d\n", + produced, job, rem); +#endif + if (produced <= rem) + goto out_free; + /* Temporary buffer `buf' contains `produced' bytes. + * Extract no more than `len' bytes at offset `rem'. + */ + tocopy = produced - rem; + if (len < tocopy) + tocopy = len; + if (!tocopy) + goto out_free; + if (copy_to_user(usrbuf, buf + rem, tocopy)) + goto fail; + *poff += tocopy; + len -= tocopy; + retval += tocopy; + if (!len) + goto out_free; + usrbuf += tocopy; + goto again; +fail: + retval = -EFAULT; +out_free: + free_page((unsigned long)buf); +out: + return retval; +} + +static struct file_operations ub_file_operations = { + read: ub_proc_read, +}; + +#endif /* defined CONFIG_USER_RESOURCE_PROC */ + + +#endif /* defined CONFIG_USER_RESOURCE */ + + +/* + * Initialisation + */ + +void __init beancounter_init(unsigned long mempages) +{ +#ifdef CONFIG_USER_RESOURCE + int k; + + ub_cachep = kmem_cache_create("user_beancounters", + sizeof(struct user_beancounter), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + + atomic_set(&default_beancounter.ub_refcount, 1); + spin_lock_init(&default_beancounter.ub_lock); + + /* + * Default settings + */ + memset(&default_beancounter.ub_held, + 0, sizeof(default_beancounter.ub_held)); + default_beancounter.ub_limit[UB_KMEMSIZE] = + mempages > (192*1024*1024 >> PAGE_SHIFT) ? + 32*1024*1024 : + (mempages << PAGE_SHIFT) / 6; + default_beancounter.ub_limit[UB_LOCKEDPAGES] = 8; + default_beancounter.ub_limit[UB_TOTVMPAGES] = + UB_MAXVALUE; /* swappable */ + default_beancounter.ub_limit[UB_SHMPAGES] = 64; + default_beancounter.ub_limit[UB_ZSHMPAGES] = 1024; + default_beancounter.ub_limit[UB_NUMPROC] = max_threads / 2; + default_beancounter.ub_limit[UB_NUMSOCK] = 1024; + default_beancounter.ub_limit[UB_NUMFLOCK] = 1024; + default_beancounter.ub_limit[UB_NUMPTY] = 16; + default_beancounter.ub_limit[UB_NUMSIGINFO] = 1024; + for (k = 0; k < UB_RESOURCES; k++) + default_beancounter.ub_barrier[k] = + default_beancounter.ub_limit[k]; + + /* + * Initialise the beancounter hash.
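+ * Each slot of ub_hash carries its own ubh_lock spinlock; the locks are + * set up here, before the first beancounter lookup can happen.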
+ */ + for (k = 0; k < UB_HASH_SIZE; k++) + spin_lock_init(&ub_hash[k].ubh_lock); + +#ifdef CONFIG_USER_RESOURCE_PROC + { + struct proc_dir_entry *entry; + entry = create_proc_entry("user_beancounters", S_IRUGO, NULL); + if (entry) + entry->proc_fops = &ub_file_operations; + } +#endif +#endif +} + + +/* + * The (rather boring) getluid syscall + */ +asmlinkage long sys_getluid(void) +{ + return current->luid != (uid_t)-1 ? current->luid : -EINVAL; +} + +/* + * The setluid syscall + */ +asmlinkage long sys_setluid(uid_t uid) +{ + struct user_beancounter *ub; + int error; + + /* You may not disown a setluid */ + error = -EINVAL; + if (uid == (uid_t)-1) + goto out; + + /* You may only set an luid as root */ + error = -EPERM; + if (!capable(CAP_SETUID)) + goto out; + + /* The luid once set is irrevocable to all */ + if (current->luid != (uid_t)-1) + goto out; + +#ifdef CONFIG_USER_RESOURCE + /* Ok - set up a beancounter entry for this user */ + error = -ENOBUFS; + ub = get_beancounter_byuid(uid, 1); + if(ub == NULL) + goto out; + + printk(KERN_DEBUG "setluid, bean %p (count %d) for %.20s pid %d\n", + ub, atomic_read(&ub->ub_refcount), + current->comm, current->pid); + /* Install it */ + current->login_bc = ub; +#endif + + /* Take on our new luid and report it as OK */ + current->luid = uid; + + error = 0; +out: + return error; +} + +/* + * The setublimit syscall + */ +asmlinkage long sys_setublimit(uid_t uid, unsigned long resource, + struct rlimit *rlim) +{ + int error; +#ifdef CONFIG_USER_RESOURCE + unsigned long flags; + struct user_beancounter *ub; + struct rlimit new_rlim; + + error = -EPERM; + if(!is_super_ve(current)) + goto out; + if(!capable(CAP_SYS_RESOURCE)) + goto out; + + error = -EINVAL; + if (resource >= UB_RESOURCES) + goto out; + + error = -EFAULT; + if (copy_from_user(&new_rlim, rlim, sizeof(*rlim))) + goto out; + + error = -EINVAL; + if (new_rlim.rlim_cur < 0 || new_rlim.rlim_cur > UB_MAXVALUE || + new_rlim.rlim_max < 0 || new_rlim.rlim_max > UB_MAXVALUE) + goto out; + + error = -EINVAL; + ub = get_beancounter_byuid(uid, 0); + if (ub == NULL) { +#if UB_DEBUG & 4 + printk(KERN_DEBUG "No login bc for uid %d\n", uid); +#endif + goto out; + } + + spin_lock_irqsave(&ub->ub_lock, flags); + ub->ub_barrier[resource] = new_rlim.rlim_cur; + ub->ub_limit[resource] = new_rlim.rlim_max; + spin_unlock_irqrestore(&ub->ub_lock, flags); + + __put_beancounter(ub); + + error = 0; +out: +#else + error = -ENOSYS; +#endif + return error; +} Index: linux/kernel/capability.c diff -u linux/kernel/capability.c:1.1.1.4 ASPcomplete/linux/kernel/capability.c:1.2 --- linux/kernel/capability.c:1.1.1.4 Mon Aug 14 19:43:25 2000 +++ linux/kernel/capability.c Sun Jun 18 13:04:04 2000 @@ -49,7 +49,7 @@ if (pid && pid != current->pid) { read_lock(&tasklist_lock); - target = find_task_by_pid(pid); /* identify target of query */ + target = find_task_by_pid_ve(pid); /* identify target of query */ if (!target) error = -ESRCH; } else { @@ -85,7 +85,7 @@ /* FIXME: do we need to have a write lock here..? */ read_lock(&tasklist_lock); - for_each_task(target) { + for_each_task_ve(target) { if (target->pgrp != pgrp) continue; target->cap_effective = *effective; @@ -106,7 +106,7 @@ /* FIXME: do we need to have a write lock here..?
*/ read_lock(&tasklist_lock); /* ALL means everyone other than self or 'init' */ - for_each_task(target) { + for_each_task_ve(target) { if (target == current || target->pid == 1) continue; target->cap_effective = *effective; @@ -159,7 +159,7 @@ if (pid > 0 && pid != current->pid) { read_lock(&tasklist_lock); - target = find_task_by_pid(pid); /* identify target of query */ + target = find_task_by_pid_ve(pid); /* identify target of query */ if (!target) { error = -ESRCH; goto out; Index: linux/kernel/exit.c diff -u linux/kernel/exit.c:1.1.1.7 ASPcomplete/linux/kernel/exit.c:1.11 --- linux/kernel/exit.c:1.1.1.7 Mon Aug 14 19:43:25 2000 +++ linux/kernel/exit.c Mon Aug 7 19:08:00 2000 @@ -12,6 +12,7 @@ #ifdef CONFIG_BSD_PROCESS_ACCT #include #endif +#include #include #include @@ -61,6 +62,8 @@ current->counter += p->counter; if (current->counter >= MAX_COUNTER) current->counter = MAX_COUNTER; + uncharge_task(p->task_bc); + put_beancounter(p->task_bc); free_task_struct(p); } else { printk("task releasing itself\n"); @@ -79,7 +82,7 @@ fallback = -1; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (p->session <= 0) continue; if (p->pgrp == pgrp) { @@ -106,7 +109,7 @@ struct task_struct *p; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if ((p == ignored_task) || (p->pgrp != pgrp) || (p->state == TASK_ZOMBIE) || (p->p_pptr->pid == 1)) @@ -132,7 +135,7 @@ struct task_struct * p; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (p->pgrp != pgrp) continue; if (p->state != TASK_STOPPED) @@ -149,11 +152,16 @@ struct task_struct * p; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if (p->p_opptr == father) { /* We dont want people slaying init */ p->exit_signal = SIGCHLD; p->self_exec_id++; +#ifdef CONFIG_VE + if( p->envid->init_entry != p ) + p->p_opptr = p->envid->init_entry; + else +#endif p->p_opptr = child_reaper; /* init */ if (p->pdeath_signal) send_sig(p->pdeath_signal, p, 0); } @@ -430,13 +438,42 @@ NORET_TYPE void do_exit(long code) { struct task_struct *tsk = current; +#ifdef CONFIG_VE + struct ve_struct *envid = current->envid; +#endif if (in_interrupt()) printk("Aiee, killing interrupt handler\n"); if (!tsk->pid) panic("Attempted to kill the idle task!"); +#ifndef CONFIG_VE if (tsk->pid == 1) panic("Attempted to kill init!"); +#else + if (envid->init_entry==tsk) + { + if( do_is_super_ve( envid ) && tsk->pid==1 ) + panic("Attempted to kill init!"); + else { + struct siginfo info; + + memset( &info, 0, sizeof(info) ); + info.si_code = SI_KERNEL; + info.si_signo = SIGKILL; + + down( &envid->init_exit_guard ); + kill_something_info(SIGKILL, &info, -1); + + envid->init_entry = child_reaper; + write_lock_irq( &tasklist_lock ); + REMOVE_LINKS(tsk); + tsk->p_pptr = tsk->p_opptr = child_reaper; + SET_LINKS(tsk); + write_unlock_irq( &tasklist_lock ); + } + } +#endif + tsk->flags |= PF_EXITING; del_timer_sync(&tsk->real_timer); @@ -454,9 +491,32 @@ tsk->state = TASK_ZOMBIE; tsk->exit_code = code; exit_notify(); + /* + * The login beancounter is of no use past this point. + * Struct task memory will be uncharged and tsk->task_bc dropped + * in release().
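+ * Only the reference taken at fork time is dropped here; the charges + * themselves were released as the owning objects were destroyed.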
+ */ + put_beancounter(tsk->login_bc); put_exec_domain(tsk->exec_domain); if (tsk->binfmt && tsk->binfmt->module) __MOD_DEC_USE_COUNT(tsk->binfmt->module); + +#ifdef CONFIG_VE + if( !do_is_super_ve(envid) ) + { + /* Non-atomicity is not dangerous here as we are under kernel lock */ + int result = atomic_read( &envid->pcounter ); + atomic_dec( &envid->pcounter ); + if( result==2 ) + { + down_trylock( &envid->init_exit_guard ); + up( &envid->init_exit_guard ); + } + else if( result==1 ) + do_env_cleanup( envid ); + } + +#endif schedule(); /* * In order to get rid of the "volatile function does return" message Index: linux/kernel/fork.c diff -u linux/kernel/fork.c:1.1.1.8 ASPcomplete/linux/kernel/fork.c:1.10 --- linux/kernel/fork.c:1.1.1.8 Mon Aug 14 19:43:25 2000 +++ linux/kernel/fork.c Sat Aug 12 19:36:41 2000 @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -46,6 +47,18 @@ wq_write_unlock_irqrestore(&q->lock, flags); } +/* + * For SMP, we need to re-test the user struct counter + * after having acquired the spinlock. This allows us to do + * the common case (not freeing anything) without having + * any locking. + */ +#ifdef CONFIG_SMP + #define uid_hash_free(up) (!atomic_read(&(up)->count)) +#else + #define uid_hash_free(up) (1) +#endif + void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait) { unsigned long flags; @@ -80,7 +93,10 @@ /* Protects next_safe and last_pid. */ spinlock_t lastpid_lock = SPIN_LOCK_UNLOCKED; -static int get_pid(unsigned long flags) +#ifndef CONFIG_VE +static +#endif +int get_pid(unsigned long flags) { static int next_safe = PID_MAX; struct task_struct *p; @@ -98,7 +114,7 @@ next_safe = PID_MAX; read_lock(&tasklist_lock); repeat: - for_each_task(p) { + for_each_task_all(p) { if(p->pid == last_pid || p->pgrp == last_pid || p->session == last_pid) { @@ -129,6 +145,7 @@ int retval; flush_cache_mm(current->mm); + mm->total_vm = 0; mm->locked_vm = 0; mm->mmap = NULL; mm->mmap_avl = NULL; @@ -146,9 +163,16 @@ retval = -ENOMEM; if(mpnt->vm_flags & VM_DONTCOPY) continue; + retval = charge_memory(mm->beancounter, + mpnt->vm_end - mpnt->vm_start, + (mpnt->vm_flags & ~VM_LOCKED), 1); + if (retval) + goto fail_nomem; + retval = -ENOMEM; tmp = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); if (!tmp) - goto fail_nomem; + goto out_uncharge; + mm->total_vm += (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT; *tmp = *mpnt; tmp->vm_flags &= ~VM_LOCKED; tmp->vm_mm = mm; @@ -194,19 +218,28 @@ fail_nomem: flush_tlb_mm(current->mm); return retval; + +out_uncharge: + uncharge_memory(mm->beancounter, mpnt->vm_end - mpnt->vm_start, + (mpnt->vm_flags & ~VM_LOCKED)); + /* linked VM areas will be uncharged when the list is destroyed */ + goto fail_nomem; } #define allocate_mm() (kmem_cache_alloc(mm_cachep, SLAB_KERNEL)) -static struct mm_struct * mm_init(struct mm_struct * mm) +static struct mm_struct * mm_init(struct mm_struct * mm, struct user_beancounter *ub) { atomic_set(&mm->mm_users, 1); atomic_set(&mm->mm_count, 1); init_MUTEX(&mm->mmap_sem); mm->page_table_lock = SPIN_LOCK_UNLOCKED; - mm->pgd = pgd_alloc(); - if (mm->pgd) + mm->pgd = pgd_alloc(ub); + if (mm->pgd) { + mm->beancounter = ub; + get_beancounter(ub); return mm; + } kmem_cache_free(mm_cachep, mm); return NULL; } @@ -215,14 +248,14 @@ /* * Allocate and initialize an mm_struct.
*/ -struct mm_struct * mm_alloc(void) +struct mm_struct * mm_alloc(struct user_beancounter *ub) { struct mm_struct * mm; mm = allocate_mm(); if (mm) { memset(mm, 0, sizeof(*mm)); - return mm_init(mm); + return mm_init(mm, ub); } return NULL; } @@ -235,8 +268,9 @@ inline void __mmdrop(struct mm_struct *mm) { if (mm == &init_mm) BUG(); - pgd_free(mm->pgd); + pgd_free(mm->beancounter, mm->pgd); destroy_context(mm); + put_beancounter(mm->beancounter); kmem_cache_free(mm_cachep, mm); } @@ -308,7 +342,7 @@ /* Copy the current MM stuff.. */ memcpy(mm, current->mm, sizeof(*mm)); - if (!mm_init(mm)) + if (!mm_init(mm, tsk->login_bc)) goto fail_nomem; tsk->mm = mm; @@ -535,7 +569,7 @@ */ int do_fork(unsigned long clone_flags, unsigned long usp, struct pt_regs *regs) { - int retval = -ENOMEM; + int retval; struct task_struct *p; DECLARE_MUTEX_LOCKED(sem); @@ -547,9 +581,14 @@ current->vfork_sem = &sem; + retval = charge_task(current->login_bc); + if (retval) + goto fork_out; + + retval = -ENOMEM; p = alloc_task_struct(); if (!p) - goto fork_out; + goto fork_out_uncharge; *p = *current; @@ -622,6 +661,12 @@ p->lock_depth = -1; /* -1 = no lock */ p->start_time = jiffies; + /* Clone the beancounter reference. + * The parent (current) keeps the reference count being nonzero. */ + get_beancounter(p->login_bc); /* copied intact */ + p->task_bc = p->login_bc; + get_beancounter(p->task_bc); + retval = -ENOMEM; /* copy all the process information */ if (copy_files(clone_flags, p)) @@ -664,6 +709,11 @@ * * Let it rip! */ +#ifdef CONFIG_VE + if( !is_super_ve(p) ) + atomic_inc( &p->envid->pcounter ); +#endif + retval = p->pid; write_lock_irq(&tasklist_lock); SET_LINKS(p); @@ -688,6 +738,8 @@ bad_fork_cleanup_files: exit_files(p); /* blocking */ bad_fork_cleanup: + put_beancounter(p->task_bc); + put_beancounter(p->login_bc); put_exec_domain(p->exec_domain); if (p->binfmt && p->binfmt->module) __MOD_DEC_USE_COUNT(p->binfmt->module); @@ -696,5 +748,21 @@ free_uid(p->user); bad_fork_free: free_task_struct(p); + uncharge_task(current->login_bc); goto bad_fork; + +fork_out_uncharge: + uncharge_task(current->login_bc); + goto fork_out; +} + +void __init filescache_init(void) +{ + files_cachep = kmem_cache_create("files_cache", + sizeof(struct files_struct), + 0, + SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!files_cachep) + panic("Cannot create files cache"); } Index: linux/kernel/ksyms.c diff -u linux/kernel/ksyms.c:1.1.1.7 ASPcomplete/linux/kernel/ksyms.c:1.4 --- linux/kernel/ksyms.c:1.1.1.7 Mon Aug 14 19:43:25 2000 +++ linux/kernel/ksyms.c Mon Aug 7 19:08:00 2000 @@ -453,7 +453,7 @@ EXPORT_SYMBOL(bdevname); EXPORT_SYMBOL(cdevname); EXPORT_SYMBOL(simple_strtoul); -EXPORT_SYMBOL(system_utsname); /* UTS data */ +EXPORT_SYMBOL(_system_utsname); /* UTS data */ EXPORT_SYMBOL(uts_sem); /* UTS semaphore */ #ifndef __mips__ EXPORT_SYMBOL(sys_call_table); Index: linux/kernel/sched.c diff -u linux/kernel/sched.c:1.1.1.7 ASPcomplete/linux/kernel/sched.c:1.5 --- linux/kernel/sched.c:1.1.1.7 Mon Aug 14 19:43:25 2000 +++ linux/kernel/sched.c Mon Aug 7 19:08:00 2000 @@ -646,7 +646,7 @@ struct task_struct *p; spin_unlock_irq(&runqueue_lock); read_lock(&tasklist_lock); - for_each_task(p) + for_each_task_all(p) p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice); read_unlock(&tasklist_lock); spin_lock_irq(&runqueue_lock); @@ -877,7 +877,7 @@ struct task_struct *tsk = current; if (pid) - tsk = find_task_by_pid(pid); + tsk = find_task_by_pid_ve(pid); return tsk; } @@ -1166,7 +1166,7 @@ printk(" task PC stack pid father child younger 
older\n"); #endif read_lock(&tasklist_lock); - for_each_task(p) + for_each_task_all(p) show_task(p); read_unlock(&tasklist_lock); } Index: linux/kernel/signal.c diff -u linux/kernel/signal.c:1.1.1.4 ASPcomplete/linux/kernel/signal.c:1.3 --- linux/kernel/signal.c:1.1.1.4 Mon Aug 14 19:43:25 2000 +++ linux/kernel/signal.c Tue Aug 1 15:30:45 2000 @@ -61,9 +61,12 @@ t->sigqueue_tail = &t->sigqueue; while (q) { + struct user_beancounter *bc = q->charged_bc; n = q->next; kmem_cache_free(signal_queue_cachep, q); atomic_dec(&nr_queued_signals); + uncharge_siginfo(bc, sizeof(struct signal_queue)); + put_beancounter(bc); q = n; } } @@ -143,11 +146,14 @@ if (q->info.si_signo == sig) break; if (q) { + struct user_beancounter *bc = q->charged_bc; if ((*pp = q->next) == NULL) current->sigqueue_tail = pp; copy_siginfo(info, &q->info); kmem_cache_free(signal_queue_cachep,q); atomic_dec(&nr_queued_signals); + uncharge_siginfo(bc, sizeof(struct signal_queue)); + put_beancounter(bc); /* Then see if this signal is still pending. (Non rt signals may not be queued twice.) @@ -225,10 +231,13 @@ if (q->info.si_signo == sig) break; if (q) { + struct user_beancounter *bc = q->charged_bc; if ((*pp = q->next) == NULL) t->sigqueue_tail = pp; kmem_cache_free(signal_queue_cachep,q); atomic_dec(&nr_queued_signals); + uncharge_siginfo(bc, sizeof(struct signal_queue)); + put_beancounter(bc); } return 1; } @@ -346,8 +355,19 @@ pass on the info struct. */ if (atomic_read(&nr_queued_signals) < max_queued_signals) { + struct user_beancounter *bc = current->login_bc; + if ( !charge_siginfo(bc, sizeof(struct signal_queue))) { q = (struct signal_queue *) - kmem_cache_alloc(signal_queue_cachep, GFP_ATOMIC); + kmem_cache_alloc(signal_queue_cachep, + GFP_ATOMIC); + if (q) { + get_beancounter(bc); + q->charged_bc = bc; + } + else + uncharge_siginfo(bc, + sizeof(struct signal_queue)); + } } if (q) { @@ -460,7 +480,7 @@ retval = -ESRCH; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_ve(p) { if (p->pgrp == pgrp) { int err = send_sig_info(sig, info, p); if (err != 0) @@ -492,7 +512,7 @@ retval = -ESRCH; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_ve(p) { if (p->leader && p->session == sess) { int err = send_sig_info(sig, info, p); if (err) @@ -515,7 +535,7 @@ struct task_struct *p; read_lock(&tasklist_lock); - p = find_task_by_pid(pid); + p = find_task_by_pid_ve(pid); error = -ESRCH; if (p) error = send_sig_info(sig, info, p); @@ -540,7 +560,7 @@ struct task_struct * p; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_ve(p) { if (p->pid > 1 && p != current) { int err = send_sig_info(sig, info, p); ++count; @@ -901,10 +921,13 @@ if (q->info.si_signo != sig) pp = &q->next; else { + struct user_beancounter *bc = q->charged_bc; if ((*pp = q->next) == NULL) current->sigqueue_tail = pp; kmem_cache_free(signal_queue_cachep, q); atomic_dec(&nr_queued_signals); + uncharge_siginfo(bc, sizeof(struct signal_queue)); + put_beancounter(bc); } q = *pp; } Index: linux/kernel/softirq.c diff -u linux/kernel/softirq.c:1.1.1.4 ASPcomplete/linux/kernel/softirq.c:1.3 --- linux/kernel/softirq.c:1.1.1.4 Mon Aug 14 19:43:25 2000 +++ linux/kernel/softirq.c Mon Aug 7 19:08:00 2000 @@ -45,6 +45,9 @@ #endif /* CONFIG_ARCH_S390 */ static struct softirq_action softirq_vec[32]; +#ifdef CONFIG_VE_IRQ +atomic_t soft_irq_state[NR_CPUS]; +#endif asmlinkage void do_softirq() { @@ -57,6 +60,9 @@ local_bh_disable(); local_irq_disable(); +#ifdef CONFIG_VE_IRQ + local_softirq_enter(); +#endif mask = softirq_mask(cpu); active = 
softirq_active(cpu) & mask; @@ -86,6 +92,9 @@ goto retry; } +#ifdef CONFIG_VE_IRQ + local_softirq_leave(); +#endif local_bh_enable(); /* Leave with locally disabled hard irqs. It is critical to close @@ -276,8 +285,10 @@ for (i=0; i<32; i++) tasklet_init(bh_task_vec+i, bh_action, i); +#ifdef CONFIG_VE_IRQ + memset( soft_irq_state, 0, sizeof(soft_irq_state)); +#endif + open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL); open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL); } - - Index: linux/kernel/sys.c diff -u linux/kernel/sys.c:1.1.1.7 ASPcomplete/linux/kernel/sys.c:1.10 --- linux/kernel/sys.c:1.1.1.7 Mon Aug 14 19:43:25 2000 +++ linux/kernel/sys.c Sat Aug 12 19:36:41 2000 @@ -18,6 +18,10 @@ #include #include +#ifdef CONFIG_VE +#define system_utsname (current->envid->_system_utsname) +#endif + /* * this is where the system-wide overflow UID and GID are defined, for * architectures that now have 32-bit UID/GID but didn't in the past @@ -212,7 +216,7 @@ niceval = 19; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_ve(p) { if (!proc_sel(p, which, who)) continue; if (p->uid != current->euid && @@ -247,7 +251,7 @@ return -EINVAL; read_lock(&tasklist_lock); - for_each_task (p) { + for_each_task_ve (p) { long niceval; if (!proc_sel(p, which, who)) continue; @@ -283,6 +287,25 @@ magic2 != LINUX_REBOOT_MAGIC2B)) return -EINVAL; +#ifdef CONFIG_VE + if( !is_super_ve(current) && LINUX_REBOOT_CMD_CAD_ON!=cmd && LINUX_REBOOT_CMD_CAD_OFF!=cmd ) + { + struct siginfo info; + do_mark_env_to_down( current->envid ); + + info.si_errno = 0; + info.si_code = SI_KERNEL; + info.si_pid = current->pid; + info.si_uid = current->uid; + info.si_signo = SIGKILL; + + kill_something_info(SIGKILL, &info, -1); + send_sig_info(SIGKILL, &info, current); + + return 0; + } +#endif + lock_kernel(); switch (cmd) { case LINUX_REBOOT_CMD_RESTART: @@ -748,7 +771,11 @@ if (tbuf) if (copy_to_user(tbuf, ¤t->times, sizeof(struct tms))) return -EFAULT; +#ifndef CONFIG_VE return jiffies; +#else + return jiffies - current->envid->init_entry->start_time; +#endif } /* @@ -782,7 +809,7 @@ read_lock(&tasklist_lock); err = -ESRCH; - p = find_task_by_pid(pid); + p = find_task_by_pid_ve(pid); if (!p) goto out; @@ -800,7 +827,7 @@ goto out; if (pgid != pid) { struct task_struct * tmp; - for_each_task (tmp) { + for_each_task_ve(tmp) { if (tmp->pgrp == pgid && tmp->session == current->session) goto ok_pgid; @@ -826,7 +853,7 @@ struct task_struct *p; read_lock(&tasklist_lock); - p = find_task_by_pid(pid); + p = find_task_by_pid_ve(pid); retval = -ESRCH; if (p) @@ -851,7 +878,7 @@ struct task_struct *p; read_lock(&tasklist_lock); - p = find_task_by_pid(pid); + p = find_task_by_pid_ve(pid); retval = -ESRCH; if(p) @@ -867,7 +894,7 @@ int err = -EPERM; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_ve(p) { if (p->pgrp == current->pid) goto out; } Index: linux/kernel/sysctl.c diff -u linux/kernel/sysctl.c:1.1.1.7 ASPcomplete/linux/kernel/sysctl.c:1.10 --- linux/kernel/sysctl.c:1.1.1.7 Mon Aug 14 19:43:25 2000 +++ linux/kernel/sysctl.c Mon Aug 7 19:08:00 2000 @@ -61,11 +61,11 @@ #ifdef CONFIG_CHR_DEV_SG extern int sg_big_buff; #endif -#ifdef CONFIG_SYSVIPC -extern size_t shm_ctlmax; +#if defined(CONFIG_SYSVIPC) extern int msg_ctlmax; extern int msg_ctlmnb; extern int msg_ctlmni; +extern size_t shm_ctlmax; extern int sem_ctls[]; #endif @@ -89,6 +89,19 @@ ctl_table *, void **); static int proc_doutsstring(ctl_table *table, int write, struct file *filp, void *buffer, size_t *lenp); +#ifdef CONFIG_VE +static int 
ve_doutsstring(ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp); +static int ve_dointvec(ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp); +static int ve_doulongvec_minmax(ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp); +#else +#define ve_doutsstring proc_doutsstring +#define ve_dointvec proc_dointvec +#define ve_doulongvec_minmax proc_doulongvec_minmax +#endif + static ctl_table root_table[]; static struct ctl_table_header root_table_header = @@ -147,16 +160,16 @@ }; static ctl_table kern_table[] = { - {KERN_OSTYPE, "ostype", system_utsname.sysname, 64, - 0444, NULL, &proc_doutsstring, &sysctl_string}, - {KERN_OSRELEASE, "osrelease", system_utsname.release, 64, - 0444, NULL, &proc_doutsstring, &sysctl_string}, - {KERN_VERSION, "version", system_utsname.version, 64, - 0444, NULL, &proc_doutsstring, &sysctl_string}, - {KERN_NODENAME, "hostname", system_utsname.nodename, 64, - 0644, NULL, &proc_doutsstring, &sysctl_string}, - {KERN_DOMAINNAME, "domainname", system_utsname.domainname, 64, - 0644, NULL, &proc_doutsstring, &sysctl_string}, + {KERN_OSTYPE, "ostype", ve_offsetof(system_utsname.sysname), 64, + 0444, NULL, &ve_doutsstring, &sysctl_string}, + {KERN_OSRELEASE, "osrelease", ve_offsetof(system_utsname.release), 64, + 0444, NULL, &ve_doutsstring, &sysctl_string}, + {KERN_VERSION, "version", ve_offsetof(system_utsname.version), 64, + 0444, NULL, &ve_doutsstring, &sysctl_string}, + {KERN_NODENAME, "hostname", ve_offsetof(system_utsname.nodename), 64, + 0644, NULL, &ve_doutsstring, &sysctl_string}, + {KERN_DOMAINNAME, "domainname", ve_offsetof(system_utsname.domainname), 64, + 0644, NULL, &ve_doutsstring, &sysctl_string}, {KERN_PANIC, "panic", &panic_timeout, sizeof(int), 0644, NULL, &proc_dointvec}, {KERN_CAP_BSET, "cap-bound", &cap_bset, sizeof(kernel_cap_t), @@ -214,7 +227,7 @@ 0644, NULL, &proc_dointvec}, {KERN_MSGMNB, "msgmnb", &msg_ctlmnb, sizeof (int), 0644, NULL, &proc_dointvec}, - {KERN_SEM, "sem", &sem_ctls, 4*sizeof (int), + {KERN_SEM, "sem", sem_ctls, 4*sizeof (int), 0644, NULL, &proc_dointvec}, #endif #ifdef CONFIG_MAGIC_SYSRQ @@ -353,22 +366,29 @@ * some sysctl variables are readonly even to root. 
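+ * Under CONFIG_VE, test_perm() takes a third argument giving the set of + * permission bits that may pass at all; VE_TEST_CONDITION keeps only the + * read bit, so table entries checked with it are read-only.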
*/ -static int test_perm(int mode, int op) +#define TEST_CONDITION 0007 +#define VE_TEST_CONDITION 0004 +#define KERNEL_BASE 0xc000000ul + +static int test_perm(int mode, int op, int test_condition) { if (!current->euid) mode >>= 6; else if (in_egroup_p(0)) mode >>= 3; - if ((mode & op & 0007) == op) + if ((mode & op & test_condition) == op) return 0; return -EACCES; } static inline int ctl_perm(ctl_table *table, int op) { - return test_perm(table->mode, op); + ulong data = (ulong)table->data; + return test_perm(table->mode, op, (data < KERNEL_BASE || is_super_ve(current)) ? + TEST_CONDITION : VE_TEST_CONDITION); } static int proc_sys_permission(struct inode *inode, int op) { - return test_perm(inode->i_mode, op); + return test_perm(inode->i_mode, op, TEST_CONDITION); } int proc_dostring(ctl_table *table, int write, struct file *filp, @@ -1283,10 +1303,39 @@ return 0; } +#ifdef CONFIG_VE +static void prepare_data(ctl_table *new_table, ctl_table *table) +{ + memcpy( new_table, table, sizeof(ctl_table) ); + new_table->data += (ulong)current->envid; +} +static int ve_doutsstring(ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp) +{ + ctl_table new_table; + prepare_data(&new_table,table); + return proc_doutsstring( &new_table, write, filp, buffer, lenp ); +} -#else /* CONFIG_SYSCTL */ +static int ve_dointvec(ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp) +{ + ctl_table new_table; + prepare_data(&new_table,table); + return proc_dointvec( &new_table, write, filp, buffer, lenp ); +} +static int ve_doulongvec_minmax(ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp) +{ + ctl_table new_table; + prepare_data(&new_table,table); + return proc_doulongvec_minmax( &new_table, write, filp, buffer, lenp ); +} +#endif +#else /* CONFIG_SYSCTL */ + extern asmlinkage long sys_sysctl(struct __sysctl_args *args) { return -ENOSYS; } @@ -1357,6 +1406,27 @@ void unregister_sysctl_table(struct ctl_table_header * table) { + return -ENOSYS; } + +#ifdef CONFIG_VE +static int ve_doutsstring(ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp) +{ + return -ENOSYS; +} + +int ve_dointvec(ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp) +{ + return -ENOSYS; +} + +int ve_doulongvec_minmax(ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp) +{ + return -ENOSYS; +} +#endif #endif /* CONFIG_SYSCTL */ Index: linux/kernel/timer.c diff -u linux/kernel/timer.c:1.1.1.8 ASPcomplete/linux/kernel/timer.c:1.6 --- linux/kernel/timer.c:1.1.1.8 Mon Aug 14 19:43:25 2000 +++ linux/kernel/timer.c Sat Aug 12 19:36:41 2000 @@ -612,7 +612,7 @@ unsigned long nr = 0; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if ((p->state == TASK_RUNNING || (p->state & TASK_UNINTERRUPTIBLE))) nr += FIXED_1; Index: linux/kernel/ve.c diff -uN /dev/null linux/kernel/ve.c:1.23 --- /dev/null Mon Aug 14 20:39:34 2000 +++ linux/kernel/ve.c Mon Aug 14 17:38:11 2000 @@ -0,0 +1,956 @@ +/* + * linux/kernel/ve.c + * + * Copyright (C) 2000 SWSoft Pte Ltd + * + * Author Denis V. Lunev den@asp-linux.com + * + * 2000-03-10 Created + */ + +/* + * 've.c' is the file with basic VE support.
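+ * Each environment runs with a private init (do_env_create() gives it + * pid 1), its own pty drivers and a per-VE copy of utsname.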
The file provides basic primitives + * along with the initialization code + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +/* Various externals from other parts of kernel */ +extern struct task_struct *child_reaper; +extern struct tty_driver pty_driver, pty_slave_driver; +extern struct tty_driver *tty_drivers; + +#ifdef CONFIG_UNIX98_PTYS +extern struct tty_driver ptm_driver[]; /* Unix98 pty masters; for /dev/ptmx */ +extern struct tty_driver pts_driver[]; /* Unix98 pty slaves; for /dev/ptmx */ +#endif + +extern rwlock_t tty_driver_guard; +extern int get_pid(unsigned long flags); + +#ifdef CONFIG_SYSVIPC +extern int ve_ipc_cleanup( struct ve_struct *envid ); +#endif + +#ifdef CONFIG_VE_LINK +static struct sock *nlve0 = NULL; +#endif + +static struct ve_struct *ve_list_head = NULL; +rwlock_t ve_list_guard = RW_LOCK_UNLOCKED; +rwlock_t devperms_hash_guard = RW_LOCK_UNLOCKED; + +asmlinkage int sys_env_create(envid_t veid, unsigned flags, u32 addr); +asmlinkage int sys_mark_env_to_down(envid_t veid); +static int do_env_create(envid_t veid, unsigned flags, u32 addr); +static int do_prepare_devperms( envid_t veid, unsigned mode ); +static void do_clean_devperms( envid_t veid ); +static int register_ve_tty_driver( struct ve_struct* envid ); +static void unregister_ve_tty_driver( struct ve_struct* envid ); +static void clear_termios( struct tty_driver* driver ); +static int ve_read_proc(char *page, char **start, off_t off, int count, int *eof, void *data); +static int ve_get_info(char *buffer, off_t offset, int length); + +/* Default capabilities list for VE */ +kernel_cap_t env_cap_default; + + +static inline struct ve_struct *get_ve(envid_t veid) +{ + struct ve_struct *ptr; + read_lock( &ve_list_guard ); + for( ptr = ve_list_head; ptr; ptr = ptr->next ) + if( ptr->veid == veid ) + break; + read_unlock( &ve_list_guard ); + return ptr; +} + +/* Devices permissions routines, + * character and block devices separately + */ +/* Rules applied in the following order: + MAJOR!=0, MINOR!=0 + MAJOR!=0, MINOR==0 + MAJOR==0, MINOR==0 +*/ +struct devperms_struct +{ + kdev_t dev; /* device id */ + unsigned type; + envid_t veid; + unsigned char mask; + + struct devperms_struct *devhash_next; + struct devperms_struct **devhash_pprev; +}; + +static struct devperms_struct original_perms[] = +{{ + MKDEV(0,0), /*device*/ + S_IFCHR, /*type*/ + 0, /*veid*/ + S_IRWXO, + NULL, NULL +}, +{ + MKDEV(0,0), /*device*/ + S_IFBLK, /*type*/ + 0, /*veid*/ + S_IRWXO, + NULL, NULL +}}; + +static unsigned default_chrperms[] = { + PTY_MASTER_MAJOR, 0, S_IRWXO, + PTY_SLAVE_MAJOR, 0, S_IRWXO, + TTY_MAJOR, 0, S_IRWXO, + TTYAUX_MAJOR, 0, S_IRWXO, + UNIX98_PTY_MASTER_MAJOR, 0, S_IRWXO, + UNIX98_PTY_SLAVE_MAJOR, 0, S_IRWXO, + MEM_MAJOR, 5, S_IRWXO, /* zero */ + MEM_MAJOR, 8, S_IRWXO, /* random */ + MEM_MAJOR, 9, S_IRWXO, /* urandom */ + MEM_MAJOR, 3, S_IRWXO, /* null */ + 0 +}; +static unsigned default_blkperms[] = { + 0 +}; + + +#define DEVPERMS_HASH_SZ 64 +struct devperms_struct *devperms_hash[DEVPERMS_HASH_SZ]; + +#define devperms_hashfn(id,dev) ( id<<8 ^ (MAJOR(dev)<<3) ^ MINOR(dev) ) & (DEVPERMS_HASH_SZ - 1) + +static inline void hash_devperms(struct devperms_struct *p) +{ + struct devperms_struct **htable = &devperms_hash[devperms_hashfn(p->veid,p->dev)]; + + if((p->devhash_next = *htable) != NULL) + (*htable)->devhash_pprev = &p->devhash_next; + *htable = p; + p->devhash_pprev = htable; +} + +static inline void
unhash_devperms(struct devperms_struct *p) +{ + if(p->devhash_next) + p->devhash_next->devhash_pprev = p->devhash_pprev; + *p->devhash_pprev = p->devhash_next; +} + +static inline struct devperms_struct *find_devperms(envid_t veid, int type, kdev_t dev) +{ + struct devperms_struct *p, **htable = &devperms_hash[devperms_hashfn(veid,dev)]; + + for(p = *htable; p && !( p->type==(type&S_IFMT) && + MAJOR(dev)==MAJOR(p->dev) && + MINOR(dev)==MINOR(p->dev) && + p->veid==veid); + p = p->devhash_next ) + ; + return p; +} + +static int do_prepare_devperms( envid_t veid, unsigned mode ) +{ + unsigned *ptr = S_ISCHR(mode) ? default_chrperms : default_blkperms; + struct devperms_struct *new = kmalloc( sizeof(struct devperms_struct), GFP_KERNEL ); + + if( !new ) + return -ENOMEM; + + new->veid = veid; + new->type = mode; + new->dev = MKDEV(0,0); + new->mask = 0; + hash_devperms( new ); + + while( *ptr ) + { + unsigned major = *ptr++; + unsigned minor = *ptr++; + new = kmalloc( sizeof(struct devperms_struct), GFP_KERNEL ); + if( !new ) + return -ENOMEM; + + new->veid = veid; + new->type = mode; + new->dev = MKDEV(major,minor); + new->mask = *ptr++; + hash_devperms( new ); + } + return 0; +} + +static void do_clean_devperms( envid_t veid ) +{ + int i; + struct devperms_struct* ptr; + + for( i=0; i<DEVPERMS_HASH_SZ; i++ ) + for( ptr = devperms_hash[i]; ptr; ) + { + struct devperms_struct *next = ptr->devhash_next; + if( ptr->veid==veid ) + { + unhash_devperms( ptr ); + kfree( ptr ); + } + + ptr = next; + } +} + +unsigned char get_device_perms_ve( envid_t veid, int type, kdev_t dev ) +{ + struct devperms_struct *ptr; + + read_lock( &devperms_hash_guard ); + ptr = find_devperms( veid, type, dev ); + + if( ptr ) + goto end; + if( MINOR(dev) ) + ptr = find_devperms( veid, type, MKDEV(MAJOR(dev),0) ); + if( ptr ) + goto end; + + if( MAJOR(dev) ) + ptr = find_devperms( veid, type, MKDEV(0,0) ); + +end: read_unlock( &devperms_hash_guard ); + return ptr ?
ptr->mask : 0; +} + +asmlinkage int sys_setdevperms( envid_t veid, unsigned type, kdev_t dev, unsigned mask ) +{ + struct ve_struct *ptr; + struct devperms_struct *perms; + + if( !capable(CAP_SETVEID) || veid==0 ) + return -EPERM; + + if( S_ISBLK(type) && MAJOR(dev)>=MAX_BLKDEV ) + return -ENODEV; + + if( S_ISCHR(type) && MAJOR(dev)>=MAX_CHRDEV ) + return -ENODEV; + + if( (ptr=get_ve(veid))==NULL ) + return -ESRCH; + + write_lock_irq( &devperms_hash_guard ); + perms = find_devperms( veid, type, dev ); + if( !perms ) + { + perms = kmalloc( sizeof(struct devperms_struct), GFP_KERNEL ); + if( !perms ) { + write_unlock_irq( &devperms_hash_guard ); + return -ENOMEM; + } + + perms->veid = veid; + perms->dev = dev; + perms->type = type; + perms->mask = mask & S_IRWXO; + hash_devperms( perms ); + } else + perms->mask = mask & S_IRWXO; + write_unlock_irq( &devperms_hash_guard ); + + return 0; +} + + +static int ve_read_proc(char *page, char **start, off_t off, int count, int *eof, void *data) +{ + int len = ve_get_info(page, off, count); + if (len <= off+count) *eof = 1; + *start = page + off; + len -= off; + if (len>count) len = count; + if (len<0) len = 0; + return len; +} + + +static int ve_get_info(char *page, off_t offset, int length) +{ + int len = 0, clen; + int out_count = 0; + char buffer[128]; + int rsize = 0; + struct ve_struct *ptr; + + if( !is_super_ve(current) ) + return sprintf( page, "%10d %5d\n", current->envid->veid, + atomic_read(&current->envid->pcounter) ); + + *page = 0; + read_lock( &ve_list_guard ); + for( ptr = ve_list_head; ptr; ptr = ptr->next ) { + out_count += rsize; + if( out_count < offset && rsize ) + continue; + clen = sprintf( buffer, "%10d %5d\n", ptr->veid, + atomic_read(&ptr->pcounter) ); + if( !rsize ) + { + out_count = rsize = clen; + if( out_count < offset ) + continue; + } + strncat( page+len, buffer+len-out_count+rsize, length-len ); + len += clen; + if( len>length ) + break; + } + read_unlock( &ve_list_guard ); + return len; +} + + +asmlinkage int sys_env_create(envid_t veid, unsigned flags, u32 addr) +{ + int status = 0; + + if( !flags ) + return current->envid->veid; + + if( !capable(CAP_SETVEID) ) + return -EPERM; + + if( flags&VE_CREATE ) + flags |= VE_ENTER; + if( flags&VE_TEST && flags&VE_ENTER ) + return -EINVAL; + + lock_kernel(); + status = do_env_create(veid, flags, addr); + unlock_kernel(); + return status; +} + +asmlinkage int sys_mark_env_to_down(envid_t veid) +{ + struct ve_struct *ptr; + + if( !capable(CAP_SETVEID) || veid==0 ) + return -EPERM; + + if( (ptr=get_ve(veid))==NULL ) + return -ESRCH; + + do_mark_env_to_down(ptr); + return 0; +} + +void do_mark_env_to_down( struct ve_struct *ptr ) +{ + int pid = get_pid(0); + + write_lock_irq( &tasklist_lock ); + unhash_pid( ptr->init_entry ); + ptr->init_entry->pid = pid; + hash_pid( ptr->init_entry ); + write_unlock_irq( &tasklist_lock ); +} + +static int do_env_create(envid_t veid, unsigned flags, u32 addr) +{ + struct task_struct *proc = current; + struct ve_struct *old = current->envid; + struct ve_struct *ptr = get_ve(veid); + int err = 0; + + if( ptr ) + { + if( flags&VE_EXCLUSIVE ) + return -EACCES; + else + goto out_success; + } + + if( !(flags & VE_CREATE) ) + return -ESRCH; + + ptr = kmalloc( sizeof(struct ve_struct), GFP_KERNEL ); + if( ptr == NULL ) + return -ENOMEM; + + memset( ptr, 0, sizeof(struct ve_struct) ); + memcpy( &ptr->_system_utsname, &child_reaper->envid->_system_utsname, + sizeof(struct new_utsname) ); + ptr->veid = veid; + memcpy( &ptr->cap_default, &env_cap_default, sizeof(env_cap_default) ); + ptr->ip = addr; + proc->envid = ptr; + sema_init(
&ptr->init_exit_guard, 1 ); + + if( (err = register_ve_tty_driver(ptr))!=0 ) + goto err_vetty; + + write_lock_irq( &devperms_hash_guard ); + if( (err=do_prepare_devperms( veid, S_IFCHR ))!=0 ) + goto err_perms; + if( (err=do_prepare_devperms( veid, S_IFBLK ))!=0 ) + goto err_perms; + write_unlock_irq( &devperms_hash_guard ); + + write_lock_irq( &ve_list_guard ); + ptr->prev = NULL; + ptr->next = ve_list_head; + if( ve_list_head ) + ve_list_head->prev = ptr; + ve_list_head = ptr; + write_unlock_irq( &ve_list_guard ); + ptr->init_entry = proc; + + read_lock(&proc->fs->lock); + ptr->fs_rootmnt = proc->fs->rootmnt; + ptr->fs_root = proc->fs->root; + read_unlock(&proc->fs->lock); + + write_lock_irq( &tasklist_lock ); + unhash_pid(proc); + proc->pid = 1; + hash_pid(proc); + write_unlock_irq( &tasklist_lock ); + +out_success: + if( flags&VE_ENTER ) + { + if( down_trylock( &ptr->init_exit_guard )) + return -EBUSY; + up( &ptr->init_exit_guard ); + + cap_mask( proc->cap_effective, ptr->cap_default ); + cap_mask( proc->cap_inheritable, ptr->cap_default ); + cap_mask( proc->cap_permitted, ptr->cap_default ); + proc->envid = ptr; + atomic_inc( &ptr->pcounter ); + } + return current->envid->veid; + + err_perms: + do_clean_devperms( ptr->veid ); + write_unlock_irq( &devperms_hash_guard ); + err_vetty: + unregister_ve_tty_driver(ptr); + kfree(ptr); + proc->envid = old; + return err; +} + +void do_env_cleanup( struct ve_struct *ptr ) +{ +#ifdef CONFIG_SYSVIPC + ve_ipc_cleanup( ptr ); +#endif + unregister_ve_tty_driver( ptr ); + + write_lock_irq( &devperms_hash_guard ); + do_clean_devperms( ptr->veid ); + write_unlock_irq( &devperms_hash_guard ); + + write_lock_irq( &ve_list_guard ); + if( ptr->prev ) + ptr->prev->next = ptr->next; + else + ve_list_head = ptr->next; + if( ptr->next ) + ptr->next->prev = ptr->prev; + write_unlock_irq( &ve_list_guard ); + kfree( ptr ); +} + +static void free_tty_driver( struct tty_driver *driver ) +{ + if( !driver ) + return; + + clear_termios( driver ); + + if( driver->other ) + driver->other->other = NULL; + else + kfree( driver->driver_state ); + + kfree( driver->table ); + kfree( driver->termios ); + kfree( driver->termios_locked ); + kfree( driver ); +} + +static struct tty_driver *alloc_tty_driver( struct tty_driver *base, void *state, + struct ve_struct *envid ) +{ + struct tty_driver *driver = kmalloc( sizeof(struct tty_driver), GFP_KERNEL ); + if( !driver ) + return NULL; + + memcpy( driver, base, sizeof(struct tty_driver)); + + driver->refcount = &envid->pty_refcount; + driver->prev = driver->next = NULL; + + driver->table = kmalloc( sizeof(void*)*NR_PTYS, GFP_KERNEL ); + driver->termios = kmalloc( sizeof(void*)*NR_PTYS, GFP_KERNEL ); + driver->termios_locked = kmalloc( sizeof(void*)*NR_PTYS, GFP_KERNEL ); + driver->driver_state = state ? 
state : + kmalloc( sizeof(struct pty_struct)*NR_PTYS, GFP_KERNEL ); + + if( !driver->table || !driver->termios || !driver->termios_locked || + !driver->driver_state ) + { + free_tty_driver( driver ); + return NULL; + } + + driver->envid = envid; + driver->flags &= ~TTY_DRIVER_INSTALLED; + + memset( driver->table, 0, sizeof(void*)*NR_PTYS ); + memset( driver->termios, 0, sizeof(void*)*NR_PTYS ); + memset( driver->termios_locked, 0, sizeof(void*)*NR_PTYS ); + if( !state ) + { + int i; + memset( driver->driver_state, 0, sizeof(struct pty_struct)*NR_PTYS ); + for (i = 0; i < NR_PTYS; i++) + init_waitqueue_head(&((struct pty_struct*) + driver->driver_state)[i].open_wait); + } + return driver; +} + + +static void register_tty_driver( struct tty_driver *driver ) +{ + struct tty_driver *ptr = driver; + while( ptr->next ) + ptr = ptr->next; + ptr->next = tty_drivers; + tty_drivers->prev = ptr; + tty_drivers = driver; +} + +static int register_ve_tty_driver( struct ve_struct* envid ) +{ +#ifdef CONFIG_UNIX98_PTYS + int i; +#endif + + /* Traditional BSD devices */ + if( !(envid->pty_driver = alloc_tty_driver( &pty_driver, NULL, envid )) ) + return -ENOMEM; + if( !(envid->pty_slave_driver = alloc_tty_driver( &pty_slave_driver, + envid->pty_driver->driver_state, + envid ))) + return -ENOMEM; + + envid->pty_driver->other = envid->pty_slave_driver; + envid->pty_slave_driver->other = envid->pty_driver; + +#ifdef CONFIG_UNIX98_PTYS + for ( i = 0 ; i < UNIX98_NR_MAJORS ; i++ ) + { + if( !(envid->ptm_driver[i] = alloc_tty_driver( &ptm_driver[i], NULL, envid )) ) + return -ENOMEM; + if( !(envid->pts_driver[i] = alloc_tty_driver( &pts_driver[i], + envid->ptm_driver[i]->driver_state, + envid )) ) + return -ENOMEM; + + envid->ptm_driver[i]->other = envid->pts_driver[i]; + envid->pts_driver[i]->other = envid->ptm_driver[i]; + } + + for ( i = 0 ; i < UNIX98_NR_MAJORS ; i++ ) + { + envid->ptm_driver[i]->next = envid->pts_driver[i]; + if( i>0 ) + envid->ptm_driver[i]->prev = envid->pts_driver[i-1]; + envid->pts_driver[i]->prev = envid->ptm_driver[i]; + if( i<UNIX98_NR_MAJORS-1 ) + envid->pts_driver[i]->next = envid->ptm_driver[i+1]; + } +#endif + + /* Register driver manually */ + write_lock_irq( &tty_driver_guard ); + envid->pty_driver->next = envid->pty_slave_driver; + envid->pty_slave_driver->prev = envid->pty_driver; + +#ifdef CONFIG_UNIX98_PTYS + register_tty_driver( envid->ptm_driver[0] ); +#endif + register_tty_driver( envid->pty_driver ); + write_unlock_irq( &tty_driver_guard ); + return 0; +} + +static void unregister_ve_tty_driver( struct ve_struct* envid ) +{ +#ifdef CONFIG_UNIX98_PTYS + int i; +#endif + + if( do_is_super_ve(envid) ) /* VE0 */ + return; + + if( envid->pty_driver && envid->pty_driver->next ) /* Incomplete init ? */ + { + struct tty_driver *ptr = envid->pty_driver; + + write_lock_irq( &tty_driver_guard ); + while( ptr->next && ptr->next->envid==envid ) + ptr = ptr->next; + + if (envid->pty_driver->prev) + envid->pty_driver->prev->next = ptr->next; + else + tty_drivers = ptr->next; + + if (ptr->next) + ptr->next->prev = envid->pty_driver->prev; + write_unlock_irq( &tty_driver_guard ); + } + + free_tty_driver( envid->pty_driver ); + free_tty_driver( envid->pty_slave_driver ); + +#ifdef CONFIG_UNIX98_PTYS + for ( i = 0 ; i < UNIX98_NR_MAJORS ; i++ ) + { + free_tty_driver( envid->ptm_driver[i] ); + free_tty_driver( envid->pts_driver[i] ); + } +#endif +} + + +/* + * Free the termios and termios_locked structures because + * we don't want to get memory leaks when modular tty + * drivers are removed from the kernel.
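+ * (free_tty_driver() above also calls this when a VE's private pty + * drivers are torn down.)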
+ */ +static void clear_termios( struct tty_driver* driver ) +{ + int i; + struct termios *tp; + if( !driver || !driver->termios ) + return; + + for (i = 0; i < driver->num; i++) { + tp = driver->termios[i]; + if (tp) { + driver->termios[i] = NULL; + kfree(tp); + } + tp = driver->termios_locked[i]; + if (tp) { + driver->termios_locked[i] = NULL; + kfree(tp); + } + } +} + +#ifdef CONFIG_VE_NET +struct ve_struct *get_ve_by_ip( u32 addr ) +{ + struct ve_struct *ptr; + + read_lock( &ve_list_guard ); + for( ptr = ve_list_head; ptr; ptr = ptr->next ) + if( ptr->ip == addr ) + break; + read_unlock( &ve_list_guard ); + return ptr ? ptr : child_reaper->envid; +} +#endif + +#ifdef CONFIG_VE_LINK +/* + * Process one VElink message. + * The major part of the code is taken from rtnetlink processing. + * This idea belongs to A. Kuznetsov + */ +#include +#include +#include +#include + +#define VE_CONF_SIZE(len) (len+sizeof(struct ve_conf_request)) + +pid_t daemon_pid = 0; +struct request_struct +{ + struct list_head list; + ulong id; + int answer; + struct semaphore waiter; +}; +LIST_HEAD(request_list); +spinlock_t request_list_guard = SPIN_LOCK_UNLOCKED; + +asmlinkage int sys_ve_conf_request( char *user_request, int len ) +{ + struct sk_buff *skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + struct nlmsghdr *nlh; + struct ve_conf_request *req; + int err = 0; + struct request_struct *waitq; + + if (!skb) + return -ENOBUFS; + + waitq = kmalloc(sizeof(struct request_struct), GFP_KERNEL); + if( !waitq ) + { + err = -ENOMEM; + goto req_alloc_failed; + } + memset(waitq,0,sizeof(struct request_struct)); + sema_init(&waitq->waiter,0); + + nlh = NLMSG_PUT(skb, daemon_pid, jiffies, VE_CONF_MSG, VE_CONF_SIZE(len)); + nlh->nlmsg_flags = NLM_F_ECHO | NLM_F_REQUEST; + + req = NLMSG_DATA(nlh); + req->veid = current->envid->veid; + req->id = waitq->id = jiffies; + + if( copy_from_user( req->data, user_request, len ) ) + { + err = -EFAULT; + goto nlmsg_failure; + } + if( (err=netlink_unicast(nlve0, skb, daemon_pid, MSG_DONTWAIT))<0 ) + goto nlmsg_failure; + + spin_lock_irq( &request_list_guard ); + list_add( &waitq->list, &request_list ); + spin_unlock_irq( &request_list_guard ); + + printk( "awaiting request answer\n" ); + down_interruptible( &waitq->waiter ); + + printk( "answer arrived %d\n", waitq->answer ); + spin_lock_irq( &request_list_guard ); + list_del( &waitq->list ); + spin_unlock_irq( &request_list_guard ); + err = waitq->answer; + kfree(waitq); + + return err; + + nlmsg_failure: + req_alloc_failed: + printk( "error in sender\n" ); + kfree(waitq); + kfree_skb(skb); + return err; +} + + +extern __inline__ int +velink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, int *errp) +{ + int type; + struct ve_conf_request *req; + struct list_head *it; + + /* Only requests from VE0 are accepted */ + if( !is_super_ve(skb) ) + return 0; + + /* Only requests are handled by the kernel now */ + if (!(nlh->nlmsg_flags&NLM_F_REQUEST)) + return 0; + + type = nlh->nlmsg_type; + + /* A control message: ignore it */ + if (type < RTM_BASE) + return 0; + + /* Unknown message: reply with EINVAL */ + if (type > RTM_MAX) + goto err_inval; + + /* All the messages must have at least 1 byte length */ + if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct ve_conf_request))) + return 0; + + /* Register user space daemon */ + if (type==VE_REGISTER) + { + daemon_pid = nlh->nlmsg_pid; + return 0; + } + + if( type != VE_CONF_MSG ) + goto out; + + req = NLMSG_DATA(nlh); + printk( "answer arrived to receiver %d\n", req->answer ); + list_for_each(it,&request_list) {
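+ /* wake up the requester whose id matches this answer */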
struct request_struct *ptr = list_entry(it, struct request_struct, list); + + if( ptr->id==req->id ) + { + printk( "questioner found\n" ); + ptr->answer = req->answer; + up( &ptr->waiter ); + goto out; + } + } + +out: + printk( "all ok. quitting receiver\n" ); + return 0; + +err_inval: + *errp = -EINVAL; + return -1; +} + +extern __inline__ int velink_rcv_skb(struct sk_buff *skb) +{ + int err; + struct nlmsghdr * nlh; + + while (skb->len >= NLMSG_SPACE(0)) { + u32 rlen; + + nlh = (struct nlmsghdr *)skb->data; + if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len) + return 0; + rlen = NLMSG_ALIGN(nlh->nlmsg_len); + if (rlen > skb->len) + rlen = skb->len; + if (velink_rcv_msg(skb, nlh, &err)) { + /* Not error, but we must interrupt processing here: + * Note, that in this case we do not pull message + * from skb, it will be processed later. + */ + if (err == 0) + return -1; + netlink_ack(skb, nlh, err); + } else if (nlh->nlmsg_flags&NLM_F_ACK) + netlink_ack(skb, nlh, 0); + skb_pull(skb, rlen); + } + + return 0; +} + +/* + * venetlink input queue processing routine: + * - try to acquire the shared lock; if that fails, defer processing. + * - feed skbs to velink_rcv_skb until it refuses a message, which + * happens when a dump has started and/or acquiring the exclusive + * lock failed. + */ + +static void velink_rcv(struct sock *sk, int len) +{ + do { + struct sk_buff *skb; + + if (rtnl_shlock_nowait()) + return; + + while ((skb = skb_dequeue(&sk->receive_queue)) != NULL) { + if (velink_rcv_skb(skb)) { + if (skb->len) + skb_queue_head(&sk->receive_queue, skb); + else + kfree_skb(skb); + break; + } + kfree_skb(skb); + } + + up(&rtnl_sem); + } while (nlve0 && nlve0->receive_queue.qlen); +} +#else +asmlinkage int sys_ve_conf_request( char *user_request, int len ) +{ + return -ENOSYS; +} +#endif + + +/* Here we create information for 'default' VE, i.e.
no VE + * No locks required as this is a part of init and invoked + * at the very beginning + */ +void __init init_ve_system(void) +{ + struct task_struct *init_entry = current; + int i; + struct ve_struct *ptr = kmalloc( sizeof(struct ve_struct), GFP_KERNEL ); + + printk( "Initializing VE sub-system\n" ); + memset( ptr, 0, sizeof(struct ve_struct) ); + ptr->init_entry = init_entry; + cap_set_full(ptr->cap_default); + + hash_devperms( original_perms ); + hash_devperms( original_perms+1 ); + + ve_list_head = ptr; + sema_init( &ptr->init_exit_guard, 1 ); + atomic_set( &ptr->pcounter, 1+NR_CPUS ); + init_entry->envid = ptr; + memcpy( &ptr->_system_utsname, &_system_utsname, sizeof system_utsname ); + + /* Don't forget about idle tasks */ + for( i=0; i<NR_CPUS; i++ ) + { + init_tasks[i]->envid = ptr; + } + + /* Preparing default capability list */ + cap_set_full(env_cap_default); + cap_lower(env_cap_default, CAP_SYS_MODULE); + cap_lower(env_cap_default, CAP_SYS_RAWIO); + cap_lower(env_cap_default, CAP_SYS_NICE); + cap_lower(env_cap_default, CAP_SYS_TIME); + cap_lower(env_cap_default, CAP_NET_ADMIN); + + read_lock(&init_entry->fs->lock); + ptr->fs_rootmnt = init_entry->fs->rootmnt; + ptr->fs_root = init_entry->fs->root; + read_unlock(&init_entry->fs->lock); +} + +void __init postinit_ve_system(void) +{ +#ifdef CONFIG_VE_LINK + nlve0 = netlink_kernel_create(VELINK, velink_rcv); +#endif + create_proc_read_entry("veinfo", 0, NULL, ve_read_proc, NULL); +} Index: linux/mm/Makefile diff -u linux/mm/Makefile:1.1.1.3 ASPcomplete/linux/mm/Makefile:1.2 --- linux/mm/Makefile:1.1.1.3 Mon Aug 14 19:41:35 2000 +++ linux/mm/Makefile Tue Aug 1 15:30:45 2000 @@ -16,4 +16,8 @@ O_OBJS += highmem.o endif +ifeq ($(CONFIG_USER_RESOURCE),y) +O_OBJS += kubd.o +endif + include $(TOPDIR)/Rules.make Index: linux/mm/filemap.c diff -u linux/mm/filemap.c:1.1.1.7 ASPcomplete/linux/mm/filemap.c:1.6 --- linux/mm/filemap.c:1.1.1.7 Mon Aug 14 19:41:35 2000 +++ linux/mm/filemap.c Sat Aug 12 22:02:29 2000 @@ -78,15 +78,6 @@ atomic_dec(&page_cache_size); } -static inline int sync_page(struct page *page) -{ - struct address_space *mapping = page->mapping; - - if (mapping && mapping->a_ops && mapping->a_ops->sync_page) - return mapping->a_ops->sync_page(page); - return 0; -} - /* * Remove a page from the page cache and free it. Caller has to make * sure the page is locked and that nobody else uses it - or that usage Index: linux/mm/kubd.c diff -uN /dev/null linux/mm/kubd.c:1.1 --- /dev/null Mon Aug 14 20:39:34 2000 +++ linux/mm/kubd.c Tue Aug 1 15:30:45 2000 @@ -0,0 +1,235 @@ +/* + * linux/mm/kubd.c + * + * Copyright (C) 2000 Andrey Moruga + * + * TODO: + * - consider what should be done (if any) for bad pgd/pmd entries + * + * Changes: + * 2000/07/28 Andrey V. Savochkin + * - some cosmetic changes + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* Page weights (it's a scale for our fixed-point arithmetic).
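+ * Example: a page whose computed reference count nref is 4 adds + * UB_PAGE_WEIGHT/4 to each referencing beancounter's ub_held_pages; kubd() + * later shifts the sum right by UB_PAGE_WEIGHT_SHIFT to get back whole pages.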
+ * Restrictions: + * - UB_PAGE_WEIGHT should be considerably greater than the maximum number of + * different mm in the system (it defines the precision of calculations) + * - MAX_ULONGLONG/UB_PAGE_WEIGHT should be greater than the amount of + * physical memory (to avoid overflows) + */ +#define UB_PAGE_WEIGHT_SHIFT 24 +#define UB_PAGE_WEIGHT (1 << UB_PAGE_WEIGHT_SHIFT) + +/* + * Various debugging stuff + * 1 main cycle + * 2 pte + */ +#define UB_DEBUG 2 + +static inline void kcharge_pte (pte_t *pte, struct user_beancounter *ub) +{ + struct page *page; + int nref; + + if (!pte_present(*pte)) + return; + + page = pte_page(*pte); + if ((page-mem_map >= max_mapnr) || PageReserved(page)) + return; + + nref = page_count(page) - !!page->buffers + - (PageSwapCache(page) ? 1 : !!page->mapping); + /* now charge the beancounter with page_weight / nreferences */ + if (nref <= 0) { +#if UB_DEBUG & 2 + printk("Error: reference counter is %d\n", nref); + printk("Page is %p, counter %d\n", page, page_count(page)); + printk("Buffers is %p, Mapping %p\n", page->buffers, + page->mapping); + printk("PageSwapCache(page) is %d\n", PageSwapCache(page)); + printk("PageReserved(page) is %d\n", PageReserved(page)); + printk("page-mem_map is %d, max_mapnr %ld\n", page-mem_map, + max_mapnr); +#endif + } else + ub->ub_held_pages += UB_PAGE_WEIGHT / nref; +} + +static inline void kcharge_pmd (pmd_t *pmd, unsigned long address, + unsigned long end, struct user_beancounter *ub) +{ + pte_t *pte; + unsigned long pmd_end; + + if (pmd_none(*pmd)) + return; + if (pmd_bad(*pmd)) { + pmd_ERROR(*pmd); + pmd_clear(pmd); + return; + } + + pte = pte_offset(pmd, address); + + pmd_end = (address + PMD_SIZE) & PMD_MASK; + if (end > pmd_end) + end = pmd_end; + + do { + kcharge_pte(pte, ub); + address += PAGE_SIZE; + pte++; + } while (address && (address < end)); +} + + +static inline void kcharge_pgd (pgd_t *dir, unsigned long address, + unsigned long end, struct user_beancounter *ub) +{ + pmd_t *pmd; + unsigned long pgd_end; + + if (pgd_none(*dir)) + return; + if (pgd_bad(*dir)) { + pgd_ERROR(*dir); + pgd_clear(dir); + return; + } + pmd = pmd_offset(dir, address); + + pgd_end = (address + PGDIR_SIZE) & PGDIR_MASK; + if (pgd_end && (end > pgd_end)) + end = pgd_end; + + do { + kcharge_pmd(pmd, address, end, ub); + + address = (address + PMD_SIZE) & PMD_MASK; + pmd++; + } while (address && (address < end)); +} + +static inline void kcharge_vma (struct vm_area_struct *vma, + struct user_beancounter *ub) +{ + unsigned long address = vma->vm_start; + unsigned long end = vma->vm_end; + pgd_t *pgdir; + + pgdir = pgd_offset(vma->vm_mm, address); + do { + /* go through pgd */ + kcharge_pgd(pgdir, address, end, ub); + address = (address + PGDIR_SIZE) & PGDIR_MASK; + pgdir++; + } while (address && (address < end)); +} + +static inline void kcharge_mm (struct mm_struct *mm, + struct user_beancounter *ub) +{ + struct vm_area_struct *vma; + + vmlist_access_lock(mm); + for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) + kcharge_vma (vma, ub); + vmlist_access_unlock(mm); + +} + +/* The background daemon, started as a kernel thread and repeatedly calculating + * and charging the current physical memory consumption */ +int kubd(void *unused) +{ + struct task_struct *tsk = current; + struct ub_hash_slot *slot; + struct task_struct *p; + struct user_beancounter *ub; + unsigned long flags; + long i; + + tsk->session = 1; + tsk->pgrp = 1; + strcpy(tsk->comm, "kubd"); + sigfillset(&tsk->blocked); + +repeat: + /* first of all, clear the current
ub_held_pages values */ + slot = ub_hash; + for (i = 0; i < UB_HASH_SIZE; i++) { + lock_beancounters(slot, flags); + for (ub = slot->ubh_beans; ub != NULL; ub = ub->ub_next) + ub->ub_held_pages = 0; + unlock_beancounters(slot, flags); + slot++; + } + + /* now, go through all the processes -> mm -> vma -> pgd -> pmd + * -> page to find out how many processes reference the page */ + /* we have to keep the tasklist locked */ + read_lock(&tasklist_lock); + for (p = init_task.next_task; p != &init_task; + p = p->next_task) { + + struct mm_struct *mm = p->mm; + struct user_beancounter *ub; + + if (!p->swappable || !mm) + continue; + ub = mm->beancounter; + if (!ub) + continue; + kcharge_mm(mm, ub); + } + read_unlock(&tasklist_lock); + + /* now the calculations are done; + * set ub_held[UB_RESPAGES] values */ + slot = ub_hash; + for (i = 0; i < UB_HASH_SIZE; i++) { + lock_beancounters(slot, flags); + for (ub = slot->ubh_beans; ub != NULL; ub = ub->ub_next) { + ub->ub_held[UB_RESPAGES] = + ub->ub_held_pages >> UB_PAGE_WEIGHT_SHIFT; + if (ub->ub_maxheld[UB_RESPAGES] < + ub->ub_held[UB_RESPAGES]) + ub->ub_maxheld[UB_RESPAGES] = + ub->ub_held[UB_RESPAGES]; + } + unlock_beancounters(slot, flags); + slot++; + } + /* work's done, time to sleep a bit */ + tsk->state = TASK_INTERRUPTIBLE; +#if UB_DEBUG & 1 + printk("Gonna sleep now\n"); +#endif + schedule_timeout(HZ); +#if UB_DEBUG & 1 + printk("Just woke up\n"); +#endif + goto repeat; +} + +static int __init kubd_init(void) +{ + printk("Starting kubd\n"); + kernel_thread(kubd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND); + return 0; +} + +module_init(kubd_init) Index: linux/mm/memory.c diff -u linux/mm/memory.c:1.1.1.7 ASPcomplete/linux/mm/memory.c:1.5 --- linux/mm/memory.c:1.1.1.7 Mon Aug 14 19:41:35 2000 +++ linux/mm/memory.c Sat Aug 12 19:36:41 2000 @@ -73,7 +73,7 @@ * Note: this doesn't free the actual pages themselves. That * has been handled earlier when unmapping all the memory regions. */ -static inline void free_one_pmd(pmd_t * dir) +static inline void free_one_pmd(struct mm_struct *mm, pmd_t * dir) { pte_t * pte; @@ -86,10 +86,10 @@ } pte = pte_offset(dir, 0); pmd_clear(dir); - pte_free(pte); + pte_free(mm->beancounter, pte); } -static inline void free_one_pgd(pgd_t * dir) +static inline void free_one_pgd(struct mm_struct *mm, pgd_t * dir) { int j; pmd_t * pmd; @@ -104,8 +104,8 @@ pmd = pmd_offset(dir, 0); pgd_clear(dir); for (j = 0; j < PTRS_PER_PMD ; j++) - free_one_pmd(pmd+j); - pmd_free(pmd); + free_one_pmd(mm, pmd+j); + pmd_free(mm->beancounter, pmd); } /* Low and high watermarks for page table cache.
@@ -130,7 +130,7 @@ page_dir += first; do { - free_one_pgd(page_dir); + free_one_pgd(mm, page_dir); page_dir++; } while (--nr); @@ -178,7 +178,7 @@ continue; } if (pgd_none(*dst_pgd)) { - if (!pmd_alloc(dst_pgd, 0)) + if (!pmd_alloc(dst->beancounter, dst_pgd, 0)) goto nomem; } @@ -201,7 +201,7 @@ goto cont_copy_pmd_range; } if (pmd_none(*dst_pmd)) { - if (!pte_alloc(dst_pmd, 0)) + if (!pte_alloc(dst->beancounter, dst_pmd, 0)) goto nomem; } @@ -650,7 +650,8 @@ } while (address && (address < end)); } -static inline int zeromap_pmd_range(pmd_t * pmd, unsigned long address, +static inline int zeromap_pmd_range(struct mm_struct *mm, + pmd_t * pmd, unsigned long address, unsigned long size, pgprot_t prot) { unsigned long end; @@ -660,7 +661,7 @@ if (end > PGDIR_SIZE) end = PGDIR_SIZE; do { - pte_t * pte = pte_alloc(pmd, address); + pte_t * pte = pte_alloc(mm->beancounter, pmd, address); if (!pte) return -ENOMEM; zeromap_pte_range(pte, address, end - address, prot); @@ -682,11 +683,11 @@ if (address >= end) BUG(); do { - pmd_t *pmd = pmd_alloc(dir, address); + pmd_t *pmd = pmd_alloc(current->mm->beancounter, dir, address); error = -ENOMEM; if (!pmd) break; - error = zeromap_pmd_range(pmd, address, end - address, prot); + error = zeromap_pmd_range(current->mm, pmd, address, end - address, prot); if (error) break; address = (address + PGDIR_SIZE) & PGDIR_MASK; @@ -725,7 +726,8 @@ } while (address && (address < end)); } -static inline int remap_pmd_range(pmd_t * pmd, unsigned long address, unsigned long size, +static inline int remap_pmd_range(struct mm_struct *mm, + pmd_t * pmd, unsigned long address, unsigned long size, unsigned long phys_addr, pgprot_t prot) { unsigned long end; @@ -736,7 +738,7 @@ end = PGDIR_SIZE; phys_addr -= address; do { - pte_t * pte = pte_alloc(pmd, address); + pte_t * pte = pte_alloc(mm->beancounter, pmd, address); if (!pte) return -ENOMEM; remap_pte_range(pte, address, end - address, address + phys_addr, prot); @@ -759,11 +761,11 @@ if (from >= end) BUG(); do { - pmd_t *pmd = pmd_alloc(dir, from); + pmd_t *pmd = pmd_alloc(current->mm->beancounter, dir, from); error = -ENOMEM; if (!pmd) break; - error = remap_pmd_range(pmd, from, end - from, phys_addr + from, prot); + error = remap_pmd_range(current->mm, pmd, from, end - from, phys_addr + from, prot); if (error) break; from = (from + PGDIR_SIZE) & PGDIR_MASK; @@ -1205,6 +1207,7 @@ * Ok, the entry was present, we need to get the page table * lock to synchronize with kswapd, and verify that the entry * didn't change from under us.. + * RED PEN: ok, what if it really changed between these two checks? 
SAW */ spin_lock(&mm->page_table_lock); if (pte_val(entry) == pte_val(*pte)) { @@ -1232,10 +1235,10 @@ pmd_t *pmd; pgd = pgd_offset(mm, address); - pmd = pmd_alloc(pgd, address); + pmd = pmd_alloc(mm->beancounter, pgd, address); if (pmd) { - pte_t * pte = pte_alloc(pmd, address); + pte_t * pte = pte_alloc(mm->beancounter, pmd, address); if (pte) ret = handle_pte_fault(mm, vma, address, write_access, pte); } Index: linux/mm/mlock.c diff -u linux/mm/mlock.c:1.1.1.3 ASPcomplete/linux/mm/mlock.c:1.2 --- linux/mm/mlock.c:1.1.1.3 Mon Aug 14 19:41:35 2000 +++ linux/mm/mlock.c Tue Jul 25 11:52:10 2000 @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -116,6 +117,13 @@ if (newflags == vma->vm_flags) return 0; + if (newflags & VM_LOCKED) { + retval = charge_locked_mem(vma->vm_mm->beancounter, + end - start); + if (retval) + return retval; + } + if (start == vma->vm_start) { if (end == vma->vm_end) retval = mlock_fixup_all(vma, newflags); @@ -133,8 +141,17 @@ if (newflags & VM_LOCKED) { pages = -pages; make_pages_present(start, end); + } else { + /* successfully unlocked some memory */ + uncharge_locked_mem(vma->vm_mm->beancounter, + end - start); } vma->vm_mm->locked_vm -= pages; + } else { + /* memory was charged and fixup failed; uncharge the mem */ + if (newflags & VM_LOCKED) + uncharge_locked_mem(vma->vm_mm->beancounter, + end - start); } return retval; } Index: linux/mm/mmap.c diff -u linux/mm/mmap.c:1.1.1.6 ASPcomplete/linux/mm/mmap.c:1.4 --- linux/mm/mmap.c:1.1.1.6 Mon Aug 14 19:41:35 2000 +++ linux/mm/mmap.c Mon Aug 7 19:08:00 2000 @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -55,13 +56,18 @@ * (buffers+cache), use the minimum values. Allow an extra 2% * of num_physpages for safety margin. */ - long free; /* Sometimes we want to use more memory than we have. 
*/ if (sysctl_overcommit_memory) return 1; + /* Memory gaurantees for "good" processes */ + if (current->mm->beancounter && + (current->mm->beancounter->ub_held[UB_TOTVMPAGES] + pages + <= current->mm->beancounter->ub_barrier[UB_SPCGUARPAGES])) + return 1; + free = atomic_read(&buffermem_pages); free += atomic_read(&page_cache_size); free += nr_free_pages(); @@ -272,7 +278,7 @@ } else { vma->vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_SHARED) - vma->vm_flags |= VM_SHARED | VM_MAYSHARE; + vma->vm_flags |= VM_SHARED | VM_MAYSHARE | VM_ANON; } vma->vm_page_prot = protection_map[vma->vm_flags & 0x0f]; vma->vm_ops = NULL; @@ -296,22 +302,25 @@ !vm_enough_memory(len >> PAGE_SHIFT)) goto free_vma; + if (charge_memory(vma->vm_mm->beancounter, len, vma->vm_flags, 1)) + goto free_vma; + if (file) { if (vma->vm_flags & VM_DENYWRITE) { error = deny_write_access(file); if (error) - goto free_vma; + goto uncharge_and_free_vma; correct_wcount = 1; } vma->vm_file = file; get_file(file); error = file->f_op->mmap(file, vma); if (error) - goto unmap_and_free_vma; + goto unmap_uncharge_and_free_vma; } else if (flags & MAP_SHARED) { error = map_zero_setup(vma); if (error) - goto free_vma; + goto uncharge_and_free_vma; } /* @@ -334,7 +343,7 @@ } return addr; -unmap_and_free_vma: +unmap_uncharge_and_free_vma: if (correct_wcount) atomic_inc(&file->f_dentry->d_inode->i_writecount); vma->vm_file = NULL; @@ -343,6 +352,8 @@ flush_cache_range(mm, vma->vm_start, vma->vm_end); zap_page_range(mm, vma->vm_start, vma->vm_end - vma->vm_start); flush_tlb_range(mm, vma->vm_start, vma->vm_end); +uncharge_and_free_vma: + uncharge_memory(vma->vm_mm->beancounter, len, vma->vm_flags); free_vma: kmem_cache_free(vm_area_cachep, vma); return error; @@ -530,6 +541,7 @@ area->vm_ops->close(area); if (area->vm_file) fput(area->vm_file); + uncharge_memory(area->vm_mm->beancounter, len, area->vm_flags); kmem_cache_free(vm_area_cachep, area); return extra; } @@ -569,6 +581,8 @@ insert_vm_struct(mm, area); vmlist_modify_unlock(mm); + /* uncharge memory at the last moment */ + uncharge_memory(area->vm_mm->beancounter, len, area->vm_flags); return extra; } @@ -787,12 +801,17 @@ if (!vm_enough_memory(len >> PAGE_SHIFT)) return -ENOMEM; + if (charge_memory(mm->beancounter, len, + vm_flags(PROT_READ|PROT_WRITE|PROT_EXEC, + MAP_FIXED|MAP_PRIVATE) | mm->def_flags, 1)) + return -ENOMEM; + /* * create a vma struct for an anonymous mapping */ vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); if (!vma) - return -ENOMEM; + goto fail_uncharge; vma->vm_mm = mm; vma->vm_start = addr; @@ -825,6 +844,12 @@ make_pages_present(addr, addr + len); } return addr; + +fail_uncharge: + uncharge_memory(mm->beancounter, len, + vm_flags(PROT_READ|PROT_WRITE|PROT_EXEC, + MAP_FIXED|MAP_PRIVATE) | mm->def_flags); + return -ENOMEM; } /* Build the AVL tree corresponding to the VMA list. 
*/ @@ -868,6 +893,7 @@ zap_page_range(mm, start, size); if (mpnt->vm_file) fput(mpnt->vm_file); + uncharge_memory(mm->beancounter, size, mpnt->vm_flags); kmem_cache_free(vm_area_cachep, mpnt); mpnt = next; } Index: linux/mm/mremap.c diff -u linux/mm/mremap.c:1.1.1.4 ASPcomplete/linux/mm/mremap.c:1.2 --- linux/mm/mremap.c:1.1.1.4 Mon Aug 14 19:41:35 2000 +++ linux/mm/mremap.c Tue Jul 25 11:52:10 2000 @@ -9,6 +9,7 @@ #include #include #include +#include #include #include @@ -51,9 +52,9 @@ pmd_t * pmd; pte_t * pte = NULL; - pmd = pmd_alloc(pgd_offset(mm, addr), addr); + pmd = pmd_alloc(mm->beancounter, pgd_offset(mm, addr), addr); if (pmd) - pte = pte_alloc(pmd, addr); + pte = pte_alloc(mm->beancounter, pmd, addr); return pte; } @@ -241,6 +242,9 @@ !(flags & MAP_NORESERVE) && !vm_enough_memory((new_len - old_len) >> PAGE_SHIFT)) goto out; + ret = charge_memory(vma->vm_mm->beancounter, new_len, vma->vm_flags, 1); + if (ret) + goto out; /* old_len exactly to the end of the area.. * And we're not relocating the area. @@ -264,6 +268,8 @@ addr + new_len); } ret = addr; + uncharge_memory(vma->vm_mm->beancounter, old_len, + vma->vm_flags); goto out; } } @@ -277,10 +283,20 @@ if (!(flags & MREMAP_FIXED)) { new_addr = get_unmapped_area(0, new_len); if (!new_addr) - goto out; + goto out_uncharge; } ret = move_vma(vma, addr, old_len, new_len, new_addr); } + + /* + * On fail uncharge the initially charged size. + * On success the old size is uncharged in do_unmap called from + * move_vma. + */ +out_uncharge: + if (ret == -ENOMEM) + uncharge_memory(vma->vm_mm->beancounter, new_len, + vma->vm_flags); out: return ret; } Index: linux/mm/swapfile.c diff -u linux/mm/swapfile.c:1.1.1.6 ASPcomplete/linux/mm/swapfile.c:1.6 --- linux/mm/swapfile.c:1.1.1.6 Mon Aug 14 19:41:35 2000 +++ linux/mm/swapfile.c Sat Aug 12 19:36:41 2000 @@ -371,7 +371,7 @@ return -ENOMEM; } read_lock(&tasklist_lock); - for_each_task(p) + for_each_task_all(p) unuse_process(p->mm, entry, page); read_unlock(&tasklist_lock); shm_unuse(entry, page); Index: linux/mm/vmscan.c diff -u linux/mm/vmscan.c:1.1.1.8 ASPcomplete/linux/mm/vmscan.c:1.5 --- linux/mm/vmscan.c:1.1.1.8 Mon Aug 14 19:41:35 2000 +++ linux/mm/vmscan.c Sat Aug 12 19:36:41 2000 @@ -11,6 +11,8 @@ * Zone aware kswapd started 02/00, Kanoj Sarcar (kanoj@sgi.com). */ +#define UB_VM_DEBUG 0 + #include #include #include @@ -51,6 +53,9 @@ if (mm->swap_cnt) mm->swap_cnt--; + if (mm->beancounter && mm->beancounter->ub_swp_pri) + mm->beancounter->ub_swp_pri--; + /* Don't look at this pte if it's been accessed recently. */ if (pte_young(pte)) { /* @@ -337,6 +342,17 @@ return 0; } +static inline void update_ub_swp_pri(struct mm_struct* mm) +{ + if (mm->beancounter) { + mm->beancounter->ub_swp_pri = + mm->beancounter->ub_held[UB_RESPAGES] + - mm->beancounter->ub_barrier[UB_RESPAGES]; + if (mm->beancounter->ub_swp_pri < 0) + mm->beancounter->ub_swp_pri = 0; + } +} + /* * Select the task with maximal swap_cnt and try to swap out a page. * N.B. This function returns only 0 or 1. Return values != 1 from @@ -345,6 +361,7 @@ static int swap_out(unsigned int priority, int gfp_mask) { struct task_struct * p; + struct task_struct * pb; int counter; int __ret = 0; @@ -369,6 +386,8 @@ for (; counter >= 0; counter--) { unsigned long max_cnt = 0; + unsigned long max_overdraft = 0; + signed long ub_swp_pri = 0; struct mm_struct *best = NULL; int pid = 0; int assign = 0; @@ -381,25 +400,52 @@ continue; if (mm->rss <= 0) continue; - /* Refresh swap_cnt? 
*/ - if (assign == 1) + /* Refresh swap_cnt and ub_swp_pri? */ + if (assign == 1) { + /* updates of beancounter->ub_swp_pri are only + * here with the kernel lock - we do not need + * the ub_lock yet */ + update_ub_swp_pri(mm); mm->swap_cnt = mm->rss; - if (mm->swap_cnt > max_cnt) { + } + if (mm->beancounter) { + ub_swp_pri = mm->beancounter->ub_swp_pri; + if ( ub_swp_pri <= 0) + continue; + } + else + ub_swp_pri = 0; + if ((ub_swp_pri > max_overdraft && mm->swap_cnt > 0) || + (ub_swp_pri == max_overdraft && + mm->swap_cnt > max_cnt)) { + max_overdraft = ub_swp_pri; max_cnt = mm->swap_cnt; best = mm; pid = p->pid; + pb = p; } } read_unlock(&tasklist_lock); if (!best) { if (!assign) { assign = 1; +#if UB_VM_DEBUG + printk(KERN_DEBUG "Reassign happened...\n"); +#endif goto select; } goto out; } else { int ret; +#if UB_VM_DEBUG + printk("best is pid %d swap_cnt %d cmd %.20s ub %c\n", + pid, best->swap_cnt, pb->comm, + best->beancounter?'+':'-'); + if (best->beancounter) + printk("ub_swp_pri is %d\n", + best->beancounter->ub_swp_pri); +#endif atomic_inc(&best->mm_count); ret = swap_out_mm(best, gfp_mask); mmdrop(best); Index: linux/net/socket.c diff -u linux/net/socket.c:1.1.1.8 ASPcomplete/linux/net/socket.c:1.4 --- linux/net/socket.c:1.1.1.8 Mon Aug 14 19:41:36 2000 +++ linux/net/socket.c Sat Aug 12 19:36:41 2000 @@ -70,6 +70,7 @@ #include #include #include +#include #include #if defined(CONFIG_KMOD) && defined(CONFIG_NET) @@ -460,6 +461,10 @@ sock->file = NULL; sockets_in_use[smp_processor_id()].counter++; + + sock->beancounter = current->login_bc; + get_beancounter(sock->beancounter); + return sock; } @@ -497,6 +502,9 @@ return; } sock->file=NULL; + + put_beancounter(sock->beancounter); + sock->beancounter = NULL; } int sock_sendmsg(struct socket *sock, struct msghdr *msg, int size) Index: linux/net/atm/clip.c diff -u linux/net/atm/clip.c:1.1.1.4 ASPcomplete/linux/net/atm/clip.c:1.3 --- linux/net/atm/clip.c:1.1.1.4 Mon Aug 14 19:41:36 2000 +++ linux/net/atm/clip.c Mon Jul 17 15:19:43 2000 @@ -525,7 +525,7 @@ unlink_clip_vcc(clip_vcc); return 0; } - error = ip_route_output(&rt,ip,0,1,0); + error = ip_route_output(&rt,ip,0,1,0,NULL); if (error) return error; neigh = __neigh_lookup(&clip_tbl,&ip,rt->u.dst.dev,1); ip_rt_put(rt); Index: linux/net/core/dev.c diff -u linux/net/core/dev.c:1.1.1.4 ASPcomplete/linux/net/core/dev.c:1.10 --- linux/net/core/dev.c:1.1.1.4 Mon Aug 14 19:41:37 2000 +++ linux/net/core/dev.c Sat Aug 12 22:02:29 2000 @@ -1253,6 +1253,24 @@ } #endif skb->h.raw = skb->nh.raw = skb->data; + +#ifdef CONFIG_VE_NET +/* + * FIXME: + * 1) get_ve_by_ip should be much more faster them linear search + * 2) In the case of skb->dev==&loopback && !LOOPBACK(skb->nh.iph->daddr) + * we should not invoke get_ve_by_ip, but set VE from destination + * 3) is_super_ve(skb->envid) should be taken more correct, i.e. + * 4) alloc_skb should not markup skb with envid. + */ + if( (is_super_ve(skb) && skb->protocol==__constant_htons(ETH_P_IP)) || + ( skb->rx_dev==&loopback_dev && !LOOPBACK(skb->nh.iph->daddr)) ) + { + skb->envid = get_ve_by_ip(skb->nh.iph->daddr); +/* printk( "netif_rx: %x, %d\n", skb->nh.iph->daddr, skb->envid ? 
skb->envid->veid : -1 ); */ + } +#endif + { struct packet_type *ptype, *pt_prev; unsigned short type = skb->protocol; Index: linux/net/core/skbuff.c diff -u linux/net/core/skbuff.c:1.1.1.3 ASPcomplete/linux/net/core/skbuff.c:1.6 --- linux/net/core/skbuff.c:1.1.1.3 Mon Aug 14 19:41:37 2000 +++ linux/net/core/skbuff.c Fri Aug 4 14:08:14 2000 @@ -205,6 +205,10 @@ skb->is_clone = 0; skb->cloned = 0; +#ifdef CONFIG_VE_NET + skb->envid = in_interrupt_correct() ? child_reaper->envid : current->envid; +#endif + atomic_set(&skb->users, 1); atomic_set(skb_datarefp(skb), 1); return skb; @@ -375,6 +379,9 @@ #endif #ifdef CONFIG_NET_SCHED new->tc_index = old->tc_index; +#endif +#ifdef CONFIG_VE_NET + new->envid = old->envid; #endif } Index: linux/net/core/sock.c diff -u linux/net/core/sock.c:1.1.1.5 ASPcomplete/linux/net/core/sock.c:1.9 --- linux/net/core/sock.c:1.1.1.5 Mon Aug 14 19:41:37 2000 +++ linux/net/core/sock.c Sat Aug 12 22:02:29 2000 @@ -169,7 +169,7 @@ #ifdef CONFIG_FILTER struct sk_filter *filter; #endif - int val; + int val, t_val; int valbool; int err; struct linger ling; @@ -231,7 +231,14 @@ if (val > sysctl_wmem_max) val = sysctl_wmem_max; - sk->sndbuf = max(val*2,SOCK_MIN_SNDBUF); + t_val = max(val*2,SOCK_MIN_SNDBUF); + if (t_val > sk->sndbuf_charged && sk->beancounter) { + if ((ret = charge_kmem(sk->beancounter, + t_val - sk->sndbuf_charged, 1))) + break; + sk->sndbuf_charged = t_val; + } + sk->sndbuf = t_val; /* * Wake up sending tasks if we @@ -250,7 +257,14 @@ val = sysctl_rmem_max; /* FIXME: is this lower bound the right one? */ - sk->rcvbuf = max(val*2,SOCK_MIN_RCVBUF); + t_val = max(val*2,SOCK_MIN_RCVBUF); + if (t_val > sk->rcvbuf_charged && sk->beancounter) { + if ((ret = charge_kmem(sk->beancounter, + t_val - sk->rcvbuf_charged, 1))) + break; + sk->rcvbuf_charged = t_val; + } + sk->rcvbuf = t_val; break; case SO_KEEPALIVE: @@ -575,6 +589,9 @@ memset(sk, 0, sizeof(struct sock)); sk->family = family; sock_lock_init(sk); +#ifdef CONFIG_VE_NET + sk->envid = in_interrupt_correct() ? 
child_reaper->envid : current->envid; +#endif } return sk; @@ -582,6 +599,8 @@ void sk_free(struct sock *sk) { + struct user_beancounter *bc; + int charged; #ifdef CONFIG_FILTER struct sk_filter *filter; #endif @@ -600,7 +619,13 @@ if (atomic_read(&sk->omem_alloc)) printk(KERN_DEBUG "sk_free: optmem leakage (%d bytes) detected.\n", atomic_read(&sk->omem_alloc)); + bc = sk->beancounter; + charged = sizeof(struct sock) + sk->rcvbuf_charged + sk->sndbuf_charged; + kmem_cache_free(sk_cachep, sk); + + uncharge_sock(bc, charged); + put_beancounter(bc); } void __init sk_init(void) Index: linux/net/ipv4/af_inet.c diff -u linux/net/ipv4/af_inet.c:1.1.1.3 ASPcomplete/linux/net/ipv4/af_inet.c:1.2 --- linux/net/ipv4/af_inet.c:1.1.1.3 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/af_inet.c Tue Jul 25 11:52:10 2000 @@ -81,6 +81,7 @@ #include #include #include +#include #include #include @@ -309,7 +310,13 @@ { struct sock *sk; struct proto *prot; + int charge; + charge = sizeof(struct sock) + + sysctl_wmem_default + sysctl_rmem_default; + if (charge_sock(sock->beancounter, charge)) + goto ret_oom; + sock->state = SS_UNCONNECTED; sk = sk_alloc(PF_INET, GFP_KERNEL, 1); if (sk == NULL) @@ -355,6 +362,10 @@ sk->protinfo.af_inet.pmtudisc = IP_PMTUDISC_WANT; sock_init_data(sock,sk); + sk->beancounter = sock->beancounter; + get_beancounter(sk->beancounter); + sk->rcvbuf_charged = sysctl_rmem_default; + sk->sndbuf_charged = sysctl_wmem_default; sk->destruct = inet_sock_destruct; @@ -400,17 +411,22 @@ free_and_badtype: sk_free(sk); + uncharge_sock(sock->beancounter, charge); return -ESOCKTNOSUPPORT; free_and_badperm: sk_free(sk); + uncharge_sock(sock->beancounter, charge); return -EPERM; free_and_noproto: sk_free(sk); + uncharge_sock(sock->beancounter, charge); return -EPROTONOSUPPORT; do_oom: + uncharge_sock(sock->beancounter, charge); +ret_oom: return -ENOBUFS; } Index: linux/net/ipv4/arp.c diff -u linux/net/ipv4/arp.c:1.1.1.5 ASPcomplete/linux/net/ipv4/arp.c:1.4 --- linux/net/ipv4/arp.c:1.1.1.5 Mon Aug 14 19:41:37 2000 +++ linux/net/ipv4/arp.c Mon Aug 7 19:08:00 2000 @@ -333,7 +333,7 @@ if (skb && inet_addr_type(skb->nh.iph->saddr) == RTN_LOCAL) saddr = skb->nh.iph->saddr; else - saddr = inet_select_addr(dev, target, RT_SCOPE_LINK); + saddr = inet_select_addr(dev, target, RT_SCOPE_LINK, NULL); if ((probes -= neigh->parms->ucast_probes) < 0) { if (!(neigh->nud_state&NUD_VALID)) @@ -838,7 +838,7 @@ r->arp_flags |= ATF_COM; if (dev == NULL) { struct rtable * rt; - if ((err = ip_route_output(&rt, ip, 0, RTO_ONLINK, 0)) != 0) + if ((err = ip_route_output(&rt, ip, 0, RTO_ONLINK, 0, NULL)) != 0) return err; dev = rt->u.dst.dev; ip_rt_put(rt); @@ -921,7 +921,7 @@ if (dev == NULL) { struct rtable * rt; - if ((err = ip_route_output(&rt, ip, 0, RTO_ONLINK, 0)) != 0) + if ((err = ip_route_output(&rt, ip, 0, RTO_ONLINK, 0, NULL)) != 0) return err; dev = rt->u.dst.dev; ip_rt_put(rt); Index: linux/net/ipv4/devinet.c diff -u linux/net/ipv4/devinet.c:1.1.1.4 ASPcomplete/linux/net/ipv4/devinet.c:1.6 --- linux/net/ipv4/devinet.c:1.1.1.4 Mon Aug 14 19:41:37 2000 +++ linux/net/ipv4/devinet.c Mon Aug 7 19:08:00 2000 @@ -484,6 +484,14 @@ ifr.ifr_name[IFNAMSIZ-1] = 0; #ifdef CONFIG_IP_ALIAS +#ifdef CONFIG_VE_NET + if( !is_super_ve(current) && strncmp(ifr.ifr_name,"lo",2) ) + { + char buf[16]; + sprintf( buf, ":%d", current->envid->veid ); + strcat( ifr.ifr_name, buf ); + } +#endif colon = strchr(ifr.ifr_name, ':'); if (colon) *colon = 0; @@ -663,6 +671,10 @@ rarok: rtnl_unlock(); +#ifdef CONFIG_VE_NET + if( !is_super_ve(current) && 
colon ) + *colon = 0; +#endif if (copy_to_user(arg, &ifr, sizeof(struct ifreq))) return -EFAULT; return 0; @@ -680,6 +692,11 @@ return 0; for ( ; ifa; ifa = ifa->ifa_next) { +#ifdef CONFIG_VE_NET + if( !is_super_ve(current) && dev!=&loopback_dev && + ifa->ifa_local!=current->envid->ip && ifa->ifa_address!=current->envid->ip ) + continue; +#endif if (!buf) { done += sizeof(ifr); continue; @@ -691,6 +708,13 @@ strcpy(ifr.ifr_name, ifa->ifa_label); else strcpy(ifr.ifr_name, dev->name); +#ifdef CONFIG_VE_NET + if (!is_super_ve(current)) + { + char* ptr = strchr(ifr.ifr_name,':'); + if( ptr ) *ptr = 0; + } +#endif (*(struct sockaddr_in *) &ifr.ifr_addr).sin_family = AF_INET; (*(struct sockaddr_in *) &ifr.ifr_addr).sin_addr.s_addr = ifa->ifa_local; @@ -704,10 +728,24 @@ return done; } -u32 inet_select_addr(const struct net_device *dev, u32 dst, int scope) +u32 inet_select_addr(const struct net_device *dev, u32 dst, int scope, void *_envid) { u32 addr = 0; struct in_device *in_dev; +#ifdef CONFIG_VE_NET + struct ve_struct *envid = (struct ve_struct *)_envid; + if( (!do_is_super_ve(envid) && dev!=&loopback_dev) || !dev ) + { + if( !envid->ip ) + for (dev = dev_base; dev!=NULL && !envid->ip; dev = dev->next) + { + if( dev->flags&IFF_LOOPBACK ) + continue; + envid->ip = inet_select_addr( dev, 0, scope, envid ); + } + return envid->ip; + } +#endif read_lock(&inetdev_lock); in_dev = __in_dev_get(dev); Index: linux/net/ipv4/fib_frontend.c diff -u linux/net/ipv4/fib_frontend.c:1.1.1.3 ASPcomplete/linux/net/ipv4/fib_frontend.c:1.3 --- linux/net/ipv4/fib_frontend.c:1.1.1.3 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/fib_frontend.c Mon Jul 31 11:56:32 2000 @@ -71,7 +71,6 @@ return tb; } - #endif /* CONFIG_IP_MULTIPLE_TABLES */ @@ -153,7 +152,11 @@ #ifdef CONFIG_IP_MULTIPLE_TABLES res.r = NULL; #endif - +#ifdef CONFIG_VE_NET + /* Fix me: do we need VEID here */ + key.envid = get_ve_by_ip(addr); +#endif + if (!local_table || local_table->tb_lookup(local_table, &key, &res)) { return NULL; } @@ -184,6 +187,9 @@ #ifdef CONFIG_IP_MULTIPLE_TABLES res.r = NULL; #endif +#ifdef CONFIG_VE_NET + key.envid = get_ve_by_ip(addr); +#endif if (local_table) { ret = RTN_UNICAST; @@ -204,7 +210,7 @@ */ int fib_validate_source(u32 src, u32 dst, u8 tos, int oif, - struct net_device *dev, u32 *spec_dst, u32 *itag) + struct net_device *dev, u32 *spec_dst, u32 *itag, void *envid) { struct in_device *in_dev; struct rt_key key; @@ -218,6 +224,9 @@ key.oif = 0; key.iif = oif; key.scope = RT_SCOPE_UNIVERSE; +#ifdef CONFIG_VE + key.envid = envid; +#endif no_addr = rpf = 0; read_lock(&inetdev_lock); @@ -268,7 +277,7 @@ last_resort: if (rpf) goto e_inval; - *spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE); + *spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE, NULL); *itag = 0; return 0; @@ -654,6 +663,10 @@ #ifndef CONFIG_IP_MULTIPLE_TABLES local_table = fib_hash_init(RT_TABLE_LOCAL); main_table = fib_hash_init(RT_TABLE_MAIN); + +#ifdef CONFIG_VE_ROUTE + local_table->allow_read = main_table->allow_read = 1; +#endif #else fib_rules_init(); #endif @@ -661,4 +674,3 @@ register_netdevice_notifier(&fib_netdev_notifier); register_inetaddr_notifier(&fib_inetaddr_notifier); } - Index: linux/net/ipv4/fib_hash.c diff -u linux/net/ipv4/fib_hash.c:1.1.1.3 ASPcomplete/linux/net/ipv4/fib_hash.c:1.2 --- linux/net/ipv4/fib_hash.c:1.1.1.3 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/fib_hash.c Mon Jul 31 11:56:32 2000 @@ -926,6 +926,10 @@ #ifdef CONFIG_PROC_FS tb->tb_get_info = fn_hash_get_info; #endif +#ifdef CONFIG_VE_ROUTE + tb->envid = 
current->envid; + tb->allow_read = 0; +#endif memset(tb->tb_data, 0, sizeof(struct fn_hash)); return tb; } Index: linux/net/ipv4/fib_rules.c diff -u linux/net/ipv4/fib_rules.c:1.1.1.3 ASPcomplete/linux/net/ipv4/fib_rules.c:1.3 --- linux/net/ipv4/fib_rules.c:1.1.1.3 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/fib_rules.c Wed Jul 19 16:01:10 2000 @@ -298,6 +298,9 @@ u32 daddr = key->dst; u32 saddr = key->src; +#ifdef CONFIG_VE_NET + res->envid = key->envid; +#endif FRprintk("Lookup: %u.%u.%u.%u <- %u.%u.%u.%u ", NIPQUAD(key->dst), NIPQUAD(key->src)); read_lock(&fib_rules_lock); Index: linux/net/ipv4/fib_semantics.c diff -u linux/net/ipv4/fib_semantics.c:1.1.1.4 ASPcomplete/linux/net/ipv4/fib_semantics.c:1.4 --- linux/net/ipv4/fib_semantics.c:1.1.1.4 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/fib_semantics.c Fri Jun 30 22:49:09 2000 @@ -576,6 +576,9 @@ return 1; res->fi = fi; +#ifdef CONFIG_VE_NET + res->envid = key->envid; +#endif switch (type) { #ifdef CONFIG_IP_ROUTE_NAT @@ -623,7 +626,7 @@ u32 __fib_res_prefsrc(struct fib_result *res) { - return inet_select_addr(FIB_RES_DEV(*res), FIB_RES_GW(*res), res->scope); + return inet_select_addr(FIB_RES_DEV(*res), FIB_RES_GW(*res), res->scope, ENVID(res)); } #ifdef CONFIG_RTNETLINK Index: linux/net/ipv4/icmp.c diff -u linux/net/ipv4/icmp.c:1.1.1.5 ASPcomplete/linux/net/ipv4/icmp.c:1.7 --- linux/net/ipv4/icmp.c:1.1.1.5 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/icmp.c Mon Aug 7 19:08:00 2000 @@ -519,10 +519,13 @@ if (ipc.opt->srr) daddr = icmp_param->replyopts.faddr; } - if (ip_route_output(&rt, daddr, rt->rt_spec_dst, RT_TOS(skb->nh.iph->tos), 0)) + if (ip_route_output(&rt, daddr, rt->rt_spec_dst, RT_TOS(skb->nh.iph->tos), 0, ENVID(skb))) goto out; if (icmpv4_xrlim_allow(rt, icmp_param->icmph.type, - icmp_param->icmph.code)) { + icmp_param->icmph.code)) { +#ifdef CONFIG_VE_NET + sk->envid = skb->envid; +#endif ip_build_xmit(sk, icmp_glue_bits, icmp_param, icmp_param->data_len+sizeof(struct icmphdr), &ipc, rt, MSG_DONTWAIT); @@ -631,7 +634,7 @@ * fast routing cache at first. Otherwise an attacker can * grow the routing table. 
*/ - if (ip_route_output(&rt, iph->saddr, saddr, RT_TOS(tos), 0)) + if (ip_route_output(&rt, iph->saddr, saddr, RT_TOS(tos), 0, ENVID(skb_in))) goto out; if (ip_options_echo(&icmp_param.replyopts, skb_in)) @@ -654,7 +657,8 @@ ipc.opt = &icmp_param.replyopts; if (icmp_param.replyopts.srr) { ip_rt_put(rt); - if (ip_route_output(&rt, icmp_param.replyopts.faddr, saddr, RT_TOS(tos), 0)) + if (ip_route_output(&rt, icmp_param.replyopts.faddr, saddr, RT_TOS(tos), 0, + ENVID(skb_in))) goto out; } @@ -672,7 +676,10 @@ icmp_param.data_len=(skb_in->tail-(u8*)iph); if (icmp_param.data_len > room) icmp_param.data_len = room; - + +#ifdef CONFIG_VE_NET + icmp_socket->sk->envid = skb_in->envid; +#endif ip_build_xmit(icmp_socket->sk, icmp_glue_bits, &icmp_param, icmp_param.data_len+sizeof(struct icmphdr), &ipc, rt, MSG_DONTWAIT); @@ -781,7 +788,8 @@ if ((raw_sk = raw_v4_htable[hash]) != NULL) { while ((raw_sk = __raw_v4_lookup(raw_sk, iph->protocol, iph->saddr, - iph->daddr, skb->dev->ifindex)) != NULL) { + iph->daddr, skb->dev->ifindex,skb) + ) != NULL) { raw_err(raw_sk, skb); raw_sk = raw_sk->next; } Index: linux/net/ipv4/igmp.c diff -u linux/net/ipv4/igmp.c:1.1.1.5 ASPcomplete/linux/net/ipv4/igmp.c:1.4 --- linux/net/ipv4/igmp.c:1.1.1.5 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/igmp.c Mon Aug 7 19:08:00 2000 @@ -204,7 +204,7 @@ if (type == IGMP_HOST_LEAVE_MESSAGE) dst = IGMP_ALL_ROUTER; - if (ip_route_output(&rt, dst, 0, 0, dev->ifindex)) + if (ip_route_output(&rt, dst, 0, 0, dev->ifindex, NULL)) return -1; if (rt->rt_src == 0) { ip_rt_put(rt); @@ -610,7 +610,7 @@ __dev_put(dev); } - if (!dev && !ip_route_output(&rt, imr->imr_multiaddr.s_addr, 0, 0, 0)) { + if (!dev && !ip_route_output(&rt, imr->imr_multiaddr.s_addr, 0, 0, 0, NULL)) { dev = rt->u.dst.dev; ip_rt_put(rt); } Index: linux/net/ipv4/ip_fragment.c diff -u linux/net/ipv4/ip_fragment.c:1.1.1.4 ASPcomplete/linux/net/ipv4/ip_fragment.c:1.2 --- linux/net/ipv4/ip_fragment.c:1.1.1.4 Mon Aug 14 19:41:37 2000 +++ linux/net/ipv4/ip_fragment.c Mon Jul 31 21:52:02 2000 @@ -611,6 +611,10 @@ qp->meat == qp->len) ret = ip_frag_reasm(qp); +#ifdef CONFIG_VE + if( ret ) + ret->envid = skb->envid; +#endif spin_unlock(&qp->lock); ipq_put(qp); return ret; Index: linux/net/ipv4/ip_gre.c diff -u linux/net/ipv4/ip_gre.c:1.1.1.5 ASPcomplete/linux/net/ipv4/ip_gre.c:1.4 --- linux/net/ipv4/ip_gre.c:1.1.1.5 Mon Aug 14 19:41:37 2000 +++ linux/net/ipv4/ip_gre.c Mon Aug 7 19:08:00 2000 @@ -485,7 +485,7 @@ skb2->nh.raw = skb2->data; /* Try to guess incoming interface */ - if (ip_route_output(&rt, eiph->saddr, 0, RT_TOS(eiph->tos), 0)) { + if (ip_route_output(&rt, eiph->saddr, 0, RT_TOS(eiph->tos), 0, NULL)) { kfree_skb(skb2); return; } @@ -495,7 +495,7 @@ if (rt->rt_flags&RTCF_LOCAL) { ip_rt_put(rt); rt = NULL; - if (ip_route_output(&rt, eiph->daddr, eiph->saddr, eiph->tos, 0) || + if (ip_route_output(&rt, eiph->daddr, eiph->saddr, eiph->tos, 0, NULL) || rt->u.dst.dev->type != ARPHRD_IPGRE) { ip_rt_put(rt); kfree_skb(skb2); @@ -729,7 +729,7 @@ tos &= ~1; } - if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link)) { + if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link, NULL)) { tunnel->stat.tx_carrier_errors++; goto tx_error; } @@ -860,6 +860,15 @@ skb->nfct = NULL; #endif + err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev, + do_ip_send); + if(err < 0) { + if(net_ratelimit()) + printk(KERN_ERR "ipgre_tunnel_xmit: ip_send() failed, err=%d\n", -err); + skb = NULL; + goto tx_error; + } + IPTUNNEL_XMIT(); 
tunnel->recursion--; return 0; @@ -1078,9 +1087,11 @@ MOD_INC_USE_COUNT; if (MULTICAST(t->parms.iph.daddr)) { struct rtable *rt; + void *envid = NULL; + if (ip_route_output(&rt, t->parms.iph.daddr, t->parms.iph.saddr, RT_TOS(t->parms.iph.tos), - t->parms.link)) { + t->parms.link, NULL)) { MOD_DEC_USE_COUNT; return -EADDRNOTAVAIL; } @@ -1153,7 +1164,8 @@ if (iph->daddr) { struct rtable *rt; - if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), tunnel->parms.link)) { + if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), + tunnel->parms.link, NULL)) { tdev = rt->u.dst.dev; ip_rt_put(rt); } Index: linux/net/ipv4/ip_output.c diff -u linux/net/ipv4/ip_output.c:1.1.1.3 ASPcomplete/linux/net/ipv4/ip_output.c:1.5 --- linux/net/ipv4/ip_output.c:1.1.1.3 Mon Aug 14 19:41:37 2000 +++ linux/net/ipv4/ip_output.c Tue Aug 1 21:23:00 2000 @@ -118,7 +118,8 @@ if (ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos) | RTO_CONN, - skb->sk ? skb->sk->bound_dev_if : 0)) { + skb->sk ? skb->sk->bound_dev_if : 0, + ENVID(skb))) { printk("route_me_harder: No more route.\n"); return -EINVAL; } @@ -404,7 +405,7 @@ */ if (ip_route_output(&rt, daddr, sk->saddr, RT_TOS(sk->protinfo.af_inet.tos) | RTO_CONN | sk->localroute, - sk->bound_dev_if)) + sk->bound_dev_if, ENVID(sk))) goto no_route; __sk_dst_set(sk, &rt->u.dst); } @@ -987,7 +988,7 @@ daddr = replyopts.opt.faddr; } - if (ip_route_output(&rt, daddr, rt->rt_spec_dst, RT_TOS(skb->nh.iph->tos), 0)) + if (ip_route_output(&rt, daddr, rt->rt_spec_dst, RT_TOS(skb->nh.iph->tos), 0, ENVID(skb))) return; /* And let IP do all the hard work. Index: linux/net/ipv4/ipip.c diff -u linux/net/ipv4/ipip.c:1.1.1.5 ASPcomplete/linux/net/ipv4/ipip.c:1.4 --- linux/net/ipv4/ipip.c:1.1.1.5 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/ipip.c Mon Aug 7 19:08:00 2000 @@ -419,7 +419,7 @@ skb2->nh.raw = skb2->data; /* Try to guess incoming interface */ - if (ip_route_output(&rt, eiph->saddr, 0, RT_TOS(eiph->tos), 0)) { + if (ip_route_output(&rt, eiph->saddr, 0, RT_TOS(eiph->tos), 0, NULL)) { kfree_skb(skb2); return; } @@ -429,7 +429,7 @@ if (rt->rt_flags&RTCF_LOCAL) { ip_rt_put(rt); rt = NULL; - if (ip_route_output(&rt, eiph->daddr, eiph->saddr, eiph->tos, 0) || + if (ip_route_output(&rt, eiph->daddr, eiph->saddr, eiph->tos, 0, NULL) || rt->u.dst.dev->type != ARPHRD_IPGRE) { ip_rt_put(rt); kfree_skb(skb2); @@ -556,7 +556,7 @@ goto tx_error_icmp; } - if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link)) { + if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link, NULL)) { tunnel->stat.tx_carrier_errors++; goto tx_error_icmp; } @@ -813,7 +813,8 @@ if (iph->daddr) { struct rtable *rt; - if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), tunnel->parms.link)) { + if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), + tunnel->parms.link, NULL)) { tdev = rt->u.dst.dev; ip_rt_put(rt); } Index: linux/net/ipv4/ipmr.c diff -u linux/net/ipv4/ipmr.c:1.1.1.4 ASPcomplete/linux/net/ipv4/ipmr.c:1.3 --- linux/net/ipv4/ipmr.c:1.1.1.4 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/ipmr.c Mon Aug 7 19:08:00 2000 @@ -1142,11 +1142,12 @@ #endif if (vif->flags&VIFF_TUNNEL) { - if (ip_route_output(&rt, vif->remote, vif->local, RT_TOS(iph->tos), vif->link)) + if (ip_route_output(&rt, vif->remote, vif->local, RT_TOS(iph->tos), vif->link, + ENVID(skb))) return; encap = sizeof(struct iphdr); } else { - if (ip_route_output(&rt, iph->daddr, 0, RT_TOS(iph->tos), vif->link)) + if (ip_route_output(&rt, iph->daddr, 0, 
RT_TOS(iph->tos), vif->link, ENVID(skb))) return; } Index: linux/net/ipv4/raw.c diff -u linux/net/ipv4/raw.c:1.1.1.4 ASPcomplete/linux/net/ipv4/raw.c:1.6 --- linux/net/ipv4/raw.c:1.1.1.4 Mon Aug 14 19:41:37 2000 +++ linux/net/ipv4/raw.c Mon Aug 14 20:05:54 2000 @@ -96,7 +96,7 @@ struct sock *__raw_v4_lookup(struct sock *sk, unsigned short num, unsigned long raddr, unsigned long laddr, - int dif) + int dif, struct sk_buff *skb) { struct sock *s = sk; @@ -104,7 +104,9 @@ if((s->num == num) && !(s->daddr && s->daddr != raddr) && !(s->rcv_saddr && s->rcv_saddr != laddr) && - !(s->bound_dev_if && s->bound_dev_if != dif)) + !(s->bound_dev_if && s->bound_dev_if != dif) && + check_ve_strict(s,skb) + ) break; /* gotcha */ } return s; @@ -142,12 +144,12 @@ goto out; sk = __raw_v4_lookup(sk, iph->protocol, iph->saddr, iph->daddr, - skb->dev->ifindex); + skb->dev->ifindex, skb); while(sk != NULL) { struct sock *sknext = __raw_v4_lookup(sk->next, iph->protocol, iph->saddr, iph->daddr, - skb->dev->ifindex); + skb->dev->ifindex, skb); if (iph->protocol != IPPROTO_ICMP || ! icmp_filter(sk, skb)) { struct sk_buff *clone; @@ -400,7 +402,7 @@ rfh.saddr = sk->protinfo.af_inet.mc_addr; } - err = ip_route_output(&rt, daddr, rfh.saddr, tos, ipc.oif); + err = ip_route_output(&rt, daddr, rfh.saddr, tos, ipc.oif, ENVID(sk)); if (err) goto done; @@ -651,6 +653,8 @@ for (sk = raw_v4_htable[i]; sk; sk = sk->next, num++) { if (sk->family != PF_INET) + continue; + if( !check_current_ve(sk) ) continue; pos += 128; if (pos < offset) Index: linux/net/ipv4/route.c diff -u linux/net/ipv4/route.c:1.1.1.4 ASPcomplete/linux/net/ipv4/route.c:1.5 --- linux/net/ipv4/route.c:1.1.1.4 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/route.c Mon Jul 17 15:19:43 2000 @@ -1097,13 +1097,15 @@ else if (fib_lookup(&rt->key, &res) == 0) { #ifdef CONFIG_IP_ROUTE_NAT if (res.type == RTN_NAT) - src = inet_select_addr(rt->u.dst.dev, rt->rt_gateway, RT_SCOPE_UNIVERSE); + src = inet_select_addr(rt->u.dst.dev, rt->rt_gateway, RT_SCOPE_UNIVERSE, + ENVIDP(rt->key)); else #endif src = FIB_RES_PREFSRC(res); fib_res_put(&res); } else - src = inet_select_addr(rt->u.dst.dev, rt->rt_gateway, RT_SCOPE_UNIVERSE); + src = inet_select_addr(rt->u.dst.dev, rt->rt_gateway, RT_SCOPE_UNIVERSE, + ENVIDP(rt->key)); memcpy(addr, &src, 4); } @@ -1178,8 +1180,8 @@ if (ZERONET(saddr)) { if (!LOCAL_MCAST(daddr)) goto e_inval; - spec_dst = inet_select_addr(dev, 0, RT_SCOPE_LINK); - } else if (fib_validate_source(saddr, 0, tos, 0, dev, &spec_dst, &itag) < 0) + spec_dst = inet_select_addr(dev, 0, RT_SCOPE_LINK, ENVID(skb)); + } else if (fib_validate_source(saddr, 0, tos, 0, dev, &spec_dst, &itag, ENVID(skb)) < 0) goto e_inval; rth = dst_alloc(&ipv4_dst_ops); @@ -1275,6 +1277,9 @@ #ifdef CONFIG_IP_ROUTE_FWMARK key.fwmark = skb->nfmark; #endif +#ifdef CONFIG_VE_NET + key.envid = ENVID(skb); +#endif key.iif = dev->ifindex; key.oif = 0; key.scope = RT_SCOPE_UNIVERSE; @@ -1341,9 +1346,11 @@ if (res.type == RTN_LOCAL) { int result; result = fib_validate_source(saddr, daddr, tos, loopback_dev.ifindex, - dev, &spec_dst, &itag); + dev, &spec_dst, &itag, ENVID(skb)); +#if 0 if (result < 0) goto martian_source; +#endif if (result) flags |= RTCF_DIRECTSRC; spec_dst = daddr; @@ -1366,7 +1373,8 @@ goto e_inval; } - err = fib_validate_source(saddr, daddr, tos, FIB_RES_OIF(res), dev, &spec_dst, &itag); + err = fib_validate_source(saddr, daddr, tos, FIB_RES_OIF(res), dev, &spec_dst, &itag, + ENVID(skb)); if (err < 0) goto martian_source; @@ -1398,6 +1406,9 @@ #ifdef CONFIG_IP_ROUTE_FWMARK 
rth->key.fwmark = skb->nfmark; #endif +#ifdef CONFIG_VE_NET + rth->key.envid = ENVID(skb); +#endif rth->key.src = saddr; rth->rt_src = saddr; rth->rt_gateway = daddr; @@ -1447,9 +1458,9 @@ goto e_inval; if (ZERONET(saddr)) { - spec_dst = inet_select_addr(dev, 0, RT_SCOPE_LINK); + spec_dst = inet_select_addr(dev, 0, RT_SCOPE_LINK, ENVID(skb)); } else { - err = fib_validate_source(saddr, 0, tos, 0, dev, &spec_dst, &itag); + err = fib_validate_source(saddr, 0, tos, 0, dev, &spec_dst, &itag, ENVID(skb)); if (err < 0) goto martian_source; if (err) @@ -1470,6 +1481,9 @@ rth->key.dst = daddr; rth->rt_dst = daddr; rth->key.tos = tos; +#ifdef CONFIG_VE_NET + rth->key.envid = ENVID(skb); +#endif #ifdef CONFIG_IP_ROUTE_FWMARK rth->key.fwmark = skb->nfmark; #endif @@ -1500,7 +1514,7 @@ goto intern; no_route: - spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE); + spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE, ENVID(skb)); res.type = RTN_UNREACHABLE; goto local_input; @@ -1556,6 +1570,11 @@ tos &= IPTOS_RT_MASK; hash = rt_hash_code(daddr, saddr^(iif<<5), tos); +#ifdef CONFIG_VE_NET + if( !LOOPBACK(saddr) ) + skb->envid = get_ve_by_ip(daddr); +#endif + read_lock(&rt_hash_table[hash].lock); for (rth=rt_hash_table[hash].chain; rth; rth=rth->u.rt_next) { if (rth->key.dst == daddr && @@ -1565,6 +1584,9 @@ #ifdef CONFIG_IP_ROUTE_FWMARK rth->key.fwmark == skb->nfmark && #endif +#ifdef CONFIG_VE_NET + rth->key.envid == ENVID(skb) && +#endif rth->key.tos == tos) { rth->u.dst.lastuse = jiffies; dst_hold(&rth->u.dst); @@ -1612,7 +1634,7 @@ * Major route resolver routine. */ -int ip_route_output_slow(struct rtable **rp, u32 daddr, u32 saddr, u32 tos, int oif) +int ip_route_output_slow(struct rtable **rp, u32 daddr, u32 saddr, u32 tos, int oif, void *_envid) { struct rt_key key; struct fib_result res; @@ -1630,6 +1652,9 @@ key.iif = loopback_dev.ifindex; key.oif = oif; key.scope = (tos&RTO_ONLINK) ? 
RT_SCOPE_LINK : RT_SCOPE_UNIVERSE; +#ifdef CONFIG_VE_NET + key.envid = (struct ve_struct*)_envid; +#endif res.fi = NULL; #ifdef CONFIG_IP_MULTIPLE_TABLES res.r = NULL; @@ -1687,14 +1712,14 @@ if (LOCAL_MCAST(daddr) || daddr == 0xFFFFFFFF) { if (!key.src) - key.src = inet_select_addr(dev_out, 0, RT_SCOPE_LINK); + key.src = inet_select_addr(dev_out, 0, RT_SCOPE_LINK, _envid); goto make_route; } if (!key.src) { if (MULTICAST(daddr)) - key.src = inet_select_addr(dev_out, 0, key.scope); + key.src = inet_select_addr(dev_out, 0, key.scope, _envid); else if (!daddr) - key.src = inet_select_addr(dev_out, 0, RT_SCOPE_HOST); + key.src = inet_select_addr(dev_out, 0, RT_SCOPE_HOST, _envid); } } @@ -1734,7 +1759,7 @@ */ if (key.src == 0) - key.src = inet_select_addr(dev_out, 0, RT_SCOPE_LINK); + key.src = inet_select_addr(dev_out, 0, RT_SCOPE_LINK, _envid); res.type = RTN_UNICAST; goto make_route; } @@ -1748,8 +1773,15 @@ goto e_inval; if (res.type == RTN_LOCAL) { +#ifndef CONFIG_VE_NET if (!key.src) key.src = key.dst; +#else + if( LOOPBACK(key.dst) || !key.envid ) + key.src = key.dst; + else + key.src = inet_select_addr( NULL, 0, RT_SCOPE_HOST, _envid ); +#endif if (dev_out) dev_put(dev_out); dev_out = &loopback_dev; @@ -1820,12 +1852,16 @@ goto e_nobufs; atomic_set(&rth->u.dst.__refcnt, 1); + rth->u.dst.flags= DST_HOST; rth->key.dst = daddr; rth->key.tos = tos; rth->key.src = saddr; rth->key.iif = 0; rth->key.oif = oif; +#ifdef CONFIG_VE_NET + rth->key.envid = key.envid; +#endif rth->rt_dst = key.dst; rth->rt_src = key.src; #ifdef CONFIG_IP_ROUTE_NAT @@ -1883,7 +1919,7 @@ goto done; } -int ip_route_output(struct rtable **rp, u32 daddr, u32 saddr, u32 tos, int oif) +int ip_route_output(struct rtable **rp, u32 daddr, u32 saddr, u32 tos, int oif, void *envid) { unsigned hash; struct rtable *rth; @@ -1896,6 +1932,9 @@ rth->key.src == saddr && rth->key.iif == 0 && rth->key.oif == oif && +#ifdef CONFIG_VE_NET + rth->key.envid == envid && +#endif !((rth->key.tos^tos)&(IPTOS_RT_MASK|RTO_ONLINK)) && ((tos&RTO_TPROXY) || !(rth->rt_flags&RTCF_TPROXY)) ) { @@ -1909,7 +1948,7 @@ } read_unlock_bh(&rt_hash_table[hash].lock); - return ip_route_output_slow(rp, daddr, saddr, tos, oif); + return ip_route_output_slow(rp, daddr, saddr, tos, oif, envid); } #ifdef CONFIG_RTNETLINK @@ -2058,7 +2097,7 @@ int oif = 0; if (rta[RTA_OIF-1]) memcpy(&oif, RTA_DATA(rta[RTA_OIF-1]), sizeof(int)); - err = ip_route_output(&rt, dst, src, rtm->rtm_tos, oif); + err = ip_route_output(&rt, dst, src, rtm->rtm_tos, oif, ENVID(in_skb)); } if (err) { kfree_skb(skb); Index: linux/net/ipv4/syncookies.c diff -u linux/net/ipv4/syncookies.c:1.1.1.4 ASPcomplete/linux/net/ipv4/syncookies.c:1.3 --- linux/net/ipv4/syncookies.c:1.1.1.4 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/syncookies.c Mon Aug 7 19:08:01 2000 @@ -181,7 +181,8 @@ opt->srr ? 
opt->faddr : req->af.v4_req.rmt_addr, req->af.v4_req.loc_addr, sk->protinfo.af_inet.tos | RTO_CONN, - 0)) { + 0, + ENVID(sk))) { tcp_openreq_free(req); return NULL; } Index: linux/net/ipv4/tcp_input.c diff -u linux/net/ipv4/tcp_input.c:1.1.1.3 ASPcomplete/linux/net/ipv4/tcp_input.c:1.3 --- linux/net/ipv4/tcp_input.c:1.1.1.3 Mon Aug 14 19:41:37 2000 +++ linux/net/ipv4/tcp_input.c Tue Jul 25 11:52:10 2000 @@ -1558,6 +1558,10 @@ tw->ts_recent_stamp= tp->ts_recent_stamp; tw->pprev_death = NULL; +#ifdef CONFIG_VE_NET + tw->envid = sk->envid; +#endif + #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) if(tw->family == PF_INET6) { memcpy(&tw->v6_daddr, @@ -2694,9 +2698,16 @@ */ struct sock *tcp_create_openreq_child(struct sock *sk, struct open_request *req, struct sk_buff *skb) { - struct sock *newsk = sk_alloc(PF_INET, GFP_ATOMIC, 0); + struct sock *newsk; - if(newsk != NULL) { + if (charge_sock(sk->beancounter, + sizeof(struct sock) + sk->rcvbuf + sk->sndbuf)) + goto charge_fail; + newsk = sk_alloc(PF_INET, GFP_ATOMIC, 0); + if (newsk == NULL) + goto alloc_fail; + + { struct tcp_opt *newtp; #ifdef CONFIG_FILTER struct sk_filter *filter; @@ -2800,6 +2811,10 @@ tcp_reset_keepalive_timer(newsk, keepalive_time_when(newtp)); newsk->socket = NULL; newsk->sleep = NULL; + /* newsk->beancounter = sk->beancounter; -- copied */ + get_beancounter(newsk->beancounter); + newsk->rcvbuf_charged = sk->rcvbuf; + newsk->sndbuf_charged = sk->sndbuf; newtp->tstamp_ok = req->tstamp_ok; if((newtp->sack_ok = req->sack_ok) != 0) @@ -2828,6 +2843,15 @@ newtp->mss_clamp = req->mss; } return newsk; + +alloc_fail: + uncharge_sock(sk->beancounter, + sizeof(struct sock) + sk->rcvbuf + sk->sndbuf); + return NULL; +charge_fail: + if (net_ratelimit()) + printk(KERN_WARNING "no resources, can\'t create socket.\n"); + return NULL; } /* Index: linux/net/ipv4/tcp_ipv4.c diff -u linux/net/ipv4/tcp_ipv4.c:1.1.1.5 ASPcomplete/linux/net/ipv4/tcp_ipv4.c:1.6 --- linux/net/ipv4/tcp_ipv4.c:1.1.1.5 Mon Aug 14 19:41:37 2000 +++ linux/net/ipv4/tcp_ipv4.c Mon Aug 7 19:08:01 2000 @@ -236,7 +236,7 @@ int sk_reuse = sk->reuse; for( ; sk2 != NULL; sk2 = sk2->bind_next) { - if (sk != sk2 && + if (sk != sk2 && check_ve_strict(sk2,sk) && sk->bound_dev_if == sk2->bound_dev_if) { if (!sk_reuse || !sk2->reuse || @@ -411,14 +411,16 @@ * connection. So always assume those are both wildcarded * during the search since they can never be otherwise. */ -static struct sock *__tcp_v4_lookup_listener(struct sock *sk, u32 daddr, unsigned short hnum, int dif) +static struct sock *__tcp_v4_lookup_listener(struct sock *sk, u32 daddr, unsigned short hnum, int dif, + struct sk_buff *skb) { struct sock *result = NULL; int score, hiscore; hiscore=0; for(; sk; sk = sk->next) { - if(sk->num == hnum) { + if(sk->num == hnum && check_ve_strict(sk,skb) + ) { __u32 rcv_saddr = sk->rcv_saddr; score = 1; @@ -444,19 +446,20 @@ } /* Optimize the common listener case. 
*/ -__inline__ struct sock *tcp_v4_lookup_listener(u32 daddr, unsigned short hnum, int dif) +__inline__ struct sock *tcp_v4_lookup_listener(u32 daddr, unsigned short hnum, int dif, + struct sk_buff *skb) { struct sock *sk; read_lock(&tcp_lhash_lock); sk = tcp_listening_hash[tcp_lhashfn(hnum)]; if (sk) { - if (sk->num == hnum && + if (sk->num == hnum && check_ve_strict(sk,skb) && sk->next == NULL && (!sk->rcv_saddr || sk->rcv_saddr == daddr) && !sk->bound_dev_if) goto sherry_cache; - sk = __tcp_v4_lookup_listener(sk, daddr, hnum, dif); + sk = __tcp_v4_lookup_listener(sk, daddr, hnum, dif, skb); } if (sk) { sherry_cache: @@ -473,7 +476,8 @@ */ static inline struct sock *__tcp_v4_lookup_established(u32 saddr, u16 sport, - u32 daddr, u16 hnum, int dif) + u32 daddr, u16 hnum, int dif, + struct sk_buff *skb) { struct tcp_ehash_bucket *head; TCP_V4_ADDR_COOKIE(acookie, saddr, daddr) @@ -488,13 +492,13 @@ head = &tcp_ehash[hash]; read_lock(&head->lock); for(sk = head->chain; sk; sk = sk->next) { - if(TCP_IPV4_MATCH(sk, acookie, saddr, daddr, ports, dif)) + if(TCP_IPV4_MATCH(sk, acookie, saddr, daddr, ports, dif, skb)) goto hit; /* You sunk my battleship! */ } /* Must check for a TIME_WAIT'er before going to listener hash. */ for(sk = (head + tcp_ehash_size)->chain; sk; sk = sk->next) - if(TCP_IPV4_MATCH(sk, acookie, saddr, daddr, ports, dif)) + if(TCP_IPV4_MATCH(sk, acookie, saddr, daddr, ports, dif, skb)) goto hit; read_unlock(&head->lock); @@ -507,24 +511,25 @@ } static inline struct sock *__tcp_v4_lookup(u32 saddr, u16 sport, - u32 daddr, u16 hnum, int dif) + u32 daddr, u16 hnum, int dif, struct sk_buff *skb) { struct sock *sk; - sk = __tcp_v4_lookup_established(saddr, sport, daddr, hnum, dif); + sk = __tcp_v4_lookup_established(saddr, sport, daddr, hnum, dif, skb); if (sk) return sk; - return tcp_v4_lookup_listener(daddr, hnum, dif); + return tcp_v4_lookup_listener(daddr, hnum, dif, skb); } -__inline__ struct sock *tcp_v4_lookup(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif) +__inline__ struct sock *tcp_v4_lookup(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif, + struct sk_buff *skb) { struct sock *sk; local_bh_disable(); - sk = __tcp_v4_lookup(saddr, sport, daddr, ntohs(dport), dif); + sk = __tcp_v4_lookup(saddr, sport, daddr, ntohs(dport), dif, skb); local_bh_enable(); return sk; @@ -557,7 +562,8 @@ skp = &sk2->next) { tw = (struct tcp_tw_bucket*)sk2; - if(TCP_IPV4_MATCH(sk2, acookie, saddr, daddr, ports, dif)) { + + if(TCP_IPV4_MATCH(sk2, acookie, saddr, daddr, ports, dif, sk)) { struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp); /* With PAWS, it is safe from the viewpoint @@ -591,7 +597,7 @@ /* And established part... 
*/ for(skp = &head->chain; (sk2=*skp)!=NULL; skp = &sk2->next) { - if(TCP_IPV4_MATCH(sk2, acookie, saddr, daddr, ports, dif)) + if(TCP_IPV4_MATCH(sk2, acookie, saddr, daddr, ports, dif, sk)) goto not_unique; } @@ -676,7 +682,8 @@ } tmp = ip_route_connect(&rt, nexthop, sk->saddr, - RT_TOS(sk->protinfo.af_inet.tos)|RTO_CONN|sk->localroute, sk->bound_dev_if); + RT_TOS(sk->protinfo.af_inet.tos)|RTO_CONN|sk->localroute, + sk->bound_dev_if, ENVID(sk)); if (tmp < 0) return tmp; @@ -893,7 +900,7 @@ th = (struct tcphdr*)(dp+(iph->ihl<<2)); - sk = tcp_v4_lookup(iph->daddr, th->dest, iph->saddr, th->source, tcp_v4_iif(skb)); + sk = tcp_v4_lookup(iph->daddr, th->dest, iph->saddr, th->source, tcp_v4_iif(skb), skb); if (sk == NULL) { ICMP_INC_STATS_BH(IcmpInErrors); return; @@ -1178,7 +1185,7 @@ req->af.v4_req.rmt_addr), req->af.v4_req.loc_addr, RT_TOS(sk->protinfo.af_inet.tos) | RTO_CONN | sk->localroute, - sk->bound_dev_if)) { + sk->bound_dev_if, ENVID(sk))) { IP_INC_STATS_BH(IpOutNoRoutes); return NULL; } @@ -1506,7 +1513,9 @@ th->source, skb->nh.iph->daddr, ntohs(th->dest), - tcp_v4_iif(skb)); + tcp_v4_iif(skb), + skb + ); if (nsk) { if (nsk->state != TCP_TIME_WAIT) { @@ -1648,7 +1657,7 @@ skb->used = 0; sk = __tcp_v4_lookup(skb->nh.iph->saddr, th->source, - skb->nh.iph->daddr, ntohs(th->dest), tcp_v4_iif(skb)); + skb->nh.iph->daddr, ntohs(th->dest), tcp_v4_iif(skb), skb); if (!sk) goto no_tcp_socket; @@ -1701,7 +1710,7 @@ { struct sock *sk2; - sk2 = tcp_v4_lookup_listener(skb->nh.iph->daddr, ntohs(th->dest), tcp_v4_iif(skb)); + sk2 = tcp_v4_lookup_listener(skb->nh.iph->daddr, ntohs(th->dest), tcp_v4_iif(skb), skb); if (sk2 != NULL) { tcp_tw_deschedule((struct tcp_tw_bucket *)sk); tcp_timewait_kill((struct tcp_tw_bucket *)sk); @@ -1746,7 +1755,7 @@ err = ip_route_output(&rt, daddr, sk->saddr, RT_TOS(sk->protinfo.af_inet.tos) | RTO_CONN | sk->localroute, - sk->bound_dev_if); + sk->bound_dev_if, ENVID(sk)); if (err) { sk->err_soft=-err; sk->error_report(sk); @@ -1768,7 +1777,7 @@ /* Query new route using another rt buffer */ tmp = ip_route_connect(&new_rt, rt->rt_dst, 0, RT_TOS(sk->protinfo.af_inet.tos)|sk->localroute, - sk->bound_dev_if); + sk->bound_dev_if, ENVID(sk)); /* Only useful if different source addrs */ if (tmp == 0) { @@ -2075,6 +2084,9 @@ struct open_request *req; struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp); + if( !check_current_ve(sk) ) + continue; + if (!TCP_INET_FAMILY(sk->family)) goto skip_listen; @@ -2127,6 +2139,9 @@ read_lock(&head->lock); for(sk = head->chain; sk; sk = sk->next, num++) { + if( !check_current_ve(sk) ) + continue; + if (!TCP_INET_FAMILY(sk->family)) continue; pos += 128; @@ -2144,6 +2159,9 @@ tw = (struct tcp_tw_bucket *)tw->next, num++) { if (!TCP_INET_FAMILY(tw->family)) continue; + if( !check_current_ve(tw) ) + continue; + pos += 128; if (pos < offset) continue; Index: linux/net/ipv4/udp.c diff -u linux/net/ipv4/udp.c:1.1.1.5 ASPcomplete/linux/net/ipv4/udp.c:1.5 --- linux/net/ipv4/udp.c:1.1.1.5 Mon Aug 14 19:41:37 2000 +++ linux/net/ipv4/udp.c Mon Jul 17 15:19:43 2000 @@ -178,7 +178,7 @@ sk2 != NULL; sk2 = sk2->next) { if (sk2->num == snum && - sk2 != sk && + sk2 != sk && check_ve_strict(sk2,sk) && sk2->bound_dev_if == sk->bound_dev_if && (!sk2->rcv_saddr || !sk->rcv_saddr || @@ -227,15 +227,17 @@ /* UDP is nearly always wildcards out the wazoo, it makes no sense to try * harder than this. 
-DaveM */ -struct sock *udp_v4_lookup_longway(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif) +struct sock *udp_v4_lookup_longway(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif, + struct sk_buff *skb) { struct sock *sk, *result = NULL; unsigned short hnum = ntohs(dport); int badness = -1; for(sk = udp_hash[hnum & (UDP_HTABLE_SIZE - 1)]; sk != NULL; sk = sk->next) { - if(sk->num == hnum) { + if(sk->num == hnum && check_ve_strict(sk,skb)) { int score = 0; + if(sk->rcv_saddr) { if(sk->rcv_saddr != daddr) continue; @@ -268,12 +270,14 @@ return result; } -__inline__ struct sock *udp_v4_lookup(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif) +__inline__ struct sock *udp_v4_lookup(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif, + struct sk_buff *skb) { struct sock *sk; read_lock(&udp_hash_lock); - sk = udp_v4_lookup_longway(saddr, sport, daddr, dport, dif); + + sk = udp_v4_lookup_longway(saddr, sport, daddr, dport, dif, skb); if (sk) sock_hold(sk); read_unlock(&udp_hash_lock); @@ -328,7 +332,7 @@ return; } - sk = udp_v4_lookup(iph->daddr, uh->dest, iph->saddr, uh->source, skb->dev->ifindex); + sk = udp_v4_lookup(iph->daddr, uh->dest, iph->saddr, uh->source, skb->dev->ifindex, skb); if (sk == NULL) { ICMP_INC_STATS_BH(IcmpInErrors); return; /* No socket for error */ @@ -555,7 +559,7 @@ rt = (struct rtable*)sk_dst_check(sk, 0); if (rt == NULL) { - err = ip_route_output(&rt, daddr, ufh.saddr, tos, ipc.oif); + err = ip_route_output(&rt, daddr, ufh.saddr, tos, ipc.oif, ENVID(sk)); if (err) goto out; @@ -771,7 +775,8 @@ sk_dst_reset(sk); err = ip_route_connect(&rt, usin->sin_addr.s_addr, sk->saddr, - sk->protinfo.af_inet.tos|sk->localroute, sk->bound_dev_if); + sk->protinfo.af_inet.tos|sk->localroute, sk->bound_dev_if, + ENVID(sk)); if (err) return err; if ((rt->rt_flags&RTCF_BROADCAST) && !sk->broadcast) { @@ -940,9 +945,9 @@ if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST)) return udp_v4_mcast_deliver(skb, uh, saddr, daddr); - - sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex); + + sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex, skb); if (sk != NULL) { udp_queue_rcv_skb(sk, skb); sock_put(sk); @@ -1021,6 +1026,9 @@ for (sk = udp_hash[i]; sk; sk = sk->next, num++) { if (sk->family != PF_INET) continue; + if( !check_current_ve(sk) ) + continue; + pos += 128; if (pos < offset) continue; Index: linux/net/ipv4/netfilter/ip_fw_compat_masq.c diff -u linux/net/ipv4/netfilter/ip_fw_compat_masq.c:1.1.1.5 ASPcomplete/linux/net/ipv4/netfilter/ip_fw_compat_masq.c:1.3 --- linux/net/ipv4/netfilter/ip_fw_compat_masq.c:1.1.1.5 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/netfilter/ip_fw_compat_masq.c Mon Aug 7 19:08:01 2000 @@ -72,12 +72,12 @@ /* Pass 0 instead of saddr, since it's going to be changed anyway. 
*/ - if (ip_route_output(&rt, iph->daddr, 0, 0, 0) != 0) { + if (ip_route_output(&rt, iph->daddr, 0, 0, 0, NULL) != 0) { DEBUGP("ipnat_rule_masquerade: Can't reroute.\n"); return NF_DROP; } newsrc = inet_select_addr(rt->u.dst.dev, rt->rt_gateway, - RT_SCOPE_UNIVERSE); + RT_SCOPE_UNIVERSE, NULL); ip_rt_put(rt); range = ((struct ip_nat_multi_range) { 1, Index: linux/net/ipv4/netfilter/ip_nat_core.c diff -u linux/net/ipv4/netfilter/ip_nat_core.c:1.1.1.6 ASPcomplete/linux/net/ipv4/netfilter/ip_nat_core.c:1.4 --- linux/net/ipv4/netfilter/ip_nat_core.c:1.1.1.6 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/netfilter/ip_nat_core.c Mon Aug 7 19:08:01 2000 @@ -204,7 +204,7 @@ struct rtable *rt; /* FIXME: IPTOS_TOS(iph->tos) --RR */ - if (ip_route_output(&rt, var_ip, 0, 0, 0) != 0) { + if (ip_route_output(&rt, var_ip, 0, 0, 0, NULL) != 0) { DEBUGP("do_extra_mangle: Can't get route to %u.%u.%u.%u\n", IP_PARTS(var_ip)); return 0; Index: linux/net/ipv4/netfilter/ipt_MASQUERADE.c diff -u linux/net/ipv4/netfilter/ipt_MASQUERADE.c:1.1.1.5 ASPcomplete/linux/net/ipv4/netfilter/ipt_MASQUERADE.c:1.3 --- linux/net/ipv4/netfilter/ipt_MASQUERADE.c:1.1.1.5 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/netfilter/ipt_MASQUERADE.c Mon Jul 17 15:19:43 2000 @@ -85,7 +85,7 @@ if (ip_route_output(&rt, (*pskb)->nh.iph->daddr, 0, RT_TOS((*pskb)->nh.iph->tos)|RTO_CONN, - out->ifindex) != 0) { + out->ifindex, NULL) != 0) { /* Shouldn't happen */ printk("MASQUERADE: No route: Rusty's brain broke!\n"); return NF_DROP; Index: linux/net/ipv4/netfilter/ipt_MIRROR.c diff -u linux/net/ipv4/netfilter/ipt_MIRROR.c:1.1.1.4 ASPcomplete/linux/net/ipv4/netfilter/ipt_MIRROR.c:1.3 --- linux/net/ipv4/netfilter/ipt_MIRROR.c:1.1.1.4 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/netfilter/ipt_MIRROR.c Mon Jul 17 15:19:43 2000 @@ -44,7 +44,7 @@ /* Backwards */ if (ip_route_output(&rt, iph->saddr, iph->daddr, RT_TOS(iph->tos) | RTO_CONN, - 0)) { + 0, NULL)) { return 0; } Index: linux/net/ipv4/netfilter/ipt_owner.c diff -u linux/net/ipv4/netfilter/ipt_owner.c:1.1.1.3 ASPcomplete/linux/net/ipv4/netfilter/ipt_owner.c:1.2 --- linux/net/ipv4/netfilter/ipt_owner.c:1.1.1.3 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv4/netfilter/ipt_owner.c Sun Jun 18 13:04:04 2000 @@ -18,7 +18,7 @@ int i; read_lock(&tasklist_lock); - p = find_task_by_pid(pid); + p = find_task_by_pid_all(pid); if(p && p->files) { for (i=0; i < p->files->max_fds; i++) { if (fcheck_files(p->files, i) == skb->sk->socket->file) { @@ -38,7 +38,7 @@ int i, found=0; read_lock(&tasklist_lock); - for_each_task(p) { + for_each_task_all(p) { if ((p->session != sid) || !p->files) continue; Index: linux/net/ipv6/sit.c diff -u linux/net/ipv6/sit.c:1.1.1.5 ASPcomplete/linux/net/ipv6/sit.c:1.4 --- linux/net/ipv6/sit.c:1.1.1.5 Mon Aug 14 19:41:38 2000 +++ linux/net/ipv6/sit.c Mon Aug 7 19:08:01 2000 @@ -474,7 +474,7 @@ dst = addr6->s6_addr32[3]; } - if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link)) { + if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link, NULL)) { tunnel->stat.tx_carrier_errors++; goto tx_error_icmp; } @@ -740,7 +740,8 @@ if (iph->daddr) { struct rtable *rt; - if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), tunnel->parms.link)) { + if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), + tunnel->parms.link, NULL)) { tdev = rt->u.dst.dev; ip_rt_put(rt); } Index: linux/net/khttpd/userspace.c diff -u linux/net/khttpd/userspace.c:1.1.1.3 ASPcomplete/linux/net/khttpd/userspace.c:1.2 --- 
linux/net/khttpd/userspace.c:1.1.1.3 Mon Aug 14 19:41:41 2000 +++ linux/net/khttpd/userspace.c Tue Aug 1 22:02:10 2000 @@ -160,7 +160,7 @@ EnterFunction("FindUserspace"); local_bh_disable(); - sk = tcp_v4_lookup_listener(INADDR_ANY,Port,0); + sk = tcp_v4_lookup_listener(INADDR_ANY,Port,0,NULL); local_bh_enable(); return sk; } Index: linux/net/packet/af_packet.c diff -u linux/net/packet/af_packet.c:1.1.1.6 ASPcomplete/linux/net/packet/af_packet.c:1.7 --- linux/net/packet/af_packet.c:1.1.1.6 Mon Aug 14 19:41:41 2000 +++ linux/net/packet/af_packet.c Sat Aug 12 22:02:29 2000 @@ -257,7 +257,7 @@ * so that this procedure is noop. */ - if (skb->pkt_type == PACKET_LOOPBACK) + if (skb->pkt_type == PACKET_LOOPBACK || !check_ve(skb,sk)) goto out; if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) @@ -429,6 +429,9 @@ sk = (struct sock *) pt->data; po = sk->protinfo.af_packet; + if (!check_ve(skb,sk)) + goto drop; + skb->dev = dev; if (dev->hard_header) { Index: linux/net/unix/af_unix.c diff -u linux/net/unix/af_unix.c:1.1.1.7 ASPcomplete/linux/net/unix/af_unix.c:1.6 --- linux/net/unix/af_unix.c:1.1.1.7 Mon Aug 14 19:41:40 2000 +++ linux/net/unix/af_unix.c Mon Aug 7 19:08:01 2000 @@ -109,6 +109,9 @@ #include #include +#ifdef CONFIG_VE +#include +#endif #define min(a,b) (((a)<(b))?(a):(b)) @@ -250,8 +253,8 @@ unix_socket *s; for (s=unix_socket_table[hash^type]; s; s=s->next) { - if(s->protinfo.af_unix.addr->len==len && - memcmp(s->protinfo.af_unix.addr->name, sunname, len) == 0) + if(s->protinfo.af_unix.addr->len==len && check_current_ve(s) && + memcmp(s->protinfo.af_unix.addr->name, sunname, len) == 0 ) return s; } return NULL; @@ -280,8 +283,7 @@ { struct dentry *dentry = s->protinfo.af_unix.dentry; - if(dentry && dentry->d_inode == i) - { + if( dentry && dentry->d_inode == i && check_current_ve(s) ) { sock_hold(s); break; } @@ -1725,6 +1727,8 @@ read_lock(&unix_table_lock); forall_unix_sockets (i,s) { + if( !check_current_ve(s) ) + continue; unix_state_rlock(s); len+=sprintf(buffer+len,"%p: %08X %08X %08X %04X %02X %5ld",
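The mm/mmap.c and mm/mremap.c hunks above all follow one pattern: charge
the beancounter before committing to the operation, then uncharge on
every failure path that can be reached after the charge (the
unmap_uncharge_and_free_vma, uncharge_and_free_vma and fail_uncharge
labels). A minimal user-space sketch of that pattern follows; bc_t and
the simplified one-argument charge_memory()/uncharge_memory() are
stand-ins, since the real prototypes live in beancounter headers that
are not part of this diff:

/* Sketch of the charge-first/uncharge-on-rollback pattern used by the
 * do_mmap_pgoff() and do_brk() changes above. */
#include <stdio.h>

typedef struct { long held, limit; } bc_t;

static int charge_memory(bc_t *bc, long len)
{
	if (bc->held + len > bc->limit)
		return -1;		/* would exceed the barrier */
	bc->held += len;
	return 0;
}

static void uncharge_memory(bc_t *bc, long len)
{
	bc->held -= len;
}

static long map_something(bc_t *bc, long len, int make_it_fail)
{
	if (charge_memory(bc, len))
		return -1;		/* -ENOMEM in the kernel */
	if (make_it_fail)
		goto uncharge_and_fail;	/* e.g. file->f_op->mmap() failed */
	return 0;

uncharge_and_fail:
	/* every failure path after the charge must give it back,
	 * exactly like the *_uncharge_* labels in the hunks above */
	uncharge_memory(bc, len);
	return -1;
}

int main(void)
{
	bc_t bc = { 0, 100 };

	map_something(&bc, 40, 0);
	map_something(&bc, 40, 1);	/* fails, must not leak the charge */
	printf("held=%ld (expect 40)\n", bc.held);
	return 0;
}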
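In the mm/vmscan.c hunk, swap_out() no longer picks the mm with the
largest swap_cnt alone: the beancounter overdraft ub_swp_pri (pages held
above the barrier, refreshed by update_ub_swp_pri()) becomes the primary
key and swap_cnt only breaks ties, so address spaces that exceed their
guarantee are swapped first. A compilable toy version of just that
comparison, with invented numbers:

/* Sketch of the two-key victim selection introduced in swap_out()
 * above: overdraft (ub_swp_pri) dominates, swap_cnt breaks ties.
 * An overdraft of 0 stands for "no beancounter or within barrier". */
#include <stdio.h>

struct mm {
	long swap_cnt;	/* resident pages not yet scanned */
	long overdraft;	/* ub_swp_pri: pages held beyond the barrier */
};

static int pick_victim(struct mm *mms, int n)
{
	long max_cnt = 0, max_overdraft = 0;
	int best = -1, i;

	for (i = 0; i < n; i++) {
		if ((mms[i].overdraft > max_overdraft && mms[i].swap_cnt > 0) ||
		    (mms[i].overdraft == max_overdraft &&
		     mms[i].swap_cnt > max_cnt)) {
			max_overdraft = mms[i].overdraft;
			max_cnt = mms[i].swap_cnt;
			best = i;
		}
	}
	return best;
}

int main(void)
{
	struct mm mms[3] = {
		{ 500, 0 },	/* big, but within its barrier */
		{ 100, 40 },	/* small, but over the barrier */
		{ 200, 40 },	/* same overdraft, more resident */
	};

	printf("victim=%d (expect 2)\n", pick_victim(mms, 3));
	return 0;
}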
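On the networking side, every IPv4 socket lookup above gains an extra
skb (or peer socket) argument that is fed to check_ve_strict(), so
listeners, established sockets and time-wait buckets of one VE never
match traffic belonging to another even when addresses and ports
coincide. The helper's body is not shown in this diff; the sketch below
is only a plausible reading of its semantics, and ve_t, sock_t, skb_t
and the lookup loop are invented here:

/* Assumed semantics of the check_ve_strict() filter added to the
 * TCP/UDP/raw lookups above: a socket is visible to a packet only
 * if both belong to the same environment. */
#include <stdio.h>

typedef struct { int veid; } ve_t;
typedef struct { ve_t *envid; } sock_t;
typedef struct { ve_t *envid; } skb_t;

static int check_ve_strict(const sock_t *sk, const skb_t *skb)
{
	return sk->envid == skb->envid;	/* same VE only */
}

static sock_t *lookup(sock_t *tab, int n, const skb_t *skb)
{
	int i;

	for (i = 0; i < n; i++)
		if (check_ve_strict(&tab[i], skb))
			return &tab[i];	/* first match wins in this toy */
	return NULL;
}

int main(void)
{
	ve_t ve0 = { 0 }, ve1 = { 1 };
	sock_t socks[2] = { { &ve0 }, { &ve1 } };
	skb_t pkt = { &ve1 };
	sock_t *sk = lookup(socks, 2, &pkt);

	printf("matched veid %d (expect 1)\n", sk ? sk->envid->veid : -1);
	return 0;
}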
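Similarly, ip_route_output() and struct rt_key gain an opaque envid that
is compared in both the input and output routing-cache fast paths, so
cached routes are VE-private and two environments with overlapping
addressing cannot consume each other's cache entries. A miniature of
that match condition, with rt_key trimmed to just the fields the
comparison needs (everything else in the real structure is omitted):

/* Miniature of the routing-cache match extended with key.envid, as in
 * the ip_route_output() fast path above. */
#include <stdio.h>

struct rt_key {
	unsigned int dst, src;
	int iif, oif;
	void *envid;		/* new: owning VE, NULL for the host system */
};

static int rt_match(const struct rt_key *rth, const struct rt_key *want)
{
	return rth->dst == want->dst &&
	       rth->src == want->src &&
	       rth->iif == want->iif &&
	       rth->oif == want->oif &&
	       rth->envid == want->envid;	/* VE-private cache entries */
}

int main(void)
{
	int ve1, ve2;	/* stand-ins for two struct ve_struct instances */
	struct rt_key cached = { 0x0a000001, 0, 0, 2, &ve1 };
	struct rt_key asked  = { 0x0a000001, 0, 0, 2, &ve2 };

	/* same destination, different VE: must miss and take the slow path */
	printf("match=%d (expect 0)\n", rt_match(&cached, &asked));
	return 0;
}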