From: Herbert Poetzl (herbert_at_13thfloor.at)
Date: Sat 08 Feb 2003 - 13:07:10 GMT
On Sat, Feb 08, 2003 at 01:08:44AM -0500, Jacques Gelinas wrote:
> I am trying to figure out that bug. I have witnessed the bug twice on
> a server, although I am using ctx-15 and ctx-16 on many servers.
> 
> The big difference between ctx-15 and the previous versions is the way
> struct iproot_info is used. In previous kernels, only struct task
> referenced struct iproot_info pointers. A reference count was maintained
> when a new process was created and when a process ended. Easy.
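The task-only scheme described above amounts to a plain get/put reference count. What follows is a minimal user-space sketch under that assumption. The struct layouts and the helpers iproot_get(), iproot_put(), task_fork() and task_exit() are hypothetical stand-ins, not the ctx patch's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct iproot_info: only the count matters here. */
struct iproot_info {
    int refcount;
};

/* Hypothetical stand-in for struct task_struct. */
struct task {
    struct iproot_info *ip_info;
};

/* Take a reference on behalf of a new holder of the pointer. */
static struct iproot_info *iproot_get(struct iproot_info *ip)
{
    if (ip != NULL)
        ip->refcount++;
    return ip;
}

/* Drop a reference; returns 1 when the last one goes and the object is freed. */
static int iproot_put(struct iproot_info *ip)
{
    if (ip == NULL)
        return 0;
    assert(ip->refcount > 0);
    if (--ip->refcount == 0) {
        free(ip);
        return 1;
    }
    return 0;
}

/* Process creation: the child shares the parent's iproot_info. */
static void task_fork(const struct task *parent, struct task *child)
{
    child->ip_info = iproot_get(parent->ip_info);
}

/* Process exit: release the task's reference and clear the field. */
static int task_exit(struct task *t)
{
    int freed = iproot_put(t->ip_info);
    t->ip_info = NULL;
    return freed;
}
```

The child's reference is taken at fork time, so the object outlives whichever process exits first and is freed only by the last one.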
> 
> In ctx-15, sockets also reference those pointers, so they have to handle
> the reference count too. The big issue when debugging ctx-15 was to
> realise that sockets (struct sock) were copied to other structs such
> as tcp_tw_bucket, and that some common routines were used
> to handle both struct sock and struct tcp_tw_bucket. Anyway, the
> reference count stuff, while trivial (one line of code here and there,
> but where? :-) ), took some time to get right.
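A sketch of why copying a socket into another struct is the tricky part: every struct that holds the pointer, including the copy, needs its own reference, and a common release routine can then serve both kinds of object. The reduced struct sock and struct tw_bucket, and all helpers below, are hypothetical models assumed for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct iproot_info { int refcount; };

/* Hypothetical reduced structs: only the fields both kinds of object share. */
struct sock      { int s_context; struct iproot_info *ip_info; };
struct tw_bucket { int s_context; struct iproot_info *ip_info; };

static struct iproot_info *iproot_get(struct iproot_info *ip)
{
    if (ip != NULL)
        ip->refcount++;
    return ip;
}

/* Returns 1 when the last reference is dropped and the object is freed. */
static int iproot_put(struct iproot_info *ip)
{
    if (ip == NULL)
        return 0;
    assert(ip->refcount > 0);
    if (--ip->refcount == 0) {
        free(ip);
        return 1;
    }
    return 0;
}

/* A new socket records its creator's context and takes a reference. */
static void sock_init(struct sock *sk, int ctx, struct iproot_info *ip)
{
    sk->s_context = ctx;
    sk->ip_info = iproot_get(ip);
}

/* Copying a closing socket into a TIME_WAIT bucket creates a new holder
 * of the pointer, so the copy takes its own reference as well. */
static void tw_from_sock(struct tw_bucket *tw, const struct sock *sk)
{
    tw->s_context = sk->s_context;
    tw->ip_info = iproot_get(sk->ip_info);
}

/* One release routine for either struct: it only needs the pointer field. */
static int release_ip_info(struct iproot_info **pp)
{
    int freed = iproot_put(*pp);
    *pp = NULL;
    return freed;
}
```

Passing the address of the pointer lets the shared routine also clear the field, so a second release through the same struct is harmless.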
> 
> Now, I realise that ipv6 shares much of its code with ipv4. It does
> not share the socket initialisation code, but it uses the same
> cleanup function: inet_sock_destruct. This function does the reference
> count stuff on iproot_info using a non-initialised pointer. Oops.
> 
> So I did the following in net/ipv6/af_inet6.c:
> 
> *** af_inet6.bak        2001-10-31 15:32:46.000000000 -0500
> --- af_inet6.c  2003-02-08 00:50:28.000000000 -0500
> ***************
> *** 183,188 ****
> --- 183,191 ----
>         sk->protinfo.af_inet.mc_index   = 0;
>         sk->protinfo.af_inet.mc_list    = NULL;
> 
> +       sk->s_context = current->s_context;
> +       sk->ip_info = NULL;
> +
>         if (ipv4_config.no_pmtu_disc)
>                 sk->protinfo.af_inet.pmtudisc = IP_PMTUDISC_DONT;
>         else
> ----------------------------------------------------------------------
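A reduced model of the failure and of what the patch changes, assuming a shared destructor that drops whatever ip_info holds: the IPv4 path attaches a counted reference, the IPv6 path (after the patch) sets the pointer to NULL, and the shared destructor treats NULL as "nothing to release". All model_* names and the reduced struct sock are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct iproot_info { int refcount; };

/* Hypothetical reduced struct sock: only the two fields the patch touches. */
struct sock { int s_context; struct iproot_info *ip_info; };

static struct iproot_info *iproot_get(struct iproot_info *ip)
{
    if (ip != NULL)
        ip->refcount++;
    return ip;
}

/* Returns 1 when the last reference is dropped and the object is freed. */
static int iproot_put(struct iproot_info *ip)
{
    if (ip == NULL)
        return 0;
    assert(ip->refcount > 0);
    if (--ip->refcount == 0) {
        free(ip);
        return 1;
    }
    return 0;
}

/* IPv4 path: record the creator's context and take a reference. */
static void model_inet_create(struct sock *sk, int ctx, struct iproot_info *ip)
{
    sk->s_context = ctx;
    sk->ip_info = iproot_get(ip);
}

/* IPv6 path after the patch: the context is recorded, but no iproot_info
 * is attached, so the pointer is explicitly NULL rather than left unset. */
static void model_inet6_create(struct sock *sk, int ctx)
{
    sk->s_context = ctx;
    sk->ip_info = NULL;
}

/* Shared destructor, in the role of inet_sock_destruct: it cannot tell
 * which path created the socket, so it relies on ip_info being either a
 * counted reference or NULL. */
static int model_sock_destruct(struct sock *sk)
{
    int freed = iproot_put(sk->ip_info);
    sk->ip_info = NULL;
    return freed;
}
```

The invariant the destructor depends on has to be established by every creation path that can reach it; a path that leaves the field unset hands the destructor a garbage pointer to decrement.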
> 
> Now, I suspect this is not all. But just to make sure:
> 
> Are there some people out there having crashes with ctx-15 or 16
> who are not using ipv6 at all?
Jacques,
the panic usually happens at sched.c line 570, and it is
a BUG() panic; have a look:
asmlinkage void schedule(void)
{
        ...
        if (unlikely(in_interrupt())) {
                printk("Scheduling in interrupt\n");
570		BUG();
        }
... so my educated guess is that there are situations
where schedule() is called while the system is
handling an interrupt.
The cause might be a badly initialized
pointer or struct, but that seems unlikely,
so I would look for a race condition under heavy
interrupt load.
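The check quoted above can be modeled in user space. This is a sketch only: in the 2.4 kernel, in_interrupt() roughly tests per-CPU counters of nested hard and soft interrupt handling; here a single hypothetical counter plays that role, and model_schedule() returns -1 where the real schedule() prints "Scheduling in interrupt" and calls BUG().

```c
#include <assert.h>

/* Hypothetical single counter standing in for the per-CPU interrupt
 * nesting counts behind in_interrupt(). */
static int irq_depth;

static void model_irq_enter(void) { irq_depth++; }

static void model_irq_exit(void)
{
    assert(irq_depth > 0);
    irq_depth--;
}

static int model_in_interrupt(void) { return irq_depth != 0; }

/* Stand-in for schedule(): reports the condition instead of calling BUG(). */
static int model_schedule(void)
{
    if (model_in_interrupt())
        return -1;
    /* ... the real scheduler would pick the next task here ... */
    return 0;
}

/* Hypothetical handler that ends up reaching schedule() while the
 * interrupt is still being handled. */
static int model_buggy_handler(void)
{
    model_irq_enter();
    int rc = model_schedule();
    model_irq_exit();
    return rc;
}
```

This only shows why the BUG() fires once schedule() is reached in interrupt context; it says nothing about which code path gets there, which is what the suggested search for a race would have to find.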
best,
Herbert
> Those using ipv6, can you try this patch?
> 
> (Of course, it does not make the kernel vserver/ipv6 aware.)
> 
> ---------------------------------------------------------
> Jacques Gelinas <jack_at_solucorp.qc.ca>
> vserver: run general purpose virtual servers on one box, full speed!
> http://www.solucorp.qc.ca/miscprj/s_context.hc