
From: Christian (chth_at_gmx.net)
Date: Wed 13 Nov 2002 - 09:07:30 GMT

On Mon, 11 Nov 2002 16:37:14 +0100
Herbert Poetzl <herbert_at_13thfloor.at> wrote:

> hmm, so a configuration file, or a log file which would
> not be used for some time, will become a candidate for
> unification? what if the file then gets used in a way
> not suited for the IMMUTABLE-UNLINK approach?

That's the task of 'clever' selection options, e.g. --exclude '.*/etc/.*'
--exclude '.*\.conf'. But you are right, it probably needs better selection
options, like '--exclude --clrmod 111' to exclude files where the execute
bits are not set, and so on. That's why I asked here for ideas ... thanks.
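
A minimal sketch of how such selection options could be evaluated per file.
The names Selection, is_candidate and example_selection are hypothetical, and
it assumes that '--exclude --clrmod 111' drops a file only when all three
execute bits are clear; the final option semantics are still open.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical rule set; field names are illustrative only.
struct Selection {
    std::vector<std::regex> exclude;  // --exclude REGEX, matched against the whole path
    unsigned exclude_if_clear = 0;    // --exclude --clrmod BITS (octal permission bits)
};

// Returns true if the file stays a candidate for unification.
bool is_candidate(const Selection& sel, const std::string& path, unsigned mode) {
    // Only permission bits make sense for the mode test.
    assert((sel.exclude_if_clear & ~07777u) == 0);
    for (const std::regex& re : sel.exclude)
        if (std::regex_match(path, re))
            return false;
    // Assumption: exclude when ALL of the given bits are clear in the file mode.
    if (sel.exclude_if_clear != 0 && (mode & sel.exclude_if_clear) == 0)
        return false;
    return true;
}

// The options from the mail: two path excludes plus the execute-bit test.
Selection example_selection() {
    Selection sel;
    sel.exclude.emplace_back(".*/etc/.*");
    sel.exclude.emplace_back(".*\\.conf");
    sel.exclude_if_clear = 0111;
    return sel;
}
```

With these rules a binary like /usr/bin/ls (mode 0755) stays a candidate,
while anything under /etc, any *.conf file and any non-executable file is
skipped.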

> how do you plan to match (compare) the files?
> - by path/contents
> - by hash values (md5,etc)

- file selection/size/contents

Calculating a hash would involve a scan through the entire file anyway, plus
some calculations, so I plan to do the following:
a) stat all files. Any special files are excluded, dirs are matched
against the include/exclude regex (does anyone want --includedir/--excludedir
instead?), and files are matched against all selection options.
b) The stat data of files which became candidates by file selection is
kept in a map/set {filename,stat,attr} (I will use C++ for the
implementation, like the other tools), with the file size as the ordering
attribute.
c) mmap reasonably many files of the same size and matching
uid/gid/stat/attr into memory and compare them (and redo this if not
all files can be mapped at once, taking special care of huge files which
cannot be mmap'ed, ...).
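
Steps a) to c) could look roughly like the following sketch. It is not the
planned tool: Candidate, CandidateMap, add_candidate, same_meta,
same_contents and find_duplicates are hypothetical names, the ext2 attr part
is left out, files are mapped two at a time instead of "reasonably many", and
the special handling of huge files is omitted.

```cpp
#include <cassert>
#include <cstring>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Step b): one entry per candidate; the multimap key is the file size.
struct Candidate {
    std::string path;
    struct stat st;
};
using CandidateMap = std::multimap<off_t, Candidate>;

// Step a): lstat the path and keep it only if it is a regular file.
bool add_candidate(CandidateMap& map, const std::string& path) {
    struct stat st;
    if (lstat(path.c_str(), &st) != 0 || !S_ISREG(st.st_mode))
        return false;
    map.emplace(st.st_size, Candidate{path, st});
    return true;
}

// Size, ownership and mode must match before contents are looked at.
bool same_meta(const struct stat& a, const struct stat& b) {
    return a.st_size == b.st_size && a.st_uid == b.st_uid &&
           a.st_gid == b.st_gid && a.st_mode == b.st_mode;
}

// Step c): map both files read-only and compare them byte for byte.
bool same_contents(const Candidate& a, const Candidate& b) {
    assert(a.st.st_size == b.st.st_size);
    const size_t len = static_cast<size_t>(a.st.st_size);
    if (len == 0) return true;  // mmap rejects zero-length mappings
    int fa = open(a.path.c_str(), O_RDONLY);
    int fb = open(b.path.c_str(), O_RDONLY);
    bool equal = false;
    if (fa >= 0 && fb >= 0) {
        void* pa = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fa, 0);
        void* pb = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fb, 0);
        if (pa != MAP_FAILED && pb != MAP_FAILED)
            equal = std::memcmp(pa, pb, len) == 0;
        if (pa != MAP_FAILED) munmap(pa, len);
        if (pb != MAP_FAILED) munmap(pb, len);
    }
    if (fa >= 0) close(fa);
    if (fb >= 0) close(fb);
    return equal;
}

// Walk one size bucket at a time and collect pairs that could be unified.
std::vector<std::pair<std::string, std::string>>
find_duplicates(const CandidateMap& map) {
    std::vector<std::pair<std::string, std::string>> out;
    for (auto it = map.begin(); it != map.end();) {
        auto end = map.upper_bound(it->first);
        for (auto a = it; a != end; ++a)
            for (auto b = std::next(a); b != end; ++b)
                if (same_meta(a->second.st, b->second.st) &&
                    same_contents(a->second, b->second))
                    out.emplace_back(a->second.path, b->second.path);
        it = end;
    }
    return out;
}
```

Only files that share a size are ever opened, so most files are settled by
stat alone. Within one size bucket this sketch compares pairwise, which is
quadratic in the bucket size; a real implementation would want to group
already-matched files to avoid that.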

Note: files don't need to be on the same path; only the content matters.
b1) If the big dictionary in memory becomes a problem, I could use a
temporary db3 or so.

> because you might run in an O(n^2) issue ...

I don't really care; this is not a performance-critical task. You might run
it once a month and it can take many hours, no problem. And the above
algorithm should be somewhere around O(n), maybe a little worse.

> a linear approach could be generating a list of hash
> values (sum, md5sum, cksum, fsum) for each virtual
> server (including a reference) and then only comparing
> a to-be-unified server (list) with the reference ...
> should give O(n)

So I thought too ... but keeping stat structs instead of hashes (I first
thought about hashes, but that would be slower and less precise).
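
To illustrate the "less precise" part: two different files can share a
checksum, so a hash match still needs a content comparison before unifying.
The sketch below uses a deliberately weak toy checksum (it is NOT the
algorithm of sum, cksum or md5sum); toy_checksum, identical and
safe_to_unify are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy additive checksum: byte order does not influence the result.
std::uint32_t toy_checksum(const std::string& data) {
    std::uint32_t acc = 0;
    for (unsigned char c : data) acc += c;
    return acc;
}

// Exact comparison: size first (cheap), then the bytes themselves.
bool identical(const std::string& a, const std::string& b) {
    return a.size() == b.size() && a == b;
}

// A checksum can only rule files out; equality needs the byte compare.
bool safe_to_unify(const std::string& a, const std::string& b) {
    if (toy_checksum(a) != toy_checksum(b))
        return false;
    const bool same = identical(a, b);
    // Invariant: identical data always yields identical checksums.
    assert(!same || toy_checksum(a) == toy_checksum(b));
    return same;
}
```

Here "ab" and "ba" get the same toy checksum although they differ, so only
the byte comparison gives the right answer. Stronger hashes make such
collisions far less likely, but computing them still reads every byte.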

cya Christian

Generated on Wed 13 Nov 2002 - 10:53:30 GMT by hypermail 2.1.3