About this list Date view Thread view Subject view Author view Attachment view

From: Sam Vilain (sam_at_vilain.net)
Date: Thu 28 Apr 2005 - 00:40:16 BST


Herbert Poetzl wrote:
>>Sadly, Postgres is missing these important features;
>> - bitmap indexes
>> - OLAP query re-writing
> I have absolutely no idea what you are talking about ...
> but:
> New Enterprise Features in 7.4
> * Hash aggregation in memory to make data warehousing and
> OLAP queries up to 20 times faster;
> (they are now at 8.0.1 or later ...)

Well, given you asked, and NO THIS ISN'T A FLAMEWAR INVITATION LURKERS :)

Bitmap indexes are a simple concept, and last time I checked there were
Pg patches for them (using Pg's pluggable index system), but they weren't
standard. Looking now, all I see is the occasional question on their
mailing list followed by a few clueless replies ('do you mean this...?').

All they are is a B-Tree on the distinct *values* of a column, and then
a very long bitmap for each value, one bit for each row in the table,
with 1's in the rows where that value is held. A low CPU overhead
compression scheme makes these fairly efficient. It means that if you're
joining together lots of query conditions on columns with discrete values,
it can be reduced to bitwise operations on these very long bitmaps; on a
modern CPU the actual expanded bitmap might only actually end up in L1,
and the CPU can crank through them at 1.6GHz * 64 * 4 (or however many
ALUs your CPU has); still usually limited by IO capability of course.

For "data mining" applications, this saves a *lot* of time, sometimes
multiple orders of magnitude.

And that's still simple. The OLAP query re-writing is even funkier.

OLAP is a generic term for a large range of technologies, so there
seems to be some things in there labelled for OLAP. There are lots of
tricks that solve the goal of OLAP, no doubt most of which I am ignorant
of.

But in particular, one thing that Oracle does really nicely is the way
you can make a view "materialized" - ie, the computed view is kept around,
rather than being generated as needed. Then, when you perform queries
on the original table that Oracle figures out could use the computed view
to avoid looking at the original table (or improve speed by using an
index, perhaps), then it transparently re-writes the query to instead use
the materialized view (assuming you know how to flick all the relevant
switches that only advanced Oracle DBAs can reach).

The upshot of that is that you can take virtually any regularly repeated
query, or hopefully a wide range of common queries, and manually help the
database along by telling it what to pre-calculate. And you don't even
have to 're-run' the queries when the source data changes - it has
support for minimally updating just the bits that changed.

Oracle certainly has a significant feature lead on Pg for data mining.

>>Without those, our database would run like cold treacle.
> well, there are a bunch of SQL 'features' not present
> in Oracle either ... so it really depends on the requirements

Absolutely. I think Oracle stinks as a general purpose application
server back-end. It's buggy as a VW convention, heavy as a lead
elephant and as snappy as old celery for small transactions.

Sam.
_______________________________________________
Vserver mailing list
Vserver_at_list.linux-vserver.org
http://list.linux-vserver.org/mailman/listinfo/vserver


About this list Date view Thread view Subject view Author view Attachment view
[Next/Previous Months] [Main vserver Project Homepage] [Howto Subscribe/Unsubscribe] [Paul Sladen's vserver stuff]
Generated on Thu 28 Apr 2005 - 00:40:51 BST by hypermail 2.1.3