Hi - clearly, it'd be great if Wikipedia had better performance.
I looked at some of the "Database benchmarks" postings,
but I don't see any analysis of what's causing the ACTUAL bottlenecks
on the real system (with many users & full database).
Has someone done that analysis?
I suspect you guys have considered far more options than I have, but as a
newcomer who's just read the source code documentation, I hope
some of these ideas will be helpful:
1. Perhaps for simple reads of the current article (cur),
you could completely skip MySQL and use the filesystem instead.
Each encyclopedia article could simply be stored in the
filesystem, one article per file. To avoid the huge-directory problem
(which many filesystems don't handle well, though ReiserFS does),
you could use the terminfo trick: create subdirectories for the
first, second, and maybe even the third characters, e.g., "Europe"
goes in "wiki/E/u/r/Europe.text". The existence of a file can then serve
as the link test. I can't say this beats MySQL without measuring,
but I'd expect it to: OS developers have been optimizing
file access for a very long time, and instead of a
userspace<->kernel<->userspace round trip, it's a single
userspace<->kernel interaction. You also completely avoid
locking and other joyless issues. (A rough sketch of the path
scheme is below, after this list.)
2. The generation of HTML from the Wiki format could be cached,
as has been discussed. It could also be sped up, e.g., by
rewriting it in flex. I suspect it'd be straightforward to rewrite the
translation of Wiki markup to HTML in flex and produce something quite fast;
my "html2wikipedia" is written in flex, and it's really fast and didn't
take long to write. (A toy flex fragment is below.) The real problem is
that I suspect this isn't the actual bottleneck.
3. You could start sending out text as soon as it's ready, instead of
batching up the whole page. Many browsers start displaying text as it
arrives, so to users it might _feel_ faster. Also, holding the whole
page in memory until it's complete may create memory pressure that
forces more useful data out of memory. (A sketch of incremental
flushing is below.)
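To make idea 1 concrete, here's a minimal sketch in plain C (not taken
from the wiki code) of the path scheme and the existence-based link test.
The names article_path() and article_exists() are mine, and it assumes
ASCII titles of at least three characters just to keep it short:

#include <stdio.h>
#include <unistd.h>

/* "Europe" -> "wiki/E/u/r/Europe.text" */
static void article_path(const char *title, char *buf, size_t len)
{
    snprintf(buf, len, "wiki/%c/%c/%c/%s.text",
             title[0], title[1], title[2], title);
}

/* The "does this article exist?" link test becomes one access() call. */
static int article_exists(const char *title)
{
    char path[1024];
    article_path(title, path, sizeof path);
    return access(path, F_OK) == 0;
}

int main(void)
{
    char path[1024];
    article_path("Europe", path, sizeof path);
    printf("%s -> %s\n", path,
           article_exists("Europe") ? "existing page" : "missing page");
    return 0;
}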
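For idea 2, here's a toy flex fragment just to show the flavor of the
approach; it only handles '''bold''' and ''italic'' and copies everything
else through untouched, so it's nowhere near a complete Wiki-to-HTML
translator:

%{
/* Toy wikitext-to-HTML fragment: bold/italic only, all else copied. */
#include <stdio.h>
%}
%option noyywrap
%%
"'''"[^'\n]+"'''"  { yytext[yyleng-3] = '\0'; printf("<b>%s</b>", yytext + 3); }
"''"[^'\n]+"''"    { yytext[yyleng-2] = '\0'; printf("<i>%s</i>", yytext + 2); }
.|\n               { putchar(yytext[0]); }
%%
int main(void) { yylex(); return 0; }

Built the usual way (flex wiki2html.l && cc lex.yy.c -o wiki2html), it
turns "''Europe'' is on '''Earth'''" into "<i>Europe</i> is on <b>Earth</b>".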
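And for idea 3, a tiny C sketch of what I mean by sending text ASAP:
flush each chunk as soon as it's rendered rather than assembling the
whole response first. render_next_chunk() is a stand-in I invented for
whatever actually produces the HTML:

#include <stdio.h>

/* Stand-in renderer: hands back the page one piece at a time. */
static const char *render_next_chunk(void)
{
    static const char *chunks[] = {
        "<html><body>\n",
        "<p>First paragraph, sent while the rest is still rendering.</p>\n",
        "<p>Second paragraph.</p>\n",
        "</body></html>\n",
        NULL
    };
    static int i = 0;
    return chunks[i] ? chunks[i++] : NULL;
}

int main(void)
{
    const char *chunk;

    printf("Content-Type: text/html\r\n\r\n");   /* headers go out first */
    while ((chunk = render_next_chunk()) != NULL) {
        fputs(chunk, stdout);
        fflush(stdout);   /* let the browser start painting immediately */
    }
    return 0;
}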
Anyway, I don't know if these ideas are all that helpful,
but I hope they are.