See the postgresql wiki for basics.
I have a corpus of around 90K hands in about 1100 files, all PokerStars NLHE. To test under different settings I followed this procedure:
- recreate the tables in fpdb,
- restart postgresql,
- restart fpdb and pipe the output to a file. $ ./fpdb.py | tee output
This doesn't get fsynced often and perhaps not even until you quit fpdb, so don't expect interim results.
- bulk import the folder. select "don't drop indices" (this feature is currently broken)
there's no progress meter or anything - stdout gets written to occasionally and top should show fpdb and postgres working.
- used these [[file:speeds1.py]] and [[file:speeds2.r]] to parse and create the graphs, usage $ ./speeds1.py < output > speeds && R —vanilla —slave < speeds2.r
- rename Rplot001.png to avoid overwriting it on next test
I tried it under three conditions…
Here's a graph of hands per second with the default postgresql config (ubuntu 8.04)
The most worrying and obvious feature is the sharp drop at 15k hands. It might've been a sudden load on the pc but we'll shortly see it wasn't.
This is the second thing recommended in the wiki.
Setting the effective cache size to 1GB didn't affect things at all.
The wiki recommends something like 1/4 of total memory for this.
I held off doing this first because when I just set it in postgresql.conf it wouldn't start up, and upon further reading it involved messing with some kernel parameters which I've never done before. In the end setting the linux kernel.shmmax parameter wasn't so hard. Get the current settings:
$ sysctl -A | grep kernel.shm
kernel.shmmax = 33554432
kernel.shmall = 2097152
which is 32MB and 8GB (shmall is in 4096 byte pages) respectively. (i.e. shmall is already plenty big enough) I have 6GB memory so according to the wiki I want something for shared_buffers (and hence shmmax) more like:
1GB = 1073741824 bytes
2GB = 2147483648 bytes
so for example:
$ sudo sysctl -w kernel.shmmax=2147483648
This needs to be bigger than what I set in postgresql.conf. So then set shared_buffers=1536MB in postgresql.conf and restart postgresql.
This is much, much better. Note the total time (right hand graph) is about 1/3.
dropping indexes 1
Reran with 'dbtest=True' (Carl's executemany() thing) and got drop postgresql indexes to work; this is with server set up as above.
dropping indexes 2
Reran with 'dbtest=True' and changed index dropping so that hands_gametypesid_idx is not dropped; even better! (The reason I chose not to drop this one is because it is used every time a hand is inserted in the duplicate lookup).
I was multitasking a bit in the middle there.