Developer Notes

Developer Notes

Page for recording discussion on the intent of various classes.

Discussion on new storage classes (14/5/09)

Inserting into the database from/with Hand.py

  • Must have a reverse function! ie. Be able to assemble the Hand class from the database (given a handid) or from a HH file Separate the Hand insert from the Stats insert
  • Allows us to regenerate stats independently (see Analysis below)
  • Allows the possibility of updating stats directly in SQL, which may be more efficient
  • Possibly SQL procedure/stored query to update the HudCache
  • Ability to either do this 1 hand at a time (autoimport) or efficiently cache and insert mutiple hands (bulkimport)

Relevant chat comments:

<nutomatic> If the hudcache stuff could be delayed until the bulk was complete, we could save a mess of inserts
<Worros> One of the issues with the current code is that it does a bunch of selects on the database each hand before inserting
<Worros> Thats the biggest killer of our performance
<nutomatic> the fewer selects the better, even simple queries will add up when you are importing 400k hands
<nutomatic> When I was doing imports the other night, I noticed that mysql was using >5x the cpu of python. That is not right.
<_mt> well its better that mysql is cpubound than iobound
<Worros> Once you've inserted a bunch of hands its starts recalculating indexes, or its uses cpu looking up large tables for HandId without a hash lookup
<Worros> "select, do increment in python, update" is a massive performance hit

DerivedStats:(to be a member of Hand - ie. for a single Hand)
Specialisation for each game type Stud|Holdem|DrawDerivedStats - caclulating steal in Stud is different from holdem
Class will map very closely with HUD cache with a few extra details (perhaps?)

More on DerivedStats format from Ray on 15 July, 2009.

We have discussed making our queries look like this:
INSERT INTO handsplayers (handid, playerid, startCash, position) VALUES (%(handid)s, %(playerid)s, %(startCash)d, %(position)d);
with the corresponding python to look like this:
db.execute(query['hands_players_insert_query'], hp_dict)
where hp_dict = {'handid': "2222", 'playerid', "1111", 'startCash', …}
Using this form for parameter passing makes supporting other databases (notably SQLite) much easier. (because we can mung the queries into an SQLite-compatible form with a regex, or maybe several regexes). It also means that we just have to build a dict with the proper values to be inserted, and don't have to call execute with 50 parameters that have to be in the correct order.

Therefore, I suggest that we build the DerivedStats attrs so that they can be used directly in the execute statements above. To continue with the handsplayers example:
DerivedStats.hp = { 'OneBigFish' : {'playerid': '22222', 'startCash': 555, 'postion': None, …},
'2ndBigFish': {'playerid': …..}
}

Relevant chat comments:
<nutomatic> OK, so DerivedStats calculates all the 1 and 0s for calculating vpip, pfr, aggr freq and fact, checkraise, steal, defend, etc.
<Worros> Ok, lets solve a smaller part of the problem first then. What sort of class structure do we want to calculate derived stats from a Hand
<nutomatic> If it is somewhat parallel to the Hand structure it will simplify the code.
<Worros> Do we want different stats for each game? I think so, we definetly calculate the same stat differently in some cases - Stud vs Holdem steal comes to mind
<Worros> I'd suggest the 2 classes will be highly coupled
<_mt> um, I imagine individual Hands can easily expose a load of methods/attributes that tell you wther this Hand increments a players stat
<Worros> Not without calculating the stat in Hand
<Worros> I'm thinking a parallel set of classes DerivedStats - FlopDerivedStats StudDerivedStats DrawDerivedStats
<Worros> The DeriveStats class could be a member of Hand?
<_mt> its just a list or dict from stat names to true / false tho
<_mt> maybe there are some more complex ones ?
<Worros> The functions to calculate then wont be though
<_mt> yes you're right
<_mt> in that case it is nice to have it separate
<_mt> Hand is already quite big
<nutomatic> my gut is that analyze should be a separate class
<Worros> DerivedStats 1 hand, Analysis multiple hands perhaps?

Analysis.
Is the sum and additional analysis of n Hands

  • A close map to HudCache but with the ability to have additional queries and data structures
  • eg. Card distributions and EV calculations
  • Possibly SQL procedure/stored query to update the HudCache

Relevant chat comments:
<_mt> yep, Anaylsis is little more than a list
<_mt> that adds the DerivedStats up of each hand in it
<Worros> Analyse would be things like 'all hands in bb vs opponent x' i'm thinking
<_mt> well thats more like a select -> list of Hands, isnt it?
<Worros> select list of hands and total them
<Worros> That actually gives us Session if you think about it
<_mt> y Ray. y Carl, a more general Session isnt it, because it can be on any criteria
<Worros> Another thing I think we should try for is to keep the Hand insert and the stats insert separate
<Worros> Gives us the opportunity to possibly generate the stats directly from the database
<nutomatic> I think PT3 has a thing where they generate the hudcache from the db
<_mt> i can see why you would want to be able to generate stats for a subset of hands
<Worros> Could mean that HudCache could be configured to only work with 3 months of hands
<_mt> but why would you want to regenerate the entire db hud cache unless you thought you had some kind of bug
<nutomatic> yes, if would be nice to see my vpip by month, for example
<_mt> oh i see, yes.
<_mt> but then… you would want to update that one incrementally.
<_mt> if you had it for the last 10000 hands, you wouldnt want to regenerate from the last 10000 each time you add one :)
<_mt> if it was last 3 months… what happens, it regenerates entirely at midnight?
<Worros> I'm thinking temp db tables and HudCaches
<Worros> I'd suggest manually
<_mt> we're basically saying that an Analysis is going to be this big, and is going to be mapped relatively straightforwardly onto hud cache, arent we?
<Worros> I dont think Analysis would be limited to Hudcache - Card distributions and ev calcs come to mind
<nutomatic> you could also then use a trigger/stored proc to update the hudcache

Other stuff:
<_mt> fwiw i love the action-line filters of HEM
<nutomatic> can you tell me how those work
<_mt> but the interface to using them could use some Apple-like love and attention
<Worros> I'd like a wizard type session reviewer
<_mt> well, if you want to see all your hands where you faced a 3bet preflop, check 'filter by preflop' and select all the 'bet raise xxx' actions
<nutomatic> ah ok
<_mt> thats where it could do with some love … it doesnt distinguish between whether it was you or others involved
<_mt> and you're most interested in comparing subsets of things: like, compare 'bet raise fold' to 'bet raise call'

Index dropping

When 'drop' is auto the shaded region is where we currently attempt to drop the indexes and foreign keys
When you look at it like this, it probably makes more sense to "drop when files+hands < x"
Also, looking at numbers of files might not be the best indicator — total size of all files to import might be better. We have to be careful to only be totting up the sizes of actual history files though - total size is easier to fool if there's a big non-history file in the directory than number of files, which would require special attention to fool).
[[image:dropregion.png]]
Finally, the indexes are not recreated if the import is interrupted. This should be catered for, either
- the import should ideally be done in a try…finally block so that the indexes are guaranteed to be recreated
- fpdb should check all the indexes exist on startup or something.

web analytics

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License