Given the generic nature of RIMBAA databases - are there impacts on performance? Isn't the number of joins causing issues?
Ewout: Both Ernst and we (and maybe Michael as well) have cut down on queries by limiting the number of queries to 1 per class (or hierarchy,depending on your mapping strategy) in the RIM. Something like this:
- Query the Act table for all Acts for this patient (you might need to denormalize the patientid here) within some timeconstraint (for example)
- Query the ActRelationship table for ActRelationships belonging to the Acts queried in the first step.
- Query the Participation table for all Participations relevant to the Acts queried before
Because of its built-in identity-map hibernate will recombine the results of these separate queries into a connected tree of objects. This way you do not need to send queries for each level of acts with "deeper" connected acts.
Of course this will not cater for more complex queries with constraints on the combination of acts, but this should be good enough to do a first rough selection using the database server. Further detailed queries can then be done in-memory on the application server once the queries have run. Since appservers generally scale better than database servers, this seems quite acceptable to me.
- Bertil Reppen: put the subject id and author Id in the act itself (as attributes). And have participation table. Then joins are OK.
- Andy Harris: Yes, querying is complex and relatively slow because of recursion, but that's why there should be a BI stack for heavy report lifting. at UK Biobank, we were 'crosstabbing' 200,000 patients and approx 600 data points from an act table within approx 8 minutes - 120,000,000 data points
(relevant extract to be made)
- [18/09/2008 15:45:23] Rene Spronk says: 1. How about performance? - if one uses the RIM as the basis for db tabes, people realize there will be lots of joins, with a performance hit. What's the best practice here ?
- [18/09/2008 15:48:35] Grahame Grieve says: joins shjoins. Who cares. db's are pretty good at these now-a-days. Unless you have a real domain model that establishes a transaction model boundary you can cache around, it's better to use the db server
- [18/09/2008 15:49:06] Grahame Grieve says: sole exception for me is recursive joins - they're not so good. Which is one reason I've deliberately excluded all recursion from the datatypes in R2
- [18/09/2008 15:55:44] Alex Zupan says: 1) we are testing different strategies, different db,different hibernate mapping ..., to reach the optimal solution.
- We tried the following hibernate mapping strategies :
- -- Table for class hierarchy (Javasig default mapping)
- -- Table for concrete class
- But in any case there are too many join.
- and now will try also:
- -- Table for subclass
- [18/09/2008 15:57:32] Alex Zupan says: I think that in any case there is to do a little tuning on queries on lazy loading and fetch mechanism of hibernate in order to achieve acceptable speed...
- [18/09/2008 15:57:42] Grahame Grieve says: Alex, I will be interested to hear your results - but surprised if anything you do makes much fundamental difference, since the problem is driven by the complexity of the data, not the layout of it
- [18/09/2008 16:01:01] Alex Zupan says: yes, but a little tuning can increase performance, such as default hibernate mapping of Javasig has many relations with lazy = "false". This leads to do many unnecessary queries.