Design pattern: Observation grab bags
Issue
From time to time, data models use generic Observation classes as "name/value pair" constructs to capture "miscellaneous" or "other" data elements that may not be easily expressed within the framework of the base data model. Guidelines are needed on when it is appropriate to use such classes, what constraints should be placed on them when they are used, and when it is more appropriate to model data explicitly.
The tendency to use name/value pairs has been described as PertinentInformationism.
Discussion
Premises
1. It is always possible to express any data element using the RIM. In the worst case, the RIM can be adjusted via harmonization to provide the necessary support.
2. It is equally possible to express any data model as a set of related Observations where all semantics are expressed through Observation.code and Observation.value.
The extreme of approach 1 leads to inefficient implementations, extremely large, complex models and unhappy implementers.
The extreme of approach 2 leads to a data model that takes little or no advantage of the RIM, with a substantially reduced likelihood of interoperability and increased effort to map between implementations.
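The contrast between the two premises can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes below are plain Python stand-ins, not RIM classes or any HL7 API, and the names ExplicitCoverage, policy_number, copay_amount and the code strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Observation:
    """Generic name/value pair: all semantics live in code and value."""
    code: str
    value: Any


@dataclass
class ExplicitCoverage:
    """Hypothetical explicitly modeled structure with named, typed fields."""
    policy_number: str
    copay_amount: float


def find_value(observations: List[Observation], code: str) -> Optional[Any]:
    """Return the value of the first observation with the given code, if any."""
    for obs in observations:
        if obs.code == code:
            return obs.value
    return None


explicit = ExplicitCoverage(policy_number="P-001", copay_amount=20.0)

grab_bag = [
    Observation(code="POLICY_NUMBER", value="P-001"),
    Observation(code="COPAY_AMOUNT", value=20.0),
]

# Explicit model: the field name is part of the shared model.
explicit_copay = explicit.copay_amount

# Grab bag: the receiver must already know the sender's code for the concept.
grab_bag_copay = find_value(grab_bag, "COPAY_AMOUNT")

# A receiver that expects a different (hypothetical) local code finds nothing.
unknown_copay = find_value(grab_bag, "CopayAmt")
```

In the grab-bag form, interoperability depends entirely on both parties agreeing on the code vocabulary, which is the mapping effort described under approach 2.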
Considerations
- All other things being equal, explicit modeling improves chances for interoperability and is therefore preferable
- Explicit modeling inevitably means a loss of some efficiency. However, an "observation-based" approach should not be selected solely because conveying a desired data element requires walking through 4 or 5 'empty' classes to reach it.
- The "inefficiency" argument associated with explicit modeling will be significantly reduced by a forthcoming ITS that allows for serialization of a collapsed model, avoiding the need to communicate most, if not all, empty classes.
- Messages which include 'generic' observation classes with broadly defined vocabulary domains will find that those classes become tempting targets for localized extension of messages, rather than using more explicit model-based localization approaches.
- As a result, it is better to create tighter Observation classes that ensure that only information related to the "desired" grab bag area can be included. For example, a domain name such as "FinancialAdjustmentModifierTypes" is better than a domain name "RelatedInformationTypes".
- Factors which tend towards explicit modeling:
- attributes associated with explicit general business rules or which need to be available to low level processes such as security, audit, lookups and preliminary screening
- data requirements which are well known and static
- data requirements which are consistent in interpretation and requirements across jurisdictions (note that this doesn't mean all jurisdictions must support the element; that's managed through optionality. What matters is that if an element is supported, jurisdictions typically need to express the same sort of information)
- data elements that are likely to need to appear as explicitly positioned and labeled elements in a user interface (as opposed to in a generic list of miscellaneous elements)
- situations where the attributes of "Observation" other than 'code' and 'value' are either nonsensical or are extremely unlikely to ever be relevant
- situations where there is a clear way to model the data element according to the RIM or where the data concept is already explicitly modeled in other domain models. (Note that this criterion requires a degree of modeling skill and knowledge of other domains. Modelers are encouraged to bring situations where they're not sure how to model a given concept forward to MnM.)
- Factors which tend towards the use of general observations:
- requirements which cannot be nailed down at the time of design but which are expected to appear or evolve rapidly during a system's lifetime
- requirements where wide variation in requirements is expected across jurisdictions
- situations where the data can be handled generically by the receiver. This works best if the receiver can safely ignore repetitions it doesn't understand
- situations where the data will tend to be displayed in a list with other repetitions of the observation rather than each repetition being assigned to explicitly labeled and positioned fields
- elements which are not currently supported by the RIM, are extremely domain-specific and are unlikely to make sense, even from a generic perspective, in any other domain. (Note that this situation is quite unusual and it's usually best to bring the concept to MnM for evaluation before assuming that "no-one could ever use anything like this".)
- data elements where the machinery of "Observation" (e.g. effectiveTime, author, methodCode, etc.) makes sense and is required by the use-case or can reasonably be expected to be needed in the future.
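Two of the considerations above lend themselves to a small sketch: binding a grab-bag class to a tight vocabulary domain, and a receiver that handles repetitions generically while safely ignoring codes it doesn't understand. The Python below is a minimal illustration, not an HL7 API. The domain name FinancialAdjustmentModifierTypes comes from the text; its member codes (DISCOUNT, SURCHARGE), the helper names, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Tuple

# Hypothetical members of the tightly scoped domain named in the text.
FINANCIAL_ADJUSTMENT_MODIFIER_TYPES: FrozenSet[str] = frozenset(
    {"DISCOUNT", "SURCHARGE"}
)


@dataclass(frozen=True)
class Observation:
    """Generic name/value pair (plain Python stand-in, not a RIM class)."""
    code: str
    value: Any


def domain_violations(
    observations: List[Observation], allowed_codes: FrozenSet[str]
) -> List[Observation]:
    """Return the observations whose code falls outside the bound domain."""
    return [obs for obs in observations if obs.code not in allowed_codes]


def process(
    observations: List[Observation],
    handlers: Dict[str, Callable[[Any], Any]],
) -> Tuple[Dict[str, Any], List[Observation]]:
    """Apply a handler to each known code; collect unknown codes instead of failing."""
    handled: Dict[str, Any] = {}
    ignored: List[Observation] = []
    for obs in observations:
        handler = handlers.get(obs.code)
        if handler is None:
            ignored.append(obs)  # receiver does not understand this repetition
        else:
            handled[obs.code] = handler(obs.value)
    return handled, ignored


incoming = [
    Observation("DISCOUNT", 5.0),
    Observation("FAVOURITE_COLOUR", "blue"),  # unrelated local extension
]

# A tight domain lets the unrelated extension be detected.
violations = domain_violations(incoming, FINANCIAL_ADJUSTMENT_MODIFIER_TYPES)

# A generic receiver processes what it knows and sets the rest aside.
handled, ignored = process(incoming, {"DISCOUNT": lambda amount: -amount})
```

With a broad domain such as "RelatedInformationTypes" there would be no basis for the violation check, so the unrelated element would pass through unnoticed.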
Examples
In order to bring this to closure, or even to have effective discussion on it, we need a set of examples -- with some candidate "good practice" and candidate "bad practice", ideally from an online version of the ballot.
Resolution
<Awaiting discussion on a conference call or WGM session>