Character Set used in v2 messages
HL7 doesn't have its own character set. It does have a mechanism for escaping multibyte characters, which is mainly (if not only) used by systems that would otherwise mangle a multibyte character (e.g. the many US systems limited to 7-bit ASCII).
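As a rough illustration of what such escaping can look like, here is a minimal sketch built on the v2 hexadecimal escape sequence \Xdd..\. It rests on several assumptions that are not stated on this page: the escape character is the default backslash, the sender and receiver have agreed that the hex digits carry UTF-8 bytes (a local convention), and the helper names escape_non_ascii and unescape_hex are made up for this example. It is not the character-set switching escapes (\C..\ and \M..\), and it does not handle the other v2 escapes such as \E\ or \F\.

```python
import re


def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a \\Xdd..\\ hex escape.

    Assumption: the hex digits are the character's UTF-8 bytes.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # 7-bit ASCII passes through unchanged
        else:
            out.append("\\X" + ch.encode("utf-8").hex().upper() + "\\")
    return "".join(out)


# One or more hex byte pairs between \X and the closing backslash.
_HEX_ESCAPE = re.compile(r"\\X((?:[0-9A-Fa-f]{2})+)\\")


def unescape_hex(text: str) -> str:
    """Reverse escape_non_ascii under the same UTF-8 assumption."""
    return _HEX_ESCAPE.sub(
        lambda m: bytes.fromhex(m.group(1)).decode("utf-8"), text
    )


if __name__ == "__main__":
    escaped = escape_non_ascii("Müller")
    print(escaped)
    print(unescape_hex(escaped))
```

The escaped form contains only 7-bit ASCII bytes, so it survives a system that cannot carry anything else.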
UTF-8 is the de facto standard encoding for v2 messages in North America; in Europe it is ISO 8859-1 (Latin-1). UTF-8 is the most commonly used encoding for UNICODE. Note that UNICODE is a character set, not a character encoding. Use "UNICODE UTF-8" in MSH.18 and you're all set.
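On the receiving side, MSH.18 can be read before the rest of the message is decoded. The following is a minimal sketch, assuming the MSH segment up to MSH.18 contains only ASCII bytes, segments end with a carriage return, and the repetition separator is the default "~". The function names and the small MSH18_TO_CODEC mapping are hypothetical; the mapping covers only three values and would need extending for real use.

```python
# Hypothetical mapping from MSH.18 values to Python codec names.
MSH18_TO_CODEC = {
    "ASCII": "ascii",
    "8859/1": "iso-8859-1",
    "UNICODE UTF-8": "utf-8",
}


def msh18_charset(raw: bytes) -> str:
    """Return the first MSH.18 value of a raw v2 message, or "" if absent."""
    if not raw.startswith(b"MSH"):
        raise ValueError("message does not start with an MSH segment")
    first_segment = raw.split(b"\r", 1)[0]
    field_sep = first_segment[3:4]  # MSH.1 is the separator character itself
    fields = first_segment.split(field_sep)
    # fields[0] is b"MSH" and fields[1] is MSH.2, so MSH.n is fields[n - 1].
    if len(fields) < 18:
        return ""
    value = fields[17].decode("ascii")
    # MSH.18 may repeat; this sketch only looks at the first repetition.
    return value.split("~")[0].strip()


def decode_message(raw: bytes) -> str:
    """Decode a raw message using the encoding declared in MSH.18."""
    declared = msh18_charset(raw)
    if not declared:
        codec = "ascii"  # assumption: an empty MSH.18 means 7-bit ASCII
    elif declared in MSH18_TO_CODEC:
        codec = MSH18_TO_CODEC[declared]
    else:
        raise ValueError(f"unsupported MSH.18 value: {declared!r}")
    return raw.decode(codec)
```

Decoding strictly, rather than replacing undecodable bytes, makes a message whose bytes contradict its MSH.18 fail loudly instead of being stored with silently damaged names.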
See the UNICODE FAQ for details about UNICODE and its encodings (UTF-8, UTF-16, UTF-32).
Note: HL7 2.x versions prior to version 2.8 indicated that any of the UNICODE encodings was acceptable in HL7 messages. After some practical experience with UNICODE, it was determined that only UTF-8 can be used safely as a UNICODE encoding in HL7 messages, and as of HL7 version 2.8 only UTF-8 is listed as an allowed encoding. This applies to messages in earlier versions of HL7 as well: in order to ensure interoperability, UTF-16 and UTF-32 SHALL NOT be used in HL7 v2.x messages.
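A receiver that wants to enforce this rule could reject UTF-16 and UTF-32 input before parsing. The sketch below is one possible heuristic, not a prescribed check: it looks for a UTF-16 or UTF-32 byte order mark, and otherwise for NUL bytes among the first four bytes, which is where they appear when the ASCII text "MSH" is encoded in UTF-16 or UTF-32. The function names are made up for this example.

```python
import codecs

_WIDE_BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def looks_like_utf16_or_utf32(raw: bytes) -> bool:
    """Heuristically detect UTF-16 or UTF-32 encoded message bytes."""
    if raw.startswith(_WIDE_BOMS):
        return True
    # Without a BOM, "MSH" in UTF-16/UTF-32 still puts NUL bytes up front.
    return b"\x00" in raw[:4]


def require_utf8(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, refusing UTF-16/UTF-32 input."""
    if looks_like_utf16_or_utf32(raw):
        raise ValueError("UTF-16/UTF-32 is not allowed in HL7 v2.x messages")
    # Strict decoding raises UnicodeDecodeError on invalid UTF-8.
    return raw.decode("utf-8")
```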
Hints
- Try to avoid operating-system-specific code pages (e.g. Windows cp1252, Mac code pages, EBCDIC variations).
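To see why OS-specific code pages cause trouble, consider Windows cp1252 data that gets labelled as ISO 8859-1. The two agree on most bytes, but cp1252 puts printable characters such as the euro sign in the range 0x80 to 0x9F, where ISO 8859-1 has control characters. The helper below (c1_controls, a name invented for this example) is a small sketch that flags those control characters in decoded text as a hint that the data was probably cp1252 all along.

```python
def c1_controls(text: str) -> list[str]:
    """Return the C1 control characters (U+0080 to U+009F) found in text."""
    return [ch for ch in text if 0x80 <= ord(ch) <= 0x9F]


if __name__ == "__main__":
    cp1252_bytes = "€".encode("cp1252")          # a single byte, 0x80
    misread = cp1252_bytes.decode("iso-8859-1")  # becomes U+0080, a control
    print(cp1252_bytes, c1_controls(misread))
```

Text that really is ISO 8859-1 rarely contains these control characters, so a non-empty result is worth a look at the sending system's configuration.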