I'm testing OpenCDISC 1.5 with some datasets that contain Japanese characters.
OpenCDISC 1.5 handles Japanese values correctly when the data is passed in Dataset-XML format (the .xpt format does not seem to be supported for this yet).
However, there is still a problem when validating datasets with non-ASCII characters. Controlled terminology checks are done by parsing the text files in the config/data/CDISC/SDTM/yyyy-mm-dd/ folder, but there is no way to specify the character encoding of these terminology files. This is a barrier for users who want to check their data against localized terminologies.
So I wrote a small patch that makes it possible to specify the text encoding of the terminology files.
I was unable to attach the patch file to this forum post, so please download it from this link: https://www.dropbox.com/s/ps5w8spk1p8no06/OpenCDISC_CTEncoding.patch
With this patch applied, users can specify the text encoding of the terminology files by setting the Engine.ControlledTerminology.FileEncoding property.
For example, if a localized terminology file is encoded in EUC-JP, users should append the following line to the lib/settings/settings.properties file.
Engine.ControlledTerminology.FileEncoding = EUC-JP
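To illustrate the idea, here is a minimal Java sketch of how a terminology file could be read with the encoding named by that property. It is not the code from the patch: the class name TerminologyEncodingSketch and the methods resolveCharset and readTerms are hypothetical, and falling back to the platform default charset when the property is missing is my assumption, not confirmed OpenCDISC behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of OpenCDISC or of the patch itself.
public class TerminologyEncodingSketch {

    // Property name taken from the post above.
    static final String ENCODING_KEY = "Engine.ControlledTerminology.FileEncoding";

    // Pick the charset named in the settings; fall back to the platform
    // default when the property is absent or blank (assumed behavior).
    static Charset resolveCharset(Properties settings) {
        String name = settings.getProperty(ENCODING_KEY);
        if (name == null || name.trim().isEmpty()) {
            return Charset.defaultCharset();
        }
        // Throws an unchecked exception if the name is not a supported charset.
        return Charset.forName(name.trim());
    }

    // Read every line of a terminology file, decoding it with the given charset.
    static List<String> readTerms(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

With the settings line above loaded into a Properties object, resolveCharset returns the EUC-JP charset, and readTerms decodes the localized terms with it, so they can be compared against the Japanese values in the dataset.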
I will post some test datasets in my next post.