Douglas

 

Does anyone know the size limitation of this tool for handling SAS files?

Are there any limits to the size of the file it can process?

Let me know. Thanks in advance.


Tim
on August 13, 2009

Hello,

The study size limitation of the Validator depends on a number of factors, including the number and type (findings vs. events vs. interventions) of datasets in a study, the number and type of rules being executed, and the hardware configuration. However, the Validator is designed for scalability, so with proper hardware and environment setup it should be able to handle studies with 10 million, 100 million, and even 1 billion observations.

Probably the most important limiting factor is the amount of RAM available to the Validator, which is needed to execute Lookup validation rules. For example, Janus rule IR4500 identifies non-Demographics domain subjects that are not found in the Demographics domain. To perform this validation, our tool uses a Lookup rule that builds an in-memory cache of unique Demographics USUBJIDs for optimum performance. So the larger the study, the more data needs to be cached, and thus the more RAM the Validator needs.

The Beta version of the Validator that is currently downloadable from our site is limited to just 512 MB of RAM. This is a default desktop-friendly configuration designed not to overwhelm anyone's laptop or PC. In our performance tests, we were able to process a study with 1 million records under this limited configuration. Granted, each study is different, and there may be circumstances where a given study behaves better or worse than what we've seen so far.

The only other known limitation is that a particular dataset file cannot have more than 2^31 - 1 observations (about 2.1 billion), but we are able to increase this to 2^63 - 1 if needed. There is no known limitation on the actual byte size of the file.

As you can see, the size limitation is a resource-dependent issue, which can easily be scaled via proper hardware and environment configuration.

Hopefully that answers your question.

Regards,
Tim
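To illustrate why memory use grows with study size, here is a minimal sketch of the kind of in-memory lookup described above. This is not the Validator's actual code; the class name, variable names, and sample data are invented for illustration only.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UsubjidLookupSketch {
    public static void main(String[] args) {
        // Stand-in data: USUBJID values from the DM (Demographics) domain
        // and from another domain, e.g. VS.
        List<String> dmUsubjids = Arrays.asList("STUDY1-001", "STUDY1-002", "STUDY1-003");
        List<String> vsUsubjids = Arrays.asList("STUDY1-001", "STUDY1-004");

        // Build the in-memory cache of unique Demographics subjects.
        // Its size grows with the number of unique subjects in the study,
        // which is why larger studies need a larger heap.
        Set<String> dmCache = new HashSet<String>(dmUsubjids);

        // Stream through the other domain and flag subjects missing from DM.
        for (String usubjid : vsUsubjids) {
            if (!dmCache.contains(usubjid)) {
                System.out.println("IR4500: " + usubjid + " not found in Demographics");
            }
        }
    }
}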
David
on December 15, 2009

Hi Tim,

I have a VS domain with 850K records and a SUPPVS domain with 850K records. When I run the OpenCDISC Validator, the process proceeds about halfway and then just hangs (I waited more than 2 hours). I have 4 GB of RAM on a dual-core Dell machine.

Regards,

David

Max
on December 15, 2009

Hi David,

Try this.

Edit the client.bat file. You will see the following:

START /B javaw -Xms256m -Xmx768m -jar lib/validator-gui-1.0-beta-1.jar

The -Xmx value controls how much of your system RAM the Validator is allowed to use. Since you have 4GB, you can increase it to 3GB (75% of total memory). So, just change it to -Xmx3072m.
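For example, after the change the line would read (jar name unchanged from beta 1):

START /B javaw -Xms256m -Xmx3072m -jar lib/validator-gui-1.0-beta-1.jar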

Give this a try and let me know what happens.

Also, we are currently doing performance testing and tuning, and in a couple of weeks we will have better guidance on what the file size limits are for various hardware configurations.

If you have some large datasets and would like to help out, let me know.

Thanks,

Max

David
on December 15, 2009

Hi Max,

I was only able to increase the value to 1024. When I used anything above 1280, I got the message:

Java Virtual Machine Launcher

Could not create the Java Virtual machine.

I would be interested in testing with larger datasets - we often have LB and VS around 800 MB uncompressed.

Regards,

David

Max
on December 18, 2009

Hi David,

We have identified a bug that is causing this issue. It has to do with rule R4083 and affects any SUPPQUAL dataset with around 300,000 records or more. We are working on a fix and will release it in a few days.

Thanks,

Max

Max
on December 30, 2009

Hi David,

We have fixed the bug that was causing this issue. Please download the latest version at

http://www.opencdisc.org/downloads/opencdisc-validator-1.0-beta-3-bin.zip

We tested it on a study with the largest SUPPQUAL dataset having 2 million records.

Please give it a try and let me know if it works for you.

Thanks,

Max

Ed
on January 29, 2010

Hi

The advice given is incorrect; the problem lies with the maximum amount of memory available to the VM. With the Java JVM there is a fundamental limit of 2 GB of RAM per JVM process, and this 2 GB also has to hold the JVM itself, which has a footprint of about 450 MB for Java 1.6. Hence the maximum value you can set for -Xmx is about 1550m; the best I have ever got is 1570m. You can tweak Windows to give programs more RAM by setting "/3GB" in the Windows boot.ini, but that doesn't increase the JVM memory limit; it just ensures there is a full 3 GB available so the JVM can grab its 2 GB (otherwise Windows splits the RAM into 2 GB for programs and 2 GB for Windows, which would explain why you cannot allocate more than about 1024m).
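A quick way to find the ceiling on a given machine is to ask the JVM to start with a trial heap size without running anything else (the exact limit varies from machine to machine):

java -Xmx1550m -version
java -Xmx2048m -version

On a 32-bit JVM the first command will typically start fine, while the second usually fails with a "could not reserve enough space for object heap" error.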

Having done development work on handling submission datasets, I suggest you devote extensive time to making sure there are no memory leaks and to tightening up memory usage, and get some very large test datasets (in the region of 5-15 GB is not uncommon) to do scalability tests. In the end we had to adopt a streaming architecture, which, although slow, was infinitely scalable.

Max
on January 29, 2010

Hi Ed,


Thanks for the information, but it needs to be put in the right context. The limit that you describe only applies to a 32-bit JVM running on a 32-bit operating system. When using the Validator on a 64-bit machine, the 2GB limit disappears. For example, my laptop runs a 64-bit version of Vista and I can run the Validator with a memory setting of 3GB or more. Please refer to Sun’s website for more information:


http://java.sun.com/performance/reference/whitepapers/tuning.html#section4.1.2
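As a quick sanity check (the exact ceiling depends on the operating system and installed RAM), the same kind of trial that fails on a 32-bit JVM should succeed on a 64-bit one:

java -Xmx3072m -version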


One important thing to remember is that the Validator is designed to run on a variety of software and hardware configurations, so the settings and limitations will vary depending on the given environment. With our upcoming production release of the Validator we plan to document these in great detail.


On the subject of architecture, the Validator does indeed use a streaming architecture, which is actually pretty fast and extremely scalable. If you download the latest Beta release, you will see that it can validate millions of records under the smallest of settings (32-bit OS, 1024m of memory).


http://www.opencdisc.org/downloads/opencdisc-validator-1.0-beta-3-bin.zip
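For anyone curious what a streaming design looks like in general terms, here is a minimal sketch (not the Validator's actual code; real SDTM datasets are SAS transport files rather than plain text, so plain lines stand in for records). The point is that only the current record is ever held in memory, so heap use stays roughly constant no matter how large the file is.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class StreamingValidationSketch {
    public static void main(String[] args) throws IOException {
        // Read one record (line) at a time instead of loading the whole file.
        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        try {
            String record;
            long lineNo = 0;
            while ((record = reader.readLine()) != null) {
                lineNo++;
                // Hypothetical per-record check, just for illustration.
                if (record.trim().length() == 0) {
                    System.out.println("Record " + lineNo + ": empty record");
                }
            }
        } finally {
            reader.close();
        }
    }
}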


If you run into a scenario where the Validator doesn’t scale, please let us know so we can fix any outstanding issues before we go into production.


Thanks again for your feedback,
Max
