Releases: choishingwan/PRSice

MAF calculation

17 Sep 15:31

Update Log

  • Fix Segmentation fault related to MAF calculation

PRSet Clumping Fix

16 Sep 19:16
  • Fixed a bug in PRSet clumping that caused results to differ depending on which sets were used
  • Fixed a multi-threading memory detection problem. Multi-threading for set-based permutation should now work as expected

Set based permutation speed up, memory mapping and more

11 Sep 15:44

Update Log

  • PRSice can now recognize gz files without the gz suffix
  • Better checks are now in place for distance-related parameters
  • Cleaned up some of the code to make the code base more readable
  • Fix some memory leak problems related to #128, #131, #137
  • Completely removed Pearson correlation clumping. We don't have the manpower to maintain the code, and Pearson correlation clumping does not provide enough benefit for us to consider supporting it.
  • Added a memory-mapping feature (--enable-mmap). When a large amount of memory is available and all genotypes are stored in the same file, --enable-mmap might help to speed things up a bit
  • Updated the linear regression code, which is now adapted from RcppEigen
  • Updated internal variable types for all-score printing. We can now generate an all score file for more samples and thresholds: up to 5.270498e+17 samples with one threshold, or 1.844674e+18 thresholds with 500k samples.
  • Updated internal variable types for p-value thresholding. Previously, if a user required an ultra-small step size (e.g. < 1e-20), PRSice would generate abnormal thresholds (see here). To accommodate this use case, PRSice now detects whether the number of thresholds required exceeds what we can store and, if so, uses a slower alternative to generate the thresholds.
  • Set-based permutation was too slow to be practical. We use some algebraic tricks to speed the process up. For more detail, please refer to our full manual.
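
As an illustration of the first item in this list, the sketch below shows one common way to recognize a gzip file without relying on its name: checking for the two-byte gzip magic number at the start of the file. This is an assumption about how such detection can be done, not a description of PRSice's own code, and the file names and helper functions are hypothetical.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_text(path):
    """Open a plain or gzip-compressed text file, whatever its name is."""
    if is_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "r")


def demo():
    """Write one gzipped and one plain file, neither named *.gz, and read both back."""
    with tempfile.TemporaryDirectory() as tmp:
        packed = os.path.join(tmp, "base.assoc")   # hypothetical file names
        plain = os.path.join(tmp, "plain.assoc")
        with gzip.open(packed, "wt") as out:
            out.write("SNP P\nrs1 0.01\n")
        with open(plain, "w") as out:
            out.write("SNP P\nrs2 0.5\n")
        results = []
        for path in (packed, plain):
            with open_text(path) as handle:
                results.append((is_gzipped(path), handle.readline().strip()))
        return results
```

Calling demo() reads the header line from both files through the same open_text helper, even though the compressed file has no gz suffix.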

Bug Fix

05 Aug 19:14

Update Log

  • Now outputs the number of SNPs at each threshold for --no-regress analyses
  • Fix problem related to bed files. PRSice can now read bed files with chromosomes written in the chrX format
  • Fix problem related to --maf when --ld is used. Previously, when --ld was used, the --maf calculation was done on the target file but using the SNP locations from the reference, which caused problems.

GLM improvement

30 Jul 19:46

Update Log

  • Fix the glibc error in 2.2.4. This was caused by an array out-of-bounds error, which is now fixed
  • We have now adapted code from the fastglm and speedglm packages. For 100k samples and no covariates, we expect around a 50% increase in speed. This new implementation also allows us to incorporate different link functions and distribution "families" in the future if they are required.
  • The --use-ref-maf option is also available now. --use-ref-maf causes missingness imputation to be performed using the MAF of the reference sample instead of the target.
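
To make the effect of --use-ref-maf concrete, here is a minimal sketch of mean imputation in which a missing genotype is replaced by its expected dosage, 2 × allele frequency. The imputation formula, the function names, and the toy genotypes are assumptions for illustration only; they are not taken from the PRSice source.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele from 0/1/2 genotypes; None marks a missing call."""
    observed = [g for g in genotypes if g is not None]
    if not observed:
        raise ValueError("no observed genotypes")
    return sum(observed) / (2 * len(observed))


def impute_missing(genotypes, frequency):
    """Replace missing calls with the expected dosage 2 * frequency."""
    return [2 * frequency if g is None else g for g in genotypes]


target = [0, 1, None, 2, None]   # toy target genotypes for one SNP
reference = [0, 0, 1, 0]         # toy reference-panel genotypes for the same SNP

# Imputing from the target's own frequency versus from the reference frequency
from_target = impute_missing(target, allele_frequency(target))
from_reference = impute_missing(target, allele_frequency(reference))
```

In this toy example the two missing calls are filled with 1.0 when the target frequency (0.5) is used and with 0.25 when the reference frequency (0.125) is used, which is the kind of difference the flag controls.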

2019-08-05

  • Temporarily uploaded the PRSet no-memory-check version for Linux

BGEN bug fix

25 Jul 15:15

Update Log

  • Fix #125: BGEN encoding was wrong for SNPs that are 100% confident to be the non-effective allele.
  • Fix problem with --remove and --keep for BGEN files when the sample IDs are embedded within the BGEN file
  • Fix problem where base file filtering was unexpectedly affected by the --keep-ambig flag
  • Re-implemented the check for 100% genotype missingness.
  • PRSice should now terminate correctly when none of the phenotypes are found in the phenotype file
  • SNP extraction / exclusion is now performed when reading the base file. This should provide a small performance boost

2019-07-29

  • Attempted fix for the glibc error in the Linux binary

Bug Fix

15 Jul 15:16

Update Log

  • Fix problem with the --base-maf and --base-info flags in the Rscript
  • PRSice will now only issue a warning instead of exiting with an error when the info flag or the MAF flags aren't found in the base
  • Fix the log message output
  • Minor improvement to file reading during MAF calculation. This should in theory speed up the MAF calculation and avoid bed file read errors

Bug Fix

27 Jun 19:24

Update Log

  • Disabled --pearson as it is currently bugged and I don't have time to fix it
  • Fix calculation of MAF when --maf is used (for binary PLINK format)
  • Fix LD calculation for BGEN files
  • BGEN --hard should now produce results identical to those generated from the plink2 -> bed -> PRSice pipeline
  • Disabled --msse4 optimization for the Windows binary as some Windows machines cannot run such a binary
  • For the OS X binary, note that machines more than 4 years old will not be able to run the shipped binary. You will need to compile the software yourself.

Minor Bug fix

04 Jun 21:17
Pre-release

Update Log

  • Fix problem with the Rscript
  • Fix problem with --x-range when no location information was provided in the summary statistic file
  • Update clumping defaults. --clump-kb is now reset to 250kb for PRSice and is 1mb for PRSet
  • Clearer information regarding the units of distance parameters (--clump-kb defaults to kb; --wind-5 and --wind-3 default to bp). In most cases, if you include the unit, PRSice should be able to recognize it (e.g. B,M,G,BP,MB,GB etc)
  • Some minor updates to multi-threading w.r.t. permutation analysis
  • When SNP sets are provided, PRSice should now correctly consider PRSet to be activated
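
The unit handling for distance parameters described above can be sketched as a small parser that applies a per-parameter default unit when none is given. The set of accepted unit spellings, the treatment of K/KB as kilobases, and the function name are assumptions made for this example; PRSice's actual parsing rules may differ.

```python
import re

# Multipliers to base pairs. The accepted spellings are an assumption
# based on the examples listed in the note (B, M, G, BP, MB, GB, ...).
UNIT_TO_BP = {
    "b": 1, "bp": 1,
    "k": 1_000, "kb": 1_000,
    "m": 1_000_000, "mb": 1_000_000,
    "g": 1_000_000_000, "gb": 1_000_000_000,
}


def parse_distance(text, default_unit):
    """Convert strings such as '250', '250kb' or '1M' to base pairs."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse distance: {text!r}")
    number, unit = match.groups()
    unit = (unit or default_unit).lower()   # fall back to the parameter's default unit
    if unit not in UNIT_TO_BP:
        raise ValueError(f"unknown distance unit: {unit!r}")
    return int(float(number) * UNIT_TO_BP[unit])
```

Under these assumptions, parse_distance("250", "kb") mimics a bare --clump-kb 250 and yields 250000 bp, while parse_distance("35", "bp") mimics a bare --wind-5 35 and yields 35 bp.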

NOTE

  • We have found that --pearson currently generates incorrect results. Please refrain from using it at the moment.
  • We have now uploaded the correct version for linux

Warning

  • We found that --pearson is currently wrong
  • We also found that bgen doesn't produce the expected results

Major Release: Better Integration of PRSet

17 May 14:54

Update Log

General

  • Standardized command line parameters. Any parameter that acts on a file other than the target now carries that file's name as a prefix. For example, --base-info performs INFO score filtering on the base file, --ld-info performs INFO score filtering on the LD reference file, and --info performs INFO score filtering on the target file.
  • Changed --cov-file and --pheno-file to --cov and --pheno because I am lazy
  • Removed --se and --prslice because we don't use those options. We might add them back when we introduce new functions
  • Add --id-delim to allow more flexible control of sample ID concatenation
  • --maf and --ld-maf calculations are now restricted to founders, similar to PLINK.
  • Restructured the code to allow easier diagnosis
  • Add full unit testing for some of the classes, such as Region and SNP. We don't have time for all the other classes.
  • Slight optimization of the GLM algorithm.
  • Executables for OS X and Linux are now compiled with the Intel MKL library, which should provide some speed boost
  • Fix some of the usage and log messages
  • Updated and reorganized our user manual

Default Changed

  • Default for --clump-kb changed to --clump-kb 1M from --clump-kb 250K
  • Default for --lower changed from --lower 0.0001 to --lower 5e-08

PRSet

  • Add documentation for --wind-3 and --wind-5, which pad each genomic region at the 3' or 5' end respectively (available since 2.1.9, but we forgot to document them)
  • Combine --snp-set and --snp-sets into --snp-set. PRSet will now automatically detect whether the input contains one column (in which case the whole file is one gene set) or more than one column (in which case each row is one gene set).
  • Add documentation for --background. Use --background to specify a background region for competitive p-value calculation
  • Add --full-back parameter, which tells PRSet to use the whole genome as the background

Note: if neither --full-back nor --background is provided, and --gtf and --set-perm are specified, we will use the GTF file to construct the background. If --gtf is missing, then we cannot perform competitive p-value calculation
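
The automatic format detection for --snp-set described in the list above can be sketched as follows. The function name, the use of whitespace as the delimiter, and the choice to treat the first field of a multi-column row as the set name are assumptions for illustration, not a statement of how PRSet implements it.

```python
def read_snp_sets(lines, default_name):
    """Return {set name: [SNP IDs]} from the lines of a SNP set file."""
    rows = [line.split() for line in lines if line.strip()]
    if all(len(row) == 1 for row in rows):
        # One column: the whole file is a single gene set.
        return {default_name: [row[0] for row in rows]}
    # More than one column: each row is one gene set;
    # the first field is assumed to be the set name.
    return {row[0]: row[1:] for row in rows}
```

For example, read_snp_sets(["rs1", "rs2", "rs3"], "my_set") gives one set with three SNPs, whereas read_snp_sets(["SetA rs1 rs2", "SetB rs3"], "my_set") gives two sets named SetA and SetB.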

BGEN

  • Changed the --hard-thres and --ld-hard-thres parameters. They are now used to specify the hardcall threshold, i.e. a hardcall is saved when the distance to the nearest hardcall is less than the hardcall threshold; otherwise a missing code is saved. See our documentation for more information
  • Add --dose-thres and --ld-dose-thres parameters. They are similar to our old --hard-thres: for any SNP, if the highest probability of any dosage is less than the value specified in --dose-thres, the genotype is set as missing.
  • We have performed manual checks. Scores generated from PRSice when --hard is used are now identical to those generated from PLINK. Scores generated using dosage are also highly correlated with those generated from --hard.
  • Support both SNP_ID and RS_ID for BGEN format. If the RS_ID is not found in the base file, we will try to match with SNP_ID
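
The two thresholds described above can be sketched for a single genotype as below. The function name, the argument order, the use of None as the missing code, and the way the expected dosage is computed from the three genotype probabilities are assumptions for this example; the threshold values in the usage note are illustrative, not PRSice defaults.

```python
def hard_call(probabilities, hard_thres, dose_thres):
    """Return 0, 1 or 2 for a confident call, or None for a missing genotype.

    probabilities: (P(0 copies), P(1 copy), P(2 copies)) of the counted allele.
    """
    # --dose-thres style filter: missing if no genotype is probable enough.
    if max(probabilities) < dose_thres:
        return None
    # Expected dosage of the counted allele.
    dosage = probabilities[1] + 2 * probabilities[2]
    nearest = round(dosage)
    # --hard-thres style filter: keep the call only if the dosage is
    # closer to the nearest hardcall than the threshold.
    if abs(dosage - nearest) < hard_thres:
        return nearest
    return None
```

With hard_thres=0.1 and dose_thres=0.0, the probabilities (0.95, 0.05, 0.0) give a dosage of 0.05 and are called as 0, while (0.6, 0.3, 0.1) give a dosage of 0.5 and are set to missing. With dose_thres=0.9, (0.5, 0.3, 0.2) is set to missing before the dosage is even considered.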