Releases: choishingwan/PRSice
MAF calculation
Update Log
- Fix Segmentation fault related to MAF calculation
PRSet Clumping Fix
- Fixed bug in PRSet clumping which caused different results to be generated when different sets were used
- Fixed multi-threading memory detection problem. Multi-threading for set-based permutation should now work as expected
Set based permutation speed up, memory mapping and more
Update Log
- PRSice can now recognize gz files without the gz suffix
- Better checks are now in place for parameters related to distance
- Cleaned up some code to make the code base more readable
- Fix some memory leak problems related to #128, #131, #137
- Completely remove Pearson Correlation clumping. We don't have the manpower to maintain the code base, and Pearson correlation clumping does not provide enough benefit for us to consider supporting it.
- Add memory map feature (--enable-mmap). When a large amount of memory is available and all genotypes are stored in the same file, --enable-mmap might help to speed things up a bit
- Updated code for linear regression. Now adopted code from RcppEigen
- Update internal variable types for all score printing. We can now generate an all score file for more samples and thresholds: up to 5.270498e+17 samples if there's one threshold, or 1.844674e+18 thresholds if there are 500k samples.
- Update internal variable types for p-value thresholding. Previously, if the user required an ultra small step size (e.g. < 1e-20), PRSice would generate abnormal thresholds (see here). To accommodate this use case, PRSice will now detect whether the number of thresholds required exceeds what we can store and, if so, use a slower alternative to generate the thresholds.
- Set based permutation was too slow to be practical. We now use some algebra tricks to speed the process up. For more detail, refer to our full manual.
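The suffix-free gz detection mentioned above can be illustrated by checking a file's leading bytes rather than its name. This is a minimal Python sketch, not PRSice's actual C++ implementation: it relies only on the fact that every gzip stream begins with the two bytes 0x1f 0x8b, and the helper names `is_gzip` and `open_text` are hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True when the file begins with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_text(path):
    """Open a possibly compressed file as text, ignoring its suffix."""
    if is_gzip(path):
        return gzip.open(path, "rt")
    return open(path, "r")
```

With a check like this, a compressed summary statistic file named `base_file` (no suffix) and a plain-text `base_file.txt` can both be read through the same call.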
Bug Fix
Update Log
- Now output number of SNPs in each threshold for --no-regress analyses
- Fix problem related to bed file. Can now read bed file with chromosome format of chrX
- Fix problem related to --maf when --ld is used. Previously, when --ld was used, the calculation of --maf was done on the target file but using the SNP location within the reference, which caused problems.
GLM improvement
Update Log
- Fix glibc error in 2.2.4. This was due to an array out-of-bounds error, which is now fixed
- We have now adapted the code from the fastglm and speedglm packages. For 100k samples and no covariates, we expect around a 50% increase in speed. This new implementation also allows us to incorporate different link functions and distribution "families" in the future if those functions are required.
- --use-ref-maf function is also available now. --use-ref-maf will cause the missingness imputation to be performed using the MAF of the reference sample instead of the target.
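To make the effect of --use-ref-maf concrete, here is a minimal Python sketch of mean imputation in which a missing genotype is replaced by its expected allele count, 2 × frequency. It assumes genotypes are coded as 0/1/2 counts of one allele with `None` for missing, and that imputation is mean-based; the function names are hypothetical and this is not PRSice's actual code. The only point illustrated is where the frequency comes from.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele among non-missing 0/1/2 genotypes."""
    observed = [g for g in genotypes if g is not None]
    if not observed:
        raise ValueError("no observed genotypes")
    return sum(observed) / (2 * len(observed))


def mean_impute(genotypes, frequency):
    """Replace each missing genotype with the expected count 2 * frequency."""
    return [2 * frequency if g is None else g for g in genotypes]


target = [0, None, 2, 1]   # one SNP across four target samples
reference = [0, 0, 1, 0]   # the same SNP in the reference sample

# Imputation from the target's own frequency.
from_target = mean_impute(target, allele_frequency(target))
# Assumed --use-ref-maf behaviour: take the frequency from the reference.
from_reference = mean_impute(target, allele_frequency(reference))
```

In this toy example the missing genotype becomes 1.0 when the target frequency (0.5) is used and 0.25 when the reference frequency (0.125) is used.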
2019-08-05
- Temporarily uploaded the PRSet no memory check version for Linux
BGEN bug fix
Update Log
- Fix #125, BGEN encoding was wrong for SNPs that are 100% confident to be the non-effective allele.
- Fix problem of --remove and --keep with BGEN file when the sample ID is embedded within the BGEN file
- Fix problem related to --keep-ambig where base file filtering was unexpectedly affected by the --keep-ambig flag
- Now re-implemented the check for 100% genotype missingness.
- PRSice should now correctly terminate when none of the phenotypes are identified in the phenotype file
- SNP extraction / exclusion is now performed when reading the base file. This should provide a small performance boost
2019-07-29
- Trying to fix problem of glibc error w.r.t the linux binary
Bug Fix
Update Log
- Fix problem with --base-maf and --base-info flag with the Rscript
- PRSice will now only issue a warning instead of erroring out when the info flag or the MAF flag isn't found in the base
- Fix the log message output
- Minor improvement to the file read during MAF calculation. Should in theory speed up the MAF calculation and avoid bed file read error
Bug Fix
Update Log
- Disabled --pearson as it is currently bugged and I don't have time to fix it
- Fix calculation of MAF when --maf is used (for binary plink format)
- Fix LD calculation for BGEN files
- BGEN --hard should now produce results identical to those generated from the plink2 -> bed -> PRSice pipeline
- Disabled --msse4 optimization for the Windows binary as some Windows machines cannot run such a binary
- For the OS X binary, note that machines more than 4 years old will not be able to run the shipped binary. You will need to compile the software yourself.
Minor Bug fix
Update Log
- Fix problem with the Rscript
- Fix problem with --x-range when no location information was provided in the summary statistic file
- Update default of clumping. --clump-kb now reset to 250kb for PRSice and is 1mb for PRSet
- Clearer information regarding the unit for distance parameters (--clump-kb default is kb, --wind-5 and --wind-3 have a default of bp). In most cases, if you include the unit, PRSice should always be able to recognize it (e.g. B,M,G,BP,MB,GB etc)
- Some minor updates to the multi-threading w.r.t permutation analysis
- When SNP sets are provided, PRSice should now correctly treat PRSet as activated
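The distance-unit handling described above can be sketched as a small parser that applies a per-parameter default unit when none is given. This Python sketch is illustrative only: the function name `parse_distance`, the exact set of accepted suffixes (the K/KB entries are assumed from the "etc" in the list above) and the case-insensitive matching are assumptions, not PRSice's actual rules.

```python
import re

# Multipliers in base pairs. The K/KB entries are an assumption.
UNIT_TO_BP = {
    "b": 1, "bp": 1,
    "k": 1_000, "kb": 1_000,
    "m": 1_000_000, "mb": 1_000_000,
    "g": 1_000_000_000, "gb": 1_000_000_000,
}


def parse_distance(text, default_unit):
    """Convert strings such as '250', '1M' or '35kb' to base pairs.

    default_unit is applied when the string carries no unit, e.g. 'kb'
    for --clump-kb and 'bp' for --wind-5 / --wind-3.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse distance: {text!r}")
    value, unit = match.groups()
    unit = unit.lower() or default_unit
    if unit not in UNIT_TO_BP:
        raise ValueError(f"unknown distance unit: {unit!r}")
    return int(round(float(value) * UNIT_TO_BP[unit]))
```

Under these assumptions, `parse_distance("250", "kb")` and `parse_distance("250kb", "bp")` both give 250000 base pairs, which is why spelling out the unit removes any ambiguity about the default.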
NOTE
- We have found that --pearson currently generates incorrect results. Please refrain from using it at the moment.
- We have now uploaded the correct version for linux
Warning
- We found that --pearson is currently wrong
- We also found that bgen doesn't produce the expected results
Major Release: Better Integration of PRSet
Update Log
General
- Standardize command line parameters. Any parameter that acts on a file other than the target now carries the file name as a prefix. For example, --base-info will perform INFO score filtering on the base file, --ld-info will perform INFO score filtering on the LD reference file and --info will perform INFO score filtering on the target file.
- Changed --cov-file and --pheno-file to --cov and --pheno because I am lazy
- Removed --se and --prslice because we don't use those options. Might add them back when we introduce new functions
- Add --id-delim to allow more flexible control of sample ID concatenation
- --maf and --ld-maf calculation is now restricted to founders, similar to PLINK.
- Restructured the code to allow easier diagnosis
- Add full unit testing for some of the classes, such as Region and SNP. Don't have time for all other classes.
- Slight optimization of the GLM algorithm.
- Executables for OS X and Linux are now compiled with the Intel MKL library, which should provide some speed boost
- Fix some of the usage and log messages
- Updated and reorganized our user manual
Default Changed
- Default for --clump-kb changed to --clump-kb 1M from --clump-kb 250K
- Default for --lower changed from --lower 0.0001 to --lower 5e-08
PRSet
- Add documentation for --wind-3 and --wind-5, which pad each genomic region at the 3' or 5' end respectively. (Available since 2.1.9, but we forgot to document them)
- Combine --snp-set and --snp-sets into --snp-set. PRSet will now automatically detect if the input contains one column (therefore the whole file is one gene set), or more than one column (therefore each row is one gene set).
- Add documentation for --background. Use --background to specify a background region for competitive p-value calculation
- Add parameter --full-back, which informs PRSet to use the whole genome as the background
Note: if --full-back and --background aren't provided, and --gtf and --set-perm are specified, we will use the GTF file to construct the background. If --gtf is missing, then we cannot perform competitive p-value calculation
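The one-column versus multi-column detection described for --snp-set above can be sketched as follows. This is a hedged Python illustration, not PRSet's actual parser: it assumes whitespace-delimited input, that in the multi-column form the first field of each row is the set name, and that a single-column file takes its set name from elsewhere (here the hypothetical `default_name` argument).

```python
def read_snp_sets(lines, default_name):
    """Return {set_name: [snp_id, ...]} from the lines of a SNP set file.

    One column in every row: the whole file is a single set named
    default_name. More than one column: each row is its own set, with
    the first field assumed to be the set name.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return {}
    if all(len(row) == 1 for row in rows):
        return {default_name: [row[0] for row in rows]}
    return {row[0]: row[1:] for row in rows}


single = read_snp_sets(["rs1\n", "rs2\n", "rs3\n"], "my_list")
multi = read_snp_sets(["SetA rs1 rs2\n", "SetB rs3\n"], "unused")
```

Here `single` holds one set with three SNPs, while `multi` holds two sets keyed by their row names.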
BGEN
- Change --hard-thres and --ld-hard-thres parameters. They are now used to specify the hardcall threshold, i.e. a hardcall is saved when the distance to the nearest hardcall is less than the hardcall threshold. Otherwise a missing code is saved. See our documentation for more information
- Add --dose-thres and --ld-dose-thres parameters. They are similar to our old --hard-thres: for any SNP, if the highest probability of any dosage is less than what's specified in --dose-thres, it will be set as missing.
- We have performed manual checks. Scores generated from PRSice when --hard is used are now identical to those generated from PLINK. Scores generated using dosage also have high correlation with those generated from --hard.
- Support both SNP_ID and RS_ID for BGEN format. If RS_ID is not found in base, we will try to match with SNP_ID
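The difference between the two thresholds above can be sketched in a few lines of Python. The sketch assumes a biallelic SNP with genotype probabilities ordered as (P(0 copies), P(1 copy), P(2 copies)) of the counted allele, and measures "distance to the nearest hardcall" as the absolute difference between the expected dosage and the nearest integer. The function names are hypothetical and this is an illustration of the two rules as worded above, not PRSice's BGEN code.

```python
def hard_call(probabilities, hard_thres):
    """Return 0, 1 or 2, or None when the call should be set to missing.

    A hardcall is kept only when the expected dosage lies within
    hard_thres of the nearest integer genotype.
    """
    _, p_one, p_two = probabilities
    dosage = p_one + 2 * p_two
    nearest = round(dosage)
    if abs(dosage - nearest) < hard_thres:
        return nearest
    return None


def passes_dose_thres(probabilities, dose_thres):
    """False when the most likely genotype is less probable than dose_thres."""
    return max(probabilities) >= dose_thres


confident = (0.02, 0.96, 0.02)   # clearly heterozygous
uncertain = (0.50, 0.50, 0.00)   # dosage 0.5, far from any hardcall
```

With a threshold of 0.1, `hard_call(confident, 0.1)` keeps the heterozygous call while `hard_call(uncertain, 0.1)` returns a missing value; with a dosage threshold of 0.9, only `confident` passes `passes_dose_thres`.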