-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lr_get_spec with csv mixed with others #53
Comments
Thanks for the report! The use case you are describing is already possible in lightr: library(lightr)
spec <- lr_get_spec(
system.file("testdata", package = "lightr"),
ext = c(
"txt", "csv"
)
)
#> 6 files found; importing spectra:
#> Warning in value[[3L]](cond): Parsing failed.
#> Please a different value for "sep" argument
#> Warning: Could not import one or more files:
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/spec.csv
spec <- lr_get_spec(
system.file("testdata", package = "lightr"),
ext = c(
"txt", "csv"
),
sep = ","
)
#> 6 files found; importing spectra:
head(spec)
#> wl CRAIC_export OOusb4000 OceanView UK5 avasoft8 spec
#> 1 300 5.710900 8.148667 -1.759185 64.27088 252.16336 9.885286
#> 2 301 5.794053 8.090286 -2.337772 52.36948 0.00000 7.078429
#> 3 302 4.326612 8.085000 -2.361941 55.71795 74.36793 1.991190
#> 4 303 4.178850 8.042000 -2.142622 36.30163 34.58589 5.415524
#> 5 304 3.671765 7.959000 -2.577687 43.05407 101.20355 11.820286
#> 6 305 2.784688 8.022333 -2.478087 42.62310 0.00000 1.497571 Created on 2023-06-09 with reprex v2.0.2 In fact, it's even something we're doing in our test suite: lightr/tests/testthat/test_getspec.R Lines 4 to 16 in 82bebc3
However, as you rightfully guessed, this can cause issues for spectra exported on machines on a non-English locale which may use "," as a decimal separator (examples). This is the reason why we don't put "," in the defaults for The worst part about it is that library(lightr)
# Fails if we don't specify dec = ","
lr_get_spec(
system.file("testdata", "non_english", package = "lightr"),
ext = "txt"
)
#> 2 files found; importing spectra:
#> Warning in storage.mode(rawsplit) <- "numeric": NAs introduced by coercion
#> Warning in value[[3L]](cond): subscript out of bounds
#> Warning in storage.mode(rawsplit) <- "numeric": NAs introduced by coercion
#> Warning in value[[3L]](cond): subscript out of bounds
#> Warning: File import failed.
#> Check input files and function arguments.
#> NULL
# Works BUT RETURNS WRONG RESULT if we passed sep = ","
spec_wrong <- lr_get_spec(
system.file("testdata", "non_english", package = "lightr"),
ext = "txt",
sep = ","
)
#> 2 files found; importing spectra:
head(spec_wrong)
#> wl OO_comma OceanView_nonEN
#> 1 300 407 453
#> 2 301 426 434
#> 3 302 520 411
#> 4 303 634 412
#> 5 304 691 410
#> 6 305 726 422
# Works once we say decimal = ","
spec_correct <- lr_get_spec(
system.file("testdata", "non_english", package = "lightr"),
ext = "txt",
decimal = ","
)
#> 2 files found; importing spectra:
head(spec_correct)
#> wl OO_comma OceanView_nonEN
#> 1 300 5.500324 0.4324595
#> 2 301 5.398784 0.4268919
#> 3 302 5.428368 0.4334054
#> 4 303 5.589730 0.4080000
#> 5 304 5.593622 0.4225405
#> 6 305 5.678289 0.4128378 Created on 2023-06-09 with reprex v2.0.2 A marginally better & safer solution would be to set This will also come with a dedicated parser for csv files, which will lead to better performance, and if all goes well, the ability to also read csv files with After this change, you might still not be able to parse files with "." and "," decimal point in a single call but you should at least be able to parse csv files without running the risk of creating of wrong output for files with "," decimal point. |
Thanks Hugo,
I’ve seen this problem many times with csv files.
Pierre-Paul
From: Hugo Gruson ***@***.***>
Sent: Friday, June 9, 2023 1:00 PM
To: ropensci/lightr ***@***.***>
Cc: Bitton, Pierre-Paul ***@***.***>; Author ***@***.***>
Subject: Re: [ropensci/lightr] lr_get_spec with csv mixed with others (Issue #53)
Thanks for the report!
The use case you are describing is already possible in lightr:
library(lightr)
spec <- lr_get_spec(
system.file("testdata", package = "lightr"),
ext = c(
"txt", "csv"
)
)
#> 6 files found; importing spectra:
#> Warning in value[[3L]](cond): Parsing failed.
#> Please a different value for "sep" argument
#> Warning: Could not import one or more files:
#> /home/hugo/.local/share/R/x86_64-pc-linux-gnu-library/4.3/lightr/testdata/spec.csv
spec <- lr_get_spec(
system.file("testdata", package = "lightr"),
ext = c(
"txt", "csv"
),
sep = ","
)
#> 6 files found; importing spectra:
head(spec)
#> wl CRAIC_export OOusb4000 OceanView UK5 avasoft8 spec
#> 1 300 5.710900 8.148667 -1.759185 64.27088 252.16336 9.885286
#> 2 301 5.794053 8.090286 -2.337772 52.36948 0.00000 7.078429
#> 3 302 4.326612 8.085000 -2.361941 55.71795 74.36793 1.991190
#> 4 303 4.178850 8.042000 -2.142622 36.30163 34.58589 5.415524
#> 5 304 3.671765 7.959000 -2.577687 43.05407 101.20355 11.820286
#> 6 305 2.784688 8.022333 -2.478087 42.62310 0.00000 1.497571
Created on 2023-06-09 with reprex v2.0.2<https://reprex.tidyverse.org>
In fact, it's even something we're doing in our test suite:
https://github.com/ropensci/lightr/blob/82bebc3a72d9f0dc6215140a949476e0ef9176ed/tests/testthat/test_getspec.R#L4-L16
However, as you rightfully guessed, this can cause issues for spectra exported on machines on a non-English locale which may use "," as a decimal separator<https://en.wikipedia.org/wiki/Decimal_separator> (examples<https://github.com/ropensci/lightr/tree/82bebc3a72d9f0dc6215140a949476e0ef9176ed/inst/testdata/non_english>). This is the reason why we don't put "," in the defaults for sep alongside tabs, spaces and ",".
The worst part about it is that lr_get_spec() will successfully return but the output will be completely wrong:
library(lightr)
# Fails if we don't specify dec = ","
lr_get_spec(
system.file("testdata", "non_english", package = "lightr"),
ext = "txt"
)
#> 2 files found; importing spectra:
#> Warning in storage.mode(rawsplit) <- "numeric": NAs introduced by coercion
#> Warning in value[[3L]](cond): subscript out of bounds
#> Warning in storage.mode(rawsplit) <- "numeric": NAs introduced by coercion
#> Warning in value[[3L]](cond): subscript out of bounds
#> Warning: File import failed.
#> Check input files and function arguments.
#> NULL
# Works BUT RETURNS WRONG RESULT if we passed sep = ","
spec_wrong <- lr_get_spec(
system.file("testdata", "non_english", package = "lightr"),
ext = "txt",
sep = ","
)
#> 2 files found; importing spectra:
head(spec_wrong)
#> wl OO_comma OceanView_nonEN
#> 1 300 407 453
#> 2 301 426 434
#> 3 302 520 411
#> 4 303 634 412
#> 5 304 691 410
#> 6 305 726 422
# Works once we say decimal = ","
spec_correct <- lr_get_spec(
system.file("testdata", "non_english", package = "lightr"),
ext = "txt",
decimal = ","
)
#> 2 files found; importing spectra:
head(spec_correct)
#> wl OO_comma OceanView_nonEN
#> 1 300 5.500324 0.4324595
#> 2 301 5.398784 0.4268919
#> 3 302 5.428368 0.4334054
#> 4 303 5.589730 0.4080000
#> 5 304 5.593622 0.4225405
#> 6 305 5.678289 0.4128378
Created on 2023-06-09 with reprex v2.0.2<https://reprex.tidyverse.org>
A marginally better & safer solution would be to set sep = "," only for csv files. We would then limit the number of files that could have a wrong result. It is not possible with the current codebase but I am working on a version where default options for parsers (dec, sep and specnum) are not set globally by lr_get_spec() but can be set individually by each parser.
This will also come with a dedicated parser for csv files, which will lead to better performance, and if all goes well, the ability to also read csv files with decimal = "," (to be confirmed).
After this change, you might still not be able to parse files with "." and "," decimal point in a single call but you should at least be able to parse csv files without running the risk of creating of wrong output for files with "," decimal point.
—
Reply to this email directly, view it on GitHub<#53 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AATBJ5VPV2ZUZDMWWEFJ7Y3XKM6PLANCNFSM6AAAAAAZARUVS4>.
You are receiving this because you authored the thread.Message ID: ***@***.******@***.***>>
|
lr_get_spec()
has challenges when using multiple extensions.When ‘csv’ files are included with others e.g. ‘Master.Transmission’ we have a problem because the ‘csv’ requires the
sep = ‘,’
argument.For:
you would need:
But this may affect how other file types are read
The text was updated successfully, but these errors were encountered: