Issue with numbers #11
Comments
Hi, @maweed. Thank you for posting this issue! :-) I'm definitely open to your thoughts if you have any ideas! There are a few things you could do to increase the confidence of your results, but unfortunately they are a bit hackish.

Check if Full Year Found in Text

from date_extractor import extract_dates
text = "The meeting will be held at paris Allé 6, 0208 paris. Election 30 of a chairperson in france."
dates = extract_dates(text)
# filter out dates whose full 4-digit year does not appear in the text
dates = [date for date in dates if str(date.year) in text]

Check Precision

from date_extractor import extract_dates
text = "The meeting will be held at paris Allé 6, 0208 paris. Election 30 of a chairperson in france."
dates = extract_dates(text, return_precision=True)
# filter out dates that matched only a year, not a month and day
dates = [date for date, precision in dates if precision != 'year']

Check If White Space Between Year, Month and Day

Have a different set of rules for text versus filenames

Thoughts? What would work for you? Also, open to pull requests if you want to make a contribution! :-)
Thanks for your efforts to solve my issue, but unfortunately it is still not working for me. The first suggestion doesn't work because the text may include other dates. So I wonder if there is some combination that can detect whether the precision is year and whether the year is written with 4 digits or not?
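One possible combination of the two checks, as a minimal sketch rather than anything the library provides: treat a year-only match as trustworthy only when its full 4-digit year appears in the text as a standalone number. The helper name `filter_year_only` is hypothetical, and the sketch assumes the input is a list of `(datetime, precision)` pairs like the one `extract_dates(text, return_precision=True)` is shown returning in this thread, with `'year'` marking year-only matches.

```python
import re
from datetime import datetime


def filter_year_only(dates_with_precision, text):
    """Keep a year-only match only if its 4-digit year is written out in text.

    Hypothetical helper: dates_with_precision is assumed to be a list of
    (datetime, precision) pairs where precision == 'year' means that only
    a year was matched.
    """
    kept = []
    for date, precision in dates_with_precision:
        if precision == "year":
            # Require the full year as a standalone number, so "30" or
            # "0208" cannot stand in for 1930 or 2008.
            pattern = r"(?<!\d)" + re.escape(f"{date.year:04d}") + r"(?!\d)"
            if not re.search(pattern, text):
                continue
        # Matches with any other precision are passed through unchanged.
        kept.append(date)
    return kept
```

With the library installed, the input would come from `extract_dates(text, return_precision=True)`. This only removes year-only false positives such as "30", "18" and "20"; a match with month and day, like the one produced from "6, 0208", is passed through and would need a separate rule.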
@maweed, those are great suggestions. I'm definitely open to any improvements that can be made and pull requests! Part of the history is that back when I initially created this library a few years ago, regex-based parsing was substantially faster than the alternatives. There also wasn't a lot of training data in some of the languages this library supports. That said, times change and maybe it's time for an upgrade :-)
Hi
Thanks a lot for your great job.
I have an issue with numbers: most of the numbers in the text are converted to dates!
For example:
text2="The meeting will be held at paris Allé 6, 0208 paris. Election 30 of a chairperson in france. page 18 of 20"
Then we get:
[datetime.datetime(2008, 2, 6, 0, 0, tzinfo=), datetime.datetime(1930, 1, 1, 0, 0, tzinfo=), datetime.datetime(2018, 1, 1, 0, 0, tzinfo=), datetime.datetime(1920, 1, 1, 0, 0, tzinfo=)]
As you can see, none of the numbers here should be extracted as dates.
Is there any solution?
Thanks