More metrics #54

Merged
merged 20 commits into from
Apr 3, 2024

Conversation

ocefpaf
Member

@ocefpaf ocefpaf commented Feb 13, 2024

  • ATN Deployments
  • COMT Projects
  • Federal Partners
  • HAB Pilot Projects
  • IOOS
  • IOOS Core Variables
  • MBON Projects
  • Metadata Records
  • OTT Projects
  • QARTOD Manuals
  • Regional Associations
  • Regional Platforms
  • log pass, fail, changed
  • log counter names when they exist
  • use previous metrics as a constant to check for changes
  • use fake random user-agent
  • refactor all tests that only check whether the answer is a natural number
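
As a rough illustration of the last item, the tests that only verify that a metric is a natural number could share one helper. This is a minimal sketch and not the code in this PR: `is_natural_number` is a hypothetical name, the sample values stand in for real metric calls, and counting 0 as valid is an assumption.

```python
import numbers


def is_natural_number(value):
    """Return True for non-negative integers (hypothetical helper).

    bool is rejected explicitly because it is a subclass of int.
    Counting 0 as valid is an assumption of this sketch.
    """
    if isinstance(value, bool):
        return False
    return isinstance(value, numbers.Integral) and value >= 0


# Stand-in values; real tests would call the metric functions instead.
samples = {"Federal Partners": 17, "COMT Projects": 5, "Broken metric": None}
results = {name: is_natural_number(value) for name, value in samples.items()}
print(results)
```

A single helper like this keeps each per-metric test down to one assertion.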

TODO (moved to #56):

  • parallelize update_metrics
  • HF Radar Stations # This is a hardcoded number in the notebook (165); is that still valid, @MathewBiddle?
  • National Platforms # Lots of services; should go into a separate module
  • NGDAC Glider Days # Implemented but must change to glider-data-days

@ocefpaf ocefpaf marked this pull request as ready for review February 13, 2024 19:14
@ocefpaf
Member Author

ocefpaf commented Feb 13, 2024

@MathewBiddle this one is getting too big to review. I'll address the remaining points in another PR.

@MathewBiddle
Contributor

HF-Radar is hardcoded 😢 . At one point you could parse the information from http://hfrnet.ucsd.edu/sitediag/stationList.php, but that doesn't seem to be the case anymore. Let's leave it hardcoded for now and update it once we have a source.

@MathewBiddle
Contributor

Is this ready for review?

@ocefpaf
Member Author

ocefpaf commented Feb 14, 2024

> HF-Radar is hardcoded 😢 . At one point you could parse the information from http://hfrnet.ucsd.edu/sitediag/stationList.php, but that doesn't seem to be the case anymore. Let's leave it hardcoded for now and update it once we have a source.

OK. I'll add a note to check hfrnet again in the future.

> Is this ready for review?

Yep. I have some extra changes that would be nice in a fresh PR to avoid clashing with the ones here.

@ocefpaf
Member Author

ocefpaf commented Feb 14, 2024

PS: The next changes parallelize things. It takes ~7 s versus 20+ s before. The more metrics we add, the more important the speedup becomes (we are still missing the national platforms, which hit different data sources).

In [2]: %time update_metrics()
CPU times: user 88.1 ms, sys: 86.3 ms, total: 174 ms
Wall time: 6.59 s
Out[2]: 
     date_UTC Federal Partners Regional Associations  HF Radar Stations NGDAC Glider Days  ...  QARTOD Manuals IOOS Core Variables Metadata Records IOOS COMT Projects
0  2018-02-01               17                    11                150             52027  ...              13                  34             8600    1          <NA>
1  2022-04-22               17                    11                165             53672  ...              13                  34             7213    1             5
2  2022-07-08               17                    11                165             55448  ...              13                  34             6217    1             5
3  2022-10-05               17                    11                165             59088  ...              13                  34            24499    1             5
4  2023-01-05               17                    11                165             62042  ...              13                  34            11840    1             5
5  2024-02-14               17                    11               <NA>             76075  ...              13                  34            35249    1             5

[6 rows x 16 columns]
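
For readers who want a feel for the idea, here is a minimal sketch of running independent metric functions concurrently. It is an assumption-laden illustration, not the follow-up PR's implementation: the use of `ThreadPoolExecutor`, the names `run_in_parallel` and `fake_metric`, and the sleep-based stand-ins are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(functions, max_workers=8):
    """Run zero-argument callables on a thread pool.

    `functions` maps a column name to a callable; the result maps the same
    names to the returned values. Threads suit this case because the work
    is mostly waiting on network responses.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {name: executor.submit(func) for name, func in functions.items()}
        # .result() blocks until that call finishes and re-raises its exception.
        return {name: future.result() for name, future in futures.items()}


def fake_metric(value, delay=0.1):
    """Build a stand-in for a slow, network-bound metric function."""
    def _metric():
        time.sleep(delay)
        return value
    return _metric


functions = {
    "Federal Partners": fake_metric(17),
    "Regional Associations": fake_metric(11),
    "COMT Projects": fake_metric(5),
}
metrics = run_in_parallel(functions)
print(metrics)
```

With a pool, the wall time tends toward that of the slowest call rather than the sum of all calls, which is why the gain grows as more metrics are added.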

@ocefpaf ocefpaf mentioned this pull request Feb 14, 2024
4 tasks
try:
    num = function()
except Exception as err:
    log.error(f"{err}")
Contributor

ioos_metrics.ioos_metrics.update_metrics()
Traceback (most recent call last):
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 424, in update_metrics
    num = function()
          ^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 376, in hab_pilot_projects
    from pdfminer.high_level import extract_text
  File "C:\Users\Mathew.Biddle\programs\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'pdfminer.high_level'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-da9d358039b7>", line 1, in <module>
    df2 = ioos_metrics.ioos_metrics.update_metrics()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 426, in update_metrics
    log.error(f"{err}")
    ^^^
NameError: name 'log' is not defined

Member Author

Should be fixed in the last commit.

Contributor

@MathewBiddle MathewBiddle left a comment

Looks good. Just one fix of a duplicated line.

"Federal Partners": federal_partners,
"HAB Pilot Projects": hab_pilot_projects,
"IOOS Core Variables": ioos_core_variables,
"IOOS Core Variables": ioos_core_variables,
Contributor

this is duplicated

Member Author

Good catch! Fixed it.

Contributor

I love ❤️ this!!

@ocefpaf ocefpaf requested a review from MathewBiddle March 15, 2024 23:12
@MathewBiddle
Contributor

MathewBiddle commented Apr 3, 2024

Just ran this and received this error:

import ioos_metrics.ioos_metrics
df2 = ioos_metrics.ioos_metrics.update_metrics()
Traceback (most recent call last):
  File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-da9d358039b7>", line 1, in <module>
    df2 = ioos_metrics.ioos_metrics.update_metrics()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 429, in update_metrics
    message = _compare_metrics(column=column, num=num)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 65, in _compare_metrics
    elif num < old:
         ^^^^^^^^^
TypeError: '>' not supported between instances of 'int' and 'NoneType'

@MathewBiddle
Contributor

It looks like, since HAB Pilot Projects didn't exist previously, this gets caught in the if statement.

I added some print statements to help debug:

df2 = ioos_metrics.ioos_metrics.update_metrics()
column: ATN Deployments
old: 4444
num: 5298
column: COMT Projects
old: 5
num: 5
column: Federal Partners
old: 17
num: 17
column: HAB Pilot Projects
old: 9
num: None
Traceback (most recent call last):
  File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-da9d358039b7>", line 1, in <module>
    df2 = ioos_metrics.ioos_metrics.update_metrics()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 432, in update_metrics
    message = _compare_metrics(column=column, num=num)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 68, in _compare_metrics
    elif num < old:
         ^^^^^^^^^
TypeError: '>' not supported between instances of 'int' and 'NoneType'
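
The failure above comes from an ordering comparison between an int and None. A minimal sketch of a comparison that checks for missing values first is shown below. It is hypothetical: `compare_metric` and its explicit `old` argument are illustrative names (the real `_compare_metrics` takes only `column` and `num`), and the message wording is made up.

```python
def compare_metric(column, num, old):
    """Describe how the new value `num` relates to the previous value `old`.

    Either value may be None (the metric failed or the column is new), so
    both are checked before any ordering comparison is attempted.
    Note: pandas.NA is not handled here; that would need pandas.isna().
    """
    if num is None:
        return f"{column} could not be computed; previous value was {old}."
    if old is None:
        return f"{column} has no previous value; new value is {num}."
    if num == old:
        return f"{column} equal {num} = {old}."
    if num < old:
        return f"{column} decreased {num} < {old}."
    return f"{column} increased {num} > {old}."


# Values taken from the debug output above.
print(compare_metric("ATN Deployments", num=5298, old=4444))
print(compare_metric("COMT Projects", num=5, old=5))
print(compare_metric("HAB Pilot Projects", num=None, old=9))
```

Checking for None up front turns the crash into a readable message.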

@MathewBiddle
Contributor

Ahh, it looks like it's a problem with hab_pilot_projects()

ioos_metrics.ioos_metrics.hab_pilot_projects()
Traceback (most recent call last):
  File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-a4a8446718c8>", line 1, in <module>
    ioos_metrics.ioos_metrics.hab_pilot_projects()
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 379, in hab_pilot_projects
    from pdfminer.high_level import extract_text
  File "C:\Users\Mathew.Biddle\programs\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'pdfminer.high_level'

@MathewBiddle
Contributor

Okay, I updated pdfminer and now the HABs function works.

@MathewBiddle
Contributor

ioos_metrics.ioos_metrics.update_metrics() was broken too, as I needed to install ckanapi.

@MathewBiddle
Contributor

MathewBiddle commented Apr 3, 2024

Looks like my env was all out of date. I'll update my env and then try again.

conda env update --file environment.yml --prune
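
A quick way to spot a stale environment before running anything is to check whether the needed modules can be found. This is only a sketch: `missing_modules` is a hypothetical helper, and the list of names reflects just the two modules mentioned in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# These are import names, not package names: pdfminer.six is imported as `pdfminer`.
required = ["pdfminer", "ckanapi"]
missing = missing_modules(required)
if missing:
    print(f"Missing modules: {missing}; consider updating the environment.")
else:
    print("All required modules were found.")
```

Only top-level names are checked here because `find_spec` on a dotted name imports the parent package first.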

@ocefpaf
Member Author

ocefpaf commented Apr 3, 2024

I guess I could fail more gracefully when a dependency is missing. Let me see if I can fix those.

@ocefpaf
Member Author

ocefpaf commented Apr 3, 2024

@MathewBiddle the latest commit should make update_metrics run even when a dependency is missing. Note that, because we want it to run all the way to the end, the metric will be None, but the error will be in the logs, like:

INFO:root:[2023-01-05] : COMT Projects equal 5 = 5.
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): ioos.noaa.gov:443
DEBUG:urllib3.connectionpool:https://ioos.noaa.gov:443 "GET /community/national HTTP/1.1" 301 0
DEBUG:urllib3.connectionpool:https://ioos.noaa.gov:443 "GET /community/national/ HTTP/1.1" 200 None
INFO:root:df_fed_partners[0].to_string()='0     National Oceanic and Atmospheric Administratio...\n1     National Aeronautics and Space Administration ...\n2     Bureau of Ocean Energy Management, Regulation ...\n3                        Office of Naval Research (ONR)\n4                  U.S. Army Corps of Engineers (USACE)\n5                         U.S. Geological Survey (USGS)\n6                            Department of Energy (DOE)\n7                    Department of Transportation (DOT)\n8               U.S. Arctic Research Commission (USARC)\n9                     National Science Foundation (NSF)\n10                Environmental Protection Agency (EPA)\n11                       Marine Mammal Commission (MMC)\n12    Oceanographer of the Navy, representing the Jo...\n13                              U.S. Coast Guard (USCG)\n14    Department of Agriculture, Cooperative State R...\n15                            Department of State (DOS)\n16                   Food and Drug Administration (FDA)'
INFO:root:[2023-01-05] : Federal Partners equal 17 = 17.
ERROR:root:No module named 'pdfminer'
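
The behaviour described here (log the error, record None, keep going) can be sketched roughly as follows. This is not the code from the commit: `collect_metrics` and the stand-in metric functions are hypothetical, and the deliberately missing module name is invented for the demonstration.

```python
import logging

logging.basicConfig(level=logging.INFO)
# Defined at module level so every function in the module can use it.
log = logging.getLogger(__name__)


def collect_metrics(functions):
    """Run each metric function; on failure, log the error and store None."""
    results = {}
    for column, function in functions.items():
        try:
            num = function()
        except Exception as err:  # also covers ModuleNotFoundError from lazy imports
            log.error(f"{err}")
            num = None
        results[column] = num
    return results


def needs_missing_dependency():
    """Stand-in for a metric that lazily imports an optional package."""
    import a_module_that_is_not_installed  # noqa: F401  (hypothetical name)
    return 9


results = collect_metrics(
    {"COMT Projects": lambda: 5, "HAB Pilot Projects": needs_missing_dependency},
)
print(results)
```

The trade-off is that a None in the output has to be traced back through the log rather than a traceback.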

@ocefpaf
Member Author

ocefpaf commented Apr 3, 2024

> I added some print statements to help debug

Matt, I should mention that update_metrics never fails; it keeps going and logs everything in the metric.log file. You can either inspect the logs to figure out why some metric is None, or run the specific function by itself. Here is what happens if I run hab_pilot_projects outside of update_metrics without pdfminer.six:

from ioos_metrics.ioos_metrics import hab_pilot_projects

hab_pilot_projects()
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 1
----> 1 hab_pilot_projects()

File ~/Dropbox/pymodules/01-forks/IOOS/ioos_metrics/ioos_metrics/ioos_metrics.py:378, in hab_pilot_projects()
    368 def hab_pilot_projects():
    369     """
    370     These are the National Harmful Algal Bloom Observing Network Pilot Project awards.
    371     Currently these were calculated from the
   (...)
    376 
    377     """
--> 378     from pdfminer.high_level import extract_text
    380     url = "https://cdn.ioos.noaa.gov/media/2022/10/NHABON-Funding-Awards-FY22.pdf"
    382     data = requests.get(url)

ModuleNotFoundError: No module named 'pdfminer'

@MathewBiddle
Contributor

After updating the env things are looking good.

It looks like pdfminer writes a lot of stuff to the log file (166082 lines' worth). We can clean that up in a future PR.
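
One possible way to quiet that output is to raise the level of the library's loggers. The sketch below uses only the standard `logging` module. It assumes pdfminer.six names its loggers after its modules (for example `pdfminer.psparser`), which should be verified against the installed version.

```python
import logging

logging.basicConfig(level=logging.DEBUG)

# Raise the threshold on the parent "pdfminer" logger; child loggers such as
# "pdfminer.psparser" inherit it unless they set a level of their own.
logging.getLogger("pdfminer").setLevel(logging.WARNING)

# Illustrative child logger name (an assumption about pdfminer's layout).
child = logging.getLogger("pdfminer.psparser")
print(logging.getLevelName(child.getEffectiveLevel()))

# DEBUG and INFO records from the child are now dropped; warnings still pass.
child.debug("this record is filtered out")
child.warning("this record is still emitted")
```

Setting the level on the parent logger avoids listing every submodule.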

@MathewBiddle
Contributor

MathewBiddle commented Apr 3, 2024

Still missing:

  • HF Radar: hardcoded for now (165)
  • National Platforms: build into a separate module?

@MathewBiddle MathewBiddle merged commit 5381080 into ioos:main Apr 3, 2024
1 check passed
@ocefpaf ocefpaf deleted the more_metrics branch April 3, 2024 15:14
@ocefpaf ocefpaf mentioned this pull request Apr 3, 2024
4 tasks