Adds real metrics for diseases to the log #115

Open
wants to merge 53 commits into
base: master
from

Conversation

amuller26
Contributor

@amuller26 amuller26 commented Jul 29, 2024

Adds metrics for all diseases logged individually. Disease stats are currently stored in a separate log file, which can be either a CSV file or JSON file.

@amuller26
Contributor Author

I ran into an issue where if you change the number of starting diseases per agent, in sugarscape.py, each disease is added multiple times. The number of times it's added is not correlated with the number of starting diseases per agent either.

@colinhanrahan
Contributor

colinhanrahan commented Jul 30, 2024

The intended behavior is that startingDiseases is the master pool of diseases and startingDiseasesPerAgent is how many diseases each agent will be endowed with at the beginning of the simulation. So if there are 25 diseases and 10 diseases per agent, each agent should catch a random 10 of the 25 diseases (they will not catch diseases they are immune to). Can you send me a config where you're seeing incorrect behavior?
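The endowment rule described above can be sketched as follows. This is a minimal illustration, not the code in sugarscape.py: the function name `endow_agent` is hypothetical, and immunity is modeled as a plain set of disease IDs (an assumption) instead of the Hamming distance check the simulation uses.

```python
import random

def endow_agent(pool, immune_ids, per_agent_range, rng=random):
    """Pick a random subset of the master pool for one agent.

    pool            -- master list of disease IDs (startingDiseases)
    immune_ids      -- IDs this agent is immune to (assumed representation)
    per_agent_range -- [min, max] as in startingDiseasesPerAgent
    """
    low, high = per_agent_range
    target = rng.randint(low, high)
    # Agents never catch diseases they are immune to.
    candidates = [d for d in pool if d not in immune_ids]
    # An agent may end up with fewer than `target` if too few candidates remain.
    return rng.sample(candidates, min(target, len(candidates)))

pool = list(range(25))
rng = random.Random(0)
caught = endow_agent(pool, immune_ids={3, 7}, per_agent_range=(10, 10), rng=rng)
```

With 25 diseases and 10 per agent, `caught` holds 10 distinct IDs, none of which the agent is immune to.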

@amuller26
Contributor Author

@colinhanrahan Everything in config.json is the same except startingDiseasesPerAgent is anything but [0, 0].

@colinhanrahan
Contributor

Everything I'm seeing with "startingDiseasesPerAgent" != [0, 0] matches up with pg. 147 in the book. Can you elaborate a little more on what's happening?

@amuller26
Contributor Author

@colinhanrahan sugarscape.py, lines 178-201

diseases = []
for i in range(numDiseases):
    diseaseID = self.generateDiseaseID()
    diseaseConfiguration = diseaseEndowments[i]
    newDisease = disease.Disease(diseaseID, diseaseConfiguration)
    diseases.append(newDisease)
startingDiseases = self.configuration["startingDiseasesPerAgent"]
minStartingDiseases = startingDiseases[0]
maxStartingDiseases = startingDiseases[1]
currStartingDiseases = minStartingDiseases
for agent in self.agents:
    random.shuffle(diseases)
    for newDisease in diseases:
        if len(agent.diseases) >= currStartingDiseases and startingDiseases != [0, 0]:
            currStartingDiseases += 1
            break
        hammingDistance = agent.findNearestHammingDistanceInDisease(newDisease)["distance"]
        if hammingDistance == 0:
            continue
        agent.catchDisease(newDisease)
        self.diseases.append(newDisease)
        if startingDiseases == [0, 0]:
            diseases.remove(newDisease)
            break

The disease loop is nested inside the agent loop, so if an agent is assigned the same disease as another agent, a duplicate is appended to self.diseases.

@colinhanrahan
Contributor

colinhanrahan commented Jul 31, 2024

Oh, I see. You can either put that in a conditional and only add the disease if it's not already in self.diseases or set self.diseases = diseases after the for i in range(numDiseases) loop finishes. The disadvantage with the second approach is that there's a small chance a disease in the self.diseases list will be completely absent from the population.

For more built-in stability, we could use a set since diseases are unique and unordered.
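A minimal sketch of the conditional and the set-based options, using a stand-in `Disease` class (hypothetical, not the project's `disease.Disease`) and a hypothetical helper `record_disease`:

```python
class Disease:
    """Stand-in for the simulation's disease class (assumption)."""
    def __init__(self, ID):
        self.ID = ID

def record_disease(active, new_disease):
    """Conditional approach: append only if the disease is not tracked yet."""
    if new_disease not in active:
        active.append(new_disease)

flu, cold = Disease(0), Disease(1)
assignments = [flu, cold, flu, flu]  # the same disease caught by several agents

# Option 1: list guarded by a membership check (keeps first-seen order).
active_list = []
for d in assignments:
    record_disease(active_list, d)

# Option 2: a set, which ignores repeated adds by construction.
active_set = set()
for d in assignments:
    active_set.add(d)
```

The set is simpler but unordered, so any code that relies on the order of self.diseases (for example when naming log columns) would need to sort by ID first.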

@amuller26
Contributor Author

I'm not sure if this is an error, but when configuration['startingDiseasesPerAgent'] != [0,0], it isn't guaranteed that all diseases will get used. For example, with configuration["startingDiseases"] = 50, self.diseases might only contain 45. Is that normal or to be expected?

@colinhanrahan
Contributor

It's not an error with the current implementation, but we could change the implementation if necessary. We endow each agent with startingDiseasesPerAgent random diseases that they are not immune to from the master list of diseases, so there are two causes for unused diseases:

  • all agents are immune to the disease, so they do not contract it
  • some or all agents are not immune to the disease, but the disease is never selected by chance

@amuller26
Contributor Author

I consolidated the diseases' metrics so far into the logfile. The naming scheme so far is runtimeStats["disease{disease.ID}{metric}"].
The only metric so far is the R-Value; I'm waiting on Dr. R's response for which other metrics to track.
With the current naming scheme, the JSON format shows them as disease0, disease1, disease2, disease3, etc. However, the CSV format lists them as disease0, disease1, disease10, disease11, etc. Is there a better way to name the disease metrics?

@colinhanrahan
Contributor

colinhanrahan commented Aug 2, 2024

Yeah, it looks like CSV variables are sorted alphabetically (agentAgingDeaths,agentCombatDeaths,agentDiseaseDeaths) while JSON variables are intentionally ordered ("timestep": 0, "population": 250, "meanMetabolism": 2.49). There are two approaches you could take:

  1. Add the disease stats after the runtime stats are sorted. Disease stats are added on lines 64-65 of __init__:
diseaseStats = {f"disease{disease.ID}RValue": 0 for disease in self.diseases}
self.runtimeStats.update(diseaseStats)

And runtime stats are sorted alphabetically in startLog and endLog. You could just add the disease stats after those sorts instead of in __init__.

  2. When naming disease variables, pad the disease IDs with 0s on the left to maintain alphabetical order. So if there were 11-100 diseases, you would add one extra 0 to diseases 0-9: disease00, disease01, disease02....

  3. Maybe the runtime stats don't need to be sorted at all. You could ask NKH when he becomes available again. They should be in a consistent order regardless, so I don't understand the comment # Ensure consistent ordering for CSV format.

Edit: the above comment might have been because dictionaries didn't officially maintain insertion order until Python 3.7. If that is the case, we should be good to remove the sorting.
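The zero-padding option can be sketched like this. The helper name `disease_stat_name` and its `total_diseases` parameter are hypothetical; the key format follows the f-string quoted above.

```python
def disease_stat_name(disease_id, metric, total_diseases):
    """Build a stat key whose alphabetical order matches numeric ID order.

    The ID is left-padded with zeros to the width of the largest ID,
    assuming IDs run from 0 to total_diseases - 1.
    """
    width = len(str(max(total_diseases - 1, 0)))
    return f"disease{disease_id:0{width}d}{metric}"

# Twelve diseases: IDs 0-11, so every ID is padded to two digits.
padded = [disease_stat_name(i, "RValue", 12) for i in range(12)]
unpadded = [f"disease{i}RValue" for i in range(12)]

# Alphabetical sorting preserves numeric order only for the padded names.
padded_stays_ordered = sorted(padded) == padded
unpadded_stays_ordered = sorted(unpadded) == unpadded
```

If the sorting is removed instead, plain dict insertion order (guaranteed by the language since Python 3.7) keeps the columns in the order they were added, and no padding is needed.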

@amuller26
Contributor Author

amuller26 commented Aug 8, 2024

I started implementing my feature to start diseases at different set timesteps. However, depending on diseaseStartTimeframe, one of two things goes wrong. Either it won't initialize the rest of the diseases (diseaseStartTimeframe = [0, 1], for example), or, if it starts at any timestep past 0, it will only initialize half of the diseases it's supposed to (diseaseStartTimeframe = [1, 1], for example). I am not sure where I am going wrong. It's reading the infectAgents() function and looks like it's doing everything right, but maybe I've been staring at it too long. Some suggestions and advice would be great.
I have everything logged to a separate file so I can look at the numbers and make sure it's correct. Once I get this sorted I will add all of the data to the main log.

@amuller26
Contributor Author

amuller26 commented Aug 8, 2024

I'm also not sure where to put the infectAgents() function because the infected agents don't change colors until the day after the disease is initialized.

@colinhanrahan
Contributor

colinhanrahan commented Aug 8, 2024

How is the start timestep per disease intended to work with startingDiseasesPerAgent? If "diseaseStartTimeframe": [0, 1], for example, does every agent get infected with a value in the range startingDiseasesPerAgent on timestep 0 (selected from the pool of diseases with that starting timestep, so half the total pool), and then the same on timestep 1? Or are the settings mutually exclusive?

I'm also seeing some zombie agents that don't move but don't die — I'll investigate this, but these are always caused by agents being removed from sugarscape.agents before calling their own death function.

@amuller26
Contributor Author

"diseaseStartTimeframe": [0, 1] works similarly to the pollutionTimeframe. The disease is assigned a timestep to be initialized and infect however many agents. Agents will still be infected with however many diseases depending on startingDiseasesPerAgent but they might not all be at the same timestep. If "startingDiseasesPerAgent": [0, 5] and "diseaseStartTimeframe": [0, 5], then an agent could be infected by up to 5 diseases within timesteps 0-5.

The disease is assigned a randomized timestep and then entered in a 2D array called diseasesCount where infectAgents() will use only diseasesCount[timestep].
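A rough sketch of that scheduling step, assuming the timeframe is an inclusive [first, last] pair. The function name `schedule_diseases` is hypothetical, and disease IDs stand in for disease objects.

```python
import random

def schedule_diseases(disease_ids, start_timeframe, rng=random):
    """Bucket each disease under a randomly chosen start timestep.

    Returns a list of lists indexed by timestep, so the diseases to
    introduce at timestep t are schedule[t].
    """
    first, last = start_timeframe
    schedule = [[] for _ in range(last + 1)]
    for disease_id in disease_ids:
        schedule[rng.randint(first, last)].append(disease_id)
    return schedule

rng = random.Random(1)
schedule = schedule_diseases(range(10), (0, 5), rng)
late_start = schedule_diseases(range(4), (1, 1), rng)
```

Every disease lands in exactly one bucket, so summing the bucket sizes is a quick check that none were dropped. With a timeframe of (1, 1), bucket 0 is empty and bucket 1 holds every disease.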

@colinhanrahan
Contributor

colinhanrahan commented Aug 8, 2024

self.infectAgents should be moved to the top of self.doTimestep, probably after removing and replacing agents — stepping forward on the GUI is not part of the runSimulation loop and calls sugarscape.doTimestep directly.

Edit: Move it to after self.timestep is incremented so that the timestep is correct. This should fix the GUI drawing issue.
I recommend renaming diseasesCount something like newDiseasesPerTimestep for clarity.

@amuller26
Contributor Author

I moved it to after self.doTimestep and something's still wrong. It looks like the diseases were initialized on the correct timestep but the GUI didn't recognize them and the stats are logged the timestep after. It also looks like the timestep after is day 2 of the disease in the simulation.

@colinhanrahan
Contributor

Move it into doTimestep here:

        self.timestep += 1
        self.infectAgents()
        if self.end == True or (len(self.agents) == 0 and self.keepAlive == False):
            self.toggleEnd()

That seems to work pretty well on my side.
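To illustrate why the call order matters, here is a toy model, not the real Sugarscape class. The attribute names mirror the snippet above, while `introduced` and the handling of timestep 0 before the first step are assumptions made for the example.

```python
class ToySugarscape:
    """Toy stand-in that only tracks when each disease is introduced."""

    def __init__(self, newDiseasesPerTimestep):
        self.timestep = 0
        self.newDiseasesPerTimestep = newDiseasesPerTimestep
        self.introduced = {}  # disease ID -> timestep of introduction

    def infectAgents(self):
        # Only the bucket for the current timestep is used.
        if self.timestep < len(self.newDiseasesPerTimestep):
            for diseaseID in self.newDiseasesPerTimestep[self.timestep]:
                self.introduced[diseaseID] = self.timestep

    def doTimestep(self):
        # Increment first so infectAgents sees the timestep being simulated.
        self.timestep += 1
        self.infectAgents()
        # Agent actions, logging, and GUI updates would follow here.

sim = ToySugarscape([[0], [1, 2], [], [3]])
sim.infectAgents()  # assumption: timestep 0 diseases are seeded before stepping
for _ in range(3):
    sim.doTimestep()
```

Because the infection happens before anything else in the step, stats logged and cells drawn later in the same step already reflect the new diseases.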

@colinhanrahan
Contributor

I saw that you added another comment through my notifications, but I can't see it right now. If you deleted it please disregard.

The diseases are introduced at the beginning of the timestep. If you're checking the diseases per agent at the end of the timestep (or after any agents have completed their timestep), it's likely that the diseases will have spread ("duplicated").

If "startingDiseasesPerAgent" != [0, 0], then new diseases can infect multiple agents at the same time. Otherwise, they should only infect one agent when they are introduced into the population.

Does either of those fix your issue? If not, can you give me some more detail?

@amuller26 amuller26 marked this pull request as ready for review August 10, 2024 03:25
@amuller26 amuller26 changed the title from "Adds metrics for diseases in a separate log file" to "Adds real metrics for diseases to the log" Aug 13, 2024
@amuller26 amuller26 marked this pull request as draft August 14, 2024 02:59
@amuller26
Contributor Author

There are some agents who do not move, and I don't know if it's because they can't move or they're zombies.

@colinhanrahan
Contributor

Zombie agents will have negative sugar and metabolism and their age won't increase over time. Sick agents with 0 functional range will have 0 vision and/or 0 movement.
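Those two signatures could be turned into a quick diagnostic along these lines. The `AgentSnapshot` fields and the `classify_stationary` helper are hypothetical names for the example; the real agent class may expose these values differently.

```python
from dataclasses import dataclass

@dataclass
class AgentSnapshot:
    """Values sampled from one stationary agent (field names are assumptions)."""
    sugar: float
    metabolism: float
    vision: int
    movement: int
    age: int
    previous_age: int  # age recorded one timestep earlier

def classify_stationary(agent):
    """Guess why an agent is not moving, using the criteria described above."""
    age_frozen = agent.age == agent.previous_age
    if agent.sugar < 0 and agent.metabolism < 0 and age_frozen:
        return "zombie"
    if agent.vision == 0 or agent.movement == 0:
        return "immobilized by disease"
    return "unknown"

zombie = AgentSnapshot(sugar=-4, metabolism=-1, vision=3, movement=2, age=12, previous_age=12)
sick = AgentSnapshot(sugar=8, metabolism=2, vision=0, movement=1, age=13, previous_age=12)
```

Running this over every agent that has not changed cells for a few timesteps would separate the two cases without stepping through the GUI.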

@amuller26
Contributor Author

I modified agent behavior to avoid any cell with sick agents in the vicinity when they're not sick, and to avoid only areas with the same tribe if the agent is sick. @colinhanrahan Can you please check that it is implemented correctly?

@colinhanrahan
Contributor

colinhanrahan commented Aug 24, 2024

Screenshot 2024-08-23 at 5 53 27 PM

Running disease_basic.json (after fixing the breaks on my two active PRs), agents refuse to move next to any agent with a disease and end up in a gridlock where many can't move at all. This seems like a natural consequence of the logic you introduced rather than a bug. Thoughts?

Edit: for cells directly adjacent to the agent's current cell, both checkInfectedArea and checkTribeArea are triggered by the self agent, so they'll never move 1 cell when they're sick. You should exclude self from these checks.
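A minimal sketch of excluding the moving agent from such a neighborhood check. The `Agent` class and `has_sick_neighbor` helper are hypothetical and only stand in for checkInfectedArea and checkTribeArea, whose actual signatures are not shown in this thread.

```python
class Agent:
    """Stand-in agent with only the attributes the check needs (assumption)."""
    def __init__(self, name, sick=False):
        self.name = name
        self.sick = sick

def has_sick_neighbor(neighbors, moving_agent):
    """True if any agent other than `moving_agent` in `neighbors` is sick."""
    return any(other is not moving_agent and other.sick for other in neighbors)

sick_mover = Agent("mover", sick=True)
healthy_bystander = Agent("bystander")

# Neighborhood of a cell directly adjacent to the mover: it contains the mover.
adjacent_cell_neighbors = [sick_mover, healthy_bystander]
blocked_for_mover = has_sick_neighbor(adjacent_cell_neighbors, sick_mover)
blocked_for_bystander = has_sick_neighbor(adjacent_cell_neighbors, healthy_bystander)
```

The identity comparison (`is not`) skips only the agent doing the check, so a sick agent no longer blocks its own one-cell moves while other agents still see it as sick.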

@nkremerh
Owner

A thought for the self-quarantining and the exiling behavior:

These should be configurable parameters for the agent (such as self.quarantineFactor) which impact the scores for potential cells. A considerate sick agent (i.e. one with a quarantineFactor of 1) will impose strong score penalties to cells next to others. Likewise, a healthy agent with a high quarantineFactor will impose strong score penalties to cells next to sick agents.

That way, there's less likely to be gridlock. Agents may not find too many cells with good scores, but they're still likely to move since those cells next to sick/healthy individuals aren't completely removed from consideration. They're simply less appealing.
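One way such a penalty might look, as a sketch only. The division-based formula and the `score_cell` signature are assumptions for illustration; only the quarantineFactor idea comes from the comment above.

```python
def score_cell(base_score, sick_neighbors, healthy_neighbors, agent_is_sick, quarantine_factor):
    """Scale a cell's score down by how many agents the mover wants to avoid.

    Sick agents avoid healthy neighbors (self-quarantine) and healthy agents
    avoid sick neighbors. The cell is penalized, never removed outright.
    """
    avoided = healthy_neighbors if agent_is_sick else sick_neighbors
    return base_score / (1 + quarantine_factor * avoided)

# Candidate cells for a healthy agent: (base score, sick neighbors, healthy neighbors)
cells = {"rich_but_infected": (10, 2, 0), "poor_but_clean": (6, 0, 1)}

def best_cell(quarantine_factor):
    """Name of the highest-scoring candidate for a healthy agent."""
    return max(
        cells,
        key=lambda name: score_cell(*cells[name], agent_is_sick=False,
                                    quarantine_factor=quarantine_factor),
    )

careless_choice = best_cell(0)  # ignores sickness entirely
cautious_choice = best_cell(1)  # strongly penalizes sick neighbors
```

With a positive base score the penalized score stays above zero, so a cell next to sick agents can still be chosen when nothing better is available, which is what avoids the gridlock.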

@amuller26
Contributor Author

@nkremerh the quarantine behavior is commented, and I believe everything is done that we were planning to add for now.

@amuller26 amuller26 marked this pull request as ready for review November 18, 2024 19:14

3 participants