-
-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of concept: Suggest similar person to rename them #262
base: master
Are you sure you want to change the base?
Conversation
By default, the deviation is 0.0 and therefore is disabled. On the other hand it depends on dlib 1.0.2 to use dlib_vector_lenght()
Wow, I expected some other objection to this feature.. 😅 |
ops, wait, I was looking only last commit! give me more time, please:D |
js/fr-dialogs.js
Outdated
} | ||
|
||
var buttonlist = [{ | ||
text: t('facerecognition', 'I don\'t know'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know => I am not sure
@@ -57,6 +57,12 @@ class SettingsService { | |||
const DEFAULT_SENSITIVITY = '0.4'; | |||
const MAXIMUM_SENSITIVITY = '0.6'; | |||
|
|||
/** Deviation used to suggestions */ | |||
const DEVIATION_KEY = 'deviation'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe cluster_deviation
to denote what this devaition is actually for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use deviation as an analogy to the standard deviation in a probability, but certainly cannot be applied directly.. 😅
But I guess I can keep it.. Maybe clusters_deviation
lib/Db/FaceMapper.php
Outdated
$facesCount = array(); | ||
for ($i = 0, $face_count1 = count($faces); $i < $face_count1; $i++) { | ||
$face1 = $faces[$i]; | ||
for ($j = $i, $face_count2 = count($faces); $j < $face_count2; $j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can start with $i+1
(second loop), as everyone will have $distance=0
with itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I do this, I can't compare the singles clusters..
lib/Db/FaceMapper.php
Outdated
@@ -131,6 +131,51 @@ public function findFacesFromPerson(string $userId, int $personId, int $model, $ | |||
return $faces; | |||
} | |||
|
|||
public function findRepresentativeFromPerson(string $userId, int $personId, float $sensitivity, int $model) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be handy to precompute during cluster creation? It is not slower than cluster creation itself, and can dramatically speed up this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you already have all these distances in that logic, so maybe there is not even a need to call dlib_vector_length
two times (I am not sure about this, though)
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
lib/Controller/PersonController.php
Outdated
$mainFace = $this->faceMapper->findRepresentativeFromPerson($this->userId, $id, $sensitivity, $modelId); | ||
|
||
$suggestions = array(); | ||
$persons = $this->personMapper->findAll($this->userId, $modelId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test, but I hardly think this will not timeout for me. I have hundreds of persons with thousands of faces, this will hit DB very hard:D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the comment I expected. haha..
...and for this made it configurable and disabled by default ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, I am that predictable?:D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I don't have any objections. Code for getting "representative" face is something that can be reused on a lot of places (tihs algo, maybe frontend...), so it is nice addition. My main concern is how my DB will handle this lazy fetching of faces:) Consider precomputing "representative" face in separate task, or during cluster creation, as this is good thing to have. This can speed up this logic too.
In some ideal case - we would have additional table to merge same persons (oc_facerec_person_groups) and this code can be done as background tasks (for each N:N persons) and "propose" possible similar ones. Then frontend can fetch this suggestions and present them. Schema for oc_facerec_person_groups
could be:
id (some guid, doesn't matter)
person1_id (FK to persons)
person2_id (FK to persons)
state (0-proposed, 1-rejected, 2-merged)
Lot of code you added here would be reusable (except maybe frontend logic during renaming)
I forgot to answer this.. Well.. you say that we must work over persons, and beyond the calculation that we forget the faces. My focus is on faces, because just adding an two-faces edge entry to chinese_whispers it would automatically return a new cluster with the merged persons of that faces. I think it is much cleaner. As for this PR, the table you describe may work, but if dont have the most representative face, the merge feature describe before will not work. On the other hand, every time the clusters are created or destroyed, we need to change this table, and I may lose information given by the user. That said, it ends up being what was originally proposed in #134 except that would be added the state column you comment. Do you agree with this? It would be something like ... oc_facerec_relations: {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, every time the clusters are created or destroyed, we need to change this table, and I may lose information given by the user.
I think cluster have really hard IDs (I created algorithm such that IDs are stable), so this should not be issue.
OK, my idea was to have:
oc_facerec_relations: {
id
person1_id
person2_id
state
}
oc_facerec_persons: {
...
+ representantive_face_id
}
Your idea is similar, just you "merge" representative face in oc_facerec_relations
directly, if I understood you?
oc_facerec_relations: {
id
face1_id
face2_id
state
}
Don't forget to remove all rows in relations when image is deleted (and faces are deleted). This also means you need to treat that this table will not always have rows for each person. Other than that, both approaches are functional, I think. I guess this is good:)
|
||
private function fillFaceRelationsFromPersons(string $userId) { | ||
$deviation = $this->settingsService->getDeviation(); | ||
if (!version_compare(phpversion('pdlib'), '1.0.2', '>=') || ($deviation === 0.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking aloud... should you put some print statement here? So, we can see from logs that it is not getting executed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe.. Probably in the next nightly version keep these guards.
But I am seriously thinking to indicate that 1.0.2 will be necessary for the first public release.
|
||
private function fillFaceRelationsFromPersons(string $userId) { | ||
$deviation = $this->settingsService->getDeviation(); | ||
if (!version_compare(phpversion('pdlib'), '1.0.2', '>=') || ($deviation === 0.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking aloud... should you put some print statement here? So, we can see from logs that it is not getting executed?
@@ -343,4 +353,39 @@ public function mergeClusters(array $oldCluster, array $newCluster): array { | |||
} | |||
return $result; | |||
} | |||
|
|||
private function fillFaceRelationsFromPersons(string $userId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider passing $modelId here - less complexity in this method and feels more natural (to me)
|
||
/** | ||
* State of two face relation. These are proposed, and can be accepted | ||
* as as the same person, or rejected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add as comment - "Rejected relations are never proposed again"
I confirm that it is quite stable, especially when you finish analyzing your current photos, and only add an few of new photos. But while you analyze progressively from scratch, it happened to me that I lost some names.. Generally when two or more important clusters merge together to create the main cluster. After that, it is difficult to lose any info.. 😉
Yes. This complicate the SQL queries a bit since the important thing for the fronted is the person, but beyond that the user deletes a photo as you say, I can trust the relationship of the faces. It is very likely that when merge two clusters <I'm not sure yet!>, the representative person of that cluster changes to a new one, And in this case, a new one row (And all its comparisons..) is created, but the previous relationships will not be eliminated since speaking of faces they will still be valid.
Yes.
Once the groups are created, they should exist .. but in any case, it will return that there are no suggestions ..
Ok. 😄 |
Drop relations when resetting clusters
3204f0a
to
e726cff
Compare
@@ -252,6 +254,15 @@ private function getNewClusters(array $faces): array { | |||
} | |||
for ($j = $i, $face_count2 = count($faces); $j < $face_count2; $j++) { | |||
$face2 = $faces[$j]; | |||
if ($this->relationMapper->existsOnMatrix($face1->id, $face2->id, $relations)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOooh, I liiiike this idea!:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, now that I thinj... this will affect cluster creation. So, we might end up with those faces in same bucket. Which is OK, but next execution will "split" those faces into two cluster again. And now user will have to again accept merges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, consider to keep those two system ("automatic cluster creation" and "manual approvals" decouples, as coupled they can lead to unintended feedback loops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOooh, I liiiike this idea!:)
I guess now you understand why I insisted on implementing the relations tables with faces, and not with persons. 😅
On the other hand, now that I thinj... this will affect cluster creation. So, we might end up with those faces in same bucket. Which is OK, but next execution will "split" those faces into two cluster again. And now user will have to again accept merges.
In what circumstances do you suppose it will be splited again?
At this point, we only add or avoid adding new edges and they will be processed just like always by chinese_whispers. These will be repeated between all executions, therefore it will be as stable as until today.
When joining multiple clusters, can change the main face in the resulting cluster, but since we talk about face relations, these are still valid and we use them again in all the executions beyond the resulting clusters.
In any case, when the main faces change, new face relations will be created. In the example in my last comment, 44 clusters were merged, and 4 just new relations were created.
I am concerned that this table grows a lot, but it seems that it is little, and in any case, we can eliminate the faces that were not accepted before adding the new ones.
So, consider to keep those two system ("automatic cluster creation" and "manual approvals" decouples, as coupled they can lead to unintended feedback loops
I don't quite understand what you mean, but maybe the previous comments will clarify it.
The only worrying point is that we need something like undo, in case the user accepts any wrong face by mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So, let's see how this plays out!:)
Then filter (Auto accepting or rejecting the proposals) while you accept the proposals. This ensures that you can apply as many relations as possible, but must accept fewer suggestions. On the other hand, now not apply any change until finish with all the suggestions. So, if you close the dialog no changes will be applied.
Summary of current status. As proof of concept, it was excellent. 😬 The suggestions of similar persons are very good, but my assumption that the clusters could be joined by forcing faces is not entirely correct. 😞 Effectively the most faces and joined correctly, but there is always some separate, and looks like a never-ending process. For example when I add a face of a double person, the one I add joins well, but the second one is separated and I have to join it too. This leads me to replant it, and go back to what was originally planned. We need a superior table that represents the Persons (Id, Name) that includes the clusters related by another table (Id, person_id, cluster_id), and the current persons table must become to cluster table. Well, From what has been developed here, we can take the suggestions, and dialogs, but the logic of backgroud job, join persons and much more is different. |
Too shame:( If you go with this idea ( |
Yes.. persons are created by the user, and clusters are the persons that we have now and are created exactly the same as now. Must we trust in chinese_whispers with the same considerations. When a name is assigned for the first time, a person is created, and the cluster is related to it. If the name is assigned to another cluster, it is related to the right person. The Sql queries would be just a little more complicated than now.
Surely. I do not know. |
The screenshot that says everything..
Well, almost everything.. Some notes.
This pull can be considered as the starting point for issue #134 (Aka Join clusters together... ), but it is not expected to implement this feature. I'm just trying to discover similar clusters, and suggest renaming them.
Again, note that this works on the clusters that chineses_whisper discovered, and in no way is it changed.
Ok. How to get similar clusters?
First of all.. I assume that the face with the most relationships within the cluster is the most important and representative of the clusters. This is done with the same sensitivity that I originally group them.
So, when a user renames a cluster.
I was really surprised that the calculations are very fast, but probably in my case I have few photos. However if we decide to move forward, this will be optional.
As it show the faces of the whole cluster (14 or less of the main view), it's easy to avoid renaming the mixed groups.
Changing the facrecog_person, adding the representative face row (Which would be calculated only after merge_cluster()), the speed will be further improved. What's more, the final calculation could be done in the browser. However, it was not a priority.
I guess this may be a nice optional feature for release version, But if you accept it, I would publish a release candidate to evaluate before any public release.