Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept: Suggest similar person to rename them #262

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

matiasdelellis
Copy link
Owner

The screenshot that says everything..
Captura 2020-04-18 11 06 24

Well, almost everything.. Some notes.

This pull can be considered as the starting point for issue #134 (Aka Join clusters together... ), but it is not expected to implement this feature. I'm just trying to discover similar clusters, and suggest renaming them.

Again, note that this works on the clusters that chineses_whisper discovered, and in no way is it changed.

Ok. How to get similar clusters?
First of all.. I assume that the face with the most relationships within the cluster is the most important and representative of the clusters. This is done with the same sensitivity that I originally group them.

So, when a user renames a cluster.

  1. Find the representative face of the cluster.
  2. Search all groups that are not named with the same name.
  3. Find the representative face of each cluster named different.
  4. Compare all the descriptors of the representatives with the original renamed.
  5. If the representative faces are similar (Here using sensitivity+0.1), return them to the fronted as an array of suggestions.
  6. Finally the fronted, shows the dialog of this screenshot. 😄
  7. Every time the user accepts to rename a cluster, it is searched again.

I was really surprised that the calculations are very fast, but probably in my case I have few photos. However if we decide to move forward, this will be optional.
As it show the faces of the whole cluster (14 or less of the main view), it's easy to avoid renaming the mixed groups.
Changing the facrecog_person, adding the representative face row (Which would be calculated only after merge_cluster()), the speed will be further improved. What's more, the final calculation could be done in the browser. However, it was not a priority.

I guess this may be a nice optional feature for release version, But if you accept it, I would publish a release candidate to evaluate before any public release.

By default, the deviation is 0.0 and therefore is disabled.

On the other hand it depends on dlib 1.0.2 to use dlib_vector_lenght()
@matiasdelellis
Copy link
Owner Author

Wow, I expected some other objection to this feature.. 😅

@stalker314314
Copy link
Collaborator

ops, wait, I was looking only last commit! give me more time, please:D

js/fr-dialogs.js Outdated
}

var buttonlist = [{
text: t('facerecognition', 'I don\'t know'),
Copy link
Collaborator

@stalker314314 stalker314314 Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know => I am not sure

@@ -57,6 +57,12 @@ class SettingsService {
const DEFAULT_SENSITIVITY = '0.4';
const MAXIMUM_SENSITIVITY = '0.6';

/** Deviation used to suggestions */
const DEVIATION_KEY = 'deviation';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe cluster_deviation to denote what this devaition is actually for?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use deviation as an analogy to the standard deviation in a probability, but certainly cannot be applied directly.. 😅
But I guess I can keep it.. Maybe clusters_deviation

$facesCount = array();
for ($i = 0, $face_count1 = count($faces); $i < $face_count1; $i++) {
$face1 = $faces[$i];
for ($j = $i, $face_count2 = count($faces); $j < $face_count2; $j++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can start with $i+1 (second loop), as everyone will have $distance=0 with itself?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do this, I can't compare the singles clusters..

@@ -131,6 +131,51 @@ public function findFacesFromPerson(string $userId, int $personId, int $model, $
return $faces;
}

public function findRepresentativeFromPerson(string $userId, int $personId, float $sensitivity, int $model) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be handy to precompute during cluster creation? It is not slower than cluster creation itself, and can dramatically speed up this logic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you already have all these distances in that logic, so maybe there is not even a need to call dlib_vector_length two times (I am not sure about this, though)

This comment was marked as off-topic.

$mainFace = $this->faceMapper->findRepresentativeFromPerson($this->userId, $id, $sensitivity, $modelId);

$suggestions = array();
$persons = $this->personMapper->findAll($this->userId, $modelId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test, but I hardly think this will not timeout for me. I have hundreds of persons with thousands of faces, this will hit DB very hard:D

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the comment I expected. haha..
...and for this made it configurable and disabled by default ..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, I am that predictable?:D

Copy link
Collaborator

@stalker314314 stalker314314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I don't have any objections. Code for getting "representative" face is something that can be reused on a lot of places (tihs algo, maybe frontend...), so it is nice addition. My main concern is how my DB will handle this lazy fetching of faces:) Consider precomputing "representative" face in separate task, or during cluster creation, as this is good thing to have. This can speed up this logic too.

In some ideal case - we would have additional table to merge same persons (oc_facerec_person_groups) and this code can be done as background tasks (for each N:N persons) and "propose" possible similar ones. Then frontend can fetch this suggestions and present them. Schema for oc_facerec_person_groups could be:
id (some guid, doesn't matter)
person1_id (FK to persons)
person2_id (FK to persons)
state (0-proposed, 1-rejected, 2-merged)

Lot of code you added here would be reusable (except maybe frontend logic during renaming)

@matiasdelellis
Copy link
Owner Author

matiasdelellis commented Apr 20, 2020

In some ideal case - we would have additional table to merge same persons (oc_facerec_person_groups) and this code can be done as background tasks (for each N:N persons) and "propose" possible similar ones. Then frontend can fetch this suggestions and present them. Schema for oc_facerec_person_groups could be:
id (some guid, doesn't matter)
person1_id (FK to persons)
person2_id (FK to persons)
state (0-proposed, 1-rejected, 2-merged)

I forgot to answer this..
It was not intended to address the merge of clusters, but I understand that it is a direct step.

Well.. you say that we must work over persons, and beyond the calculation that we forget the faces. My focus is on faces, because just adding an two-faces edge entry to chinese_whispers it would automatically return a new cluster with the merged persons of that faces. I think it is much cleaner.

As for this PR, the table you describe may work, but if dont have the most representative face, the merge feature describe before will not work. On the other hand, every time the clusters are created or destroyed, we need to change this table, and I may lose information given by the user.

That said, it ends up being what was originally proposed in #134 except that would be added the state column you comment.

Do you agree with this? It would be something like ...

oc_facerec_relations: {
id (some guid, doesn't matter)
face1_id (FK to faces)
face2_id (FK to face)
state (0-proposed, 1-rejected, 2-accepted)
}

  1. After generating the clusters, we look for the representative face, compare and fill this table.
  2. The user accepts or rejects these suggestions. At this point if user accept, just rename the person.
  3. And just add new edges of "accepted relations" to chinese_whisper, and let this do the magic?

Copy link
Collaborator

@stalker314314 stalker314314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, every time the clusters are created or destroyed, we need to change this table, and I may lose information given by the user.

I think cluster have really hard IDs (I created algorithm such that IDs are stable), so this should not be issue.

OK, my idea was to have:

oc_facerec_relations: {
    id
    person1_id
    person2_id
    state
}

oc_facerec_persons: {
    ...
    + representantive_face_id
}

Your idea is similar, just you "merge" representative face in oc_facerec_relations directly, if I understood you?

oc_facerec_relations: {
id
face1_id
face2_id
state
}

Don't forget to remove all rows in relations when image is deleted (and faces are deleted). This also means you need to treat that this table will not always have rows for each person. Other than that, both approaches are functional, I think. I guess this is good:)


private function fillFaceRelationsFromPersons(string $userId) {
$deviation = $this->settingsService->getDeviation();
if (!version_compare(phpversion('pdlib'), '1.0.2', '>=') || ($deviation === 0.0))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking aloud... should you put some print statement here? So, we can see from logs that it is not getting executed?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe.. Probably in the next nightly version keep these guards.
But I am seriously thinking to indicate that 1.0.2 will be necessary for the first public release.


private function fillFaceRelationsFromPersons(string $userId) {
$deviation = $this->settingsService->getDeviation();
if (!version_compare(phpversion('pdlib'), '1.0.2', '>=') || ($deviation === 0.0))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking aloud... should you put some print statement here? So, we can see from logs that it is not getting executed?

@@ -343,4 +353,39 @@ public function mergeClusters(array $oldCluster, array $newCluster): array {
}
return $result;
}

private function fillFaceRelationsFromPersons(string $userId) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider passing $modelId here - less complexity in this method and feels more natural (to me)


/**
* State of two face relation. These are proposed, and can be accepted
* as as the same person, or rejected.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add as comment - "Rejected relations are never proposed again"

@matiasdelellis
Copy link
Owner Author

I think cluster have really hard IDs (I created algorithm such that IDs are stable), so this should not be issue.

I confirm that it is quite stable, especially when you finish analyzing your current photos, and only add an few of new photos. But while you analyze progressively from scratch, it happened to me that I lost some names.. Generally when two or more important clusters merge together to create the main cluster. After that, it is difficult to lose any info.. 😉

Your idea is similar, just you "merge" representative face in oc_facerec_relations directly, if I understood you?

Yes. This complicate the SQL queries a bit since the important thing for the fronted is the person, but beyond that the user deletes a photo as you say, I can trust the relationship of the faces.

It is very likely that when merge two clusters <I'm not sure yet!>, the representative person of that cluster changes to a new one, And in this case, a new one row (And all its comparisons..) is created, but the previous relationships will not be eliminated since speaking of faces they will still be valid.
Also I am concerned that this table will grow indefinitely, and I will need to check it.

Don't forget to remove all rows in relations when image is deleted (and faces are deleted).

Yes.

This also means you need to treat that this table will not always have rows for each person.

Once the groups are created, they should exist .. but in any case, it will return that there are no suggestions ..

Other than that, both approaches are functional, I think. I guess this is good:)

Ok. 😄

@matiasdelellis
Copy link
Owner Author

Ok.. First test with sensivity 0.4 and deviation 0.1

[matias@nube nextcloud]$ sudo -u apache php occ face:reset --clustering -u user -vvvvv
Reset clustering done
[matias@nube nextcloud]$ sudo -u apache php occ face:background_job -u user -t 25200 -vvvv
6/10 - Executing task CreateClustersTask (Create new persons or update existing persons)
	979 faces found for clustering
	487 persons found after clustering
	1433 relations added as suggestions
[matias@nube nextcloud]$
[matias@nube nextcloud]$ sudo -u apache php occ face:background_job -u user -t 25200 -vvvv
6/10 - Executing task CreateClustersTask (Create new persons or update existing persons)
	Found 0 faces without associated persons for user user and model 1
	Found 0 changed persons for user user and model 1
	Clusters already exist, but there was some change that requires recreating the clusters
	979 faces found for clustering
	444 persons found after clustering
	3 relations added as suggestions
[matias@nube nextcloud]$ 

After reset clustering, and rename the first Sheldon cluster and accept all validity suggestions. These are all the cluster affected.
imagen

I force update clusters, that now takes into account relations.

imagen

In summary 44 clusters joined. After doing this with the main clusters, I reduced the clusters from 487 to 274 persons. It is strange the 3 individual groups, because it relate the name, but did not join any other cluster. I have to investigate..

Note that not all clusters are joining joined integrally, because I am only strictly accepting the proposed relations, but could create new relationships between all the clusters when the user finishes accepting all.

The performance at least the first time is obviously slower, and I'm in doubt if slower than before dlib_vector_lenght(), But while it is optional, i guess is an acceptable feature ..

@@ -252,6 +254,15 @@ private function getNewClusters(array $faces): array {
}
for ($j = $i, $face_count2 = count($faces); $j < $face_count2; $j++) {
$face2 = $faces[$j];
if ($this->relationMapper->existsOnMatrix($face1->id, $face2->id, $relations)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOooh, I liiiike this idea!:)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, now that I thinj... this will affect cluster creation. So, we might end up with those faces in same bucket. Which is OK, but next execution will "split" those faces into two cluster again. And now user will have to again accept merges.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, consider to keep those two system ("automatic cluster creation" and "manual approvals" decouples, as coupled they can lead to unintended feedback loops

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOooh, I liiiike this idea!:)

I guess now you understand why I insisted on implementing the relations tables with faces, and not with persons. 😅

On the other hand, now that I thinj... this will affect cluster creation. So, we might end up with those faces in same bucket. Which is OK, but next execution will "split" those faces into two cluster again. And now user will have to again accept merges.

In what circumstances do you suppose it will be splited again?
At this point, we only add or avoid adding new edges and they will be processed just like always by chinese_whispers. These will be repeated between all executions, therefore it will be as stable as until today.

When joining multiple clusters, can change the main face in the resulting cluster, but since we talk about face relations, these are still valid and we use them again in all the executions beyond the resulting clusters.

In any case, when the main faces change, new face relations will be created. In the example in my last comment, 44 clusters were merged, and 4 just new relations were created.
I am concerned that this table grows a lot, but it seems that it is little, and in any case, we can eliminate the faces that were not accepted before adding the new ones.

So, consider to keep those two system ("automatic cluster creation" and "manual approvals" decouples, as coupled they can lead to unintended feedback loops

I don't quite understand what you mean, but maybe the previous comments will clarify it.

The only worrying point is that we need something like undo, in case the user accepts any wrong face by mistake.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So, let's see how this plays out!:)

Then filter (Auto accepting or rejecting the proposals) while you accept the
proposals.

This ensures that you can apply as many relations as possible, but must accept
fewer suggestions.

On the other hand, now not apply any change until finish with all the
suggestions. So, if you close the dialog no changes will be applied.
@matiasdelellis matiasdelellis changed the title Suggest similar person to rename them Proof of concept: Suggest similar person to rename them Apr 30, 2020
@matiasdelellis
Copy link
Owner Author

Summary of current status. As proof of concept, it was excellent. 😬 The suggestions of similar persons are very good, but my assumption that the clusters could be joined by forcing faces is not entirely correct. 😞

Effectively the most faces and joined correctly, but there is always some separate, and looks like a never-ending process. For example when I add a face of a double person, the one I add joins well, but the second one is separated and I have to join it too.

This leads me to replant it, and go back to what was originally planned. We need a superior table that represents the Persons (Id, Name) that includes the clusters related by another table (Id, person_id, cluster_id), and the current persons table must become to cluster table.

Well, From what has been developed here, we can take the suggestions, and dialogs, but the logic of backgroud job, join persons and much more is different.

@stalker314314
Copy link
Collaborator

Too shame:(

If you go with this idea (Id, person_id, cluster_id), that means persons are getting created dynamically? As soon as I merge two clusters, I got person, right? Or person is created together when cluster is created? Maybe you still need state to have, as before, but with this schema - I am not sure where it would reside

@matiasdelellis
Copy link
Owner Author

If you go with this idea (Id, person_id, cluster_id), that means persons are getting created dynamically? As soon as I merge two clusters, I got person, right? Or person is created together when cluster is created

Yes.. persons are created by the user, and clusters are the persons that we have now and are created exactly the same as now. Must we trust in chinese_whispers with the same considerations.

When a name is assigned for the first time, a person is created, and the cluster is related to it. If the name is assigned to another cluster, it is related to the right person. The Sql queries would be just a little more complicated than now.

Maybe you still need state to have, as before, but with this schema - I am not sure where it would reside

Surely. I do not know.
Of course, it seems simple, but we must see it in practice.. 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants