Proof of concept: Suggest similar person to rename them #262

matiasdelellis · 2020-04-18T15:12:29Z

The screenshot says it all..

Well, almost everything.. Some notes.

This pull request can be considered the starting point for issue #134 (aka "Join clusters together..."), but it is not intended to implement that feature. I'm just trying to discover similar clusters and suggest renaming them.

Again, note that this works on the clusters that chinese_whispers discovered, and does not change them in any way.

Ok. How do we find similar clusters?
First of all.. I assume that the face with the most relationships within the cluster is the most important and representative face of the cluster. Relationships are counted with the same sensitivity I originally used to group them.

So, when a user renames a cluster:

1. Find the representative face of the cluster.
2. Find all clusters that are not already named with the same name.
3. Find the representative face of each of those differently named clusters.
4. Compare the descriptors of those representatives with the representative of the renamed cluster.
5. If the representative faces are similar (here using sensitivity + 0.1), return those clusters to the frontend as an array of suggestions.
6. Finally, the frontend shows the dialog in the screenshot. 😄
7. Every time the user accepts renaming a cluster, the search is run again.
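A minimal sketch of the steps above, in JavaScript for illustration only (the PR does this in PHP on the server). It assumes faces are `{ id, descriptor }` objects, clusters are `{ id, name, faces }`, and descriptors are compared by Euclidean distance; `findRepresentative` and `suggestSimilar` are hypothetical names, and the fixed 0.1 is exposed as a `deviation` parameter, like the later "deviation" setting.

```javascript
// Hypothetical sketch of the rename-suggestion flow; not the PR's PHP code.
// Assumed shapes: a face is { id, descriptor: number[] }, a cluster is
// { id, name, faces }, and descriptors are compared by Euclidean distance.

function euclidean(a, b) {
  let sum = 0;
  for (let k = 0; k < a.length; k++) {
    const d = a[k] - b[k];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Representative = face with the most matches (distance < sensitivity) inside
// its own cluster. The inner loop starts at i, so every face matches itself
// once and a single-face cluster still has a representative.
function findRepresentative(faces, sensitivity) {
  const counts = new Array(faces.length).fill(0);
  for (let i = 0; i < faces.length; i++) {
    for (let j = i; j < faces.length; j++) {
      if (euclidean(faces[i].descriptor, faces[j].descriptor) < sensitivity) {
        counts[i]++;
        if (j !== i) counts[j]++;
      }
    }
  }
  let best = null;
  for (let i = 0; i < faces.length; i++) {
    if (best === null || counts[i] > counts[best]) best = i;
  }
  return best === null ? null : faces[best];
}

// Clusters not already called `newName` whose representative lies within
// sensitivity + deviation of the renamed cluster's representative.
function suggestSimilar(renamed, clusters, newName, sensitivity, deviation) {
  const main = findRepresentative(renamed.faces, sensitivity);
  if (main === null) return [];
  return clusters.filter((cluster) => {
    if (cluster.id === renamed.id || cluster.name === newName) return false;
    const rep = findRepresentative(cluster.faces, sensitivity);
    return rep !== null &&
      euclidean(main.descriptor, rep.descriptor) < sensitivity + deviation;
  });
}
```

With sensitivity 0.4 and deviation 0.1, a lone face at distance 0.3 from the renamed cluster's representative is suggested, while a cluster already named with the new name is skipped. Every call recomputes all representatives, which is the cost the review below worries about.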

I was really surprised that the calculations are so fast, but that is probably because I have few photos. In any case, if we decide to move forward, this will be optional.
Since the dialog shows the faces of the whole cluster (14 or fewer, as in the main view), it's easy to avoid renaming mixed groups.
By changing facerecog_person to add a representative-face column (which would be calculated only after merge_cluster()), the speed would improve further. The final calculation could even be done in the browser. However, that was not a priority.

I think this could be a nice optional feature for the release version, but if you accept it, I would publish a release candidate to evaluate it before any public release.

By default, the deviation is 0.0, so the feature is disabled. It also depends on pdlib 1.0.2 for dlib_vector_length().

matiasdelellis · 2020-04-20T17:27:19Z

Wow, I expected some other objection to this feature.. 😅

stalker314314 · 2020-04-20T17:32:50Z

Oops, wait, I was only looking at the last commit! Give me more time, please :D

stalker314314 · 2020-04-20T17:37:37Z

js/fr-dialogs.js

+			}
+
+			var buttonlist = [{
+				text: t('facerecognition', 'I don\'t know'),


I don't know => I am not sure

stalker314314 · 2020-04-20T17:43:01Z

lib/Service/SettingsService.php

@@ -57,6 +57,12 @@ class SettingsService {
 	const DEFAULT_SENSITIVITY = '0.4';
 	const MAXIMUM_SENSITIVITY = '0.6';

+	/** Deviation used to suggestions */
+	const DEVIATION_KEY = 'deviation';


Maybe cluster_deviation, to denote what this deviation is actually for?

I used "deviation" as an analogy to the standard deviation in probability, though it certainly cannot be applied directly.. 😅
But I guess I can keep it.. Maybe clusters_deviation

stalker314314 · 2020-04-20T17:45:37Z

lib/Db/FaceMapper.php

+		$facesCount = array();
+		for ($i = 0, $face_count1 = count($faces); $i < $face_count1; $i++) {
+			$face1 = $faces[$i];
+			for ($j = $i, $face_count2 = count($faces); $j < $face_count2; $j++) {


You can start the second loop at $i+1, since every face has $distance=0 with itself?

If I do that, I can't compare single-face clusters..
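A small illustration of why the inner loop starts at `$i`, assuming (a guess about the rest of the loop, which is not shown) that a face only gets an entry once some comparison involving it matches. `neighbourCounts` is a hypothetical helper using Euclidean distance on plain arrays:

```javascript
// Hypothetical illustration of the loop-bound question; descriptors are plain
// arrays compared by Euclidean distance. As assumed here, a face only gets an
// entry in `counts` once some pair involving it matches.
function neighbourCounts(descriptors, sensitivity, includeSelf) {
  const counts = new Map();
  const bump = (k) => counts.set(k, (counts.get(k) || 0) + 1);
  for (let i = 0; i < descriptors.length; i++) {
    // Starting at i compares each face with itself (distance 0, always a
    // match); starting at i + 1 skips that comparison.
    for (let j = includeSelf ? i : i + 1; j < descriptors.length; j++) {
      const a = descriptors[i];
      const b = descriptors[j];
      const dist = Math.hypot(...a.map((x, k) => x - b[k]));
      if (dist < sensitivity) {
        bump(i);
        if (j !== i) bump(j);
      }
    }
  }
  return counts;
}
```

Skipping the self-comparison saves only n distance computations out of roughly n²/2, but it leaves a single-face cluster with no entry at all, so there is nothing to pick a representative from.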

stalker314314 · 2020-04-20T17:47:18Z

lib/Db/FaceMapper.php

@@ -131,6 +131,51 @@ public function findFacesFromPerson(string $userId, int $personId, int $model, $
 		return $faces;
 	}

+	public function findRepresentativeFromPerson(string $userId, int $personId, float $sensitivity, int $model) {


Would this be handy to precompute during cluster creation? It is no slower than cluster creation itself, and could dramatically speed up this logic.

I think you already have all these distances in that logic, so maybe there is not even a need to call dlib_vector_length twice (I am not sure about this, though)
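One way to read the precomputation suggestion, sketched under assumptions: the clustering step already compares every pair once to build its edge list, so keeping that list lets each cluster's representative be derived after labelling without calling the distance function again. The `[i, j]` edge format, the `labels` array and both function names are hypothetical.

```javascript
// Hypothetical sketch: collect the edge list once (as the clustering step
// already has to), then derive every cluster's representative from those
// edges after the labels are known, without recomputing any distance.
function buildEdges(descriptors, sensitivity) {
  const edges = [];
  for (let i = 0; i < descriptors.length; i++) {
    for (let j = i; j < descriptors.length; j++) {
      const a = descriptors[i];
      const b = descriptors[j];
      if (Math.hypot(...a.map((x, k) => x - b[k])) < sensitivity) edges.push([i, j]);
    }
  }
  return edges;
}

// labels[i] is the cluster id assigned to face i by the clustering step.
// Returns a Map from cluster id to the index of its representative face.
function representativesFromEdges(edges, labels) {
  const counts = new Array(labels.length).fill(0);
  for (const [i, j] of edges) {
    // Only edges inside one cluster count; an edge may link two clusters.
    if (labels[i] !== labels[j]) continue;
    counts[i]++;
    if (j !== i) counts[j]++;
  }
  const best = new Map();
  for (let i = 0; i < labels.length; i++) {
    const current = best.get(labels[i]);
    if (current === undefined || counts[i] > counts[current]) best.set(labels[i], i);
  }
  return best;
}
```

Edges can link faces that end up in different clusters, so they are skipped when counting; without that check, face 1 would win cluster 0 in the test case `[[0, 0], [0.3, 0], [0.6, 0]]` with labels `[0, 0, 1]` because of its edge to face 2.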

stalker314314 · 2020-04-20T17:49:31Z

lib/Controller/PersonController.php

+		$mainFace = $this->faceMapper->findRepresentativeFromPerson($this->userId, $id, $sensitivity, $modelId);
+
+		$suggestions = array();
+		$persons = $this->personMapper->findAll($this->userId, $modelId);


I didn't test it, but I strongly suspect this will time out for me. I have hundreds of persons with thousands of faces; this will hit the DB very hard :D

This was the comment I expected, haha..
...and that's why I made it configurable and disabled by default..

haha, am I that predictable? :D

stalker314314

Overall, I don't have any objections. Code for getting the "representative" face is something that can be reused in a lot of places (this algorithm, maybe the frontend...), so it is a nice addition. My main concern is how my DB will handle this lazy fetching of faces :) Consider precomputing the "representative" face in a separate task, or during cluster creation, as it is a good thing to have. It can speed up this logic too.

Ideally, we would have an additional table to merge identical persons (oc_facerec_person_groups), and this code could run as a background task (for each N:N pair of persons) to "propose" possibly similar ones. The frontend could then fetch these suggestions and present them. The schema for oc_facerec_person_groups could be:
id (some guid, doesn't matter)
person1_id (FK to persons)
person2_id (FK to persons)
state (0-proposed, 1-rejected, 2-merged)

A lot of the code you added here would be reusable (except maybe the frontend logic during renaming)

matiasdelellis · 2020-04-20T23:27:17Z

> Ideally, we would have an additional table to merge identical persons (oc_facerec_person_groups), and this code could run as a background task (for each N:N pair of persons) to "propose" possibly similar ones. The frontend could then fetch these suggestions and present them. The schema for oc_facerec_person_groups could be:
> id (some guid, doesn't matter)
> person1_id (FK to persons)
> person2_id (FK to persons)
> state (0-proposed, 1-rejected, 2-merged)

I forgot to answer this..
It was not intended to address merging clusters, but I understand that it is the natural next step.

Well.. you say that we should work on persons and, beyond the calculation, forget about the faces. My focus is on faces, because just adding a two-face edge to chinese_whispers would automatically return a new cluster with the merged persons of those faces. I think it is much cleaner.

As for this PR, the table you describe may work, but if we don't have the most representative face, the merge feature described above will not work. On the other hand, every time clusters are created or destroyed, we need to update this table, and I may lose information given by the user.

That said, it ends up being what was originally proposed in #134, except for the added state column you mention.

Do you agree with this? It would be something like ...

oc_facerec_relations: {
id (some guid, doesn't matter)
face1_id (FK to faces)
face2_id (FK to faces)
state (0-proposed, 1-rejected, 2-accepted)
}

After generating the clusters, we look for the representative faces, compare them, and fill this table.
The user accepts or rejects these suggestions. At this point, if the user accepts, we just rename the person.
And then we just add the "accepted relations" as new edges to chinese_whispers, and let it do the magic?
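The proposed `oc_facerec_relations` table can be modelled in memory to show two properties discussed in this thread: a relation between faces a and b is the same row as between b and a (compare the later "Also search inverted condition when find relations" commit), and a pair that already has a row, including a rejected one, is never proposed again. `RelationStore` and `RelationState` are hypothetical names; the state values follow the 0/1/2 scheme above.

```javascript
// Hypothetical in-memory model of oc_facerec_relations; states follow the
// 0-proposed / 1-rejected / 2-accepted scheme proposed above.
const RelationState = Object.freeze({ PROPOSED: 0, REJECTED: 1, ACCEPTED: 2 });

class RelationStore {
  constructor() {
    this.rows = new Map(); // normalised "low:high" face-id key -> row
  }

  // (a, b) and (b, a) are the same relation, so order the ids in the key.
  static key(face1, face2) {
    return face1 < face2 ? `${face1}:${face2}` : `${face2}:${face1}`;
  }

  find(face1, face2) {
    return this.rows.get(RelationStore.key(face1, face2)) ?? null;
  }

  // Rejected relations are never proposed again: any existing row, whatever
  // its state, blocks a new proposal for the same pair.
  propose(face1, face2) {
    const key = RelationStore.key(face1, face2);
    if (this.rows.has(key)) return false;
    this.rows.set(key, { face1, face2, state: RelationState.PROPOSED });
    return true;
  }

  setState(face1, face2, state) {
    const row = this.find(face1, face2);
    if (row === null) throw new Error(`no relation between ${face1} and ${face2}`);
    row.state = state;
  }
}
```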

stalker314314

> On the other hand, every time clusters are created or destroyed, we need to update this table, and I may lose information given by the user.

I think clusters have really stable IDs (I designed the algorithm so that IDs are stable), so this should not be an issue.

OK, my idea was to have:

oc_facerec_relations: {
    id
    person1_id
    person2_id
    state
}

oc_facerec_persons: {
    ...
    + representative_face_id
}

Your idea is similar, except that you "merge" the representative face into oc_facerec_relations directly, if I understood you correctly?

oc_facerec_relations: {
id
face1_id
face2_id
state
}

Don't forget to remove all rows in relations when an image is deleted (and its faces are deleted). This also means you need to handle the fact that this table will not always have rows for each person. Other than that, both approaches are functional, I think. I guess this is good:)
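A sketch of the cleanup described above, assuming relation rows shaped like the proposed table (`{ face1, face2, state }`) and hypothetical helper names: any row that touches a deleted face is dropped, and a face with no remaining rows simply yields no suggestions.

```javascript
// Hypothetical helpers over rows shaped like the proposed table:
// { face1, face2, state } with 0 = proposed.
const PROPOSED = 0;

// Keep only relations that do not touch any deleted face.
function dropRelationsForFaces(rows, deletedFaceIds) {
  const deleted = new Set(deletedFaceIds);
  return rows.filter((r) => !deleted.has(r.face1) && !deleted.has(r.face2));
}

// Pending proposals involving a face; an empty array just means
// "no suggestions", which is expected once rows have been dropped.
function proposedFor(rows, faceId) {
  return rows.filter(
    (r) => r.state === PROPOSED && (r.face1 === faceId || r.face2 === faceId),
  );
}
```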

stalker314314 · 2020-04-21T16:23:04Z

lib/BackgroundJob/Tasks/CreateClustersTask.php

+
+	private function fillFaceRelationsFromPersons(string $userId) {
+		$deviation = $this->settingsService->getDeviation();
+		if (!version_compare(phpversion('pdlib'), '1.0.2', '>=') || ($deviation === 0.0))


Thinking aloud... should you put some print statement here, so we can see from the logs that it is not being executed?

Maybe.. I'll probably keep these guards in the next nightly version.
But I am seriously thinking of requiring pdlib 1.0.2 for the first public release.
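A hedged sketch of the guard with the suggested log lines. `pdlibVersion` stands in for `phpversion('pdlib')` and `log` for whatever logger the task uses; `compareVersions` is a simplified, numeric-only stand-in for PHP's `version_compare()` and does not handle suffixes such as `-dev`.

```javascript
// Hypothetical sketch of the fillFaceRelationsFromPersons() guard with log
// output. compareVersions() only handles plain dotted numbers, unlike PHP's
// version_compare(); it returns -1, 0 or 1.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// pdlibVersion plays the role of phpversion('pdlib') (null if missing).
function shouldFillRelations(pdlibVersion, deviation, log) {
  if (!pdlibVersion) {
    log("Skipping face relations: pdlib is not available");
    return false;
  }
  if (compareVersions(pdlibVersion, "1.0.2") < 0) {
    log(`Skipping face relations: pdlib ${pdlibVersion} is older than 1.0.2`);
    return false;
  }
  if (deviation === 0.0) {
    log("Skipping face relations: deviation is 0.0, suggestions are disabled");
    return false;
  }
  return true;
}
```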

stalker314314 · 2020-04-21T16:24:37Z

lib/BackgroundJob/Tasks/CreateClustersTask.php

@@ -343,4 +353,39 @@ public function mergeClusters(array $oldCluster, array $newCluster): array {
 		}
 		return $result;
 	}
+
+	private function fillFaceRelationsFromPersons(string $userId) {


Consider passing $modelId here: less complexity in this method, and it feels more natural (to me).

stalker314314 · 2020-04-21T16:25:45Z

lib/Db/Relation.php

+
+	/**
+	 * State of two face relation. These are proposed, and can be accepted
+	 * as as the same person, or rejected.


Maybe add as a comment: "Rejected relations are never proposed again"

matiasdelellis · 2020-04-21T21:28:31Z

> I think clusters have really stable IDs (I designed the algorithm so that IDs are stable), so this should not be an issue.

I confirm that it is quite stable, especially once you finish analyzing your current photos and only add a few new photos. But while analyzing progressively from scratch, I did lose some names.. generally when two or more important clusters merge to create the main cluster. After that, it is difficult to lose any info.. 😉

> Your idea is similar, except that you "merge" the representative face into oc_facerec_relations directly, if I understood you correctly?

Yes. This complicates the SQL queries a bit, since what matters to the frontend is the person, but apart from the user deleting a photo, as you say, I can trust the relationships between faces.

It is very likely that when two clusters merge <I'm not sure yet!>, the representative face of the resulting cluster changes to a new one. In that case, a new row (and all its comparisons..) is created, but the previous relationships will not be removed, since as relations between faces they are still valid.
I am also concerned that this table will grow indefinitely, and I will need to keep an eye on it.

> Don't forget to remove all rows in relations when an image is deleted (and its faces are deleted).

Yes.

> This also means you need to handle the fact that this table will not always have rows for each person.

Once the clusters are created, the rows should exist.. but in any case, it will just report that there are no suggestions..

> Other than that, both approaches are functional, I think. I guess this is good:)

Ok. 😄

matiasdelellis · 2020-04-25T00:18:02Z

Ok.. First test with sensitivity 0.4 and deviation 0.1

[matias@nube nextcloud]$ sudo -u apache php occ face:reset --clustering -u user -vvvvv
Reset clustering done
[matias@nube nextcloud]$ sudo -u apache php occ face:background_job -u user -t 25200 -vvvv
6/10 - Executing task CreateClustersTask (Create new persons or update existing persons)
	979 faces found for clustering
	487 persons found after clustering
	1433 relations added as suggestions
[matias@nube nextcloud]$
[matias@nube nextcloud]$ sudo -u apache php occ face:background_job -u user -t 25200 -vvvv
6/10 - Executing task CreateClustersTask (Create new persons or update existing persons)
	Found 0 faces without associated persons for user user and model 1
	Found 0 changed persons for user user and model 1
	Clusters already exist, but there was some change that requires recreating the clusters
	979 faces found for clustering
	444 persons found after clustering
	3 relations added as suggestions
[matias@nube nextcloud]$

After resetting the clustering, I renamed the first Sheldon cluster and accepted all valid suggestions. These are all the clusters affected.

I then forced a cluster update, which now takes relations into account.

In summary, 44 clusters were joined. After doing this with the main clusters, I reduced the number of clusters from 487 to 274 persons. The 3 individual groups are strange: they are related by name, but did not join any other cluster. I have to investigate..

Note that not all clusters are joined completely, because I am strictly applying only the proposed relations; however, new relations could be created between all the clusters once the user finishes accepting them.

Performance, at least on the first run, is obviously slower, and I'm not sure whether it is slower than before dlib_vector_length(). But as long as it is optional, I guess it is an acceptable feature..

stalker314314 · 2020-04-25T12:29:41Z

lib/BackgroundJob/Tasks/CreateClustersTask.php

@@ -252,6 +254,15 @@ private function getNewClusters(array $faces): array {
 				}
 				for ($j = $i, $face_count2 = count($faces); $j < $face_count2; $j++) {
 					$face2 = $faces[$j];
+					if ($this->relationMapper->existsOnMatrix($face1->id, $face2->id, $relations)) {


OOooh, I liiiike this idea!:)

On the other hand, now that I think about it... this will affect cluster creation. We might end up with those faces in the same bucket, which is OK, but the next execution will "split" those faces into two clusters again, and the user will have to accept the merges again.

So, consider keeping those two systems ("automatic cluster creation" and "manual approvals") decoupled, as coupling them can lead to unintended feedback loops.

> OOooh, I liiiike this idea!:)

I guess now you understand why I insisted on implementing the relations tables with faces, and not with persons. 😅

> On the other hand, now that I think about it... this will affect cluster creation. We might end up with those faces in the same bucket, which is OK, but the next execution will "split" those faces into two clusters again, and the user will have to accept the merges again.

Under what circumstances do you think it will be split again?
At this point, we only add (or avoid adding) edges, and they are processed by chinese_whispers just as always. These edges are repeated in every execution, so it will be as stable as it has been until now.
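A sketch of the idea behind `existsOnMatrix()` as described here, not the PR's code: when building the edge list for chinese_whispers, an accepted relation forces an edge, a rejected one suppresses it, and every other pair falls back to the distance test. The N x N matrix indexed by face position, the state constants and the function names are assumptions for illustration.

```javascript
// Hypothetical sketch, not the PR's code: how stored relations could change
// the edge list given to chinese_whispers. Relations live in an N x N matrix
// indexed by face position (an assumed layout) so each lookup is O(1).
const REJECTED = 1;
const ACCEPTED = 2;

function relationMatrix(n, rows) {
  const m = Array.from({ length: n }, () => new Array(n).fill(undefined));
  for (const { i, j, state } of rows) {
    m[i][j] = state; // store both directions so (i, j) and (j, i) agree
    m[j][i] = state;
  }
  return m;
}

function buildEdgesWithRelations(descriptors, sensitivity, relations) {
  const edges = [];
  for (let i = 0; i < descriptors.length; i++) {
    for (let j = i; j < descriptors.length; j++) {
      const state = relations[i][j];
      if (state === ACCEPTED) {
        edges.push([i, j]); // user said "same person": force the edge
        continue;
      }
      if (state === REJECTED) continue; // user said "different": no edge
      const a = descriptors[i];
      const b = descriptors[j];
      if (Math.hypot(...a.map((x, k) => x - b[k])) < sensitivity) edges.push([i, j]);
    }
  }
  return edges;
}
```

Since the same relations are applied on every run, the edge list stays as repeatable as before; whether chinese_whispers then actually merges the two clusters is a separate question.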

Joining multiple clusters can change the main face of the resulting cluster, but since these are face relations, they remain valid and we reuse them in every execution, regardless of the resulting clusters.

In any case, when the main faces change, new face relations will be created. In the example in my last comment, 44 clusters were merged, and just 4 new relations were created.
I was concerned that this table would grow a lot, but it seems to stay small, and in any case, we can remove the proposed relations that were not accepted before adding the new ones.

> So, consider keeping those two systems ("automatic cluster creation" and "manual approvals") decoupled, as coupling them can lead to unintended feedback loops.

I don't quite understand what you mean, but maybe the previous comments will clarify it.

The only worrying point is that we need something like undo, in case the user accepts a wrong face by mistake.

I see. So, let's see how this plays out!:)

It now also filters the remaining proposals (auto-accepting or auto-rejecting them) while you accept proposals. This ensures that as many relations as possible are applied while you have to accept fewer suggestions. Also, no change is applied until you finish with all the suggestions, so if you close the dialog, nothing is applied.
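One possible reading of the dialog behaviour above, sketched with hypothetical names: decisions are only collected while the dialog is open and applied in `finish()`, and each acceptance auto-accepts any pending proposal whose two persons are already joined through earlier acceptances (tracked with a small union-find). Auto-rejection is left out because its rule is not described.

```javascript
// Hypothetical sketch of a suggestion dialog session. Proposals look like
// { id, personA, personB }. Decisions are only recorded here; nothing is
// persisted until finish() is called, so closing the dialog discards them.
class SuggestionSession {
  constructor(proposals) {
    this.pending = [...proposals];
    this.decisions = new Map(); // proposal id -> decision string
    this.parent = new Map(); // union-find over persons joined so far
  }

  find(x) {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    return root;
  }

  union(a, b) {
    this.parent.set(this.find(a), this.find(b));
  }

  // The proposal currently shown to the user, or null when done.
  next() {
    return this.pending.length > 0 ? this.pending[0] : null;
  }

  // Record the answer ("accepted", "rejected", "unsure", ...) for the
  // current proposal.
  decide(decision) {
    const current = this.pending.shift();
    if (current === undefined) throw new Error("no pending proposal");
    this.decisions.set(current.id, decision);
    if (decision !== "accepted") return;
    this.union(current.personA, current.personB);
    // Auto-accept pending proposals whose persons are now already joined.
    this.pending = this.pending.filter((p) => {
      if (this.find(p.personA) !== this.find(p.personB)) return true;
      this.decisions.set(p.id, "accepted");
      return false;
    });
  }

  // Apply every recorded decision at once.
  finish(apply) {
    for (const [id, decision] of this.decisions) apply(id, decision);
  }
}
```

An "I am not sure" answer can be recorded as `"unsure"` and left as a proposal by whatever `apply` callback persists the decisions.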

matiasdelellis · 2020-05-03T22:41:52Z

Summary of the current status. As a proof of concept, it was excellent. 😬 The suggestions of similar persons are very good, but my assumption that clusters could be joined by forcing face edges is not entirely correct. 😞

Indeed, most faces are joined correctly, but some always stay separate, and it looks like a never-ending process. For example, when I add a face of a duplicated person, the one I add joins well, but a second one splits off and I have to join it too.

This leads me to rethink it and go back to what was originally planned. We need a higher-level table that represents Persons (Id, Name), linked to the clusters through another table (Id, person_id, cluster_id), and the current persons table must become the clusters table.

From what has been developed here, we can keep the suggestions and dialogs, but the logic of the background job, joining persons, and much more will be different.

stalker314314 · 2020-05-03T22:53:20Z

Too bad :(

If you go with this idea (Id, person_id, cluster_id), does that mean persons are created dynamically? As soon as I merge two clusters, I get a person, right? Or is a person created together with the cluster? Maybe you still need to keep the state, as before, but with this schema I am not sure where it would reside.

matiasdelellis · 2020-05-03T23:11:53Z

> If you go with this idea (Id, person_id, cluster_id), does that mean persons are created dynamically? As soon as I merge two clusters, I get a person, right? Or is a person created together with the cluster?

Yes.. persons are created by the user, while clusters are what we now call persons, and they are created exactly as they are now. We must trust chinese_whispers with the same considerations.

When a name is assigned for the first time, a person is created, and the cluster is linked to it. If the name is then assigned to another cluster, that cluster is linked to the existing person. The SQL queries would be just a little more complicated than now.
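A minimal in-memory sketch of the planned schema: persons `(id, name)`, a link table `(id, person_id, cluster_id)`, and clusters created as today. `PersonDirectory` is hypothetical, and so is the rule that a cluster links to at most one person (renaming a cluster moves its link).

```javascript
// Hypothetical in-memory model of the planned schema (not the app's tables):
// persons (id, name) are created by naming, and a link table
// (id, personId, clusterId) relates chinese_whispers clusters to persons.
class PersonDirectory {
  constructor() {
    this.persons = []; // { id, name }
    this.links = []; // { id, personId, clusterId }
    this.nextPersonId = 1;
    this.nextLinkId = 1;
  }

  // Assigning a name creates the person the first time it is used and
  // reuses the existing person afterwards.
  nameCluster(clusterId, name) {
    let person = this.persons.find((p) => p.name === name);
    if (person === undefined) {
      person = { id: this.nextPersonId++, name };
      this.persons.push(person);
    }
    // Assumption: a cluster belongs to at most one person, so renaming it
    // replaces any previous link.
    this.links = this.links.filter((l) => l.clusterId !== clusterId);
    this.links.push({ id: this.nextLinkId++, personId: person.id, clusterId });
    return person;
  }

  // All cluster ids currently linked to the person with this name.
  clustersOf(name) {
    const person = this.persons.find((p) => p.name === name);
    if (person === undefined) return [];
    return this.links
      .filter((l) => l.personId === person.id)
      .map((l) => l.clusterId);
  }
}
```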

> Maybe you still need to keep the state, as before, but with this schema I am not sure where it would reside.

Probably. I don't know yet.
Of course, it seems simple, but we will have to see it in practice.. 😅

Suggest similar person to rename them

493fd25

matiasdelellis requested a review from stalker314314 April 18, 2020 15:12

matiasdelellis added 2 commits April 19, 2020 12:11

Add a "deviation" setting to control the suggestions.

333d649

By default, the deviation is 0.0, so the feature is disabled. It also depends on pdlib 1.0.2 for dlib_vector_length()

ot show any message to the user if the suggestion is disabled

e73d272

stalker314314 approved these changes Apr 20, 2020

stalker314314 reviewed Apr 20, 2020

stalker314314 approved these changes Apr 20, 2020

matiasdelellis added 2 commits April 21, 2020 10:26

Add Relation table, and fill after create clusters

200d3c8

Migrate suggestions to proposed as relation

1dc82a3

stalker314314 approved these changes Apr 21, 2020

matiasdelellis added 6 commits April 21, 2020 23:25

Add Relation controller and use it

3431b13

Add an true 'I am not sure' button, that not reject completely the proposal

218439c

Fix query to find Relations between two persons.

18b209e

Also search inverted condition when find relations

d95067e

Some changes to optimize the filling of relations and print some info about it

db8a672

Use array as NxN array to optimize search for relations.

e726cff

Drop relations when resetting clusters

matiasdelellis force-pushed the find-similar branch from 3204f0a to e726cff April 24, 2020 22:13

Proof of concept on improving clustering according to relation.

a578af5

Doh. Also don't compare a good face with a bad one!

f7a85f9

stalker314314 reviewed Apr 25, 2020

matiasdelellis added 2 commits April 28, 2020 12:21

Also, dont suggest persons/faces with less than minimum confidence.

0b22b79

Take into account state when find first relations

8189e17

matiasdelellis changed the title ~~Suggest similar person to rename them~~ Proof of concept: Suggest similar person to rename them Apr 30, 2020

stalker314314 mentioned this pull request Jun 6, 2020

[Draft] Delete single recognized face #280

Open

3 tasks

This was referenced Aug 1, 2020

Remove the preference for the minimum size of faces but it remains as… #310

Merged

[brainstorm] Pull ideas from Digikam #303

Open

This was referenced Oct 7, 2020

Bye bye 'New person..' #336

Merged

Join clusters together... #134

Open

matiasdelellis mentioned this pull request Nov 2, 2020

Assign name to single face from main view #369

Open

matiasdelellis mentioned this pull request Nov 17, 2020

How to scan Groupfolders? #364

Closed

matiasdelellis mentioned this pull request May 22, 2024

Feature request: Add auto suggestions to name input #737

Open
