Please provide community similarity algorithms #244

Open
johnlinp opened this issue Jan 27, 2023 · 2 comments
Labels
feature request A suggestion for a new feature

Comments

@johnlinp

Is your feature request related to a problem? Please describe.
I have a graph of social media data. I used community detection algorithms (e.g. Louvain) to detect different sets of communities based on different properties, such as location and timestamp. As a result, I have one set of communities detected from the location of the data and another set detected from its timestamp.

My next step is to measure the similarity between these sets of communities. Algorithms such as the Rand Index appear to do this. Can GDS provide such algorithms? Thank you.
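
For reference, the Rand Index is usually defined by counting pairs of nodes. Here $n$ is the number of nodes covered by both sets of communities, $a$ is the number of node pairs placed in the same community by both sets, and $b$ is the number of pairs placed in different communities by both sets:

```latex
\mathrm{RI} = \frac{a + b}{\binom{n}{2}}
```

The value lies between 0 and 1, and it equals 1 when the two sets of communities group the nodes identically.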

Describe the solution you would like
I would like GDS to provide community similarity algorithms, e.g. the Rand Index.

Describe alternatives you have considered
If GDS doesn't provide them, I'll have to implement them on my own.

@johnlinp johnlinp added the feature request A suggestion for a new feature label Jan 27, 2023
@johnlinp
Author

If anyone needs a simple Rand Index implementation, here it is.

Assume that we are analyzing a set of social media posts (:Post). We have run Louvain community detection twice, based on two different attributes, and stored the results as community_1_id and community_2_id on the nodes. The Rand Index between these two sets of communities can then be calculated as follows:

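// a: pairs in the same community under both assignments.
// id(n) < id(m) makes sure each unordered pair of posts is counted once.
CALL {
  MATCH (n:Post)
  MATCH (m:Post)
  WHERE id(n) < id(m)
    AND n.community_1_id = m.community_1_id
    AND n.community_2_id = m.community_2_id
  RETURN count(*) AS a
}
// b: pairs in different communities under both assignments.
CALL {
  MATCH (n:Post)
  MATCH (m:Post)
  WHERE id(n) < id(m)
    AND n.community_1_id <> m.community_1_id
    AND n.community_2_id <> m.community_2_id
  RETURN count(*) AS b
}
// c: pairs together under assignment 1 but apart under assignment 2.
CALL {
  MATCH (n:Post)
  MATCH (m:Post)
  WHERE id(n) < id(m)
    AND n.community_1_id = m.community_1_id
    AND n.community_2_id <> m.community_2_id
  RETURN count(*) AS c
}
// d: pairs apart under assignment 1 but together under assignment 2.
CALL {
  MATCH (n:Post)
  MATCH (m:Post)
  WHERE id(n) < id(m)
    AND n.community_1_id <> m.community_1_id
    AND n.community_2_id = m.community_2_id
  RETURN count(*) AS d
}
// Rand Index: fraction of pairs on which the two assignments agree.
RETURN 1.0 * (a + b) / (a + b + c + d) AS rand_index;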
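
Each of the four subqueries above enumerates every pair of posts, so the cost grows quadratically with the number of nodes. As a way to cross-check the result outside the database, here is a minimal Python sketch. It assumes the two community IDs have been exported as two equal-length lists, one entry per post; the function names `rand_index` and `rand_index_pairwise` are made up for this example and are not part of GDS. The first function derives the same a, b, c and d from community sizes instead of looping over pairs, and the second mirrors the pairwise logic of the query.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_1, labels_2):
    """Rand Index of two community assignments over the same nodes.

    Assumption: labels_1[i] and labels_2[i] are the two community IDs
    of the same node i (e.g. community_1_id and community_2_id).
    """
    if len(labels_1) != len(labels_2):
        raise ValueError("both assignments must cover the same nodes")
    total_pairs = comb(len(labels_1), 2)
    if total_pairs == 0:
        raise ValueError("need at least two nodes")

    # Pairs that share a community under both assignments (a),
    # under assignment 1 only counted by size, and under assignment 2.
    a = sum(comb(k, 2) for k in Counter(zip(labels_1, labels_2)).values())
    together_1 = sum(comb(k, 2) for k in Counter(labels_1).values())
    together_2 = sum(comb(k, 2) for k in Counter(labels_2).values())

    c = together_1 - a            # together in 1, apart in 2
    d = together_2 - a            # apart in 1, together in 2
    b = total_pairs - a - c - d   # apart in both
    return (a + b) / total_pairs


def rand_index_pairwise(labels_1, labels_2):
    """Brute-force check that loops over every unordered pair of nodes."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_1)), 2):
        same_1 = labels_1[i] == labels_1[j]
        same_2 = labels_2[i] == labels_2[j]
        if same_1 == same_2:
            agree += 1
        total += 1
    return agree / total
```

For the assignments `[0, 0, 1, 1, 2]` and `[0, 0, 1, 2, 2]`, the two functions agree: 8 of the 10 pairs are treated the same way by both assignments, so the Rand Index is 0.8.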

@gminneci
Collaborator

Hi @johnlinp! I am a product manager at Neo4j. Thank you for this feature request. We are looking at this type of feature as 'subgraph similarity', but don't have an implementation plan just yet. Great to see that you have an implementation already. How is it working for you? Are there any specific limitations in what you are trying to achieve that you'd like to mention?
