Neo4j Online Meetup 2017-11-30 Materials #3

dhimmel opened this issue Nov 30, 2017 · 3 comments

dhimmel commented Nov 30, 2017

How Project Rephetio used Neo4j to predict drug repurposing

Thursday, November 30, 2017 on YouTube. Below is the event description from Meetup:

This meetup will explore Hetionet (https://neo4j.het.io), a public Neo4j database that encodes biomedical knowledge. Hetionet v1.0 contains 47,031 nodes of 11 types and 2,250,197 relationships of 24 types.

Project Rephetio applied Hetionet to predict new uses for existing compounds, an act called drug repurposing. We'll discuss the Cypher implementation of the algorithms used for relationship prediction on hetnets (networks with multiple node and relationship types).

We'll be taking questions live during the session, but if you have any beforehand, be sure to post them in the #neo4j-online-meetup channel of the Neo4j users Slack.

We'll be hosting this session on YouTube live.

Time

09:00 PST (UTC - 8 hours)
12:00 EST (UTC - 5 hours)
17:00 UTC
18:00 CET (UTC + 1 hour)

About The Speaker

Daniel Himmelstein, a data scientist at the University of Pennsylvania, will lead the meetup.

Previously, Daniel has discussed Project Rephetio at GraphConnect 2016 and on the Graphistania podcast.

In addition, an introductory GraphGist on the project won the Open/Government Data category of the 2016 GraphGist Challenge.

dhimmel commented Nov 30, 2017

Meetup Outline

This meetup will go over how we used Neo4j in our study titled Project Rephetio:

[Image: rephetio-head]

Project Rephetio is also available on Thinklab and as a Manubot manuscript. This project had two parts:

  1. Creating Hetionet, a hetnet of biomedical knowledge
  2. Predicting new uses for existing compounds (drugs)

Hetionet

Project Rephetio

Advanced Cypher

We'll go over computing degree-weighted path counts (DWPCs) in Cypher (discussion) through a series of steps.

Trails

Path count from Bupropion to nicotine dependence for the Compound–binds–Gene–participates–Pathway–participates–Gene–associates–Disease metapath:

MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
RETURN path

Note that each relationship type name ends with a metaedge abbreviation (e.g. GpPW in PARTICIPATES_GpPW), so every type is uniquely named, which optimizes querying.

Modify the RETURN statement to return a table of node names instead:

RETURN extract(node IN nodes(path) | node.name)

Or just return the path/trail count:

RETURN count(path) AS PC

Paths

Add the following condition to the WHERE clause to exclude paths with duplicate nodes (discussion):

  AND n1 <> n3
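
Why only n1 and n3? Assuming every node has a single type, two positions in a path can only hold the same node if the metapath gives them the same node type. Here the node types are Compound, Gene, Pathway, Gene, Disease, so only the two Gene positions can collide. A minimal Python sketch of that reasoning (the function name `duplicate_node_conditions` is made up for illustration and is not part of the Rephetio code):

```python
from itertools import combinations


def duplicate_node_conditions(node_types):
    """Return Cypher inequality conditions for positions sharing a node type.

    Assumes each node has exactly one type, so only same-typed positions
    along the metapath could ever refer to the same node.
    """
    conditions = []
    for (i, type_i), (j, type_j) in combinations(enumerate(node_types), 2):
        if type_i == type_j:
            conditions.append(f"n{i} <> n{j}")
    return conditions


# Node types along the metapath used in this walkthrough
metapath_types = ["Compound", "Gene", "Pathway", "Gene", "Disease"]
print(" AND ".join(duplicate_node_conditions(metapath_types)))
```

For longer metapaths with more repeated node types, the same idea yields one condition per same-typed pair of positions.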

Optimize the query with a join hint on the middle node (discussion; see neo4j/neo4j#6030 for a radical proposal):

USING JOIN ON n2

Degree-weighted paths

Extract the degrees along each path to compute a path_weight (also known as a "path-degree product"):

WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n2)),
  size((n2)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n3)),
  size((n3)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n4))
] AS degrees, path
RETURN
  path,
  reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4) AS path_weight
ORDER BY path_weight DESC
LIMIT 10
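
To make the reduce expression concrete, here is a rough Python equivalent. It is only a sketch: the function name `path_weight`, the `damping` parameter name, and the degree values are made up for illustration. Only the 0.4 exponent comes from the query above.

```python
from functools import reduce


def path_weight(degrees, damping=0.4):
    """Multiply each degree raised to -damping, starting from 1.0.

    Mirrors reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4) in the query.
    """
    return reduce(lambda pdp, d: pdp * d ** -damping, degrees, 1.0)


# Made-up degree lists: one path through low-degree nodes, one through hubs
specific_path = [2, 1, 3, 4, 4, 2, 5, 1]
hub_path = [200, 100, 300, 400, 400, 200, 500, 100]

print(path_weight(specific_path))
print(path_weight(hub_path))
```

Paths through high-degree nodes get a smaller weight, so they contribute less once the weights are summed.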

Sum weights for all paths to compute the DWPC:

RETURN
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4)) AS DWPC
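
For anyone who wants to check the arithmetic without a Neo4j instance, the whole computation can be sketched in Python on a toy graph. Everything here is an assumption for illustration: the node names (C, G1, P1, D, ...), the edges, and the helper functions are made up and are not Hetionet data or the Rephetio implementation. Only the relationship type names and the 0.4 exponent are taken from the queries above.

```python
from collections import defaultdict

# Hypothetical toy hetnet: (node, relationship type, node), matched undirected
EDGES = [
    ("C", "BINDS_CbG", "G1"),
    ("C", "BINDS_CbG", "G2"),
    ("G1", "PARTICIPATES_GpPW", "P1"),
    ("G2", "PARTICIPATES_GpPW", "P1"),
    ("G3", "PARTICIPATES_GpPW", "P1"),
    ("D", "ASSOCIATES_DaG", "G1"),
    ("D", "ASSOCIATES_DaG", "G3"),
]
METAPATH = ["BINDS_CbG", "PARTICIPATES_GpPW", "PARTICIPATES_GpPW", "ASSOCIATES_DaG"]


def build_adjacency(edges):
    """Map (node, relationship type) to the list of neighbors, both directions."""
    adjacency = defaultdict(list)
    for a, rel_type, b in edges:
        adjacency[(a, rel_type)].append(b)
        adjacency[(b, rel_type)].append(a)
    return adjacency


def find_paths(adjacency, source, target, metapath):
    """Yield node lists that follow the metapath and never repeat a node."""
    def extend(nodes):
        step = len(nodes) - 1
        if step == len(metapath):
            if nodes[-1] == target:
                yield nodes
            return
        for neighbor in adjacency.get((nodes[-1], metapath[step]), []):
            if neighbor not in nodes:
                yield from extend(nodes + [neighbor])
    yield from extend([source])


def path_degrees(adjacency, nodes, metapath):
    """Two degrees per relationship: one for each endpoint, for that type."""
    degrees = []
    for (a, b), rel_type in zip(zip(nodes, nodes[1:]), metapath):
        degrees.append(len(adjacency[(a, rel_type)]))
        degrees.append(len(adjacency[(b, rel_type)]))
    return degrees


def dwpc(edges, source, target, metapath, damping=0.4):
    """Return (path count, degree-weighted path count)."""
    adjacency = build_adjacency(edges)
    path_count, total = 0, 0.0
    for nodes in find_paths(adjacency, source, target, metapath):
        weight = 1.0
        for degree in path_degrees(adjacency, nodes, metapath):
            weight *= degree ** -damping
        path_count += 1
        total += weight
    return path_count, total


pc, score = dwpc(EDGES, "C", "D", METAPATH)
print("PC:", pc, "DWPC:", score)
```

In this toy graph there are three qualifying paths, and each has the degree list [2, 1, 1, 3, 3, 1, 1, 2], so the DWPC is three times the weight of a single path.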

Putting it all together:

MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-
  (n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
USING JOIN ON n2
WHERE n0.name = 'Bupropion'
  AND n4.name = 'nicotine dependence'
  AND n1 <> n3
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n2)),
  size((n2)-[:PARTICIPATES_GpPW]-()),
  size(()-[:PARTICIPATES_GpPW]-(n3)),
  size((n3)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n4))
] AS degrees, path
RETURN
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4)) AS DWPC

@hooligian

Trying to access https://neo4j.het.io/browser/, but I'm getting a "WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver." error. A little digging on the net indicated that the neo4j.conf file would need to be updated to allow remote browser connections.

dhimmel commented Nov 30, 2017

@hooligian odd! I'm just as remote as you I believe. Can you try again? Or perhaps in a different browser? https://neo4j.het.io
