Unable to read any data using custom schemas #918

Open
4 tasks done
swayam18 opened this issue May 27, 2019 · 10 comments

Comments

@swayam18

swayam18 commented May 27, 2019

I am unable to read any data from Neo4j using this library and a custom schema.

Here is how I am setting up the connection:

object ExtractUsernames extends MorpheusApp {
  val neo4j = connectNeo4j("bolt://xx.xx.xx.xx:7687", "neo4j", "xxxxxxxxx")
  implicit val morpheus: MorpheusSession = MorpheusSession.local()

  val schemaFile = Source.fromFile(getClass.getResource("/schema.json").getPath).getLines.mkString
  val schema = PropertyGraphSchema.fromJson(schemaFile)
  private val datasource = GraphSources.cypher.neo4j(neo4j.config, Some(schema))

  morpheus.registerSource(Namespace("Neo4j"), datasource)
}

And this is what our schema.json looks like:

{
  "version": 1,
  "labelPropertyMap": [
    {
      "labels": [
        "User"
      ],
      "properties": {
        "username": "STRING"
      }
    }
  ],
  "relTypePropertyMap": []
}

This is the query I am trying to run:

val result = morpheus.cypher(
  s"""
     |FROM Neo4j.graph
     |MATCH (i:User)
     |RETURN i.username
   """.stripMargin)

result.show

However, when I run the query, I get an empty table:

╔════════════╗
║ i.username ║
╚════════════╝
(no rows)

I have verified that:

  • The schema file can be read.
  • Neo4j is accessible and the port is open to the machine running this code.
  • There is a graph called Neo4j.graph available in morpheus.catalog.graphNames.
  • The same problem occurs with at least two versions of Neo4j: 3.2.0 and 3.4.1.

Morpheus version: 0.4.0
Spark version: 2.4.3
Neo4j version: 3.2.0 and 3.4.1

Does anyone have any idea as to what is going on?

@swayam18
Author

swayam18 commented May 29, 2019

Does the team need me to add any more details? @s1ck

@s1ck
Contributor

s1ck commented May 29, 2019

I could not reproduce the error on master, Spark 2.4.3 and Neo4j 3.4.1.

Does the query return any results when executed in Neo4j, e.g. via browser? Maybe you misspelled the label (lower/uppercase)?

AFAIK there were no recent changes to the Neo4j data source, but you could still try Morpheus 0.4.1.

@swayam18
Author

I ran the same query in the Neo4j browser and it works fine.

Did you try your test with a custom schema?

@s1ck
Contributor

s1ck commented Jun 10, 2019

Yes, I did the exact same experiment.

@Mats-SX
Member

Mats-SX commented Jun 17, 2019

Closing this as not reproducible. Please reopen if this is still an issue.

@Mats-SX Mats-SX closed this as completed Jun 17, 2019
@swayam18
Author

@Mats-SX Can the custom schema be a subset of the actual graph schema? I still can't get it to work, even after updating to Morpheus 0.4.2.

@swayam18
Author

swayam18 commented Jul 25, 2019

@Mats-SX I figured out that this happens when nodes have more than one label on them.
You can reproduce this by adding another label, say :Offline to every :User, without adding this new label to the schema.

I found a query being run which is related to this:

MATCH (e:`User`)
WHERE length(labels(e)) = 1
RETURN id(e) AS ___morpheusID, e.username

Why does Morpheus check the number of labels on a node?

@Mats-SX Mats-SX reopened this Jul 26, 2019
@Mats-SX
Member

Mats-SX commented Jul 26, 2019

Ah, that makes sense! That's an interesting idea. I guess at this moment no, we don't allow the schema to be a subset of the actual schema, but I can definitely see why you would be interested in that sort of feature. I don't see directly why we wouldn't allow that, but I'll need to discuss it with the team.

Nice find, thank you!

@swayam18
Author

Great, thanks for the update. Since Neo4j doesn't enforce a strict schema, it makes sense to let users decide which properties they care about. On your end, you can simply reject the Spark job if there is a mismatch between the data being read and the schema it expects.
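A rough sketch of what such a fail-fast check could look like, in plain Scala with no Morpheus dependency. `SchemaCheck`, `uncovered`, and `requireCovered` are hypothetical names, and the sketch assumes the label combinations present in the data have already been collected by some other means:

```scala
// Hypothetical fail-fast check; not part of the Morpheus API.
object SchemaCheck {
  // Label combinations present in the data but not declared in the schema.
  def uncovered(dataCombos: Set[Set[String]],
                schemaCombos: Set[Set[String]]): Set[Set[String]] =
    dataCombos.diff(schemaCombos)

  // Throws IllegalArgumentException instead of silently returning no rows.
  def requireCovered(dataCombos: Set[Set[String]],
                     schemaCombos: Set[Set[String]]): Unit = {
    val missing = uncovered(dataCombos, schemaCombos)
    val rendered = missing.map(_.mkString(":", ":", "")).mkString(", ")
    require(missing.isEmpty, s"Label combinations missing from schema: $rendered")
  }
}
```

With a schema that declares only `Set("User")` and data that contains `Set("User", "Offline")`, `requireCovered` would throw rather than let the query come back empty.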

@DarthMax
Contributor

DarthMax commented Jul 30, 2019

Actually there is a way one could achieve this.
The problem is about how the Morpheus schema represents nodes internally. Nodes are grouped by their label set, e.g. :User:Offline and :User:Online. The JSON representation uses the same separation.
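
To make the grouping concrete, here is a simplified model in plain Scala. It is not Morpheus's actual implementation; `LabelSetModel`, `Node`, `matchExact`, and the sample data are hypothetical and only for illustration. It assumes a node is read only when its complete label set equals a combination declared in the schema, which is consistent with the `length(labels(e)) = 1` predicate quoted earlier in the thread:

```scala
// Simplified model of label-set grouping; not Morpheus's real code.
object LabelSetModel {
  final case class Node(labels: Set[String], username: String)

  // A node is read only if its *complete* label set is declared in the schema.
  def matchExact(nodes: Seq[Node], schemaCombos: Set[Set[String]]): Seq[Node] =
    nodes.filter(n => schemaCombos.contains(n.labels))

  val nodes: Seq[Node] = Seq(
    Node(Set("User", "Offline"), "alice"),
    Node(Set("User", "Online"), "bob")
  )

  // Schema declares only {User}: no node has exactly that label set.
  val readWithSubset: Seq[Node] = matchExact(nodes, Set(Set("User")))

  // Schema declares every combination containing User: both nodes are read.
  val readWithFull: Seq[Node] =
    matchExact(nodes, Set(Set("User", "Offline"), Set("User", "Online")))

  def main(args: Array[String]): Unit = {
    println(readWithSubset.map(_.username)) // List()
    println(readWithFull.map(_.username))   // List(alice, bob)
  }
}
```

In this model, the schema from the original report (only `User`) matches nothing as soon as every node carries a second label, which mirrors the empty table above.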

So if you want to query for all users, you have to describe every existing label combination with :User in the schema:

{
  "version": 1,
  "labelPropertyMap": [
    {
      "labels": [
        "User",
        "Offline"
      ],
      "properties": {
        "username": "STRING"
      }
    },
    {
      "labels": [
        "User",
        "Online"
      ],
      "properties": {
        "username": "STRING"
      }
    }
  ],
  "relTypePropertyMap": []
}

This will then allow you to query for every existing :User just as you did above:

FROM Neo4j.graph
MATCH (i:User)
RETURN i.username

This is of course not ideal, but will allow you to accomplish the job.
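
If there are many label combinations, writing each entry by hand gets tedious. The following is a minimal sketch of a helper that generates the JSON text. `SchemaJson`, `entry`, and `schema` are hypothetical names, not part of Morpheus, and the output only mirrors the shape of the schema.json shown in this thread. It assumes label and property names need no JSON escaping. The combinations themselves could be listed in Neo4j with something like `MATCH (n:User) RETURN DISTINCT labels(n)`.

```scala
// Sketch: build schema.json text for several label combinations.
object SchemaJson {
  // One "labelPropertyMap" entry; props are (propertyName, typeName) pairs.
  def entry(labels: Seq[String], props: Seq[(String, String)]): String = {
    val ls = labels.map(l => "\"" + l + "\"").mkString("[", ", ", "]")
    val ps = props
      .map { case (k, t) => "\"" + k + "\": \"" + t + "\"" }
      .mkString("{", ", ", "}")
    s"""{"labels": $ls, "properties": $ps}"""
  }

  // Full schema document with the same properties on every combination.
  def schema(combos: Seq[Seq[String]], props: Seq[(String, String)]): String = {
    val entries = combos.map(entry(_, props)).mkString(",\n    ")
    s"""{
       |  "version": 1,
       |  "labelPropertyMap": [
       |    $entries
       |  ],
       |  "relTypePropertyMap": []
       |}""".stripMargin
  }
}
```

For example, `SchemaJson.schema(Seq(Seq("User", "Offline"), Seq("User", "Online")), Seq("username" -> "STRING"))` yields a document with the same two entries as the hand-written schema above, which could then be passed to `PropertyGraphSchema.fromJson` as in the original setup.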
