> "To make sure everyone understands that, I prefer label property graphs over RDF."
I have two major issues with virtually all graph DBMSs that are not RDF/SPARQL-based:
1) They do not allow structure-preserving querying. That is, I query a graph and want the result to be a smaller graph. This is trivial in SQL: you just write 'SELECT * FROM x WHERE ...' and the result set you get is tabular, just like the table x. In SPARQL, there are CONSTRUCT and DESCRIBE queries that do exactly that: they give you the results as a graph.
2) They don't use any (internationally recognized) standard to represent graph data. RDF is the only such format known to me (ignore all the semantic web stuff associated with it and just consider the format).
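To make point 1 concrete, here is a minimal sketch in plain Python, assuming a graph is nothing more than a set of (subject, predicate, object) triples. The `match` function and the `ex:` names are hypothetical illustrations, not any RDF store's API. The point is only that the result has the same shape as the input, so it can be queried again, which is roughly what a SPARQL `CONSTRUCT { ?s ex:worksOn ?o } WHERE { ?s ex:worksOn ?o }` gives you.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def match(graph: Graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Graph:
    """Return the sub-graph of triples matching the pattern (None = wildcard).

    The result is again a set of triples, so it can be fed straight back
    into match(), much like the graph produced by a SPARQL CONSTRUCT.
    """
    return {
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    }


# Hypothetical example data.
g: Graph = {
    ("ex:alice", "ex:worksOn", "ex:ml"),
    ("ex:alice", "ex:memberOf", "ex:teamA"),
    ("ex:bob", "ex:worksOn", "ex:propulsion"),
    ("ex:bob", "ex:memberOf", "ex:teamA"),
}

sub = match(g, p="ex:worksOn")          # a smaller graph, not a table
alice_only = match(sub, s="ex:alice")   # the result can be queried again
print(sorted(alice_only))
```

A real store would of course index the triples instead of scanning them; the scan just keeps the closure property easy to see.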
230k edges is peanuts for a graph DB. It's comparable to a SQL database where rows times columns add up to 230k. NASA could (should?) have just used Oxigraph, RDF4J, or Jena; Stardog and Ontotext are the paid options. Still, it is quite nice to see more interest in graph-based DBMSs in general!
> “Which employees have cross-disciplinary expertise in AI/ML?”
Regarding the study itself, I did not understand who the target user is. I would be more interested in the Lessons Learned 2.0 study (I understand it was attempted once before [1]). I don't think the study at hand would be able to correctly answer questions about expertise.
On the technical side, if I understand correctly, the cosine similarity was computed per triplet? In that case, I could see how pgvector could be used for this. Relevance expansion is the only thing in the article that I thought would be cool, if it works well. But with a combination of a regular RDF DBMS and pgvector, one could first run a cosine similarity query in pgvector and then compute an (S)CBD [2] of the subject (the "from" node) of the matching triplet.
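The pipeline suggested above (vector search first, then a bounded description of the hit's subject) could look roughly like this. It is a sketch under several assumptions: the triple embeddings are toy 2-D vectors made up for illustration, the in-memory ranking stands in for a pgvector query ordered by its `<=>` cosine-distance operator, blank nodes are recognised by the N-Triples `_:` prefix, and reification statements, which the full CBD definition also pulls in, are ignored. A symmetric CBD would additionally collect triples where the node appears as the object.

```python
import math
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_triples(query_vec: List[float], embeddings: Dict[Triple, List[float]],
                  k: int) -> List[Triple]:
    # Stand-in for something like: ORDER BY embedding <=> query LIMIT k in pgvector.
    return sorted(embeddings, key=lambda t: cosine(query_vec, embeddings[t]),
                  reverse=True)[:k]


def cbd(graph: Set[Triple], node: str) -> Set[Triple]:
    """Concise Bounded Description (without reification): every triple with
    `node` as subject, recursing into objects that are blank nodes ('_:')."""
    result: Set[Triple] = set()
    stack, seen = [node], set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        for t in graph:
            if t[0] == n:
                result.add(t)
                if t[2].startswith("_:"):
                    stack.append(t[2])
    return result


# Hypothetical data and embeddings.
graph: Set[Triple] = {
    ("ex:alice", "ex:expertIn", "ex:ml"),
    ("ex:alice", "ex:hasRole", "_:r1"),
    ("_:r1", "ex:title", '"Data scientist"'),
    ("ex:bob", "ex:expertIn", "ex:propulsion"),
}
embeddings: Dict[Triple, List[float]] = {
    ("ex:alice", "ex:expertIn", "ex:ml"): [0.9, 0.1],
    ("ex:bob", "ex:expertIn", "ex:propulsion"): [0.1, 0.9],
}

hits = top_k_triples([1.0, 0.0], embeddings, k=1)   # most similar triple
description = cbd(graph, hits[0][0])                # CBD of its subject
print(sorted(description))
```

Returning a CBD rather than the single matching triple keeps the answer a (small) graph, which ties back to the structure-preserving point above.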
"They do not allow structure-preserving querying. That is, I query a graph and want the results to be a smaller graph."
I'm not sure what you mean by this. In Neo4j, the result of a query can be a set of nodes together with the relationships linking them. That is much more flexible than SQL, where a query can only return a single table.
> "In the RETURN part of your query, you define which parts of the pattern you are interested in. It can be nodes, relationships, or properties on these."
You can return all nodes, relationships, and paths that match a query by using this syntax:
MATCH p = (a {name: 'A'})-[r]->(b)
RETURN *
This is the exact opposite of a rectangular result set.
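If the matched paths are consumed as one record per match, as a driver typically streams them, the same start node reappears in every record. A client that wants a single subgraph can fold the records together by de-duplicating nodes and relationships. The sketch below assumes a simplified, hypothetical record layout (plain tuples in place of the driver's node and relationship objects), so it illustrates the idea rather than any Neo4j API.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]        # hypothetical: (node id, name property)
Rel = Tuple[str, str, str]    # hypothetical: (start node id, type, end node id)
Record = Dict[str, tuple]

# Hypothetical records for MATCH p = (a {name: 'A'})-[r]->(b) RETURN *
records: List[Record] = [
    {"a": ("n1", "A"), "r": ("n1", "KNOWS", "n2"), "b": ("n2", "B")},
    {"a": ("n1", "A"), "r": ("n1", "KNOWS", "n3"), "b": ("n3", "C")},
]


def to_subgraph(rows: List[Record]) -> Tuple[Set[Node], Set[Rel]]:
    """Merge per-match records into one de-duplicated set of nodes and edges."""
    nodes: Set[Node] = set()
    rels: Set[Rel] = set()
    for row in rows:
        nodes.add(row["a"])
        nodes.add(row["b"])
        rels.add(row["r"])
    return nodes, rels


nodes, rels = to_subgraph(records)
print(len(nodes), len(rels))
```

Here node A appears in both records but only once in the merged node set.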
[1]: https://youtu.be/QEBVoultYJg?t=1653
[2]: https://patterns.dataincubator.org/book/bounded-description....