GraphSlop is a curated catalogue of failures in graph extraction, GraphRAG, and ontology-quality systems. This page explains the failure taxonomy, the evidence standard, and the scope of this release.
Each entry in GraphSlop is a concrete, source-backed critique of a system, paper, demo, or tool that promises structured knowledge and delivers something else. Every entry traces to a source artifact — a published paper, a public demo, a benchmark, or a reproducible example.
This is not a blog. We don't rate "good" or "bad"; we show how and why a claim collapses under inspection.
v0 includes 10 curated entries across 5 top-level failure categories. This is a tight, high-signal slice — not a comprehensive survey. We chose these categories because they account for roughly 80% of the failures we've seen so far, and each category describes a distinct structural weakness rather than a superficial symptom.
1. Scale Without Quality
Graphs that grow large without any validation, constraint checking, or entity resolution. The graph is big; the graph is wrong; the graph is loud. Signature: high node count, low precision, no provenance.
2. Ontology Without Structure
Ontologies with no hierarchy, no constraints, no domain grounding. Everything is a root node; every relation is "related to." Signature: 10+ root nodes, no subsumption, no type safety.
3. Hops Without Depth
Systems that claim multi-hop reasoning but deliver single-hop retrieval with extra steps. The reasoning chain is longer, not deeper. Signature: long chains, no cross-hop validation, confidence stays high.
4. Confidence Without Curation
Models trained or evaluated on noisy, uncurated, or adversarial corpora that still produce confident predictions. The graph reflects the noise, not the signal. Signature: high confidence on low-quality input; no abstention mechanism.
5. Schema Without Contract
Ontologies and schemas that change between runs because the extraction prompt is paraphrased rather than enforced. The schema is a suggestion, not a contract. Signature: type renaming across deployments; predicate count drifts; no versioning.
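Several of these signatures are mechanically checkable. As a minimal sketch of the schema-drift signature (the function and predicate names here are illustrative, not part of GraphSlop), drift between two extraction runs over the same corpus can be surfaced by diffing their predicate vocabularies:

```python
def predicate_drift(run_a: set[str], run_b: set[str]) -> dict[str, set[str]]:
    """Compare the predicate vocabularies of two extraction runs.

    Non-empty 'added' or 'removed' sets between runs over the same
    corpus are the drift signature described above: the schema behaved
    as a suggestion, not a contract.
    """
    return {
        "added": run_b - run_a,    # predicates that appeared out of nowhere
        "removed": run_a - run_b,  # predicates that silently vanished
        "stable": run_a & run_b,   # the vocabulary both runs agree on
    }

# Two runs of the "same" extractor over the same corpus:
run_1 = {"employs", "located_in", "founded_by"}
run_2 = {"works_for", "located_in", "founded_by"}  # "employs" was renamed

drift = predicate_drift(run_1, run_2)
# drift["added"] == {"works_for"}, drift["removed"] == {"employs"}
```

A versioned schema would make this diff empty by construction; the signature fires precisely when the prompt, rather than a contract, is the only thing holding the vocabulary in place.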
Every entry in GraphSlop carries the following fields:
Source: a URL to a paper, demo, benchmark, or public artifact.
Failure class: the failure category the entry belongs to.
Pipeline stage: where in the graph pipeline the failure occurs (entity extraction, entity resolution, ontology construction, etc.).
Overreach rating: a 1–10 subjective rating of how badly the claim overreaches relative to the evidence.
Mechanism: a plain-language explanation of the failure mechanism.
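The per-entry metadata above maps naturally onto a small record type. A sketch, assuming hypothetical field names (GraphSlop does not publish a schema in this document):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One GraphSlop catalogue entry; field names are illustrative."""

    source_url: str      # paper, demo, benchmark, or public artifact
    failure_class: str   # one of the five top-level categories
    pipeline_stage: str  # e.g. "entity extraction", "entity resolution"
    overreach: int       # 1-10 subjective claim-vs-evidence rating
    mechanism: str       # plain-language explanation of the failure

    def __post_init__(self):
        # Enforce the 1-10 rating scale at construction time.
        if not 1 <= self.overreach <= 10:
            raise ValueError("overreach rating must be in 1..10")
```

Making the record frozen and validating at construction keeps the catalogue's own evidence standard honest: an entry with a missing source or an out-of-range rating simply cannot exist.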
Entries in v0 are hand-curated. Future releases may include community submissions, but they will need to meet the same evidence standard.
Four types of existing tools set user expectations around browsing and evidence in graph-AI evaluation. GraphSlop sits alongside them but serves a different purpose.
Litmaps, ResearchRabbit, Connected Papers. High-quality graph visualization for discovery — but they surface connections, not critique. GraphSlop surfaces why a claim fails, not just which papers cite each other.
Scite Smart Citations, Consensus AI, Elicit. They prove that credibility signals at the entry point matter (Scite indexes 1.6B+ citations). GraphSlop borrows this lesson inline: every card shows failure-class tag, evidence strength, and source link — not just a "supports/contrasts" badge.
GraphRAG-Bench (ICLR'26), WildGraphBench, OSKGC. They provide rigorous taxonomy structure organized by difficulty × task type. GraphSlop takes the taxonomy but shifts from "which system scores what" to "what specifically broke and why." Benchmarks give numbers; GraphSlop gives mechanism-level explanations.
StructEval. Shows reusable taxonomy-aware benchmark scripts. GraphSlop shares the taxonomy discipline but is not a benchmark runner — it is a curated catalogue where humans evaluate, not scripts.
The gap: no existing product combines structured critique, graph topology, and traceable evidence in a single browsable catalogue. That gap is what GraphSlop targets.
GraphSlop is a starting point for anyone who wants to talk about graph-system failures without hand-waving.