Taxonomy & Evidence Standard

GraphSlop is a curated catalog of failures in graph extraction, GraphRAG, and ontology-quality systems. This page explains the failure taxonomy, the evidence standard, and the scope of this release.

What this is

Each entry in GraphSlop is a concrete, source-backed critique of a system, paper, demo, or tool that promises structured knowledge and delivers something else. Every entry traces to a source artifact — a published paper, a public demo, a benchmark, or a reproducible example.

This is not a blog. We don't rate "good" or "bad"; we show how and why a claim collapses under inspection.

v0 Release Scope

v0 includes 10 curated entries across 5 top-level failure categories. This is a tight, high-signal slice, not a comprehensive survey. We chose these categories because they account for roughly 80% of the failures we've seen so far, and each category describes a distinct structural weakness rather than a superficial symptom.

Failure Taxonomy (v0)

1. Scale Without Quality

Graphs that grow large without any validation, constraint checking, or entity resolution. The graph is big; the graph is wrong; the graph is loud. Signature: high node count, low precision, no provenance.
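To make the signature checkable, here is a minimal sketch in Python. The node shape and the provenance field are illustrative assumptions, not a GraphSlop schema.

```python
# Minimal sketch: flag "Scale Without Quality" by measuring provenance
# coverage. The node dicts and the "provenance" key are assumptions
# made for illustration.

def provenance_coverage(nodes: list[dict]) -> float:
    """Fraction of nodes that carry at least one source pointer."""
    if not nodes:
        return 0.0
    sourced = sum(1 for n in nodes if n.get("provenance"))
    return sourced / len(nodes)

nodes = [
    {"id": "Q1", "label": "Aspirin", "provenance": ["doi:10.1000/x"]},
    {"id": "Q2", "label": "Asprin"},   # near-duplicate, no source
    {"id": "Q3", "label": "Pain"},     # no source
]

# A large graph with coverage near zero matches the signature: high
# node count, nothing to check precision against.
print(f"provenance coverage: {provenance_coverage(nodes):.0%}")  # 33%
```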

2. Ontology Soup

Ontologies with no hierarchy, no constraints, no domain grounding. Everything is a root node; every relation is "related to." Signature: 10+ root nodes, no subsumption, no type safety.
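This signature can be checked mechanically if the ontology is available as a class-to-parent mapping. A minimal sketch, where the data shape and threshold are assumptions:

```python
# Minimal sketch: detect "Ontology Soup" from a class -> parent mapping.
# The mapping shape and the max_roots threshold are illustrative.

def is_soup(parent_of: dict[str, str | None],
            relations: set[str],
            max_roots: int = 10) -> bool:
    roots = [c for c, p in parent_of.items() if p is None]
    only_related_to = relations <= {"related_to"}
    no_subsumption = all(p is None for p in parent_of.values())
    return len(roots) >= max_roots or (no_subsumption and only_related_to)

# Flat "ontology": every class is a root, every edge is "related_to".
flat = {c: None for c in ["Drug", "Disease", "Gene", "Paper", "Author",
                          "City", "Event", "Tool", "Metric", "Dataset"]}
print(is_soup(flat, {"related_to"}))  # True
```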

3. Multi-Hop Fantasy

Systems that claim multi-hop reasoning but deliver single-hop retrieval with extra steps. The reasoning chain is longer, not deeper. Signature: long chains, no cross-hop validation, confidence stays high.
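One way to see why flat confidence is suspicious: if hop confidences were even roughly independent, chain confidence should decay toward their product as the chain grows. A sketch with purely illustrative numbers:

```python
# Minimal sketch: compare a system's reported chain confidence against
# the product of its per-hop confidences. All numbers are illustrative,
# and independence between hops is a simplifying assumption.

import math

hop_confidences = [0.9, 0.85, 0.8, 0.75]     # per-hop retrieval confidence
expected_chain = math.prod(hop_confidences)  # ~0.46 after 4 hops
reported_chain = 0.92                        # what the system claims

if reported_chain > expected_chain + 0.2:
    print(f"suspiciously flat: reported {reported_chain:.2f} vs "
          f"expected ~{expected_chain:.2f} over {len(hop_confidences)} hops")
```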

4. Noisy Corpus Confidence

Systems trained or evaluated on noisy, uncurated, or adversarial corpora that nonetheless produce confident predictions. The graph reflects the noise, not the signal. Signature: high confidence on low-quality input; no abstention mechanism.
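The missing piece is usually an abstention gate. A minimal sketch of one, in which the quality score, threshold, and extractor are all placeholders:

```python
# Minimal sketch of an abstention mechanism: gate extraction on an
# input-quality estimate instead of emitting confident triples for
# everything. The scoring, threshold, and extractor are placeholders.

def extract_or_abstain(passage: str, quality_score: float,
                       min_quality: float = 0.5):
    if quality_score < min_quality:
        return None  # abstain: noisy input should lower output, not raise it
    return run_extractor(passage)

def run_extractor(passage: str):
    # Hypothetical stand-in for a real extraction call.
    return [("stub", "extracted_from", passage[:20])]

print(extract_or_abstain("adversarial spam text ...", quality_score=0.1))  # None
```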

5. Prompt Schema Drift

Ontologies and schemas that change between runs because the extraction prompt is paraphrased rather than enforced. The schema is a suggestion, not a contract. Signature: type renaming across deployments; predicate count drifts; no versioning.
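This signature is cheap to check: diff the predicates emitted by two runs of the same pipeline. A minimal sketch with illustrative triples:

```python
# Minimal sketch: detect schema drift by diffing the predicate sets of
# two extraction runs. The triples are illustrative.

def schema_drift(run_a: list[tuple], run_b: list[tuple]) -> set[str]:
    preds_a = {p for _, p, _ in run_a}
    preds_b = {p for _, p, _ in run_b}
    return preds_a ^ preds_b  # predicates in one run but not the other

run1 = [("aspirin", "treats", "pain"), ("aspirin", "has_dose", "100mg")]
run2 = [("aspirin", "alleviates", "pain")]  # paraphrased prompt, renamed predicate

print(schema_drift(run1, run2))  # {'treats', 'has_dose', 'alleviates'}
```

A non-empty diff between runs that used "the same" schema is exactly the drift this category describes.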

Evidence Standard

Every entry in GraphSlop carries the following fields:

Source

A URL to a paper, demo, benchmark, or public artifact.

Category

The failure class the entry belongs to.

Pipeline Stage

Where in the graph pipeline the failure occurs (entity extraction, entity resolution, ontology construction, etc.).

Slop Severity

A 1–10 subjective rating of how badly the claim overreaches relative to the evidence.

Why Slop

A plain-language explanation of the failure mechanism.
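Taken together, the five fields map onto a small record. The sketch below mirrors the labels above; the class name, types, and validation are assumptions, since GraphSlop does not publish a formal schema:

```python
# Sketch of an entry record matching the five fields above. Names and
# types are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class Entry:
    source: str          # URL to the paper, demo, benchmark, or artifact
    category: str        # one of the five failure classes
    pipeline_stage: str  # e.g. "entity resolution"
    slop_severity: int   # 1-10 subjective overreach rating
    why_slop: str        # plain-language failure mechanism

    def __post_init__(self):
        if not 1 <= self.slop_severity <= 10:
            raise ValueError("slop_severity must be in 1..10")
```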

Entries in v0 are hand-curated. Future releases may include community submissions, but they will need to meet the same evidence standard.

How to Use This Catalog

  1. Browse by category — each category card links to all entries in that failure class.
  2. Check the Slop of the Day — one highlighted entry, rotated by calendar day (a sketch of one possible rotation follows this list).
  3. Search by keyword — find entries matching specific claims, tools, or techniques.
  4. Use the "why it's slop" explanations — each entry includes a mechanism-level explanation, not just a verdict.
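For item 2, day-based rotation can be as simple as indexing entries by the date's ordinal. This is one plausible implementation, not necessarily the site's actual mechanism:

```python
# One plausible "Slop of the Day" rotation: index by the date's ordinal
# so the pick is deterministic for a given calendar day. The site's real
# mechanism may differ.

from datetime import date

def slop_of_the_day(entries: list, today: date | None = None):
    today = today or date.today()
    return entries[today.toordinal() % len(entries)]

entries = [f"entry-{i}" for i in range(10)]
print(slop_of_the_day(entries, date(2025, 1, 1)))  # deterministic per day
```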

How This Compares

Four types of existing tools set user expectations around browsing and evidence in graph-AI evaluation. GraphSlop sits alongside them but serves a different purpose.

Citation-network tools

Litmaps, ResearchRabbit, Connected Papers. These offer high-quality graph visualization for discovery, but they surface connections, not critique. GraphSlop surfaces why a claim fails, not just which papers cite each other.

Evidence-quality tools

Scite Smart Citations, Consensus AI, Elicit. These tools demonstrate that credibility signals at the point of entry matter (Scite indexes 1.6B+ citations). GraphSlop borrows that lesson inline: every card shows a failure-class tag, an evidence-strength indicator, and a source link, not just a "supports/contrasts" badge.

Benchmark evaluators

GraphRAG-Bench (ICLR'26), WildGraphBench, OSKGC. These benchmarks provide rigorous taxonomies organized by difficulty × task type. GraphSlop adopts that structure but shifts the question from "which system scores what" to "what specifically broke and why." Benchmarks give numbers; GraphSlop gives mechanism-level explanations.

Structured evaluation frameworks

StructEval. It demonstrates reusable, taxonomy-aware benchmark scripts. GraphSlop shares the taxonomy discipline but is not a benchmark runner; it is a curated catalog where humans, not scripts, do the evaluating.

The gap: no existing product combines structured critique, graph topology, and traceable evidence in a single browsable catalog. That gap is what GraphSlop targets.

What v0 Is Not

  • A literature review
  • A benchmark comparison
  • A product recommendation engine
  • An automated critique generator

It is a starting point for anyone who wants to talk about graph-system failures without hand-waving.