
Relation proposer

LOKF keeps two layers in one file: human-facing markdown links in the body, and typed frontmatter fields that carry the kind of link. In practice the prose layer always runs ahead — you write derived from [Orders](/datasets/orders.md) long before you remember to add a derivedFrom: field. lokf propose closes that gap: it reads the links you already wrote and suggests the typed edges they imply.

uv run lokf propose mykb/ # dry-run: table of proposals
uv run lokf propose mykb/ --json # same list, machine-readable
uv run lokf propose mykb/ --min-confidence 0.5 # drop the weak ones
uv run lokf propose mykb/ --apply --min-confidence 0.5 # write frontmatter
uv run lokf propose mykb/ --json --apply # apply + JSON report of what was written

Or from Python:

import lokf
from lokf.propose import apply, propose

bundle = lokf.load_bundle("mykb")
proposals = propose(bundle)  # optionally: vocab=..., concept=...
for p in proposals:
    print(p.source.concept_id, p.link.text, p.relation.curie,
          p.confidence, p.rationale)
applied = apply(proposals, min_confidence=0.5)  # round-trips the files

Each Proposal carries the source concept, the prose link (text, target, and the sentence it sits in), the suggested relation (a Relation from the vocabulary), a confidence score, and a rationale — the evidence, so you can judge the suggestion without opening the file. See the proposer in action in the tutorial.

The proposer is deliberately simple, and every rule is inspectable — the whole cue table is one importable tuple, lokf.propose.CUE_TABLE.

Links first. extract_links() collects the markdown links in a concept’s body — skipping images, fenced code blocks, and inline code — and resolves each target against the bundle (root-relative like /datasets/orders.md, file-relative, Concept ID, or full IRI all work). Links that don’t resolve to a concept in the same bundle produce no proposal: the proposer only wires up your own graph.
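As a rough illustration of the extraction step only (not of target resolution), the skipping rules can be sketched with three regular expressions. The function name sketch_extract_links and the patterns are hypothetical stand-ins; the real extract_links() may be implemented quite differently.

```python
import re

# Hypothetical patterns for illustration; not lokf's actual implementation.
FENCE = re.compile(r"^(`{3}|~{3}).*?^\1[^\n]*$", re.MULTILINE | re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
# (?<!!) rejects image syntax such as ![alt](src).
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def sketch_extract_links(body: str) -> list[tuple[str, str]]:
    """Return (text, target) pairs for links outside code spans and images."""
    without_fences = FENCE.sub("", body)
    without_code = INLINE_CODE.sub("", without_fences)
    return LINK.findall(without_code)


FENCE_MARK = "`" * 3  # built at runtime so this example nests cleanly
body = "\n".join([
    "Revenue is derived from [Orders](/datasets/orders.md).",
    "![diagram](/img/flow.png)",
    "Inline `[not a link](x.md)` is ignored.",
    FENCE_MARK,
    "[also ignored](/in/fence.md)",
    FENCE_MARK,
])
links = sketch_extract_links(body)
print(links)
```

Only the first line of the sample body yields a link; the image, the inline-code span, and the fenced block are all dropped before matching.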

Cue phrases around the link. For each link, the sentence around it is matched against a priority-ordered table of cue patterns: wording like derived / computed / built from points at derivedFrom (prov:wasDerivedFrom), depends / requires / needs at dependsOn (dcterms:requires), measures / counts at measures, part of / within at isPartOf, same as / alias at sameAs, attributed to / authored by at wasAttributedTo, joins with / joined on at joinsWith, and so on across the relation vocabulary. The first row that matches (and whose relation the source’s type may carry) wins. A link whose sentence matches no cue at all falls back to a low-confidence relatedTo — the weakest, most honest claim available.
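The first-match-wins lookup can be pictured with a small stand-in table. Everything below is an assumption made for illustration: the Cue tuple, the abbreviated SKETCH_CUES rows, and sketch_match are hypothetical, and the authoritative rows are the ones in lokf.propose.CUE_TABLE.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    pattern: re.Pattern
    relation: str


# Hypothetical, abbreviated table in priority order (earlier rows win).
SKETCH_CUES = (
    Cue(re.compile(r"\b(derived|computed|built from)\b", re.I), "derivedFrom"),
    Cue(re.compile(r"\b(depends|requires|needs)\b", re.I), "dependsOn"),
    Cue(re.compile(r"\b(measures|counts)\b", re.I), "measures"),
    Cue(re.compile(r"\b(part of|within)\b", re.I), "isPartOf"),
)
ALL_RELATIONS = {cue.relation for cue in SKETCH_CUES}


def sketch_match(sentence: str, allowed: set[str]) -> str:
    """Return the first allowed relation whose cue matches, else relatedTo."""
    for cue in SKETCH_CUES:
        if cue.relation in allowed and cue.pattern.search(sentence):
            return cue.relation
    return "relatedTo"  # no cue matched: the weakest claim


print(sketch_match("Computed within [Warehouse](/w.md).", ALL_RELATIONS))
```

The sample sentence contains both "computed" and "within"; because the derivedFrom row sits higher in the table, it wins over isPartOf.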

Type-aware domains. A relation is only proposed where the schema says it can live. Most relation slots are declared on Concept and apply to every type, but measures is declared on Metric alone, so “measures” in a Playbook’s prose never yields a measures proposal. The domains come from lokf.vocabulary() (relation_slots["measures"].domains is frozenset({'Metric'})), not from a hardcoded list, so schema changes flow through automatically.
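A domain check of this kind might look like the sketch below. SlotSketch and may_carry are hypothetical, and the sketch assumes that a slot declared on Concept lists "Concept" as its domain; only the measures entry mirrors a value stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotSketch:
    """Hypothetical stand-in for one entry of relation_slots."""
    domains: frozenset


# Assumed shape: slots declared on Concept carry "Concept" as their domain.
relation_slots = {
    "derivedFrom": SlotSketch(frozenset({"Concept"})),
    "measures": SlotSketch(frozenset({"Metric"})),
}


def may_carry(source_type: str, relation: str) -> bool:
    """True if a concept of source_type is allowed to hold this relation."""
    domains = relation_slots[relation].domains
    return "Concept" in domains or source_type in domains


print(may_carry("Metric", "measures"), may_carry("Playbook", "measures"))
```

Because the check reads the domains from data rather than from a fixed list of names, adding a type to a slot's domain changes the outcome without touching the function.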

Skip what’s already asserted. An edge whose target IRI already appears in the source’s frontmatter — under a named relation field or a relations: entry — is not proposed again. Re-running lokf propose on a fully-typed bundle proposes nothing, which makes it safe to run repeatedly as the prose evolves.
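The duplicate check amounts to looking for the target IRI in both places an edge can be recorded. The helper below is a hypothetical sketch over a plain dict of frontmatter, not the proposer's own code; the slot names and the first and last IRIs are illustrative.

```python
def already_asserted(frontmatter: dict, target_iri: str, slot_names: set[str]) -> bool:
    """True if target_iri appears under any named slot or relations: entry."""
    for name in slot_names:
        value = frontmatter.get(name, [])
        values = value if isinstance(value, list) else [value]
        if target_iri in values:
            return True
    return any(
        entry.get("target") == target_iri
        for entry in frontmatter.get("relations", [])
    )


SLOTS = {"derivedFrom", "dependsOn", "measures"}  # illustrative subset
frontmatter = {
    "derivedFrom": ["https://acme.example/knowledge/datasets/orders"],
    "relations": [
        {"predicate": "joinsWith",
         "target": "https://acme.example/knowledge/tables/customers"},
    ],
}
print(already_asserted(
    frontmatter, "https://acme.example/knowledge/tables/customers", SLOTS))
```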

Confidence is a heuristic score, not a probability. Each cue carries a base score reflecting how unambiguous its wording is (about 0.5 for vague cues like from, up to 0.8 for same as); a cue sitting right next to the link text earns an adjacency boost, and everything is capped at 0.95 — nothing pattern-matched is ever certain. The no-cue relatedTo fallback scores 0.25. The rationale states which case you’re in:

cue "derived" adjacent to link # base + adjacency boost
cue "requires" in sentence # base only
no cue phrase matched # relatedTo fallback
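The three cases can be sketched as one small function. The cap (0.95) and the fallback score (0.25) come from the description above; the size of the adjacency boost is not stated here, so ADJACENCY_BOOST is an assumed value, and sketch_score itself is hypothetical.

```python
CAP = 0.95             # stated above: nothing pattern-matched is certain
FALLBACK = 0.25        # stated above: score of the no-cue relatedTo fallback
ADJACENCY_BOOST = 0.2  # assumed value, for illustration only


def sketch_score(cue, base=0.0, adjacent=False):
    """Return (confidence, rationale) for one link."""
    if cue is None:
        return FALLBACK, "no cue phrase matched"
    if adjacent:
        return min(base + ADJACENCY_BOOST, CAP), f'cue "{cue}" adjacent to link'
    return min(base, CAP), f'cue "{cue}" in sentence'


print(sketch_score("same as", 0.8, adjacent=True))
print(sketch_score("requires", 0.7))
print(sketch_score(None))
```

Under these assumptions, a strong cue sitting next to the link hits the cap rather than exceeding it, while the fallback stays well below any plausible threshold for cue-backed proposals.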

--min-confidence F filters both the printed table and what --apply writes; the Python apply() defaults to 0.0 (everything you pass it). Pick a threshold by reading a dry run, not by treating the number as calibrated — 0.5 keeps every cue-backed proposal and drops only the relatedTo fallbacks.

The default invocation changes nothing — it prints the proposal table and exits. That’s the intended workflow: read the rationale column, then apply.

--apply (or apply() in Python, which returns the proposals it actually wrote) writes accepted proposals into the concept files using a round-trip YAML editor (ruamel), so comments, key order, and quoting in your frontmatter survive the edit. Duplicates are never written, and the body is untouched. Where a proposal lands depends on the relation:

  • Named slot — relations with is_slot=True (derivedFrom, dependsOn, measures, …) become ordinary frontmatter fields, the target’s IRI appended to the list.

  • Reified relations: entry — predicates from the RelationType vocabulary that have no dedicated field (joinsWith, wasAttributedTo) are written as reified relations:

    relations:
    - predicate: joinsWith
      target: https://acme.example/knowledge/tables/customers

The two forms do not project to identical RDF. A named slot becomes a direct triple with its bound predicate (<metric> prov:wasDerivedFrom <dataset>); a relations: entry projects as a reified statement — an rdf:Statement node reached via lokf:relations that carries the predicate and target — not a direct triple. The knowledge-graph page flattens reified entries back into labeled edges for display, so both forms look the same in the picture, but SPARQL over bundle.graph() sees the difference.
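To make the difference concrete without depending on an RDF library, the two shapes can be modeled as plain tuples. This is a sketch under assumptions: the use of rdf:predicate and rdf:object on the statement node follows the standard RDF reification vocabulary, but the exact properties LOKF emits, the lokf:joinsWith predicate name, and the flatten_edges helper are all hypothetical.

```python
Triple = tuple[str, str, str]


def flatten_edges(triples: list[Triple]) -> set[Triple]:
    """Collapse reified statements into direct (source, predicate, target) edges."""
    props: dict[str, dict[str, str]] = {}
    for s, p, o in triples:
        props.setdefault(s, {})[p] = o
    statements = {s for s, d in props.items() if d.get("rdf:type") == "rdf:Statement"}

    edges = set()
    for s, p, o in triples:
        if s in statements:
            continue  # the statement node's own triples are not edges
        if p == "lokf:relations" and o in statements:
            node = props[o]
            edges.add((s, node["rdf:predicate"], node["rdf:object"]))
        else:
            edges.add((s, p, o))
    return edges


triples = [
    # Named slot: one direct triple.
    ("ex:revenue", "prov:wasDerivedFrom", "ex:orders"),
    # relations: entry: a statement node carrying predicate and target.
    ("ex:orders", "lokf:relations", "_:s1"),
    ("_:s1", "rdf:type", "rdf:Statement"),
    ("_:s1", "rdf:predicate", "lokf:joinsWith"),
    ("_:s1", "rdf:object", "ex:customers"),
]
edges = flatten_edges(triples)
print(sorted(edges))
```

A query for direct triples would find only the first row of the input, whereas the flattened view shows two labeled edges, which is the gap between the graph picture and SPARQL described above.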

--json composes with --apply: the proposals are still written, and the JSON output reports the outcome — every proposal that was written gains "applied": true.