Relation proposer
LOKF keeps two layers in one file: human-facing
markdown links in the body, and typed frontmatter fields that carry the
kind of link. In practice the prose layer always runs ahead: you write
"derived from [Orders](/datasets/orders.md)" long before you remember to add
a derivedFrom: field. lokf propose closes that gap: it reads the links
you already wrote and suggests the typed edges they imply.

    uv run lokf propose mykb/                               # dry-run: table of proposals
    uv run lokf propose mykb/ --json                        # same list, machine-readable
    uv run lokf propose mykb/ --min-confidence 0.5          # drop the weak ones
    uv run lokf propose mykb/ --apply --min-confidence 0.5  # write frontmatter
    uv run lokf propose mykb/ --json --apply                # apply + JSON report of what was written

Or from Python:

    import lokf
    from lokf.propose import apply, propose

    bundle = lokf.load_bundle("mykb")
    proposals = propose(bundle)  # optionally: vocab=..., concept=...
    for p in proposals:
        print(p.source.concept_id, p.link.text, p.relation.curie, p.confidence, p.rationale)

    applied = apply(proposals, min_confidence=0.5)  # round-trips the files

Each Proposal carries the source concept, the prose link (text,
target, and the sentence it sits in), the suggested relation (a
Relation from the vocabulary), a confidence
score, and a rationale — the evidence, so you can judge the suggestion
without opening the file. See the proposer in action in the
tutorial.
How the heuristics work
The proposer is deliberately simple, and every rule is inspectable: the
whole cue table is one importable tuple, lokf.propose.CUE_TABLE.
Links first. extract_links() collects the markdown links in a
concept’s body — skipping images, fenced code blocks, and inline code — and
resolves each target against the bundle (root-relative like
/datasets/orders.md, file-relative, Concept ID, or full IRI all work).
Links that don’t resolve to a concept in the same bundle produce no
proposal: the proposer only wires up your own graph.
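
The link-collection step can be approximated with a few regular expressions. This is a minimal sketch, not lokf's extract_links(): the helper name find_links and the patterns are assumptions for illustration, and target resolution against the bundle is left out entirely.

```python
import re

# Hypothetical patterns, not taken from lokf.
# A fenced block: an opening fence at line start up to the matching closing fence.
FENCE = re.compile(r"^(`{3}|~{3}).*?^\1[^\n]*$", re.MULTILINE | re.DOTALL)
# Inline code: a single-backtick span on one line.
INLINE_CODE = re.compile(r"`[^`\n]*`")
# A markdown link that is not an image (no "!" right before the bracket).
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def find_links(body: str) -> list[tuple[str, str]]:
    """Return (text, target) pairs for links outside code and images."""
    without_fences = FENCE.sub("", body)
    without_code = INLINE_CODE.sub("", without_fences)
    return LINK.findall(without_code)
```

Stripping code before matching keeps example links inside snippets from turning into proposals, which is the behaviour the paragraph above describes.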
Cue phrases around the link. For each link, the sentence around it is
matched against a priority-ordered table of cue patterns: wording like
derived / computed / built from points at derivedFrom
(prov:wasDerivedFrom), depends / requires / needs at dependsOn
(dcterms:requires), measures / counts at measures, part of / within
at isPartOf, same as / alias at sameAs, attributed to / authored by
at wasAttributedTo, joins with / joined on at joinsWith, and so on
across the relation vocabulary. The first row that matches (and whose
relation the source’s type may carry) wins. A link whose sentence
matches no cue at all falls back to a low-confidence relatedTo — the
weakest, most honest claim available.
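
A first-match-wins lookup over a priority-ordered table might look like the sketch below. The rows, regular expressions, and most base scores here are assumptions made for illustration (only the 0.8 for "same as" and the 0.25 fallback come from this page); the real rows live in lokf.propose.CUE_TABLE, whose exact shape is not shown here.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    pattern: re.Pattern
    relation: str
    base: float


# Hypothetical rows in priority order; not the contents of lokf.propose.CUE_TABLE.
CUES = (
    Cue(re.compile(r"\bsame as\b|\balias\b", re.I), "sameAs", 0.8),
    Cue(re.compile(r"\b(derived|computed|built) from\b", re.I), "derivedFrom", 0.7),
    Cue(re.compile(r"\b(depends|requires|needs)\b", re.I), "dependsOn", 0.7),
    Cue(re.compile(r"\b(measures|counts)\b", re.I), "measures", 0.7),
)


def match_cue(sentence: str, allowed: set[str]) -> tuple[str, float]:
    """Return (relation, base score) for the first allowed cue that matches."""
    for cue in CUES:
        if cue.relation in allowed and cue.pattern.search(sentence):
            return cue.relation, cue.base
    # No cue matched: fall back to the weakest claim.
    return "relatedTo", 0.25
```

Passing the allowed relations in as a set keeps the table itself free of type logic: a row is simply skipped when the source's type cannot carry its relation.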
Type-aware domains. A relation is only proposed where the schema says it
can live. Most relation slots are declared on Concept and apply to every
type, but measures is declared on Metric alone — so “measures” in a
Playbook’s prose never yields a measures proposal. The domains come from
lokf.vocabulary() (relation_slots["measures"].domains →
frozenset({'Metric'})), not from a hardcoded list, so schema changes flow
through automatically.
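
The domain check can be pictured as a set intersection between a slot's declared domains and the source type's lineage. The sketch below uses a hand-written SLOT_DOMAINS dict and ANCESTORS map as hypothetical stand-ins for what lokf.vocabulary() reports; neither name is part of lokf, and the assumption that every type descends from Concept is inferred from the paragraph above.

```python
# Hypothetical stand-in for the relation slots reported by lokf.vocabulary().
SLOT_DOMAINS = {
    "derivedFrom": frozenset({"Concept"}),
    "dependsOn": frozenset({"Concept"}),
    "measures": frozenset({"Metric"}),
}

# Assumed type lineage: each type is itself plus Concept.
ANCESTORS = {
    "Metric": ("Metric", "Concept"),
    "Playbook": ("Playbook", "Concept"),
}


def allowed_relations(concept_type: str) -> set[str]:
    """Relations whose declared domains include the type or one of its ancestors."""
    lineage = set(ANCESTORS.get(concept_type, (concept_type, "Concept")))
    return {name for name, domains in SLOT_DOMAINS.items() if domains & lineage}
```

With these tables a Metric may carry all three relations, while a Playbook is limited to the two declared on Concept.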
Skip what’s already asserted. An edge whose target IRI already appears
in the source’s frontmatter — under a named relation field or a
relations: entry — is not proposed again. Re-running lokf propose on a
fully-typed bundle proposes nothing, which makes it safe to run repeatedly
as the prose evolves.
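
The skip rule amounts to collecting every target already present in the frontmatter and filtering candidates against that set. A minimal sketch over plain dicts follows; the helper names, the RELATION_FIELDS subset, and the example IRIs used to exercise it are illustrative assumptions, not lokf's code.

```python
# Assumed subset of named relation fields; the full list comes from the vocabulary.
RELATION_FIELDS = {"derivedFrom", "dependsOn", "measures", "isPartOf", "sameAs"}


def asserted_targets(frontmatter: dict) -> set[str]:
    """Collect target IRIs from named relation fields and relations: entries."""
    targets: set[str] = set()
    for field in RELATION_FIELDS:
        value = frontmatter.get(field) or []
        # A field may hold one IRI or a list of them.
        targets.update([value] if isinstance(value, str) else value)
    for entry in frontmatter.get("relations") or []:
        if "target" in entry:
            targets.add(entry["target"])
    return targets


def drop_asserted(candidates: list[tuple[str, str]], frontmatter: dict) -> list[tuple[str, str]]:
    """Keep only (relation, target) candidates whose target is not yet asserted."""
    seen = asserted_targets(frontmatter)
    return [(rel, iri) for rel, iri in candidates if iri not in seen]
```

Because the filter looks only at the target, a link to an already-typed concept is skipped even when its sentence suggests a different relation.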
Confidence
Confidence is a heuristic score, not a probability. Each cue carries a
base score reflecting how unambiguous its wording is (about 0.5 for vague
cues like from, up to 0.8 for same as); a cue sitting right next to the
link text earns an adjacency boost, and everything is capped at 0.95 —
nothing pattern-matched is ever certain. The no-cue relatedTo fallback
scores 0.25. The rationale states which case you’re in:

    cue "derived" adjacent to link   # base + adjacency boost
    cue "requires" in sentence       # base only
    no cue phrase matched            # relatedTo fallback

--min-confidence F filters both the printed table and what --apply
writes; the Python apply() defaults to 0.0 (everything you pass it).
Pick a threshold by reading a dry run, not by treating the number as
calibrated — 0.5 keeps every cue-backed proposal and drops only the
relatedTo fallbacks.
Dry-run, then apply
The default invocation changes nothing; it prints the proposal table and exits. That’s the intended workflow: read the rationale column, then apply.
--apply (or apply() in Python, which returns the proposals it actually
wrote) writes accepted proposals into the concept files using a round-trip
YAML editor (ruamel), so comments, key order, and quoting in your
frontmatter survive the edit. Duplicates are never written, and the body is
untouched. Where a proposal lands depends on the relation:
- Named slot. Relations with is_slot=True (derivedFrom, dependsOn,
  measures, …) become ordinary frontmatter fields, the target’s IRI
  appended to the list.
- Reified relations: entry. Predicates from the RelationType vocabulary
  that have no dedicated field (joinsWith, wasAttributedTo) are written as
  reified relations:

      relations:
        - predicate: joinsWith
          target: https://acme.example/knowledge/tables/customers

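
The choice between the two destinations can be sketched on a plain dict. This is an illustration of the placement and no-duplicates rules only: the function name write_edge and the NAMED_SLOTS subset are assumptions, and the sketch does not touch files or preserve YAML comments the way the round-trip editor does.

```python
# Assumed subset of relations with a dedicated frontmatter field (is_slot=True).
NAMED_SLOTS = {"derivedFrom", "dependsOn", "measures"}


def write_edge(frontmatter: dict, relation: str, target: str) -> bool:
    """Add one edge in place; return False when it was already present."""
    if relation in NAMED_SLOTS:
        # Named slot: append the target IRI to the field's list.
        values = frontmatter.setdefault(relation, [])
        if target in values:
            return False
        values.append(target)
        return True
    # No dedicated field: record a reified entry under relations:.
    entries = frontmatter.setdefault("relations", [])
    entry = {"predicate": relation, "target": target}
    if entry in entries:
        return False
    entries.append(entry)
    return True
```

Returning a boolean mirrors the idea that apply() reports only what it actually wrote, so a second call with the same edge is a no-op.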
The two forms do not project to identical RDF. A named slot becomes a
direct triple with its bound predicate
(<metric> prov:wasDerivedFrom <dataset>); a relations: entry projects as
a reified statement
— an rdf:Statement node reached via lokf:relations that carries the
predicate and target — not a direct triple. The
knowledge-graph page flattens reified entries back into
labeled edges for display, so both forms look the same in the picture, but
SPARQL over bundle.graph() sees the difference.
--json composes with --apply: the proposals are still written, and the
JSON output reports the outcome — every proposal that was written gains
"applied": true.