
Build your own knowledge base

The reference bundle shows the destination; this tutorial walks the road. Starting from an empty directory you will author a three-concept bundle, validate it, let the proposer upgrade your prose links into typed relations, and end with an RDF graph you can query and visualize.

Every command below is copy-pasteable and assumes a clone of this repository (git clone https://github.com/nicholsn/lokf.git && cd lokf && uv sync — see Getting started), run from the repository root so lokf.yaml is in reach of the validator.

A bundle is just a directory with an index.md. The frontmatter of that one file is what lifts the whole bundle into Linked Data: base_iri turns Concept IDs into IRIs, and context names the JSON-LD context that gives every key its meaning.

Terminal window
mkdir -p mykb/datasets mykb/metrics mykb/glossary
mykb/index.md
---
lokf_version: "0.1"
okf_version: "0.1"
base_iri: https://mykb.example/kb/
context: https://w3id.org/lokf/context.jsonld
title: My Knowledge Base
description: A three-concept starter bundle for the LOKF toolkit tutorial.
publisher:
  type: Organization
  id: https://mykb.example
  name: Example Co
---
# Datasets
* [Orders](datasets/orders.md) - One row per completed order.
# Metrics
* [Order Conversion Rate](metrics/order-conversion-rate.md) - Share of sessions that convert.
# Glossary
* [Conversion](glossary/conversion.md) - A session that completes an order.
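
To make the role of base_iri concrete, here is a minimal sketch of how a Concept ID could resolve to an IRI. It is not lokf's own code: `concept_iri` is a hypothetical helper, and it assumes the Concept ID is the file's bundle-relative path without the `.md` suffix, which is what the IRIs logged later in this tutorial suggest.

```python
from pathlib import PurePosixPath


def concept_iri(base_iri: str, relative_path: str) -> str:
    """Join base_iri with a Concept ID derived from a bundle-relative path."""
    # Assumption: the Concept ID is the path with the .md suffix dropped.
    concept_id = PurePosixPath(relative_path.lstrip("/")).with_suffix("").as_posix()
    return base_iri.rstrip("/") + "/" + concept_id


for path in (
    "datasets/orders.md",
    "metrics/order-conversion-rate.md",
    "glossary/conversion.md",
):
    print(concept_iri("https://mykb.example/kb/", path))
```

In practice the toolkit resolves IRIs for you; the sketch only shows why changing base_iri renames every concept at once.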

In each concept file, type is the only required key. Write the connections in prose for now, as markdown links in the body, the way you’d write documentation anyway. Step 4 will offer to promote them to typed frontmatter.

mykb/datasets/orders.md
---
type: Dataset
title: Orders
description: One row per completed order, all storefronts.
tags: [commerce, core]
---
# Overview
Every completed order lands here within five minutes. See the
[Order Conversion Rate](/metrics/order-conversion-rate.md) for the north-star
metric built on this dataset.
mykb/metrics/order-conversion-rate.md
---
type: Metric
title: Order Conversion Rate
description: Share of sessions that end in a completed order.
unit: percent
formula: 100 * sessions_with_order / total_sessions
tags: [commerce, north-star]
---
# Definition
**Order Conversion Rate** is derived from [Orders](/datasets/orders.md).
It measures [conversion](/glossary/conversion.md) across all storefronts.
# Notes
- Sessions from internal test accounts are excluded.
mykb/glossary/conversion.md
---
type: GlossaryTerm
title: Conversion
definition: A session in which the visitor completes at least one order.
abbreviation: CVR
---
# Notes
A conversion depends on [Orders](/datasets/orders.md) arriving inside the
session window.
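
Before validating, a quick pre-flight check can confirm that every concept file carries the one required key. The sketch below uses only the standard library and a deliberately crude line match instead of a YAML parser; `frontmatter_and_body` and `has_type` are hypothetical helpers, not part of lokf.

```python
import re


def frontmatter_and_body(text: str) -> tuple[str, str]:
    """Split a concept file into (frontmatter, body); frontmatter is '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # opening fence never closed: treat as no frontmatter


def has_type(text: str) -> bool:
    """True if the frontmatter has a top-level, non-empty `type:` key."""
    frontmatter, _ = frontmatter_and_body(text)
    return re.search(r"^type:\s*\S", frontmatter, flags=re.MULTILINE) is not None


sample = "---\ntype: Dataset\ntitle: Orders\n---\n# Overview\nEvery completed order lands here.\n"
print(has_type(sample))
```

The schema validation in the next step is the real gate; this only catches a forgotten `type:` early.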

The JSON Schema validates documents, so there is one assembly step: fold the concepts into a single KnowledgeBundle JSON, injecting each concept’s id where it is missing. The injection is a setdefault: the id is resolved from base_iri + Concept ID, and explicit ids are kept, exactly as lokf-build does for the reference bundle.

Terminal window
uv run python - <<'EOF'
import json, lokf
bundle = lokf.load_bundle("mykb")
doc = dict(bundle.meta)
# Inject an id only where the concept has none; an explicit id in c.data wins.
doc["concepts"] = [{"id": bundle.iri(c), **c.data} for c in bundle.concepts]
with open("mykb.bundle.json", "w") as f:
    json.dump(doc, f, indent=2)
EOF
uv run linkml-validate -s lokf.yaml -C KnowledgeBundle mykb.bundle.json
# -> No issues found

See Validation for the JSON Schema / SHACL split.
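
The "setdefault" behavior described above can be sketched in isolation. This is an illustration with plain dicts, not lokf's code: `with_default_id` is a hypothetical helper, and the explicit id in the second call is made up for the example.

```python
def with_default_id(data: dict, resolved_iri: str) -> dict:
    """Return a copy of a concept's data with id filled in only if it is missing."""
    concept = dict(data)  # copy, so the loaded concept is not mutated
    concept.setdefault("id", resolved_iri)
    return concept


resolved = "https://mykb.example/kb/datasets/orders"

# No id in the frontmatter: the resolved IRI is injected.
print(with_default_id({"type": "Dataset", "title": "Orders"}, resolved)["id"])

# Explicit id in the frontmatter: it is kept as written.
explicit = {"type": "Dataset", "id": "https://mykb.example/custom/orders"}
print(with_default_id(explicit, resolved)["id"])
```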

Right now the bundle’s graph has no edges between concepts — the connections live only in prose. The proposer reads the markdown links in each body, looks at the surrounding sentence and the concept types, and suggests typed frontmatter fields. Dry-run first:

Terminal window
uv run lokf propose mykb/
SOURCE                         LINK                   PREDICATE            CONF  RATIONALE
datasets/orders                Order Conversion Rate  dcterms:references   0.70  cue "see" adjacent to link
glossary/conversion            Orders                 dcterms:requires     0.85  cue "depends" adjacent to link
metrics/order-conversion-rate  Orders                 prov:wasDerivedFrom  0.90  cue "derived" adjacent to link
metrics/order-conversion-rate  conversion             lokf:measures        0.85  cue "measures" adjacent to link

Each row is one prose link: the source concept, the link, the suggested predicate, a confidence score, and the rationale — which cue phrase triggered it, and where. “Is derived from [Orders]…” became prov:wasDerivedFrom; “measures [conversion]…” became lokf:measures (and measures is only ever proposed for Metrics). Add --json for a machine-readable version of the same list.

Happy with what you see? Apply it:

Terminal window
uv run lokf propose mykb/ --apply --min-confidence 0.5

The table prints again, followed by a log of what was written:

wrote references -> https://mykb.example/kb/metrics/order-conversion-rate in mykb/datasets/orders.md
wrote dependsOn -> https://mykb.example/kb/datasets/orders in mykb/glossary/conversion.md
wrote derivedFrom -> https://mykb.example/kb/datasets/orders in mykb/metrics/order-conversion-rate.md
wrote measures -> https://mykb.example/kb/glossary/conversion in mykb/metrics/order-conversion-rate.md
applied 4 of 4 proposal(s).

--apply writes the accepted proposals back into the concept files with a round-trip YAML editor, so your comments, ordering, and formatting survive. git diff (or plain diff) shows exactly what changed — for the metric:

mykb/metrics/order-conversion-rate.md
unit: percent
formula: 100 * sessions_with_order / total_sessions
tags: [commerce, north-star]
derivedFrom:
- https://mykb.example/kb/datasets/orders
measures:
- https://mykb.example/kb/glossary/conversion
---

The prose is untouched — body links stay for humans, the frontmatter now carries the machine-readable layer. How the heuristics work covers cue phrases, domains, confidence, and the limits.
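
For intuition, the cue-phrase idea can be sketched in a few lines of standard-library Python. This is a toy, not the proposer's implementation: the `CUES` table copies only the four cue/predicate/confidence rows shown in the dry run above, and the sentence splitting, the "nearest cue word before the link" rule, and the `propose` function itself are assumptions made for illustration.

```python
import re

LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

# cue word -> (predicate, confidence, source types allowed, or None for any)
CUES = {
    "see": ("dcterms:references", 0.70, None),
    "depends": ("dcterms:requires", 0.85, None),
    "derived": ("prov:wasDerivedFrom", 0.90, None),
    "measures": ("lokf:measures", 0.85, {"Metric"}),
}


def propose(source_type: str, body: str) -> list[tuple[str, str, float]]:
    """Suggest (link text, predicate, confidence) for each link that follows a cue."""
    proposals = []
    for sentence in re.split(r"(?<=[.!?])\s+", body):
        for link in LINK.finditer(sentence):
            words = re.findall(r"[a-z]+", sentence[: link.start()].lower())
            # Walk backwards from the link so the nearest usable cue wins.
            for word in reversed(words):
                predicate, confidence, domain = CUES.get(word, (None, None, None))
                if predicate and (domain is None or source_type in domain):
                    proposals.append((link.group(1), predicate, confidence))
                    break
    return proposals


metric_body = (
    "**Order Conversion Rate** is derived from [Orders](/datasets/orders.md).\n"
    "It measures [conversion](/glossary/conversion.md) across all storefronts."
)
for row in propose("Metric", metric_body):
    print(row)
```

The domain check is why the same "measures" sentence in a Dataset or GlossaryTerm body would yield no proposal in this sketch.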

Load the bundle and serialize the graph — same two calls as the five-minute tour:

Terminal window
uv run python - <<'EOF'
import lokf
bundle = lokf.load_bundle("mykb")
graph = bundle.graph()
print(len(graph), "triples")
print(graph.serialize(format="turtle"))
EOF

Three markdown files, 23 triples, real edges (abridged):

<https://mykb.example/kb/metrics/order-conversion-rate> a lokf:Metric ;
    schema1:name "Order Conversion Rate" ;
    schema1:unitText "percent" ;
    prov:wasDerivedFrom <https://mykb.example/kb/datasets/orders> ;
    lokf:formula "100 * sessions_with_order / total_sessions" ;
    lokf:measures <https://mykb.example/kb/glossary/conversion> .

<https://mykb.example/kb/glossary/conversion> a schema1:DefinedTerm ;
    dcterms:requires <https://mykb.example/kb/datasets/orders> ;
    skos:definition "A session in which the visitor completes at least one order." .

<https://mykb.example/kb/datasets/orders> a schema1:Dataset ;
    dcterms:references <https://mykb.example/kb/metrics/order-conversion-rate> ;
    schema1:name "Orders" .
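
The frontmatter keys written in step 4 surface here as predicates. The pairing below is read off this tutorial's own output (the proposal table against the apply log); the real mapping lives in the JSON-LD context, and `relation_triples` is a hypothetical helper that only mimics the result with plain tuples.

```python
# Frontmatter field -> predicate CURIE, as paired in the propose/apply output.
FIELD_TO_PREDICATE = {
    "references": "dcterms:references",
    "dependsOn": "dcterms:requires",
    "derivedFrom": "prov:wasDerivedFrom",
    "measures": "lokf:measures",
}


def relation_triples(subject_iri: str, frontmatter: dict) -> list[tuple[str, str, str]]:
    """Turn typed relation fields into (subject, predicate, object) tuples."""
    triples = []
    for field, predicate in FIELD_TO_PREDICATE.items():
        for target in frontmatter.get(field, []):
            triples.append((subject_iri, predicate, target))
    return triples


metric = {
    "type": "Metric",
    "derivedFrom": ["https://mykb.example/kb/datasets/orders"],
    "measures": ["https://mykb.example/kb/glossary/conversion"],
}
for triple in relation_triples("https://mykb.example/kb/metrics/order-conversion-rate", metric):
    print(triple)
```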

From here it’s standard RDF tooling: SPARQL over bundle.graph(), SHACL with the generated shapes, OWL reasoning — see Markdown to RDF.

The Knowledge graph page on this site is the pattern to copy: lokf.export.to_cytoscape() turns a bundle into Cytoscape.js elements whose edges are the RDF predicates from your typed relations (CURIE-labeled — prov:wasDerivedFrom, lokf:measures, …); plain body hyperlinks are deliberately excluded, so the picture shows exactly what step 4 asserted.

Terminal window
uv run python - <<'EOF'
import json, lokf
from lokf.export import to_cytoscape
bundle = lokf.load_bundle("mykb")
json.dump(to_cytoscape(bundle), open("graph.json", "w"), indent=2)
EOF

Or skip the file entirely: lokf serve mykb/ publishes a live graph explorer (and a SPARQL endpoint) for your bundle with no build step. This documentation site uses the same projection — the lokf export command emits graph.json and the schema.org Dataset JSON-LD (from lokf.export.dataset_search_jsonld()) that its knowledge-graph page and <head> consume, so your datasets are discoverable by search engines.
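
If you have not worked with Cytoscape.js before, the sketch below shows the general shape of its elements JSON: nodes and edges, each wrapped in a `data` object. It is an illustration only. `triples_to_elements` is a hypothetical helper, and the exact keys and layout that to_cytoscape() emits may differ.

```python
import json


def triples_to_elements(triples):
    """Build Cytoscape.js-style elements from (subject, predicate, object) IRIs."""
    nodes, edges = {}, []
    for subject, predicate, obj in triples:
        for iri in (subject, obj):
            # One node per distinct IRI, labeled with its last path segment.
            label = iri.rstrip("/").rsplit("/", 1)[-1]
            nodes.setdefault(iri, {"data": {"id": iri, "label": label}})
        edges.append(
            {"data": {"id": f"e{len(edges)}", "source": subject, "target": obj, "label": predicate}}
        )
    return {"nodes": list(nodes.values()), "edges": edges}


elements = triples_to_elements([
    (
        "https://mykb.example/kb/metrics/order-conversion-rate",
        "prov:wasDerivedFrom",
        "https://mykb.example/kb/datasets/orders",
    ),
])
print(json.dumps(elements, indent=2))
```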

mykb/ # markdown in …
├── index.md
├── datasets/orders.md
├── metrics/order-conversion-rate.md
└── glossary/conversion.md
# … knowledge graph out
mykb.bundle.json # validated against KnowledgeBundle
graph.json # Cytoscape.js elements
+ 23 RDF triples # bundle.graph()

Author in markdown, validate against the schema, let the proposer type your links, and everything downstream — SPARQL, SHACL, visualization, dataset search markup — comes for free.