flowchart LR
A[📝 Protégé Desktop] --> B[🗄️ GraphDB]
D[🐍 Python Scripts] --> B
B --> E[🤖 LLM Applications]
B --> F[🔀 MOE Systems]
B --> G[🔗 SPARQL Endpoints]
classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#1565c0
classDef database fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#ef6c00
classDef output fill:#f1f8e9,stroke:#388e3c,stroke-width:2px,color:#2e7d32
class A,D input
class B database
class E,F,G output
GraphDB Setup & Integration
Setting up GraphDB for ontology storage and SPARQL queries
GraphDB is a powerful semantic database that serves as the backbone for storing and querying ontologies at scale. This guide covers setup, configuration, and integration with our ontology workflow.
What is GraphDB?
GraphDB is an enterprise-ready semantic graph database that:
- Stores RDF triples efficiently at massive scale
- Supports SPARQL 1.1 queries and updates
- Performs forward-chaining reasoning with configurable rulesets (RDFS, OWL-Horst, OWL2-RL)
- Provides REST APIs for programmatic access
- Integrates with Protégé and other tools
Why GraphDB for Our Project?
GraphDB serves as the central semantic hub that connects all components of our ontology-driven AI system:
GraphDB Integration Benefits:
- Centralized Knowledge: Single source of truth for ontological data
- SPARQL Interface: Standard query language for semantic data
- Reasoning Support: Automatic inference and consistency checking
- Scalability: Handles large-scale ontological datasets efficiently
Docker Setup
1. Launch GraphDB
Using the provided Docker configuration:
# Navigate to project directory
cd docker/
# Start GraphDB service
docker-compose -f docker-compose-graphdb.yml up -d
# Check if running
docker-compose -f docker-compose-graphdb.yml ps
2. Access GraphDB Workbench
Open your browser and navigate to:
http://localhost:7200
Default credentials (first-time setup):
- Username: admin
- Password: admin
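Before moving on, it is worth confirming the server is reachable from Python as well. A minimal sketch, assuming the `requests` library and GraphDB's Workbench REST listing endpoint (`/rest/repositories`, which returns a JSON array of repository descriptors):

```python
import requests

GRAPHDB_URL = "http://localhost:7200"

def repositories_endpoint(base_url: str) -> str:
    """Build the Workbench REST URL that lists repositories."""
    return base_url.rstrip("/") + "/rest/repositories"

def list_repository_ids(base_url: str = GRAPHDB_URL) -> list:
    """Return the IDs of the repositories the server reports."""
    response = requests.get(repositories_endpoint(base_url), timeout=10)
    response.raise_for_status()
    # Each entry is a dict with keys such as "id" and "title"
    return [repo["id"] for repo in response.json()]
```

On a fresh install the list is empty; after the next section it should contain `plant-ontology`.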
3. Docker Configuration Details
# docker-compose-graphdb.yml
version: '3.8'
services:
graphdb:
image: ontotext/graphdb:10.0.0
ports:
- "7200:7200"
volumes:
- graphdb-data:/opt/graphdb/home
environment:
- GDB_JAVA_OPTS=-Xmx2g
volumes:
  graphdb-data:
Repository Setup
1. Create a New Repository
Access Workbench: Go to
http://localhost:7200
Setup Repositories: Click “Setup” → “Repositories”
Create Repository: Click “Create new repository”
Repository Type: Select “GraphDB Repository”
Configuration:
Repository ID: plant-ontology
Repository title: Plant Disease Ontology Repository
Storage folder: (leave default)
Base URL: http://example.org/plants/
2. Repository Settings
Advanced Settings for optimal performance:
# Reasoning
Enable RDFS/OWL reasoning: true
Reasoning level: OWL-Horst (optimized)
# Query timeout
Query timeout: 60 seconds
# Memory settings
Entity pool size: 200000
Statement indices: PSO, POS (GraphDB defaults; enable context indices if you query named graphs heavily)
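The same repository can be created without clicking through the Workbench. GraphDB's REST API accepts a Turtle repository config as a multipart upload to `POST /rest/repositories`. The sketch below is a hedged minimal config: the vocabulary and the `owl-horst-optimized` ruleset name follow GraphDB 10 documentation, so verify both against your version before relying on it.

```python
import requests

GRAPHDB_URL = "http://localhost:7200"

def repository_config_ttl(repo_id: str, title: str,
                          ruleset: str = "owl-horst-optimized") -> str:
    """Build a minimal repository config in Turtle (GraphDB 10 vocabulary)."""
    return f'''@prefix rep: <http://www.openrdf.org/config/repository#> .
@prefix sr: <http://www.openrdf.org/config/repository/sail#> .
@prefix sail: <http://www.openrdf.org/config/sail#> .
@prefix graphdb: <http://www.ontotext.com/config/graphdb#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

[] a rep:Repository ;
   rep:repositoryID "{repo_id}" ;
   rdfs:label "{title}" ;
   rep:repositoryImpl [
       rep:repositoryType "graphdb:SailRepository" ;
       sr:sailImpl [
           sail:sailType "graphdb:Sail" ;
           graphdb:ruleset "{ruleset}"
       ]
   ] .
'''

def create_repository(repo_id: str, title: str,
                      base_url: str = GRAPHDB_URL) -> bool:
    """POST the config as multipart form data; 201 means created."""
    files = {"config": ("config.ttl",
                        repository_config_ttl(repo_id, title),
                        "text/turtle")}
    response = requests.post(f"{base_url}/rest/repositories",
                             files=files, timeout=30)
    return response.status_code == 201
```

Usage: `create_repository("plant-ontology", "Plant Disease Ontology Repository")`.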
Importing Ontologies
Method 1: Web Interface
Navigate to repository: Select “plant-ontology”
Import data: Go to “Import” → “RDF”
Upload files:
- pizza.owl
- PizzaTutorial.rdf
- your_custom_ontology.owl
Import settings:
- Base URI: Keep default or set custom
- Named graphs: Optional grouping
- Processing: Enable reasoning
Method 2: Programmatic Import
import requests
import os
# GraphDB connection details
GRAPHDB_URL = "http://localhost:7200"
REPOSITORY = "plant-ontology"
def upload_ontology(file_path, context_uri=None):
"""Upload an ontology file to GraphDB"""
# Prepare upload URL
upload_url = f"{GRAPHDB_URL}/repositories/{REPOSITORY}/statements"
# Headers for RDF data
headers = {
'Content-Type': 'application/rdf+xml'
}
# Add context (named graph) if specified
params = {}
if context_uri:
params['context'] = f"<{context_uri}>"
# Read and upload file
with open(file_path, 'rb') as f:
response = requests.post(
upload_url,
headers=headers,
params=params,
data=f.read()
)
if response.status_code == 204:
print(f"✅ Successfully uploaded {file_path}")
else:
print(f"❌ Failed to upload {file_path}: {response.text}")
# Example usage
upload_ontology("ontologies/pizza.owl", "http://pizza.org")
upload_ontology("ontologies/plant_disease.owl", "http://plants.org")
Method 3: SPARQL Update
# Load ontology via SPARQL (file:// paths are resolved on the GraphDB server, not the client)
LOAD <file:///path/to/ontology.owl> INTO GRAPH <http://example.org/plants>
# Or from URL
LOAD <https://example.org/remote_ontology.owl> INTO GRAPH <http://example.org/remote>
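The same LOAD updates can be sent programmatically. Per the RDF4J protocol that GraphDB implements, a SPARQL 1.1 Update is POSTed to the repository's `/statements` endpoint with `Content-Type: application/sparql-update`. A sketch (endpoint path assumes the `plant-ontology` repository created above):

```python
import requests

ENDPOINT = "http://localhost:7200/repositories/plant-ontology/statements"

def load_into_graph(source_uri: str, graph_uri: str) -> str:
    """Build a SPARQL 1.1 LOAD update for the given source and target graph."""
    return f"LOAD <{source_uri}> INTO GRAPH <{graph_uri}>"

def run_update(update: str, endpoint: str = ENDPOINT) -> bool:
    """POST a SPARQL Update to the repository's statements endpoint."""
    response = requests.post(
        endpoint,
        headers={"Content-Type": "application/sparql-update"},
        data=update.encode("utf-8"),
        timeout=60,
    )
    # RDF4J-style endpoints answer 204 No Content on success
    return response.status_code == 204

# run_update(load_into_graph("file:///path/to/ontology.owl",
#                            "http://example.org/plants"))
```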
SPARQL Queries
Basic Queries
# List all classes (OWL ontologies often need owl:Class rather than rdfs:Class unless reasoning is enabled)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class rdf:type rdfs:Class .
OPTIONAL { ?class rdfs:label ?label }
}
LIMIT 50
# Find plant diseases
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX plant: <http://example.org/plants/>
SELECT ?plant ?disease WHERE {
?plant plant:hasDisease ?disease .
?plant rdf:type plant:Crop .
}
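These queries can be run without any client library: the repository endpoint speaks the standard SPARQL protocol, so plain HTTP with an Accept header for SPARQL JSON results is enough. A minimal sketch with `requests` (the SPARQLWrapper-based connector in the next section wraps the same mechanics):

```python
import requests

ENDPOINT = "http://localhost:7200/repositories/plant-ontology"

def select(query: str, endpoint: str = ENDPOINT) -> list:
    """Run a SELECT query over plain HTTP and return the JSON bindings."""
    response = requests.get(
        endpoint,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["results"]["bindings"]

def binding_value(binding: dict, var: str, default: str = "") -> str:
    """Extract one variable's value from a SPARQL JSON results binding."""
    return binding.get(var, {}).get("value", default)
```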
Advanced Reasoning Queries
# Infer treatment recommendations
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX plant: <http://example.org/plants/>
PREFIX treatment: <http://example.org/treatments/>
SELECT ?plant ?disease ?treatment WHERE {
?plant plant:hasDisease ?disease .
?disease rdf:type ?diseaseType .
?diseaseType treatment:recommendedTreatment ?treatment .
}
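When testing reasoning, it helps to separate asserted from inferred triples. GraphDB exposes the pseudo-graphs `onto:explicit` and `onto:implicit` (with `PREFIX onto: <http://www.ontotext.com/>`) for exactly this. A small helper that scopes a query body to one or the other — the pseudo-graph names are GraphDB-specific, not standard SPARQL:

```python
ONTO = "http://www.ontotext.com/"

def scoped_query(body: str, scope: str = "implicit") -> str:
    """Wrap a SELECT body so it reads only explicit or only inferred triples.

    onto:explicit / onto:implicit are GraphDB-specific pseudo-graphs.
    """
    if scope not in ("explicit", "implicit"):
        raise ValueError("scope must be 'explicit' or 'implicit'")
    return (
        f"PREFIX onto: <{ONTO}>\n"
        f"SELECT * FROM onto:{scope} WHERE {{ {body} }}"
    )

# scoped_query("?plant <http://example.org/plants/hasDisease> ?disease")
# returns a query that matches only triples produced by the reasoner
```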
Python Integration
Setup SPARQLWrapper
from SPARQLWrapper import SPARQLWrapper, JSON, POST, GET
import json
class GraphDBConnector:
def __init__(self, endpoint="http://localhost:7200/repositories/plant-ontology"):
self.endpoint = endpoint
self.sparql = SPARQLWrapper(endpoint)
self.update_endpoint = endpoint + "/statements"
def query(self, sparql_query):
"""Execute SPARQL SELECT query"""
self.sparql.setQuery(sparql_query)
self.sparql.setReturnFormat(JSON)
self.sparql.setMethod(GET)
try:
results = self.sparql.query().convert()
return results["results"]["bindings"]
except Exception as e:
print(f"Query error: {e}")
return []
    def update(self, sparql_update):
        """Execute SPARQL UPDATE (sent to the /statements endpoint)"""
        updater = SPARQLWrapper(self.update_endpoint)
        updater.setQuery(sparql_update)
        updater.setMethod(POST)
        try:
            updater.query()
            return True
        except Exception as e:
            print(f"Update error: {e}")
            return False
def get_all_classes(self):
"""Retrieve all ontology classes"""
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class rdf:type rdfs:Class .
OPTIONAL { ?class rdfs:label ?label }
}
ORDER BY ?class
"""
return self.query(query)
def find_plant_diseases(self, plant_type=None):
"""Find diseases affecting plants"""
filter_clause = ""
if plant_type:
filter_clause = f"FILTER (?plantType = <{plant_type}>)"
query = f"""
PREFIX plant: <http://example.org/plants/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?plant ?plantType ?disease WHERE {{
?plant plant:hasDisease ?disease .
?plant rdf:type ?plantType .
{filter_clause}
}}
"""
return self.query(query)
# Usage example
db = GraphDBConnector()
# Get all classes
classes = db.get_all_classes()
for cls in classes:
print(f"Class: {cls.get('class', {}).get('value', '')}")
print(f"Label: {cls.get('label', {}).get('value', 'No label')}")
# Find diseases
diseases = db.find_plant_diseases()
for disease in diseases:
print(f"Plant: {disease['plant']['value']}")
    print(f"Disease: {disease['disease']['value']}")
Integration with Protégé
1. Connect Protégé to GraphDB
In Protégé Desktop:
File → New → Create from database
Connection settings:
JDBC URL: jdbc:graphdb:http://localhost:7200/repositories/plant-ontology
Driver: GraphDB JDBC Driver
Username: admin
Password: admin
Or use SPARQL endpoint:
SPARQL Endpoint: http://localhost:7200/repositories/plant-ontology
2. Publish from Protégé to GraphDB
Method 1: Manual Export/Import
# In Protégé: File → Export → RDF/XML
# Then upload to GraphDB via web interface
Method 2: Direct Connection
# Export from Protégé and upload programmatically
from owlready2 import *
# Load ontology from Protégé
onto = get_ontology("file://path/to/protege_ontology.owl").load()
# Serialize to RDF/XML in memory (owlready2 has no as_rdf(); use save() with a file object)
import io
buf = io.BytesIO()
onto.save(file=buf, format="rdfxml")
rdf_data = buf.getvalue()
# Upload to GraphDB
import requests
response = requests.post(
"http://localhost:7200/repositories/plant-ontology/statements",
headers={'Content-Type': 'application/rdf+xml'},
data=rdf_data
)
Pydantic Integration for Structured Data
1. Ontology-Driven Model Generation
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum
class DiseaseType(str, Enum):
FUNGAL = "http://example.org/diseases/FungalDisease"
VIRAL = "http://example.org/diseases/ViralDisease"
BACTERIAL = "http://example.org/diseases/BacterialDisease"
class Symptom(BaseModel):
name: str = Field(..., description="Symptom name")
severity: int = Field(..., ge=1, le=10, description="Severity scale 1-10")
location: str = Field(..., description="Where symptom appears")
class Config:
schema_extra = {
"example": {
"name": "leaf_yellowing",
"severity": 7,
"location": "leaves"
}
}
class Disease(BaseModel):
disease_id: str = Field(..., description="Disease identifier")
name: str = Field(..., description="Disease name")
type: DiseaseType = Field(..., description="Disease classification")
symptoms: List[Symptom] = Field(..., description="Associated symptoms")
treatment: Optional[str] = Field(None, description="Recommended treatment")
@classmethod
def from_graphdb(cls, disease_uri: str, db_connector: GraphDBConnector):
"""Create Disease model from GraphDB data"""
query = f"""
PREFIX disease: <http://example.org/diseases/>
PREFIX symptom: <http://example.org/symptoms/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?name ?type ?symptom ?symptomName WHERE {{
<{disease_uri}> rdfs:label ?name .
<{disease_uri}> rdf:type ?type .
<{disease_uri}> disease:hasSymptom ?symptom .
?symptom rdfs:label ?symptomName .
}}
"""
results = db_connector.query(query)
# Process results and create Pydantic model
if results:
symptoms = [
Symptom(
name=result['symptomName']['value'],
severity=5, # Default, could be queried
location="plant" # Default, could be queried
)
for result in results
]
return cls(
disease_id=disease_uri,
name=results[0]['name']['value'],
type=results[0]['type']['value'],
symptoms=symptoms
)
return None
class Plant(BaseModel):
plant_id: str = Field(..., description="Plant identifier")
scientific_name: str = Field(..., description="Scientific name")
common_name: str = Field(..., description="Common name")
    diseases: List[Disease] = Field(default_factory=list, description="Associated diseases")
def add_disease_from_graphdb(self, disease_uri: str, db_connector: GraphDBConnector):
"""Add disease information from GraphDB"""
disease = Disease.from_graphdb(disease_uri, db_connector)
if disease:
            self.diseases.append(disease)
2. LLM Integration with Structured Data
from openai import OpenAI
import json
class OntologyLLMIntegration:
def __init__(self, db_connector: GraphDBConnector, openai_client: OpenAI):
self.db = db_connector
self.llm = openai_client
def diagnose_plant_disease(self, plant_description: str) -> Plant:
"""Use LLM to diagnose plant disease with ontology constraints"""
# Get available diseases from ontology
available_diseases = self.db.query("""
PREFIX disease: <http://example.org/diseases/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?disease ?name WHERE {
?disease rdf:type disease:Disease .
?disease rdfs:label ?name .
}
""")
# Create prompt with ontology constraints
disease_options = [d['name']['value'] for d in available_diseases]
prompt = f"""
Based on this plant description: "{plant_description}"
Available diseases in our ontology: {disease_options}
Please identify the most likely disease and return a JSON object matching this schema:
{{
"plant_id": "generated_id",
"scientific_name": "species name if identifiable",
"common_name": "common name if identifiable",
"diseases": [{{
"disease_id": "http://example.org/diseases/DiseaseName",
"name": "disease_name",
"type": "http://example.org/diseases/DiseaseType",
"symptoms": [{{
"name": "symptom_name",
"severity": severity_1_to_10,
"location": "affected_location"
}}],
"treatment": "recommended_treatment"
}}]
}}
"""
response = self.llm.chat.completions.create(
            model="gpt-4o",  # JSON mode (response_format json_object) requires a JSON-mode capable model
messages=[{"role": "user", "content": prompt}],
response_format={"type": "json_object"}
)
# Parse and validate with Pydantic
try:
plant_data = json.loads(response.choices[0].message.content)
plant = Plant(**plant_data)
return plant
except Exception as e:
print(f"Error parsing LLM response: {e}")
return None
# Usage
db = GraphDBConnector()
llm_client = OpenAI()
integrator = OntologyLLMIntegration(db, llm_client)
# Diagnose plant
plant = integrator.diagnose_plant_disease(
"My tomato plant has yellow spots on leaves and wilting stems"
)
if plant:
print(f"Diagnosed plant: {plant.common_name}")
for disease in plant.diseases:
print(f"Disease: {disease.name}")
        print(f"Treatment: {disease.treatment}")
Performance Optimization
1. Indexing
# GraphDB maintains predicate-ordered statement indices (PSO, POS) automatically
PREFIX plant: <http://example.org/plants/>
# Predicate-bound patterns like this one are answered directly from those indices
SELECT ?plant ?disease WHERE {
?plant plant:hasDisease ?disease .
}
2. Query Optimization
# Optimized query structure
SELECT ?plant ?disease ?symptom WHERE {
# Most selective triple first
?disease rdf:type :FungalDisease .
?plant :hasDisease ?disease .
?disease :hasSymptom ?symptom .
}
# Instead of starting with ?plant rdf:type :Plant (less selective)
3. Repository Tuning
# GraphDB repository parameters for better performance
# (set in the repository config; names follow GraphDB 10 conventions)
entity-index-size=10000000
query-timeout=60
Monitoring & Maintenance
1. Health Checks
import requests

def check_graphdb_health():
    """Monitor GraphDB status"""
    try:
        response = requests.get(
            "http://localhost:7200/rest/monitor/infrastructure", timeout=10
        )
        if response.status_code == 200:
            print("✅ GraphDB is healthy")
            return True
    except requests.RequestException:
        pass
    print("❌ GraphDB is not responding")
    return False
def check_repository_status(repo_name):
"""Check specific repository"""
response = requests.get(f"http://localhost:7200/rest/repositories/{repo_name}/size")
if response.status_code == 200:
size = response.json()
print(f"Repository {repo_name}: {size} triples")
return True
    return False
2. Backup & Recovery
# Backup repository
curl -X POST "http://localhost:7200/rest/recovery/backup" \
-H "Content-Type: application/json" \
-d '{"repository": "plant-ontology", "backupName": "daily-backup"}'
# List backups
curl "http://localhost:7200/rest/recovery/backup"
# Restore from backup
curl -X POST "http://localhost:7200/rest/recovery/restore" \
-H "Content-Type: application/json" \
  -d '{"repository": "plant-ontology", "backupName": "daily-backup"}'
Next Steps
- Set up GraphDB: Follow the Docker installation guide
- Import Ontologies: Upload your first ontology files
- Practice SPARQL: Start with basic queries
- Python Integration: Build your first ontology-driven application
- LLM Integration: Explore structured data extraction
Troubleshooting
Common Issues
Connection refused:
- Check if Docker container is running
- Verify port 7200 is not blocked
Out of memory:
- Increase Docker memory limits
- Tune GDB_JAVA_OPTS in docker-compose.yml
Import failures:
- Check ontology file format
- Validate RDF/XML syntax
- Review error logs in GraphDB workbench
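Many RDF/XML import failures turn out to be plain XML syntax errors (truncated files, unescaped characters, unbound namespace prefixes). A quick stdlib pre-flight check before uploading — it validates XML well-formedness only, not RDF semantics:

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(data: str) -> bool:
    """Return True if the string parses as XML; False on any syntax error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Usage: is_well_formed_xml(open("ontologies/pizza.owl").read())
```

For full RDF-level validation (rather than just XML syntax), parsing the file with a library such as rdflib before upload gives earlier, clearer errors than the GraphDB import log.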
GraphDB provides the robust foundation needed for scalable ontology applications. Combined with Pydantic models and LLM integration, it enables powerful semantic AI systems.