Pydantic Models & Structured Data

Bridging ontologies and structured data extraction with LLMs

Learn how to use ontologies from GraphDB with Pydantic models to extract structured data using LLMs and semantic reasoning. This approach combines the best of formal knowledge representation with modern AI capabilities.

Why Pydantic + Ontologies?

The Challenge

Traditional LLMs output unstructured text, making it difficult to:

  • Validate data consistency
  • Integrate with existing systems
  • Reason about relationships
  • Ensure domain correctness

The Solution

The integration follows a clear pipeline that ensures both type safety and semantic consistency:

flowchart LR
    A[🧠 Ontology Knowledge] --> B[📋 Pydantic Schema]
    B --> C[💬 LLM Prompt]
    C --> D[📊 Structured Output]
    D --> E[✅ Validation]
    E --> F[🔗 Semantic Reasoning]
    
    classDef knowledge fill:#e8f5e8,stroke:#4caf50,stroke-width:2px,color:#2e7d32
    classDef processing fill:#e3f2fd,stroke:#2196f3,stroke-width:2px,color:#1565c0
    classDef output fill:#fff3e0,stroke:#ff9800,stroke-width:2px,color:#ef6c00
    classDef validation fill:#fce4ec,stroke:#e91e63,stroke-width:2px,color:#c2185b
    
    class A knowledge
    class B,C processing
    class D,F output
    class E validation

Tip

Pipeline Overview:

This architecture ensures both type safety and semantic consistency by validating LLM outputs against formal ontological constraints while maintaining the flexibility of natural language processing.

Benefits:

  • 🎯 Type Safety: Automatic validation of LLM outputs
  • 🔄 Consistency: Ontology ensures domain correctness
  • 🚀 Integration: Seamless API and database integration
  • 🧠 Reasoning: Enable logical inference on extracted data

Core Concepts

1. Ontology-Driven Schema Design

Instead of manually creating Pydantic models, derive them from ontologies:

from typing import List
from pydantic import BaseModel, Field, HttpUrl, validator

# Traditional approach (manual)
class Plant(BaseModel):
    name: str
    diseases: List[str]  # Unstructured!

# Ontology-driven approach
class Plant(BaseModel):
    plant_uri: HttpUrl = Field(..., description="Plant ontology URI")
    scientific_name: str = Field(..., regex=r"^[A-Z][a-z]+ [a-z]+$")
    diseases: List['Disease'] = Field(..., description="Diseases from ontology")
    
    @validator('plant_uri')
    def validate_plant_exists_in_ontology(cls, v):
        # Check whether the URI exists in GraphDB; validate_ontology_uri is a
        # helper backed by the OntologyValidator class introduced below
        return validate_ontology_uri(v, "Plant")

2. Semantic Validation

from pydantic import BaseModel, validator, Field
from typing import List, Optional, Union
from enum import Enum
import requests

class OntologyValidator:
    """Validates data against GraphDB ontology"""
    
    def __init__(self, graphdb_endpoint="http://localhost:7200/repositories/plant-ontology"):
        self.endpoint = graphdb_endpoint
    
    def validate_class_membership(self, uri: str, class_name: str) -> bool:
        """Check if URI is instance of ontology class"""
        query = f"""
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        
        ASK {{
            <{uri}> rdf:type/{class_name}* ?class .
            ?class rdfs:subClassOf* <http://example.org/{class_name}> .
        }}
        """
        
        response = requests.post(
            self.endpoint,
            headers={'Content-Type': 'application/sparql-query'},
            data=query
        )
        
        return response.json().get('boolean', False)
    
    def get_valid_values(self, property_name: str) -> List[str]:
        """Get all valid values for a property from ontology"""
        query = f"""
        PREFIX plant: <http://example.org/plants/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        
        SELECT DISTINCT ?value WHERE {{
            ?subject plant:{property_name} ?value .
        }}
        """
        
        response = requests.post(
            self.endpoint,
            headers={
                'Content-Type': 'application/sparql-query',
                'Accept': 'application/sparql-results+json'
            },
            data=query
        )
        
        bindings = response.json().get('results', {}).get('bindings', [])
        return [b['value']['value'] for b in bindings]

# Global validator instance
ontology_validator = OntologyValidator()
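
A quick smoke test of the validator, assuming GraphDB is running locally with the plant-ontology repository loaded; the URI and property name below are illustrative:

# Illustrative URI; adjust to whatever is loaded in your repository
is_plant = ontology_validator.validate_class_membership(
    "http://example.org/plants/SolanumLycopersicum", "Plant"
)
print(f"Known plant: {is_plant}")

# List the plant parts the ontology allows as symptom locations
print(ontology_validator.get_valid_values("hasLocation"))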

Building Ontology-Driven Models

1. Disease Classification Model

from pydantic import BaseModel, Field, validator, root_validator
from typing import List, Optional, Dict, Any
from enum import Enum
from datetime import datetime

class DiseaseType(str, Enum):
    """Disease types from ontology"""
    FUNGAL = "http://example.org/diseases/FungalDisease"
    VIRAL = "http://example.org/diseases/ViralDisease"  
    BACTERIAL = "http://example.org/diseases/BacterialDisease"
    NUTRITIONAL = "http://example.org/diseases/NutritionalDisease"
    ENVIRONMENTAL = "http://example.org/diseases/EnvironmentalDisease"

class SeverityLevel(int, Enum):
    """Severity scale from ontology"""
    MINIMAL = 1
    LIGHT = 2
    MODERATE = 3
    SEVERE = 4
    CRITICAL = 5

class Symptom(BaseModel):
    """Plant disease symptom with ontology validation"""
    
    symptom_uri: str = Field(
        ..., 
        description="Symptom URI from ontology",
        example="http://example.org/symptoms/LeafYellowing"
    )
    name: str = Field(..., description="Human-readable symptom name")
    severity: SeverityLevel = Field(..., description="Severity level 1-5")
    location: str = Field(..., description="Where symptom appears on plant")
    confidence: float = Field(..., ge=0.0, le=1.0, description="Detection confidence")
    
    @validator('symptom_uri')
    def validate_symptom_uri(cls, v):
        """Ensure symptom exists in ontology"""
        if not ontology_validator.validate_class_membership(v, "Symptom"):
            raise ValueError(f"Symptom URI {v} not found in ontology")
        return v
    
    @validator('location')  
    def validate_location(cls, v):
        """Ensure location is valid plant part"""
        valid_locations = ontology_validator.get_valid_values("hasLocation")
        if v not in valid_locations:
            raise ValueError(f"Location '{v}' not valid. Must be one of: {valid_locations}")
        return v

class Treatment(BaseModel):
    """Treatment recommendation from ontology"""
    
    treatment_uri: str = Field(..., description="Treatment URI from ontology")
    name: str = Field(..., description="Treatment name")
    type: str = Field(..., description="Treatment type (chemical, biological, cultural)")
    application_method: str = Field(..., description="How to apply treatment")
    effectiveness: float = Field(..., ge=0.0, le=1.0, description="Expected effectiveness")
    
    @validator('treatment_uri')
    def validate_treatment_uri(cls, v):
        if not ontology_validator.validate_class_membership(v, "Treatment"):
            raise ValueError(f"Treatment URI {v} not found in ontology")
        return v

class Disease(BaseModel):
    """Plant disease with full ontology integration"""
    
    disease_uri: str = Field(..., description="Disease URI from ontology")
    name: str = Field(..., description="Disease name")
    scientific_name: Optional[str] = Field(None, description="Scientific name if pathogen")
    type: DiseaseType = Field(..., description="Disease classification")
    symptoms: List[Symptom] = Field(..., min_items=1, description="Observed symptoms")
    treatments: List[Treatment] = Field(default=[], description="Recommended treatments")
    confidence: float = Field(..., ge=0.0, le=1.0, description="Diagnosis confidence")
    
    @validator('disease_uri')
    def validate_disease_uri(cls, v):
        if not ontology_validator.validate_class_membership(v, "Disease"):
            raise ValueError(f"Disease URI {v} not found in ontology")
        return v
    
    @root_validator
    def validate_disease_symptom_consistency(cls, values):
        """Ensure symptoms are consistent with disease type"""
        disease_uri = values.get('disease_uri')
        symptoms = values.get('symptoms', [])
        
        if disease_uri and symptoms:
            # Query ontology for valid symptoms for this disease
            valid_symptoms = get_valid_symptoms_for_disease(disease_uri)
            
            # Only enforce the check when the ontology returns associations
            # (the placeholder implementation below returns an empty list)
            if valid_symptoms:
                for symptom in symptoms:
                    if symptom.symptom_uri not in valid_symptoms:
                        raise ValueError(
                            f"Symptom {symptom.name} not associated with disease {disease_uri} in ontology"
                        )
        
        return values

def get_valid_symptoms_for_disease(disease_uri: str) -> List[str]:
    """Get symptoms associated with disease from ontology"""
    query = f"""
    PREFIX disease: <http://example.org/diseases/>
    PREFIX symptom: <http://example.org/symptoms/>
    
    SELECT ?symptom WHERE {{
        <{disease_uri}> disease:hasSymptom ?symptom .
    }}
    """
    
    # Execute query and return symptom URIs
    # Implementation depends on your GraphDB connection
    return []  # Placeholder

class Plant(BaseModel):
    """Plant with comprehensive ontology integration"""
    
    plant_uri: str = Field(..., description="Plant URI from ontology")
    scientific_name: str = Field(..., regex=r"^[A-Z][a-z]+ [a-z]+$")
    common_names: List[str] = Field(default=[], description="Common names")
    plant_family: str = Field(..., description="Taxonomic family")
    diseases: List[Disease] = Field(default=[], description="Diagnosed diseases")
    health_status: str = Field(default="unknown", description="Overall health assessment")
    diagnosis_date: datetime = Field(default_factory=datetime.now)
    
    @validator('plant_uri')
    def validate_plant_uri(cls, v):
        if not ontology_validator.validate_class_membership(v, "Plant"):
            raise ValueError(f"Plant URI {v} not found in ontology")
        return v
    
    @validator('plant_family')
    def validate_plant_family(cls, v):
        valid_families = ontology_validator.get_valid_values("belongsToFamily")
        if v not in valid_families:
            raise ValueError(f"Plant family '{v}' not found in ontology")
        return v
    
    def add_disease_from_ontology(self, disease_uri: str, symptoms: List[Dict]) -> None:
        """Add disease based on ontology data"""
        # Query ontology for disease details
        disease_data = query_disease_details(disease_uri)
        
        # Create symptom objects
        symptom_objects = [
            Symptom(
                symptom_uri=s['uri'],
                name=s['name'],
                severity=s.get('severity', 3),
                location=s.get('location', 'unknown'),
                confidence=s.get('confidence', 0.8)
            )
            for s in symptoms
        ]
        
        # Create disease object
        disease = Disease(
            disease_uri=disease_uri,
            name=disease_data['name'],
            scientific_name=disease_data.get('scientific_name'),
            type=disease_data['type'],
            symptoms=symptom_objects,
            confidence=calculate_diagnosis_confidence(symptom_objects)
        )
        
        self.diseases.append(disease)

def query_disease_details(disease_uri: str) -> Dict[str, Any]:
    """Query ontology for disease details"""
    # Implementation would query GraphDB
    return {
        'name': 'Example Disease',
        'type': DiseaseType.FUNGAL,
        'scientific_name': 'Fungus example'
    }

def calculate_diagnosis_confidence(symptoms: List[Symptom]) -> float:
    """Calculate overall diagnosis confidence from symptoms"""
    if not symptoms:
        return 0.0
    return sum(s.confidence for s in symptoms) / len(symptoms)
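
These models also double as the contract handed to the LLM: the JSON schema Pydantic derives from them is embedded in the extraction prompt in the next section. Printing it is a useful sanity check (note that actually instantiating the models requires a running GraphDB for the URI validators above):

import json

# Inspect the JSON schema that will be embedded in the LLM prompt below;
# it includes nested definitions for Disease, Symptom and Treatment plus
# the DiseaseType and SeverityLevel enum values
print(json.dumps(Plant.schema(), indent=2))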

2. LLM Integration with Structured Extraction

from openai import OpenAI
import json
from typing import Type, TypeVar

T = TypeVar('T', bound=BaseModel)

class OntologyLLMExtractor:
    """Extract structured data from text using LLM + Ontology validation"""
    
    def __init__(self, 
                 llm_client: OpenAI,
                 graphdb_endpoint: str = "http://localhost:7200/repositories/plant-ontology"):
        self.llm = llm_client
        self.ontology_validator = OntologyValidator(graphdb_endpoint)
    
    def extract_structured_data(self, 
                              text: str, 
                              target_model: Type[T],
                              context: Optional[str] = None) -> Optional[T]:
        """Extract and validate structured data from text"""
        
        # Get ontology constraints for the model
        ontology_context = self._get_ontology_context(target_model)
        
        # Build prompt with ontology constraints
        prompt = self._build_extraction_prompt(text, target_model, ontology_context, context)
        
        # Get LLM response
        response = self._call_llm(prompt)
        
        # Parse and validate with Pydantic
        try:
            structured_data = target_model.parse_raw(response)
            return structured_data
        except Exception as e:
            print(f"Validation error: {e}")
            return None
    
    def _get_ontology_context(self, model_class: Type[BaseModel]) -> Dict[str, Any]:
        """Extract ontology constraints from Pydantic model"""
        context = {}
        
        # Get field information
        for field_name, field_info in model_class.__fields__.items():
            if hasattr(field_info.type_, '__members__'):  # Enum
                context[field_name] = {
                    'type': 'enum',
                    'values': list(field_info.type_.__members__.keys())
                }
            elif field_name.endswith('_uri'):
                context[field_name] = {
                    'type': 'uri',
                    'ontology_class': field_name.replace('_uri', '').title()
                }
        
        return context
    
    def _build_extraction_prompt(self, 
                               text: str,
                               target_model: Type[BaseModel],
                               ontology_context: Dict,
                               context: Optional[str]) -> str:
        """Build extraction prompt with ontology constraints"""
        
        # Get JSON schema
        schema = target_model.schema()
        
        # Build constraint descriptions
        constraints = []
        for field, info in ontology_context.items():
            if info['type'] == 'enum':
                constraints.append(f"- {field}: Must be one of {info['values']}")
            elif info['type'] == 'uri':
                constraints.append(f"- {field}: Must be valid ontology URI for {info['ontology_class']}")
        
        constraint_text = "\n".join(constraints) if constraints else "No specific constraints"
        
        context_text = f"\nAdditional context: {context}" if context else ""
        
        prompt = f"""
        Extract structured information from the following text and format it according to the JSON schema provided.
        
        Text to analyze:
        "{text}"
        {context_text}
        
        JSON Schema:
        {json.dumps(schema, indent=2)}
        
        Ontology Constraints:
        {constraint_text}
        
        Important:
        - Use actual URIs from the ontology (format: http://example.org/category/SpecificItem)
        - Ensure all enum values match exactly  
        - Include confidence scores based on text evidence
        - If information is not available, use null or appropriate defaults
        
        Return only valid JSON that matches the schema:
        """
        
        return prompt
    
    def _call_llm(self, prompt: str) -> str:
        """Call LLM with structured output"""
        response = self.llm.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are an expert at extracting structured data from text using formal ontologies. Always return valid JSON."},
                {"role": "user", "content": prompt}
            ],
            response_format={"type": "json_object"},
            temperature=0.1  # Lower temperature for more consistent extraction
        )
        
        return response.choices[0].message.content

# Example Usage
def diagnose_plant_from_text(description: str) -> Optional[Plant]:
    """Diagnose plant disease from natural language description"""
    
    llm_client = OpenAI()
    extractor = OntologyLLMExtractor(llm_client)
    
    # Extract structured plant data
    plant = extractor.extract_structured_data(
        text=description,
        target_model=Plant,
        context="Plant disease diagnosis context. Focus on identifying symptoms, diseases, and treatments."
    )
    
    if plant:
        print(f"βœ… Extracted plant data: {plant.scientific_name}")
        for disease in plant.diseases:
            print(f"  🦠 Disease: {disease.name} (confidence: {disease.confidence:.2f})")
            for symptom in disease.symptoms:
                print(f"    πŸ” Symptom: {symptom.name} - {symptom.location}")
    else:
        print("❌ Failed to extract valid plant data")
    
    return plant

# Test the extraction
description = """
I have a tomato plant (Solanum lycopersicum) with yellow spots on the leaves 
that are spreading quickly. The spots have dark centers and the leaves are 
starting to wilt. Some of the stems also show dark streaks. This started 
about a week ago after heavy rains.
"""

plant = diagnose_plant_from_text(description)

3. MOE Integration with Semantic Routing

from typing import Dict, List, Callable
import numpy as np

class SemanticExpertRouter:
    """Route queries to appropriate experts based on ontology concepts"""
    
    def __init__(self, graphdb_endpoint: str):
        self.ontology_validator = OntologyValidator(graphdb_endpoint)
        self.experts: Dict[str, Callable] = {}
        
    def register_expert(self, domain_uri: str, expert_function: Callable):
        """Register an expert for a specific ontology domain"""
        self.experts[domain_uri] = expert_function
    
    def route_query(self, query: str, extracted_data: BaseModel) -> str:
        """Route query to most appropriate expert based on ontology concepts"""
        
        # Extract ontology concepts from the data
        concepts = self._extract_concepts(extracted_data)
        
        # Find best matching expert
        best_expert = self._find_best_expert(concepts)
        
        if best_expert:
            return best_expert(query, extracted_data)
        else:
            return self._general_expert(query, extracted_data)
    
    def _extract_concepts(self, data: BaseModel) -> List[str]:
        """Extract ontology URIs from Pydantic model"""
        concepts = []
        
        # Get all URI fields
        for field_name, field_value in data.__dict__.items():
            if field_name.endswith('_uri') and isinstance(field_value, str):
                concepts.append(field_value)
            elif isinstance(field_value, list):
                for item in field_value:
                    if hasattr(item, '__dict__'):
                        concepts.extend(self._extract_concepts(item))
        
        return concepts
    
    def _find_best_expert(self, concepts: List[str]) -> Optional[Callable]:
        """Find expert with highest concept overlap"""
        best_score = 0
        best_expert = None
        
        for domain_uri, expert in self.experts.items():
            score = self._calculate_similarity(domain_uri, concepts)
            if score > best_score:
                best_score = score
                best_expert = expert
        
        return best_expert if best_score > 0.3 else None  # Threshold
    
    def _calculate_similarity(self, domain_uri: str, concepts: List[str]) -> float:
        """Calculate semantic similarity between domain and concepts"""
        # Query ontology for related concepts
        related_concepts = self._get_related_concepts(domain_uri)
        
        # Calculate overlap
        overlap = len(set(concepts) & set(related_concepts))
        total = len(set(concepts) | set(related_concepts))
        
        return overlap / total if total > 0 else 0.0
    
    def _get_related_concepts(self, domain_uri: str) -> List[str]:
        """Get concepts related to domain from ontology"""
        query = f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        
        SELECT ?concept WHERE {{
            {{
                <{domain_uri}> rdfs:subClassOf* ?concept .
            }} UNION {{
                ?concept rdfs:subClassOf* <{domain_uri}> .
            }} UNION {{
                <{domain_uri}> ?property ?concept .
            }}
        }}
        """
        
        # Execute query and return related concepts
        # Implementation depends on GraphDB connection
        return []
    
    def _general_expert(self, query: str, data: BaseModel) -> str:
        """Fallback expert for unmatched queries"""
        return f"General analysis of {type(data).__name__}: {data.json()}"

# Expert functions for different domains
def fungal_disease_expert(query: str, plant: Plant) -> str:
    """Expert specialized in fungal diseases"""
    fungal_diseases = [d for d in plant.diseases if d.type == DiseaseType.FUNGAL]
    
    if fungal_diseases:
        disease = fungal_diseases[0]  # Focus on primary disease
        
        analysis = f"""
        🔬 FUNGAL DISEASE ANALYSIS for {plant.scientific_name}
        
        Primary Disease: {disease.name}
        Confidence: {disease.confidence:.1%}
        
        Key Symptoms:
        {chr(10).join(f"- {s.name} ({s.location}, severity {s.severity})" for s in disease.symptoms)}
        
        Recommended Actions:
        1. Apply broad-spectrum fungicide
        2. Improve air circulation  
        3. Reduce leaf wetness
        4. Remove infected plant material
        
        Prognosis: {"Good" if disease.confidence < 0.7 else "Requires immediate attention"}
        """
        return analysis
    
    return "No fungal diseases detected in the provided data."

def viral_disease_expert(query: str, plant: Plant) -> str:
    """Expert specialized in viral diseases"""
    viral_diseases = [d for d in plant.diseases if d.type == DiseaseType.VIRAL]
    
    if viral_diseases:
        return f"Viral disease detected: {viral_diseases[0].name}. No chemical treatment available. Focus on vector control and plant removal."
    
    return "No viral diseases detected."

# Setup MOE system
def setup_moe_system():
    """Initialize MOE system with ontology-based routing"""
    router = SemanticExpertRouter("http://localhost:7200/repositories/plant-ontology")
    
    # Register domain experts
    router.register_expert("http://example.org/diseases/FungalDisease", fungal_disease_expert)
    router.register_expert("http://example.org/diseases/ViralDisease", viral_disease_expert)
    
    return router

# Example usage
def analyze_plant_with_moe(description: str) -> str:
    """Complete analysis pipeline with MOE routing"""
    
    # Step 1: Extract structured data
    plant = diagnose_plant_from_text(description)
    
    if not plant:
        return "Unable to extract plant information from description."
    
    # Step 2: Route to appropriate expert
    router = setup_moe_system()
    analysis = router.route_query(description, plant)
    
    return analysis

# Test complete pipeline
test_description = """
My tomato plants have developed circular brown spots with yellow halos on the leaves.
The spots started small but are growing larger and some leaves are turning completely yellow.
I also notice dark lesions on the stems near the soil line. The problem started after 
several days of high humidity and warm temperatures.
"""

result = analyze_plant_with_moe(test_description)
print(result)

Advanced Patterns

1. Hierarchical Validation

class HierarchicalValidator(BaseModel):
    """Validate data at multiple ontology levels.
    
    Intended as a base class: models that subclass it inherit the wildcard
    validator below, which checks every *_uri field against the ontology hierarchy.
    """
    
    @validator('*', pre=True)
    def validate_hierarchy(cls, v, field):
        """Validate against ontology hierarchy"""
        if field.name.endswith('_uri'):
            # Check if URI exists at correct hierarchy level
            expected_class = field.name.replace('_uri', '').title()
            if not validate_ontology_hierarchy(v, expected_class):
                raise ValueError(f"URI {v} not in correct hierarchy for {expected_class}")
        return v

def validate_ontology_hierarchy(uri: str, expected_class: str) -> bool:
    """Check if URI is in correct ontology hierarchy"""
    query = f"""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    
    ASK {{
        <{uri}> rdfs:subClassOf* <http://example.org/{expected_class}> .
    }}
    """
    # Execute query and return boolean result
    return True  # Placeholder
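
A minimal usage sketch: the hypothetical SymptomObservation model below subclasses HierarchicalValidator, so its symptom_uri field is checked by the inherited wildcard validator before the object is created.

class SymptomObservation(HierarchicalValidator):
    """Hypothetical model that inherits the hierarchy check above"""
    symptom_uri: str
    note: Optional[str] = None

# validate_hierarchy runs on symptom_uri (expected class "Symptom");
# an invalid URI raises pydantic.ValidationError
observation = SymptomObservation(
    symptom_uri="http://example.org/symptoms/LeafYellowing",
    note="Lower leaves only",
)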

2. Dynamic Schema Generation

from pydantic import create_model

def generate_pydantic_model_from_ontology(class_uri: str) -> Type[BaseModel]:
    """Generate Pydantic model from ontology class definition"""
    
    # Query ontology for class properties
    properties = query_class_properties(class_uri)
    
    # Build field definitions
    fields = {}
    validators = {}
    
    for prop in properties:
        field_name = prop['name']
        field_type = map_ontology_type_to_python(prop['range'])
        field_info = Field(..., description=prop.get('comment', ''))
        
        fields[field_name] = (field_type, field_info)
        
        # Add ontology validator (create_ontology_validator is assumed to wrap the
        # property's SPARQL validation query in a reusable Pydantic validator)
        if prop.get('validation_query'):
            validators[f"validate_{field_name}"] = create_ontology_validator(prop['validation_query'])
    
    # Create dynamic model class
    model_class = create_model(
        f"Generated{class_uri.split('/')[-1]}",
        **fields,
        __validators__=validators
    )
    
    return model_class

def query_class_properties(class_uri: str) -> List[Dict]:
    """Query ontology for class properties"""
    # Implementation would query GraphDB
    return []

def map_ontology_type_to_python(ontology_type: str) -> type:
    """Map ontology data types to Python types"""
    mapping = {
        'xsd:string': str,
        'xsd:int': int,
        'xsd:float': float,
        'xsd:boolean': bool,
        'xsd:dateTime': datetime
    }
    return mapping.get(ontology_type, str)
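
A hand-rolled example of the same idea, assuming query_class_properties returned the two hypothetical property descriptions below; it skips the ontology validators and only shows the field generation:

# Hypothetical property descriptions for a Symptom class
symptom_properties = [
    {'name': 'name', 'range': 'xsd:string', 'comment': 'Human-readable symptom name'},
    {'name': 'severity', 'range': 'xsd:int', 'comment': 'Severity level 1-5'},
]

GeneratedSymptom = create_model(
    'GeneratedSymptom',
    **{
        prop['name']: (
            map_ontology_type_to_python(prop['range']),
            Field(..., description=prop.get('comment', '')),
        )
        for prop in symptom_properties
    },
)

print(GeneratedSymptom(name='Leaf yellowing', severity=3).json())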

Best Practices

1. Schema Design

  • Start with ontology: Design ontology first, then generate Pydantic models
  • Use URIs: Always reference ontology concepts by URI
  • Validate hierarchies: Ensure data respects ontological relationships
  • Include confidence: Track certainty of extracted information

2. LLM Prompting

  • Provide context: Include relevant ontology constraints in prompts
  • Use examples: Show expected URI formats and structure
  • Validate iteratively: Re-prompt if validation fails (a retry sketch follows this list)
  • Lower temperature: Use consistent extraction settings
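
A minimal retry sketch for the "validate iteratively" point above, reusing the internal helpers of the OntologyLLMExtractor from earlier; the max_attempts budget is a hypothetical parameter, and the validation error is fed back into the next prompt:

def extract_with_retries(extractor: OntologyLLMExtractor,
                         text: str,
                         target_model: Type[T],
                         max_attempts: int = 3) -> Optional[T]:
    """Re-prompt the LLM with the validation error until the output parses."""
    feedback = None
    for attempt in range(max_attempts):
        context = f"Previous attempt failed validation: {feedback}" if feedback else None
        prompt = extractor._build_extraction_prompt(
            text, target_model, extractor._get_ontology_context(target_model), context
        )
        response = extractor._call_llm(prompt)
        try:
            return target_model.parse_raw(response)
        except Exception as e:  # pydantic.ValidationError or malformed JSON
            feedback = str(e)
    return None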

3. Performance

  • Cache queries: Store frequently used ontology queries (a caching sketch follows this list)
  • Batch validation: Validate multiple items together
  • Async processing: Use async for LLM calls and database queries
  • Index ontologies: Ensure GraphDB has proper indices
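
One possible caching layer for the first point, assuming the ontology changes rarely enough that an in-process functools.lru_cache is acceptable; the wrappers sit on top of the global ontology_validator defined earlier, and field validators would call them instead of hitting GraphDB on every model instantiation:

from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_class_membership(uri: str, class_name: str) -> bool:
    """Memoized wrapper around the GraphDB ASK query."""
    return ontology_validator.validate_class_membership(uri, class_name)

@lru_cache(maxsize=128)
def cached_valid_values(property_name: str) -> tuple:
    """Memoized wrapper; returns a tuple so the cached value stays immutable."""
    return tuple(ontology_validator.get_valid_values(property_name))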

4. Error Handling

  • Graceful degradation: Fall back to partial extraction
  • Detailed logging: Track validation failures for improvement
  • User feedback: Allow manual correction of extracted data
  • Incremental learning: Update ontology based on common failures

Integration Examples

Plant Disease Diagnosis API

from fastapi import FastAPI, HTTPException, Body

app = FastAPI()

@app.post("/diagnose", response_model=Plant)
async def diagnose_plant(description: str):
    """API endpoint for plant disease diagnosis"""
    try:
        # Extract structured data
        plant = diagnose_plant_from_text(description)
        
        if not plant:
            raise HTTPException(status_code=400, detail="Could not extract plant information")
        
        # Route to expert analysis
        router = setup_moe_system()
        analysis = router.route_query(description, plant)
        
        # The Plant model declares no `analysis` field, so return both together
        return {"plant": plant, "analysis": analysis}
        
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/ontology/diseases")
async def get_diseases():
    """Get all diseases from ontology"""
    diseases = ontology_validator.get_valid_values("Disease")
    return {"diseases": diseases}

Next Steps

  1. Setup GraphDB: Ensure GraphDB is running with plant ontology
  2. Install Dependencies: pip install pydantic openai requests
  3. Create Models: Start with simple Plant/Disease models
  4. Test Extraction: Try extracting data from text descriptions
  5. Add Validation: Implement ontology validation functions
  6. Build MOE: Create expert routing system
  7. Deploy API: Build FastAPI service for plant diagnosis

This integration of Pydantic models with ontologies provides a robust foundation for structured data extraction that maintains semantic consistency while leveraging the power of modern LLMs.