Named Entity Recognition (NER)

Project Neo uses GLiNER2 (Generalist and Lightweight model for Named Entity Recognition) to identify and extract structured entities from document text. NER runs as a dedicated microservice (ner_service) and processes files asynchronously via the worker queue after text extraction completes.

How It Works

  1. A file is uploaded to a share that has enable_ner_analysis turned on.
  2. The worker service extracts text from the file (PDF, DOCX, etc.).
  3. The worker sends the extracted text to the NER service over an internal HTTP call.
  4. The NER service runs the GLiNER2 model against the text using the share's configured schema.
  5. Detected entities, classifications, and structured extractions are stored in PostgreSQL.
  6. Results are available through the API as soon as they are stored.

The underlying model is DeBERTa-v3-base fine-tuned for zero-shot NER. It accepts a list of target entity labels and returns spans with confidence scores -- no task-specific training is needed.
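
To make the label-plus-threshold contract concrete, here is a minimal sketch in plain Python. It does not call GLiNER2; the `Span` type, the sample spans, and `filter_spans` are hypothetical names that only illustrate how scored spans might be filtered against the requested labels and a confidence threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical shape of one detected entity span."""
    text: str
    label: str
    start: int
    end: int
    score: float


def filter_spans(spans, labels, threshold):
    """Keep spans whose label was requested and whose score meets the threshold."""
    wanted = set(labels)
    return [s for s in spans if s.label in wanted and s.score >= threshold]


# Made-up model output for the sentence "Alice joined NetApp in 2021."
raw = [
    Span("Alice", "person", 0, 5, 0.93),
    Span("NetApp", "organization", 13, 19, 0.88),
    Span("2021", "date", 23, 27, 0.64),
]

kept = filter_spans(raw, ["person", "organization", "date"], threshold=0.7)
print([s.text for s in kept])  # the 0.64 date span falls below the 0.7 threshold
```

Lowering the threshold to 0.6 in this sketch would keep all three spans, which is the same trade-off the per-share threshold controls.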

NER Schemas

A schema defines which entity types to extract, which document classifications to apply, and what structured fields to pull from the text. Five pre-built schemas ship with Neo.

Default Schema

General-purpose extraction suitable for most document types.

| Entity Type | Description |
| --- | --- |
| person | Names of people, individuals, or human beings |
| organization | Company names, institutions, agencies, or organizations |
| location | Geographic locations, cities, countries, addresses |
| date | Dates, time periods, or temporal references |
| money | Monetary amounts, prices, or financial values |
| email | Email addresses |
| phone | Phone numbers or contact numbers |
| url | Web URLs or links |

Classifications: document_type (report, memo, email, contract, invoice, policy, manual, other), language (english, spanish, french, german, other)

Default confidence threshold: 0.7
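
As an illustration of how the three parts of a schema fit together, the Default schema can be written down as a small data structure. This is a sketch only: the `NerSchema` class and its field names are hypothetical and are not taken from Neo's code; the values are copied from the listing above.

```python
from dataclasses import dataclass, field


@dataclass
class NerSchema:
    """Hypothetical container for one schema (not Neo's actual class)."""
    name: str
    entity_types: dict                                          # label -> description
    classifications: dict = field(default_factory=dict)         # task -> allowed labels
    structured_extraction: dict = field(default_factory=dict)   # structure -> field names
    confidence_threshold: float = 0.7


DEFAULT_SCHEMA = NerSchema(
    name="default",
    entity_types={
        "person": "Names of people, individuals, or human beings",
        "organization": "Company names, institutions, agencies, or organizations",
        "location": "Geographic locations, cities, countries, addresses",
        "date": "Dates, time periods, or temporal references",
        "money": "Monetary amounts, prices, or financial values",
        "email": "Email addresses",
        "phone": "Phone numbers or contact numbers",
        "url": "Web URLs or links",
    },
    classifications={
        "document_type": ["report", "memo", "email", "contract",
                          "invoice", "policy", "manual", "other"],
        "language": ["english", "spanish", "french", "german", "other"],
    },
    confidence_threshold=0.7,
)

# The label list is what a zero-shot model would be asked to find.
print(sorted(DEFAULT_SCHEMA.entity_types))
```

The Default schema has no structured extraction, so that field stays empty here; the other schemas would fill it with entries such as `contract_terms`.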

Legal Schema

Optimized for contracts, agreements, and court filings.

| Entity Type | Description |
| --- | --- |
| party | Legal parties, signatories, or contracting entities |
| person | Names of individuals mentioned in the document |
| organization | Company names, law firms, or institutions |
| date | Dates, deadlines, or time periods |
| money | Monetary amounts, fees, or financial terms |
| jurisdiction | Legal jurisdictions, courts, or governing law references |
| case_number | Case numbers, docket numbers, or reference numbers |
| law_reference | References to laws, statutes, or regulations |

Classifications: document_type (contract, agreement, amendment, nda, mou, letter_of_intent, court_filing, legal_opinion, terms_of_service, privacy_policy, other), contract_status (draft, pending_signature, executed, expired, terminated)

Structured extraction: contract_terms -- parties, effective_date, expiration_date, term_length, renewal, termination_notice, governing_law, total_value

Default confidence threshold: 0.75

Financial Schema

Tailored for invoices, bank statements, and financial reports.

| Entity Type | Description |
| --- | --- |
| company | Company names, corporations, or business entities |
| person | Names of individuals, executives, or account holders |
| money | Monetary amounts, prices, or financial values |
| percentage | Percentage values, rates, or ratios |
| date | Dates, fiscal periods, or time references |
| account_number | Bank account numbers or financial account identifiers |
| ticker | Stock ticker symbols |
| currency | Currency types or codes |

Classifications: document_type (invoice, receipt, bank_statement, financial_report, tax_document, expense_report, purchase_order, quote, other), transaction_type (payment, refund, transfer, deposit, withdrawal, fee, other)

Structured extraction: transaction (amount, date, description, type, account, reference), invoice_details (invoice_number, vendor, customer, subtotal, tax, total, due_date)

Default confidence threshold: 0.8

Healthcare Schema

Designed for medical records, lab reports, and prescriptions.

| Entity Type | Description |
| --- | --- |
| patient | Patient names or identifiers |
| provider | Healthcare provider names, doctors, or medical staff |
| organization | Hospitals, clinics, or healthcare facilities |
| medication | Drug names, medications, or pharmaceutical substances |
| dosage | Medication dosages, amounts, or frequencies |
| condition | Medical conditions, diagnoses, or symptoms |
| procedure | Medical procedures, treatments, or interventions |
| date | Dates, appointment times, or time references |
| lab_value | Laboratory test values or measurements |

Classifications: document_type (medical_record, lab_report, prescription, discharge_summary, referral, insurance_claim, consent_form, other), urgency (routine, urgent, emergency)

Structured extraction: patient_info, prescription, visit_summary

Default confidence threshold: 0.8

HR Schema

Built for resumes, offer letters, and employee records.

| Entity Type | Description |
| --- | --- |
| person | Names of individuals, candidates, or employees |
| organization | Company names, employers, or institutions |
| job_title | Job titles, positions, or roles |
| skill | Skills, competencies, or qualifications |
| education | Educational institutions, degrees, or certifications |
| date | Dates, employment periods, or time references |
| location | Work locations, offices, or addresses |
| salary | Salary amounts, compensation, or benefits |
| email | Email addresses |
| phone | Phone numbers |

Classifications: document_type (resume, cover_letter, job_description, offer_letter, performance_review, employee_handbook, policy, other), experience_level (entry, mid, senior, executive)

Structured extraction: candidate_info, employment_history, education

Default confidence threshold: 0.7

Configuration

The NER service is configured through environment variables set on the ner container.

| Variable | Default | Description |
| --- | --- | --- |
| NER_MODEL_NAME | fastino/gliner2-base-v1 | Hugging Face model identifier. The model is downloaded on first startup. |
| NER_CONFIDENCE_THRESHOLD | 0.7 | Global minimum confidence score (0.0--1.0). Per-share thresholds override this. |
| NER_DEVICE | auto | Compute device: auto, cuda, or cpu. auto selects CUDA when a GPU is detected. |
| NER_MAX_TEXT_LENGTH | 8000 | Maximum characters per chunk sent to the model. GLiNER2 has a 2048-token context window (~4 chars/token = ~8000 chars). |
| NER_CUDA_MAX_TEXT_LENGTH | 8000 | Hard cap on chunk size when running on CUDA to prevent out-of-memory errors. |
| NER_CHUNK_OVERLAP | 500 | Character overlap between adjacent chunks so entities at boundaries are not missed. |
| NER_TOKEN_BUDGET | 2048 | Token budget per chunk for the tokenizer-aware chunking path. |
| NER_MAX_BATCH_SIZE | 32 | Upper limit for adaptive batch sizing on GPU. |
| NER_ESTIMATED_MB_PER_CHUNK | 1500 | Estimated VRAM (MB) per chunk, used for initial batch size calculation before probing. |
| NER_CPU_BATCH_SIZE | 16 | Fixed batch size when running on CPU. |
| NER_BATCH_RECOVERY_INTERVAL | 50 | Number of successful inferences before attempting to increase batch size after an OOM. |
| NER_MAX_CONSECUTIVE_OOMS | 3 | After this many consecutive OOM errors, the engine falls back to CPU. |
| NER_CUDA_RETRY_COOLDOWN | 200 | Number of successful CPU inferences before retrying CUDA after an OOM fallback. |
| NER_MAX_CUDA_RETRIES | 3 | Maximum number of times to retry moving back to CUDA after OOM fallback. |
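
For readers who want to see how such variables are typically consumed, the following is a minimal sketch of an environment loader with the documented defaults. The `load_ner_config` function, its validation rules, and the choice of variables are assumptions for illustration; the NER service's actual loader may differ.

```python
import os

# Defaults copied from the configuration table; only a subset is shown.
DEFAULTS = {
    "NER_MODEL_NAME": "fastino/gliner2-base-v1",
    "NER_CONFIDENCE_THRESHOLD": 0.7,
    "NER_DEVICE": "auto",
    "NER_MAX_TEXT_LENGTH": 8000,
    "NER_CHUNK_OVERLAP": 500,
    "NER_MAX_BATCH_SIZE": 32,
    "NER_CPU_BATCH_SIZE": 16,
    "NER_MAX_CONSECUTIVE_OOMS": 3,
}


def load_ner_config(env=None):
    """Read NER_* variables from a mapping, falling back to the defaults."""
    env = os.environ if env is None else env
    config = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        # Cast to the type of the default: "0.8" -> 0.8, "4000" -> 4000.
        config[name] = default if raw is None else type(default)(raw)

    # Hypothetical sanity checks based on the documented value ranges.
    if not 0.0 <= config["NER_CONFIDENCE_THRESHOLD"] <= 1.0:
        raise ValueError("NER_CONFIDENCE_THRESHOLD must be between 0.0 and 1.0")
    if config["NER_DEVICE"] not in ("auto", "cuda", "cpu"):
        raise ValueError("NER_DEVICE must be auto, cuda, or cpu")
    return config


config = load_ner_config({"NER_MAX_TEXT_LENGTH": "4000", "NER_DEVICE": "cpu"})
print(config["NER_MAX_TEXT_LENGTH"], config["NER_DEVICE"])
```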

Per-Share NER Settings

NER is enabled and configured per share through the share's rules. When creating or updating a share, include the NER fields in the rules object.

| Rule Field | Type | Default | Description |
| --- | --- | --- | --- |
| enable_ner_analysis | bool | false | Master switch -- set to true to run NER on files in this share. |
| ner_schema | string | "default" | Which pre-built schema to use (default, legal, financial, healthcare, hr). |
| ner_entity_types | list | ["person", "organization", "location", "date", "money"] | Override the schema's entity types with a custom list. |
| ner_classifications | object | null | Override classification labels. |
| ner_structured_extraction | object | null | Override structured extraction fields. |
| ner_confidence_threshold | float | 0.7 | Minimum confidence for this share (overrides global and schema defaults). |

Example: Using a Pre-Built Schema

bash
curl -s -X POST "$NEO_URL/shares" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Legal Contracts",
    "path": "/mnt/contracts",
    "rules": {
      "enable_ner_analysis": true,
      "ner_schema": "legal",
      "ner_confidence_threshold": 0.75
    }
  }'

Example: Custom Entity Types (No Schema)

bash
curl -s -X POST "$NEO_URL/shares" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Research Papers",
    "path": "/mnt/research",
    "rules": {
      "enable_ner_analysis": true,
      "ner_entity_types": ["person", "organization", "date", "location", "chemical_compound", "gene_name"],
      "ner_confidence_threshold": 0.65
    }
  }'
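
Both examples set ner_confidence_threshold on the share, which overrides the global and schema defaults. A minimal sketch of that resolution follows. The `resolve_threshold` function is hypothetical, and the relative order of the schema default and the global default is an assumption; the documentation only states that the per-share value wins.

```python
def resolve_threshold(share_rules, schema_default=None, global_default=0.7):
    """Return the effective confidence threshold for a share.

    Assumed precedence: per-share rule, then schema default, then the
    global NER_CONFIDENCE_THRESHOLD. Only the first step is documented.
    """
    candidates = (
        share_rules.get("ner_confidence_threshold"),
        schema_default,
        global_default,
    )
    for value in candidates:
        if value is not None:
            return value


# Share from the custom entity types example: explicit per-share value.
print(resolve_threshold({"ner_confidence_threshold": 0.65}, schema_default=0.75))

# Share that only selects the legal schema: falls through to the schema default.
print(resolve_threshold({"ner_schema": "legal"}, schema_default=0.75))
```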

API Endpoints

All NER endpoints are under /ner and require a valid bearer token.

List Schemas

bash
curl -s "$NEO_URL/ner/schemas" \
  -H "Authorization: Bearer $TOKEN" | jq .

Returns all registered schemas with their entity types, classification availability, and structured extraction availability.

Get a Specific Schema

bash
curl -s "$NEO_URL/ner/schemas/legal" \
  -H "Authorization: Bearer $TOKEN" | jq .

Get NER Results for a File

bash
curl -s "$NEO_URL/ner/files/{file_id}" \
  -H "Authorization: Bearer $TOKEN" | jq .

Returns entities, classifications, and structured extractions for a single file.

Get NER Results for a Share

bash
curl -s "$NEO_URL/ner/shares/{share_id}/results?page=1&page_size=50" \
  -H "Authorization: Bearer $TOKEN" | jq .

Paginated results across all files in a share. Use entity_type to filter:

bash
curl -s "$NEO_URL/ner/shares/{share_id}/results?entity_type=person" \
  -H "Authorization: Bearer $TOKEN" | jq .
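
When calling this endpoint from a script, it can help to build the query string programmatically. The sketch below uses only the Python standard library; the `share_results_url` helper and the example host are hypothetical, and combining pagination with entity_type in one request is an assumption based on the two examples above.

```python
from urllib.parse import quote, urlencode


def share_results_url(base_url, share_id, page=1, page_size=50, entity_type=None):
    """Build the URL for paginated NER results of one share."""
    params = {"page": page, "page_size": page_size}
    if entity_type is not None:
        params["entity_type"] = entity_type  # optional filter
    path = f"/ner/shares/{quote(str(share_id), safe='')}/results"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


# Placeholder host and share ID, for illustration only.
print(share_results_url("https://neo.example.com", "42", page=2, entity_type="person"))
```

The resulting URL can be passed to any HTTP client together with the same Authorization header used in the curl examples.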

Global NER Statistics

bash
curl -s "$NEO_URL/ner/stats" \
  -H "Authorization: Bearer $TOKEN" | jq .

Returns total_entities, total_files_processed, and a breakdown by entity_types.

Per-Share NER Statistics

bash
curl -s "$NEO_URL/ner/shares/{share_id}/stats" \
  -H "Authorization: Bearer $TOKEN" | jq .

Search Entities

Search for entities by value across all shares or within a specific share.

bash
# Search globally
curl -s "$NEO_URL/ner/entities/search?q=NetApp" \
  -H "Authorization: Bearer $TOKEN" | jq .

# Filter by entity type and share
curl -s "$NEO_URL/ner/entities/search?q=NetApp&entity_type=organization&share_id={share_id}" \
  -H "Authorization: Bearer $TOKEN" | jq .

Aggregate Entities

Get aggregated counts of entity values, useful for dashboards and analytics.

bash
# All entities
curl -s "$NEO_URL/ner/entities/aggregate" \
  -H "Authorization: Bearer $TOKEN" | jq .

# Filter by type and share, limited to 20 results
curl -s "$NEO_URL/ner/entities/aggregate?entity_type=person&share_id={share_id}&limit=20" \
  -H "Authorization: Bearer $TOKEN" | jq .

Get NER Settings

bash
curl -s "$NEO_URL/ner/settings" \
  -H "Authorization: Bearer $TOKEN" | jq .

Returns the current global NER configuration: enabled, model, batch_size, confidence_threshold, device.

Update NER Settings

bash
curl -s -X PUT "$NEO_URL/ner/settings" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "confidence_threshold": 0.8,
    "device": "cuda"
  }' | jq .

Valid device values: auto, cuda, cpu. When the device is changed, the API forwards the change to the NER service so the model is moved immediately.

Trigger Reanalysis

Queue all files in a share for NER reprocessing. By default, only files without existing results are processed. Pass force=true to reanalyze everything.

bash
# Analyze files that are missing NER results
curl -s -X POST "$NEO_URL/ner/shares/{share_id}/reanalyze" \
  -H "Authorization: Bearer $TOKEN" | jq .

# Force reanalysis of all files
curl -s -X POST "$NEO_URL/ner/shares/{share_id}/reanalyze?force=true" \
  -H "Authorization: Bearer $TOKEN" | jq .
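
The difference between the default behavior and force=true can be sketched as a simple selection rule. The `select_for_reanalysis` function and the sample file IDs are hypothetical; this models the documented behavior and is not the server's implementation.

```python
def select_for_reanalysis(file_ids, ids_with_results, force=False):
    """Return the files that a reanalyze request would queue."""
    if force:
        return list(file_ids)          # force=true: reprocess everything
    done = set(ids_with_results)
    return [f for f in file_ids if f not in done]  # default: only missing results


files = ["a.pdf", "b.docx", "c.pdf"]
already_analyzed = ["a.pdf"]

print(select_for_reanalysis(files, already_analyzed))
print(select_for_reanalysis(files, already_analyzed, force=True))
```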

Delete NER Results

bash
# Delete results for a single file
curl -s -X DELETE "$NEO_URL/ner/files/{file_id}" \
  -H "Authorization: Bearer $TOKEN" | jq .

# Delete all results for a share
curl -s -X DELETE "$NEO_URL/ner/shares/{share_id}/results" \
  -H "Authorization: Bearer $TOKEN" | jq .

Check Pending Files

bash
curl -s "$NEO_URL/ner/pending?share_id={share_id}&limit=50" \
  -H "Authorization: Bearer $TOKEN" | jq .

GPU Acceleration

The NER service supports GPU acceleration through NVIDIA CUDA and AMD ROCm. GPU variants are built as separate container images (netapp-neo-ner-cuda and netapp-neo-ner-rocm).

Device Selection

Set NER_DEVICE=auto (the default) to let the engine detect available hardware. It checks for CUDA availability via PyTorch and falls back to CPU if no GPU is found.
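
The selection rule can be sketched as a small pure function. The `resolve_device` name is hypothetical, and the sketch takes the detection result as a parameter so that it runs without PyTorch; in a real process that flag would typically come from `torch.cuda.is_available()`. How the engine reacts when cuda is requested explicitly but no GPU is present is not documented, so the sketch simply returns the requested value.

```python
def resolve_device(requested, cuda_available):
    """Map an NER_DEVICE setting to the device that will actually be used."""
    requested = requested.lower()
    if requested not in ("auto", "cuda", "cpu"):
        raise ValueError(f"unsupported NER_DEVICE: {requested!r}")
    if requested == "auto":
        # auto: prefer the GPU when one is detected, otherwise use the CPU.
        return "cuda" if cuda_available else "cpu"
    return requested  # explicit cuda or cpu is passed through unchanged


print(resolve_device("auto", cuda_available=True))
print(resolve_device("auto", cuda_available=False))
```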

Text Chunking

GLiNER2 is based on DeBERTa-v3-base with a 2048-token context window. For documents longer than the context window, the engine splits text into chunks of up to NER_MAX_TEXT_LENGTH characters with NER_CHUNK_OVERLAP characters of overlap between adjacent chunks. An entity that falls inside the overlap can be detected in both chunks; such duplicates are removed in post-processing.
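
A minimal sketch of character-based chunking with overlap, followed by deduplication, is shown below. The function names, the tuple layout for entities, and the rule of keeping the highest-scoring duplicate are assumptions for illustration; the engine's tokenizer-aware path (NER_TOKEN_BUDGET) is not modeled.

```python
def chunk_text(text, max_len=8000, overlap=500):
    """Split text into (offset, chunk) pairs; adjacent chunks share `overlap` chars."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    if not text:
        return []
    step = max_len - overlap
    chunks, start = [], 0
    while True:
        chunks.append((start, text[start:start + max_len]))
        if start + max_len >= len(text):
            break
        start += step
    return chunks


def merge_entities(per_chunk):
    """Shift chunk-relative spans to document offsets and drop duplicates.

    per_chunk: list of (chunk_offset, [(start, end, label, score), ...]).
    When the same span and label appear twice, the higher score is kept.
    """
    best = {}
    for offset, entities in per_chunk:
        for start, end, label, score in entities:
            key = (offset + start, offset + end, label)
            if key not in best or score > best[key][3]:
                best[key] = (*key, score)
    return sorted(best.values())


document = "x" * 20000
print([(offset, len(chunk)) for offset, chunk in chunk_text(document)])

# The same organization seen at document offset 7600 by two adjacent chunks.
found = [
    (0, [(7600, 7606, "organization", 0.81)]),
    (7500, [(100, 106, "organization", 0.90)]),
]
print(merge_entities(found))
```

With the default settings, a 20,000-character document yields three chunks starting at offsets 0, 7500, and 15000.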

Adaptive Batch Sizing

On GPU, the engine probes available VRAM at startup and computes an initial batch size. During inference:

  • Successful batches gradually increase the batch size up to NER_MAX_BATCH_SIZE.
  • An OOM error halves the batch size and retries.
  • After NER_MAX_CONSECUTIVE_OOMS consecutive OOM errors, the engine falls back to CPU automatically.
  • After NER_CUDA_RETRY_COOLDOWN successful CPU inferences, the engine attempts to move back to CUDA (up to NER_MAX_CUDA_RETRIES times).
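
The rules above amount to a small state machine, sketched here in plain Python. The `AdaptiveBatcher` class is hypothetical. Growing the batch by one step per recovery interval and keeping the last GPU batch size across a CPU fallback are assumptions; the documentation only says the size increases gradually.

```python
class AdaptiveBatcher:
    """Tracks batch size and device across inference outcomes (illustrative only)."""

    def __init__(self, initial, max_batch=32, recovery_interval=50,
                 max_consecutive_ooms=3, cuda_retry_cooldown=200, max_cuda_retries=3):
        self.device = "cuda"
        self.batch_size = max(1, min(initial, max_batch))
        self.max_batch = max_batch
        self.recovery_interval = recovery_interval
        self.max_consecutive_ooms = max_consecutive_ooms
        self.cuda_retry_cooldown = cuda_retry_cooldown
        self.max_cuda_retries = max_cuda_retries
        self.successes = 0
        self.consecutive_ooms = 0
        self.cuda_retries = 0

    def on_success(self):
        self.consecutive_ooms = 0
        self.successes += 1
        if self.device == "cuda":
            # Grow slowly after a run of successful batches.
            if self.successes >= self.recovery_interval and self.batch_size < self.max_batch:
                self.batch_size += 1
                self.successes = 0
        elif (self.successes >= self.cuda_retry_cooldown
              and self.cuda_retries < self.max_cuda_retries):
            # Cooldown on CPU has elapsed: try the GPU again.
            self.device = "cuda"
            self.cuda_retries += 1
            self.successes = 0

    def on_oom(self):
        if self.device != "cuda":
            return
        self.successes = 0
        self.consecutive_ooms += 1
        if self.consecutive_ooms >= self.max_consecutive_ooms:
            self.device = "cpu"          # too many OOMs in a row: fall back
            self.consecutive_ooms = 0
        else:
            self.batch_size = max(1, self.batch_size // 2)  # halve and retry


batcher = AdaptiveBatcher(initial=8)
batcher.on_oom()
batcher.on_oom()
print(batcher.device, batcher.batch_size)   # still on GPU, batch halved twice
batcher.on_oom()
print(batcher.device)                       # third consecutive OOM triggers fallback
```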

VRAM Requirements

The GLiNER2 base model requires approximately 500 MB of VRAM. Each inference chunk uses an estimated NER_ESTIMATED_MB_PER_CHUNK MB (default 1500 MB) for activations and intermediate buffers. A GPU with 4 GB of VRAM can comfortably run batch size 1--2; 8 GB or more is recommended for larger batch sizes.
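
The arithmetic behind these numbers can be sketched as follows. The formula (VRAM minus model footprint, divided by the per-chunk estimate, clamped to the configured maximum) is an assumption that happens to agree with the figures above; the engine's actual probe may use a different calculation.

```python
def initial_batch_size(vram_mb, model_mb=500, mb_per_chunk=1500, max_batch=32):
    """Estimate how many chunks fit next to the model, with a floor of 1."""
    budget = vram_mb - model_mb
    return max(1, min(max_batch, budget // mb_per_chunk))


for vram_gb in (4, 8, 24):
    print(vram_gb, "GB ->", initial_batch_size(vram_gb * 1024))
# 4 GB  -> (4096 - 500) // 1500  = 2
# 8 GB  -> (8192 - 500) // 1500  = 5
# 24 GB -> (24576 - 500) // 1500 = 16
```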

Troubleshooting

Model Download Fails on First Startup

The NER service downloads the GLiNER2 model from Hugging Face on first launch. If the container has no internet access, pre-download the model and mount it into the container:

bash
# On a machine with internet access
python3 -c "from gliner2 import GLiNER2; GLiNER2.from_pretrained('fastino/gliner2-base-v1')"

# The model is cached in ~/.cache/huggingface/hub/
# Mount that directory into the container
docker run -v ~/.cache/huggingface:/root/.cache/huggingface ...

Alternatively, set NER_MODEL_NAME to a local path where the model weights are mounted.

CUDA Out-of-Memory (OOM) Errors

Symptoms: log messages containing CUDA out of memory or RuntimeError: CUDA error.

Actions:

  • Reduce NER_CUDA_MAX_TEXT_LENGTH to send smaller chunks (try 4000).
  • Reduce NER_MAX_BATCH_SIZE to 1.
  • Increase NER_MAX_CONSECUTIVE_OOMS if you want the engine to tolerate more OOMs before falling back.
  • If the GPU has limited VRAM (less than 4 GB), set NER_DEVICE=cpu to avoid OOM entirely.

The engine automatically falls back to CPU after repeated OOMs and will attempt to return to CUDA after a cooldown period.

Tokenizer Deadlock

The Hugging Face tokenizers library uses Rust-based parallelism that can deadlock when called from multiple Python threads inside a forked process. Symptoms: the NER service hangs during inference with no log output.

Workaround: Set the environment variable TOKENIZERS_PARALLELISM=false on the NER container. This disables tokenizer-level parallelism and prevents the deadlock. The Neo container images set this by default.
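
If the model is loaded from a custom Python entry point rather than the Neo container image, the same setting can be applied in code. This is a minimal sketch; the `disable_tokenizer_parallelism` helper is a hypothetical name, and calling it before the tokenizer is first used is the cautious choice.

```python
import os


def disable_tokenizer_parallelism(env=None):
    """Set TOKENIZERS_PARALLELISM=false in the given environment mapping."""
    env = os.environ if env is None else env
    env["TOKENIZERS_PARALLELISM"] = "false"
    return env


# Call this at the top of the entry point, before any model or tokenizer is loaded.
disable_tokenizer_parallelism()
print(os.environ["TOKENIZERS_PARALLELISM"])
```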

NER Results Are Empty

Check that:

  1. The share has enable_ner_analysis: true in its rules.
  2. The file has completed text extraction (status = completed).
  3. The confidence threshold is not set too high -- try lowering ner_confidence_threshold to 0.5 temporarily.
  4. The NER service is running and reachable from the worker service (check GET /ner/status).

Reprocessing After Schema Change

If you change a share's NER schema or entity types, existing results are not automatically updated. Use the reanalyze endpoint with force=true to reprocess all files with the new configuration:

bash
curl -s -X POST "$NEO_URL/ner/shares/{share_id}/reanalyze?force=true" \
  -H "Authorization: Bearer $TOKEN"