Deploy using Docker/Podman

This guide walks through deploying NetApp Project Neo v4 using Docker or Podman Compose. The deployment includes six services, plus an optional load balancer:

  • postgres -- PostgreSQL 17 shared database
  • api -- FastAPI service (HTTP API + MCP transport) on port 8000
  • worker -- Background processing (crawling, upload, NER orchestration)
  • extractor -- Content extraction (MarkItDown, Docling, VLM)
  • ner -- GLiNER2 Named Entity Recognition
  • neoui -- Web management console on port 8081
  • nginx -- Optional load balancer for scaled deployments

Prerequisites

Docker
Podman
  • Podman installed on your system. You can download Podman from the official Podman website.
  • Podman Compose installed. You can install it using the Podman Compose installation instructions.
  • Some Linux distributions, such as RHEL-based ones, might not install all the Podman packages needed for advanced networking configuration, such as podman-plugins and containernetworking-plugins.

WARNING

The main difference between Docker and Podman is that Podman requires a sudo prefix to run privileged containers, whereas the Docker daemon already runs as root by default.

  • Sufficient system resources to run NetApp Neo. Refer to the Sizing Guide in the Deployment section for recommended specifications.
  • cifs-utils package installed on the Linux host (required for SMB share mounting by the extractor service).
  • SELinux contexts may require adjustments based on your specific Linux host security profile.
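
The prerequisites above can be checked before deploying. The following is a minimal POSIX sh sketch, not part of the Neo distribution: the missing_tools helper and the list of tools it checks are assumptions for illustration. It relies on mount.cifs being the helper binary that cifs-utils installs, which may live in /sbin or /usr/sbin and therefore be missing from a non-root PATH.

```bash
# Preflight sketch: report which required host tools are missing.
# The tool list is an assumption; for Podman, check "podman" instead of "docker".

# Print each command name from the arguments that is NOT found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# mount.cifs is the mount helper shipped by the cifs-utils package.
missing=$(missing_tools docker curl mount.cifs)
if [ -n "$missing" ]; then
  printf 'Missing on this host:\n%s\n' "$missing"
else
  echo "All checked tools are available."
fi
```

An empty result only shows that the binaries are present; it does not validate versions, SELinux contexts, or system sizing.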

Deployment Guide

TIP

Both the docker-compose.yml and .env files provide comprehensive inline documentation for every environment variable and for the GPU configuration options.

Docker Compose file

Create a directory, e.g., neov4

BASH
mkdir neov4 && cd neov4

Copy the following Docker Compose file into that directory as docker-compose.yaml:

yaml
# =============================================================================
# NetApp Project Neo - Docker Compose Example
# =============================================================================
#
# Multi-service deployment with independently scalable components:
#   - postgres:   PostgreSQL database (shared by all services)
#   - api:        HTTP API + MCP transport (user-facing)
#   - worker:     Background processing (crawling, upload, orchestration)
#   - extractor:  Content extraction (MarkItDown, Docling, VLM)
#   - ner:        Named Entity Recognition (GLiNER2)
#   - neoui:      Web management console
#   - nginx:      Load balancer if api service is scaled up
#
# Quick start:
#   1. Copy this file:  cp docker-compose.example.yml docker-compose.yml
#   2. Configure environment variables in .env 
#   3. Launch:          docker compose up -d --build
#
# Scale independently:
#   docker compose up -d --scale worker=3
#   docker compose up -d --scale extractor=5 --scale ner=2
#
# =============================================================================

services:
  # ---------------------------------------------------------------------------
  # PostgreSQL Database
  # ---------------------------------------------------------------------------
  # Shared by API, worker, and extractor services.
  # Data persists in the postgres_data volume.
  postgres:
    hostname: ${POSTGRES_HNAME}
    image: docker.io/library/postgres:17
    container_name: neo-postgres
    env_file:
      - .env
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - neo-network
    restart: unless-stopped

  # ---------------------------------------------------------------------------
  # API Service
  # ---------------------------------------------------------------------------
  # Lightweight FastAPI service handling HTTP endpoints, MCP transport, and
  # OAuth. No background processing — delegates to the worker service.
  api:
    image: ghcr.io/netapp/netapp-neo-api:${NEO_VERSION}
    ports:
      - "8000:8000"
    env_file:
      - .env
    environment:
      # -----------------------------------------------------------------------
      # Internal service communication
      # -----------------------------------------------------------------------
      # WORKER_SERVICE_URL: URL of the worker service. Used by the API to
      # proxy SMB connection tests to the worker (which has SMB tools).
      WORKER_SERVICE_URL: http://worker:8000
      # NER_SERVICE_URL: URL of the NER service. Used by the API to
      # forward device configuration changes (GPU/CPU switching).
      NER_SERVICE_URL: http://ner:8000
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    networks:
      - neo-network
    restart: unless-stopped

  # ---------------------------------------------------------------------------
  # Extractor Service
  # ---------------------------------------------------------------------------
  # Extracts text content from documents on SMB shares. Supports multiple
  # backends: MarkItDown (Office/PDF), Docling (OCR/tables), Docling VLM
  # (vision language models). Mounts shares independently via CIFS.
  extractor:
    image: ghcr.io/netapp/netapp-neo-extractor:${NEO_VERSION}
    # privileged needed for NFS/CIFS mounting inside the container
    privileged: true
    env_file:
      - .env
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
    networks:
      - neo-network
    restart: unless-stopped

  # ---------------------------------------------------------------------------
  # NER Service
  # ---------------------------------------------------------------------------
  # Named Entity Recognition using GLiNER2. Receives extracted text from the
  # worker, returns entities (people, organizations, dates, etc.),
  # classifications (document type), and structured data. Stateless — no file
  # or database access needed.
  ner:
    image: ghcr.io/netapp/netapp-neo-ner:${NEO_VERSION}
    env_file:
      - .env
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 180s  # GLiNER2 model loading on CUDA takes ~2 min
    networks:
      - neo-network
    restart: unless-stopped

  # ---------------------------------------------------------------------------
  # Worker Service
  # ---------------------------------------------------------------------------
  # Background processing: SMB file crawling, content extraction orchestration,
  # Microsoft Graph upload, ACL resolution, and NER orchestration. Delegates
  # extraction to the extractor service and NER to the NER service.
  worker:
    image: ghcr.io/netapp/netapp-neo-worker:${NEO_VERSION}
    cap_add:
      - SYS_ADMIN
      - DAC_READ_SEARCH
    security_opt:
      - apparmor:unconfined
    env_file:
      - .env
    environment:
      # -----------------------------------------------------------------------
      # External service URLs
      # -----------------------------------------------------------------------
      # EXTRACTOR_SERVICE_URL: URL of the extractor service. The worker sends
      # document extraction requests here instead of extracting locally.
      EXTRACTOR_SERVICE_URL: http://extractor:8000
      # NER_SERVICE_URL: URL of the NER service. The worker sends extracted
      # text here for entity recognition instead of running GLiNER2 locally.
      NER_SERVICE_URL: http://ner:8000
    depends_on:
      postgres:
        condition: service_healthy
      extractor:
        condition: service_healthy
      ner:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    networks:
      - neo-network
    restart: unless-stopped

  # ---------------------------------------------------------------------------
  # Neo UI (Web Management Console)
  # ---------------------------------------------------------------------------
  # Browser-based management console for configuring shares, monitoring
  # crawls, and managing the connector. Connects to the API service.
  neoui:
    hostname: neoui
    image: ghcr.io/beezy-dev/neo-ui-framework:${NUI_VERSION}
    container_name: neoui
    ports:
      - "8081:80"
    environment:
      # NEO_API: URL of the API service. The UI proxies all requests here.
      NEO_API: http://api:8000
    depends_on:
      api:
        condition: service_healthy
    networks:
      - neo-network
    restart: unless-stopped

  # ---------------------------------------------------------------------------
  # Nginx Load Balancer (optional)
  # ---------------------------------------------------------------------------
  # Only starts when using the "with-lb" profile:
  #   docker compose --profile with-lb up -d
  #
  # Requires nginx.conf and (optionally) certs/ in the project root.
  # nginx:
  #   hostname: nginx
  #   image: nginx:alpine
  #   container_name: neo-lb
  #   ports:
  #     - "80:80"
  #     - "443:443"
  #   volumes:
  #     - ./nginx.conf:/etc/nginx/nginx.conf:ro
  #     - ./certs:/etc/nginx/certs:ro
  #   depends_on:
  #     - api
  #   networks:
  #     - neo-network
  #   restart: unless-stopped
  #   profiles:
  #     - with-lb

# =============================================================================
# Volumes & Networks
# =============================================================================
volumes:
  postgres_data:
    driver: local

networks:
  neo-network:
    driver: bridge

Environment file (.env)

Aside from the versioning and database parameters, Neo services can be configured after startup through either the UI or the API. These are the parameters to modify in the example .env file:

bash
# Neo container image versioning
NEO_VERSION=4.0.3p7
NUI_VERSION=3.2.2

## Database Settings (required)
# Modify according to your preferences.
# CANNOT BE MODIFIED AFTER FIRST RUN.
POSTGRES_HNAME=postgres
POSTGRES_USER=neo
POSTGRES_PASSWORD=neo_password
POSTGRES_DB=neo_connector
POSTGRES_PORT=5432

Once you have settled on the above parameters, copy this .env file into the same directory as the docker-compose.yaml file, and modify the versioning and database parameters according to your preferences:

# =============================================================================
# NetApp Project Neo v4 - .env Example
# =============================================================================
#
# Copy this file as .env in the same directory as the docker-compose.yaml

# =============================================================================
## MUST BE CONFIGURED PARAMETERS
# =============================================================================
# Neo container image versioning
NEO_VERSION=4.0.3p7
NUI_VERSION=3.2.2

## Database Settings (required)
# Modify according to your preferences.
# CANNOT BE MODIFIED AFTER FIRST RUN.
POSTGRES_HNAME=postgres
POSTGRES_USER=neo
POSTGRES_PASSWORD=neo_password
POSTGRES_DB=neo_connector
POSTGRES_PORT=5432

# =============================================================================
## OPTIONAL PARAMETERS
# =============================================================================
# License key for the connector. 
# Can be configured via API at /api/v1/setup/license after deployment.
# NETAPP_CONNECTOR_LICENSE=your-license-key

## Authentication
# JWT_SECRET_KEY: Secret key for signing JWT access tokens.
# Auto-generated and stored in the database if not set. All services
# sharing the same database will use the same key automatically.
# Only set this if you need to override the database-stored key.
# JWT_SECRET_KEY=

# ACCESS_TOKEN_EXPIRE_MINUTES: How long JWT tokens remain valid.
# Default: 1440 (24 hours).
# ACCESS_TOKEN_EXPIRE_MINUTES=1440

## Encryption
# ENCRYPTION_KEY: Fernet key for encrypting sensitive data (SMB
# passwords). Auto-generated and stored in the database on first
# startup if not set. All services sharing the same database will
# retrieve the same key automatically. Only set this to override
# the database-stored key.
# ENCRYPTION_KEY=

## Microsoft Graph (optional - for M365 Copilot integration)
# MS_GRAPH_TENANT_ID=
# MS_GRAPH_CLIENT_ID=
# MS_GRAPH_CLIENT_SECRET=
# MS_GRAPH_CONNECTOR_ID=netappneo

## MCP OAuth (optional - for securing MCP endpoints)
# MCP_OAUTH_ENABLED=false
# MCP_OAUTH_TENANT_ID=
# MCP_OAUTH_CLIENT_ID=

## MCP API Key (optional - alternative to OAuth for MCP)
# MCP_API_KEY=

## Worker Concurrency
# NUM_UPLOAD_WORKERS=3
# NUM_EXTRACTION_WORKERS=2
# NUM_ACL_RESOLUTION_WORKERS=2
# NUM_NER_WORKERS=1

## NER Settings
# NER_CONFIDENCE_THRESHOLD=0.7
# NER_DEVICE=auto

## Extractor Settings
# EXTRACTOR_LOG_LEVEL=INFO
# EXTRACTOR_DEFAULT_PIPELINE=markitdown

# =============================================================================
## ONLY CHANGE IF INSTRUCTED BY NETAPP SUPPORT
# =============================================================================
# License Connector Identifier.
# Default: netappneo. Only change if instructed by NetApp support.
CONNECTOR_ID=netappneo

## Constructed DATABASE_URL (used by all services)
DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HNAME}:${POSTGRES_PORT}/${POSTGRES_DB}
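
Because the database settings cannot be changed after the first run, it can help to confirm that every required key has a value before starting the containers. The following is a minimal POSIX sh sketch under that assumption; the missing_env_keys helper is hypothetical, and its key list simply mirrors the "MUST BE CONFIGURED" block of the example file.

```bash
# Sketch: list required keys that have no uncommented, non-empty value in an env file.
# The key list mirrors the "MUST BE CONFIGURED PARAMETERS" block of the example .env.
REQUIRED_KEYS="NEO_VERSION NUI_VERSION POSTGRES_HNAME POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB POSTGRES_PORT"

# Print every required key that lacks a KEY=value line in the given file.
missing_env_keys() {
  env_file=$1
  for key in $REQUIRED_KEYS; do
    # A commented-out line ("# KEY=...") or an empty value ("KEY=") does not count.
    grep -Eq "^${key}=.+" "$env_file" || printf '%s\n' "$key"
  done
}

# Usage (run from the directory that holds .env):
#   missing_env_keys .env
```

No output means all seven keys are set. The check does not validate the values themselves.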

TIP

The usage of Docker/Podman Secrets is recommended for production deployments to avoid storing credentials in plain text.
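
Whether the credentials end up in .env or in a secret, the example value neo_password is worth replacing before the first run, since the database settings cannot be changed afterwards. The following is a minimal POSIX sh sketch for generating a random value; the random_hex helper is hypothetical, and hex output is chosen here only because it contains no characters that would need escaping inside DATABASE_URL.

```bash
# Sketch: generate a random hex password from the kernel's random source.

# Print a hex string built from N random bytes (two hex characters per byte).
random_hex() {
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# 24 random bytes yield a 48-character value.
echo "POSTGRES_PASSWORD=$(random_hex 24)"
```

Paste the printed line into .env in place of the default before the first `up`.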

Start the containers

BASH
docker compose up -d --build
docker compose ps

Expected output:
NAME            IMAGE                                       STATUS                    PORTS
neo-postgres    postgres:17                                 Up 30 seconds (healthy)
api-1           neo-api                                     Up 25 seconds (healthy)   0.0.0.0:8000->8000/tcp
extractor-1     neo-extractor                               Up 28 seconds (healthy)
ner-1           neo-ner                                     Up 28 seconds (healthy)
worker-1        neo-worker                                  Up 20 seconds (healthy)
neoui           ghcr.io/beezy-dev/neo-ui-framework:3.2.2   Up 18 seconds             0.0.0.0:8081->80/tcp

View logs:
docker compose logs -f
BASH
sudo podman compose up -d --build
sudo podman compose ps

Expected output:
NAME            IMAGE                                       STATUS                    PORTS
neo-postgres    postgres:17                                 Up 30 seconds (healthy)
api-1           neo-api                                     Up 25 seconds (healthy)   0.0.0.0:8000->8000/tcp
extractor-1     neo-extractor                               Up 28 seconds (healthy)
ner-1           neo-ner                                     Up 28 seconds (healthy)
worker-1        neo-worker                                  Up 20 seconds (healthy)
neoui           ghcr.io/beezy-dev/neo-ui-framework:3.2.2   Up 18 seconds             0.0.0.0:8081->80/tcp

View logs:
sudo podman compose logs -f

TIP

The NER service takes up to 2-3 minutes to start on first launch while it downloads the GLiNER2 model. The worker service waits for NER to become healthy before starting.
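
For scripted deployments, the wait for NER can be made explicit with a small retry loop. The following is a minimal POSIX sh sketch; retry_until is a hypothetical helper, and the usage line assumes the health endpoint and in-container curl that the compose healthcheck already relies on.

```bash
# Sketch: retry a command until it succeeds or the attempts run out.

# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 after ATTEMPTS failures.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    # Sleep only if another attempt will follow.
    [ "$attempts" -gt 0 ] && sleep "$delay"
  done
  return 1
}

# Usage: probe the NER health endpoint every 10 seconds, for up to 30 attempts.
#   retry_until 30 10 docker compose exec -T ner curl -fsS http://localhost:8000/health \
#     && echo "NER is healthy"
```

The NER service publishes no host port in the compose file above, which is why the usage line runs curl inside the container rather than from the host.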

You should see logs indicating that the API service has started in setup mode:

log
api-1  | INFO     | app.main:lifespan - Starting up application...
api-1  | INFO     | app.main:lifespan - Setup mode: Skipping license validation and Graph initialization
api-1  | INFO     | app.main:lifespan - Complete setup via /api/v1/setup endpoints to enable full functionality
api-1  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Scale services independently

Neo v4 supports independent scaling of worker, extractor, and NER services:

BASH
# Scale workers for higher crawling throughput
docker compose up -d --scale worker=3

# Scale extractors for faster document processing
docker compose up -d --scale extractor=5 --scale ner=2
BASH
sudo podman compose up -d --scale worker=3
sudo podman compose up -d --scale extractor=5 --scale ner=2

Configure

via GUI

Neo Console is available at http://your-host:8081 and will present the setup wizard on first launch.

Go to Settings and select the Neo Core tab to begin configuration.

  1. Enter a valid license key and save.
  2. Optionally configure Microsoft Graph, SSL, or proxy settings.
  3. Click Setup Complete to finalize. This triggers a restart of the services with the configured settings.

Once setup completes, the page displays a status of "Complete" and an Admin Credentials button appears with temporary login credentials.

IMPORTANT

The temporary password will not be accessible again after you log in. Save it in your password manager or change it immediately in the Users page.

via API

Neo can also be configured via the API. The interactive API documentation is available at http://your-host:8000/docs.

Step 1: Set the license key

bash
curl -X POST http://localhost:8000/api/v1/setup/license \
  -H "Content-Type: application/json" \
  -d '{"license_key": "your-license-key"}'

Step 2: (Optional) Configure Microsoft Graph

bash
curl -X POST http://localhost:8000/api/v1/setup/graph \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "your-tenant-id",
    "client_id": "your-client-id",
    "client_secret": "your-client-secret"
  }'

Step 3: Complete setup

bash
curl -X POST http://localhost:8000/api/v1/setup/complete
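
The required calls (steps 1 and 3) can be chained so that setup stops at the first failure. The following is a minimal POSIX sh sketch that reuses only the endpoints shown above; NEO_API_URL and the function names are hypothetical, and the payload builder assumes the license key contains no double quotes or backslashes.

```bash
# Sketch: run the license and completion calls in order, stopping on the first error.
# NEO_API_URL is a hypothetical variable; it defaults to the URL used in the steps above.
NEO_API_URL=${NEO_API_URL:-http://localhost:8000}

# Build the JSON body for the license endpoint.
# Assumes the key contains no double quotes or backslashes.
license_payload() {
  printf '{"license_key": "%s"}' "$1"
}

# Step 1: set the license key. --fail turns an HTTP error status into a non-zero exit.
set_license() {
  curl --fail --silent --show-error -X POST "$NEO_API_URL/api/v1/setup/license" \
    -H "Content-Type: application/json" \
    -d "$(license_payload "$1")"
}

# Step 3: complete setup.
complete_setup() {
  curl --fail --silent --show-error -X POST "$NEO_API_URL/api/v1/setup/complete"
}

# Run both steps; complete_setup is skipped if set_license fails.
run_setup() {
  set_license "$1" && complete_setup
}

# Usage:
#   run_setup "your-license-key"
```

The optional Microsoft Graph call from step 2 is left out here; it would slot in between the two calls in the same way.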

GPU Acceleration (optional)

The ner and extractor services support GPU acceleration for faster inference.

NVIDIA GPU

Add the following to the ner and/or extractor service in your docker-compose.yml:

yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Requires nvidia-container-toolkit installed on the host.

AMD ROCm GPU

Add the following to the ner and/or extractor service in your docker-compose.yml:

yaml
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    group_add:
      - video
      - render
    environment:
      NER_DEVICE: cuda  # ROCm uses the CUDA compatibility layer

Requires ROCm drivers installed on the host.

Troubleshooting

PostgreSQL

Check if the database was created:

BASH
docker exec -it neo-postgres psql -h localhost -U neo -d neo_connector -c '\l'
BASH
sudo podman exec -it neo-postgres psql -h localhost -U neo -d neo_connector -c '\l'

Expected output should include neo_connector in the database list.

API Service

BASH
docker compose logs -f api
BASH
sudo podman compose logs -f api

Check the health endpoint:

bash
curl http://localhost:8000/health

Worker Service

BASH
docker compose logs -f worker
BASH
sudo podman compose logs -f worker

TIP

The worker requires the SYS_ADMIN and DAC_READ_SEARCH capabilities and the apparmor:unconfined security option. If the worker fails to start, verify that your container runtime supports these settings.

Extractor Service

BASH
docker compose logs -f extractor
BASH
sudo podman compose logs -f extractor

TIP

The extractor runs in privileged mode to support NFS/CIFS mounting inside the container. If mounting fails, verify that cifs-utils is installed on the host.

NER Service

BASH
docker compose logs -f ner
BASH
sudo podman compose logs -f ner

The NER service downloads the GLiNER2 model on first startup. If it fails, check network connectivity and disk space.
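
A quick way to rule out the disk-space cause is to check the free space on the filesystem that holds the container runtime's storage. The following is a minimal POSIX sh sketch; free_kib is a hypothetical helper, and the path to check is an assumption (Docker stores its data under /var/lib/docker by default, and rootful Podman under /var/lib/containers).

```bash
# Sketch: print the free space, in KiB, of the filesystem that holds a path.
free_kib() {
  # -P forces the POSIX one-line-per-filesystem format; column 4 is "Available".
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Replace / with your runtime's storage path, for example /var/lib/docker.
echo "Free space: $(free_kib /) KiB"
```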

Neo UI

BASH
docker logs -f neoui
BASH
sudo podman logs -f neoui

Check the browser developer console for additional error messages.

Next steps

This concludes the steps to deploy NetApp Neo using Docker/Podman Compose. For more advanced configurations and management options, refer to the Management section of the documentation.