AI Data & Knowledge Engineer

Hyderabad, Telangana, India
Full Time
Products & Innovation
Senior Manager/Supervisor

AI Data & Knowledge Engineer (Vector + Semantic Intelligence)

Location: Hyderabad, India

Employment Type: Full-Time; Salaried 

Compensation: Base Salary, Bonus, Stock Options, Medical

Job Description 

About Innovapptive

Innovapptive is an enterprise SaaS company building an AI-powered Connected Worker Platform for industrial organizations. Our platform connects frontline workers, back-office systems, and assets in real-time to drive safety, reliability, and operational productivity.

Leading global enterprises, including Shell, Hess, Westlake Chemical, Kimberly-Clark, Scott Miracle-Gro, and Newmont Mining, rely on Innovapptive to transform how work gets done across plants and field operations.

Our customers have achieved $50M+ in EBITDA savings at a single enterprise, a 10× improvement in frontline productivity, and 15–20% reductions in maintenance costs.

Innovapptive is recognized as a Leader in Frost & Sullivan's “Frost Radar 2025 - Augmented Connected Worker Platforms”, with acknowledgments from Gartner and LNS Research, and is backed by Vista Equity Partners and Tiger Global Management.

With headquarters in Houston and an engineering center in Hyderabad, we have 300+ employees across the U.S., India, and ANZ and are on a strong trajectory toward $100M ARR.

The Role

  • The AI Data & Knowledge Engineer will architect and operationalize Innovapptive’s semantic data intelligence layer — building the pipelines, vector stores, and retrieval frameworks that supply contextual understanding to AI Agents and enterprise workflows.
  • Reporting to the VP of Technology & Architecture, this role is responsible for designing the RAG (Retrieval-Augmented Generation) and Vector Embedding pipelines that connect industrial data (SOPs, manuals, logs, SCADA readings, SAP records) with Innovapptive’s AI runtime.
  • This is a hands-on, cross-disciplinary engineering role, blending data architecture, ML engineering, and semantic search design to make Innovapptive’s AI Agents contextually aware, accurate, and reliable.

How You Will Make An Impact

1. Architect the AI Knowledge and Data Layer

  • Design and implement data ingestion and embedding pipelines to convert structured and unstructured content into vectorized representations.
  • Build a unified data schema connecting maintenance, production, and safety data across SAP, Maximo, OSI PI, and SCADA systems.
  • Integrate vector databases (Pinecone, Weaviate, Qdrant, or Chroma) into the AI Platform (MCP) to enable context-aware retrieval.
  • Optimize query efficiency and relevance through hybrid search (semantic + keyword) and metadata tagging.

2. Operationalize RAG (Retrieval-Augmented Generation)

  • Implement document chunking, embedding, and retrieval pipelines for PDFs, work orders, shift logs, and incident reports.
  • Develop automated retraining and re-indexing mechanisms to keep indexed data fresh.
  • Collaborate with the AI Platform Architect to link retrieval flows into agent orchestration layers.
  • Validate precision, recall, and latency metrics for semantic retrieval using real production workloads.

3. Build AI Data Governance and Observability

  • Define data lineage, quality metrics, and access control for AI knowledge repositories.
  • Embed telemetry for data latency, embedding drift, and retrieval accuracy into Datadog/Sentry dashboards.
  • Partner with the Chief AI Architect to enforce compliance, explainability, and prompt context versioning standards.

4. Collaborate Across Product and Engineering

  • Work with Product Managers and Solution Architects to identify key use cases for AI-driven search and knowledge retrieval.
  • Partner with QA to build automated test frameworks for semantic accuracy and retrieval reliability.
  • Collaborate with industrial data teams to extract and normalize sensor, historian, and SAP data for RAG integration.

5. Drive Continuous Innovation

  • Evaluate emerging frameworks for knowledge graphs, embeddings, and contextual caching (e.g., LlamaIndex, LangChain, FAISS).
  • Tune embeddings and hybrid retrieval strategies for domain-specific industrial vocabulary.
  • Mentor developers on data preparation and retrieval design for AI-integrated product features.

What You Bring to The Team

  • 8–12+ years of data or ML engineering experience, including 3+ years in semantic search, RAG, or vector database architecture.
  • Proficiency with Python, SQL, and frameworks such as LangChain, LlamaIndex, or Haystack.
  • Hands-on experience with vector databases (Pinecone, Weaviate, Qdrant, Chroma) and cloud data stores (AWS S3, DynamoDB, Redshift).
  • Deep understanding of embedding models (OpenAI, Cohere, Sentence Transformers) and performance tuning for large-scale retrieval.
  • Strong data pipeline experience (Airflow, Kafka, Temporal) and understanding of MLOps fundamentals.
  • Familiarity with industrial data (SAP, Maximo, OSI PI, SCADA, MES) preferred.
  • Excellent communication and documentation skills — able to translate data architecture into business and engineering language.

Success Metrics (First 90–180 Days)

  • Vector Data Layer deployed with initial knowledge embeddings across 2 core domains (Maintenance + Safety).
  • RAG pipelines operational, delivering ≥ 90% retrieval precision on selected test datasets.
  • Telemetry dashboards live, showing retrieval latency, accuracy, and data freshness.
  • Data-to-Agent API integrated into MCP and adopted by 2+ AI Agent families.
  • Knowledge Playbook published — reusable design patterns for data ingestion, embeddings, and retrieval governance.

Why Does This Role Matter?

  • The AI Data & Knowledge Engineer is the intelligence enabler behind every AI Agent.
    Without a robust, governed, and high-precision knowledge layer, AI features remain shallow and disconnected.
  • This role transforms Innovapptive’s platform into a contextually aware, continuously learning system — where every AI decision is grounded in trusted enterprise and field data.

What We Offer

  • Competitive compensation and equity tied to measurable impact on AI accuracy and performance.
  • A platform to shape the semantic intelligence layer of a category-defining industrial SaaS company.
  • Hybrid work model — Hyderabad or remote with periodic travel to Houston HQ.
  • Access to cutting-edge AI, data, and observability toolchains for continuous learning and innovation.

Innovapptive does not accept and will not review unsolicited resumes from search firms.

Innovapptive is an equal opportunity employer and is committed to a diverse and inclusive workplace. Qualified applicants will receive consideration for employment without regard to race, color, religion or creed, alienage or citizenship status, political affiliation, marital or partnership status, age, national origin, ancestry, physical or mental disability, medical condition, veteran status, gender, gender identity, pregnancy, childbirth (or related medical conditions), sex, sexual orientation, sexual and other reproductive health decisions, genetic disorder, genetic predisposition, carrier status, military status, familial status, or domestic violence victim status and any other basis protected under federal, state, or local laws.
