LLM API

Unlock NLP Capabilities for Keyword Generation, Q&A, and Semantic Search.

This document describes the WebAI LLM API, which provides natural language processing capabilities: keyword generation, question answering, and NLP-powered search.

Table of Contents

  • API Endpoints

  • Authentication

  • Generate Keywords Endpoint

  • QnA Endpoint

  • NLP Search Endpoint

  • Data Models

  • Error Handling

  • Implementation Details

API Endpoints

All endpoints are prefixed with /v1.

| Endpoint | Method | Description |
| --- | --- | --- |
| /generate-keywords | POST | Generate alternative keywords based on input keywords |
| /qna | POST | Answer questions based on content from a specified domain |
| /nlp_search | POST | Process natural language queries and convert them to structured search parameters |

Authentication

The API uses AWS Cognito for authentication. All requests must include a valid JWT token in the Authorization header.

Authorization: Bearer your-jwt-token

The token must contain valid claims including:

  • cognito:username or username: The user's username

  • sub: The user's unique identifier

  • cognito:groups: The Cognito groups the user belongs to

Access to certain resources is restricted based on the user's identity and group membership.
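As a minimal sketch, the headers above can be built with a small helper (the header name and Bearer scheme come from the example; the JSON content type is an assumption for the POST endpoints below):

```python
def auth_headers(jwt_token: str) -> dict[str, str]:
    """Build the headers required on every request: a Bearer JWT plus JSON content type."""
    return {
        "Authorization": f"Bearer {jwt_token}",
        "Content-Type": "application/json",
    }
```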

Generate Keywords Endpoint

Endpoint

POST /v1/generate-keywords

Description

This endpoint takes a list of keywords and generates semantically related alternative keywords. It uses Azure OpenAI's language models to analyze the input keywords and suggest alternatives that capture related themes and concepts.

Request Body

{
  "keywords": ["artificial intelligence", "machine learning", "neural networks"]
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| keywords | array | Yes | List of input keywords for which related terms will be generated |

Response

{
  "original_keywords": ["artificial intelligence", "machine learning", "neural networks"],
  "alternative_keywords": [
    "deep learning", 
    "AI algorithms", 
    "computer vision", 
    "natural language processing", 
    "predictive analytics", 
    "data science", 
    "cognitive computing", 
    "reinforcement learning", 
    "pattern recognition", 
    "automated reasoning"
  ]
}
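A request to this endpoint can be sketched as follows using the Python standard library (the base URL is a placeholder; only the path, method, body shape, and auth header come from this document):

```python
import json
import urllib.request

def build_keywords_request(base_url: str, token: str, keywords: list[str]) -> urllib.request.Request:
    """Build a POST /v1/generate-keywords request; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        f"{base_url}/v1/generate-keywords",
        data=json.dumps({"keywords": keywords}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(...)` and parsing the body with `json.load(...)` yields a response in the shape shown above.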

QnA Endpoint

Endpoint

POST /v1/qna

Description

This endpoint answers questions based on content scraped from a specified domain. It uses web scraping to extract text from the provided URL and then uses a language model to generate an answer to the user's question based on that content.

Request Body

{
  "question": "What services does this company offer?",
  "domain": "example.com",
  "max_pages": 3
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| question | string | Yes | The user's question |
| domain | string | Yes | The domain or URL to fetch text from |
| max_pages | integer | No | Maximum number of pages to scrape (default: 1) |

Response

If the question can be answered based on the content:

{
  "success": true,
  "can_answer": true,
  "message": "Question answered successfully",
  "data": {
    "answer": "## Company Services\n\nBased on the website content, Example Company offers the following services:\n\n* Web Development\n* Mobile App Development\n* Cloud Solutions\n* AI Integration\n* Data Analytics\n\nTheir core focus appears to be enterprise-level software solutions with an emphasis on scalability and security.",
    "sources": ["Homepage", "Services page"],
    "confidence": 0.92,
    "follow_up_questions": [
      "What technologies do they use for web development?",
      "Do they offer maintenance services?"
    ]
  }
}

If the question cannot be answered based on the content:

{
  "success": true,
  "can_answer": false,
  "message": "I don't have enough information in the provided context to answer this question. Please try a different question or provide additional context.",
  "data": null
}

NLP Search Endpoint

Endpoint

POST /v1/nlp_search

Description

This endpoint processes natural language queries and converts them into structured search parameters. It uses a workflow that analyzes the query to determine intent, extract locations, generate keywords, identify domains, and create semantic search inputs.

Request Body

{
  "user_query": "Find AI companies in Germany that specialize in computer vision",
  "search_id": "unique-search-id"
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| user_query | string | Yes | The natural language query provided by the user |
| search_id | string | Yes | A unique identifier for the search request |

Response

{
  "payload": {
    "keywords": {
      "must_one": ["computer vision", "image recognition", "object detection"],
      "must_all": ["AI", "artificial intelligence"],
      "must_not": []
    },
    "locations": {
      "country": ["Germany"],
      "state": [],
      "region": [],
      "district": [],
      "municipality": []
    },
    "custom_filters": {},
    "domains": [],
    "excludes": [],
    "semantic_input": "computer vision artificial intelligence"
  },
  "metadata": {
    "search_id": "unique-search-id",
    "user_query": "Find AI companies in Germany that specialize in computer vision",
    "timestamp": "2023-01-01T12:00:00Z",
    "processing_time": 1.5,
    "workflow_steps": [
      "flag_detection",
      "location_extraction",
      "keyword_generation",
      "semantic_input_generation"
    ]
  }
}

Data Models

Keyword Generation

KeywordRequest

{
  "keywords": ["string"]
}

BatchKeyword

{
  "original_keywords": ["string"],
  "alternative_keywords": ["string"]
}

QnA

QAResponse

{
  "answer": "string",
  "sources": ["string"],
  "confidence": 0.95,
  "follow_up_questions": ["string"],
  "can_answer": true
}

NLP Search

QueryRequest

{
  "user_query": "string",
  "search_id": "string"
}

QueryResponse

{
  "payload": {
    "keywords": {
      "must_one": ["string"],
      "must_all": ["string"],
      "must_not": ["string"]
    },
    "locations": {
      "country": ["string"],
      "state": ["string"],
      "region": ["string"],
      "district": ["string"],
      "municipality": ["string"]
    },
    "custom_filters": {},
    "domains": ["string"],
    "excludes": ["string"],
    "semantic_input": "string"
  },
  "metadata": {
    "search_id": "string",
    "user_query": "string",
    "timestamp": "string",
    "processing_time": 0,
    "workflow_steps": ["string"]
  }
}
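For typed client code, the simpler schemas above can be mirrored as plain dataclasses. This is an illustrative sketch only; the JSON schemas above remain the source of truth:

```python
from dataclasses import dataclass

@dataclass
class KeywordRequest:
    keywords: list[str]

@dataclass
class BatchKeyword:
    original_keywords: list[str]
    alternative_keywords: list[str]

@dataclass
class QAResponse:
    answer: str
    sources: list[str]
    confidence: float
    follow_up_questions: list[str]
    can_answer: bool
```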

Error Handling

The API returns standard HTTP status codes to indicate success or failure:

  • 200 OK: Request was successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Missing or invalid authentication token

  • 403 Forbidden: Authentication successful but access denied

  • 500 Internal Server Error: Server-side error

Error responses include a detail message explaining the issue:

{
  "detail": "Error message explaining the issue"
}

For the QnA and Generate Keywords endpoints, errors are also returned in a structured format:

{
  "success": false,
  "can_answer": false,
  "message": "Error message explaining the issue",
  "error": "Detailed error information"
}
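A client therefore needs to handle both error shapes. One way to normalize them into a single message (a sketch based only on the two formats shown above):

```python
def parse_error(status_code: int, body: dict) -> str:
    """Extract a human-readable message from either documented error format."""
    if "detail" in body:  # standard format: {"detail": "..."}
        return body["detail"]
    # structured format used by the QnA and Generate Keywords endpoints
    return body.get("message", f"HTTP {status_code}")
```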

Implementation Details

Language Models

The API uses the following language models:

  • Azure OpenAI: Used for keyword generation and QnA functionality

  • OpenAI GPT-4o: Used for NLP search processing

Web Scraping

The QnA endpoint uses web scraping to extract content from the specified domain. It can scrape multiple pages from the same website if the max_pages parameter is greater than 1.

Token Limits

The QnA endpoint limits the document context to a maximum of 40,000 tokens to ensure compatibility with the language model's context window.
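Clients pre-processing scraped text can apply a similar budget before sending content for answering. The sketch below uses the common rough heuristic of about 4 characters per token; the service's actual tokenizer may count differently:

```python
MAX_TOKENS = 40_000
CHARS_PER_TOKEN = 4  # rough heuristic, not the service's real tokenizer

def truncate_context(text: str, max_tokens: int = MAX_TOKENS) -> str:
    """Trim scraped text to an approximate token budget before prompting."""
    limit = max_tokens * CHARS_PER_TOKEN
    return text if len(text) <= limit else text[:limit]
```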
