Search API
Query, Filter, and Download Company Data with Flexible Access Options.
This document describes the WebAI Search API, which lets users search and retrieve company data from the Elasticsearch database.
Table of Contents
API Endpoints
Authentication
Search Endpoint
Sublocations Endpoint
Fetch Market Endpoint
Download Endpoint
Close PIT Endpoint
Data Models
Error Handling
API Endpoints
All endpoints are prefixed with /v1.
/search (POST): Execute a search operation
/sublocations (POST): Get all unique locations at the child level within a parent location
/fetch-market (POST): Fetch data for a list of domains representing a market
/download (POST): Download search results in various formats
/close_pit (DELETE): Close a point-in-time (PIT) search context
Authentication
The API supports two authentication methods:
API Key Authentication (for API-key users): Include an API key in the request header.
x-api-key: your-api-key
Cognito Authentication (for Dashboard users): Authenticate through AWS Cognito and send the resulting JWT as a bearer token.
Authorization: Bearer your-jwt-token
All endpoints require authentication using one of these methods.
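As a minimal sketch, the two header schemes above could be assembled as follows. The base URL `https://api.example.com`, the `Content-Type` header, and the helper names `build_auth_headers` and `build_request` are assumptions for illustration; only the header names and the /v1 prefix come from this document.

```python
import json
import urllib.request

# Hypothetical host; replace with the real address of your deployment.
BASE_URL = "https://api.example.com/v1"


def build_auth_headers(api_key=None, jwt_token=None):
    """Return headers for exactly one of the two documented auth methods."""
    if (api_key is None) == (jwt_token is None):
        raise ValueError("provide exactly one of api_key or jwt_token")
    if api_key is not None:
        return {"x-api-key": api_key}
    return {"Authorization": f"Bearer {jwt_token}"}


def build_request(path, payload, headers, method="POST"):
    """Build (but do not send) a JSON request for an endpoint such as /search."""
    body = json.dumps(payload).encode("utf-8")
    # Sending JSON with this content type is an assumption, not documented here.
    all_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=all_headers, method=method
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; the sketch stops short of that so it can be inspected without network access.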
Search Endpoint
Endpoint
POST /v1/search
Description
Execute a search operation using various parameters including keywords, locations, and custom filters.
Request Body
{
"search_id": "unique-search-id",
"domains": ["example.com", "example.org"],
"excludes": ["exclude-domain.com"],
"keywords": {
"must_one": ["keyword1", "keyword2"],
"must_all": ["requiredKeyword"],
"must_not": ["excludedKeyword"]
},
"locations": {
"country": ["Germany"],
"state": ["Bavaria"],
"region": [],
"district": [],
"municipality": []
},
"custom_filters": null,
"index": "webai*",
"size": 25,
"columns": ["domain", "country", "description", "main_contact_mail"],
"pit_id": null,
"search_after": null,
"semantic_input": null
}
Parameters
search_id (string, required): Unique identifier for the search (UUID)
domains (array, optional): List of domains for similarity search
excludes (array, optional): List of domains to exclude from results
keywords (object, optional): Keyword search parameters
locations (object, optional): Location filters
custom_filters (object, optional): Additional filters for specific fields
index (string, optional): Elasticsearch index to search (default: "webai*")
size (integer, optional): Number of results to return (default: 25)
columns (array, optional): Columns to include in the output
pit_id (string, optional): Point in Time ID for pagination
search_after (array, optional): Search after parameter for pagination
semantic_input (string, optional): Text input for semantic search
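The parameters above can be assembled programmatically. The following is a sketch only: `build_search_payload` is a hypothetical helper, and it assumes that leaving an optional field out is treated the same as sending null, which this document does not state explicitly.

```python
import json
import uuid


def build_search_payload(keywords=None, locations=None, domains=None,
                         size=25, index="webai*", columns=None):
    """Assemble a /v1/search request body with a fresh UUID as search_id."""
    payload = {
        "search_id": str(uuid.uuid4()),  # the only required field
        "index": index,
        "size": size,
    }
    # Optional fields are included only when the caller supplies them.
    optional = {"keywords": keywords, "locations": locations,
                "domains": domains, "columns": columns}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_search_payload(
    keywords={"must_one": ["keyword1", "keyword2"],
              "must_all": ["requiredKeyword"],
              "must_not": ["excludedKeyword"]},
    locations={"country": ["Germany"], "state": ["Bavaria"],
               "region": [], "district": [], "municipality": []},
    columns=["domain", "country", "description"],
)
print(json.dumps(payload, indent=2))
```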
Response
{
"data": [
{
"domain": "example.com",
"country": "Germany",
"description": "Example company description",
"main_contact_mail": "[email protected]"
}
],
"metadata": {
"search_id": "unique-search-id",
"index": "webai*",
"size": 25,
"columns": ["domain", "country", "description", "main_contact_mail"],
"query": "your inserted query",
"total_hits": 100,
"total_fetched": 25,
"query_type": "keyword",
"pit_id": "pit_id_for_pagination",
"search_after": ["value_for_next_page"]
}
}
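The pit_id and search_after values in the response metadata are meant to be fed back into the next request. A possible pagination loop is sketched below. `iter_pages` and the `send` callable are hypothetical, and the stop condition (all hits fetched, an empty page, or no search_after returned) is an assumption rather than documented behavior.

```python
def iter_pages(send, payload):
    """Yield each page's "data" list, following pit_id and search_after.

    `send` is any callable that takes a request dict and returns the parsed
    response dict, for example a thin wrapper around an HTTP client.
    """
    request = dict(payload)  # copy so the caller's dict is not modified
    fetched = 0
    while True:
        response = send(request)
        rows = response["data"]
        meta = response["metadata"]
        if not rows:
            break
        yield rows
        fetched += len(rows)
        # Assumed stop condition: everything fetched, or no cursor returned.
        if fetched >= meta["total_hits"] or not meta.get("search_after"):
            break
        # Carry the cursor from this response into the next request.
        request["pit_id"] = meta["pit_id"]
        request["search_after"] = meta["search_after"]
```

Counting rows locally avoids relying on whether total_fetched is reported per page or cumulatively, which this document leaves open.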
Sublocations Endpoint
Endpoint
POST /v1/sublocations
Description
Get all unique locations at the child level within a parent location. For example, get all states in Germany or all districts in Bavaria.
Request Body
{
"parent_level": "country",
"parent_value": "Germany",
"index": "webai*"
}
Parameters
parent_level (string, required): The geographic level of the parent (all, country, state, region, district, municipality)
parent_value (string, required): The value of the parent location
index (string, optional): Elasticsearch index to search (default: "webai*")
Response
{
"locations": ["Bavaria", "Berlin", "Hamburg"],
"level": "state",
"total_count": 3
}
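A small sketch of how a client might validate the request and read the result. The helpers `build_sublocations_payload` and `list_children`, and the `send` callable, are hypothetical. The child level is taken from the response's own "level" field rather than computed, since the exact parent-to-child mapping is not spelled out here.

```python
PARENT_LEVELS = ("all", "country", "state", "region", "district", "municipality")


def build_sublocations_payload(parent_level, parent_value, index="webai*"):
    """Validate parent_level against the documented values and build the body."""
    if parent_level not in PARENT_LEVELS:
        raise ValueError(f"unknown parent_level: {parent_level!r}")
    return {"parent_level": parent_level,
            "parent_value": parent_value,
            "index": index}


def list_children(send, parent_level, parent_value):
    """Return (child_level, sorted locations) as reported by the response."""
    response = send(build_sublocations_payload(parent_level, parent_value))
    return response["level"], sorted(response["locations"])
```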
Fetch Market Endpoint
Endpoint
POST /v1/fetch-market
Description
Fetch data for a list of domains representing a market. This is useful for getting information about a specific set of companies.
Request Body
{
"index": "webai*",
"columns": ["domain", "country", "description", "employee_class"],
"domains": ["example.com", "example.org", "example.net"]
}
Parameters
index (string, optional): Elasticsearch index to search (default: "webai*")
columns (array, optional): Columns to include in the output
domains (array, required): List of domains to fetch data for
Response
{
"data": [
{
"domain": "example.com",
"country": "Germany",
"description": "Example company description",
"employee_class": "10-49"
}
],
"metadata": {
"total_fetched": 1,
"index": "webai*",
"columns": ["domain", "country", "description", "employee_class"],
"domains_requested": 3,
"domains_found": 1,
"missing_domains": ["example.org", "example.net"],
"timestamp": "2023-01-01 12:00:00"
}
}
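The metadata block makes it straightforward to check how much of a requested market was actually found. The helper below, `market_coverage`, is a hypothetical illustration that uses only the fields shown in the sample response.

```python
def market_coverage(response):
    """Summarize how many requested domains were found and which are missing."""
    meta = response["metadata"]
    requested = meta["domains_requested"]
    found = meta["domains_found"]
    return {
        "requested": requested,
        "found": found,
        # Guard against division by zero for an empty request.
        "ratio": found / requested if requested else 0.0,
        "missing": list(meta["missing_domains"]),
    }


sample = {
    "data": [{"domain": "example.com", "country": "Germany"}],
    "metadata": {
        "domains_requested": 3,
        "domains_found": 1,
        "missing_domains": ["example.org", "example.net"],
    },
}
summary = market_coverage(sample)
print(summary["missing"])
```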
Download Endpoint
Endpoint
POST /v1/download
Description
Download search results in various formats (TSV, Excel, Parquet). This endpoint supports text queries, KNN queries, and market queries.
Request Body
{
"download_id": "unique-download-id",
"index": "webai*",
"size": 10,
"columns": ["domain", "country", "description"],
"search_info": {
"search_id": "unique-search-id",
"query": {
"match": {
"description": "AI company"
}
}
},
"download_format": "xlsx"
}
Alternatively, for market-based downloads:
{
"download_id": "unique-download-id",
"index": "webai*",
"size": 10,
"columns": ["domain", "country", "description"],
"market_info": {
"market_id": "unique-market-id",
"market": ["example.com", "example.org"]
},
"download_format": "tsv"
}
Parameters
download_id (string, optional): Unique identifier for the download (default: auto-generated UUID)
index (string, optional): Elasticsearch index to search (default: "webai*")
size (integer, optional): Number of results to download (default: 10, max: 10)
columns (array, optional): Columns to include in the output
search_info (object, optional): Search information for text or KNN queries
market_info (object, optional): Market information for market-based downloads
download_format (string, optional): Format of the downloaded file (tsv, xlsx, parquet) (default: tsv)
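The constraints listed above can be checked on the client before a request is sent. This is a sketch under assumptions: `build_download_payload` is a hypothetical helper, and the rule that exactly one of search_info or market_info must be present is inferred from the two request examples, not stated by the API.

```python
import uuid

DOWNLOAD_FORMATS = ("tsv", "xlsx", "parquet")
MAX_DOWNLOAD_SIZE = 10  # documented maximum for "size"


def build_download_payload(columns, search_info=None, market_info=None,
                           download_format="tsv", size=10, index="webai*"):
    """Build a /v1/download body after validating the documented limits."""
    # Assumption: a download is either search-based or market-based, not both.
    if (search_info is None) == (market_info is None):
        raise ValueError("provide exactly one of search_info or market_info")
    if download_format not in DOWNLOAD_FORMATS:
        raise ValueError(f"unsupported download_format: {download_format!r}")
    if not 1 <= size <= MAX_DOWNLOAD_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_DOWNLOAD_SIZE}")
    payload = {
        "download_id": str(uuid.uuid4()),
        "index": index,
        "size": size,
        "columns": list(columns),
        "download_format": download_format,
    }
    if search_info is not None:
        payload["search_info"] = search_info
    else:
        payload["market_info"] = market_info
    return payload
```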
Response
{
"download_url": "https://presigned-s3-url-for-download",
"metadata": {
"download_id": "unique-download-id",
"user_id": "user-id",
"file_path": "s3://bucket-name/path/to/file",
"total_hits": 100,
"total_fetched": 10,
"index": "webai*",
"timestamp": "2023-01-01T12:00:00",
"download_format": "xlsx",
"query_type": "text"
}
}
Close PIT Endpoint
Endpoint
DELETE /v1/close_pit
Description
Close a point-in-time (PIT) search context to free up resources.
Parameters
pit_id (string, required): The point-in-time ID to close
Response
{
"succeeded": true
}
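Because an open PIT context holds resources, it helps to make sure it is closed even when a search loop fails midway. One possible pattern is sketched below. `pit_scope` and the `close_pit` callable are hypothetical; how pit_id is transmitted to DELETE /v1/close_pit is not specified in this document, so that detail is left to the callable.

```python
from contextlib import contextmanager


@contextmanager
def pit_scope(close_pit):
    """Track the latest pit_id and close it when the block exits.

    `close_pit` is any callable that accepts a pit_id and performs the
    DELETE /v1/close_pit call.
    """
    state = {"pit_id": None}
    try:
        yield state
    finally:
        # Runs on normal exit and on exceptions alike.
        if state["pit_id"] is not None:
            close_pit(state["pit_id"])
```

Inside the `with` block, the caller stores each pit_id returned by a search in `state["pit_id"]`; the last stored value is the one that gets closed.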
Data Models
Allowed Columns
The API supports a wide range of columns for filtering and output. Here are some of the commonly used columns:
domain: The website domain
country: Country where the company is located
description: Company description
main_contact_mail: Primary contact email
main_contact_number: Primary contact phone number
employee_class: Company size by employee count
revenue_class: Company size by revenue
lat, lon: Geolocation coordinates
For a complete list of allowed columns, refer to the ALLOWED_COLUMNS constant in the API code.
Geography Hierarchy
The API supports a hierarchical structure for geographic locations:
country: Country level (e.g., Germany)
state: State/province level (e.g., Bavaria)
region: Region level (e.g., Upper Bavaria)
district: District level (e.g., Munich District)
municipality: Municipality level (e.g., Munich City)
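The hierarchy maps directly onto the locations object of a search request. As an illustrative sketch, the hypothetical helper `build_locations` below fills in every level, defaulting unused ones to empty lists as in the search request example.

```python
GEO_LEVELS = ("country", "state", "region", "district", "municipality")


def build_locations(**levels):
    """Return a locations object with all five levels, broadest first."""
    unknown = set(levels) - set(GEO_LEVELS)
    if unknown:
        raise ValueError(f"unknown geographic levels: {sorted(unknown)}")
    # Levels the caller did not mention become empty lists.
    return {level: list(levels.get(level, [])) for level in GEO_LEVELS}


locations = build_locations(country=["Germany"], state=["Bavaria"])
print(locations)
```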
Error Handling
The API returns standard HTTP status codes to indicate success or failure:
200 OK: Request was successful
400 Bad Request: Invalid request parameters
403 Forbidden: Authentication or authorization failure
500 Internal Server Error: Server-side error
Error responses include a detail message explaining the issue:
{
"detail": "Error message explaining the issue"
}
Common error scenarios:
Unauthorized access: Missing or invalid API key
Invalid query parameters: Malformed request body
Resource limitations: Attempting to fetch more than allowed records
Elasticsearch errors: Issues with the underlying search engine
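A client can turn these error responses into exceptions that carry the detail message. The following is a minimal sketch: `ApiError` and `check_response` are hypothetical names, and the fallback to the raw body when the response is not a JSON object with a "detail" field is an assumption.

```python
import json


class ApiError(Exception):
    """Raised for any non-200 response; keeps the status and detail text."""

    def __init__(self, status, detail):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def check_response(status, body_text):
    """Return the parsed body on 200, otherwise raise ApiError."""
    if status == 200:
        return json.loads(body_text)
    try:
        detail = json.loads(body_text).get("detail", body_text)
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object: fall back to the raw text.
        detail = body_text
    raise ApiError(status, detail)
```

Callers can then branch on `ApiError.status`, for example refreshing credentials on 403 or reducing the requested size on 400.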