This document describes the WebAI Search API, which lets users search and retrieve company data from the Elasticsearch database.
Table of Contents¶
- API Endpoints
- Authentication
- Search Endpoint
- Sublocations Endpoint
- Fetch Market Endpoint
- Download Endpoint
- Close PIT Endpoint
- Data Models
- Error Handling
API Endpoints¶
All endpoints are prefixed with /v1.
| Endpoint | Method | Description |
|---|---|---|
| /search | POST | Execute a search operation |
| /sublocations | POST | Get all unique locations at the child level within a parent location |
| /fetch-market | POST | Fetch data for a list of domains representing a market |
| /download | POST | Download search results in various formats |
| /close_pit | DELETE | Close a point-in-time (PIT) search context |
Authentication¶
The API supports two authentication methods:
1. API Key Authentication: Include an API key in the request header (for API-key users).
2. Cognito Authentication: Use AWS Cognito for user authentication (for Dashboard users).
All endpoints require authentication using one of these methods.
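As a minimal sketch of API key authentication, the Python snippet below builds the headers for a JSON request. The header name `x-api-key` and the key value are assumptions made for illustration: this document only states that the key goes in a request header, not which one.

```python
def build_headers(api_key: str, header_name: str = "x-api-key") -> dict:
    """Return HTTP headers for an API-key-authenticated JSON request.

    header_name is a hypothetical default; the documentation does not
    name the header, so replace it with the one issued with your key.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {header_name: api_key, "Content-Type": "application/json"}


headers = build_headers("my-secret-key")
```

Keeping the header name as a parameter means only one argument changes once the real name is known.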
Search Endpoint¶
Endpoint¶
POST /v1/search
Description¶
Execute a search operation using various parameters including keywords, locations, and custom filters.
Request Body¶
{
  "search_id": "unique-search-id",
  "domains": ["example.com", "example.org"],
  "excludes": ["exclude-domain.com"],
  "keywords": {
    "must_one": ["keyword1", "keyword2"],
    "must_all": ["requiredKeyword"],
    "must_not": ["excludedKeyword"]
  },
  "locations": {
    "country": ["Germany"],
    "state": ["Bavaria"],
    "region": [],
    "district": [],
    "municipality": []
  },
  "custom_filters": null,
  "index": "webai*",
  "size": 25,
  "columns": ["domain", "country", "description", "main_contact_mail"],
  "pit_id": null,
  "search_after": null,
  "semantic_input": null
}
Tool to generate UUIDs¶
You can use this free tool to generate UUIDs for your search requests: https://www.guidgenerator.com/
Recommendation: you can reuse one UUID across multiple requests, but we recommend a separate UUID for each search to keep your searches clean and well organised.
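If you prefer to generate the identifier in code rather than with a web tool, Python's standard `uuid` module can produce one. This is only a sketch of one option; the variable name `search_id` simply mirrors the request parameter.

```python
import uuid

# uuid4() creates a random UUID; str() gives the canonical 36-character form.
search_id = str(uuid.uuid4())
print(search_id)
```

Calling `uuid.uuid4()` once per search follows the recommendation above of using a separate identifier for each one.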
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| search_id | string | Yes | Unique identifier for the search (UUID) |
| domains | array | No | List of domains for similarity search |
| excludes | array | No | List of domains to exclude from results |
| keywords | object | No | Keyword search parameters |
| locations | object | No | Location filters |
| custom_filters | object | No | Additional filters for specific fields |
| index | string | No | Elasticsearch index to search (default: "webai*") |
| size | integer | No | Number of results to return (default: 25) |
| columns | array | No | Columns to include in the output |
| pit_id | string | No | Point in Time ID for pagination |
| search_after | array | No | Search after parameter for pagination |
| semantic_input | string | No | Text input for semantic search |
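The parameters above can be assembled into a request body programmatically. The sketch below is one possible helper, not part of the API: the function name `build_search_body` is hypothetical, and it assumes that optional parameters may simply be left out of the body, since the table marks them as not required.

```python
import json
import uuid


def build_search_body(keywords=None, locations=None, columns=None,
                      size=25, index="webai*"):
    """Assemble a /v1/search request body, omitting unset optional fields."""
    body = {
        "search_id": str(uuid.uuid4()),  # the only required parameter
        "index": index,
        "size": size,
    }
    if keywords is not None:
        body["keywords"] = keywords
    if locations is not None:
        body["locations"] = locations
    if columns is not None:
        body["columns"] = columns
    return body


body = build_search_body(
    keywords={"must_all": ["robotics"]},
    locations={"country": ["Germany"]},
    columns=["domain", "country"],
)
payload = json.dumps(body)  # JSON text to send as the POST body
```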
Response¶
{
  "data": [
    {
      "domain": "example.com",
      "country": "Germany",
      "description": "Example company description",
      "main_contact_mail": "contact@example.com"
    }
  ],
  "metadata": {
    "search_id": "unique-search-id",
    "index": "webai*",
    "size": 25,
    "columns": ["domain", "country", "description", "main_contact_mail"],
    "query": "your inserted query",
    "total_hits": 100,
    "total_fetched": 25,
    "query_type": "keyword",
    "pit_id": "pit_id_for_pagination",
    "search_after": ["value_for_next_page"]
  }
}
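The `pit_id` and `search_after` values in the response metadata are meant for pagination. A minimal sketch of how a follow-up request might be built is shown below; it assumes that both values are passed back unchanged in the next request body, which matches the parameter names but is not spelled out in this document. The helper name `next_page_body` is hypothetical.

```python
from typing import Optional


def next_page_body(previous_body: dict, metadata: dict) -> Optional[dict]:
    """Return the request body for the next page, or None if it cannot be built."""
    if not metadata.get("pit_id") or not metadata.get("search_after"):
        return None  # nothing to continue from
    body = dict(previous_body)  # shallow copy; the original request is untouched
    body["pit_id"] = metadata["pit_id"]
    body["search_after"] = metadata["search_after"]
    return body


first_body = {"search_id": "unique-search-id", "size": 25}
metadata = {"pit_id": "pit_id_for_pagination", "search_after": ["value_for_next_page"]}
second_body = next_page_body(first_body, metadata)
```

When you have finished paging, the PIT context can be released with the Close PIT endpoint.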
Sublocations Endpoint¶
Endpoint¶
POST /v1/sublocations
Description¶
Get all unique locations at the child level within a parent location. For example, get all states in Germany or all districts in Bavaria.
Request Body¶
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| parent_level | string | Yes | The geographic level of the parent (all, country, state, region, district, municipality) |
| parent_value | string | Yes | The value of the parent location |
| index | string | No | Elasticsearch index to search (default: "webai*") |
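Based on the parameter table, a request body for this endpoint could be assembled as in the sketch below. The helper `build_sublocations_body` is hypothetical, and the client-side check of `parent_level` is an added convenience rather than documented API behaviour.

```python
import json

# Allowed values taken from the parent_level row of the parameter table.
PARENT_LEVELS = ("all", "country", "state", "region", "district", "municipality")


def build_sublocations_body(parent_level: str, parent_value: str,
                            index: str = "webai*") -> dict:
    """Build a /v1/sublocations request body, rejecting unknown levels early."""
    if parent_level not in PARENT_LEVELS:
        raise ValueError(f"unknown parent_level: {parent_level!r}")
    return {"parent_level": parent_level, "parent_value": parent_value, "index": index}


# Example from the description: list the states in Germany.
body = build_sublocations_body("country", "Germany")
print(json.dumps(body))
```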
Response¶
Fetch Market Endpoint¶
Endpoint¶
POST /v1/fetch-market
Description¶
Fetch data for a list of domains representing a market. This is useful for getting information about a specific set of companies.
Request Body¶
{
  "index": "webai*",
  "columns": ["domain", "country", "description", "employee_class"],
  "domains": ["example.com", "example.org", "example.net"]
}
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| index | string | No | Elasticsearch index to search (default: "webai*") |
| columns | array | No | Columns to include in the output |
| domains | array | Yes | List of domains to fetch data for |
Response¶
{
  "data": [
    {
      "domain": "example.com",
      "country": "Germany",
      "description": "Example company description",
      "employee_class": "10-49"
    }
  ],
  "metadata": {
    "total_fetched": 1,
    "index": "webai*",
    "columns": ["domain", "country", "description", "employee_class"],
    "domains_requested": 3,
    "domains_found": 1,
    "missing_domains": ["example.org", "example.net"],
    "timestamp": "2023-01-01 12:00:00"
  }
}
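The metadata fields make it easy to check how much of a requested market was actually found. The sketch below works on a response shaped like the example above; the function name `market_coverage` is hypothetical and the response dictionary is sample data, not real output.

```python
def market_coverage(response: dict) -> float:
    """Fraction of requested domains that were found, from the response metadata."""
    meta = response["metadata"]
    requested = meta["domains_requested"]
    return meta["domains_found"] / requested if requested else 0.0


# Sample response mirroring the documented structure.
response = {
    "data": [{"domain": "example.com", "country": "Germany"}],
    "metadata": {
        "domains_requested": 3,
        "domains_found": 1,
        "missing_domains": ["example.org", "example.net"],
    },
}

coverage = market_coverage(response)
missing = response["metadata"]["missing_domains"]
print(f"found {coverage:.0%} of the market; missing: {', '.join(missing)}")
```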
Download Endpoint¶
Endpoint¶
POST /v1/download
Description¶
Download search results in various formats (TSV, Excel, Parquet). This endpoint supports text queries, KNN queries, and market queries.
Request Body¶
{
  "download_id": "unique-download-id",
  "index": "webai*",
  "size": 10,
  "columns": ["domain", "country", "description"],
  "search_info": {
    "search_id": "unique-search-id",
    "query": {
      "match": {
        "description": "AI company"
      }
    }
  },
  "download_format": "xlsx"
}
Alternatively, for market-based downloads:
{
  "download_id": "unique-download-id",
  "index": "webai*",
  "size": 10,
  "columns": ["domain", "country", "description"],
  "market_info": {
    "market_id": "unique-market-id",
    "market": ["example.com", "example.org"]
  },
  "download_format": "tsv"
}
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| download_id | string | No | Unique identifier for the download (default: auto-generated UUID) |
| index | string | No | Elasticsearch index to search (default: "webai*") |
| size | integer | No | Number of results to download (default: 10, max: 10) |
| columns | array | No | Columns to include in the output |
| search_info | object | No | Search information for text or KNN queries |
| market_info | object | No | Market information for market-based downloads |
| download_format | string | No | Format of the downloaded file (tsv, xlsx, parquet) (default: tsv) |
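A client can catch some invalid download requests before sending them. The sketch below checks the format and size limits listed in the table. It also assumes that exactly one of `search_info` and `market_info` should be supplied, which the two example bodies suggest but the table does not state; the helper name `build_download_body` is hypothetical.

```python
ALLOWED_FORMATS = ("tsv", "xlsx", "parquet")
MAX_SIZE = 10  # maximum listed in the parameter table


def build_download_body(search_info=None, market_info=None,
                        download_format="tsv", size=10, columns=None):
    """Build a /v1/download request body after basic client-side checks."""
    # Assumption: a download is either search-based or market-based, not both.
    if (search_info is None) == (market_info is None):
        raise ValueError("provide exactly one of search_info or market_info")
    if download_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported download_format: {download_format!r}")
    if not 1 <= size <= MAX_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_SIZE}")
    body = {"index": "webai*", "size": size, "download_format": download_format}
    if columns:
        body["columns"] = list(columns)
    if search_info is not None:
        body["search_info"] = search_info
    else:
        body["market_info"] = market_info
    return body


body = build_download_body(
    market_info={"market_id": "unique-market-id", "market": ["example.com", "example.org"]},
    columns=["domain", "country", "description"],
)
```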
Response¶
{
"download_url": "https://presigned-s3-url-for-download",
"metadata": {
"download_id": "unique-download-id",
"user_id": "user-id",
"file_path": "s3://bucket-name/path/to/file",
"total_hits": 100,
"total_fetched": 10,
"index": "webai*",
"timestamp": "2023-01-01T12:00:00",
"download_format": "xlsx",
"query_type": "text"
}
}
Close PIT Endpoint¶
Endpoint¶
DELETE /v1/close_pit
Description¶
Close a point-in-time (PIT) search context to free up resources.
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| pit_id | string | Yes | The point-in-time ID to close |
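The sketch below shows how a close request might be prepared with Python's standard library. Several details are assumptions, because this document does not specify them: the host `https://api.example.com` is a placeholder, `pit_id` is sent as a query parameter (it may instead belong in the body), and `x-api-key` is a hypothetical header name. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder host, not the real API address


def build_close_pit_request(pit_id: str, api_key: str) -> Request:
    """Prepare a DELETE request for /v1/close_pit without sending it."""
    query = urlencode({"pit_id": pit_id})  # assumption: pit_id travels in the query string
    return Request(
        f"{BASE_URL}/v1/close_pit?{query}",
        method="DELETE",
        headers={"x-api-key": api_key},  # hypothetical header name
    )


request = build_close_pit_request("pit_id_for_pagination", "my-secret-key")
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```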
Response¶
Data Models¶
Allowed Columns¶
The API supports a wide range of columns for filtering and output. Here are some of the commonly used columns:
- domain: The website domain
- country: Country where the company is located
- description: Company description
- main_contact_mail: Primary contact email
- main_contact_number: Primary contact phone number
- employee_class: Company size by employee count
- revenue_class: Company size by revenue
- lat, lon: Geolocation coordinates
For a complete list of allowed columns, refer to the ALLOWED_COLUMNS constant in the API code.
Geography Hierarchy¶
The API supports a hierarchical structure for geographic locations:
- country: Country level (e.g., Germany)
- state: State/province level (e.g., Bavaria)
- region: Region level (e.g., Upper Bavaria)
- district: District level (e.g., Munich District)
- municipality: Municipality level (e.g., Munich City)
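The five levels are the same keys used in the `locations` object of a search request. As a small sketch, the hypothetical helper below builds that object from keyword arguments and fills unused levels with empty lists, as in the search request example.

```python
# Levels in order from broadest to most specific, as listed above.
GEO_LEVELS = ("country", "state", "region", "district", "municipality")


def build_locations(**filters) -> dict:
    """Return a locations object with every level present, empty where unset."""
    unknown = set(filters) - set(GEO_LEVELS)
    if unknown:
        raise ValueError(f"unknown geographic levels: {sorted(unknown)}")
    return {level: list(filters.get(level, [])) for level in GEO_LEVELS}


locations = build_locations(country=["Germany"], state=["Bavaria"])
```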
Error Handling¶
The API returns standard HTTP status codes to indicate success or failure:
- 200 OK: Request was successful
- 400 Bad Request: Invalid request parameters
- 403 Forbidden: Authentication or authorization failure
- 500 Internal Server Error: Server-side error
Error responses include a detail message explaining the issue.
Common error scenarios:
- Unauthorized access: Missing or invalid API key
- Invalid query parameters: Malformed request body
- Resource limitations: Attempting to fetch more than allowed records
- Elasticsearch errors: Issues with the underlying search engine
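A client might turn these status codes into readable messages along the lines of the sketch below. It assumes that the error body is JSON with a top-level `detail` field, which is inferred from the mention of a detail message rather than from a documented schema; the function name `describe_error` is hypothetical.

```python
import json

# Status codes and meanings as listed in this section.
STATUS_MEANINGS = {
    200: "OK",
    400: "Bad Request",
    403: "Forbidden",
    500: "Internal Server Error",
}


def describe_error(status: int, body: str) -> str:
    """Combine the status meaning with the detail message, when one is present."""
    meaning = STATUS_MEANINGS.get(status, "Unexpected status")
    try:
        detail = json.loads(body).get("detail")  # assumed top-level field
    except (ValueError, AttributeError):
        detail = None  # body was not JSON, or not a JSON object
    return f"{status} {meaning}: {detail}" if detail else f"{status} {meaning}"


print(describe_error(403, '{"detail": "Invalid API key"}'))
```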