API Endpoint Access
Retrieve Precise Company Data by Domain, Location, or Keyword with Full Column Customization.
Endpoint: fetch-market
Fetches market data for the exact domains you specify in the domains parameter; no fuzzy matching or similar-company lookups. Use the optional columns parameter to list the fields you want returned, so you can tailor and enrich your dataset with only the data you need.
Available Columns
Below are some commonly used columns. For the complete list with detailed descriptions, data types, and possible values, please refer to the Dataset Codebook.
Contact & Communications
all_mails
all_phones
main_contact_mail
main_contact_number
links
address
Domain Information
domain
domain_alias
domain_provider_true
domain_redirect
Business Details
title
description
keywords
name
summary
summary_keywords
new_register_entry
type
b2x
techstack
Location Data
continent
country / country_code
region / region_code
state / state_code
district / district_code
municipality / municipality_code
geolocation (includes lat, lon)
Scores & Categories
cultural_score / cultural_score_category
leisure_score / leisure_score_category
recreational_score / recreational_score_category
transport_score / transport_score_category
Probabilities & Classes
innoprob / innoprob_innovator_probability
social_innoprob / social_innoprob_innovator_probability
news_probability
retailer_probability
employee_class
revenue_class
Intensity Metrics
AI
ai_intensity / ai_intensity_level
ai_keywords / ai_keywords_hits / ai_total_hits
Energy
energy_intensity / energy_intensity_level
energy_keywords / energy_keywords_hits / energy_total_hits
Sustainability
sustainability_intensity / sustainability_intensity_level
sustainability_keywords / sustainability_keywords_hits / sustainability_total_hits
Digital Health
digital_health_intensity / digital_health_intensity_level
digital_health_keywords / digital_health_keywords_hits / digital_health_total_hits
Mobility
mobility_intensity / mobility_intensity_level
mobility_keywords / mobility_keywords_hits / mobility_total_hits
Blockchain
blockchain_intensity / blockchain_intensity_level
blockchain_keywords / blockchain_keywords_hits / blockchain_total_hits
Additive Manufacturing
additive_manufacturing_intensity / additive_manufacturing_intensity_level
additive_manufacturing_keywords / additive_manufacturing_keywords_hits / additive_manufacturing_total_hits
SDG Metrics
sdg1_intensity / sdg1_intensity_level / sdg1_keywords / sdg1_keywords_hits / sdg1_total_hits
sdg2_intensity / sdg2_intensity_level / sdg2_keywords / sdg2_keywords_hits / sdg2_total_hits
sdg3_intensity / sdg3_intensity_level / sdg3_keywords / sdg3_keywords_hits / sdg3_total_hits
Complex Structures
products (contains: name, type, pricing, main_features)
team (contains: name, position, contact, cv)
structure (contains: name, type, description, location, contact)
partnerships (contains: entity_name, relationship_type)
import requests
api_url = 'https://api.istari.ai/v1/fetch-market'  # fetch-market endpoint on the API Gateway
headers = {
    'x-api-key': 'sk_api_key',  # API key issued via AWS API Gateway; each company will receive its own key in the future
    'Content-Type': 'application/json'
}
payload = {
    "domains": ["peterconradconstruction.com", "istari.ai"],  # domains to look up in the database (must be a list of strings, even for a single domain)
    "columns": [  # columns to return for each domain
        "domain",
        "country",
        "state",
        "title",
        "description",
        "keywords"
    ],
    "index": "webai*"  # index to query (must be a string)
}
response = requests.post(api_url, headers=headers, json=payload)
print(response.json())
The API returns a JSON object with two primary sections:

data
Contains an array of records (one per requested domain), each including exactly the fields you asked for via the columns parameter:
domain: The queried domain name.
country: Registered/operating country.
state: State or region.
title: Content of the HTML <title> tag.
description: Content of the HTML <meta name="description"> tag (or a snippet).
keywords: Content of the HTML <meta name="keywords"> tag (empty string if absent).

metadata
Echoes your input parameters and provides execution details:
domains_requested (int): Number of domains in your payload.
domains_found (int): How many of those domains were found in the index.
missing_domains (array): Any requested domains that were not found.
columns (array): The exact list of columns you specified in the request.
index (string): The index or index pattern queried (e.g. "webai*").
total_fetched (int): Total records returned (should equal domains_found).
timestamp (string): Server-side timestamp when the query executed (YYYY-MM-DD HH:MM:SS).

Summary:
data holds exactly the information you requested in the columns array.
metadata captures your original input (domains, columns, index) and when the API processed your request.
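As a quick illustration, here is a minimal sketch of consuming this structure, continuing from the fetch-market request above (api_url, headers, and payload as defined there):

import requests

response = requests.post(api_url, headers=headers, json=payload)
result = response.json()

# Flag any requested domains that were not found in the index.
missing = result["metadata"].get("missing_domains", [])
if missing:
    print("Not found:", missing)

# Each record contains exactly the columns requested in the payload.
for record in result["data"]:
    print(record["domain"], record.get("country"), record.get("title"))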
Endpoint: sublocations
Retrieves all child locations ("sublocations") for a given parent level and value.
Parameters
parent_level (string) – Geographic hierarchy level to drill into:
all – return every level of sublocation
country – states/regions within a country
state – regions/districts within a state
region – districts/municipalities within a region
district – municipalities within a district
municipality – localities within a municipality
parent_value (string) – The name of the parent location (e.g. "Germany").
index (string) – The Elasticsearch index or pattern to query (e.g. "webai*").
import requests
api_url = 'https://api.istari.ai/v1/sublocations'
headers = {
    'x-api-key': 'sk_api_key',
    'Content-Type': 'application/json'
}
payload = {
    "parent_level": "country",  # level of the parent location (must be a string)
    "parent_value": "Germany",  # name of the parent location (must be a string)
    "index": "webai*"  # index to query (must be a string)
}
response = requests.post(api_url, headers=headers, json=payload)
print(response.json())
The sublocations endpoint returns the immediate geographic subdivisions for the given parent level and value.
Response Fields
locations – Array of sublocation names (e.g., Germany's 16 states).
level – Granularity of the returned locations (e.g. "state" when drilling into a country).
total_count – Number of items in the locations array.
Example: Based on the payload provided above, the response will be:
{
"locations": ["Nordrhein-Westfalen", "Bayern", "Baden-Württemberg", "Niedersachsen", "Hessen", "Berlin", "Rheinland-Pfalz", "Sachsen", "Schleswig-Holstein", "Thüringen", "Hamburg", "Brandenburg", "Sachsen-Anhalt", "Mecklenburg-Vorpommern", "Saarland", "Bremen"],
"level": "state",
"total_count": 16
}
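As an illustrative sketch, the endpoint can be called repeatedly to drill one level further, here from Germany's states down to their child locations (the exact child level returned for parent_level "state" should be checked against the hierarchy above):

import requests

api_url = 'https://api.istari.ai/v1/sublocations'
headers = {'x-api-key': 'sk_api_key', 'Content-Type': 'application/json'}

# Get Germany's states, then list the sublocations of each state.
states = requests.post(api_url, headers=headers, json={
    "parent_level": "country", "parent_value": "Germany", "index": "webai*"
}).json()["locations"]

for state in states:
    children = requests.post(api_url, headers=headers, json={
        "parent_level": "state", "parent_value": state, "index": "webai*"
    }).json()
    print(state, children["total_count"])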
Endpoint: search
Description
Fetches company records from Elasticsearch in three modes:
Similarity (KNN) Search – when you supply domains.
Keyword Search – when domains is empty, using keywords, locations, and custom_filters.
Semantic (KNN) Search – when you supply semantic_input.
Request Parameters
search_id (string) – UUID to uniquely identify this search.
domains (string[]) – (Optional) Seed domains for embedding-based similarity search.
excludes (string[]) – (Optional) Domains to omit from results.
keywords (object) – Term filters (must_one, must_all, must_not).
locations (object) – Geographic filters (country, state, region, etc.).
custom_filters (object) – Field-level filters (must match allowed filter columns).
index (string) – Elasticsearch index or pattern (default: "webai*").
size (integer) – Number of results (1–100, default: 25).
columns (string[]) – Fields to return (defaults to the core output columns).
pit_id (string) – (Optional) Point-in-time ID for consistent pagination.
search_after (any[]) – (Optional) Cursor token from the previous response for efficient, deep pagination.
semantic_input (string) – (Optional) The semantic input used as the user query.
Response Structure
data: Array of company objects containing the requested columns.
metadata: Echoes your inputs (including domains, keywords, locations, columns), plus query details and timestamps.
For the full list of Available Columns and their descriptions, please refer to the Dataset Codebook. The codebook provides detailed information about all available variables, their data types, possible values, and descriptions. It also includes information about which fields can be used in the custom_filters parameter.
Important Note: Only one of domains and semantic_input may have a value; providing both triggers an error.
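For semantic mode, a minimal payload sketch is shown below; it assumes the other parameters behave exactly as in the keyword example that follows, with domains left empty and semantic_input carrying the free-text query:

# Semantic (KNN) search sketch: domains stays empty, semantic_input carries the query text.
semantic_payload = {
    "search_id": "123e4567-e89b-12d3-a456-426614174000",
    "domains": [],  # must be empty (or None) when semantic_input is set
    "excludes": [],
    "keywords": {"must_one": [], "must_all": [], "must_not": []},
    "locations": {"country": [], "state": [], "region": []},
    "custom_filters": {},
    "index": "webai*",
    "size": 10,
    "semantic_input": "sustainable packaging manufacturers",  # free-text user query
    "columns": ["domain", "country", "title", "description"],
    "pit_id": None,
    "search_after": []
}
# POST semantic_payload to https://api.istari.ai/v1/search exactly as in the example below.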
# Keyword search example: "domains" is left empty, so the query is driven by keywords, locations, and custom_filters.
import requests
api_url = 'https://api.istari.ai/v1/search'
headers = {
    'x-api-key': 'sk_api_key',
    'Content-Type': 'application/json'
}
payload = {
    "search_id": "123e4567-e89b-12d3-a456-426614174000",  # unique search id (must be a UUID string)
    "domains": [],  # seed domains for similarity search (list of strings; leave empty or None for keyword search)
    "excludes": [],  # domains to exclude from the results (list of strings)
    "keywords": {"must_one": ["packaging"], "must_all": [], "must_not": []},
    "locations": {"country": ["United Kingdom"], "state": [], "region": []},
    "custom_filters": {"b2x": ["B2C", "B2B"]},  # extra field-level constraints (e.g. `country`, `team.name`, `products.type`); each key must be an allowed column, and its values become Elasticsearch filter clauses
    "index": "webai*",
    "size": 1,
    "semantic_input": None,  # semantic query string; set to None when not using semantic search
    "columns": [
        "domain",
        "country",
        "state",
        "title",
        "description",
        "keywords",
        "region_code"
    ],
    "pit_id": None,
    "search_after": [],  # cursor-based paging: use the last hit's sort values to fetch the next batch (more efficient and consistent than deep offsets)
}
response = requests.post(api_url, headers=headers, json=payload)
print(response.json())
The search endpoint returns companies matching the supplied keywords and locations.
Data
Returns one company record matching your criteria:
domain: The website queried (ledburytownfc.co.uk)
country / state: Geographic info (United Kingdom, England)
title, description, keywords: Metadata scraped from the site (empty here)
region_code: Internal code for the region
Metadata
total_hits: ~2.48 million matches in the index
search_id: Your unique search identifier
search_input: Echoes your request parameters (filters, size = 1, requested columns)
query: The Elasticsearch boolean query built from your filters
pit_id & search_after: Cursor tokens for efficient pagination
start_time / end_time: When the query ran
In short: you asked for one UK-based domain, and the API returned its basic fields plus metadata about how the search was executed and how to page further.
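To page through a larger result set, the following hedged sketch shows cursor-based pagination with pit_id and search_after; it assumes the response metadata returns the cursor values to pass into the next request (base_payload stands for the payload from the keyword example above, and process is a placeholder for your own handling):

import requests

api_url = 'https://api.istari.ai/v1/search'
headers = {'x-api-key': 'sk_api_key', 'Content-Type': 'application/json'}

pit_id, cursor = None, []
while True:
    page_payload = dict(base_payload, size=100, pit_id=pit_id, search_after=cursor)
    result = requests.post(api_url, headers=headers, json=page_payload).json()
    if not result["data"]:
        break
    process(result["data"])  # placeholder for your own handling of each page
    pit_id = result["metadata"].get("pit_id")  # reuse the point-in-time id for consistent paging
    cursor = result["metadata"].get("search_after", [])  # cursor for the next batch
    if not cursor:
        break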
Endpoint: api-usage
Description
Checks the API usage of the calling user.
Behavior
Returns the user's API usage as recorded in the database, i.e. how many requests they have sent.
import requests
api_url = 'https://api.istari.ai/v1/api-usage'
headers = {
    'x-api-key': 'sk_api_key',
    'Content-Type': 'application/json'
}
response = requests.get(api_url, headers=headers)
print(response.json())
Footnote:
Requests that include any of the columns listed below are treated as premium requests and are charged at a higher rate based on the extracted domains; all other requests are standard requests.
The premium columns are:
b2x
summary
summary_keywords
team
partnerships
products
structure
team.contact
team.name
team.position
team.cv
partnerships.relationship_type
structure.contact
structure.description
structure.location
structure.name
structure.type
partnerships.entity_name
products.name
products.main_features
products.pricing
products.type
Additionally, a search request that includes semantic_input or domains in its payload is also treated as a premium request (see the sketch below).
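As a rough client-side aid, the sketch below estimates whether a request would be billed as premium under the rules above; the helper name and exact rule set are illustrative assumptions, not part of the API:

# Hypothetical client-side check based on the premium rules listed above.
PREMIUM_COLUMNS = {
    "b2x", "summary", "summary_keywords", "team", "partnerships", "products", "structure",
    "team.contact", "team.name", "team.position", "team.cv",
    "partnerships.relationship_type", "partnerships.entity_name",
    "structure.contact", "structure.description", "structure.location", "structure.name", "structure.type",
    "products.name", "products.main_features", "products.pricing", "products.type",
}

def is_premium_request(payload: dict, endpoint: str) -> bool:
    """Estimate whether a payload would be billed as a premium request."""
    if any(col in PREMIUM_COLUMNS for col in payload.get("columns", [])):
        return True
    if endpoint == "search" and (payload.get("semantic_input") or payload.get("domains")):
        return True
    return False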
Pricing for both standard and premium requests is still to be determined, and the list of premium columns may be adjusted in the future.
To use the API, please contact us at [email protected] to request your own API key and pricing details. You will then be able to access the API and our enriched dataset.