Column Information
Retrieve Precise Company Data by Domain, Location, or Keyword with Full Column Customization.
Endpoint: fetch-market
fetch-market
Fetches market data for the exact domains you specify in the domains parameter, with no fuzzy matching or similar-company lookups.
Use the optional columns parameter to list the fields you want returned, so you can tailor and enrich your dataset with only the data you need.
Available Columns
Below are some commonly used columns. For the complete list with detailed descriptions, data types, and possible values, please refer to the Dataset Codebook.
Contact & Communications
all_mails
all_phones
main_contact_mail
main_contact_number
links
address
Domain Information
domain
domain_alias
domain_provider_true
domain_redirect
Business Details
title
description
keywords
name
summary
summary_keywords
new_register_entry
type
b2x
techstack
Location Data
continent
country / country_code
region / region_code
state / state_code
district / district_code
municipality / municipality_code
geolocation (includes lat, lon)
Scores & Categories
cultural_score / cultural_score_category
leisure_score / leisure_score_category
recreational_score / recreational_score_category
transport_score / transport_score_category
Probabilities & Classes
innoprob / innoprob_innovator_probability
social_innoprob / social_innoprob_innovator_probability
news_probability
retailer_probability
employee_class
revenue_class
Intensity Metrics
AI
ai_intensity / ai_intensity_level
ai_keywords / ai_keywords_hits / ai_total_hits
Energy
energy_intensity / energy_intensity_level
energy_keywords / energy_keywords_hits / energy_total_hits
Sustainability
sustainability_intensity / sustainability_intensity_level
sustainability_keywords / sustainability_keywords_hits / sustainability_total_hits
Digital Health
digital_health_intensity / digital_health_intensity_level
digital_health_keywords / digital_health_keywords_hits / digital_health_total_hits
Mobility
mobility_intensity / mobility_intensity_level
mobility_keywords / mobility_keywords_hits / mobility_total_hits
Blockchain
blockchain_intensity / blockchain_intensity_level
blockchain_keywords / blockchain_keywords_hits / blockchain_total_hits
Additive Manufacturing
additive_manufacturing_intensity / additive_manufacturing_intensity_level
additive_manufacturing_keywords / additive_manufacturing_keywords_hits / additive_manufacturing_total_hits
SDG Metrics
sdg1_intensity / sdg1_intensity_level / sdg1_keywords / sdg1_keywords_hits / sdg1_total_hits
sdg2_intensity / sdg2_intensity_level / sdg2_keywords / sdg2_keywords_hits / sdg2_total_hits
sdg3_intensity / sdg3_intensity_level / sdg3_keywords / sdg3_keywords_hits / sdg3_total_hits
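The intensity and SDG columns listed above share one naming pattern: a topic prefix followed by one of five suffixes. The following is a minimal sketch that builds those names for use in the columns parameter. The helper name intensity_columns is hypothetical and not part of the API, and the Dataset Codebook remains the authority on which topics actually exist.

```python
# Suffixes shared by the intensity and SDG column families listed in this section.
INTENSITY_SUFFIXES = (
    "intensity",
    "intensity_level",
    "keywords",
    "keywords_hits",
    "total_hits",
)


def intensity_columns(topic: str) -> list[str]:
    """Return the five column names for one topic prefix, e.g. "ai" or "sdg1".

    Hypothetical helper: it only formats names and does not check that the
    topic exists in the dataset.
    """
    return [f"{topic}_{suffix}" for suffix in INTENSITY_SUFFIXES]


# Example: request the domain plus all AI and SDG 1 intensity columns.
columns = ["domain"] + intensity_columns("ai") + intensity_columns("sdg1")
print(columns)
```

The resulting list can be passed as the "columns" value of a fetch-market or search payload.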
Complex Structures
products (contains: name, type, pricing, main_features)
team (contains: name, position, contact, cv)
structure (contains: name, type, description, location, contact)
partnerships (contains: entity_name, relationship_type)
import requests

api_url = 'https://api.istari.ai/v1/fetch-market'  # API Gateway endpoint
headers = {
    # API key generated from AWS API Gateway; in the future each company will have its own key
    'x-api-key': 'sk_api_key',
    'Content-Type': 'application/json'
}
payload = {
    # Domains to look up; must be a list of strings, even for a single domain
    "domains": ["peterconradconstruction.com", "istari.ai"],
    # Columns to return for each domain
    "columns": [
        "domain",
        "country",
        "state",
        "title",
        "description",
        "keywords"
    ],
    "index": "webai*"  # Index or index pattern to query (must be a string)
}
response = requests.post(api_url, headers=headers, json=payload)
print(response.json())

The API returns a JSON object with two primary sections:
data
Contains an array of records, one per requested domain that was found in the index, each including exactly the fields you asked for via the columns parameter:
domain: The queried domain name.
country: Registered/operating country.
state: State or region.
title: Content of the HTML <title> tag.
description: Content of the HTML <meta name="description"> tag (or a snippet).
keywords: Content of the HTML <meta name="keywords"> tag (empty string if absent).
metadata
Echoes your input parameters and provides execution details:
domains_requested (int): Number of domains in your payload.
domains_found (int): How many of those domains were found in the index.
missing_domains (array): Any requested domains that were not found.
columns (array): The exact list of columns you specified in the request.
index (string): The index or index pattern queried (e.g. "webai*").
total_fetched (int): Total records returned (should equal domains_found).
timestamp (string): Server-side timestamp when the query executed (YYYY-MM-DD HH:MM:SS).
Summary:
data holds exactly the information you requested in the columns array.
metadata captures your original input (domains, columns, index) and when the API processed your request.
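Because metadata reports which domains were not found, it can be used to check a response before working with data. The following is a minimal sketch, assuming the response has the data and metadata keys described above. The function name summarize_fetch and the sample dictionary are hypothetical; the sample is hand-written for illustration and is not real API output.

```python
def summarize_fetch(result: dict) -> dict:
    """Summarize a fetch-market response using the metadata fields described above."""
    metadata = result.get("metadata", {})
    records = result.get("data", [])
    return {
        "requested": metadata.get("domains_requested"),
        "found": metadata.get("domains_found", len(records)),
        "missing": list(metadata.get("missing_domains", [])),
        # The documentation states total_fetched should equal domains_found.
        "consistent": metadata.get("total_fetched") == metadata.get("domains_found"),
    }


# Hand-written sample shaped like the documented response (placeholder values only).
sample = {
    "data": [{"domain": "example.com"}],
    "metadata": {
        "domains_requested": 2,
        "domains_found": 1,
        "missing_domains": ["example.org"],
        "total_fetched": 1,
    },
}

summary = summarize_fetch(sample)
print(summary)
```

In real use, the argument would be response.json() from the request shown earlier.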
Endpoint: sublocations
sublocations
Retrieves all child locations ("sublocations") for a given parent level and value.
Parameters
parent_level (string) – Geographic hierarchy to drill into:
    all – return every level of sublocation
    country – states/regions within a country
    state – regions/districts within a state
    region – districts/municipalities within a region
    district – municipalities within a district
    municipality – localities within a municipality
parent_value (string) – The name of the parent location (e.g. "Germany").
index (string) – The Elasticsearch index or pattern to query (e.g. "webai*").
import requests

api_url = 'https://api.istari.ai/v1/sublocations'
headers = {
    'x-api-key': 'sk_api_key',
    'Content-Type': 'application/json'
}
payload = {
    "parent_level": "country",  # Level of the parent location (must be a string)
    "parent_value": "Germany",  # Name of the parent location (must be a string)
    "index": "webai*"           # Index or index pattern to query (must be a string)
}
response = requests.post(api_url, headers=headers, json=payload)
print(response.json())

The sublocations endpoint returns the immediate geographic subdivisions for the given parent level and value.
Response Fields
locations – Array of sublocation names (e.g., Germany's 16 states)
level – Granularity of the returned locations ("state" when drilling into a country)
total_count – Number of items in the locations array
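The response fields above can be used to drill one level deeper. The following is a minimal sketch that validates a sublocations response and builds the next request payload. It assumes that the returned level value is accepted as the next parent_level, which the documentation does not state explicitly. The function name next_sublocation_payload and the shortened sample are hypothetical.

```python
def next_sublocation_payload(result: dict, chosen: str, index: str = "webai*") -> dict:
    """Build the payload for drilling into one of the returned sublocations.

    Assumption: the "level" value in the response is a valid "parent_level"
    for the follow-up request.
    """
    locations = result["locations"]
    if result["total_count"] != len(locations):
        raise ValueError("total_count does not match the number of locations")
    if chosen not in locations:
        raise ValueError(f"{chosen!r} is not among the returned locations")
    return {
        "parent_level": result["level"],
        "parent_value": chosen,
        "index": index,
    }


# Shortened, hand-written sample shaped like a sublocations response.
sample = {"locations": ["Bayern", "Hessen", "Berlin"], "level": "state", "total_count": 3}

payload = next_sublocation_payload(sample, "Bayern")
print(payload)
```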
Example
Based on the payload above, the response is:
{
    "locations": ["Nordrhein-Westfalen", "Bayern", "Baden-Württemberg", "Niedersachsen", "Hessen", "Berlin", "Rheinland-Pfalz", "Sachsen", "Schleswig-Holstein", "Thüringen", "Hamburg", "Brandenburg", "Sachsen-Anhalt", "Mecklenburg-Vorpommern", "Saarland", "Bremen"],
    "level": "state",
    "total_count": 16
}
Endpoint: search
search
Description
Fetches company records from Elasticsearch in one of three modes:
Similarity (KNN) Search: when you supply domains.
Keyword Search: when domains is empty, using keywords, locations, and custom_filters.
Semantic (KNN) Search: when you supply semantic_input.
Request Parameters
search_id (string) – UUID that uniquely identifies this search.
domains (string[]) – (Optional) Seed domains for embedding-based similarity search.
excludes (string[]) – (Optional) Domains to omit from results.
keywords (object) – Term filters (must_one, must_all, must_not).
locations (object) – Geographic filters (country, state, region, etc.).
custom_filters (object) – Field-level filters (keys must be allowed filter columns).
index (string) – Elasticsearch index or pattern (default: "webai*").
size (integer) – Number of results (1–100, default: 25).
columns (string[]) – Fields to return (defaults to the core output columns).
pit_id (string) – (Optional) Point-in-time ID for consistent pagination.
search_after (any[]) – (Optional) Cursor token from the previous response for efficient, deep pagination.
semantic_input (string) – (Optional) The semantic input, i.e. the user's query text.
Response Structure
data: Array of company objects containing the requested columns.
metadata: Echoes your inputs (including domains, keywords, locations, columns), plus query details and timestamps.
For the full list of Available Columns and their descriptions, please refer to the Dataset Codebook. The codebook provides detailed information about all available variables, their data types, possible values, and descriptions. It also includes information about which fields can be used in the custom_filters parameter.
Important Note: Only one of domains and semantic_input may have a value. Supplying both triggers an error.
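The rule in the note above can be enforced on the client before a request is sent. The following is a minimal sketch of a payload builder that rejects requests setting both fields. The function name build_search_payload is hypothetical, and the defaults for index and size are taken from the parameter list above.

```python
import uuid


def build_search_payload(domains=None, semantic_input=None, **filters) -> dict:
    """Build a search payload, allowing at most one of domains / semantic_input.

    Hypothetical client-side helper; the API performs its own validation.
    """
    if domains and semantic_input:
        raise ValueError("Set either domains or semantic_input, not both")
    payload = {
        "search_id": str(uuid.uuid4()),   # documented as a UUID string
        "domains": list(domains or []),   # empty list selects keyword or semantic mode
        "semantic_input": semantic_input,
        "index": "webai*",                # documented default
        "size": 25,                       # documented default
    }
    # Extra keyword arguments (keywords, locations, columns, ...) are passed through as-is.
    payload.update(filters)
    return payload


payload = build_search_payload(
    semantic_input="sustainable packaging suppliers",
    locations={"country": ["United Kingdom"], "state": [], "region": []},
)
print(payload["domains"], payload["semantic_input"])
```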
# Keyword search: "domains" is empty, so keywords, locations, and custom_filters drive the query.
# For a similarity search, put the seed domains in "domains" instead.
import uuid

import requests

api_url = 'https://api.istari.ai/v1/search'
headers = {
    'x-api-key': 'sk_api_key',
    'Content-Type': 'application/json'
}
payload = {
    "search_id": str(uuid.uuid4()),  # Search ID (must be a UUID string)
    "domains": [],   # Seed domains (list of strings); leave empty when not using similarity search
    "excludes": [],  # Domains to exclude from the results (list of strings)
    "keywords": {"must_one": ["packaging"], "must_all": [], "must_not": []},
    "locations": {"country": ["United Kingdom"], "state": [], "region": []},
    # Extra field-level constraints (e.g. `country`, `team.name`, `products.type`).
    # Each key must be an allowed column; its list of values becomes Elasticsearch `filter` clauses.
    "custom_filters": {"b2x": ["B2C", "B2B"]},
    "index": "webai*",
    "size": 1,
    "semantic_input": None,  # Semantic query string; set to None when not used
    "columns": [
        "domain",
        "country",
        "state",
        "title",
        "description",
        "keywords",
        "region_code"
    ],
    "pit_id": None,
    # Cursor-based paging: pass the last hit's sort values to fetch the next batch.
    # This is more efficient and consistent than deep offsets.
    "search_after": [],
}
response = requests.post(api_url, headers=headers, json=payload)
print(response.json())

The search endpoint returns the companies that match the supplied keywords and locations.
Data
Returns one company record matching your criteria:
domain: The website queried (ledburytownfc.co.uk)
country / state: Geographic info (United Kingdom, England)
title, description, keywords: Metadata scraped from the site (empty here)
region_code: Internal code for the region
Metadata
total_hits: ~2.48 million matches in the index
search_id: Your unique search identifier
search_input: Echoes your request parameters (filters, size = 1, requested columns)
query: The Elasticsearch boolean query built from your filters
pit_id & search_after: Cursor tokens for efficient pagination
start_time / end_time: When the query ran
In short: you asked for one UK-based domain, and the API returned its basic fields plus metadata about how the search was executed and how to page further.
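The pit_id and search_after values can be carried from one response into the next request to page through results. The following is a minimal sketch of that loop. It assumes both cursor values are returned under the metadata key, as the description above suggests, and that an empty search_after means there are no further pages; neither detail is confirmed by the documentation. The names iter_pages and fake_post are hypothetical, and fake_post stands in for the real HTTP call so the example runs offline.

```python
from typing import Callable, Iterator


def iter_pages(post: Callable[[dict], dict], payload: dict, max_pages: int = 10) -> Iterator[list]:
    """Yield one list of records per page, carrying the cursor between requests."""
    request = dict(payload)  # copy so the caller's payload is not modified
    for _ in range(max_pages):
        result = post(request)
        records = result.get("data", [])
        if not records:
            return
        yield records
        metadata = result.get("metadata", {})
        # Assumption: the cursor values are returned under these metadata keys.
        request["pit_id"] = metadata.get("pit_id")
        request["search_after"] = metadata.get("search_after") or []
        if not request["search_after"]:
            return  # assumption: no cursor means no further pages


# Offline stand-in for the API: serves five placeholder records, two per page.
FAKE_RECORDS = [{"domain": f"example{i}.com"} for i in range(5)]


def fake_post(body: dict) -> dict:
    start = body["search_after"][0] if body.get("search_after") else 0
    batch = FAKE_RECORDS[start:start + 2]
    next_cursor = [start + 2] if start + 2 < len(FAKE_RECORDS) else []
    return {"data": batch, "metadata": {"pit_id": "fake-pit", "search_after": next_cursor}}


pages = list(iter_pages(fake_post, {"size": 2, "search_after": []}))
print([len(page) for page in pages])
```

Against the real endpoint, post could be a small function that calls requests.post with the search URL and headers shown earlier and returns response.json().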
Endpoint: api-usage
api-usage
Description
Checks the API usage of the user.
Behavior
Returns the user's API usage as recorded in the database, i.e. how many requests they have sent.
import requests

api_url = 'https://api.istari.ai/v1/api-usage'
headers = {
    'x-api-key': 'sk_api_key',
    'Content-Type': 'application/json'
}
response = requests.get(api_url, headers=headers)
print(response.json())

To use the API, please contact us at [email protected] to request your own API key and pricing details. You can then access the API with our enriched dataset.