The Global Organization Index (GOI) is our curated, organization-level dataset containing approximately 20 million active organizations across 232 countries and territories. It serves as the canonical source for firmographic data at ISTARI, covering organization identity, location, industry classification, size, and more.

Data pipeline & methodology

GOI is built through a multi-step validation pipeline designed to prioritize quality over volume:
  1. Registry Collection: We systematically query national registers worldwide, including organization, commercial, and association registers. This primary data source is then enriched with additional open data sources, giving us an initial pool of approximately 400 million organizations.
  2. Domain Attribution: We determine which of these organizations can be attributed to a clearly identifiable web domain. Roughly 10% of the total, about 40 million organizations, meet this criterion. The remainder are either no longer active, were never truly operational (e.g., pure holding structures), or never maintained a web presence.
  3. Activity Verification: We verify which of those 40 million domains are still actively operated. This validation step brings the dataset down to approximately 20 million organizations that are demonstrably active.
This process ensures that every record in GOI is web domain-verified and confirmed active.
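
The funnel arithmetic behind these steps can be checked with a short sketch. The stage names and rounded counts are taken from the steps above; the variable and function names are illustrative only and not part of any GOI tooling.

```python
# Approximate record counts after each pipeline stage, as stated in the steps above.
FUNNEL = [
    ("Registry Collection", 400_000_000),
    ("Domain Attribution", 40_000_000),
    ("Activity Verification", 20_000_000),
]


def retention_rates(funnel):
    """Return (stage, share kept vs. previous stage, share kept vs. initial pool)."""
    initial = funnel[0][1]
    previous = initial
    rates = []
    for stage, count in funnel:
        rates.append((stage, count / previous, count / initial))
        previous = count
    return rates


for stage, step_share, overall_share in retention_rates(FUNNEL):
    print(f"{stage}: {step_share:.0%} of previous stage, {overall_share:.0%} of initial pool")
```

With these rounded figures, about 5% of the initial pool remains after activity verification.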

Quality vs. volume

Our final dataset is smaller than comparable traditional databases. In return, it contains only verified, active organizations, with no inactive records or dormant entities. The key differentiator is not size but data timeliness, operational relevance, and verification depth.

Data sources

Source Type | Role | Examples
National registers | Primary source | Company registers, commercial registers, association registers
Open data sources | Enrichment | Government databases, administrative statistics data, structured open datasets
Web presence | Verification | Organization websites, active domain validation

Schema

Key definitions

  • NACE Code: The EU’s standard statistical classification of economic activities, used as GOI’s primary industry taxonomy.
  • Organization Type: Categorized as Company, Startup, Academic, Public, or Other based on registry data and web content analysis.
  • Organization Size: Derived from employee and revenue signals, bucketed into Micro, Small, Medium-sized, and Large enterprise per EU SME definitions.
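
As a rough illustration of the size buckets, the sketch below maps a headcount to a label using the employee ranges listed under "Organization size distribution". It is a headcount-only approximation: GOI also uses revenue signals, which this sketch ignores, and `size_bucket` is a hypothetical helper name, not part of the dataset or any API.

```python
def size_bucket(employees: int) -> str:
    """Map a headcount to a size label (headcount-only approximation, hypothetical helper)."""
    if employees < 0:
        raise ValueError("employee count cannot be negative")
    if employees <= 9:
        return "Micro"
    if employees <= 49:
        return "Small"
    if employees <= 249:
        return "Medium-sized"
    return "Large enterprise"


print(size_bucket(7), size_bucket(120), size_bucket(5000))
```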

Core fields

Column | Type | Description
name | STRING | Organization name
domain | STRING | Organization website domain
summary | STRING | AI-generated summary of the organization
keywords | LIST | Descriptive keywords
employee_class | STRING | Employee count bracket
country | STRING | Country name
country_code | STRING | ISO country code
state / state_code | STRING | State or province
region / region_code | STRING | Region
district / district_code | STRING | District
municipality / municipality_code | STRING | Municipality
address | STRING | Full address
latitude | FLOAT64 | Latitude coordinate
longitude | FLOAT64 | Longitude coordinate
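
For readers who want to load these fields into typed code, a minimal sketch of a record type follows. The class name `GoiCoreRecord`, the choice of which fields may be empty, and all sample values are assumptions made for illustration; only the column names and base types come from the table above. The district and municipality columns are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GoiCoreRecord:
    """Illustrative container mirroring the core fields; not an official client type."""
    name: str
    domain: str
    summary: str
    employee_class: str
    country: str
    country_code: str
    keywords: List[str] = field(default_factory=list)
    # Assumption: location details may be missing, so they are modeled as optional.
    state: Optional[str] = None
    state_code: Optional[str] = None
    region: Optional[str] = None
    region_code: Optional[str] = None
    address: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None

    def has_coordinates(self) -> bool:
        """True when both latitude and longitude are present."""
        return self.latitude is not None and self.longitude is not None


# Hypothetical sample values, including the employee_class format.
record = GoiCoreRecord(
    name="Example GmbH",
    domain="example.com",
    summary="Placeholder summary.",
    employee_class="10-49",
    country="Germany",
    country_code="DE",
    keywords=["example"],
)
print(record.has_coordinates())
```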

Classification fields

Column | Type | Description
nace_code | STRING | NACE industry classification code
nace_reasoning (optional) | STRING | Reasoning behind the assigned NACE code
organization_type | STRING | Type of organization (Company, Public, Academic, Startup, Other)
organization_type_reasoning (optional) | STRING | Reasoning behind the assigned type
organization_size | STRING | Size bracket (Micro, Small, Medium-sized, Large enterprise)
organization_size_reasoning (optional) | STRING | Reasoning behind the assigned size
Note: Reasoning fields are not published in the standard dataset.

Coverage statistics

At a glance

Metric | Value
Total organizations | ~20,000,000
Countries & territories | 232
Industry sectors (NACE) | 22
Last updated | February 2026

Top 20 countries by volume

Country | Organizations
United States | 3,626,794
Germany | 1,828,181
United Kingdom | 1,013,026
Netherlands | 778,416
Italy | 625,754
Australia | 601,797
France | 571,895
Japan | 461,640
Brazil | 421,034
Canada | 366,258
Poland | 339,745
Czechia | 268,404
Spain | 250,467
Belgium | 233,349
Switzerland | 219,274
Sweden | 199,470
India | 186,750
Russia | 182,112
Austria | 158,644
Denmark | 140,840

Organization size distribution

Size | Employee Range | Count | Share
Micro | 0–9 | 7,341,838 | 42.78%
Small | 10–49 | 7,494,409 | 43.67%
Medium-sized | 50–249 | 1,501,655 | 8.75%
Large enterprise | 250+ | 824,126 | 4.80%
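
The shares in this table can be recomputed from the counts, as sketched below. The four counts sum to 17,162,028, which is lower than the roughly 20 million total, so the percentages appear to be relative to the organizations that have a size bracket rather than to the full dataset. That reading is inferred from the arithmetic and is not stated in the source.

```python
# Counts copied from the size distribution table.
SIZE_COUNTS = {
    "Micro": 7_341_838,
    "Small": 7_494_409,
    "Medium-sized": 1_501_655,
    "Large enterprise": 824_126,
}

total = sum(SIZE_COUNTS.values())

# Share of each bracket relative to the sum of the four listed counts, in percent.
shares = {label: round(100 * count / total, 2) for label, count in SIZE_COUNTS.items()}

print(total)
for label, share in shares.items():
    print(f"{label}: {share:.2f}%")
```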

Top 10 industries (NACE)

NACE Code | Industry | Count | Share
N | Professional, scientific & technical activities | 3,093,879 | 18.03%
G | Wholesale & retail trade | 2,240,191 | 13.05%
C | Manufacturing | 2,006,435 | 11.69%
R | Human health & social work | 1,440,340 | 8.39%
K | Telecom, IT & computing | 1,269,131 | 7.39%
F | Construction | 1,173,226 | 6.84%
I | Accommodation & food service | 1,126,594 | 6.56%
S | Arts, sports & recreation | 1,101,592 | 6.42%
Q | Education | 584,430 | 3.41%
J | Publishing, broadcasting & content | 554,985 | 3.23%

Notes on geographic data

Approximately 2.7 million organizations in the dataset do not have a standardized administrative region. These organizations are intentionally retained in the dataset. While they cannot be filtered by geographic region, they remain valuable for non-geographic analyses (e.g., industry, size, domain-level insights).

When an organization appears multiple times (e.g., the same domain linked to different addresses), we deduplicate by domain and retain the record with the highest employee count, treating it as the organization’s headquarters.
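
The deduplication rule described here can be sketched as follows. This is an illustration under assumptions, not the production pipeline: `employee_count` is a hypothetical numeric field (the published schema exposes `employee_class` as a string bracket), lowercasing the domain before comparison is an assumed normalization, and on a tie the first record seen is kept.

```python
def dedupe_by_domain(records):
    """Keep one record per domain: the one with the highest employee count."""
    best = {}
    for record in records:
        # Assumed normalization: compare domains case-insensitively.
        domain = record["domain"].strip().lower()
        current = best.get(domain)
        if current is None or record["employee_count"] > current["employee_count"]:
            best[domain] = record
    return list(best.values())


# Hypothetical sample records: one domain linked to two addresses.
records = [
    {"domain": "example.com", "address": "Branch Street 1", "employee_count": 12},
    {"domain": "Example.com", "address": "Head Office Road 5", "employee_count": 340},
    {"domain": "other.test", "address": "Side Lane 3", "employee_count": 4},
]

for kept in dedupe_by_domain(records):
    print(kept["domain"], kept["address"])
```

Here the record with 340 employees is kept for `example.com` and treated as the headquarters.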

Delivery

  • Formats: CSV, Excel, API, web app, Parquet, or any other requested big-data file format.
  • Delivery type: One-time dataset delivery or ongoing updates (discuss with the team).
  • Filters available: By country, industry (NACE), size, organization type, keyword, similarity, or any combination.
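
For a delivered CSV file, combining filters locally might look like the sketch below. The column names follow the schema above, but the sample rows, the two-letter country codes, the single-letter `nace_code` values, and the `filter_rows` helper are all assumptions made for illustration; the actual delivery may encode these fields differently.

```python
import csv
import io

# Hypothetical sample standing in for a delivered CSV file.
SAMPLE_CSV = """name,domain,country_code,nace_code,organization_size,organization_type
Example Bakery,example-bakery.test,DE,C,Small,Company
Example Lab,example-lab.test,DE,N,Micro,Startup
Example Retail,example-retail.test,NL,G,Medium-sized,Company
"""


def filter_rows(rows, **criteria):
    """Keep rows whose columns equal every given criterion (logical AND)."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
german_startups = filter_rows(rows, country_code="DE", organization_type="Startup")
print([row["name"] for row in german_startups])
```

Passing several keyword arguments narrows the result, which mirrors the "any combination" option in the list above.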