The Global Organization Index (GOI) is our curated, organization-level dataset containing approximately 20 million active organizations across 232 countries and territories. It serves as the canonical source for firmographic data at ISTARI, covering organization identity, location, industry classification, size, and more.

Data Pipeline & Methodology¶
GOI is built through a multi-step validation pipeline designed to prioritize quality over volume:
- Registry Collection — We systematically query national registers worldwide — including organization, commercial, and association registers. This primary data source is then enriched with additional open data sources, giving us an initial pool of approximately 400 million organizations.
- Domain Attribution — We determine which of these organizations can be attributed to a clearly identifiable web domain. This results in roughly 10% of the total — about 40 million organizations. The remainder are either no longer active, were never truly operational (e.g., pure holding structures), or simply never maintained a web presence.
- Activity Verification — We verify which of those 40 million domains are still actively operated. This validation step brings the dataset down to approximately 20 million organizations that are demonstrably active.
This process ensures that every record in GOI is web domain-verified and confirmed active.
Quality vs. Volume¶
Our final dataset is smaller than comparable, traditional databases. However, in contrast, our dataset contains exclusively verified and active organizations — no inactive records or dormant entities. The key differentiator is not size, but data timliness, operational relevance, and verification depth.
Data Sources¶
| Source Type | Role | Examples |
|---|---|---|
| National registers | Primary source | Company registers, commercial registers, association registers |
| Open data sources | Enrichment | Government databases, administrative statistics data, structured open datasets |
| Web presence | Verification | Organisation websites, active domain validation |
Schema¶
Key Definitions¶
- NACE Code — The EU's standard statistical classification of economic activities. Used as GOI's primary industry taxonomy. Learn more
- Organization Type — Categorized as Company, Startup, Academic, Public, or Other based on registry data and web content analysis.
- Organization Size — Derived from employee and revenue signals, bucketed into Micro, Small, Medium-sized, and Large enterprise per EU SME definitions.
Core Fields¶
| Column | Type | Description |
|---|---|---|
name | STRING | Organization name |
domain | STRING | Organisation website domain |
summary | STRING | AI-generated summary of the organization |
keywords | LIST | Descriptive keywords |
employee_class | STRING | Employee count bracket |
country | STRING | Country name |
country_code | STRING | ISO country code |
state / state_code | STRING | State or province |
region / region_code | STRING | Region |
district / district_code | STRING | District |
municipality / municipality_code | STRING | Municipality |
address | STRING | Full address |
latitude | FLOAT64 | Latitude coordinate |
longitude | FLOAT64 | Longitude coordinate |
Classification Fields¶
| Column | Type | Description |
|---|---|---|
nace_code | STRING | NACE industry classification code |
nace_reasoning (optional) | STRING | Reasoning behind the assigned NACE code |
organization_type | STRING | Type of organization (Company, Public, Academic, Startup, Other) |
organization_type_reasoning (optional) | STRING | Reasoning behind the assigned type |
organization_size | STRING | Size bracket (Micro, Small, Medium-sized, Large enterprise) |
organization_size_reasoning (optional) | STRING | Reasoning behind the assigned size |
Note: Reasoning fields are not published in the standard dataset.
Coverage Statistics¶
At a Glance¶
| Metric | Value |
|---|---|
| Total organizations | ~20,000,000 |
| Countries & territories | 232 |
| Industry sectors (NACE) | 22 |
| Last updated | February 2026 |
Top 20 Countries by Volume¶
| Country | Organizations |
|---|---|
| United States | 3,626,794 |
| Germany | 1,828,181 |
| United Kingdom | 1,013,026 |
| Netherlands | 778,416 |
| Italy | 625,754 |
| Australia | 601,797 |
| France | 571,895 |
| Japan | 461,640 |
| Brazil | 421,034 |
| Canada | 366,258 |
| Poland | 339,745 |
| Czechia | 268,404 |
| Spain | 250,467 |
| Belgium | 233,349 |
| Switzerland | 219,274 |
| Sweden | 199,470 |
| India | 186,750 |
| Russia | 182,112 |
| Austria | 158,644 |
| Denmark | 140,840 |
Organization Size Distribution¶
| Size | Employee Range | Count | Share |
|---|---|---|---|
| Micro | 0–9 | 7,341,838 | 42.78% |
| Small | 10–49 | 7,494,409 | 43.67% |
| Medium-sized | 50–249 | 1,501,655 | 8.75% |
| Large enterprise | 250+ | 824,126 | 4.80% |
Top 10 Industries (NACE)¶
| NACE Code | Industry | Count | Share |
|---|---|---|---|
| N | Professional, scientific & technical activities | 3,093,879 | 18.03% |
| G | Wholesale & retail trade | 2,240,191 | 13.05% |
| C | Manufacturing | 2,006,435 | 11.69% |
| R | Human health & social work | 1,440,340 | 8.39% |
| K | Telecom, IT & computing | 1,269,131 | 7.39% |
| F | Construction | 1,173,226 | 6.84% |
| I | Accommodation & food service | 1,126,594 | 6.56% |
| S | Arts, sports & recreation | 1,101,592 | 6.42% |
| Q | Education | 584,430 | 3.41% |
| J | Publishing, broadcasting & content | 554,985 | 3.23% |
Notes on Geographic Data¶
Approximately 2.7 million organizations in the dataset do not have a standardized administrative region.
These organizations are intentionally retained in the dataset. While they cannot be filtered by geographic region, they remain valuable for non-geographic analyses (e.g., industry, size, domain-level insights).
When an organization appears multiple times (e.g., the same domain linked to different addresses), we deduplicate by domain and retain the record with the highest employee count, treating it as the organization's headquarters.
Delivery¶
- Formats: CSV, Excel, API, web app, parquet or any other requested file format for big data.
- Delivery type: One-time dataset delivery or ongoing updates (discuss with the team)
- Filters available: By country, industry (NACE), size, organization type, keyword filter, similarity filter, or any combination