Blockchain Indexing – Overview of the Process Evolution

Bware Labs Team

7 May 2024

In the intricate world of blockchain technology, efficiently retrieving and managing on-chain data is one of the most difficult challenges developers face. This article covers blockchain data indexing, the process that transforms the vast amount of raw data stored in a blockchain into structured, high-level constructs tailored to a specific application. From foundational concepts and methodologies to solutions like The Graph and Covalent’s Unified API, we explore how these technologies are redefining data accessibility for cross-chain bridges, oracles, on-chain gaming, data analysis, and dApp frontends.

Rewind: Indexing 101

Blockchain data indexing is a methodical approach to organizing on-chain information so that users can query it quickly and easily. Just as the index of a book helps a reader locate a chapter, a blockchain index simplifies the search for specific data within a vast, immutable, decentralized ledger. Blockchains are built to be secure and tamper-proof, not to be queried: they lack native querying support, which is why an efficient system for indexing and querying their data is needed. Our first overview of blockchain indexing covers these basics in more detail.

Blockchain data indexing involves extracting data from the blockchain and storing it in a structured form within a distributed database to enable efficient access. This process includes parsing data structures such as blocks and transactions and indexing elements such as transaction details, addresses, asset transfers, and smart contract interactions by the relevant parameters. Indexing techniques are tailored to each blockchain and data type, which is critical for optimizing queries. The indexed data is then exposed to users and applications through APIs and user interfaces, including GraphQL endpoints and RESTful APIs.
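The extract, parse, store, and query steps described above can be sketched in a few lines. This is a minimal illustration only: the block layout, the field names, and the use of an in-memory SQLite database (rather than a distributed one) are assumptions made for the example, not how any particular indexer works.

```python
import sqlite3

# Toy "chain": each block carries a number and a list of transfers.
# The block layout and field names are invented for this example.
BLOCKS = [
    {"number": 1, "txs": [{"hash": "0xa1", "from": "alice", "to": "bob", "value": 5}]},
    {"number": 2, "txs": [
        {"hash": "0xb1", "from": "bob", "to": "carol", "value": 2},
        {"hash": "0xb2", "from": "alice", "to": "carol", "value": 1},
    ]},
]


def build_index(blocks):
    """Extract transfers from raw blocks and store them in an indexed table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transfers (block INTEGER, tx_hash TEXT, sender TEXT, "
        "recipient TEXT, value INTEGER)"
    )
    # Indexes on the address columns are what make per-address lookups cheap.
    db.execute("CREATE INDEX idx_sender ON transfers (sender)")
    db.execute("CREATE INDEX idx_recipient ON transfers (recipient)")
    for block in blocks:
        for tx in block["txs"]:
            db.execute(
                "INSERT INTO transfers VALUES (?, ?, ?, ?, ?)",
                (block["number"], tx["hash"], tx["from"], tx["to"], tx["value"]),
            )
    db.commit()
    return db


def transfers_for(db, address):
    """Return all transfers touching an address, oldest first."""
    rows = db.execute(
        "SELECT block, tx_hash, sender, recipient, value FROM transfers "
        "WHERE sender = ? OR recipient = ? ORDER BY block, tx_hash",
        (address, address),
    )
    return rows.fetchall()


db = build_index(BLOCKS)
print(transfers_for(db, "carol"))
```

Once the table exists, answering "which transfers involve this address?" is a single indexed query instead of a scan over every block.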

In the final stage, the organized data is served to users and dApp developers through these APIs, allowing fast and effective querying. This removes the need to scan the entire blockchain for a specific piece of information, which significantly improves the efficiency of data retrieval. Index structures such as the account transaction trace chain (ATTC) and the subchain-based account transaction chain (SCATC) refine the querying process further, so that users can locate precise data much as they would with a search engine.

Pioneering: The Graph’s Solution to Blockchain Indexing

Often nicknamed “the Google of Blockchain”, The Graph is an open-source, decentralized indexing protocol that simplifies information retrieval on Ethereum-based blockchains. It uses GraphQL, an API query language, and exposes indexed data through “subgraphs”: open, public APIs that dApps can query. A subgraph processes network blocks and smart contracts and aggregates data from various sources so that it can be fetched in a single API call.
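To make the "single API call" concrete, here is a small sketch of how a client might build the JSON body of a GraphQL request for a subgraph. The entity name `transfers` and its fields are hypothetical, since the queryable entities depend entirely on the schema of the subgraph being called, and `build_query` is a helper written for this example.

```python
import json


def build_query(entity, fields, first=5):
    """Build the JSON body of a GraphQL POST request for one entity type."""
    selection = " ".join(fields)
    # Doubled braces are literal braces inside an f-string.
    query = f"{{ {entity}(first: {first}) {{ {selection} }} }}"
    return json.dumps({"query": query})


# Hypothetical entity and fields; a real subgraph defines its own.
body = build_query("transfers", ["id", "from", "to", "value"], first=3)
print(body)
```

The resulting string would be sent as the body of an HTTP POST to the subgraph's query endpoint, and the response contains only the fields that were requested.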

Subgraphs comprise:

GraphQL schema, which defines database entities and connections;
Subgraph manifest, detailing data sources, templates, and functionalities;
Indexing logic, which specifies how to process and store blockchain data.
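The way these three parts fit together can be illustrated with a toy analogue. Real subgraphs declare the schema in GraphQL, the manifest in YAML, and the indexing logic as mappings compiled to WASM; the Python below only mirrors the division of responsibilities, and every name, address, and event shape in it is a placeholder.

```python
from dataclasses import dataclass


# 1. Schema: the entity the index stores.
@dataclass
class Transfer:
    id: str
    sender: str
    recipient: str
    value: int


# 2. Manifest: which contract and event to watch, and which handler to call.
#    The address and event name are placeholders.
MANIFEST = {
    "data_source": {"address": "0xTOKEN", "start_block": 100},
    "event_handlers": {"Transfer": "handle_transfer"},
}

STORE = {}  # entity id -> entity, standing in for the database


# 3. Indexing logic: turn one raw event into a stored entity.
def handle_transfer(event):
    entity = Transfer(
        id=f'{event["tx_hash"]}-{event["log_index"]}',
        sender=event["args"]["from"],
        recipient=event["args"]["to"],
        value=event["args"]["value"],
    )
    STORE[entity.id] = entity


HANDLERS = {"handle_transfer": handle_transfer}


def process(event):
    """Dispatch an event to its handler if the manifest subscribes to it."""
    if event["address"] != MANIFEST["data_source"]["address"]:
        return
    handler_name = MANIFEST["event_handlers"].get(event["name"])
    if handler_name is not None:
        HANDLERS[handler_name](event)


process({"address": "0xTOKEN", "name": "Transfer", "tx_hash": "0xabc",
         "log_index": 0, "args": {"from": "alice", "to": "bob", "value": 7}})
process({"address": "0xOTHER", "name": "Transfer", "tx_hash": "0xdef",
         "log_index": 0, "args": {"from": "x", "to": "y", "value": 1}})
print(STORE)
```

Only the event from the watched address ends up in the store; the manifest decides what is relevant, the handler decides how it is stored, and the schema decides what can later be queried.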

The Graph Network has three main participant roles: Indexers, Delegators, and Curators. Indexers, or node operators, stake GRT tokens to provide indexing and querying services, earning fees and rewards in return. Delegators delegate their GRT tokens to Indexers and earn a portion of the query fees and rewards, helping to secure the network without running a node themselves. Curators, who are often subgraph developers themselves, signal which subgraphs are worth indexing, guiding Indexers toward high-quality subgraphs.
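As a rough illustration of how rewards might flow between an Indexer and its Delegators, the sketch below splits an amount pro rata. It is a deliberate simplification under stated assumptions: the `indexer_cut_percent` parameter, the amounts, and the integer rounding are all hypothetical, and the protocol's actual reward and fee rules are more involved than this.

```python
def split_rewards(total, indexer_cut_percent, delegations):
    """Split a reward between an indexer and its delegators.

    The indexer keeps indexer_cut_percent of the total; the remainder is
    shared among delegators in proportion to the amount each delegated.
    Integer arithmetic is used, so fractional remainders are dropped.
    """
    total_delegated = sum(delegations.values())
    if total_delegated == 0:
        # Nobody delegated, so the indexer keeps everything.
        return total, {}
    indexer_share = total * indexer_cut_percent // 100
    pool = total - indexer_share
    shares = {
        name: pool * amount // total_delegated
        for name, amount in delegations.items()
    }
    return indexer_share, shares


# Hypothetical numbers: 1000 tokens of rewards, the indexer keeps 20%.
indexer_share, delegator_shares = split_rewards(1000, 20, {"d1": 300, "d2": 100})
print(indexer_share, delegator_shares)
```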

The Challenge of Cross-Chain Interoperability and Indexing

In the rapidly evolving landscape of blockchain technology, developers and enterprises face significant hurdles in accessing and managing data across multiple blockchain networks. Traditional methods, such as operating individual blockchain nodes or working with diverse and sometimes incompatible public APIs, are resource-intensive and introduce complexity that can stifle innovation and scalability in the Web3 space. The challenge is compounded by the need for real-time data indexing and retrieval across these disparate systems, which is a requirement for applications that depend on up-to-date information from several blockchains. A concrete example is decentralized finance (DeFi) applications that operate over multiple chains, where access to unified, real-time data is crucial for operations such as arbitrage, lending, and asset management.

Redefining Blockchain Data Availability – Where We Are Now

Let’s take a look at the currently available blockchain indexing solutions and at what distinguishes each project in the area of data availability.

Covalent’s Unified API

The Covalent Unified API offers a standardized method for retrieving blockchain data, with consistent request and response formats across networks. Developers integrate the API once and then automatically benefit from newly added network support, improvements in endpoint performance, and other upgrades.
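The value of a consistent response format is easiest to see next to the alternative. The sketch below is not Covalent's API: it uses two made-up raw response shapes, standing in for incompatible per-chain APIs, and small adapter functions that map each one onto a single `Balance` record. All names and fields are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Balance:
    """One response shape that callers see, whatever the chain."""
    chain: str
    address: str
    token: str
    amount: int


# Two invented raw formats standing in for incompatible per-chain APIs.
def from_chain_a(raw):
    return Balance("chain-a", raw["addr"], raw["symbol"], int(raw["balance"]))


def from_chain_b(raw):
    asset = raw["asset"]
    # This hypothetical API reports quantities as hex strings.
    return Balance("chain-b", raw["owner"], asset["ticker"], int(asset["qty"], 16))


ADAPTERS = {"chain-a": from_chain_a, "chain-b": from_chain_b}


def get_balance(chain, raw):
    """Normalize a raw per-chain response into the unified Balance shape."""
    try:
        adapter = ADAPTERS[chain]
    except KeyError:
        raise ValueError(f"unsupported chain: {chain}") from None
    return adapter(raw)


a = get_balance("chain-a", {"addr": "0x1", "symbol": "AAA", "balance": "42"})
b = get_balance("chain-b", {"owner": "0x2", "asset": {"ticker": "BBB", "qty": "0x1f"}})
print(a)
print(b)
```

With this design, supporting a new network means adding one adapter on the provider side, while application code that consumes `Balance` records stays unchanged.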

While not a cross-chain indexing solution in the strictest sense, Covalent’s Unified API represents a significant stride toward seamless interoperability and data accessibility across diverse blockchain ecosystems. This approach not only facilitates the rapid development and deployment of Web3 applications but also underscores the potential for future innovations in cross-chain indexing and blockchain interoperability, paving the way for a more integrated and efficient decentralized web.

Goldsky Subgraphs & Mirror

Goldsky provides two main self-serve products, Subgraphs and Mirror, that work both separately and together to enhance your data stack. The first is a subgraph indexing solution that is fully backward-compatible with subgraphs built for The Graph.

At its core, the indexing uses the same WASM processing layer. On top of that, Goldsky introduces:

A redesigned RPC layer, an autoscaling query layer, and storage enhancements to boost reliability (over 99.9% uptime) and increase performance (up to six times faster).
Out-of-the-box webhooks support for enabling notifications, messaging, and other push-based applications.
Compatibility with custom EVM chains, allowing seamless indexing of your rollups or private blockchains.

Goldsky’s second product is Mirror, a serverless data pipeline platform that streams data into your database in real time and is configured with a single .yaml file. Data is delivered directly to your datastore or queue, where it can be queried together with your other data and without external rate limits.

A Mirror pipeline guides Goldsky in sourcing data (sources), processing it if needed (transforms), and saving the results (sinks). This pipeline is aware of reorganizations, ensuring your datastores are updated with the latest information, and:

Manages backfills and edge streaming, letting you concentrate on your product.
Implements quality checks and automatic corrections and enhancements.
Automatically synchronizes data across different chains, standardizing timestamps and more.
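The source, transform, and sink stages and the reorganization handling described above can be sketched generically. This is not Goldsky's pipeline format or implementation: the in-memory `Sink`, the block shape, and the rule "a block arriving at an already-stored height with a different hash signals a reorg" are assumptions chosen to keep the example small.

```python
class Sink:
    """In-memory stand-in for a datastore keyed by block height."""

    def __init__(self):
        self.hashes = {}  # height -> hash of the block stored at that height
        self.rows = {}    # height -> transformed records from that block

    def rollback_from(self, height):
        """Drop everything stored at or above the given height."""
        for n in [h for h in self.hashes if h >= height]:
            del self.hashes[n]
            del self.rows[n]

    def write(self, block, records):
        self.hashes[block["number"]] = block["hash"]
        self.rows[block["number"]] = records


def transform(block):
    """Keep only non-zero transfers and flatten them into rows."""
    return [
        {"block": block["number"], "to": t["to"], "value": t["value"]}
        for t in block["transfers"]
        if t["value"] > 0
    ]


def run_pipeline(source, sink):
    for block in source:
        n = block["number"]
        # Same height, different hash: the chain reorganized, so discard
        # the stale blocks before writing the replacement.
        if n in sink.hashes and sink.hashes[n] != block["hash"]:
            sink.rollback_from(n)
        sink.write(block, transform(block))


# Blocks 2 and 3 are replaced by a competing fork ("h2b", "h3b").
SOURCE = [
    {"number": 1, "hash": "h1", "transfers": [{"to": "bob", "value": 5}]},
    {"number": 2, "hash": "h2a", "transfers": [{"to": "carol", "value": 0}]},
    {"number": 3, "hash": "h3a", "transfers": [{"to": "dave", "value": 9}]},
    {"number": 2, "hash": "h2b", "transfers": [{"to": "erin", "value": 4}]},
    {"number": 3, "hash": "h3b", "transfers": []},
]

sink = Sink()
run_pipeline(SOURCE, sink)
print(sink.hashes)
print(sink.rows)
```

After the run, the sink holds only data from the winning fork: the transfer to "dave" from the orphaned block 3 is gone, and block 2 now contains the transfer to "erin".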

Subsquid

Subsquid Network is a decentralized query engine designed for efficient batch processing of large data volumes. It offers access to historical on-chain data from over 100 EVM and Substrate networks, as well as Starknet. The data is thorough: for EVM networks it includes event logs, transaction receipts, execution traces, and per-transaction state differences.

Subsquid Network currently provides historical on-chain data from both EVM and non-EVM networks through a bespoke REST-like API. Plans are in place to expand its capabilities to include general-purpose SQL queries and a broader range of structured data sets, both on-chain and off-chain.

Subsquid Network’s API Features include:

Raw event logs
Transaction data, including receipts
Execution traces (for specific networks)
State differences (for specific networks)

It supports all major EVM and Substrate networks, and support for additional networks is planned. Requests for new networks are welcome.
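Batch access to historical data usually means walking a block range in fixed-size chunks. The sketch below shows that pattern only; the request dictionary it builds is a hypothetical shape invented for this example, not Subsquid's actual query format.

```python
def block_ranges(start, end, batch_size):
    """Yield inclusive (from_block, to_block) pairs covering [start, end]."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    current = start
    while current <= end:
        upper = min(current + batch_size - 1, end)
        yield current, upper
        current = upper + 1


def build_requests(start, end, batch_size, address):
    """Build one request per batch (hypothetical request shape)."""
    return [
        {"fromBlock": lo, "toBlock": hi, "logs": [{"address": [address]}]}
        for lo, hi in block_ranges(start, end, batch_size)
    ]


ranges = list(block_ranges(0, 24, 10))
print(ranges)
requests = build_requests(0, 24, 10, "0xCONTRACT")  # placeholder address
print(len(requests))
```

Each request covers a disjoint range, so batches can be fetched and processed independently and retried individually if one fails.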

Flair

Flair provides a range of reusable indexing primitives such as fault-tolerant RPC ingestors, custom processors, and re-org aware database integrations. These tools simplify the process of receiving, transforming, storing, and accessing on-chain data. Flair stands out for its adoption of parallel and distributed processing paradigms, which significantly improve scalability and resilience compared with sequential processing approaches such as subgraphs. It is compatible with any EVM chain: connecting an RPC endpoint is enough to start the indexer.

Further distinguishing features include native real-time stream processing for specific data workloads, such as aggregations and rollups, which is useful for tasks like calculating the total volume per pool. Flair also provides managed cloud services that reduce the need for DevOps work, cutting engineering costs for dApp developers. It includes free managed RPC URLs for more than eight popular chains and supports both WebSocket and HTTPS-only RPCs.
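The "total volume per pool" aggregation lends itself to the partition-and-merge pattern that parallel processing relies on. The sketch below is a generic illustration, not Flair's implementation: the swap records are invented, and Python threads are used only to show how partial results from independent chunks combine, not to demonstrate real speedups.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Invented swap events; "pool" and "amount" are placeholder field names.
SWAPS = [
    {"pool": "ETH/USDC", "amount": 100},
    {"pool": "ETH/DAI", "amount": 40},
    {"pool": "ETH/USDC", "amount": 60},
    {"pool": "ETH/DAI", "amount": 10},
    {"pool": "WBTC/ETH", "amount": 5},
]


def volume_per_pool(swaps):
    """Sum swap amounts per pool for one chunk of events."""
    totals = Counter()
    for swap in swaps:
        totals[swap["pool"]] += swap["amount"]
    return totals


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_volume(swaps, workers=2, chunk_size=2):
    """Aggregate each chunk independently, then merge the partial totals."""
    totals = Counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for partial in executor.map(volume_per_pool, chunked(swaps, chunk_size)):
            totals.update(partial)  # Counter.update adds counts together
    return dict(totals)


result = parallel_volume(SWAPS)
print(result)
```

Because addition is associative, the merged result matches what a single sequential pass would produce, which is what makes this kind of workload safe to distribute.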

Flair can track and ingest any contract event, automatically track new contracts deployed from factory contracts, and run custom processor scripts in a JavaScript runtime with TypeScript support. Processors can call external APIs or webhooks on third-party services or backends, retrieve the current or historical USD value of any ERC20 token amount from any contract address on any chain, and use any external NPM library. Stored data can also be streamed to destinations such as PostgreSQL, MongoDB, Kafka, Elasticsearch, Timescale, and more.
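Factory tracking is worth a short illustration, since it is what lets an indexer follow contracts that did not exist when indexing started. The sketch below shows the general idea only; the `FactoryTracker` class, the `PoolCreated` event name, and the log shape are hypothetical and do not reflect Flair's processor API.

```python
class FactoryTracker:
    """Follow events from contracts that a factory deploys over time."""

    def __init__(self, factory_address):
        self.factory = factory_address
        self.tracked = set()  # addresses of contracts created by the factory
        self.events = []      # events ingested from tracked contracts

    def ingest(self, log):
        # A creation event from the factory registers a new contract.
        if log["address"] == self.factory and log["event"] == "PoolCreated":
            self.tracked.add(log["args"]["pool"])
            return
        # Events are kept only if they come from a registered contract.
        if log["address"] in self.tracked:
            self.events.append(log)


tracker = FactoryTracker("0xFACTORY")  # placeholder address
LOGS = [
    {"address": "0xPOOL1", "event": "Swap", "args": {"amount": 1}},  # too early
    {"address": "0xFACTORY", "event": "PoolCreated", "args": {"pool": "0xPOOL1"}},
    {"address": "0xPOOL1", "event": "Swap", "args": {"amount": 2}},
    {"address": "0xUNRELATED", "event": "Swap", "args": {"amount": 3}},
]
for log in LOGS:  # logs must be processed in chain order
    tracker.ingest(log)
print(tracker.tracked, tracker.events)
```

Only the swap emitted by the pool after its creation event is ingested; logs from unknown contracts are ignored.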

Alchemy Subgraphs by Satsuma

Alchemy Subgraphs, previously known as Satsuma, is a blockchain indexing platform that offers full support for hosted subgraphs. It targets Web3 engineering teams that rely on subgraphs to build custom APIs efficiently but frequently run into challenges that consume significant engineering time.

To address these issues, Alchemy Subgraphs provides a drop-in experience that allows integration within five minutes. The platform guarantees 99.9% API uptime under its service level agreement and offers up to five times faster indexing and half the data lag of other services. Support is available through a dedicated Telegram or Slack group staffed by a team of engineers. The platform also ships developer tools designed to save engineering time, including auto-pruning for significantly faster queries, advanced metrics for performance insights, flexible subgraph versioning, and direct access to the underlying Postgres database. Upcoming enhancements include GraphQL subscriptions and direct access to low-level decoded events.

Bware Labs Offers Custom Indexing Services

Expanding our product range to include blockchain data indexing services aligns naturally with our commitment to bolstering Web3 development. To enable wider adoption of blockchain technology, it’s essential to equip blockchain developers with tools that streamline their work, enabling them to concentrate exclusively on building innovative dApps for their users.

Bware Labs offers on-demand data indexing services for all EVM-compatible blockchains, both through subgraphs built on The Graph technology and through data dumps produced with our proprietary indexing technology. Although our core business revolves around blockchain APIs, our engineering team has built several solutions that allow us to provide enterprise-level indexing services based on our users’ demand.

We’re geared up to develop bespoke analysis and a custom-fit data strategy for each project’s unique requirements. Our service ensures the provision of the essential infrastructure and corresponding data to enable ongoing development while alleviating the complexities of establishing and managing the blockchain backend infrastructure necessary for data queries.