We live in a data-driven world where every bit of data can be analyzed to derive information that can be leveraged for several use cases. In an IT system, data resides in various layers that can be mined for diverse purposes such as reporting, understanding customer behavior, monitoring key performance indicators, etc.
Blockchain technology is a prime candidate for data analytics and mining as it is the backbone on which P2P (peer-to-peer), B2C (business-to-customer) and/or B2B (business-to-business) transactions take place over a business network.
Does blockchain support analytics?
One of the challenges that early blockchain platforms faced was the inability to support analytics. The underlying transactional data is stored as key-value pairs, and current or historical state can be queried only by the “key”. For example, an asset may be identified by an asset identifier that is stored as the key, and blockchain data can be retrieved only through that key. Querying by the associated metadata, or performing analytics by slicing and dicing the state of the data, cannot be achieved directly. Hence, the ledger data contained in each blockchain node is not conducive to rich querying and analytics functions.
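To make the limitation concrete, here is a minimal Python sketch. It assumes a plain in-memory dictionary as a stand-in for a ledger's key-value state; the asset identifiers, field names and helper functions are hypothetical and are not taken from any specific blockchain platform.

```python
# Hypothetical ledger state: asset identifier (key) -> asset metadata (value).
ledger_state = {
    "asset-001": {"owner": "alice", "type": "bond", "value": 500},
    "asset-002": {"owner": "bob", "type": "equity", "value": 1200},
    "asset-003": {"owner": "alice", "type": "equity", "value": 300},
}


def get_by_key(state, asset_id):
    """Direct lookup by key: the only access path a plain key-value store offers."""
    return state.get(asset_id)


def find_by_owner(state, owner):
    """Metadata query: with no secondary index, every entry has to be scanned."""
    return sorted(key for key, meta in state.items() if meta["owner"] == owner)


asset = get_by_key(ledger_state, "asset-002")        # cheap, keyed access
alice_assets = find_by_owner(ledger_state, "alice")  # full scan over all entries
```

The keyed lookup stays cheap however large the state grows, whereas the owner query has to touch every entry. This is why metadata-based analytics tends to be pushed to a separate store.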
Therefore, solution implementers have relied on non-standard, peripheral techniques of storing the same business transaction data in off-chain repositories for overcoming this shortcoming of blockchains. This effectively means that the same set of data must be maintained on blockchain as well as other off-chain components, which can lead to data inconsistency and integrity issues. In the absence of a standardized mechanism, this approach of creating a data replica can quickly go out of sync as the transaction volume increases. Moreover, the contents of the off-chain database can be altered, resulting in data integrity issues that cannot be prevented proactively.
How can we apply analytics to blockchain?
Blockchain is designed to establish trust through a consensus-based approach to transaction validation. It is not advisable to change its core and replace how it stores transactional data, as doing so can lead to performance and security issues. Therefore, our approach to making this transactional data available for analytics is to create a read-only, immutable twin of the main blockchain node.
Figure 1: Event-driven approach for the twin node
As depicted in Figure 1, the main node (N1) is augmented with a system smart contract that triggers an event whenever ledger data is added. This event-driven approach publishes the transactional data and the associated Merkle proof via a messaging engine.
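The flow described above can be sketched in Python under several simplifying assumptions: a simple binary SHA-256 Merkle tree stands in for whatever proof structure the platform actually uses, an in-process queue stands in for the messaging engine, and all function names are hypothetical. In a real deployment the twin node would presumably compare the root against a trusted block header rather than the root carried in the event itself.

```python
import hashlib
from collections import deque


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    The proof is a list of (sibling_hash, sibling_is_left) pairs, leaf to root.
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_proof(leaf, proof, root):
    """Recompute the path from the leaf to the root and compare."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


message_queue = deque()  # stand-in for the messaging engine (assumption)


def publish_block(transactions):
    """Main node (N1) side: emit one event per transaction with its Merkle proof."""
    for i, tx in enumerate(transactions):
        root, proof = merkle_root_and_proof(transactions, i)
        message_queue.append({"tx": tx, "proof": proof, "root": root})


def consume_events(twin_store):
    """Twin node (TN1) side: append only transactions whose proof verifies."""
    rejected = 0
    while message_queue:
        event = message_queue.popleft()
        if verify_proof(event["tx"], event["proof"], event["root"]):
            twin_store.append(event["tx"])
        else:
            rejected += 1
    return rejected
```

A typical use would be to call `publish_block([b"tx-a", b"tx-b", b"tx-c"])` on the main node side and then `consume_events(store)` on the twin side. An event whose transaction payload was altered in transit fails verification and is not appended.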
The twin node (TN1) has the following characteristics:
The immutable twin node solution provides the following advantages:
Standard and secure approach to blockchain analytics
The immutable twin node approach applies mainly to blockchain platforms that use a key-value store, such as Ethereum and its derivatives. Some permissioned distributed ledger platforms have also tried to resolve these issues by providing an alternative NoSQL or relational data store for the main node, which impacts overall performance. Instead of using bespoke methods to replicate the blockchain state for querying and reporting, the recommended approach is a standard and secure mechanism: an event-driven, credential-based, read-only twin node.
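As an illustration of what rich querying against a read-only replica might look like, the sketch below uses Python's built-in sqlite3 module. It assumes the twin node exposes its verified data through a relational table; the table layout, sample rows and helper names are hypothetical, and SQLite's `PRAGMA query_only` is used here only as a simple stand-in for whatever read-only enforcement a real twin node would apply.

```python
import sqlite3

# Hypothetical twin-node store, populated once from verified ledger events.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets ("
    "asset_id TEXT PRIMARY KEY, owner TEXT, asset_type TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?)",
    [
        ("asset-001", "alice", "bond", 500),
        ("asset-002", "bob", "equity", 1200),
        ("asset-003", "alice", "equity", 300),
    ],
)
conn.commit()

# From here on, this connection rejects statements that modify data.
conn.execute("PRAGMA query_only = ON")


def total_value_by_owner(connection):
    """Aggregate query over metadata, not possible with key-only access."""
    rows = connection.execute(
        "SELECT owner, SUM(value) FROM assets GROUP BY owner ORDER BY owner"
    ).fetchall()
    return dict(rows)


def try_write(connection):
    """Return True if a write succeeds, False if the store rejects it."""
    try:
        connection.execute("UPDATE assets SET value = 0")
        return True
    except sqlite3.OperationalError:
        return False
```

Reporting queries such as `total_value_by_owner(conn)` run against the replica without touching the main node, while `try_write(conn)` shows that analytics users cannot alter the replicated data through this connection.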
Hitarshi Buch
Chief Architect - Blockchain CoE, CTO Office, Wipro
Hitarshi has 20 years of experience in IT architecture, consulting, design and implementation using blockchain, API, SOA, BPM and Java/J2EE technologies. He has experience in IT transformation and modernization initiatives, and enterprise-wide SOA-based solutions. In his current role, he leads the Center of Excellence initiatives as part of the Blockchain practice.