How Blockchain Can Support Analytics

Creating a read-only immutable twin of the main blockchain node is a secure approach to replicating the blockchain state for querying and reporting.

We live in a data-driven world in which almost any data can be analyzed to derive information for a range of use cases. In an IT system, data resides in various layers that can be mined for diverse purposes, such as reporting, understanding customer behavior and monitoring key performance indicators.

Blockchain technology is a prime candidate for data analytics and mining as it is the backbone on which P2P (peer-to-peer), B2C (business-to-customer) and/or B2B (business-to-business) transactions take place over a business network.

Does blockchain support analytics?

One of the challenges early blockchain platforms faced was their inability to support analytics. The underlying transactional data is stored as key-value pairs, so current or historical state can be queried only by its key. For example, an asset may be identified by an asset identifier that is stored as the key, and blockchain data can be retrieved only through that identifier. Querying on the associated metadata, or slicing and dicing the state of the data for analytics, cannot be done directly. Hence, the ledger data held in each blockchain node is not conducive to rich querying and analytics.
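
The access-path limitation can be illustrated with a minimal Python sketch. The ledger state is modeled as a plain dictionary keyed by hypothetical asset identifiers; it stands in for a key-value state store and is not any platform's actual storage API. A lookup by key is direct, whereas a query on metadata such as the owner has no index to use and must read and decode every value.

```python
from __future__ import annotations

import json

# Hypothetical ledger state modeled as a key-value map: asset ID -> serialized record.
ledger_state = {
    "ASSET-001": json.dumps({"owner": "acme", "type": "container", "value": 1200}),
    "ASSET-002": json.dumps({"owner": "globex", "type": "pallet", "value": 300}),
    "ASSET-003": json.dumps({"owner": "acme", "type": "pallet", "value": 450}),
}

def get_by_key(asset_id: str) -> dict:
    """Direct lookup: the only access path a pure key-value ledger offers."""
    return json.loads(ledger_state[asset_id])

def find_by_owner(owner: str) -> list[str]:
    """Metadata query: no secondary index, so every value must be read and decoded."""
    matches = []
    for key, raw in ledger_state.items():
        if json.loads(raw)["owner"] == owner:
            matches.append(key)
    return matches

print(get_by_key("ASSET-002")["owner"])  # globex
print(find_by_owner("acme"))             # ['ASSET-001', 'ASSET-003']
```

On a real ledger, the equivalent of `find_by_owner` means iterating over the entire state, which is the kind of query the key-value layout does not serve well.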

Therefore, solution implementers have worked around this shortcoming with non-standard, peripheral techniques that store the same business transaction data in off-chain repositories. The same data must then be maintained both on the blockchain and in other off-chain components. Without a standardized mechanism, such a replica can quickly fall out of sync as transaction volume increases, leading to data inconsistency. Moreover, the contents of the off-chain database can be altered, and such integrity violations cannot be prevented proactively.

How can we apply analytics to blockchain?

Blockchain is designed to establish trust through a consensus-based approach to transaction validation. Changing its core to replace how it stores transactional data is not advisable, as doing so can lead to performance and security issues. Therefore, our approach to making this transactional data available for analytics is to create a read-only immutable twin of the main blockchain node.

Figure 1: Event-driven approach for the twin node

As depicted in Figure 1, the main node (N1) needs to be augmented with a system smart contract that triggers an event whenever data is added to the ledger. This event-driven approach publishes the transactional data and the associated Merkle proof via a messaging engine.
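
The event emission step can be sketched in Python as follows. This is an illustration under stated assumptions, not the system smart contract itself: `on_block_committed` and the event field names are hypothetical, transactions are hashed as canonical JSON with SHA-256, and a simple binary Merkle tree (duplicating the last node on odd-sized levels) stands in for whatever structure the platform actually uses; Ethereum, for instance, uses Merkle Patricia tries. The `publish` callback represents the messaging engine.

```python
import hashlib
import json

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every level of a binary Merkle tree, from the leaves up to the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last node on odd-sized levels
        levels.append([sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels

def merkle_proof(levels, index):
    """Collect (sibling hash, side) pairs needed to recompute the root from one leaf."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        # An odd last node was paired with itself when the parent level was built.
        sibling_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sibling_hash.hex(), "left" if sibling < index else "right"))
        index //= 2
    return proof

def on_block_committed(transactions, publish):
    """Hypothetical system-contract hook: publish one event per committed transaction."""
    if not transactions:
        return
    leaves = [sha256(json.dumps(tx, sort_keys=True).encode()) for tx in transactions]
    levels = build_levels(leaves)
    root = levels[-1][0]
    for i, tx in enumerate(transactions):
        publish({
            "tx_hash": leaves[i].hex(),
            "payload": tx,
            "merkle_root": root.hex(),
            "merkle_proof": merkle_proof(levels, i),
        })

# Usage: a list's append method stands in for the messaging engine.
events = []
on_block_committed(
    [{"asset": "A1", "to": "acme"}, {"asset": "A2", "to": "globex"}, {"asset": "A3", "to": "acme"}],
    events.append,
)
print(len(events), len(events[0]["merkle_proof"]))  # 3 2
```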

The twin node (TN1) has the following characteristics:

It is an authorized subscriber to the main node on a secured channel
It comprises a read-only data store that holds the state of every transaction committed on the main blockchain node
Each data record is linked to the corresponding transaction hash and to the Merkle proof of the blockchain at that point in time, which can be used to ensure data integrity
State is mirrored by the blockchain runtime on the corresponding main node via a system smart contract, which is the only process that can write entries to the twin node's data store
A publish-subscribe pattern provides reliable, asynchronous replication of blockchain state and allows for multiple twin nodes
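
The replication path can be sketched with an in-process publish-subscribe broker. `MessageBroker` and `TwinNode` are hypothetical names, the allow-list of subscriber IDs is a simplified stand-in for credential-based authorization on a secured channel, and a per-subscriber `queue.Queue` stands in for an external messaging engine. Each twin node exposes only a read-only view of its store, and `drain` is the single code path that writes to it.

```python
import queue
from types import MappingProxyType

class MessageBroker:
    """In-process stand-in for the messaging engine (hypothetical)."""

    def __init__(self, authorized_ids):
        self._authorized = set(authorized_ids)
        self._queues = {}

    def subscribe(self, subscriber_id):
        # Simplified authorization: only allow-listed twin nodes may subscribe.
        if subscriber_id not in self._authorized:
            raise PermissionError(f"{subscriber_id} is not an authorized subscriber")
        self._queues[subscriber_id] = queue.Queue()
        return self._queues[subscriber_id]

    def publish(self, event):
        # Fan out: each subscriber gets the event on its own queue, so the
        # publisher never waits for a twin node to process it.
        for q in self._queues.values():
            q.put(event)

class TwinNode:
    def __init__(self, node_id, broker):
        self._inbox = broker.subscribe(node_id)
        self._store = {}
        # Queries only ever see a read-only view of the store.
        self.records = MappingProxyType(self._store)

    def drain(self):
        """Apply pending replication events; the only code path that writes the store."""
        applied = 0
        while True:
            try:
                event = self._inbox.get_nowait()
            except queue.Empty:
                return applied
            self._store[event["tx_hash"]] = event
            applied += 1

# Usage: two twin nodes replicate the same event independently.
broker = MessageBroker(authorized_ids={"TN1", "TN2"})
tn1, tn2 = TwinNode("TN1", broker), TwinNode("TN2", broker)
broker.publish({"tx_hash": "ab12", "payload": {"asset": "A1"}})
print(tn1.drain(), tn2.drain())  # 1 1
print("ab12" in tn1.records)     # True
```

Because each subscriber has its own queue, publishing does not depend on how quickly any twin node applies events, and adding another twin node is just another `subscribe` call.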

The immutable twin node solution provides the following advantages:

A standardized mechanism, enabled by the blockchain runtime, creates the state replica, thus eliminating data inconsistency and integrity issues
Transactional data is made available in a flat structure so that it can facilitate rich querying and be integrated with any analytics engine for meaningful reporting
Integrity of the twin nodes' data can be easily checked by verifying it against the transaction hash and Merkle proof
As the transaction volume grows, the state storage requirements can be offloaded to the twin node while the main node only stores the transaction hashes
Since this is an optional, configurable feature, not every participating entity has to bear the burden of maintaining a twin node or its infrastructure cost
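
The integrity check described above can be sketched as follows, assuming a simple binary Merkle tree in which transactions are hashed as canonical JSON with SHA-256 and each proof step is recorded as a sibling hash plus the side it sits on. `verify_record` is a hypothetical helper. It first confirms that the stored payload still hashes to the recorded transaction hash, then folds the proof to recompute the Merkle root.

```python
import hashlib
import json

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_record(payload, tx_hash_hex, proof, merkle_root_hex):
    """Check that a twin-node record is unaltered and included under the given root."""
    # Step 1: the stored payload must still hash to the recorded transaction hash.
    leaf = sha256(json.dumps(payload, sort_keys=True).encode())
    if leaf.hex() != tx_hash_hex:
        return False
    # Step 2: folding the sibling hashes from the proof must reproduce the Merkle root.
    node = leaf
    for sibling_hex, side in proof:
        sibling = bytes.fromhex(sibling_hex)
        node = sha256(sibling + node) if side == "left" else sha256(node + sibling)
    return node.hex() == merkle_root_hex

# Usage: a two-transaction block whose root is known from the main node.
tx_a = {"asset": "A1", "to": "acme"}
tx_b = {"asset": "A2", "to": "globex"}
leaf_a = sha256(json.dumps(tx_a, sort_keys=True).encode())
leaf_b = sha256(json.dumps(tx_b, sort_keys=True).encode())
root = sha256(leaf_a + leaf_b).hex()
proof_a = [(leaf_b.hex(), "right")]

print(verify_record(tx_a, leaf_a.hex(), proof_a, root))                      # True
print(verify_record(dict(tx_a, to="mallory"), leaf_a.hex(), proof_a, root))  # False
```

The root used for comparison should come from the main node, not from the twin node being checked; otherwise a tampered store could also supply a matching root.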

Standard and secure approach to blockchain analytics

The immutable twin node approach applies mainly to blockchain platforms built on a key-value store, such as Ethereum and its derivatives. Some permissioned distributed ledger platforms have also tried to resolve these issues by providing an alternate NoSQL or relational data store for the main node, which affects overall performance. Instead of bespoke methods of replicating blockchain state to support querying and reporting, the recommended approach is a standard and secure mechanism: an event-driven, credential-based, read-only twin node.
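
To illustrate what a flat twin-node store enables, the following sketch loads replicated records into an in-memory SQLite table and runs an aggregate query that a key-value ledger cannot answer directly. The table schema and sample rows are hypothetical, and SQLite is only one possible choice of store. SQLite's `PRAGMA query_only` is used here to approximate the read-only analytics path; in a deployment, read-only access would more likely be enforced through credentials.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tx_state (
        tx_hash     TEXT PRIMARY KEY,
        asset_id    TEXT NOT NULL,
        owner       TEXT NOT NULL,
        asset_type  TEXT NOT NULL,
        value       INTEGER NOT NULL,
        merkle_root TEXT NOT NULL
    )"""
)

# Replication path (hypothetical rows): one flat row per replicated transaction.
rows = [
    ("h1", "ASSET-001", "acme", "container", 1200, "r1"),
    ("h2", "ASSET-002", "globex", "pallet", 300, "r1"),
    ("h3", "ASSET-003", "acme", "pallet", 450, "r2"),
]
conn.executemany("INSERT INTO tx_state VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# Analytics path: make the connection query-only so reports cannot change state.
conn.execute("PRAGMA query_only = ON")

report = conn.execute(
    """SELECT owner, COUNT(*) AS assets, SUM(value) AS total_value
       FROM tx_state
       GROUP BY owner
       ORDER BY total_value DESC"""
).fetchall()
print(report)  # [('acme', 2, 1650), ('globex', 1, 300)]

write_blocked = False
try:
    conn.execute("UPDATE tx_state SET value = 0")
except sqlite3.Error:
    write_blocked = True
print("write blocked:", write_blocked)
```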

The diagram above depicts a proposed blockchain monitoring framework, which comprises the following:

A monitoring agent, deployed on each blockchain node and its associated dApp infrastructure, that reads the logs generated during transaction processing and relays CPU, memory and I/O usage data
A log collection engine that handles the streaming log information and assimilates it for further processing
A cluster of elastic nodes, which processes large volumes of log data, organizing and indexing it into documents that are sharded and stored as replicas
A visualization platform that consumes the data collated by the elastic nodes, provides insight into blockchain node and network statistics, and enables stakeholders to perform analytical research and generate reports
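
A monitoring agent's log-handling step might look like the sketch below, which turns a raw node log line into a structured document for the log collection engine. The log line format, field names and resource figures are all hypothetical: real blockchain clients use their own log formats, and an agent would sample CPU, memory and I/O figures from the host rather than receive them as constants.

```python
import json
import re

# Hypothetical log format: "<timestamp> key=value key=value ..."; real node clients differ.
LOG_PATTERN = re.compile(r"^(?P<ts>\S+)\s+(?P<fields>.+)$")
FIELD_PATTERN = re.compile(r"(\w+)=(\S+)")

def to_document(line, resource_sample):
    """Turn one raw log line into a structured document for the log collection engine."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None  # skip lines the agent cannot parse
    doc = {"timestamp": match.group("ts")}
    for key, value in FIELD_PATTERN.findall(match.group("fields")):
        doc[key] = int(value) if value.isdigit() else value
    # Attach the CPU, memory and I/O figures the agent sampled alongside this entry.
    doc["resources"] = resource_sample
    return doc

line = "2024-05-01T10:00:00Z node=N1 event=block_committed block=1042 txs=57"
sample = {"cpu_pct": 41.5, "mem_mb": 2048, "io_read_kb": 512, "io_write_kb": 128}
print(json.dumps(to_document(line, sample)))
```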

Leveraging the proposed monitoring framework will help:

Analyze how blockchain transaction processing and the consensus mechanism utilize the underlying infrastructure resources
Provide end-to-end visibility into a business transaction, from its initiation by a user in the dApp to its capture on the blockchain
Combine and correlate the block- and transaction-related events from each node to determine the performance and throughput of the blockchain network
Set up a non-invasive monitoring solution that can be enabled dynamically for each onboarded peer and that also supports a common network provider model
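
Correlating block events across nodes can be sketched as below. The event records (node, block number, transaction count and a timestamp in seconds) are hypothetical outputs of the per-node agents. The sketch groups events by block to measure propagation delay from the first node to the last, and estimates throughput as the transactions in all blocks after the first, divided by the time elapsed between the first and last block's first appearance.

```python
from collections import defaultdict

# Hypothetical structured events, as produced by per-node monitoring agents.
events = [
    {"node": "N1", "block": 100, "txs": 40, "ts": 10.00},
    {"node": "N2", "block": 100, "txs": 40, "ts": 10.30},
    {"node": "N3", "block": 100, "txs": 40, "ts": 10.45},
    {"node": "N1", "block": 101, "txs": 60, "ts": 12.00},
    {"node": "N2", "block": 101, "txs": 60, "ts": 12.20},
    {"node": "N3", "block": 101, "txs": 60, "ts": 12.25},
]

def correlate(events):
    """Group events by block and compute per-block propagation statistics."""
    by_block = defaultdict(list)
    for e in events:
        by_block[e["block"]].append(e)
    stats = {}
    for block, seen in sorted(by_block.items()):
        times = [e["ts"] for e in seen]
        stats[block] = {
            "txs": seen[0]["txs"],
            "first_seen": min(times),
            "propagation_s": max(times) - min(times),  # first node to last node
            "nodes": len(seen),
        }
    return stats

def throughput_tps(stats):
    """Transactions per second between the first and last block's first-seen times."""
    blocks = sorted(stats)
    if len(blocks) < 2:
        return 0.0
    span = stats[blocks[-1]]["first_seen"] - stats[blocks[0]]["first_seen"]
    # The first block's transactions predate the interval, so only later blocks count.
    txs = sum(stats[b]["txs"] for b in blocks[1:])
    return txs / span if span > 0 else 0.0

stats = correlate(events)
for block, s in stats.items():
    print(block, s["nodes"], round(s["propagation_s"], 2))
print("throughput (tx/s):", throughput_tps(stats))  # 30.0
```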

Conclusion

While there is no dearth of monitoring solutions, how to effectively leverage existing mechanisms for monitoring a blockchain network has not been thought through. The primary reason is that few enterprise use cases on blockchain have yet become production-grade systems. Also, the decentralized nature of blockchain raises a question: is monitoring the whole blockchain network really required?

To maintain, analyze and improve an enterprise blockchain-based solution, a holistic monitoring solution is required. It can further be coupled with DevOps tooling to maximize the uptime of the blockchain network and ensure business continuity.

About the Author

Hitarshi Buch
Lead Architect, CTO Office, Wipro Limited

Hitarshi has 19 years of experience in IT architecture, consulting, design and implementation using blockchain, API, SOA, BPM and Java/J2EE technologies. He has experience in IT transformation and modernization initiatives and has provided enterprise-wide SOA-based solutions. In his current role, as a Lead Architect in Service Transformation at Wipro, he leads the Center of Excellence initiatives as part of the Blockchain practice.