by Marcus Crast, Senior Consultant, 10th Magnitude
If you want to get started with real-time analytics and the Internet of Things (IoT) today, there are two key capabilities you need: the ability to ingest data from your devices and applications, and the ability to do something with that data as it’s being ingested. I’m talking about things like running real-time queries (i.e., stream queries) that enable data visualization. And, if you really want to take advantage of the power of IoT and analytics, you might even want to trigger application logic based on a stream query.
Until now, the on-premises infrastructure required to enable these capabilities was cost-prohibitive for most organizations. Fortunately for everyone eager to get the ball rolling with analytics and IoT, Microsoft has introduced a new combination of cloud services: Event Hubs and Stream Analytics. Azure Event Hubs are part of the Azure Service Bus platform and serve as a single point of ingestion capable of processing millions of messages per second. Azure Stream Analytics sits on top of Event Hubs to perform analytics on data in flight. Pricing for these services is based on consumption and data volume, which means that organizations pay only for the computing resources they use.
Event Hubs follow the well-established publish-subscribe messaging pattern, where senders of messages (publishers) do not communicate directly with the receivers of those messages, but with a middleman or hub (Azure Event Hubs). The final receiver of a message (also called a subscriber) then receives it from the publisher via the middleman. This pattern allows publishers (the multitude of connected and disconnected devices out there) and subscribers (notification engines) to operate without having to be aware of each other. In the complex world of BYOD and proliferating intelligent devices, this publish-subscribe pattern is critical protection against the disconnects that would otherwise occur between message senders and receivers.
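The decoupling this pattern provides can be sketched in a few lines of plain Python. This is an illustrative in-memory hub, not the Event Hubs API; the `Hub` class and its method names are made up for the example:

```python
from collections import defaultdict

class Hub:
    """Illustrative middleman: publishers and subscribers know only the hub."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A subscriber registers interest with the hub, not with any publisher.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # A publisher never references subscribers directly.
        for callback in self._subscribers[topic]:
            callback(message)

hub = Hub()
received = []
hub.subscribe("telemetry", received.append)               # subscriber
hub.publish("telemetry", {"deviceId": "d1", "temp": 72})  # publisher
print(received)  # [{'deviceId': 'd1', 'temp': 72}]
```

Neither side needs to know the other exists, which is exactly what lets thousands of intermittently connected devices keep publishing regardless of who is listening.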
Event Hubs enable real-time analytics by serving as the repository for ingested data in flight, retaining data for a timeframe the user sets based on their needs. Azure Stream Analytics (I’ll get to that next) can query data in flight to enable real-time action. Or, queries can be set up to collect data to be persisted into a repository such as HDInsight or a SQL database for later use.
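The retention behavior can be pictured as a buffer that evicts events older than the retention period. This is purely illustrative, in plain Python; in Event Hubs the retention period is a setting on the hub itself, not something you implement client-side:

```python
import time
from collections import deque

class RetainingBuffer:
    """Illustrative: keep ingested events only for a fixed retention period."""
    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self._events = deque()  # (timestamp, event) pairs, oldest first

    def ingest(self, event, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        self._events.append((now, event))

    def events(self, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        return [e for _, e in self._events]

    def _evict(self, now):
        # Drop anything older than the retention window.
        while self._events and now - self._events[0][0] > self.retention:
            self._events.popleft()

buf = RetainingBuffer(retention_seconds=60)
buf.ingest({"temp": 70}, now=0)
buf.ingest({"temp": 75}, now=50)
print(buf.events(now=100))  # first event aged out -> [{'temp': 75}]
```

Consumers that query within the retention window see the data in flight; anything older is gone unless a query persisted it to longer-term storage.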
Azure Stream Analytics is integrated out-of-the-box with Event Hubs, and actually operates on a different paradigm than most BI practitioners are used to working with. In the traditional analytics world, all data is latent because it first has to be written to a database and then read back out. Stream queries perform analytics on the data before it even gets to the database. In order to do this, we have to make the shift from “running queries” to “turning queries on.” With Stream Analytics, queries run constantly, watching for conditions to be met. The data meets the logic rather than the logic going out to meet the data.
A pipe analogy is a good way to envision how stream queries operate. Imagine the data flowing like water through a pipe (Event Hubs) and the query as a valve in the pipe that performs analytics and drives workflow. As the data passes through the query valve, it’s constantly being monitored for whether it matches the conditions set by the query. If a match is detected, then a specified workflow is triggered—for example, setting off an alarm or sending an alert.
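The valve analogy can be sketched as a standing query: a predicate that is “turned on” once and then evaluated against every event as it flows past, firing a workflow on each match. This is an illustrative Python generator, not Stream Analytics query syntax:

```python
def standing_query(stream, condition, on_match):
    """Evaluate `condition` against every event in flight; fire a workflow on a match."""
    for event in stream:
        if condition(event):
            on_match(event)  # e.g., set off an alarm or send an alert
        yield event          # the data keeps flowing down the pipe regardless

alerts = []
events = [{"temp": 68}, {"temp": 95}, {"temp": 71}]

# "Turn the query on": from here, it watches every event passing through.
monitored = standing_query(iter(events), lambda e: e["temp"] > 90, alerts.append)
passed_through = list(monitored)

print(alerts)               # [{'temp': 95}]
print(len(passed_through))  # 3 -- the valve observes the stream, it doesn't consume it
```

Note the inversion the article describes: the query is defined once and sits in the pipe; the data comes to it.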
Stream Analytics is also capable of comparing multiple data streams or combining stream data with data from LOB applications. As I described in my previous post on Azure Data Lake, pairing different analytic workloads can deliver powerful benefits by combining historical insights with real-time data. One of the most critical features of Azure Stream Analytics is that it is a temporal system, meaning it reasons about the progress of time in ways that would be impossible, or too complex, to implement in traditional systems. Microsoft’s Power BI business intelligence tool offers another avenue for gaining immediate insights from stream queries. Power BI makes it easy to surface data in Azure and can provide real-time visualizations that fit perfectly with the real-time nature of Event Hubs and Stream Analytics.
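The temporal aspect is easiest to see with a windowed aggregate: events carry timestamps, and the system computes results per fixed, non-overlapping slice of time. The Python sketch below loosely mirrors the idea behind Stream Analytics’ tumbling windows, but it is an illustration only, not the Stream Analytics query language:

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds):
    """Group timestamped readings into fixed, non-overlapping windows
    and average each window -- a simple temporal aggregate."""
    windows = defaultdict(list)
    for ts, value in events:
        windows[ts // window_seconds].append(value)
    # Key each result by the window's start time.
    return {w * window_seconds: sum(v) / len(v) for w, v in sorted(windows.items())}

# (timestamp_seconds, temperature) readings
readings = [(0, 70), (10, 74), (35, 90), (40, 94)]
print(tumbling_window_avg(readings, window_seconds=30))
# {0: 72.0, 30: 92.0}
```

Time is part of the query itself here, which is the essential difference from a traditional aggregate that runs once over whatever happens to be in a table.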
What’s a real-world example of how Event Hubs and Stream Analytics could work together? Preventive maintenance is a great one, because almost all companies have to perform maintenance on something—whether it’s a machine or a process—to ensure it’s running in the most efficient way. For example, a building management company for a residential high-rise could use sensors to report suboptimal operating conditions in its HVAC systems in real time. Traditional latent analytics might show that residents are more likely to complain about HVAC problems at night or on the weekends, when they spend the most time in their apartments. However, calling out an HVAC repair person during non-business hours usually comes with a hefty surcharge. Reporting problems in real time would ensure that issues are fixed as soon as possible without requiring the management company to pay a premium for evening or weekend service.
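The HVAC scenario reduces to a standing query over sensor readings: flag each suboptimal condition the moment it arrives, instead of discovering it in a complaint report later. The sketch below is illustrative only; the sensor fields and comfort thresholds are invented for the example:

```python
def hvac_monitor(readings, max_temp=78, min_temp=65):
    """Yield an alert for each reading outside the comfort band --
    the real-time alternative to learning about problems after the fact."""
    for r in readings:
        if not (min_temp <= r["temp"] <= max_temp):
            yield {"unit": r["unit"], "temp": r["temp"], "action": "dispatch technician"}

stream = [
    {"unit": "12F", "temp": 71},
    {"unit": "8C",  "temp": 84},  # suboptimal: flagged as soon as it arrives
    {"unit": "3A",  "temp": 70},
]
alerts = list(hvac_monitor(stream))
print(alerts)  # [{'unit': '8C', 'temp': 84, 'action': 'dispatch technician'}]
```

In production, the readings would arrive through Event Hubs and the condition would live in a Stream Analytics query, but the logic is exactly this simple: a condition watched continuously, triggering a workflow during business hours instead of a weekend emergency call.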
The pre-integrated nature of Event Hubs and Stream Analytics (along with Power BI) makes it easy to launch a first IoT proof of concept and validate potential use cases like the one I just described. These new Azure services go a long way toward making IoT implementations a real-world, near-term possibility for organizations of all sizes.