If you’ve been following the cloud buzz, you’ve inevitably heard of IOT, or the Internet of Things. At a high level, the concept is simple: connect a bunch of devices that weren’t previously accessible and use their data to inform other devices or trigger some action. A couple of simple examples might include sensors on doors, vending machines or robots. Where a repair person once serviced a motor on a fixed schedule, sensor data could now tell that person to act sooner (or postpone) based on what the data is saying.
I wanted to embark on my own IOT journey. Reasons? 1) to better understand it and 2) to have something of ‘my own’ that I could demonstrate to clients. To do that, I knew I would need a few things first:
- A good use case or hypothesis that people could relate to
- A way to capture a significant scale of data
- Some time to learn and probably write a little bit of code
A Good Use Case
I wasn’t concerned with coming up with a good use case. Usually ideas are the easy part. Instead, I was concerned I wouldn’t be able to generate enough good data that resembled my use case. Data is easy to come by. You can download it off the internet, write a simple data generator app that creates millions of records, etc. That’s not hard. However, good data takes effort.
After a couple of days of ideation, I landed on the concept of refrigeration. My thinking: the operating temperature of a refrigerator in a warehouse is critical to preventing loss of inventory or future sales. Assuming that proactive maintenance today is primarily scheduled by length of operation, I thought the scenario would make a good framework. But how would I get refrigerator data?
Capture a Significant Scale of Data
Once I landed on the use case idea, the next step was getting the data. I knew I could write a tool to generate some data for me, but what I really wanted was some variability, some real-world anomalies. So, I decided to write a small app that captures the CPU% and Available RAM of the computer it runs on and uses those values to mimic sensor data. CPU as a percentage (e.g. 27%) could resemble temperature (e.g. 27 F), and Available RAM (usually 1-2GB) multiplied by 1000 looks a lot like an RPM value for a motor (e.g. 1450 RPM). Success!
Note: Through a couple of iterations of MVP testing, I realized people actually wanted to see the computer data I was collecting. So, I didn’t do the conversions for the refrigeration use case and left the values as they were captured. I didn’t change my overall use case and explanation above, as the demo could easily be converted to tell that story, and I thought the MVP story was a relevant backdrop.
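To make the mapping concrete, here is a minimal sketch of how one sample could be packaged as a JSON reading. It is not the app I wrote: the function name, field names and device id are assumptions for illustration, and the CPU and RAM values are passed in as plain numbers (in practice they would come from whatever you use to read them, such as the third-party psutil package).

```python
import json
import time


def build_reading(device_id, cpu_percent, available_ram_gb, as_refrigerator=False):
    """Package one CPU/RAM sample as a JSON-ready sensor reading.

    All field names here are hypothetical; pick whatever your pipeline expects.
    """
    reading = {
        "deviceId": device_id,
        "timestamp": time.time(),
        "cpuPercent": cpu_percent,
        "availableRamGb": available_ram_gb,
    }
    if as_refrigerator:
        # CPU % stands in for temperature in degrees F (27% -> 27 F).
        reading["temperatureF"] = cpu_percent
        # Available RAM in GB times 1000 stands in for motor RPM (1.45 GB -> 1450 RPM).
        reading["motorRpm"] = round(available_ram_gb * 1000)
    return reading


sample = build_reading("fridge-01", cpu_percent=27.0, available_ram_gb=1.45,
                       as_refrigerator=True)
print(json.dumps(sample))
```

Leaving `as_refrigerator` off gives the raw computer values, which is the form the demo ended up using.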
Time to Learn
Next was the daunting part for me. I knew nothing about IOT or its architecture, or even how I was going to save this data to some “database” in the cloud. A database? There’s no way that would scale to thousands of endpoints and millions of transactions per second. Surprisingly, the process was NOT daunting at all. In fact, the whole experiment took me about 4-6 hours to research AND implement. I was very impressed by how easy and fast it was to provision. I am dubbing this experience the “PaaS Effect”: the feeling associated with realizing how awesome working with Platform-as-a-Service (PaaS) is compared with IaaS.
So, here’s what I learned (in a non-comprehensive summary of IOT): there are four major architectural components in my IOT solution:
Data Ingestion -> Reading / Real-time Streaming -> Longer-term Storage -> Analytics
There is no way an IOT solution could cost-effectively and elastically scale to thousands or millions of endpoints if the data were written straight to a relational database. Could it work? Absolutely. Throw enough money and people at anything and it can “work”. However, that time and money add no value to the solution when platform services can do it all for me. Azure provides these capabilities through what it calls Service Bus.
A Service Bus is a way for services inside and outside of Azure to talk to one another. A Service Bus essentially defines a namespace that data can move through. Today, the Service Bus category includes these services: Queues, Topics, Relays, Event Hubs and Notification Hubs. An example of data flowing out is Notification Hubs: think of push notifications to your phone. The service can scale to millions of devices at an insanely low cost. An example of data flowing in is Event Hubs.
I won’t go into detail about the other services as Event Hubs (designed for IOT scenarios) is exactly what we need. Here is a great overview of the different components within Azure Service Bus.
Azure Event Hubs are a type of Service Bus component specifically designed for gathering data from external services like websites, apps, sensors, etc. They work by creating Partitions of data streams that are accessible via a single endpoint exposed to the internet. You can create a configurable number of Partitions, based on the scale and architecture of your solution, but the key is that they are all exposed by the same physical API endpoint.
How Event Hubs Work
Publishers (entities or applications that write data to an Event Hub) send data to the endpoint via HTTP or AMQP. That data is automatically routed to a Partition. Think of a Partition as a ‘commit log’: new data is appended to the end, and the service spreads writes across the Partitions so the Event Hub can accommodate millions of writes. The Partition then holds the data for reading until it is purged automatically by the system. It’s not recommended to write directly to a specific Partition; instead, let the service choose one for you.
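The ‘commit log’ idea can be sketched with a toy, in-memory model. This is not the Event Hubs API: `ToyEventHub` is a made-up class, and round-robin placement is an assumption I picked to keep the example short, not a statement about how the real service chooses a Partition.

```python
import itertools


class ToyEventHub:
    """In-memory stand-in for an Event Hub: one entry point, N append-only partitions."""

    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]
        # Round-robin is an assumed placement policy for this sketch only.
        self._next_partition = itertools.cycle(range(partition_count))

    def send(self, event):
        """Single 'endpoint': the hub, not the publisher, picks the partition."""
        index = next(self._next_partition)
        self.partitions[index].append(event)  # append-only, like a commit log
        return index


hub = ToyEventHub(partition_count=4)
placements = [hub.send({"seq": n}) for n in range(8)]
print(placements)                         # [0, 1, 2, 3, 0, 1, 2, 3]
print([len(p) for p in hub.partitions])   # [2, 2, 2, 2]
```

The publisher only ever calls `send`; it never names a partition, which mirrors the recommendation above.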
Consumer Groups are how applications or services read data from an Event Hub; each one is a separate view of the stream. One or more Consumer Groups can read from the Partitions as data is queued. When data is read, a ‘position offset’ marking the last record is kept for that specific Consumer Group so the same records are not pulled again on the next read (each read therefore returns only new records).
Each Consumer Group keeps a record of its own ‘position offset’; therefore, it is recommended to have only one “application” per Consumer Group reading data at a time. For example, if one cloud service is writing the data from a Partition to some long-term storage (Blob, Database, etc.), you would want a second Consumer Group for a second app that is doing real-time analytics.
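Here is a small sketch of why separate Consumer Groups matter. Again, this is a toy model rather than the real service: `ToyPartition` and the group names are hypothetical, and how the real service persists offsets is a detail this sketch ignores.

```python
class ToyPartition:
    """Append-only log that remembers a read position per consumer group."""

    def __init__(self):
        self.log = []
        self.offsets = {}  # consumer group name -> index of next unread event

    def append(self, event):
        self.log.append(event)

    def read(self, consumer_group):
        """Return events this group has not seen yet, then advance its offset."""
        start = self.offsets.get(consumer_group, 0)
        events = self.log[start:]
        self.offsets[consumer_group] = len(self.log)
        return events


partition = ToyPartition()
partition.append("r1")
partition.append("r2")

archived = partition.read("storage-writer")        # ['r1', 'r2']
partition.append("r3")
archived_next = partition.read("storage-writer")   # ['r3']
analyzed = partition.read("realtime-analytics")    # ['r1', 'r2', 'r3']
print(archived, archived_next, analyzed)
```

Because each group has its own offset, the analytics reader still sees every record even though the storage writer has already consumed them. Two apps sharing one group would instead split the records between them.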
More details on Azure Event Hubs can be found here.
Reading Test and Storage
While not required for the production version of this solution, I wanted to write a test application to read data directly out of Event Hubs. For this, I needed a dedicated Consumer Group. Since Event Hubs only hold the ingested data temporarily, I also needed a place to store the (JSON) files. I chose Blob storage, which is a very cost-effective way to store data, and opted for Locally Redundant Storage, which keeps 3 copies of the data in the same data center. This can be scaled up to regional or global replication for greater fault tolerance if required.
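The storage half of that test reader can be sketched as below. This does not call Azure at all: a temporary local directory stands in for the Blob container, and the `write_batch` helper and the `batch-000001.json` naming scheme are assumptions for illustration.

```python
import json
import os
import tempfile


def write_batch(events, directory, batch_number):
    """Write one batch of events as a JSON file; the directory stands in for a Blob container."""
    path = os.path.join(directory, f"batch-{batch_number:06d}.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(events, handle)
    return path


container = tempfile.mkdtemp()  # local stand-in, not Azure Blob storage
events = [{"cpuPercent": 27.0, "availableRamGb": 1.45}]
path = write_batch(events, container, batch_number=1)

# Read the file back to confirm the batch round-trips as JSON.
with open(path, encoding="utf-8") as handle:
    print(json.load(handle))
```

Writing one file per batch rather than one per event keeps the number of stored objects manageable, though the right batch size depends on your volume.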
In the later parts of this series, I graduate from this “test” read application and use real-time Stream Analytics to read the data and move it to longer-term storage (resembling a common production architecture).
Instead of duplicating already great content, here is a link to an excellent step-by-step tutorial on how to configure Event Hubs for yourself: Getting started with Event Hubs
In Internet of Things – Part II, I will go over streaming the data in real time to longer-term storage, along with my configuration for analytics and visualization.