Ten Lessons From a Large IoT Project

Written by Liam McLennan

Lesson 1: Big data tools don’t solve the intractable problems of distributed systems; they leak that complexity to the consumer.

Lesson 2: Peak performance often relies on an even distribution of load across partitions. Hashing partition keys can help. Try SHA-256.
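
A minimal sketch of that idea in Python, assuming a fixed partition count of 32 and illustrative device IDs as keys:

    import hashlib

    def assign_partition(key: str, partition_count: int = 32) -> int:
        # SHA-256 spreads similar keys (e.g. sequential device IDs) evenly,
        # so no single partition becomes a hot spot.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % partition_count

    # The same key always maps to the same partition, so per-device ordering holds.
    assert assign_partition("device-0001") == assign_partition("device-0001")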

Lesson 3: To prevent denial-of-service attacks, public cloud components like Azure Event Hubs throw exceptions if you exceed your quota. The client must take responsibility for throttling.
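
A minimal sketch of client-side throttling with exponential backoff; send_batch and QuotaExceededError are hypothetical stand-ins for whatever send call and throttling exception your SDK exposes:

    import random
    import time

    class QuotaExceededError(Exception):
        """Stand-in for the exception your client raises when it is throttled."""

    def send_with_backoff(send_batch, batch, max_attempts=5):
        for attempt in range(max_attempts):
            try:
                return send_batch(batch)
            except QuotaExceededError:
                # Back off exponentially, with jitter so retries don't arrive in lockstep.
                time.sleep((2 ** attempt) + random.uniform(0, 1))
        raise RuntimeError("batch not sent within the retry budget")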

Lesson 4: Azure WebJobs are brilliant. They provide a way to run a console application as a reliable service that can be scaled horizontally. WebJobs are simple and they work.
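
A continuous WebJob is essentially a long-running console program that the host keeps alive; a minimal sketch of that shape, with process_pending_work standing in for your real work:

    import time

    def process_pending_work():
        pass  # pull messages, process them, write results

    if __name__ == "__main__":
        # Loop forever; if the process exits or crashes, the host restarts it.
        while True:
            process_pending_work()
            time.sleep(1)  # avoid a hot loop when there is nothing to do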

Lesson 5: Distributed logs are a useful data structure for distributed systems.
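
The essential shape is an append-only sequence of records addressed by offset, with each consumer tracking its own position; a minimal in-memory sketch (names illustrative):

    class AppendOnlyLog:
        def __init__(self):
            self._records = []

        def append(self, record) -> int:
            self._records.append(record)
            return len(self._records) - 1  # the record's offset

        def read_from(self, offset: int) -> list:
            # Consumers can re-read from any offset, which makes
            # recovery and reprocessing straightforward.
            return self._records[offset:]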

Lesson 6: To understand what is happening in a system you need centralised logging with a correlation scheme, so that related events can be traced across components.
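
A minimal sketch of a correlation scheme: stamp every log entry produced while handling one unit of work with the same correlation id, then query the central store by that id. The field names here are illustrative:

    import json
    import logging
    import uuid

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("ingest")

    def handle_message(payload, correlation_id=None):
        # Reuse an upstream correlation id if one was passed along; otherwise mint one.
        correlation_id = correlation_id or str(uuid.uuid4())
        logger.info(json.dumps({"correlation_id": correlation_id, "event": "received"}))
        # ... do the work, passing correlation_id to every downstream call ...
        logger.info(json.dumps({"correlation_id": correlation_id, "event": "processed"}))
        return correlation_id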

Lesson 7: As the numbers get larger we lose the ability to reason about them intuitively. 50, 500, 5,000 and 50,000 requests per second are very different workloads. Those differences matter.

Lesson 8: Very large (petabyte-scale) datasets drastically reduce your data storage options.

Lesson 9: Consistency in the absence of transactions is hard. You will have to think about it.

Lesson 10: All actions must be idempotent. Work in batches. If an error occurs part-way through a batch, idempotency allows the whole batch to be replayed safely.
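
A minimal sketch of the pattern, assuming every event carries a unique id and writes are keyed by that id (the in-memory dict stands in for a real keyed store):

    processed = {}  # event_id -> value, standing in for a keyed data store

    def apply_event(event):
        # Upsert keyed by event id: applying the same event twice
        # has exactly the same effect as applying it once.
        processed[event["id"]] = event["value"]

    def process_batch(batch):
        for event in batch:
            apply_event(event)

    batch = [{"id": "e1", "value": 10}, {"id": "e2", "value": 20}]
    process_batch(batch)
    process_batch(batch)  # replaying after a failure changes nothing
    assert processed == {"e1": 10, "e2": 20}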

Lesson 11: Tuning batch size substantially affects performance. Large batches with low error rates are the sweet spot; higher error rates require smaller batches, because an error forces the whole batch to be replayed.
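
A rough way to see why, assuming (simplistically) that each record fails independently with probability p and that any failure forces the whole batch to be replayed:

    def expected_attempts_per_batch(batch_size: int, error_rate: float) -> float:
        # A batch of n records succeeds only if every record succeeds,
        # so the expected number of sends per committed batch is 1 / (1 - p) ** n.
        return 1 / (1 - error_rate) ** batch_size

    for n in (10, 100, 1000):
        print(n, round(expected_attempts_per_batch(n, 0.001), 2))
    # At a 0.1% error rate: about 1.01 attempts for batches of 10, 1.11 for 100,
    # and 2.72 for 1000; larger batches amplify the cost of each error.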