Abstract
Serverless computing, notably Function-as-a-Service (FaaS), is an appealing
execution model for Internet of Things (IoT) applications because of its
scalability and pay-as-you-go pricing. Like any paradigm shift, however, it
brings its own challenges. This paper examines the impact of serverless
computing on IoT application development, treating "cold start" latency and
state management as the primary challenges. The analysis documents the
performance degradation suffered by latency-sensitive IoT systems and traces
it to the scale-to-zero principle underlying serverless economics. In
addition, the inherent statelessness of FaaS complicates the design of
stateful IoT applications. The paper focuses on mitigation techniques and
architectural patterns used to address these challenges, and analyzes
real-world case studies to demonstrate that the model is effective with
careful implementation.
Keywords: Serverless Computing,
Function-as-a-Service (FaaS), Internet of Things (IoT), Cold Start Latency,
State Management, Cloud Computing, Edge Computing, Microservices
1. Introduction
A. The serverless paradigm: A shift in cloud execution models
Serverless computing marks a significant step in the evolution of
cloud execution models. Centered on the FaaS model, applications are
decomposed into small, independent units called functions, which the cloud
provider activates in response to designated triggers. Serverless inherits key
features from the paradigms that preceded it, with IaaS and PaaS being its
closest ancestors, but it is distinguished by four characteristics.
First, execution is fundamentally event-driven. Functions are
triggered by events from across the system: an HTTP request via an API
gateway, a database update, a queue message or a data point received from an
Internet of Things (IoT) sensor. Second, functions are stateless and
ephemeral. They run in short-lived, isolated containers or micro virtual
machines and are expected to retain no state between invocations. Third, the
platform provides automatic, seamless scaling. When many events arrive
simultaneously, the platform adjusts capacity with no operator intervention,
from zero to thousands of concurrent function instances and back. Finally, the
model is supported by a very fine-grained, pay-per-use billing system. Users
are charged only while functions execute, metered in milliseconds of memory
and compute time, with no cost for idle resources. This "deliver on demand,
never pay for idle" policy is the primary economic reason for the model's
widespread adoption.
B. Architectural requirements of the internet of things (IoT)
The Internet of Things is characterized by a widespread and
increasingly complex network of connected devices, currently estimated at
around 75 billion and expected to grow significantly. The devices forming this
ecosystem produce vast amounts of raw data, which must be managed by advanced,
flexible and scalable computational architectures. The distinctive data and
computation workload patterns of IoT shape the architectural requirements of
IoT applications.
IoT workloads are often characterized by their traffic patterns.
Devices may transmit data at irregular intervals, for example when a sensor
triggers an alarm or a smart device uploads data periodically. The result is
sparse, bursty traffic that aligns poorly with architectures built around
always-on servers. Many IoT systems also have strict latency requirements:
critical applications such as industrial automation, transportation systems
and monitoring devices demand real-time data processing and immediate
responses, which is vital for both their effectiveness and their safety.
Finally, the architecture must absorb data inflow from billions of devices
simultaneously, scaling easily and cost-effectively to meet constantly
changing demand.
C. The synergy and conflict of serverless and IoT
At first glance, the serverless paradigm appears well suited to IoT
needs. The event-driven Function-as-a-Service (FaaS) model matches the
event-based data generation of IoT devices, and auto-scaling readily absorbs
bursty, unpredictable data flows. The economics are attractive as well: an
organization pays only for the computation that handles its sporadic IoT
events and avoids the cost of servers sitting idle.
A closer investigation, however, reveals a set of underlying
conflicts between serverless computing and certain IoT applications, stemming
from two distinct architectural disconnects. First, the 'scale-to-zero'
feature at the heart of the serverless cost model is the primary cause of the
so-called 'cold start' problem: function calls triggered after a period of
dormancy experience high, unpredictable latencies that are unacceptable for
real-time IoT applications. Second, the stateless design of
'Function-as-a-Service' (FaaS) functions conflicts with the needs of IoT
applications, which often must track device state, user sessions and
historical data. Developers are consequently forced to bolt on external state
management machinery, contradicting the simplicity that serverless computing
promises. This paper focuses on the practical severity of these issues for
IoT application development, then explores and analyzes existing architectural
frameworks and patterns aimed at easing them.
2. Core Research Challenges in Serverless
IoT Architectures
A. Cold start delays in latency-sensitive IoT workflows
A cold start is a specific, well-defined source of latency. It
occurs the first time a serverless function is triggered, or when the function
has not been invoked for some time, and it is a normal consequence of the
ability to "scale to zero," a feature crucial to the serverless business
model. When a request arrives and no warm execution environment is available,
the platform must perform several time-consuming steps before the function can
run: provisioning a new container, fetching the function code and its
dependencies, initializing the language runtime and finally executing the
function. Research shows that cold start latency can range from a few
milliseconds to several seconds depending on the function and its runtime,
often exceeding the function's actual execution time. In complex designs that
chain multiple function invocations, cascading cold starts can inflate
end-to-end response times; in extreme cases cold-start overhead accounts for
more than 90% of the total response time.
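The lifecycle above can be illustrated with a toy simulator; the per-step penalties and function name are illustrative values, not measurements from any platform:

```python
# Illustrative, not measured: fixed penalties (ms) for each cold-start step.
COLD_START_STEPS_MS = {
    "provision_container": 150,
    "fetch_code_and_dependencies": 80,
    "initialize_runtime": 300,
}

warm_pool = set()  # function names that currently have a warm environment

def invoke(function_name, exec_ms):
    """Return total latency (ms): cold-start overhead, if any, plus execution."""
    overhead = 0
    if function_name not in warm_pool:
        overhead = sum(COLD_START_STEPS_MS.values())  # pay the full setup cost
        warm_pool.add(function_name)                  # environment stays warm
    return overhead + exec_ms

cold = invoke("ingest-sensor-reading", exec_ms=20)  # cold path: overhead dominates
warm = invoke("ingest-sensor-reading", exec_ms=20)  # warm path: execution only
```

Under these toy numbers the cold invocation spends over 90% of its total latency on setup, echoing the worst-case figure reported above.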
This creates a paradox for serverless IoT. The model's most
valuable property for IoT, paying only for active computation on sporadic
data, depends on scaling to zero. Yet sporadic IoT traffic rarely sustains
enough load to keep execution environments warm, so functions constantly
transition between idle and active. The very feature that makes serverless
most cost-effective for IoT thus makes cold starts a significant and recurring
disadvantage. For some IoT applications, the added latency, which can be
unpredictable and sometimes substantial, renders the system ineffective: smart
health monitoring, connected vehicles and industrial process control systems
suffer greatly from it. This paradox complicates efficient resource
allocation, as the conflicting economic and performance characteristics of the
serverless model weigh on real-world IoT deployments.
B. The state management dilemma for stateful IoT systems
FaaS platforms rely on the assumption of statelessness: each
function invocation is treated as an independent, isolated transaction,
executed as if previous invocations had never occurred. This design choice
simplifies resource management and scaling for the cloud provider. However,
the most important IoT applications are inherently stateful. They must monitor
the state of a device (for example, the temperature setting on a thermostat or
the lock status on a door), retain user session context (such as smart home
configurations) or analyze data streams against historical data.
This architectural mismatch forces developers to externalize all
application state to a persistence layer such as a NoSQL database, an object
store or a distributed cache. While pragmatic, this approach introduces
significant architectural complexity and performance cost. Every function
invocation that reads or writes state incurs a network round-trip to an
external service, adding latency on the critical path. Additionally,
developers must manage the scaling, cost and security of the external state
store, which partly contradicts the 'zero-ops' promise of serverless
computing.
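The externalization pattern can be sketched as follows. The in-memory `DeviceShadowStore` class is a hypothetical stand-in for a managed NoSQL table; in a real deployment each `get` and `put` in the handler is a network round-trip on the critical path:

```python
class DeviceShadowStore:
    """Stand-in for an external key-value store (e.g., a managed NoSQL table)."""
    def __init__(self):
        self._items = {}

    def get(self, device_id):
        # In a real system this is a network round-trip.
        return self._items.get(device_id, {})

    def put(self, device_id, state):
        # So is this.
        self._items[device_id] = state

store = DeviceShadowStore()

def handle_event(event):
    """Stateless handler: all state lives in the external store."""
    device_id = event["device_id"]
    shadow = store.get(device_id)                     # round-trip 1: read state
    shadow["last_temperature"] = event["temperature"]
    shadow["reading_count"] = shadow.get("reading_count", 0) + 1
    store.put(device_id, shadow)                      # round-trip 2: write state
    return shadow

handle_event({"device_id": "thermostat-1", "temperature": 21.5})
result = handle_event({"device_id": "thermostat-1", "temperature": 22.0})
```

Note that even this minimal handler pays two round-trips per invocation; richer state logic multiplies that cost.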
Externalizing state in this way also recentralizes it: while
compute is decomposed into fine-grained functions, the application's state
converges on an external data monolith. This reintroduces the very problems,
shared network latency and a potential single point of failure, that the
microservices decomposition was meant to avoid. Adding serverless functions at
the network edge further compounds the challenge. Among the unresolved issues
are maintaining consistent state across multiple self-governing, loosely
connected edge nodes serving mobile IoT devices, ensuring reliable data
replication and managing relocated state.
C. Trade-off between performance and cost
On commercial serverless platforms, resource allocation is managed
through a single dial: the memory setting. The amount of allocated memory also
determines the CPU share and network bandwidth assigned to the function. This
creates a complex optimization challenge for developers, often called
right-sizing the function. Under-provisioned memory can lead to longer
execution times due to CPU starvation, which paradoxically increases cost if
the extended duration outweighs the cheaper per-millisecond rate. Conversely,
over-provisioned memory is wasted spend, since the function pays for CPU
capacity it does not use. This forces developers to balance cost against
performance for each function within an application.
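The right-sizing trade-off can be made concrete with a toy cost model; the GB-second price and the workload profile (doubling memory halves duration until a floor is reached) are illustrative assumptions, not any provider's published figures:

```python
# Illustrative price; real per-GB-second rates vary by provider and region.
PRICE_PER_GB_SECOND = 0.0000167

def cost_per_invocation(memory_mb, dur_ms):
    """Billed cost: memory (GB) x duration (s) x unit price."""
    return (memory_mb / 1024) * (dur_ms / 1000) * PRICE_PER_GB_SECOND

def duration_ms(memory_mb, base_ms=800, floor_ms=100):
    """Toy workload: doubling memory (and thus CPU) halves execution time,
    down to a floor where the function is no longer CPU-bound."""
    return max(floor_ms, base_ms * 128 / memory_mb)

configs = [128, 256, 512, 1024, 2048]
costs = {mb: cost_per_invocation(mb, duration_ms(mb)) for mb in configs}
```

Under this model, every setting up to 1024 MB costs the same per invocation (less memory is exactly offset by longer duration), so 1024 MB is strictly better: equal cost, lowest latency. Going to 2048 MB past the CPU-bound floor doubles cost with no latency gain.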
This trade-off extends to strategic decisions at a higher level of
the architecture, especially function granularity. Developers can either build
an application as many small, single-purpose functions or combine related
tasks into fewer, larger functions.
Single-purpose functions: This approach follows microservices principles, offering greater
modularity, easier maintenance and independent scaling. However, it can
increase end-to-end latency due to network delays between function calls and
the risk of cascading cold starts. It may also incur direct orchestration
costs, such as per-state-transition charges in AWS Step Functions.
Function fusion: Combining multiple steps into a single function lowers
inter-function latency and orchestration costs. The approach sacrifices
modularity and can waste resources, because the fused function must be
provisioned with the memory and CPU required by its most computationally
intensive step, even while executing less demanding ones.
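The two layouts can be compared with a toy model; the per-hop overhead, step durations and memory figures below are illustrative assumptions:

```python
# Illustrative: per-hop overhead covers network transfer plus the
# risk-weighted cost of a cascading cold start between functions.
HOP_OVERHEAD_MS = 60
STEP_MS = [30, 30, 30]            # e.g., validate, transform, persist
STEP_MEMORY_MB = [128, 1024, 128]  # only the middle step is memory-hungry

def chained_latency(steps_ms, hop_ms=HOP_OVERHEAD_MS):
    """Single-purpose functions: one hop between each pair of steps."""
    return sum(steps_ms) + hop_ms * (len(steps_ms) - 1)

def fused_latency(steps_ms):
    """Function fusion: all steps run in one invocation, no hops."""
    return sum(steps_ms)

def chained_gb_seconds(steps_ms, memories_mb):
    """Each single-purpose function bills only its own memory need."""
    return sum((mb / 1024) * (ms / 1000)
               for mb, ms in zip(memories_mb, steps_ms))

def fused_gb_seconds(steps_ms, memories_mb):
    """Fusion bills the peak memory requirement for the entire duration."""
    return (max(memories_mb) / 1024) * (sum(steps_ms) / 1000)

chained_ms = chained_latency(STEP_MS)
fused_ms = fused_latency(STEP_MS)
chained_gbs = chained_gb_seconds(STEP_MS, STEP_MEMORY_MB)
fused_gbs = fused_gb_seconds(STEP_MS, STEP_MEMORY_MB)
```

Under these numbers the fused layout is faster (no hops) but bills more GB-seconds, because the peak memory requirement is charged for the whole duration; the chained layout shows the opposite profile.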
In large-scale IoT deployments, such resource-management choices
become even more critical. A slight per-invocation inefficiency or a subpar
architectural decision can produce significant and unexpected cost overruns
when multiplied across a billion monthly invocations from a fleet of IoT
devices.
Table 1: Cold start mitigation techniques

| Technique | Goal | Advantages | Trade-offs |
| --- | --- | --- | --- |
| Provisioned Concurrency | Reduce frequency | Eliminates cold starts for configured capacity. | Higher cost (billed for idle time). |
| Function Warming (Pinging) | Reduce frequency | Low cost; simple to implement. | Not guaranteed; platform may reclaim resources. |
| Predictive Pre-warming | Reduce frequency | Balances cost and performance; adapts to traffic patterns. | High complexity; requires historical data. |
| Code Optimization | Reduce duration | Reduces cost and latency for all cold starts; good practice. | Requires developer discipline. |
| Runtime Selection | Reduce duration | Simple decision with significant impact on startup time. | May conflict with team skills. |
| Environment Snapshotting | Reduce duration | Drastically reduces initialization time for supported runtimes (e.g., Java). | Limited to specific runtimes. |
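The function-warming technique in the table can be sketched as a handler that short-circuits on scheduled keep-warm pings; the `"source": "warmer"` event shape is an assumption for illustration, since real schedulers deliver their own payload formats:

```python
def handler(event):
    """FaaS entry point that recognizes keep-warm pings."""
    if event.get("source") == "warmer":
        # A scheduled ping: do no real work and return immediately, so the
        # execution environment stays resident at minimal billed duration.
        return {"warmed": True}
    # Real IoT event path.
    reading = float(event["payload"]["value"])
    return {"warmed": False, "processed": reading}

ping_result = handler({"source": "warmer"})
real_result = handler({"source": "device", "payload": {"value": "21.5"}})
```

As the table notes, this keeps the environment warm only on a best-effort basis; the platform may still reclaim it between pings.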
Figure 1: Illustration of how the Saga pattern implements an order-processing
system using AWS Step Functions.
c) Stateful serverless frameworks: Newer
research concentrates on integrating state natively into the serverless model
rather than merely externalizing it. The class of Serverless State Management
Systems (SSMS) aims to ease the challenges of stateful scaling and fault
tolerance. These systems frequently adopt an actor-like programming model:
developers write stateful message handlers, and the platform manages the actor
state, checkpointing it to persistent storage for recovery.
Numerous frameworks reflect this trend. Cloudburst introduced the
principle of logical disaggregation with physical colocation: it uses a
persistent, scalable, auto-scaling key-value store while co-locating mutable
caches with function executors for low-latency access to frequently used
data.
Similarly, Crucial builds on the insight that FaaS amounts to
concurrent programming at datacenter scale and provides a distributed shared
memory for fine-grained state management and synchronization, allowing
traditional multi-threaded programs to be ported to serverless architectures
with minimal code changes. Other frameworks, such as Enoki and Faasm, target
stateful serverless challenges at the network edge, addressing client mobility
and resource scarcity in distributed IoT systems.
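The actor-like model these frameworks share can be sketched as follows. The API is illustrative rather than any specific framework's, and the `checkpoints` dict stands in for the platform's persistent checkpoint store:

```python
checkpoints = {}  # stand-in for the platform's persistent checkpoint store

class DeviceActor:
    """Stateful message handler: one actor per device. The platform
    checkpoints state so a crashed instance can be recovered."""
    def __init__(self, device_id):
        self.device_id = device_id
        # Recover from the last checkpoint, if one exists.
        self.state = checkpoints.get(device_id, {"count": 0, "max": None})

    def on_message(self, reading):
        self.state["count"] += 1
        if self.state["max"] is None or reading > self.state["max"]:
            self.state["max"] = reading
        # Checkpoint after each message so no update can be lost.
        checkpoints[self.device_id] = dict(self.state)
        return self.state

actor = DeviceActor("sensor-7")
actor.on_message(18.0)
actor.on_message(22.5)

# Simulated crash: a fresh instance recovers state from the checkpoint.
recovered = DeviceActor("sensor-7")
```

The key contrast with plain FaaS is that the handler keeps its state locally between messages and the platform, not the developer, is responsible for persistence and recovery.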
The aforementioned frameworks represent a significant milestone in
bridging the stateless gap in FaaS to meet the needs of advanced and stateful
applications (Table 2).
Table 2: State management patterns for serverless IoT

| Pattern | Description | IoT Use Case | Fault Tolerance | Latency Overhead |
| --- | --- | --- | --- | --- |
| External Database | Functions read/write state to a managed NoSQL/SQL database. | Storing device shadows, user profiles, sensor readings. | High (relies on database durability and availability). | Moderate (adds a network round-trip for every state access). |
| Distributed Cache | State is stored in an in-memory cache for fast access. | Caching frequently accessed device configurations or session data. | Moderate (cache is ephemeral; requires a backing store). | Low (sub-millisecond latency for cache hits). |
| State Machine Orchestration | AWS Step Functions manages workflow state and coordinates function calls. | Multi-step device onboarding, complex command-and-control sequences. | Very high (built-in retries, error handling and state persistence). | Higher (overhead from the orchestration engine). |
| Stateful Frameworks | The platform provides a native state management layer (e.g., shared memory). | Real-time collaborative applications, fine-grained synchronization. | Varies by implementation (often relies on checkpointing). | Potentially low (avoids network calls to external services). |
B. Achieving cost-performance optimization
Optimization strategies seek to balance the two extremes of
performance and cost in serverless architectures, and operate at two levels:
resource allocation and system architecture.
a) Systematic function right-sizing:
Guessing each function's memory configuration is an inefficient use
of resources; developers need a systematic strategy for each function.
Automated tools such as the AWS Lambda Power Tuner execute a function across a
range of memory settings, measure its performance and compute its cost to find
the optimal configuration. Such empirical tuning is necessary because the
relationship between memory, CPU, execution time and cost varies significantly
across workloads.
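A minimal version of such a tuner can be sketched as a sweep over memory settings; `measure_duration_ms` is a synthetic workload model standing in for real invocations, and the price and "balanced" cost-times-latency objective are illustrative assumptions (not the AWS Lambda Power Tuner's actual API):

```python
PRICE_PER_GB_SECOND = 0.0000167  # illustrative rate, varies by provider

def measure_duration_ms(memory_mb):
    """Stand-in for invoking the real function at this memory setting:
    CPU-bound until roughly 750 MB, then a 120 ms floor."""
    return max(120.0, 90_000.0 / memory_mb)

def tune(memory_options):
    """Return the setting minimizing a 'balanced' cost x latency score."""
    def score(mb):
        ms = measure_duration_ms(mb)
        cost = (mb / 1024) * (ms / 1000) * PRICE_PER_GB_SECOND
        return cost * ms
    return min(memory_options, key=score)

best_mb = tune([128, 256, 512, 1024, 2048])
```

Under this synthetic profile the sweep lands on the setting just past the CPU-bound region: smaller settings are equally cheap but much slower, and larger ones add cost without reducing latency.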
b) Architectural guidance on function granularity:
Choosing between many small functions and fewer large ones is a
design decision that should be driven by the application's use case.