Can every storage or database service be called serverless storage or a serverless database? What puts one storage service in the serverless category while others fall outside it? This article covers the basic properties a persistence service should have to be called serverless storage or a serverless persistence service. We then give examples of different serverless storage and serverless database categories, along with a few sample use cases in serverless computing. Lastly, we discuss the serverless storage categories that are yet to emerge and look towards the future of serverless storage.
Serverless - no servers at all?
Source - http://www.commitstrip.com/en/2017/04/26/servers-there-are-no-servers-here/
The term serverless is a misnomer. Serverless does not mean that servers are no longer required; it refers to abstracting the servers away from the user. Users no longer have to worry about servers because someone else (the cloud provider) takes care of them.
Serverless is quite a buzzword these days. Even though the term has been around for some time, the major reason for it becoming such a buzzword was the launch of AWS Lambda in 2014. The term gained further popularity with the launch of Amazon API Gateway in July 2015. In general, the term serverless, or rather serverless computing, refers to applications where server-side logic written by the application developer runs in stateless compute containers that are event-triggered, ephemeral (may only last for one invocation), and fully managed by a third party. This is also called Function as a Service (FaaS). Read more about serverless architecture in one of our previous articles.
The statelessness of Serverless Computing and the need for Storage/Persistence
One of the features of serverless computing or FaaS is that it is stateless. A function therefore cannot rely on any state surviving between two executions of itself. This is where serverless storage/persistence comes into the picture. Whenever state needs to be persisted between two FaaS function executions, it should be stored externally in a storage/persistence service.
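As a minimal sketch of this idea, the handler below keeps no state of its own and reads and writes a counter through an external store. `InMemoryStore` and `count_visit` are hypothetical names used only for illustration; in a real deployment the store would be one of the managed services discussed later in this article.

```python
from typing import Dict


class InMemoryStore:
    """Hypothetical stand-in for an external persistence service."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str, default: int = 0) -> int:
        return self._data.get(key, default)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


def count_visit(event: dict, store: InMemoryStore) -> int:
    """Stateless handler: all state is read from and written back to the store."""
    key = f"visits:{event['user_id']}"
    visits = store.get(key) + 1
    store.put(key, visits)
    return visits


# Two separate invocations share nothing except the external store.
store = InMemoryStore()
first = count_visit({"user_id": "alice"}, store)
second = count_visit({"user_id": "alice"}, store)
```

Because the function itself holds nothing between calls, any instance of it can serve any invocation as long as it can reach the store.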
The Problem with Traditional Storage
Traditional storage solutions are designed to run continuously on a fixed set of servers at a single geographic location. To use these storage solutions properly, application developers need to know a lot about configuration specifics. In the worst cases, high availability and performance depend on the developer's knowledge of database internals. Developers need to figure out and configure things like regions, zones, volumes, memory, compute capacity, and software versions. Teams have to spend hours upon hours on capacity planning, provisioning, sharding, backups, performance tuning, and monitoring. When you spend 80% of your time setting up and operating databases just to support serverless functions, you know something is amiss. While serverless computing adds elasticity to the compute layer, you cannot fully benefit from it when your persistence layer does not offer a comparable level of elasticity.
A Truly Serverless Storage
A truly serverless storage should ideally have the following properties.
- No provisioning
- Truly elastic (scale up and scale down without operator intervention)
When using a serverless storage, developers should not need to worry about any infrastructure details, like node size, memory, or storage size. In the same way that AWS Lambda charges per single-function invocation, consumption in a serverless storage should be measured by the amount of compute and storage used by different workloads. This is the essence of a serverless database. Users are never charged for idle capacity. Storage usage reflects function invocation and workload.
With traditional storage offerings, over-provisioning is the only strategy available to prepare for traffic spikes, and it can be wasteful. A serverless storage's elasticity removes the need for it. Since a serverless storage can scale elastically without user intervention, developers can launch with no capacity planning and the applications will always have enough capacity. Further, the pay-as-you-go pricing model ensures that no idle resources drain the user's bank account. The cost of using serverless storage simply scales with usage.
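The difference between the two pricing models comes down to simple arithmetic. The sketch below compares them using made-up numbers; the hourly rate, the per-request price, and the request count are all hypothetical and do not reflect any vendor's actual pricing.

```python
def provisioned_cost(hourly_rate: float, hours: float) -> float:
    """Cost of capacity that is billed for every hour, used or idle."""
    return hourly_rate * hours


def pay_per_request_cost(price_per_million: float, requests: int) -> float:
    """Cost that scales only with the number of requests served."""
    return price_per_million * requests / 1_000_000


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
HOURS_PER_MONTH = 730
fixed = provisioned_cost(hourly_rate=0.10, hours=HOURS_PER_MONTH)
on_demand = pay_per_request_cost(price_per_million=1.50, requests=2_000_000)
print(f"provisioned: {fixed:.2f}, pay-per-request: {on_demand:.2f}")
```

With a light or spiky workload the usage-based bill stays small, while the provisioned bill is the same whether the capacity is used or not. With sustained heavy traffic the comparison can tip the other way, so it is worth running the numbers for your own workload.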
Serverless Storage/Persistence Options
Nowadays, there are various serverless storage categories; often, more or less the same service is provided by multiple cloud vendors. In this section we’ll go through those different serverless persistence categories, examples for each category, and some use cases of them as well.
Serverless Object Storage
Object storage is a hierarchy-free method of storing data, typically used in the cloud. Unlike other data storage methods, object-based storage does not use a directory tree. Discrete units of data (objects) exist at the same level in a storage pool. Each object has a unique, identifying name that an application uses to retrieve it. Additionally, each object may have metadata that is retrieved with it. Some common examples of serverless object storage are Amazon S3, Google Cloud Storage, Azure Blob Storage and IBM Cloud Object Storage. One common use of serverless object storage in the context of serverless computing is thumbnail generation:
- The mobile application uploads an image to an object store.
- The serverless object storage fires a change event after the image is uploaded, resulting in the execution of a serverless function.
- The serverless function creates a thumbnail based on the uploaded image.
- After uploading the thumbnail to the serverless storage, the serverless function sends a push notification to the mobile application.
- The mobile application downloads the thumbnail and updates the user interface accordingly.
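The server-side part of this flow can be sketched as follows. This is an illustration under stated assumptions, not a production handler: the event shape is loosely modeled on the Amazon S3 event notification format, `InMemoryObjectStore` is a hypothetical stand-in for the object store, `make_thumbnail` is a placeholder that does no real image processing, and `notify` stands for whatever push notification mechanism the application uses.

```python
from typing import Callable, Dict, List, Tuple


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store, keyed by (bucket, key)."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, body: bytes) -> None:
        self._objects[(bucket, key)] = body

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


def make_thumbnail(image: bytes) -> bytes:
    """Placeholder for real image resizing; it only truncates the payload."""
    return image[:16]


def on_image_uploaded(
    event: dict,
    store: InMemoryObjectStore,
    notify: Callable[[str], None],
) -> List[str]:
    """Handle an upload event: create a thumbnail, store it, notify the app."""
    created = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if key.startswith("thumbnails/"):
            continue  # Skip our own output so the function does not re-trigger on it.
        thumb_key = f"thumbnails/{key}"
        store.put(bucket, thumb_key, make_thumbnail(store.get(bucket, key)))
        notify(thumb_key)
        created.append(thumb_key)
    return created
```

Writing thumbnails under a separate prefix and skipping that prefix in the handler is one simple way to keep the function from being triggered by its own output.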
Serverless Relational Databases
An example of a serverless relational database is Amazon Aurora Serverless, which was announced in the last quarter of 2017. Amazon Aurora comes in two editions, one compatible with MySQL and one compatible with PostgreSQL. Amazon Aurora Serverless is an on-demand, auto-scaling database service, where the database automatically starts up, shuts down, and scales capacity up or down based on the application's needs. It enables users to run their databases in the cloud without actually managing any database instances. It's a simple, cost-effective option for infrequent, intermittent, or unpredictable workloads, and hence is well suited to serverless computing.
Serverless NoSQL Databases
Serverless Key-value and Document Stores
There are different kinds of serverless NoSQL databases available today. The most common ones are key-value stores and document stores. A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash table. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record and is used to quickly find the data within the database. A document-oriented database, or document store, is designed for storing, retrieving and managing document-oriented information, also known as semi-structured data. A few examples of serverless key-value and document stores are Amazon DynamoDB, Google Cloud Datastore and IBM Cloudant NoSQL DB. Amazon recently announced DynamoDB on-demand, which provides on-demand read/write capacity for DynamoDB with a pay-per-request model. There are various uses for such serverless key-value and document stores; a simple one is handling a "contact us" form with a serverless function.
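A rough sketch of such a "contact us" handler is shown below. It assumes the form arrives as a JSON string in `event["body"]` and that the table object accepts `put_item(Item=...)`, in the same keyword style as a boto3 DynamoDB table resource. `FakeTable`, `handle_contact_form`, and the field names are hypothetical and exist only to keep the example self-contained.

```python
import json
import time
import uuid
from typing import Any, Dict


class FakeTable:
    """Hypothetical in-memory table standing in for a key-value/document store."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}

    def put_item(self, Item: Dict[str, Any]) -> None:
        # Records are stored and looked up by their unique key.
        self.items[Item["id"]] = Item


def handle_contact_form(event: Dict[str, Any], table: FakeTable) -> Dict[str, Any]:
    """Validate a submitted form and persist it as one record under a unique key."""
    form = json.loads(event["body"])
    missing = [field for field in ("name", "email", "message") if not form.get(field)]
    if missing:
        return {"statusCode": 400, "body": json.dumps({"missing": missing})}

    item = {
        "id": str(uuid.uuid4()),  # The key that uniquely identifies the record.
        "name": form["name"],
        "email": form["email"],
        "message": form["message"],
        "submitted_at": int(time.time()),
    }
    table.put_item(Item=item)
    return {"statusCode": 201, "body": json.dumps({"id": item["id"]})}
```

Each submission becomes a single self-describing record, which is the kind of access pattern key-value and document stores are designed for.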
Serverless Time Series Database
A time series database (TSDB) is a software system that is optimized for handling time series data: arrays of numbers indexed by time (a datetime or a datetime range). Time series data has specific characteristics: it typically arrives in time order, it is append-only, and queries are almost always over a time interval. While relational databases can store this data, they are inefficient at processing it, as they lack optimizations such as storing and retrieving data by time intervals. An example of a serverless time series database is the recently announced Amazon Timestream, a purpose-built time series database that efficiently stores and processes records by time intervals.
Source - https://aws.amazon.com/timestream/
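To make these characteristics concrete, here is a small in-memory sketch of an append-only series that answers interval queries with a binary search. It only illustrates the access pattern; it is not how Amazon Timestream or any other product is implemented, and the `TimeSeries` class is a hypothetical name.

```python
import bisect
from typing import List, Tuple


class TimeSeries:
    """Append-only series of (timestamp, value) points kept in time order."""

    def __init__(self) -> None:
        self._timestamps: List[float] = []
        self._values: List[float] = []

    def append(self, timestamp: float, value: float) -> None:
        # Data is expected to arrive in time order, so older points are rejected.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("points must arrive in time order")
        self._timestamps.append(timestamp)
        self._values.append(value)

    def query(self, start: float, end: float) -> List[Tuple[float, float]]:
        """Return points with start <= timestamp < end using binary search."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))
```

Because the points are already sorted by time, an interval query only has to locate the two ends of the range instead of scanning every record.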
Serverless Graph Databases
Very simply, a graph database is a database designed to treat the relationships between data as equally important as the data itself. It is intended to hold data without constraining it to a predefined model. Instead, the data is stored the way we would first draw it out, showing how each individual entity connects with or is related to others. When it comes to serverless graph databases, we have FaunaDB, which is a general-purpose, transactional, temporal, geographically distributed, strongly consistent and relational database. FaunaDB also supports document store and time series database features.
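The idea of storing relationships as first-class data can be illustrated with a tiny in-memory graph. This is a generic sketch, not FaunaDB's data model or query language; the `Graph` class and the `follows` label are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


class Graph:
    """Entities are nodes; each relationship is stored as a labeled edge."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def relate(self, source: str, label: str, target: str) -> None:
        self._edges[source].append((label, target))

    def neighbors(self, node: str, label: str) -> List[str]:
        return [target for edge_label, target in self._edges.get(node, []) if edge_label == label]

    def reachable(self, start: str, label: str) -> Set[str]:
        """All nodes reachable from start by following edges with this label."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.neighbors(node, label):
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


graph = Graph()
graph.relate("alice", "follows", "bob")
graph.relate("bob", "follows", "carol")
people = graph.reachable("alice", "follows")
```

Questions such as "who can alice reach through follows relationships?" become a walk along stored edges rather than a series of joins.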
What is Missing?
Serverless In-Memory Data Stores/Caches?
One storage type that is not yet available in a serverless flavour is the in-memory data store. One option we have at the moment is caching at the API Gateway level. Then we have Amazon DynamoDB with single-digit millisecond latencies. If a user needs to reduce latency further, DynamoDB Accelerator (DAX) can be used, even though it is not yet fully serverless. Further, we have Amazon ElastiCache, which offers Redis and Memcached, but cache nodes still have to be chosen and provisioned by the user, so it is not offered in a serverless form. From Google App Engine, we have Memcache as an in-memory caching service; however, it can only be accessed from within App Engine itself.
Serverless storage support is growing rapidly along with the growth and wide adoption of serverless computing in the IT industry. We expect these services to keep growing, allowing truly serverless application development.