Better Data Management in Applications

Posted in #programming

Interacting with a datastore is generally an integral part of any web application. Without access to data, we typically can't do anything useful or particularly interesting. The problem for developers is that every application brings its own specific set of data needs along with a unique data architecture and schema. While data technologies have SDKs for writing interactions across different programming languages, developers are likely required to write a lot of boilerplate code just to glue all the pieces together. Add in caching, data validation, sharding, and other complexities, and managing access to data quickly starts monopolizing your development team's time. Time that could be better spent adding value for your customers to your application.

In this post I'll discuss some techniques and technologies to make managing data between our apps and datastores more efficient, consistent and scalable.

The Data Access Layer

Most developers end up building some sort of a data access layer in order to manage the connections between their applications and backend services.

A data access layer (DAL) in computer software, is a layer of a computer program which provides simplified access to data stored in persistent storage of some kind, such as an entity-relational database. ~ Wikipedia

The benefits of such an approach are many. DALs provide developers with shortcuts for working with their application's persistent datastores, both saving time and reducing human error when manipulating data. Oftentimes a DAL is written to provide an additional layer of security to a data backend. For example, low-level access to a database from an application could have disastrous consequences. SQL injection attacks or poorly written UPDATE or DELETE queries could wipe out critical records. Providing an proxy to the database, especially in production, can help mitigate these risks.

DALs can also provide a level of consistency when changing a database schema. Changing field datatypes, updating table names, and restructuring relationships, can all be abstracted by adding backwards compatibility to a DAL. Hunting down and updating individual queries within an application can be time consuming and error prone. A DAL can provide a consistent interface for developers without worrying about backend data changes.

Database Abstraction Layers

Data access layers are closely related to Database Abstraction Layers (DBAL) in that they are both meant to simplify database communications. However, DBALs are designed to allow for a single interface into multiple types of databases. This lets the developer deal with a consistent API regardless of the underlying database technology. DBALs make switching from MySQL to Postgres, for example, much easier because the underlying queries don't need to change. While there are many advantages to DBALs, there are also several disadvantages, mostly around speed and optimizations. Data access layers, on the other hand, are often written for a specific set of backend technologies, making switching harder in the future, but optimizing for speed and simplicity now. However, it is not uncommon for data access layers to use DBALs as a means for connecting to a database.

Object-Relational Mapping

DALs and DBALs are meant to simplify interactions with a datastore. This typically means incorporating a technique called object-relational mapping or ORM. Wikipedia has a good article about ORM, but essentially it is the process of converting flat, columnar data stored in relational databases (such as SQL) to objects with nested hierarchies representing the relationships. The example Wikipedia gives is that a Person object might have multiple phone numbers. In SQL, this might be represented as a one-to-many relationship using a separate table to store the phone numbers. ORM would map that table into the main Person object, allowing the user to add and remove phone numbers simply by updating the main object, rather than needing to write multiple queries.

ORM is an incredibly powerful technique in that it lets developers work with the complete representation of an object rather than just its parts. This ensures that changes to the object are saved in a single transaction, dramatically reducing bugs and simplifying code that interacts with data. A well-optimized ORM system can also negate many of the speed and optimization concerns of DBALs.

Putting It All Together

The three techniques discussed, DALs, DBALs, and ORM, can be used together or in parts to make database interactions easier for developers. The key is to decide which techniques make the most sense for the size and scope of your application. There are also plenty of open-source and commercial software options available that will greatly simplify implementing these techniques. Many frameworks, such as PHP's Symfony, incorporate ORMs directly, giving developers access to all of the optimizations and stability of well-established codebases. While software like this adds dependencies, the benefits probably outweigh the disadvantages.

Building data-intensive applications requires maintaining reliable and consistent connections to the data that powers it. Utilizing these techniques can help create better apps by simplifying data interactions and providing developers with a clean and consistent interface to your data.