As of 2017, there are ten parts, or activities, common to most modern data designs. Some of these are fundamental, such as data ingestion and storage of source data, while others have less to do with theoretical design and more to do with current industry trends.
Two important trends are:
1) the migration of large monolithic systems to microservices (APIs and similar)
2) moving from on-premises data centers to public clouds such as Amazon Web Services, Microsoft Azure, Google Cloud Platform and IBM Cloud.
If you are not that technical, you can think of a monolithic application as one very large zip file containing thousands of files that many people are constantly changing and that need to stay in sync with one another. All of this has to be organized, and then the file is moved to a server. Other things have to be managed too, such as databases and servers, but the monolith is the file.
Microservices are basically many of those same programs, often written differently, deployed in smaller pieces, and usually on different services and infrastructure — namely, the cloud. This frees things up so that software can be developed faster and better, and so that people can be organized differently. Here’s a comparison:
The second trend, moving from on-premises data centers to public clouds such as Amazon Web Services, also affects design, because the cloud infrastructure pushes designs toward certain configurations and services.
I included the image below as a reminder that in IT, at least for now, form follows infrastructure, culture, and other factors; not so much function. Any design is beholden to its infrastructure and value chain, among other things.
The reason for this is rooted in what I call movement. There are market and other forces at play in technical design, and it is possible to get too far ahead of them. Often it is better to keep things as simple as possible within the current realm.
This is a current snapshot of what a modern data design looks like in Amazon Web Services, followed by some definitions for clarification. Notice that there are many services that complement one another. Companies often move to the cloud for cost reasons, and then find that many new business cases become viable because of all the new services available in the ecosystem.
Data Governance – the management of the availability, usability, integrity, and security of data.
Service Discovery – the automatic detection of devices, and of the services those devices offer, on a computer network. A service discovery protocol (SDP) is a network protocol that helps accomplish this.
Source Data – raw or atomic data that has not been processed for meaningful use.
Data Lifecycle Management – a policy-based approach to managing the flow of an information system’s data throughout its lifecycle: from creation and initial storage to the time when it becomes obsolete and is deleted.
Analytics & ETL – analytics is the discovery, interpretation, and communication of meaningful patterns in data; ETL (Extract, Transform, and Load) is the data-warehousing process responsible for pulling data out of source systems and placing it into a data warehouse.
Business Intelligence – the applications, infrastructure, tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.
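To make the service discovery definition above concrete, here is a toy sketch of a service registry in Python: services announce themselves by name, and clients look them up at call time instead of hard-coding addresses. The class name, service name, and addresses are all illustrative assumptions; real systems (Consul, etcd, DNS-SD) add health checks, TTLs, and replication on top of this basic idea.

```python
# Toy service registry: the core idea behind service discovery.
# All names and addresses below are illustrative, not a real API.
class ServiceRegistry:
    def __init__(self):
        self._services = {}  # service name -> set of addresses

    def register(self, name, address):
        # A service instance announces itself on startup.
        self._services.setdefault(name, set()).add(address)

    def deregister(self, name, address):
        # ...and withdraws on shutdown or a failed health check.
        self._services.get(name, set()).discard(address)

    def lookup(self, name):
        # Clients ask "where is the orders service?" at call time.
        return sorted(self._services.get(name, set()))

registry = ServiceRegistry()
registry.register("orders", "10.0.0.5:8080")
registry.register("orders", "10.0.0.6:8080")
registry.deregister("orders", "10.0.0.5:8080")
print(registry.lookup("orders"))  # ['10.0.0.6:8080']
```

This is one reason microservices pair naturally with the cloud: instances come and go, so clients need a registry rather than fixed addresses.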
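The ETL process described above can be sketched in a few lines of Python using only the standard library. The CSV source, field names, and the in-memory SQLite "warehouse" are all illustrative assumptions standing in for real source systems and a real warehouse.

```python
# A minimal sketch of ETL: Extract, Transform, Load.
import csv
import io
import sqlite3

# Extract: read raw source data (a CSV string stands in for a
# source-system export).
raw = io.StringIO("order_id,amount\n1,19.99\n2,5.00\n3,42.50\n")
rows = list(csv.DictReader(raw))

# Transform: clean and reshape -- here, convert dollar amounts to
# integer cents so the warehouse stores exact values.
transformed = [(int(r["order_id"]), round(float(r["amount"]) * 100)) for r in rows]

# Load: place the result into the warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", transformed)

total = conn.execute("SELECT SUM(amount_cents) FROM orders").fetchone()[0]
print(total)  # 6749
```

Once data has been loaded this way, the analytics and business intelligence layers query the warehouse rather than the source systems.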