We were asked to deliver a proposal for the data architecture and modeling to be used in the new data warehouse of my employer. The organization recently merged with another one, and the systems and processes of that partner are still separate from the main data warehouse.
In the new environment, the following must be taken into account:
- the different cultures of the merger partners;
- the organization is spread over several locations and several departments;
- the existing landscape must continue to deliver: the show must go on;
- the technology and the environment require a flexible set-up that can last for several years.
With regard to the cultures:
- The largest merger partner consists of a self-managing team of experienced employees with a more ad hoc, demand-driven way of working; speed of delivery is central.
- The smallest merger partner consists of two teams organized according to the principles of a demand-supply organization, so they rely heavily on external hiring. They also provide the new management.
To address all these factors, we propose the following:
– Leave the current data warehouse intact: “the show must go on.”
– Add the data of the smallest merger partner as quickly as possible, so that a single, unambiguous version of the truth is created.
– Organize the total data landscape in containers, each with its own staff and its own freedom to organize its work processes without affecting the other containers. It also becomes easier to run a DTAP (development, test, acceptance, production) pipeline per container when there are no, or at least clearly defined, interfaces between the containers. Experience shows that DTAP testing processes that touch everything can hardly be carried out anymore, so we want to reduce that complexity. The linking pin between the containers should then be formed by the metadata repository & job control center.
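To make the “clearly defined interfaces” idea concrete, here is a minimal sketch of how inter-container data flows could be checked automatically. The container names, dataset names, and the `undeclared_reads` helper are hypothetical examples, not part of the actual landscape:

```python
# Minimal sketch of an interface check between containers.
# Container names and dataset names are hypothetical examples.

# Each container declares which datasets it owns and which
# foreign datasets it is explicitly allowed to read (its interfaces).
containers = {
    "selfservice_bi": {"owns": {"sales_star"}, "may_read": {"dwh_core"}},
    "data_science":   {"owns": {"lab_models"}, "may_read": {"dwh_core"}},
    "applications":   {"owns": {"realtime_flat"}, "may_read": set()},
}

def undeclared_reads(container, reads):
    """Return the datasets a container reads without a declared interface."""
    c = containers[container]
    allowed = c["owns"] | c["may_read"]
    return set(reads) - allowed

# A job in 'applications' that quietly reads from the BI container
# would be flagged before it ever reaches the DTAP test phase.
violations = undeclared_reads("applications", {"realtime_flat", "sales_star"})
```

A check like this keeps the interfaces between containers explicit, which is exactly what makes per-container DTAP testing feasible.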
The containers that we envision are:
– Self-service BI / dashboards based on star models, perhaps to a large extent simply copied from the models already present. Use a DTAP pipeline for this self-service BI container as standard.
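For reference, a minimal sketch of what a star model looks like, using an in-memory SQLite database; the fact and dimension tables and all names here are invented for illustration, not the models actually present:

```python
import sqlite3

# Star-model sketch: one fact table surrounded by dimension tables,
# queried with a typical dashboard-style star join.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date,
    product_key INTEGER REFERENCES dim_product,
    amount REAL
);
INSERT INTO dim_date VALUES (1, 2024), (2, 2025);
INSERT INTO dim_product VALUES (10, 'widget');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (2, 10, 250.0);
""")

# A typical self-service BI query: join the fact to its dimensions.
rows = con.execute("""
    SELECT d.year, p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.year, p.name
    ORDER BY d.year
""").fetchall()
```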
– Data science. This section is easily forgotten but must be named. It seems to me that it should also be arranged with a development and a production part; unless nothing is ever put into production, in which case that must also be made clear. In principle this concerns research and long-term studies without any guarantee of results.
– Applications. There is currently a development towards real-time information provision, because the production system supplier no longer supports any regular reports and there are major problems with the validation of registrations. Operational information provision and ad hoc data earn the organization money, so this is important. For real-time use only flat tables are relevant; star modeling should not be used here. Timeliness and lead time are the key concerns in this container.
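A minimal sketch of the flat-table idea for real-time delivery, assuming a simple lookup-based denormalization; the `flatten` helper and all field names are hypothetical:

```python
# Instead of joining a star model at query time, each incoming event
# is denormalized into one wide, query-ready row. No join at read
# time means a short, predictable lead time for the consumer.

def flatten(event, product_lookup):
    """Denormalize an incoming event into one wide flat row."""
    return {
        "event_id": event["id"],
        "timestamp": event["ts"],
        "product_name": product_lookup[event["product_id"]],
        "amount": event["amount"],
    }

product_lookup = {10: "widget"}  # hypothetical reference data
row = flatten(
    {"id": 1, "ts": "2025-01-01T12:00:00", "product_id": 10, "amount": 42.0},
    product_lookup,
)
# The consumer reads 'row' directly; no star join is needed.
```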
– Metadata repository & job control center. An automated repository that maintains metadata, performance, and usage of all components of the data warehouse, for auditing, maintenance, and the prioritization of data warehouse processes.
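A minimal sketch of what such a repository could record, assuming a single SQLite table of job runs; the `run_job` helper, the job and container names, and the table layout are all hypothetical:

```python
import sqlite3
import time

# Every job run is logged with its duration and status, so auditing
# and prioritization can query one central table.
repo = sqlite3.connect(":memory:")
repo.execute("""CREATE TABLE job_runs (
    job TEXT, container TEXT, started REAL, seconds REAL, status TEXT)""")

def run_job(job, container, fn):
    """Execute a job and record its runtime in the repository."""
    t0 = time.perf_counter()
    status = "ok"
    try:
        fn()
    except Exception:
        status = "failed"
    repo.execute("INSERT INTO job_runs VALUES (?, ?, ?, ?, ?)",
                 (job, container, t0, time.perf_counter() - t0, status))
    return status

run_job("load_sales", "selfservice_bi", lambda: None)
run_job("broken_load", "applications", lambda: 1 / 0)

# Auditing query: which jobs failed, and in which container?
failed = repo.execute(
    "SELECT job, container FROM job_runs WHERE status = 'failed'").fetchall()
```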
In principle, the boundaries between the containers are not fixed. The core of the split is the one between an “old” unmanaged part and a “new”, still to be built, managed container landscape. Depending on the success and degree of exploitation of these containers (shrinkage or growth, including the allocated FTEs), the boundary between the containers can be moved.
So what is a container? 😉
Inspiration for the container setup comes from our practical experience within a large, complex data warehouse environment and from developments in cloud architecture (see, for example, docker.com). Basically, it means putting a certain group of data warehouse activities into one environment, including all their dependencies, so that people and resources do not get in each other’s way.
Step 1 of applying “containerization” is to put down boundary markers within the existing (largest) data warehouse environment.