Back up all SQL Server databases without data (DDL script only)

All databases on a SQL Server instance can be backed up as schema-only DDL scripts using T-SQL and Python, as follows:

  • install Anaconda
  • create an Anaconda environment, for example with conda create --name <myenv> --clone base or conda create -p <pathtoenviron> --clone base
  • activate the new environment with conda activate <myenv>
  • install Microsoft's mssql-scripter with pip install mssql-scripter
  • place the following Windows batch file somewhere on your server
  • create a T-SQL stored procedure which can execute the batch file
  • create a SQL Agent job with a job step containing the T-SQL loop; see all code below (a sketch for creating the job follows the loop script)
Rem ***********************************************************************************
Rem escriptdb.bat
Rem %1 = database name, %2 = full path of the output .sql file
Rem ***********************************************************************************
echo %1
echo %2
rem in the Python environment, first run pip install mssql-scripter if needed
call c:\programData\Anaconda3\condabin\conda.bat activate myenv
e:
cd E:\Backup\DatabasesScriptOnlyBackup
rem script the complete schema to a file, without data!
mssql-scripter -S localhost -d %1  -f %2

-- ======================================================
-- Author:		Archimedes van Wou
-- Create date: 20230907
-- Description:	script-only backup of all databases: stored procedure
-- Version:     20230907 AvW
-- ======================================================
CREATE OR ALTER procedure [dbo].[ExportDbScript2File]
(
	@db varchar(200),
	@bestand varchar(200)
)
as
begin 
	declare @commando varchar(350);
	-- build the command line: call the batch file with the database name and the output file
	set @commando=cast('cd.. &&"Q:\Projects\ExportDB\escriptdb.bat"  '+@db +' '+@bestand as varchar(350));
	-- run the command under the designated login, then revert
	EXECUTE AS LOGIN='DOMAIN\SQLSERVER_USER';
	EXEC xp_cmdshell @commando;
	REVERT;
end;
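
Note that xp_cmdshell is disabled by default. A minimal sketch for enabling it, assuming you have the required server-level permissions and your security policy allows it:

-- enable xp_cmdshell (disabled by default); requires ALTER SETTINGS / sysadmin rights
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;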

-- ======================================================
-- Author:		Archimedes van Wou
-- Create date: 20230907
-- Description:	script-only backup of all databases
-- Version:     20230907 AvW
-- ======================================================
DECLARE @constBackupLocatie AS VARCHAR(200)='Q:\Backup\DatabasesScriptOnlyBackup\'
DECLARE @i AS INT=1;
DECLARE @Bestand AS VARCHAR(200);
DECLARE @db AS VARCHAR(200); -- matches the parameter size of ExportDbScript2File

DROP TABLE IF exists #t_db
SELECT 
	Id=Row_number() over (order by name)
	,DbName=Name
	,Bestand=@constBackupLocatie+(FORMAT (getdate(), 'yyyyMMdd') )+'_'+name+'.sql'
	,IsDone=0
INTO #t_db
FROM sys.databases 
WHERE NAME NOT IN ('master','tempdb','model','msdb')

WHILE @i<=(SELECT MAX(id) FROM #t_db)
BEGIN 

	SELECT @Bestand=#t_db.Bestand,@db=#t_db.DbName FROM #t_db WHERE id=@i;
	--select @Bestand,@db ;

	EXEC DWH_DB.[dbo].[ExportDbScript2File]   @db=@db,@bestand= @Bestand;
	UPDATE #t_db SET IsDone=1 WHERE id=@i;
	SET @i=@i+1;
END;
--EOF
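
The loop above goes into the job step mentioned in the step list. As a rough sketch only (the job name, schedule and target database are assumptions; @command should contain the loop script above):

USE msdb;
GO
-- example job with one T-SQL job step and a daily schedule; names are placeholders
EXEC dbo.sp_add_job         @job_name = N'ScriptOnlyBackupAllDatabases';
EXEC dbo.sp_add_jobstep     @job_name = N'ScriptOnlyBackupAllDatabases',
                            @step_name = N'Export DDL scripts',
                            @subsystem = N'TSQL',
                            @database_name = N'DWH_DB',
                            @command = N'<paste the T-SQL loop script here>';
EXEC dbo.sp_add_jobschedule @job_name = N'ScriptOnlyBackupAllDatabases',
                            @name = N'Nightly',
                            @freq_type = 4,              -- daily
                            @freq_interval = 1,
                            @active_start_time = 010000; -- 01:00:00
EXEC dbo.sp_add_jobserver   @job_name = N'ScriptOnlyBackupAllDatabases';
GO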


New multi-layer datawarehouse

We are currently discussing a new structure for our data warehouse. The current situation is that we have roughly two layers: staging, and the layer that contains partially denormalized tables, the T or end tables. I always say that the T stands for Terminus, Latin for “endpoint,” but I now believe it means “limit”… my Latin is no longer what it used to be… 😉 We have had a dim/fact model here and there but never implemented it consistently. We use these T tables for ad hoc questions and for building reports. Of course, one T table can feed another, which is currently difficult for newcomers to follow. The current setup is around 15 years old; it works, but it is not transparent enough for the business and for external accountability.

In detail, a ‘T table’ is a commonly used combination of staging tables joined together; for example, we have a daily T_production table that contains all production. Wide in number of columns and with many rows, but nowadays the servers have no problem with this. YES!! One table to rule them all and in the darkness bind them… (The Lord of the Rings) That’s the idea.
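
As an illustration only (the staging tables and columns below are made up, not our actual model), such a T table is essentially a wide, denormalized join over staging that is rebuilt during the load:

-- hypothetical example of (re)building a wide T table from staging tables
DROP TABLE IF EXISTS dbo.T_Production;
SELECT  p.ProductionId,
        p.ProductionDate,
        m.MachineName,
        o.OrderNumber,
        c.CustomerName,
        p.Quantity
INTO    dbo.T_Production
FROM    stg.Production p
JOIN    stg.Machine    m ON m.MachineId  = p.MachineId
JOIN    stg.[Order]    o ON o.OrderId    = p.OrderId
JOIN    stg.Customer   c ON c.CustomerId = o.CustomerId;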

Current data warehouse levels (very simple version) =>

Proposed multilayer model=>

Staging

The staging level seems obvious to me: it contains a copy of the production databases. There is a desire to load this incrementally and/or in real time, but I leave that aside for now.

Business layer

This layer is still under discussion. The proposal is very similar to what we do now in the T-tables layer, only redesigned. This layer is seen as a preparation for the dim/fact layer, which is aimed more at the end user.

People from our department will be able to work directly on the business layer, because our experience shows that for SQL people, dim/fact models are not really pleasant schemas to work with. Nevertheless, we wonder whether we should do this, since it is not mentioned in the standard holy bibles… Kimball and others…

Dim / fact layer (self service)

The source system of production is very complicated. I have also worked with competing systems and they were a lot easier. It strikes me that people with years of background in this system, when they start modeling dimensionally, tend to follow that complexity. What then arises are fact tables with 15 dimensions. My proposal is to reduce this as much as possible: no more than 5 dimensions per fact table 😉
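
A hypothetical sketch of what I mean (all names invented): a fact table deliberately kept to a handful of dimensions instead of fifteen:

-- hypothetical fact table with a deliberately small number of dimension keys
CREATE TABLE dbo.FactProduction
(
    DateKey         int           NOT NULL,  -- DimDate
    MachineKey      int           NOT NULL,  -- DimMachine
    ProductKey      int           NOT NULL,  -- DimProduct
    CustomerKey     int           NOT NULL,  -- DimCustomer
    LocationKey     int           NOT NULL,  -- DimLocation
    Quantity        decimal(18,2) NOT NULL,
    DurationMinutes int           NOT NULL
);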

Unfortunately, a system that has to be designed and accepted by the project members depends on a compromise between those members. I do not yet know which way it will go. The tendency now is to avoid disagreements, so after opinions have been repeated several times they are not raised again. But if the goal is to build something better, everyone should talk to and listen to each other. Welcome to the human factor… I have been through several projects that set out to rebuild the data warehouse, and they all failed on human factors. But not this one yet!!!

Datawarehouse containerization

We were asked to deliver something with regard to the data architecture and modeling to be used in my employer’s new data warehouse. The organization recently merged with another one, whose systems and processes are still separate from the main data warehouse.

In the new environment, the following must be taken into account:

- the different cultures of the merger partners.

- the organization is spread over several locations and several departments.

- the existing landscape must continue to deliver: the show must go on.

- the technology and the environment require a flexible set-up that can last for several years.

With regard to the cultures:

- The largest merger partner consists of a self-managing team of experienced employees with a more ad hoc way of working. Speed of work is central. Demand-driven.

- The smallest merger partner consists of two teams that are organized according to the principles of a demand-supply organization, so they rely heavily on external hiring. They also provide the new management.

To deal with all these factors, we propose the following:

– Leave the current data warehouse intact: “the show must go on.”

– Add the data of the smallest merger partner as quickly as possible so that a single, unequivocal truth is created.

– Organize the total data landscape in containers, each with its own staff and its own freedom to organize its work processes without other containers being affected. It is also easier to work with an OTAP (development, test, acceptance, production) pipeline if there are no, or at least clearly defined, interfaces between those containers. Experience shows that OTAP testing processes that touch everything can hardly be carried out anymore, so we want to reduce the complexity. The linking pin between these containers should be the metadata repository & job control center.

The containers that we see before us are:

– Self-service BI / dashboards based on star models, perhaps to a large extent simply copied from the models already present. Use an OTAP pipeline for this self-service BI container as standard.

– Data science. This section is easily forgotten but must be named. It seems to me that this should also be arranged with a development and a production part; unless nothing goes to production, and then that must also be made clear. In principle this concerns research and long-term studies without any guarantee of results.

– Applications. There is currently a move towards real-time information provision, because the production system supplier no longer supports any regular reports and there are major problems with the validation of registration. Operational information provision and ad hoc data earn the organization money, so this is important. For real-time use only flat tables matter; star modelling should not be used here. Timeliness and lead time are important in this container.

– Metadata repository & job control center

An automated repository that maintains metadata, performance and usage information for all components of the data warehouse, for auditing, maintenance and prioritizing data warehouse processes.

In principle, the boundaries between the containers are not fixed. The core of the split is between an “old” unmanaged part and a “new”, still to be built, managed container landscape. Depending on the success and degree of use of these containers (shrinkage or growth, including the allocated FTE), the boundary between the containers can be moved.

So what is a container? 😉

Inspiration for the setup in containers comes from our practical experience within a large, complex data warehouse environment and from developments in cloud architecture (see, for example, docker.com). Basically, it means putting a certain group of data warehouse activities into one environment, including all dependencies, so that people and resources do not get in each other’s way.

Step 1 of applying “containerization” is to stake out the boundaries within the existing (largest) data warehouse environment.