Q1. What are the types of data warehouses? Which type of data warehouse is Snowflake?
Before looking at the types of data warehouses, we need to know what a data warehouse is. A data warehouse is a central data repository used for data analysis and reporting.
The following are the types of data warehouses:
a) Enterprise Data Warehouse (EDW): a centralized warehouse used for decision making across the enterprise. An EDW is used for tactical and strategic decisions.
b) Operational Data Store (ODS): a centralized database that complements the EDW and often acts as a source for it. An ODS is refreshed in real time and is used for operational reporting and decision making.
c) Data Mart: a subset of a data warehouse, normally used by a specific team or business line.
Snowflake is an analytic data warehouse, i.e. it can be used as an Enterprise Data Warehouse, and it is delivered as Software as a Service (SaaS).
Q2. Is it possible to use data from the local system to load in Snowflake?
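Yes. Files on the local system can be uploaded to a Snowflake internal stage with the PUT command (for example, from the SnowSQL client) and then loaded into a table with COPY INTO. Data can also be loaded from external stages on Amazon S3, Microsoft Azure Blob Storage, or Google Cloud Storage.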
Q3. What are the important features of Snowflake?
The important features of Snowflake are listed below:
1. Database and object cloning
2. External tables
3. Geospatial data support
4. XML support
5. Caching
6. Search optimization service
7. Integration with Hive metastore
8. Data protection and security
9. Time Travel
10. Data sharing
Q4. Can we use an external database such as Oracle or DB2 for the Snowflake storage layer?
No, we cannot use an external database for the Snowflake storage layer. Snowflake comes with its own built-in database, which is a columnar-stored relational database queried with SQL. The Snowflake database works well with Tableau, Excel, and many other tools, and it provides the services expected of a SQL database, such as role-based security, a query tool, and multi-statement transactions.
Q5. What are the caching areas in the Snowflake architecture?
Data fetched from the storage layer is cached at two locations: 1) the compute layer and 2) the cloud services layer.
If cloud services layer caching is disabled, compute layer caching is used.
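The lookup order described above can be pictured with a small Python sketch. This is only a toy model for interview discussion, not how Snowflake implements its caches; the class name, the `services_cache_enabled` flag, and the `fetch_from_storage` callback are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


class TwoLevelCache:
    """Toy model of a two-level cache lookup (not Snowflake's implementation)."""

    def __init__(self, fetch_from_storage: Callable[[str], str],
                 services_cache_enabled: bool = True) -> None:
        self.fetch_from_storage = fetch_from_storage
        self.services_cache_enabled = services_cache_enabled
        self.services_cache: Dict[str, str] = {}  # stands in for the cloud services layer
        self.compute_cache: Dict[str, str] = {}   # stands in for the compute layer

    def get(self, key: str) -> Tuple[str, str]:
        """Return (value, name of the place the value was served from)."""
        # 1) Try the cloud services layer first, if it is enabled.
        if self.services_cache_enabled and key in self.services_cache:
            return self.services_cache[key], "cloud services cache"
        # 2) Otherwise try the compute layer, then fall back to storage.
        if key in self.compute_cache:
            value, source = self.compute_cache[key], "compute cache"
        else:
            value, source = self.fetch_from_storage(key), "storage"
            self.compute_cache[key] = value
        if self.services_cache_enabled:
            self.services_cache[key] = value
        return value, source
```

With the cloud services cache enabled, a repeated request is served from that layer; with it disabled, the second request is served from the compute layer instead.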
This is the first article in a series of interview questions and answers on Snowflake. Through these questions and answers we will learn more about Snowflake, so let's start.
Q1. What is the architecture of Snowflake?
Snowflake's architecture is a hybrid of the shared-disk and shared-nothing database architectures. As in a shared-nothing architecture, Snowflake processes queries using massively parallel processing (MPP) compute clusters in which each node stores some portion of the data locally. On the other hand, as in a shared-disk architecture, Snowflake uses a central data repository that is accessible from all compute nodes.
The Snowflake architecture has three layers: database storage, which stores the data; query processing (the compute layer), which processes queries; and cloud services, which provides services such as security, metadata management, and query optimization.
Q2. Which cloud platforms are supported by Snowflake?
Snowflake supports the following cloud platforms:
1. Amazon Web Services (AWS)
2. Google Cloud Platform (GCP)
3. Microsoft Azure (Azure)
Q3. Do you consider Snowflake an ETL tool?
Snowflake is primarily a data warehouse, but it can be considered an ETL tool in the sense that it supports extract, transform, and load operations like other ETL tools.
a) Extract Process: In this step, data is extracted from the source into data files. These data files can be in various formats such as CSV, XML, and JSON.
b) Transform Process: We can write a custom transform process in Snowflake that takes the source data, applies transformations for cleansing and standardization, and then loads the data.
c) Load Process: In the load step, we load data files to an internal or external stage. We can use Microsoft Azure Blob Storage, an Amazon S3 bucket, or a Snowflake-managed location for staging data. The data is then copied into Snowflake storage using the COPY INTO command.
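The load step can be sketched as two SQL statements: a PUT that uploads a local file to an internal stage, and a COPY INTO that loads the staged file into a table. The Python helpers below only build those statements as strings; the function names and the table, stage, and file names are hypothetical, and only a couple of the available file format options are shown.

```python
def put_statement(local_path: str, stage: str) -> str:
    """Build a PUT statement that uploads a local file to an internal stage."""
    return f"PUT file://{local_path} @{stage}"


def copy_into_statement(table: str, stage: str, file_type: str = "CSV",
                        skip_header: int = 0) -> str:
    """Build a COPY INTO statement that loads staged files into a table."""
    options = [f"TYPE = {file_type.upper()}"]
    if skip_header:
        options.append(f"SKIP_HEADER = {skip_header}")
    file_format = " ".join(options)
    return f"COPY INTO {table} FROM @{stage} FILE_FORMAT = ({file_format})"


# Hypothetical names, for illustration only.
print(put_statement("/tmp/customers.csv", "my_stage"))
print(copy_into_statement("customers", "my_stage", skip_header=1))
```

PUT is run from a client such as SnowSQL, while COPY INTO can also read from an external stage that points to cloud storage.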
Q4. What are the different editions of Snowflake?
Here is the list of Snowflake editions:
a) Standard Edition
b) Enterprise Edition
c) Business Critical Edition
d) Virtual Private Snowflake (VPS)
Q5. What kind of SQL does Snowflake use?
Snowflake uses ANSI SQL, which is the common standard version of SQL.
Are you looking for details about the various phases in BES External calls? Are you also interested in knowing the categories under which these phases come? If so, then you have reached the right place. In this article, we will learn about BES External calls and their phases.
A) What are BES External calls?
BES External calls are web services configured in the provisioning tool in order to achieve customization in the Entity 360 or Customer 360 application. We need to develop a custom web service and deploy it on the application server. Once it is deployed, we can register the endpoint URL in the provisioning tool in Informatica MDM.
B) What are the categories of phases of BES External calls?
The phases of BES External calls are categorized as:
a) Read calls
b) Write calls
c) Merge calls
a) Read calls: This category is used for read and search operations.
b) Write calls: This category uses the WriteCO service for insert, update, and delete operations.
c) Merge calls: This category uses the PreviewMergeCO service, which is designed for merge operations.
C) Phases of External calls
Let's look at the detailed phases for the above categories.
Are you looking for details about Dynamic Data Masking? Are you also interested in knowing what we need to consider when implementing Dynamic Data Masking, also known as DDM? If so, then you have reached the right place. In this article, we will explore various aspects of Dynamic Data Masking.
A) What is Dynamic Data Masking (DDM)?
Dynamic Data Masking is a technology with which we can mask production data in real time. Dynamic Data Masking, also called DDM, does not change the data physically. DDM only changes the data stream in order to mask the sensitive data when a requester requests such information.
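The key idea, masking on the way out while leaving the stored data untouched, can be shown with a minimal Python sketch. The sample rows, the `requester_is_privileged` flag, and the masking format are assumptions for illustration and do not represent any vendor's implementation.

```python
from typing import Dict, List

# Hypothetical "production" rows; they are never modified by the functions below.
STORED_ROWS: List[Dict[str, str]] = [
    {"name": "Asha", "ssn": "123-45-6789"},
    {"name": "Ravi", "ssn": "987-65-4321"},
]


def mask_ssn(value: str) -> str:
    """Hide everything except the last four characters."""
    return "*" * (len(value) - 4) + value[-4:]


def read_rows(rows: List[Dict[str, str]],
              requester_is_privileged: bool) -> List[Dict[str, str]]:
    """Return rows as the requester should see them; stored rows stay unchanged."""
    if requester_is_privileged:
        return [dict(row) for row in rows]
    return [{**row, "ssn": mask_ssn(row["ssn"])} for row in rows]
```

An unprivileged requester sees masked values in the result, while the rows in `STORED_ROWS` keep their original values.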
B) What are the Dynamic Data Masking tools?
Various vendors provide Dynamic Data Masking functionality, including:
1. Microsoft Azure SQL Database
2. Oracle Enterprise Manager
3. iMask
4. Informatica Dynamic Data Masking
5. Imperva Data Masking
6. IBM InfoSphere Optim Data Privacy
7. K2View Data Masking
8. Mentis
C) What are data masking rules?
Rules contain the conditions and actions that the rule engine uses in order to process a request.
For example:
1) Connection rules: These process application connection requests.
2) Security rules: These process SQL statements.
Here are the important points about rules:
a) We can define and create rules in order to process SQL requests that an application executes against the database.
b) A DDM rule uses two techniques, i.e. connection criteria and masking techniques.
c) The rule engine uses the connection criteria in order to forward requests.
d) The masking technique is used in order to mask the data.
D) What are DDM rule components?
The DDM rule components are as below:
a) Matcher: It defines the criteria the rule engine uses to identify a match.
b) Action: It defines the action that the rule engine applies to the request.
c) Processing Action: It defines what the rule engine does with the request after applying the rule.
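The three components can be pictured with a small Python sketch of a rule engine. This is a toy model, not the rule engine of any DDM product: the `Rule` class, the `stop_if_applied` flag (a simplified stand-in for a processing action), and the two sample rules are assumptions made for illustration, and the regex-based SQL rewrite is deliberately naive.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    matcher: Callable[[str], bool]   # Matcher: does the rule apply to this request?
    action: Callable[[str], str]     # Action: how the matching request is rewritten
    stop_if_applied: bool = False    # Processing action (simplified, hypothetical flag)


def apply_rules(sql: str, rules: List[Rule]) -> str:
    """Pass a SQL request through the rules in order and return the result."""
    for rule in rules:
        if rule.matcher(sql):
            sql = rule.action(sql)
            if rule.stop_if_applied:
                break  # do not evaluate the remaining rules
    return sql


mask_ssn_rule = Rule(
    name="mask_ssn",
    matcher=lambda sql: re.search(r"\bssn\b", sql, re.IGNORECASE) is not None,
    action=lambda sql: re.sub(r"\bssn\b", "'***' AS ssn", sql, flags=re.IGNORECASE),
    stop_if_applied=True,
)
add_limit_rule = Rule(
    name="add_limit",
    matcher=lambda sql: "limit" not in sql.lower(),
    action=lambda sql: sql + " LIMIT 100",
)

print(apply_rules("SELECT name, ssn FROM customers", [mask_ssn_rule, add_limit_rule]))
```

Because `mask_ssn_rule` stops processing once it is applied, the second rule only runs for requests that do not touch the `ssn` column.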
Do you know why Snowflake is one of the leading cloud data warehouse platforms in the current market? Are you also aware of how Snowflake has evolved over time, and how it is still evolving and replacing traditional data warehouses? We are going to understand all these things in this article.
Introduction:
In this article, we are going to look at the golden age in which we are living. We will also look at the design of the traditional data warehouse. Then we will see how Snowflake evolved over the years, and finally we will see what makes Snowflake a leader in the cloud data warehouse market.
Golden Age:
Let's start with the golden age. Currently, we live in the golden age of distributed computing. Public cloud platforms such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure provide virtually unlimited storage and compute resources, and these resources are available on demand. Because of that, end users can get an enterprise-class experience for systems or applications through the Software as a Service (SaaS) model. For this experience, we do not have to spend a lot of money. These services are cost-efficient and perform well. Cloud data warehouses leverage these features, but traditional data warehouses do not.
Before looking at the drawbacks of the traditional data warehouse implementation, let's have a look at its design. The traditional data warehouse has multiple layers: the data source layer, staging layer, warehouse layer, data mart layer, and client layer.
The data source layer brings in data from various sources such as Salesforce, CRM, Human Resources, etc. This data is stored in a traditional database or in flat files. ETL, i.e. Extract, Transform, Load, is implemented to pull data from the source systems and push it to the staging layer. After cleansing and standardization, the data is loaded from the staging layer into the data warehouse. Along with the raw data, we also store metadata and summary data in the data warehouse. Finally, this data is published to the data marts. A data mart might be for sales, inventory, or purchasing. On top of the data mart layer sits the client layer. Business users or business analysts perform various operations in order to carry out in-depth data analysis, prepare reports, and perform data mining. All these users connect to multiple data marts for their needs.
As we can see, this traditional data warehouse model is complex and resource intensive. The traditional data warehouse model was designed on the assumption that it would deal with fixed resources. That was true earlier, but with the evolution of technology, social media, and other advancements in the sector, a fixed-resource design is no longer relevant. We deal with a variety of data arriving at different speeds and in different formats. Traditional data warehouses face challenges in managing all these aspects of data.
Another aspect of a traditional data warehouse is the investment cost. We need to invest a big chunk of money in the early stages of a data warehouse implementation, which is not the case with a cloud data warehouse.
Complex ETL pipelines are another drawback of the traditional data warehouse. As we can see, we need to build multiple pipelines to push data from the data sources to the staging layer, from the staging layer to the data warehouse, and from the data warehouse to the data marts. Adding flexibility to this flow is very challenging for the traditional data warehouse. This is where Snowflake comes into the picture.
Snowflake Evolution:
We are going to look at various things about Snowflake, but before looking at its features and advancements, let's understand a few basics. Currently, Snowflake supports three cloud platforms: Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Snowflake supports various regions across the world, namely North America, Europe, and Asia Pacific. Each of these regions is supported by the respective cloud platforms, i.e. Amazon Web Services, Google Cloud Platform, and Microsoft Azure.
Let's have a look at how Snowflake evolved over time. Snowflake was founded in 2012 and was publicly launched in October 2014. In the same year, it became available on Amazon Web Services, using Amazon S3 for storage. Once it became more stable, it introduced support for the Microsoft Azure cloud in 2018, and Google Cloud Platform support was introduced in 2019. As we can see, within a short period of time this product has evolved a lot. It now supports three major cloud service providers, i.e. Amazon, Google, and Microsoft, and it may support more cloud providers in the future. Snowflake also ranked near the top of the Cloud 100 list in 2019. Snowflake is one of the leading tools in the cloud data warehouse market.
So, what makes Snowflake a leading platform? The critical aspect of Snowflake is that it separates the storage and compute layers. Traditional data warehouses support either a shared-nothing architecture or a shared-disk architecture. Snowflake, on the other hand, brought a hybrid approach to the table with benefits from both the shared-nothing and shared-disk approaches. Apart from that, Snowflake is a pure Software as a Service product, i.e. users don't have to worry about software installation, administration, product upgrades, etc. It also supports ANSI SQL and ACID transactions. Semi-structured data, which is difficult to manage with a traditional data warehouse, is easily managed and maintained using the Snowflake cloud data warehouse. Storage and compute resources are elastic and can be scaled independently and seamlessly. That's a very critical aspect brought by Snowflake. It is highly available and durable. And of course, it is cost-efficient, and Snowflake is working on improving cost efficiency further. Last but not least, Snowflake is secure and comes with end-to-end encryption.
Because of all these features, Snowflake is a leader in the cloud data warehouse market.
Are you looking for details about the virtual warehouse? Are you also interested in knowing what role the virtual warehouse plays in Snowflake? If so, then you have reached the right place. In this article, we will learn about virtual warehouses in Snowflake in detail.
A) What is a Virtual Warehouse?
Before learning about the virtual warehouse, we need to know what EC2 is. Amazon EC2 (Elastic Compute Cloud) is a web service that provides secure, resizable compute capacity in the cloud. Now that we know what EC2 is, let's understand the virtual warehouse.
The virtual warehouse is an important layer in the Snowflake architecture, and it consists of clusters of EC2 instances. The virtual warehouse is the abstraction through which each such cluster is presented to its user.
B) What is a worker node in the virtual warehouse?
As we know, a virtual warehouse consists of a cluster of EC2 instances. Each individual EC2 instance is called a worker node, and it performs the tasks given to it. End users never interact with worker nodes directly. When users perform an action that involves virtual warehouse processing, they do not know how many worker nodes are working on it or how those nodes are performing the tasks in the warehouse.
C) What are the virtual warehouse sizes?
Virtual warehouses come in T-shirt sizes. The currently available VW sizes are:
i) X-Small
ii) Small
iii) Medium
iv) Large
v) X-Large
vi) 2X-Large
vii) 3X-Large
viii) 4X-Large
ix) 5X-Large
x) 6X-Large
D) Elasticity and Execution Engine
The virtual warehouse comes with two important concepts: elasticity and the execution engine. VWs are compute resources, and they can be created, resized, and destroyed at any point in time. This feature is called elasticity, and it has no effect on the state of the persistent store or database.
The execution engine is implemented by Snowflake, and it is a SQL execution engine. This engine is built on the features below.
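Creating, resizing, and destroying a warehouse map to the CREATE WAREHOUSE, ALTER WAREHOUSE, and DROP WAREHOUSE statements. The Python helpers below are a minimal sketch that only builds these statements as strings and validates the size keyword; the helper names and the warehouse name `demo_wh` are hypothetical, and many other warehouse parameters are left out.

```python
# Size keywords accepted by the WAREHOUSE_SIZE parameter, from X-Small to 6X-Large.
WAREHOUSE_SIZES = (
    "XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE",
    "XXLARGE", "XXXLARGE", "X4LARGE", "X5LARGE", "X6LARGE",
)


def _check_size(size: str) -> str:
    """Normalize a size keyword and reject anything not in the list above."""
    size = size.upper()
    if size not in WAREHOUSE_SIZES:
        raise ValueError(f"unknown warehouse size: {size}")
    return size


def create_warehouse(name: str, size: str = "XSMALL") -> str:
    return f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH WAREHOUSE_SIZE = '{_check_size(size)}'"


def resize_warehouse(name: str, size: str) -> str:
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{_check_size(size)}'"


def drop_warehouse(name: str) -> str:
    return f"DROP WAREHOUSE IF EXISTS {name}"


# Hypothetical warehouse name, for illustration only.
for statement in (create_warehouse("demo_wh"),
                  resize_warehouse("demo_wh", "large"),
                  drop_warehouse("demo_wh")):
    print(statement)
```

None of these statements touch the stored data; they only change the compute resources available for queries.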
Are you looking for an article that explains the various processes in Informatica Master Data Management (MDM)? If so, then you have reached the right place. In this article, we will learn about the processes through which records are loaded into the MDM system. Let's start.
Informatica MDM contains the following processes:
Step 1: The land process transfers data from a source system via ETL jobs to
landing tables in the MDM ORS (Operational Reference Store).
Step 2: The stage process (Stage Job) reads the data from the landing table,
cleanses the data if applicable, and moves the cleansed data into a staging
table via a mapping defined in the Hub Console.
Step 3: The load process (Load Job) loads data from the staging table into the
corresponding base object in MDM ORS.
Step 4: The tokenize process (Tokenization Job) generates match tokens based on
match columns that are used subsequently by the match process to identify
candidate base object records for matching.
Step 5: The match process (Match Job) compares two records for points of
similarity. If sufficient points of similarity are found to indicate that the two
records are probably duplicates of each other, then Informatica MDM Hub flags
those records for consolidation.
Step 6: The consolidate process (Merge Job) merges duplicate records into a
single record after duplicate records have been identified in the match
process.
Step 7: The publish or distribution process is the main outbound flow for
Informatica MDM Hub. The Hub integrates with external systems or DB schemas to
share the consolidated (golden) records.
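The flow from landing to golden record can be pictured with a toy Python sketch. It does not use any Informatica API: the function names, the sample records, the name-based match token, and the "later non-empty value wins" merge rule are all simplifying assumptions made for illustration. The landing list stands in for Step 1, and the list returned by `consolidate` is what Step 7 would publish.

```python
from typing import Dict, List

Record = Dict[str, str]


def stage(landing: List[Record]) -> List[Record]:
    """Stage: cleanse landing rows (here: trim and title-case the name)."""
    return [{**r, "name": r["name"].strip().title()} for r in landing]


def load(staging: List[Record], base_object: List[Record]) -> None:
    """Load: copy staged rows into the base object."""
    base_object.extend(dict(r) for r in staging)


def tokenize(record: Record) -> str:
    """Tokenize: derive a crude match token from the match column."""
    return "".join(ch for ch in record["name"].lower() if ch.isalnum())


def match(base_object: List[Record]) -> Dict[str, List[Record]]:
    """Match: group records that share a match token as probable duplicates."""
    groups: Dict[str, List[Record]] = {}
    for record in base_object:
        groups.setdefault(tokenize(record), []).append(record)
    return groups


def consolidate(groups: Dict[str, List[Record]]) -> List[Record]:
    """Consolidate: merge each group into one record; later non-empty values win."""
    golden = []
    for records in groups.values():
        merged: Record = {}
        for record in records:
            merged.update({k: v for k, v in record.items() if v})
        golden.append(merged)
    return golden


# Hypothetical landing data, for illustration only.
landing = [
    {"name": " john smith ", "email": ""},
    {"name": "John Smith", "email": "john@example.com"},
    {"name": "Mary Jones", "email": "mary@example.com"},
]
base_object: List[Record] = []
load(stage(landing), base_object)
golden_records = consolidate(match(base_object))
print(golden_records)
```

In this sample, the two John Smith rows share a match token and are merged into a single golden record, while Mary Jones stays as her own record.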