DronaBlog

Monday, January 28, 2019

Important Informatica MDM Interview Questions and Answers - Part IV



Are you looking for Informatica MDM interview questions and answers? Are you also looking for an explanation of various MDM concepts? If yes, then refer to this article, where we explain various MDM concepts in the form of interview questions and answers. This article will be helpful for your Informatica MDM interview. In this article, we focus on questions and answers about cleanse functions in Informatica MDM.

Q1: What is the use of a cleanse function in MDM?

Answer: The Informatica MDM Hub is used for data enrichment and consolidation. Before data can be enriched, it has to go through cleansing and standardization. Cleansing is the process through which source data is cleaned of nuisance characters or words, invalid data and repeated words. Cleansing also helps to achieve standardization, e.g. converting Limited, Ltd., Lmtd and Lt to the standard word LTD.
In the Informatica MDM Hub, cleansing is performed during the stage process, while data moves from the landing table to the staging table. We need to install and configure the cleanse engine before running the stage job.
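The cleansing and standardization described above can be illustrated with a minimal Python sketch. This is not how the MDM cleanse engine works internally. The function name cleanse_company_name and the suffix mapping are hypothetical. In the Hub, the same rules would typically be configured with built-in cleanse functions or a Cleanse List.

```python
import re

# Hypothetical standardization rules: variants of "Limited" -> "LTD"
SUFFIX_MAP = {"LIMITED": "LTD", "LTD": "LTD", "LMTD": "LTD", "LT": "LTD"}


def cleanse_company_name(raw: str) -> str:
    """Uppercase, strip nuisance characters, drop repeated words and
    standardize legal-suffix variants."""
    # Replace every character that is not a letter, digit or space
    text = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())
    cleaned = []
    for word in text.split():          # split() also trims extra spaces
        word = SUFFIX_MAP.get(word, word)
        if cleaned and cleaned[-1] == word:
            continue                   # skip a word that repeats the previous one
        cleaned.append(word)
    return " ".join(cleaned)


print(cleanse_company_name("  Acme   Widgets Ltd. "))   # ACME WIDGETS LTD
print(cleanse_company_name("acme acme lmtd"))          # ACME LTD
```

Replacing every non-alphanumeric character is deliberately blunt. A real rule set would come from profiling your own source data.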



Q2: Which cleanse functions have you used in your projects so far?

Answer: This is one of the most common questions asked during an Informatica MDM interview. 
  1. To achieve data cleansing, we normally use built-in cleanse functions such as Concatenation, Trim, Uppercase and regular expressions.
  2. For complex operations where IF-ELSE conditions need to be handled, we use the Graph cleanse function. We also use the Cleanse List function to achieve cleansing and standardization.
  3. In several scenarios the built-in cleanse functions do not satisfy the business requirement. In such cases, we build a custom Java cleanse function, e.g. to determine the length of a string or the index of a character in a given string.
The video below explains how to develop a custom Java cleanse function.



Q3: How do you read from a database using a cleanse function?

Answer: The Read Database cleanse function is used to perform a lookup and get values from a database table. When using the Read Database cleanse function, we connect to the database and pass the table name and the column name on which the lookup should be performed. 

Normally, the Read Database cleanse function is used when we need to populate values in the staging table by reading another database table.
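Conceptually, the lookup resembles a parameterized SQL query. The Python sketch below uses an in-memory SQLite database to show the idea. The reference table COUNTRY_REF, its columns and the helper read_database_lookup are hypothetical. The actual Read Database function is configured in the Hub, not written as code.

```python
import sqlite3


def read_database_lookup(conn, table, key_column, value_column, key):
    """Return value_column from the first row where key_column = key,
    or None if no row matches."""
    # Identifiers cannot be bound as SQL parameters, so table and column
    # names are assumed to come from trusted configuration.
    sql = f"SELECT {value_column} FROM {table} WHERE {key_column} = ?"
    row = conn.execute(sql, (key,)).fetchone()
    return row[0] if row else None


# Hypothetical reference table used for the lookup
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COUNTRY_REF (COUNTRY_CODE TEXT, COUNTRY_NAME TEXT)")
conn.executemany("INSERT INTO COUNTRY_REF VALUES (?, ?)",
                 [("CA", "CANADA"), ("IN", "INDIA")])

# Populate a staging value from the lookup
staging_country = read_database_lookup(conn, "COUNTRY_REF",
                                       "COUNTRY_CODE", "COUNTRY_NAME", "IN")
print(staging_country)   # INDIA
```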




Q4: What is the Graph cleanse function and how do you create it?

Answer: The MDM Hub comes with various types of built-in cleanse functions, such as Data Conversion, General Processing, Geographic, Logic Functions, Math Functions, Misc Functions, Noise Functions and String Functions. However, in some business scenarios these built-in functions do not meet the requirements. 

In such cases, we can combine built-in cleanse functions to create our own cleanse function. To create such a function, we use the Graph cleanse function. With a Graph cleanse function, we can implement IF-ELSE or CASE statement scenarios. 
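The following Python sketch mirrors what a Graph cleanse function does. It chains simple functions, similar to the built-in Trim and Uppercase functions, and then branches like an IF-ELSE or CASE statement. The function names and the gender codes are hypothetical examples and are not part of the MDM Hub.

```python
def trim(value: str) -> str:
    """Stand-in for a built-in Trim function."""
    return value.strip()


def uppercase(value: str) -> str:
    """Stand-in for a built-in Uppercase function."""
    return value.upper()


def standardize_gender(raw: str) -> str:
    """Chain the simple functions, then branch like a CASE statement."""
    value = uppercase(trim(raw))
    if value in ("M", "MALE"):
        return "M"
    elif value in ("F", "FEMALE"):
        return "F"
    else:
        return "U"   # unknown or not provided


for sample in [" male ", "Female", "x"]:
    print(repr(sample), "->", standardize_gender(sample))
```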


Q5: Have you created a custom Java cleanse function? If yes, what was the use case?

Answer: The Informatica MDM Hub comes with built-in cleanse functions. We can build custom complex functions by combining these built-in functions to cleanse and standardize business data. In some business cases, neither the built-in cleanse functions nor a custom complex cleanse function satisfies the business needs. In such cases, we need to create a custom Java cleanse function. Informatica MDM provides a Java framework for creating custom Java cleanse functions.

Business use case: Determining the geocode of a given address.
Assume that your business would like to determine the geocode of a given physical address. We have two options here:
a) Buy an Address Doctor license from Informatica and populate the coordinates for each address
b) Build custom logic using the Google Geocoding API, with no extra Informatica license cost

If we choose option b), we do not need to pay Informatica for geocoding addresses (Google's own usage limits and pricing terms still apply). To implement such custom logic, we need to write a custom Java cleanse function.
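The custom function itself would be written in Java using Informatica's cleanse function framework, which is not shown here. To illustrate the request and response handling, the Python sketch below builds a request URL for Google's Geocoding web service and extracts the coordinates from its JSON response. The helper names are hypothetical and the HTTP call itself is omitted. The sample response is made up for illustration. Check Google's current documentation for the exact parameters and terms.

```python
import json
from urllib.parse import urlencode

# JSON endpoint of the Google Geocoding web service
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, api_key: str) -> str:
    """Build the request URL; urlencode escapes spaces and commas."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def parse_coordinates(response_text: str):
    """Return (lat, lng) of the first result, or None if nothing matched."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Made-up response in the shape returned by the service
sample = ('{"status": "OK", "results": '
          '[{"geometry": {"location": {"lat": 1.5, "lng": 2.5}}}]}')
print(build_geocode_url("1 Main St, Springfield", "YOUR_API_KEY"))
print(parse_coordinates(sample))   # (1.5, 2.5)
```

A real implementation would fetch the response (for example with urllib.request.urlopen) and handle error statuses and request quotas.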

The video below provides a detailed explanation of how to build a custom Java cleanse function.






Thursday, December 27, 2018

Important Informatica MDM Interview Questions and Answers - Part III




Are you preparing for an Informatica MDM interview? Are you looking for interview questions and answers about Informatica MDM? If yes, then refer to this article. In this article, we discuss various questions, and their answers, that are commonly asked in Informatica MDM interviews. You can also refer to the previous article - Important Informatica MDM Interview Questions and Answers - Part II

Q 1: Suppose you are running a stage job with delta detection enabled. Delta detection succeeds, but the stage job fails to insert the records into the stage table. How do you handle this issue?

Answer
This is a scenario-based question that an interviewer can ask to check the candidate's knowledge.
In the case of a full data load, if the stage job fails to process the records, we can handle the situation in two ways:
1. Truncate PRL and reload:

  • When we run the stage job, the records from the landing table are compared with the _PRL table and the delta is determined. 
  • If we re-run the stage job after the failure, no delta will be determined, because the _PRL table will already be the same as the landing table. 
  • To fix this, we can truncate the _PRL table and re-run the stage job. 
  • The stage job will take longer, because it processes the whole data set. 
  • Only the delta records will be updated or inserted as part of the load job.
2. Populate PRL table using _RAW table:

  • If RAW retention is enabled, this is the more efficient approach.
  • First, determine the JOB_ROWID of the previous run using the C_REPOS_JOB_CONTROL table.
  • Using that JOB_ROWID, pull all records from the _RAW table and insert them into the _PRL table.
  • Re-run the stage job to process the delta records.
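The Python sketch below simulates both recovery options with plain dictionaries keyed by PKEY_SRC_OBJECT. It is a simplification. The row layout and the helper names rebuild_prl and detect_delta are hypothetical, and the real stage job compares only the columns configured for delta detection.

```python
def detect_delta(landing, prl):
    """Return landing rows that are new or differ from the PRL snapshot."""
    return {key: row for key, row in landing.items() if prl.get(key) != row}


def rebuild_prl(raw_rows, job_rowid):
    """Rebuild the PRL snapshot from the RAW rows written by one job run."""
    return {r["PKEY_SRC_OBJECT"]: r["DATA"] for r in raw_rows
            if r["JOB_ROWID"] == job_rowid}


# RAW retention kept the rows loaded by the last successful run (job 101)
raw_rows = [
    {"JOB_ROWID": 101, "PKEY_SRC_OBJECT": "A", "DATA": "x"},
    {"JOB_ROWID": 101, "PKEY_SRC_OBJECT": "B", "DATA": "y"},
]
landing = {"A": "x", "B": "y2", "C": "z"}   # today's full file

# The failed run already copied landing into PRL, so a re-run finds no delta
print(detect_delta(landing, dict(landing)))               # {}

# Option 1: truncate PRL -> every landing row is treated as delta
print(len(detect_delta(landing, {})))                     # 3

# Option 2: rebuild PRL from RAW for job 101 -> only the real changes
print(detect_delta(landing, rebuild_prl(raw_rows, 101)))  # {'B': 'y2', 'C': 'z'}
```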

The video below provides more insights about the stage and load jobs in Informatica MDM


Q 2: When are the PRL, OPL, RAW and REJ tables created?

Answer:
When we configure the landing and staging tables, the next step is to create the mapping. Once the mappings are created, the Raw Retention and Delta Detection properties become available. The PRL, OPL, RAW and REJ tables are created at the following points:
a. The _REJ table is created when we create the staging table.
b. When we configure the staging table for Raw Retention, the _RAW table associated with the staging table is created.  
c. The _PRL and _OPL tables are created when we configure delta detection for the staging table.




Q 3: What are the causes of record rejection?

Answer:
The _REJ table is associated with the staging table, e.g. if the staging table name is C_STG_CRM_PARTY, then the associated reject table name will be C_STG_CRM_PARTY_REJ.


Reasons for creating the reject table:
1. The reject table stores the records rejected during the stage job and the load job.
2. It improves performance, because a record is rejected as soon as the first reason to reject it is encountered.

Note: If there is more than one reason to reject a record, the reject table describes the first reason that is encountered.

There are several causes for the record to reject during MDM processes. The main reasons or causes for record rejections are as follows:

  • The value of the PKEY_SRC_OBJECT column is null. 
  • The PKEY_SRC_OBJECT column contains a duplicate value. Only one record is processed successfully (the one with the highest SRC_ROWID); the other duplicate records are rejected.
  • The LAST_UPDATE_DATE column contains a future date or is null.
  • The LAST_UPDATE_DATE column contains a date earlier than the year 1900.
  • A unique column contains duplicate values.
  • The HUB_STATE_IND column contains a value other than 1, -1 or 0.
  • A column contains an invalid referential integrity (foreign key) value.
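The Python sketch below shows how a few of these rules could be checked. It returns only the first reason found, as the note above describes. The function names, the row layout and the order of the checks are assumptions for illustration, not Informatica's implementation.

```python
from datetime import datetime

VALID_HUB_STATES = {1, 0, -1}


def first_reject_reason(row, now):
    """Return the first reason the row would be rejected, or None."""
    if row.get("PKEY_SRC_OBJECT") is None:
        return "PKEY_SRC_OBJECT is null"
    last_update = row.get("LAST_UPDATE_DATE")
    if last_update is None or last_update > now:
        return "LAST_UPDATE_DATE is null or in the future"
    if last_update.year < 1900:
        return "LAST_UPDATE_DATE is earlier than 1900"
    # A missing HUB_STATE_IND is treated as active (1) in this sketch
    if row.get("HUB_STATE_IND", 1) not in VALID_HUB_STATES:
        return "HUB_STATE_IND is not 1, 0 or -1"
    return None


def split_duplicates(rows):
    """Keep the row with the highest SRC_ROWID per PKEY_SRC_OBJECT;
    return (kept, rejected)."""
    best = {}
    for row in rows:
        key = row["PKEY_SRC_OBJECT"]
        if key not in best or row["SRC_ROWID"] > best[key]["SRC_ROWID"]:
            best[key] = row
    kept = list(best.values())
    rejected = [row for row in rows if best[row["PKEY_SRC_OBJECT"]] is not row]
    return kept, rejected


now = datetime(2019, 1, 28)
print(first_reject_reason({"PKEY_SRC_OBJECT": None}, now))
print(first_reject_reason({"PKEY_SRC_OBJECT": "A",
                           "LAST_UPDATE_DATE": datetime(1850, 1, 1)}, now))
```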



Q 4: When do the PRL, REJ, STG and RAW tables get cleared or truncated?

Answer:
This is another interesting question an interviewer may ask to check how extensively the candidate has worked with the Informatica MDM tool.

Not all the system tables in the Informatica MDM are truncated. Some of the system tables are truncated during specific processes.
a. The _PRL table gets truncated during each stage job run.
b. The _REJ table never gets truncated during the stage or load job. However, we can truncate it manually, or call the Clean SIF API on the base object table to clean or truncate the _REJ table.
c. The _STG table is truncated during each stage job.
d. The _RAW table never gets truncated during the stage or load job. However, once the retention period is complete, only the unique records from stage jobs run before the retention period are kept in the _RAW table; the remaining records are deleted. The _RAW table also gets truncated when we call the Clean SIF API on the base object table.

Read More: Learn more about how to handle rejected records.

Q 5: Have you used any data quality tool along with Informatica MDM such as Informatica Data Quality?

Answer:
In some projects, Data Quality tools are used. It is not mandatory to have knowledge of, or work experience with, a Data Quality tool. However, knowledge of a Data Quality tool will make your career profile stronger.

So if you have Data Quality experience, then mention it, e.g. you can mention that you used Data Quality to perform data analysis and come up with data standardization rules for Party and Address data.




You can refer to the video below to learn more about Informatica Data Quality.





Sunday, December 23, 2018

Important Informatica MDM Interview Questions and Answers - Part II

In this article, we focus on interview questions related to the Informatica MDM stage table and the delta detection process. Would you also like to know interview questions and answers about the Hard Delete Detection process? If yes, then refer to this article, where we provide detailed questions and answers about Informatica MDM. Here is the link to Important Informatica MDM Interview Questions and Answers - Part I, in case you have not gone through it already.


Q 1: Where do you configure Audit Trail?

Answer:

The audit trail is used to maintain the history of source data. The history can be kept for a specific retention period or for a specific number of job runs. The audit trail is configured at the stage table level. The audit trail option becomes available when we create the mapping between the landing and staging tables. Once the audit trail is configured, the _RAW table associated with the staging table is created.

Read more: Click here to read more about Audit Trail and Delta Detection



Q 2: What is Hard Delete Detection?

Answer:
Hard Delete Detection (HDD) is used to identify records that were physically deleted from the source. There are two types of Hard Delete Detection in Informatica MDM -
a) Direct Delete
b) Consensus Delete

The details of how to configure Hard Delete Detection in Informatica MDM are explained in the video below -



Q 3: What is delta detection? How do you enable delta detection?

Answer:
Delta detection is used to determine new inserts and updates to existing source records in a full data load process. Delta detection is performed on specific columns, which we can configure at the stage table level. The delta detection option becomes available when we create the mapping between the landing and staging tables. To detect the delta, the data from the landing table is compared with the _PRL table, which is created when delta detection is configured.
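The comparison can be sketched in Python as follows. Only the columns configured for delta detection are compared, so a change in any other column does not produce a delta. The row layout, the column names and the helper classify_delta are hypothetical.

```python
def classify_delta(landing_rows, prl_rows, key, delta_columns):
    """Split landing rows into inserts, updates and unchanged rows by
    comparing the configured delta columns with the PRL snapshot."""
    prl_by_key = {row[key]: row for row in prl_rows}
    inserts, updates, unchanged = [], [], []
    for row in landing_rows:
        previous = prl_by_key.get(row[key])
        if previous is None:
            inserts.append(row)                      # key not seen before
        elif any(row[c] != previous[c] for c in delta_columns):
            updates.append(row)                      # a tracked column changed
        else:
            unchanged.append(row)                    # no delta for this row
    return inserts, updates, unchanged


prl = [{"PKEY": "1", "NAME": "ACME", "PHONE": "999"},
       {"PKEY": "2", "NAME": "BETA OLD", "PHONE": "222"}]
landing = [{"PKEY": "1", "NAME": "ACME", "PHONE": "111"},   # only PHONE changed
           {"PKEY": "2", "NAME": "BETA", "PHONE": "222"},
           {"PKEY": "3", "NAME": "GAMMA", "PHONE": "333"}]

inserts, updates, unchanged = classify_delta(landing, prl, "PKEY", ["NAME"])
print([r["PKEY"] for r in inserts], [r["PKEY"] for r in updates],
      [r["PKEY"] for r in unchanged])   # ['3'] ['2'] ['1']
```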

In the figure below, we can see data changes on day 1

The second figure below shows the state of the records in the landing, staging and PRL tables as a result of the delta detection process -





Q 4: What is the full data load and incremental data load?

Answer:
A) Full data load: The source sends a full data file every day to load data into MDM. New inserts and updates to existing records are determined in MDM as part of the delta detection process.

B) Incremental data load: An incremental file from the source is loaded into the MDM landing tables every day. New inserts and updates to existing records are determined outside the MDM process, so MDM delta detection is not required.

Q 5: How to use delta detection with incremental data load?

Answer:
This is a tricky question an interviewer might ask to check whether the candidate really has hands-on project experience.

The answer to this question is - The delta detection only works with full data load and not with incremental data load.

You can learn more about Informatica MDM here.











Wednesday, December 19, 2018

Important Informatica MDM Interview Questions and Answers - Part I


Are you preparing for an Informatica Master Data Management (MDM) interview? Are you also planning to learn MDM concepts? Would you like to know how to prepare for an Informatica MDM interview? If yes, then refer to this article, which provides detailed information about the questions asked during MDM interviews. This article also explains the reasons behind the interview questions. Good luck with your interview!

Q 1: Explain your Informatica MDM experience related to MDM Hub configuration, User Exits, IDD and SIF.

Answer
At the start of the interview, the interviewer may want to know more about your experience and will ask this question. This common question is asked in almost every MDM interview.

You can start by explaining how you started your MDM career, and then describe your experience with each MDM component, such as MDM Hub configuration, User Exits, IDD and SIF. If you do not have experience with a module, or have only a basic idea of it, then say so. A sample answer is as below -
I have more than 5 years of Informatica MDM experience. I have worked on configuring the MDM Hub for landing, staging and base object tables. I have extensive experience in configuring stage table properties such as delta detection, as well as base object properties. I have configured match and merge rules and worked extensively on match and merge job tuning. I have used the Informatica Data Director configuration tool to create IDD applications for data stewards. I also have Core Java knowledge, which I used to develop IDD and MDM Hub User Exits to meet business requirements. In these User Exits, I used the SIF API to connect to the MDM Hub and to fetch as well as update records in the MDM tables.
Important!  The interviewer may ask questions based on your answer to this introductory question.

Q 2: How many sources were present in your last project and what are those?

Answer:
This question is normally followed by several questions that depend on the number of sources configured. So provide the number of source systems you configured in the project. Also, provide the names of the source systems and the kind of data contributed by each source system, e.g. SALES, CRM, HCM etc.



Q 3: How many landing, staging and BO tables were present in your last project?

Answer:
To answer this question, you can provide the details below -
The number of landing, staging and BO tables depends on
a) The data model design
b) The number of source systems configured
You can also mention that the number of staging tables is a multiple of the number of source systems, e.g. if there are 10 BO tables and 3 source systems, then the total number of staging tables is 10 * 3 = 30.

So, if the number of Source systems = 3
The number of landing tables configured = 10
The number of BO tables configured = 10
The number of Staging tables configured = 30 ( 10 * 3)
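You can check the arithmetic with a small Python sketch that generates one staging table per source system and base object. The C_STG_<SOURCE>_<ENTITY> naming follows the C_STG_CRM_PARTY example used in Part III. The entity names here are hypothetical.

```python
from itertools import product


def staging_table_names(source_systems, entities):
    """One staging table for every (source system, base object) pair."""
    return [f"C_STG_{source}_{entity}"
            for source, entity in product(source_systems, entities)]


sources = ["SALES", "CRM", "HCM"]
entities = ["PARTY", "ADDRESS", "PHONE", "EMAIL", "PRODUCT",
            "ACCOUNT", "CONTRACT", "ORDER", "INVOICE", "ROLE"]   # 10 BOs

tables = staging_table_names(sources, entities)
print(len(tables))   # 30
print(tables[:2])    # ['C_STG_SALES_PARTY', 'C_STG_SALES_ADDRESS']
```

In a real project, not every source necessarily feeds every base object, so the actual count can be lower than this product.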

Learn more: About the landing and staging tables.

Q 4: What are the processes involved in the Informatica MDM?

Answer:
Informatica MDM uses several processes to handle data from the sources. The processes involved in Informatica MDM are
a. Landing: Data is pulled from the source systems and pushed into the MDM landing tables.
b. Staging: The landing table data is standardized, cleansed and pushed into the MDM staging tables.
c. Load: Data from the staging tables is loaded into the BO tables.
d. Tokenization: If we configure fuzzy match rules, the tokenization process generates the match tokens.
e. Match: The match process applies the match rules to identify matching records.
f. Merge or Consolidation: The matched records are consolidated during the merge process.

Read more: Click here to learn about Batch Groups in Informatica MDM




Q 5: What is the stage process and what is its significance?

Answer:
The stage process transfers source data to the staging table. 
  • The stage job uses the stage mapping between the landing table and the staging table. 
  • Data standardization and cleansing are performed during the stage process. 
  • If required, database lookups can be performed during the stage job.

You can learn more about stage and load jobs here:









Tuesday, December 18, 2018

Important Python Interview Questions and Answers - Part IV


Are you preparing for a Python interview? Are you looking for information about the data comparison and data modification functions commonly used in Python? Are you also interested in learning Python concepts? If yes, then refer to this article, which is helpful for interview preparation as well as for learning Python concepts. I would also recommend reading the previous article, Important Python Interview Questions and Answers - Part III.

Q 1: What is Python?

Answer:
Python is a programming language with the following features -
  • High-level
  • Interpreted
  • Interactive 
  • Object-oriented 
  • Used for Scripting
  • Highly readable
Python is simple in nature, as it uses English keywords and has fewer syntactic constructions than many other languages.

Q 2: What is the Dictionary data type in the Python language?

Answer
  • The dictionary data type maps keys to values: each key is unique and is associated with one value (values may repeat). 
  • It contains pairs of keys and their associated values.
For example, 
region_country = {
    'NA': 'CANADA',
    'LA': 'PERU',
    'APAC': 'INDIA'
}

In the example above, the region values NA, LA and APAC are the keys of the dictionary. The country values CANADA, PERU and INDIA are the values associated with those keys.
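The example below shows a few common dictionary operations on the same region_country dictionary. The extra 'EMEA' entry is only for illustration.

```python
region_country = {'NA': 'CANADA', 'LA': 'PERU', 'APAC': 'INDIA'}

# Look up a value by its key (raises KeyError if the key is missing)
print(region_country['APAC'])               # INDIA

# get() returns a default value instead of raising KeyError
print(region_country.get('EMEA', 'N/A'))    # N/A

# Assigning to a key adds a new pair or replaces the existing value
region_country['EMEA'] = 'UK'

# Iterate over key-value pairs (insertion order is kept in Python 3.7+)
for region, country in region_country.items():
    print(region, country)
```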



Q 3: How do you compare two lists with each other? Is there any way to determine the length of a list?

Answer:
A list in Python is an ordered, mutable data structure, similar to an array in other languages. We can use a list to store values during program execution, and we can read, update or append values in it. Lists come with several built-in utilities that are helpful during Python programming.

a) Assume that we have two lists as below:
country_list = ["US", "INDIA", "UK"]
region_list = ["NA", "LA", "APAC", "EMEA"]

In Python 2, we can compare the elements of two lists by using the cmp function, which returns -1, 0 or 1:

cmp(country_list, region_list)

The cmp function was removed in Python 3. There, use the comparison operators instead, e.g. country_list == region_list.

b) The size or length of a list is useful in many scenarios, such as iterating over the list values based on the size of the list. The built-in len function gives the total length of a list.

len(country_list)
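In Python 3, the same checks can be written as follows. The helper cmp_lists is a common replacement idiom for the removed cmp function, not a built-in.

```python
country_list = ["US", "INDIA", "UK"]
region_list = ["NA", "LA", "APAC", "EMEA"]


def cmp_lists(a, b):
    """Python 3 stand-in for Python 2's cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)


# Lists compare element by element, in order
print(country_list == region_list)           # False
print(cmp_lists(country_list, region_list))  # 1, because "US" > "NA"

# Compare contents while ignoring order
print(sorted(["UK", "US"]) == sorted(["US", "UK"]))  # True

# Length of a list
print(len(country_list))                     # 3
```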






Q 4: How will you convert the String value to an Object, a Tuple and a List? How will you convert the object value to the String value?

Answer:
When we develop an application with any technology, we need to interact with many interfaces that work on different data types. In some cases, we need to convert data to a different data type, and converting a string to another data type is no exception. Python provides built-in functions for this.

a) Convert an object to a String
Use the str function to convert an object value to a String value:
str(x) 

b) Convert a String to an object
Use the eval function to evaluate a String as a Python expression and get the resulting object. Because eval executes arbitrary code, use it only on trusted input:
eval(s) 

c) Convert a string to a tuple
Use the tuple function to convert a String value to a tuple of its characters:
tuple(s)

d) Convert a string to a list
Use the list function to convert a String value to a list of its characters:
list(s) 
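You can try these conversions with the short Python example below. It uses ast.literal_eval from the standard library as a safer alternative to eval for turning a string into an object, because literal_eval accepts only Python literals.

```python
import ast

s = str(42)                                  # '42'

# eval() would execute any expression; literal_eval() only parses literals
obj = ast.literal_eval("{'NA': 'CANADA'}")   # {'NA': 'CANADA'}
number = ast.literal_eval("3.5")             # 3.5

chars_tuple = tuple("abc")                   # ('a', 'b', 'c')
chars_list = list("abc")                     # ['a', 'b', 'c']

# To split a string into words instead of characters, use split()
words = "US INDIA UK".split()                # ['US', 'INDIA', 'UK']
```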

Q 5: What is the use of 'is' operator in Python language?

Answer:
The 'is' operator evaluates
a) to True if the variables on either side of the operator refer to the same object
b) to False if the variables on either side of the operator refer to different objects

e.g. x is y

Here, 'is' returns True only if x and y refer to the same object. Note that 'is' checks identity, not equality; use == to check whether two objects have equal values.
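The difference between identity ('is') and equality ('==') is easiest to see with two lists that have the same contents:

```python
x = [1, 2, 3]
y = x            # y refers to the same list object as x
z = [1, 2, 3]    # z is a different object with equal contents

print(x is y)      # True: same object
print(x is z)      # False: different objects
print(x == z)      # True: equal values
print(x is not z)  # True

# A common idiom: compare with None using 'is'
value = None
print(value is None)   # True
```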




To practice Python, you can install an IDE such as PyCharm. The video below explains how to install the PyCharm tool.









Monday, December 17, 2018

Unix Interview Questions and Answers - Part I


Are you preparing for a Unix interview? Unix is one of the major operating systems on which enterprise applications run. Because of this, interviewers ask Unix-related questions in interviews for technologies such as Java, Hadoop and Python. Are you planning to interview for a support project for which Unix knowledge is required? If yes, then refer to this article, as it provides detailed questions and answers about Unix.

Q 1: What is a process in Unix, and what types of processes exist in Unix?

Answer
A process is an instance of a program running in an Operating System. Normally, the process is started when a program is initiated.

Types of Processes: There are two types of processes
a) Foreground Processes
b) Background Processes


Q 2: What is the command to list the directories and the files?

Answer:
This is a very basic question asked during an interview to understand whether the candidate has basic knowledge of Unix.

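The command below is used to list the directories and the files in the current location -
$ ls -ltr

ls : To list files
ls -l : To list files with additional information (permissions, owner, size, modification time)
ls -t : To sort the list by modification time, newest first
ls -r : To reverse the sort order, so with -t the newest files are listed last

Every line in the long listing begins with a character that indicates the type of the entry: d for a directory, - for a regular file and l for a symbolic link.

e.g.
drwxrwxr-x 3 abc abc 16   Jan 10 2018 temp_dir
-rw-rw-r-- 2 abc abc 4028 Mar 5  17:12 test.sh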
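The file-type character can also be derived programmatically. The Python sketch below uses the standard os and stat modules. The helper name file_type_char is hypothetical. It maps only the three types discussed above; other types such as devices, pipes and sockets return '?'.

```python
import os
import stat


def file_type_char(path: str) -> str:
    """Return 'd', 'l' or '-' as in the first column of 'ls -l'."""
    mode = os.lstat(path).st_mode   # lstat does not follow symbolic links
    if stat.S_ISDIR(mode):
        return "d"
    if stat.S_ISLNK(mode):
        return "l"
    if stat.S_ISREG(mode):
        return "-"
    return "?"                      # device, pipe, socket, ...


# stat.filemode() renders the full string, e.g. 'drwxrwxr-x'
print(file_type_char("."), stat.filemode(os.lstat(".").st_mode))
```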

Learn more: Click here to learn more about ls command.

Q 3: What are the differences between Zombie and Orphan processes?

Answer: The differences between Zombie and Orphan processes are listed below -

Zombie Process | Orphan Process
A zombie process has been killed or has completed execution, but still has an entry in the process table. | An orphan process is a child process that keeps running even after its parent process has completed or been terminated.
A zombie only occupies an entry in the process table. | An orphan process still uses memory and other resources.
A zombie that exists for a long time indicates an issue in the parent program, which has not collected the child's exit status. | An orphan process is often created unknowingly and unintentionally, e.g. when the parent process crashes.
A zombie process is shown with the Z state. | An orphan process is adopted by the init process (PID 1), which reaps it when it terminates.
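On Linux, you can observe a zombie with a short Python experiment. The parent forks a child that exits immediately and delays calling os.waitpid(), so the child's entry stays in the process table in the Z state until it is reaped. This sketch is Linux-only because it reads /proc/<pid>/stat. The helper name observe_zombie_state is hypothetical.

```python
import os
import time


def observe_zombie_state() -> str:
    """Fork a child that exits at once and return its process state
    before the parent reaps it (a zombie shows 'Z'). Linux-only."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                  # child: terminate immediately
    try:
        time.sleep(0.5)              # parent: give the child time to exit
        with open(f"/proc/{pid}/stat") as stat_file:
            # Format is "pid (comm) state ..."; comm may contain spaces,
            # so take the first field after the last ')'
            state = stat_file.read().rsplit(")", 1)[1].split()[0]
    finally:
        os.waitpid(pid, 0)           # reap the child: its table entry is removed
    return state


print(observe_zombie_state())
```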


Read more: Click here to know more details about Zombie, Orphan and Daemon processes.


Q 4: How to run the process in background and foreground?

Answer: Follow the steps below to run a process in the
A. Background:
To run any process in the background, add the '&' sign at the end of the command. The '&' sign tells the Unix shell to run the given process in the background, so that we can continue to use the command prompt. 

An example of running a process in the background:

sh run_calculate_interest.sh &

B. Foreground:
If a process is running in the background and you would like to bring it to the foreground, use the steps below -
a)  Get the job id of the process
$ jobs -l

The output will look like this (the job id is in brackets, followed by the process id):
[1]   7095 Running                 sh run_calculate_interest.sh &
[2]   7206 Running                 run_app &

b) Use the job id to bring the job to the foreground
$ fg %1

Learn more: Unix tutorial 




Q 5: Assume that one of the processes has been running for more than 24 hours. How would you identify such a process and kill it?

Answer: Sometimes the interviewer builds a complex scenario around a simple answer, so do not get confused by the description of the question. The underlying principle for killing the process remains the same. By asking this question, the interviewer is checking how extensively you have worked on Unix systems.

To identify long-running processes, we can use commands such as top, or jobs for processes started from the current shell. We can also use the 'ps -ef' command, whose STIME column shows when each process started.

The above commands provide the process ids (or job ids) of running processes.

Use the command below to kill the process-
$ kill -9 PID

Here, PID is the process id. The -9 option sends SIGKILL, which the process cannot catch or ignore; it is good practice to try a plain kill PID (SIGTERM) first so that the process can shut down cleanly.
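You can script this check. The Python sketch below asks ps for each process's elapsed running time in seconds and returns the PIDs older than 24 hours. The etimes output field is supported by the procps ps found on Linux but may not exist on other Unix flavors. The helper names are hypothetical. The sketch only lists PIDs; killing them is left as a deliberate manual step.

```python
import subprocess

ONE_DAY_SECONDS = 24 * 60 * 60


def long_running_pids(ps_output: str, threshold: int = ONE_DAY_SECONDS):
    """Parse 'ps -eo pid,etimes' output and return PIDs whose elapsed
    time in seconds exceeds the threshold."""
    pids = []
    for line in ps_output.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) >= 2 and int(fields[1]) > threshold:
            pids.append(int(fields[0]))
    return pids


def current_long_running_pids():
    """Run ps on this machine (Linux procps) and parse its output."""
    result = subprocess.run(["ps", "-eo", "pid,etimes"],
                            capture_output=True, text=True, check=True)
    return long_running_pids(result.stdout)


sample = "    PID ELAPSED\n      1  200000\n   4242      75\n"
print(long_running_pids(sample))   # [1]
```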









Thursday, December 13, 2018

What are the features of Python Programming

Are you interested in knowing the features of the Python language? Would you like to start programming in Python? If so, then you have reached the right place. This article provides detailed information about the advantages of Python programming.

Python Advantages:

  • Easy to learn
  • Object Oriented Programming
  • Portable (Cross-platform)
  • Expressive (Understandable and Readable)
  • Interpreted
  • GUI Programming
  • Free and Open source

1. Easy to learn:

Python is easy to learn compared to other programming languages such as Java, .NET or C++. Python syntax is simple and easy to read and write. Beginners can easily understand concepts such as functions, data structures and expressions. No extensive set of software is required to learn and work with Python. A simple editor such as Notepad, or a development IDE such as PyCharm, is enough to start with Python. If you are learning a programming language for the first time, then Python is the best language to start with. Along with its simplicity, Python is also a strong and robust language that can be used to develop enterprise applications.

2. Object Oriented Programming

Python is an Object Oriented language, i.e. the main features of Object Oriented programming, such as encapsulation, inheritance, abstraction and polymorphism, are supported in Python. In Python, a class acts as the blueprint and is the model for its objects. Every real-world entity, such as human beings, trees and non-living things, can be represented as an object.
e.g. In the example below, a class for 'Cat' is written in Python.

class Cat:

    def __init__(self, cat_name, cat_age):
        # Store the constructor arguments as instance attributes
        self.cat_name = cat_name
        self.cat_age = cat_age



3. Portable (Cross-Platform)

Python is a portable language, also known as a cross-platform language. This means we can run the same program on different operating systems, such as Windows, Mac or Unix, as long as a Python interpreter is installed on them. We do not need to write a separate program for each operating system. However, some features depend on the operating system, and such features need to be handled properly if your program is going to run on multiple systems. 
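The Python sketch below shows a common way to keep such operating-system differences contained. pathlib builds paths with the right separator, and platform.system() lets the program branch where behavior genuinely differs. The helper names are hypothetical.

```python
import platform
from pathlib import Path


def log_file_path(file_name: str) -> Path:
    """Build a path under the user's home directory without hard-coding
    '/' or '\\' separators."""
    return Path.home() / "logs" / file_name


def clear_screen_command() -> str:
    """The console command differs on Windows, so choose it at run time."""
    return "cls" if platform.system() == "Windows" else "clear"


print(log_file_path("app.log"))
print(clear_screen_command())
```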

4. Understandable and Readable

The range of ideas that can be expressed concisely in Python is wide compared to many other programming languages. Python provides features with which we can build business functionality extensively, and some of these features are not as easy to use in other languages. Python code is easy to understand and read, which helps us build better and more extensive functionality. Hence, Python is an expressive language, and this is one of its great features.

5. Interpreted language

Python is an interpreted language. We run Python source code directly with the interpreter, without the separate compile step that Java requires, where the complete code is first compiled with javac and then executed. Internally, the CPython interpreter translates the source code into bytecode, and the Python virtual machine executes that bytecode instruction by instruction.
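You can see the compile-to-bytecode step from Python itself. The example below uses the built-in compile() and the standard dis module. The exact bytecode listing differs between Python versions, so no output is shown for it.

```python
import dis

source = "total = price * quantity"

# The interpreter first compiles source text into a code object (bytecode)
code = compile(source, "<example>", "exec")

# dis prints the bytecode instructions the Python virtual machine will run
dis.dis(code)

# Executing the code object runs those instructions
namespace = {"price": 2, "quantity": 3}
exec(code, namespace)
print(namespace["total"])   # 6
```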



6. GUI Programming

Python provides GUI programming frameworks with which we can develop user interfaces. For example, the Tkinter framework, which is included in the Python standard library, can be used to build user interfaces in Python. 

7. Free and Open Source

Python is open source and freely available, so no licensing cost is involved. Python and third-party modules are freely available for download over the internet. The Python community provides a large number of useful utilities in the form of Python modules.




The video below provides detailed information about Python language -






