DronaBlog

Wednesday, July 7, 2021

How to achieve better stage job performance in Informatica MDM

      Are you working on MDM and want to understand its different fields, then this is the right place in this article we are going to know about what are the product recommendation, thread setting properties, and database recommendations of MDM stage job performance


 A. Thread setting for MDM Stage Job Performance

       In this article, we will see what are the different reasons for issues and their solutions regarding stage job performance.

     1) Post/ Pre Stage UE

          The reason for this issue is it we are going to write the query regarding inserting or updating and record, it will create locks and because of this it slows down the performance to avoid this there is a need to put logger statement in UE  code. If doable rerun the jobs by removing the UE code.





     2) Cleanse Function 

            Each in every record will have to go through a cleansing process so here need to check that if any cleanse function is taking more time to process.

           Depends on which type of cleanse function you are using there is a need to check network latency also we need to check of IDQ  end.

     3) Directory 

            The main reason for the directory issue is having a shared directory within 2 different instances of the process server.

           To avoid file locks each process should have its own directory.

     4) Tables  

            The system will reduce the performance if the tables like RAW, REJ, C_REPOS_JOB_CONTROL table contain a huge amount of data.

           We can keep the important data and archive old data. 

   5) Log level

           We will have to keep the logging level to INFO.


B. Thread setting for MDM Stage Job Performance

          The configuration that can be updated :

       1) Threads for cleanse operation (HUB Console)

          Max value = ( Number of CPU cores -1 )

        2) In cmxcleanse properties

            (cmx.server.cleanse.min_size_for_distribution)

            Default size is 1000





        3) com.informatica.mdm.batchcontroller.Batchjob.min_rec_for_multithreading

            The default size is 1000. We can decrease the size if multithreading is enabled and no. of records are lesser than 1000.


C. DB Recommendations for MDM Stage Job Performance.

        1) collect AWR reports.

        2) For the DB performance, collect TESTIO results


D. Appserver Recommendation for MDM Stage Job Performance

        1) when we run the processing server and DB server jobs, check that the CPU is going high.

        2) Have to check basic java arguments such as - xmx value.

           


 

Monday, July 5, 2021

What is Business Entity Services in Informatica MDM?

    Are you looking for details about the Business Entity Services aka BES in the Informatica MDM? Are you also interested in knowing how to use Business Entity Services in your project? If so, then you reached the right place. In this article, we will explore more about Business Entity Services. Let's start.


A) What are Business Entity Services in MDM?

         Informatica MDM is used to master the data from various source systems. We onboard the data from sources to the MDM landing table. The data is moved to the staging table from the MDM landing table and then to the base object tables by executing stage and load jobs. Once we consolidate the data in the MDM, it will be ready to consume by consuming systems. Consuming can consume a golden copy of the records by consuming ETL or JMS Queue or Business Entity Services. Business Entity Services are web services that can be used for real-time integration of external systems with Informatica MDM.

B) What are operations can be performed using Business Entity Services?

      The various operations such as Read, Write, the search can be performed using Informatica BES.

      1) Read BE Business Entity Service-

          It is used to read the data from the base object. the result does not include soft-deleted records.

      2) Write BE Business Entity Service- 

            It is used to create, update and delete parent and child business entity elements. we cannot perform trust override with this service.

      3) Search BE Business Entity Service-

            It is used to root record in the business entity. We can use match operation while searching records to achieve fuzzy searches.

C) Understanding Business Entity Services Architecture 

        Business Entity Services is an integral part of Informatica MDM and it is available as wed service. The consuming system can write web service consumer and consume BES, for reading and write operation. Once the operation is invoked when the data is searched against the base object or updated against the base object.



D) What are the REST Methods Supported 

      Here is the list of HTTP methods supported in Business Entity Services

           GET - used to retrieve details about record, task, or a file 

           POST - used to create task or record 

           PUT -  used to update root, child records, or a file.

          PATCH - used to update task partially 

          DELETE - used to delete records or file


Learn more about Business Entity Services here -  



Sunday, June 27, 2021

What is Common Table Expression (CTE) in Snowflake?

    Are you working on Snowflake technology and would like to understand Common Table Expression also known as CTE? If so, then you reached the right place. In this article, we will understand what is Common Table Expression. We will also explore how to use CTE in queries.

A) What is Common Table Expression i.e CTE?

      A Common Table Expression (CTE) is a named subquery. It is defined in a WITH Clause. It is equivalent to a temporary view.

      The  CTE Contains an optional list of columns and a SELECT Statment. The output of CTE is a table with a column defined.





B) What is the syntax for CTE?

     The syntax for CTE is as below

      WITH

            TEMP_CTE (COL_1 ,COL_2) AS

              (SELECT col_nm1 , col_nm2 From TABLE)

     SELECT COL_1 , COL_2 FROM TEMP_CTE

C) What is recursive CTE?

      A recursive CTE is a CTE that references itself. In a recursive CTE, we can join Table to itself as many times as needed.


D) Benefits of CTE?

      1) CTE increases the modularity of the SQL programs

       2) CTE is helpful for simplified maintenance

       3) Recursive CTE can be used to process hierarchical data in the table.





E) Naming conventions for CTE?

      We need to avoid CTE names that match with database views or table the reason for this is - if we define query with CTE names then CTE takes precedence over table or view names which will produce unexpected results.

         For more details about snowflake refer to this video  



Tuesday, June 22, 2021

When Tokenization job gets executed in Informatica MDM?

Are you looking for the details about when the tokenization job executed in Information? Would you be interested in knowing the relationship between a match job and a tokenization job? If so, then you reached the right place. In this article we will explore the instances during which tokenization job gets triggered:

A. Why we need a tokenization process in MDM?
          Tokenization is a process that generates the token-based fuzzy match key and other match column configuration. The tokens generated are stored in the table <BO> STRP. This STRP table is used as input to match the process. The match process matches the records based on these tokens.





B. When the tokenization process executed
           The tokenization process gets triggered at various points and These are:
          1)  Manual tokenization execution
          2)  During the match process
          3)  During the load process

1) Manual tokenization job execution
       We can execute the tokenization job in Informatica MDM manually whenever it is needed. It is recommended that tokenization need to executed separately instead of in conjunction with load or match job.





2)  Tokenization process as part of match job
       If it is required to update the match token, the match process in Informatica automatically starts the match job's tokenization process. This scenario may happen if new records have been added or updated existing records.


3) Tokenization process during load job.
         If we enable the `Generate Match Tokens on Load' property on the base object then, when records loaded in the base object at that time the tokenization job will be triggered. It is not recommended to enable this property as it will adversely impact load job.






Learn more about the tokenization process here-


Friday, June 18, 2021

How to Use LIKE operator in Snowflake?

   Are you looking for details of what is collate in Snowflake? Are you also interested in knowing what is the purpose of collate and how to use it? If so, then you reached the right place. We will understand how to use collate to achieve LIKE functionality in Snowflake. In this article, we will more about collate or collation in Snowflake.




A) What is collate?

          The collate function in Snowflake allows specifying alternative rules for comparing strings.

B) What is the purpose of collate in Snowflake?

        The collate function in Snowflake is used to compare and sort the data. The comparison and sorting will be based on a particular language or other user-specified rules.

         The text strings in Snowflake are stored using UTF-8 character set. Comparing based on Unicode will not provide the desired output because of the following reasons :

          1. The special character in a language does no sort based on the language standards.

          2. In case we would like to achieve sorting based on special rules .e.g case insensitive sort.


C) What type of rules can be used with collate in Snowflake

     Here is a list of rules that can be used with collate 

     1. Different character sets for different language

     2. To achieve case insensitive comparisons 

    3.  Accent sensitivity e.g a,á,ä

    4 . Punctuation sensitivity e.g P-Q-R and PQR

    5. Sorting based on the first letter in the strings.

    6. Trimming leading and trailing spaces and then sorting

     7. Other options can be implemented based on business needs.





D) Where to use collate in Snowflake SQL?

       1. Simple comparison in where clause

              WHERE  FIELD1= FIELD2

        2. Join condition

              ON  EMP. EMP_NM =MANAGER.MNGR_NM

         3. Sorting condition

               ORDER BY FIELD 1

         4. Aggregation condition

                GROUP BY

          5. Aggregate functions

                  MAX ( FIELD1)

          6. Scaler functions

                 LEAST (FIELD1, FIELD2, FIELD3)

         7. Data clustering conditions

                CLUSTER BY (FIELD1)

     There are other several usages of collate in SQL, however above mentioned are commonly used.


E) How to use collate with LIKE operator

      here is an example of collate using like operation

     SELECT * FROM EMP WHERE COLLATE (NAME,") Like%ABC%


Learn more about snowflake here -



Wednesday, May 19, 2021

Things to consider while configuring snowflake account - part 2

In the previous article on Things to consider while configuring Snowflake Account - part I, we understood two important factors i.e Data retention and timezone. In this article, we will see security, cost savings, and connection performance. Let's start.






A] Security -

        The security requirement for Snowflake has two parts.

a) Storage integration

        We should not create external stages without storage integration. A Storage integration is secure means of creating connectivity between snowflake and cloud. storage provider. We need to set the below properties to `true ' so that we can prevent the exposure of access tokens or secret keys to snowflake users, the properties are 

  •  REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_CREATION 
  •  REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_OPERATION

b) Network Policies.

        It is important to prevent access from unwanted networks to the snowflake account. The network policies help to define a list of valid network locations for user connection.

     We can configure account level and user-level network policies.

        NETWORK_POLICY is a parameter associated with network policies.





B] Connection Performance

      In order achieve better connection performance we need to set CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX to value `TRUE'. This helps to reduce the amount of information used for JDBC and ODBC connections.


C] Cost Savings.

      Cost  of Snowflake implementation is dependent upon two factors 1) Users  2) Warehouses

1) User: In the snowflake account there are system users and human users. Set different configurations for each of these. Below are helpful parameters for users - 

  •  LOCK_TIMEOUT
  •  STATEMENT_TIMEOUT_IN_SECONDS
  •  STATEMENT_QUEUED_TIMEOUT_IN_SECONDS


2) Warehouse: The optimum use of Snowflake warehouses can be achieved by resource monitoring, setting the appropriate warehouse size, and setting proper cluster count

      The mentioned below are parameters of Warehouse

  • RESOURCE MONITOR
  • AUTO_SUSPEND 
  • WAREHOUSE_ SIZE 
  •  MIN_CLUSTER_COUNT


 Learn more about snowflake here-



Informatica MDM match job and Tokenization job

       Are you interested in knowing what is the relationship between the Informatica MDM match and tokenization job? Would you also like to know what all conditions needed in order to tokenization job as part of match job? If so, then you reached the right place. In this article on Informatica MDM, we will learn match and tokenization job relationship.

A] What is the tokenization job in Informatica MDM?

           Tokenization is a process in Informatica during which match columns are used to generate tokens. These are based on the type of fuzzy match key column, another match column, and type of data present in the system 

          An example of the token is X01Z530K. It is an alphanumeric value for a given record.





B] What is a match job in Informatica MDM?

      By using the match process the record is matched in order to consolidate the record from various source systems. The match process uses tokens that are generated as part of the tokenization job.

       Once records are matched then those are stored in <BO Table>_MTCH table. The merge or consolidation process uses the MTCH table to consolidate the records.

C] Dose the match job always generate Match tokens?

        The answer is No. The match job will not always generate match tokens automatically. There various factors as part of a match job.





D]  What are factors are needed in order to generate tokens as part of the match job automatically?

         The following conditions are required to be there in order to generate a match token when we execute the match job.

     1. The base object should be fuzzy match enabled.

     2. The important ' Fuzzy Match Key ' column must be defined in match columns.

     3. Match ruleset should have at least on Fuzzy match rule.

     4. If generate Match Tokens job already executed then match job will not generate tokens.

     5. If the value of Dirty Indicator column value is 0 ( zero) for all the records then match job will not generate tokens

More details about the tokenization job are explained in the video -







Understanding Survivorship in Informatica IDMC - Customer 360 SaaS

  In Informatica IDMC - Customer 360 SaaS, survivorship is a critical concept that determines which data from multiple sources should be ret...