DronaBlog

Friday, July 17, 2020

What is Build Match Group (BMG) in Informatica MDM?

Are you looking for details about the Build Match Group (BMG) process used in Informatica MDM? Would you also like to know when the BMG process gets executed, and how to control this behavior? If so, you have reached the right place. In this article, we will discuss the BMG process in detail.



What is the Build Match Group (BMG) Process?

The Build Match Group (BMG) process removes redundant matching records from the match set prior to the consolidation process. It is an important part of the matching process and plays a vital role in Informatica MDM jobs.

How does the Build Match Group process remove records?

Assume that the BMG indicator is on. In that case, when we run a match job, it removes one of the symmetric matches from the manual match pairs.
For example, consider the records below:
Pair 1: 'Bob Paul' is matched with 'Robert Paul' by match rule number 3
Pair 2: 'Robert Paul' is matched with 'Bob Paul' by match rule number 5
These two pairs are symmetric: they link the same two records in opposite directions.

The automerge_ind is set to 1 for matching pairs when the records are matched through auto-merge rules. If all the records are matched by manual match rules, the BMG process takes effect. If some records are matched by an auto-merge rule and others by a manual merge rule, one of the symmetric match entries is removed from the match table.
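The idea of dropping one entry of a symmetric pair can be sketched in plain Java. This is only an illustration and not how Informatica MDM implements BMG: the MatchPair class, the method name, and the choice to keep the first pair seen are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SymmetricMatchSketch {

    // Hypothetical stand-in for one row of a match table
    static final class MatchPair {
        final String record;
        final String matchedRecord;
        final int ruleNumber;

        MatchPair(String record, String matchedRecord, int ruleNumber) {
            this.record = record;
            this.matchedRecord = matchedRecord;
            this.ruleNumber = ruleNumber;
        }

        @Override
        public String toString() {
            return record + " -> " + matchedRecord + " (rule " + ruleNumber + ")";
        }
    }

    // Keeps the first pair seen for each unordered pair of records
    // and drops the mirrored entry (assumption: which one survives is arbitrary here)
    static List<MatchPair> removeSymmetricPairs(List<MatchPair> pairs) {
        Set<String> seen = new HashSet<>();
        List<MatchPair> result = new ArrayList<>();
        for (MatchPair p : pairs) {
            // Order the two values so (A, B) and (B, A) produce the same key
            String a = p.record;
            String b = p.matchedRecord;
            String key = a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
            if (seen.add(key)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<MatchPair> pairs = new ArrayList<>();
        pairs.add(new MatchPair("Bob Paul", "Robert Paul", 3));
        pairs.add(new MatchPair("Robert Paul", "Bob Paul", 5));
        System.out.println(removeSymmetricPairs(pairs));
    }
}
```

With the two pairs from the example, only one entry remains after the call, which mirrors the effect described above.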

When does the BMG process get executed?

There are two jobs during which the BMG process is executed.
1. During Match Job: The BMG process is triggered during the match process if we enable the 'BMG on match indicator' property.
2. During Merge Job: The BMG process always gets executed during the merge job. There is no option to turn it ON or OFF during the merge job.

What is the impact of the BMG process on manual match records?

The BMG process has no impact on manually matched records. The BMG process is only applicable to auto-merge jobs, i.e. when AUTOMERGE_IND is 1 in the <BASE_OBJECT>_MTCH table, and we also need to enable the Base Object for the BMG process.




How to enable the Base Object for the BMG process?

In order to enable the Base Object for the BMG process, we need to update the BMG_ON_MATCH_IND field in the C_REPOS_TABLE table. If the value of BMG_ON_MATCH_IND is 1, BMG is ON for the given table; if the value is 0, BMG is OFF.

Here is a sample SQL statement to update this field:
update C_REPOS_TABLE set BMG_ON_MATCH_IND=1 where table_name='<TABLE_NAME>'

Important note: Restart the application server and clear the cache after making the above change.






Thursday, July 9, 2020

Best Practices for Elastic Search in Informatica MDM

Elastic Search, a search engine based on the Lucene library, is used in Informatica MDM to provide Google-like free text search as well as fuzzy search similar to the match engine search. In this article, we will go through the best practices that we need to follow in order to implement Elastic Search successfully with the Informatica MDM solution.



Introduction

It is vital to follow best practices while integrating Elastic Search with Informatica MDM. Even a minor configuration mistake may lead to an expensive performance cost. The best practices provided here help to achieve not only better performance but also better search results.

Elastic Search Best Practices

Here are the details of the best practices:

1. Indexing Job Execution
If we enable searchable properties for Base Object tables, including lookup tables, we need to run the indexing job for the lookup tables first, followed by the indexing jobs for the remaining Base Object tables.

2. Indexing Job execution for all tables
If we have configured the Searchable property for parent and child tables, e.g. the Party table, the Party Phone table, etc., we need to run an indexing job for all of these tables. First run the indexing job for the Party table, and then run the jobs for the child tables.

3. Facets configuration
Facets are used for pre-emptive grouping of records. We should use a limited number of facet fields, as facets have an adverse impact on the performance of the search functionality. We also have to make sure that the fields configured as facets have low entropy, i.e. a small set of unique values.

4. Unused Business Entities
If there are unused Business Entities with searchable properties, delete them, as they cause performance issues for indexing and load jobs.

5. Index Auto commit property
We need to increase the value of the auto-commit property and keep it at an optimum value for the environment configuration. The property es.index.refresh.interval can be used to set it.

6. Indexing jobs in parallel
We should avoid running indexing jobs in parallel, as that may cause resource exhaustion.

7. Running load jobs in parallel
If we have configured the searchable property on multiple tables, such as the Party and Address tables, do not run load jobs for these tables in parallel. The indexing job gets executed during the load job, so parallel loads may exhaust resources and cause the jobs to fail.

8. Deleting indexes
The CleanTable API does not delete the indexes. If we need to delete them, we have to do it manually, using the curl command to call the Elastic Search APIs. As of now, there is no Informatica API to handle this use case.

9. Limiting the number of searchable fields for Business Entities
There are limits on how many searchable fields we should use for an Elastic Search document. By default, Elastic Search allows 50 nested fields. Apart from that, there is a limit on the amount of data in an Elastic Search REST call: 104857600 bytes (100 MB). So make sure that only a small number of searchable columns are configured for the Business Entities.
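For the index deletion mentioned in point 8, Elastic Search exposes a delete index API of the form DELETE /<index-name>. The article mentions curl; the same call can be sketched in Java with the JDK HTTP client. The host, port, and index name below are placeholders (assumptions), and a secured cluster would additionally need credentials, which this sketch leaves out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DeleteIndexSketch {

    // Builds a DELETE request for a single index (DELETE <baseUrl>/<indexName>)
    static HttpRequest buildDeleteRequest(String baseUrl, String indexName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName))
                .DELETE()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values (assumptions): use your own Elastic Search URL and index name
        HttpRequest request = buildDeleteRequest("http://localhost:9200", "my_index_name");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Deleting an index is irreversible, so it is worth double-checking the index name before running anything like this against a real environment.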







Wednesday, July 8, 2020

What are the Types of Data Model?


Are you looking for an article that explains the different types of data models? Are you also interested in knowing which data model to choose for your business? If so, you have reached the right place. In this article, we will go through the different types of data models that are used in Enterprise applications.



What is a Data Model?

Data Modeling is the process in which data is analyzed and defined based on the requirements in order to support the business processes. The output of the data modeling process is a data model.

What are the Types of Data Model?

There are three types of data models:

  1. Conceptual Data Models: A high-level data model used for static business structures and concepts.
  2. Logical Data Models: This data model elaborates entity types, data attributes, and relationships between entities.
  3. Physical Data Models: This type of data model provides detailed information such as data type, length, relationships, referential integrity, etc.


Selection of a data model

Each data model has its own purpose. Depending on the purpose, we need to decide which model is suitable for the business use case.

For example, if we are planning to explain to business users how business attributes are structured in the application, the Logical Data Model will be a better choice.

If we are planning to integrate multiple systems and would like to explain how the systems work as a whole, the Conceptual Data Model will be helpful, as it gives a high-level view of the underlying system.

If the use case is to develop a database based on the data model, we need detailed information about each and every component of the data model. In this case, the Physical Data Model is best suited. Normally, developers use the Physical Data Model to configure database objects.






Tuesday, July 7, 2020

How to Set Up a Task Reminder with a Popup using Java on Windows

In this article, we will see how to set up a task reminder with a popup message on Windows. Some basic understanding of Java and Windows shell scripting is needed.





Step 1: Set up a Java project

Use any IDE, e.g. Eclipse, to create a Java project. The code in Step 2 imports Apache POI classes (org.apache.poi), so the Apache POI jar files and their dependencies are needed. Download the jar files and add them to the classpath.



Step 2: Write Java class

Use the code snippet below to read an Excel file and show a popup message using the JPanel class.


import java.awt.Graphics;
import java.io.FileInputStream;
import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;

import javax.swing.JFrame;
import javax.swing.JPanel;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ReadFile extends JPanel {

    // 0-based column indexes in the Excel sheet
    private static final int NAME_COLUMN = 6;
    private static final int DUE_DATE_COLUMN = 7;
    private static final int PHONE_COLUMN = 8;

    // Read the reminders once, so the file is not re-read on every repaint
    private final ArrayList<String> reminders = readReminders();

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        int y = 10; // vertical position of the current line of text
        if (reminders.isEmpty()) {
            g.drawString("No reminder today! Enjoy!!!", 10, y);
        } else {
            for (String reminder : reminders) {
                g.drawString(reminder, 10, y);
                y = y + 20;
            }
        }
    }

    public static void main(String[] args) {
        JFrame frame = new JFrame("Customer Reminder!");
        frame.getContentPane().add(new ReadFile());
        frame.setSize(300, 300);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setResizable(false);
        frame.setVisible(true);
    }

    // Reads the first sheet and returns one reminder per row whose due date is today
    public ArrayList<String> readReminders() {
        ArrayList<String> al = new ArrayList<>();
        DateFormat dateFormat = new SimpleDateFormat("MM/dd/yyyy");
        String today = dateFormat.format(new Date());

        // try-with-resources closes the stream and the workbook;
        // XSSFWorkbook buffers the whole stream into memory
        try (FileInputStream fis = new FileInputStream("D:\\sample.xlsx");
             Workbook wb = new XSSFWorkbook(fis)) {
            Sheet sheet = wb.getSheetAt(0); // first sheet of the workbook
            // start at row index 1 (row 0 is skipped) and check up to 999 rows
            for (int i = 1; i < 1000; i++) {
                Row row = sheet.getRow(i); // returns the logical row, or null if it is empty
                if (row == null) {
                    continue;
                }
                Cell cell = row.getCell(DUE_DATE_COLUMN);
                if (cell == null) {
                    continue;
                }
                Date dueDate = cell.getDateCellValue();
                if (dueDate != null && dateFormat.format(dueDate).equals(today)) {
                    Cell customerName = row.getCell(NAME_COLUMN);
                    String custName = null;
                    if (customerName != null) {
                        custName = customerName.getStringCellValue();
                    }
                    Cell phoneNum = row.getCell(PHONE_COLUMN);
                    String phone = null;
                    if (phoneNum != null) {
                        phone = phoneNum.getStringCellValue();
                    }
                    if (custName != null) {
                        al.add("Call: [ " + custName + " : " + phone + "] ");
                    }
                }
            }
        } catch (IOException e) {
            // a missing or unreadable file results in an empty reminder list
            e.printStackTrace();
        }
        return al; // reminders that are due today
    }

}
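The code above decides whether a row is due today by formatting both dates as MM/dd/yyyy strings and comparing them. As an alternative, the same check can be sketched with the java.time API. The class and method names here are made up for this example and are not part of the original program.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

class DueDateCheck {

    // Converts a java.util.Date to a calendar date in the given time zone
    static LocalDate toLocalDate(Date date, ZoneId zone) {
        return date.toInstant().atZone(zone).toLocalDate();
    }

    // True when dueDate falls on the same calendar day as 'today'
    static boolean isDueOn(Date dueDate, LocalDate today, ZoneId zone) {
        return dueDate != null && toLocalDate(dueDate, zone).equals(today);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.systemDefault();
        // A Date created right now is always due "today"
        System.out.println(isDueOn(new Date(), LocalDate.now(zone), zone));
    }
}
```

Passing the current date in as a parameter makes the check easy to test with fixed dates.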

Step 3: Create a runnable jar file

Use the Eclipse -> Export option to create a runnable jar file and name it reminder_project.jar (you can provide any name).




Step 4: Create a batch file

Create a CMD file with the content below:

java -jar D:\reminder_project.jar

Step 5: Set up a task using Task Scheduler in Windows

Use Task Scheduler in Windows. Provide the name of the task, the Trigger details (daily, weekly), and the Actions.

Provide the CMD file name in the scheduler.





Tuesday, June 23, 2020

Important Regular Expression (RegEx) for daily Use

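In this article, we will explain how to use Regular Expressions (RegEx) in daily work. We will be using RegEx with multiple tools.

1. Regular Expressions in Notepad++

A. Case 1: Removing the last comma on each line
e.g.
Abc,,Pqr
XYZ,ABC,
DWD, XXX,

Regex: ,(?! )$   (This detects a comma at the end of each line)

B. Case 2: Removing duplicate lines
Regex: ^(.*?)$\s+?^(?=.*^\1$)
To remove duplicate lines, press Ctrl + H to open the Replace dialog, select the Regular expression search mode, enter the RegEx mentioned above in the Find field, leave the Replace field empty, and choose Replace All. Depending on your Notepad++ settings, the ". matches newline" option may need to be enabled so that the lookahead can scan the following lines.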
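The two expressions can also be tried outside Notepad++. Below is a minimal Java sketch. The class and method names are made up for this example, and the MULTILINE and DOTALL flags are the assumed Java equivalents of the editor's line-by-line matching and its ". matches newline" option.

```java
import java.util.regex.Pattern;

class RegexDemo {

    // Case 1: a comma at the end of a line (MULTILINE makes $ match at every line end)
    static final Pattern TRAILING_COMMA = Pattern.compile(",(?! )$", Pattern.MULTILINE);

    // Case 2: a line that occurs again later in the text
    // (DOTALL lets .* in the lookahead run across line breaks)
    static final Pattern DUPLICATE_LINE =
            Pattern.compile("^(.*?)$\\s+?^(?=.*^\\1$)", Pattern.MULTILINE | Pattern.DOTALL);

    static String removeTrailingCommas(String text) {
        return TRAILING_COMMA.matcher(text).replaceAll("");
    }

    static String removeDuplicateLines(String text) {
        return DUPLICATE_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeTrailingCommas("Abc,,Pqr\nXYZ,ABC,\nDWD, XXX,"));
        System.out.println(removeDuplicateLines("a\nb\na\nc"));
    }
}
```

Note that the duplicate-line expression removes the earlier occurrences and keeps the last one, so the order of the remaining lines can differ from what one might expect.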

Monday, June 22, 2020

What is the future of Master Data Management?


At present, Master Data Management (MDM) has become a core project for many organizations. Industries such as banking, healthcare, insurance, telecommunication, manufacturing, and logistics have realized that with the implementation of MDM, businesses can achieve better growth in a competitive market. In this article, we will explore the future of Master Data Management. So let's start.



A. MDM with Cloud Solution

MDM vendors such as Informatica, Reltio, and IBM provide cloud solutions. However, the companies using these solutions criticize the growing cost of the cloud and the lack of control over it. The initial cost of a cloud implementation is lower than that of an in-house MDM implementation. However, data is a growing asset, which leads to more usage over time. The cost of a cloud solution is directly proportional to usage, and hence the cost of a cloud MDM solution increases drastically over the years. The infrastructure is owned and managed by the product vendor, and we need to rely on the vendor for infrastructure issues. These issues include, but are not limited to, quarterly or monthly upgrades, server maintenance, emergency bug fixes, server crashes, and major product releases.

Despite these concerns, companies are still moving forward with cloud MDM solutions, because the cloud provides more sustainability. The recent pandemic has shown that businesses with cloud implementations cope better than those with in-house solutions. There is little doubt that cloud solutions will be used by most applications in the near future.

B. Artificial Intelligence and MDM

Artificial Intelligence (AI) is a buzzword in the current market. An MDM solution that has AI components will have better survivorship than one that does not. In recent releases, Informatica MDM has used AI features for small components in the data steward user interfaces. This tells us that MDM solutions have started to take AI more seriously. Many business intelligence applications are used to capture, store, access, and analyze data to assist business users in making better decisions. AI combined with business intelligence will create another world, and MDM will be part of it.

There is great scope for improvement in MDM solutions. AI can be used in extracting and transporting data from the source to the landing area and from the landing area to the MDM system. This will reduce the development, testing, performance tuning, and deployment time. Cleansing and standardization rely heavily on manual configuration. If AI is leveraged, this manual effort can be reduced to a great extent. Another area where AI can be used is customer matching. Currently, many vendors use their proprietary match engines to identify and match customer records. Identifying and matching is an iterative process that takes a long time, from a few months to a few years; in some cases it is a never-ending process. If AI is used to identify and match the records, it will help business users as well as stakeholders to achieve their business goals.



C. Smart MDM and User Interface

The user interfaces (UI) used with MDM solutions are developed with technologies that are more stable. As a result, the new features and smartness that come with newer versions are hardly ever replicated in these interfaces. In many cases, we have noticed that decade-old source code has never been touched in the MDM user interface. Most of the programming languages and frameworks such as HTML5, JavaScript, Spring, Java, Python, R2, etc. are evolving at a great pace. The future is not far off when these technologies will improve themselves using better infrastructure and intelligence. If these user interfaces are to survive in the global market, they need to bring smartness into the applications. End users are capable of handling these advanced features in their daily routine work.

The end goal of these smart features is to make the end-user experience not just better but the best. The main challenge in the current environment is that these user interfaces are not self-explanatory, so we have to spend much of our time training business users. The UI can be improved to accept voice and touch commands, and in some cases the UI should be smart enough to take its own decisions. This will improve productivity and ultimately profitability.

D. Quicker and Simpler

With the development of data processing technologies, we have been able to improve data processing considerably. However, it still takes anywhere from a day to a month to perform the initial data load from the source systems to the MDM system, depending on the volume of the data. This is the situation while dealing with gigabytes or terabytes of data. What will happen if we need to handle exabytes, zettabytes, or yottabytes of data in the future? We need to think now about how to handle the future growth of the data within the stipulated time. Spending 30 days is going to cost heavily, as the value of time is growing at a fast pace. An hour in the future will be worth more than an hour today.

Most of the underlying technologies, such as databases and JVMs, are not improving their processing speed as fast as expected. MDM is heavily dependent on these technologies. If the underlying technologies improve over time, MDM solutions will improve automatically; otherwise, MDM vendors need to come up with their own underlying technologies in order to sustain themselves in the future.




E. Increase in Cost - Increase in Value

Due to advancements in technologies such as AI and cloud computing, the cost of MDM solutions will go up, as they will require extensive data and time for research and development. Having said that, the increased cost can be justified by the increase in the value of a smart MDM solution.

With the smart MDM approach, we will be creating sustainable, profitable, and future-proof solutions that will benefit end customers as well as businesses. Smart MDM is not far off!






Tuesday, June 16, 2020

What are the differences between RAC and GRID?

Are you looking for the differences between RAC and Grid systems in Oracle? Would you also like to know how RAC and Grid systems are related to each other? If yes, you have reached the right place. In this article, we will see what the differences between RAC and Grid systems are.


What is a RAC system?

An Oracle Real Application Clusters (RAC) system is a database system made up of multiple servers that are combined using clustering software and that access shared disk storage structures.




What is a Grid System?

The Grid system represents the pool of database hosts or servers, along with a pool of storage, and networks in an inter-related resource platform. Grid is used for effective workload management within the grid database.

What is the relationship between RAC and Grid systems?

RAC is an integral part of Grid computing, and it helps Grid computing with high-availability information sharing.

What are the differences between RAC and Grid?

Here is a list of the differences between RAC (Cluster) and Grid systems:

Cluster | Grid
User management in Cluster is centralized | User management in the Grid is decentralized
In Cluster, the inter-operability is VIA and it is proprietary | No standard is developed for inter-operability
Ownership is single in case of Cluster | Multiple ownership exists in Grid
Throughput is medium | Throughput is high
Capacity is guaranteed | Capacity varies for grid implementation
Centralized resource management | Decentralized resource management
Can be used with commodity computers | Can be used with commodity and high-end computers
The scheduling is centralized | The scheduling is decentralized
Single system image is possible | Single system image is not possible


