DronaBlog

Monday, August 10, 2020

Elastic Search Concepts - Cluster, Node, Index, Document, Shard and Replica

Are you looking for detailed information about the core concepts used in Elastic Search? Would you like to know what a Document, a Shard, and a Replica are? If so, you have reached the right place. In this article, we will go through the concepts that are most commonly used in Elastic Search.


A. Elastic Search Cluster

A cluster is a collection of one or more nodes, identified by a unique name. If we do not provide a name, the cluster name defaults to elasticsearch. We can create a separate cluster for each environment, for example a development cluster, a QA cluster, and a production cluster. A cluster usually has more than one node, but a single-node cluster is perfectly fine. The cluster provides indexing and search capabilities across all of its nodes, i.e. when we index or search data, we do not have to worry about which node the data is indexed on or searched from.


B. The node in Elastic Search

A node is a single server that is part of a cluster and stores data. Like a cluster, each node has a unique name. Nodes take part in the indexing and search capabilities of the cluster. Node names can be set explicitly through the node.name setting; keeping them in lower case is a convention rather than a requirement. We can add as many nodes as we need; there is no fixed limit. If a cluster has more than one node, then each node contains a subset of the data.




C. Index

So, what is an index? Nodes contain indices, and an index is a collection of similar documents. For example, the documents can hold customer information or product information. In short, we create one index for each kind of document. The index name must be in lowercase. The index name is used when indexing, searching, updating, and deleting the documents within that index. We can create any number of indices in a cluster.


D. Category or Type in Elastic Search

Inside each index we can have types, which are nothing but categories. We can create categories such as Customer, Product, Vendor, Supplier, Broker, etc. Assume that our index is named customer; then we can create categories such as Individual, Organization, Self Proprietor, etc. Under each category we can have documents. A type has a name and an associated mapping, and we create a separate mapping for each type in an index. One additional note about types: Elastic Search is built on Lucene, and Lucene has no concept of type or category. The category is stored in the _type metadata field, and when we search for documents of a particular type, Elastic Search applies a filter on this field. Note that from Elastic Search 6.0 an index can contain only one type, and types were removed altogether in later major versions.




E. Mapping in Elastic Search

The mapping describes the fields of a document and their data types, e.g. string, integer, date, geo, etc. It also contains details about how each field is indexed and stored. In many cases we do not have to create the mapping explicitly, because Elastic Search derives it from the documents we index; this is called dynamic mapping.


F. Document

The document is the basic unit of information in Elastic Search. A document contains fields as key/value pairs. A value can be of any data type defined in the mapping, such as string, date, or integer. A document could be a single Customer, Product, Vendor, etc. The document is expressed in JSON format and physically resides in the index which we create. We can store as many documents as we need in a given index.
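To make the idea of key/value fields concrete, here is a minimal Java sketch that renders a flat set of fields as a JSON object. The class name, the field names, and the sample values are hypothetical, and the hand-written serializer handles only strings and numbers; a real project would normally use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DocumentSketch {

    // Renders a flat map of fields as a JSON object; numbers are written as-is, everything else is quoted
    static String toJson(Map<String, Object> fields) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            if (!first) {
                json.append(",");
            }
            first = false;
            json.append('"').append(escape(field.getKey())).append("\":");
            Object value = field.getValue();
            if (value instanceof Number) {
                json.append(value);
            } else {
                json.append('"').append(escape(String.valueOf(value))).append('"');
            }
        }
        return json.append("}").toString();
    }

    // Escapes backslashes and double quotes only, which is enough for this sketch
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Hypothetical customer document; the field names are illustrative
        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("firstName", "Bob");
        customer.put("lastName", "Paul");
        customer.put("age", 42);
        customer.put("createdOn", "2020-08-10");
        System.out.println(toJson(customer));
    }
}
```

Each entry of the map corresponds to one field of the document, and the data type of each value is what the mapping would describe.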


G. Shard

A shard is a portion of an index. We can divide an index into multiple pieces, i.e. shards, which is helpful when the data set is too large for a single physical disk. If one disk does not have enough capacity, the shards of the index can be spread across nodes. Each shard is a fully functional index in its own right. In Elastic Search versions before 7.0 an index is created with five primary shards by default (from 7.0 onwards the default is one); however, we can configure as many shards as we need. In short, shards are created to achieve scalability.
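As a rough illustration of how documents end up in different shards, the sketch below hashes a routing value (here the document id) and takes the result modulo the number of primary shards. This is a simplified stand-in: Elastic Search uses its own hash function and routing formula, and the class name, method name, and document ids are hypothetical.

```java
class ShardRoutingSketch {

    // Simplified routing: hash of the routing value modulo the primary shard count.
    // String.hashCode() is only a stand-in for the hash function Elastic Search uses internally.
    static int shardFor(String documentId, int primaryShards) {
        return Math.floorMod(documentId.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 5;
        String[] ids = {"customer-1", "customer-2", "customer-3"};
        for (String id : ids) {
            System.out.println(id + " -> shard " + shardFor(id, primaryShards));
        }
    }
}
```

The same id always maps to the same shard, which is why the number of primary shards is normally fixed when the index is created.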




H. Replica

A replica is a copy of a primary shard. A replica is never located on the same node as its primary shard, so that when one node goes down, the copy on another node is available for recovery. By default, an index is created with one replica per primary shard. Assume that we have two nodes and an index with five primary shards; in that case we will have five primary shards and five replica shards spread across the two nodes. So replicas help to achieve high availability. An important thing to note about replicas: search queries can be executed on all replicas in parallel.
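The shard arithmetic from the example above can be written down as a small Java sketch. The class and method names are hypothetical; the numbers are the ones used in the text (five primary shards, one replica per primary).

```java
class ShardCountSketch {

    // Replica shards = primary shards * replicas configured per primary
    static int replicaShards(int primaryShards, int replicasPerPrimary) {
        return primaryShards * replicasPerPrimary;
    }

    // Total shard copies in the cluster = primaries plus all of their replicas
    static int totalShards(int primaryShards, int replicasPerPrimary) {
        return primaryShards + replicaShards(primaryShards, replicasPerPrimary);
    }

    public static void main(String[] args) {
        int primaries = 5;
        int replicasPerPrimary = 1;
        System.out.println("Replica shards: " + replicaShards(primaries, replicasPerPrimary));
        System.out.println("Total shards:   " + totalShards(primaries, replicasPerPrimary));
    }
}
```

With five primaries and one replica each, the sketch reports five replica shards and ten shards in total.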



Wednesday, July 29, 2020

Informatica MDM - How to create an Elastic Search certificate to access Elastic Search in a secure way

Are you trying to access the Elastic Search API through the browser? Are you also planning to execute Elastic Search APIs using Postman or SoapUI? If yes, then you need to create a certificate in order to access the Elastic Search API in a secure way. In this article, we will discuss the steps required to generate the certificates.



Step 1: Location for executing the steps
The certificate generation commands need to be executed from the location below, so go to this location first:

 <MDM hub install directory>/hub/server/resources/certificates

Step 2: Execute the command below to convert the Java KeyStore (JKS) file to a P12 file. A P12 file is an archive in the Public Key Cryptography Standards #12 (PKCS #12) format that bundles a private key with its certificate. It is a portable format for transferring private keys and other sensitive information. This file will be used to access Elastic Search APIs with methods such as GET, POST, PUT, etc.

keytool -importkeystore -srckeystore MDM_ESCLIENT_FILE_JKS.keystore -srcstoretype jks -destkeystore MDM_ESCLIENT_FILE_JKS.keystore.p12 -deststoretype pkcs12 -alias esclient -destkeypass changeit

Here, changeit is the password for the key in the new file (passed through -destkeypass); replace it with your own password.

Step 3: We need the client's private key to access Elastic Search. It is extracted from the P12 file created in Step 2. The -nocerts option writes only the private key, and -nodes leaves it unencrypted, so protect the resulting file. The private key is used to prove the client's identity when the secure connection is established. Execute the command below to extract the key.

openssl pkcs12 -in MDM_ESCLIENT_FILE_JKS.keystore.p12 -out file.key.pem -nocerts -nodes

Step 4: Creating the certificate file is another important step. Before understanding why we need the .crt file, we need to know a little about the .pfx file. A .pfx file (the same PKCS #12 format as the P12 file) includes both the certificate with its public key and the private key, and is normally used for TLS/SSL on a web site. A .cer or .crt file has only the certificate with the public key and is used for verifying tokens or client authentication requests. To extract the client certificate, run the command below:

openssl pkcs12 -in MDM_ESCLIENT_FILE_JKS.keystore.p12 -out file.crt.pem -clcerts -nokeys

Step 5: Execute the command below to check that Elastic Search is accessible in a secure way. The command lists all the indices present on the Elastic Search server. The -k option tells curl not to verify the server certificate.

curl -k -E ./file.crt.pem --key ./file.key.pem https://<Elastic Search Server host>:<Port>/_cat/indices
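The same call can be made from Java by loading the P12 file from Step 2 directly, without the PEM files. The sketch below is only an illustration under a few assumptions: it needs Java 11 or later for java.net.http, the class name and the command-line arguments are hypothetical, and unlike curl -k it uses the JVM's default trust store, so the server certificate must be trusted by the JVM.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

class CatIndicesSketch {

    // Builds the URL of the _cat/indices API for the given host and port
    static URI catIndicesUri(String host, int port) {
        return URI.create("https://" + host + ":" + port + "/_cat/indices");
    }

    // Builds an SSLContext that presents the client certificate stored in the P12 file
    static SSLContext clientContext(String p12Path, char[] password) throws Exception {
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        try (InputStream in = new FileInputStream(p12Path)) {
            keyStore.load(in, password);
        }
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, password);
        SSLContext context = SSLContext.getInstance("TLS");
        // Passing null for the trust managers and the random source selects the JVM defaults
        context.init(kmf.getKeyManagers(), null, null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.out.println("usage: CatIndicesSketch <host> <port> <p12 file> <password>");
            return;
        }
        HttpClient client = HttpClient.newBuilder()
                .sslContext(clientContext(args[2], args[3].toCharArray()))
                .build();
        HttpRequest request = HttpRequest.newBuilder(catIndicesUri(args[0], Integer.parseInt(args[1])))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

A possible invocation, with placeholder values, is: java CatIndicesSketch.java localhost 9200 MDM_ESCLIENT_FILE_JKS.keystore.p12 changeit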

Step 6: This step is optional, but it will be helpful if you are looking for how to make a POST or PUT call on the Elastic Search server using the curl command.

First, prepare the request body and save it in a file, e.g. create a file named Sample.txt and add the request body you want (a JSON message). A sample is provided below:

{
   "index.search.slowlog.threshold.query.debug": "-1ms",
   "index.search.slowlog.level":"info"
}

Execute the command below using the Sample.txt file. We need to specify the name of the index on which the PUT or POST request will be executed, e.g. 43456-customer is an index name which you can get from Step 5.

curl -d "@Sample.txt" -H "Content-Type: application/json" -X PUT  -k -E ./file.crt.pem --key ./file.key.pem https://<Elastic Search Server host>:<Port>/43456-customer/_settings  


Step 7: If you are using a clustered environment and would like to check the status of the cluster, execute the command below:

curl -k -E ./file.crt.pem --key ./file.key.pem -XGET 'https://localhost:9200/_cluster/health?pretty'



Wednesday, July 22, 2020

Informatica MDM - Validation after install or upgrade

Are you looking for details about validating the Informatica MDM components after an installation, an upgrade, or a patch fix? Would you also like to know which functionalities need to be validated in IDD or the Informatica MDM hub? If yes, then you have reached the right place. In this article, we will go through the component validation details. So let's start.



Components to validate

Here is the list of components we need to validate after installing, upgrading, or applying a patch to Informatica MDM:
1. Informatica MDM Hub Validation
2. Informatica Data Director Validation
3. Provisioning Tool Validation
4. Active VOS Validation

1. Informatica MDM Hub Validation

We need to perform the validations below in the Informatica MDM Hub Console after a new install or upgrade:


    A)   Validation of MDM hub access-
          i) Launch the Hub Console using its URL and try to log in with the user name and password.

    B)   Validation of MDM hub tools-
          i) Verify that all users are correctly created or migrated by using the Users tool in the Configuration workbench. Verify that the properties of the users are intact.
          ii) Verify the data model by selecting the Schema Viewer tool in the Model workbench and then connecting to an Operational Reference Store.
          iii) Verify that the cleanse functions are working. Select the Cleanse Functions tool in the Model workbench, execute any cleanse function, and make sure it works properly.
          iv) Verify the Base Object tables, the Staging tables, the relationships among the tables, the Validation Rules (if any exist), and the Match/Merge Setup for a base object.
          v) Validate that record creation works as expected by creating a record using the Data Manager tool.
          vi) Use the Merge Manager to merge some sample records and make sure merge processing works as expected.
          vii) Verify that jobs run properly by running a sample batch job, such as a Stage job, and making sure it completes successfully.
          viii) Verify the connectivity to the process servers from the MDM hub by selecting the Process Server tool in the Utilities workbench and clicking Test Connection.
          ix) Verify that queries and packages show data in the view page.



2. Informatica Data Director Validation

The validations below need to be performed if you are using Data Director with subject areas. You need to deploy the application before you begin the tests. Perform the following tests that apply to your environment:



    A)   Validation of Data Director access-
          i) Open the Data Director Configuration Manager URL and verify that it is accessible. Then log in to the Informatica Data Director application using the user name and password.

    B)   Validation of Informatica Data Director-
          i) Create search queries using fields from the Subject Area and Subject Area child fields, and make sure you are able to create, edit, and delete the queries.
          ii) Run the queries to perform searches. Perform multiple searches to verify the search functionality.
          iii) Open a record from the search results and perform an update operation.
          iv) Verify the record creation process by creating a new record.
          v) Verify that the History and Timeline sections are working.
          vi) Validate the Matches section by adding a merge candidate and merging the records.

    C)   Validation of Tasks in Informatica Data Director-
          i) Open the Task Manager in IDD and verify that all the tasks are listed.
          ii) Verify that tasks can be opened.
          iii) Claim a task to make sure the claim action works as expected.
          iv) If it is an update task, update the record and make sure the task completes successfully.
          v) If it is a merge task, merge the record and verify that the task gets cleared from the task list.




3. Provisioning Tool Validation

We need to perform the validations below for the Provisioning tool.

    A)   Validation of Provisioning Tool access-
          i) Log in to the Provisioning Tool using the user name and password.

    B)   Business Entity, Transformation, View, and Task Configuration verification-
          i) Verify that all the Business Entities are present in the Provisioning tool.
          ii) Verify all the transformations: View to Business Entity, Business Entity to View, and Business Entity to Business Entity.
          iii) Verify all the views.
          iv) Verify the task configuration such as Task Types, Task Triggers, etc.

    C)   Verification of Elastic Search configuration-
          i) Verify the Elastic Search server configuration under the Infrastructure settings.
          ii) Verify all the layout manager and application configurations.


4. Active VOS Validation

Validate Active VOS for the items below:
        i) Verify the status of Active VOS in the Active VOS console.
        ii) Verify the Identity Service connection from the Active VOS console.
        iii) Verify that all the workflows are in a deployed status.
        iv) Verify that all the tasks are in a running state.







Friday, July 17, 2020

What is Build Match Group (BMG) in Informatica MDM?

Are you looking for details about the Build Match Group (BMG) process which is used in Informatica MDM? Would you also like to know when the BMG process gets executed? Would you be interested in knowing how to control this behavior? If so, then you have reached the right place. In this article, we will discuss the BMG process in detail.



What is the Build Match Group (BMG) Process?

The Build Match Group (BMG) process removes redundant matching records from the match set prior to the consolidation process. It is an important part of the matching process and plays a vital role in Informatica MDM jobs.

How does the Build Match Group process remove records?

Let's assume that the BMG indicator is on. In that case, when we run a match job, it removes one of the symmetric matches from the manual match pairs.
For example, consider the pairs below:
Pair 1: 'Bob Paul' is matched with 'Robert Paul' with match rule number 3
Pair 2: 'Robert Paul' is matched with 'Bob Paul' with match rule number 5
These two pairs are symmetric, because they link the same two records in opposite directions.

The automerge_ind column is set to 1 for match pairs that were matched through auto-merge rules. If all the records are matched through manual match rules, the BMG process takes effect. If some records are matched through auto-merge rules and others through manual match rules, one of the symmetric match entries is removed from the match table.
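The idea of dropping one entry of a symmetric pair can be sketched in a few lines of Java. This is only an illustration of the concept, not Informatica's implementation: the MatchPair class is a hypothetical stand-in for a row of the match table, and keeping the first pair encountered is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SymmetricMatchSketch {

    // Hypothetical shape of a match-table row: two records and the rule that matched them
    static final class MatchPair {
        final String left;
        final String right;
        final int ruleNo;

        MatchPair(String left, String right, int ruleNo) {
            this.left = left;
            this.right = right;
            this.ruleNo = ruleNo;
        }
    }

    // Keeps the first pair seen for every unordered combination of two records
    static List<MatchPair> removeSymmetric(List<MatchPair> pairs) {
        Map<List<String>, MatchPair> kept = new LinkedHashMap<>();
        for (MatchPair pair : pairs) {
            // Order the two names so that (A, B) and (B, A) produce the same key
            List<String> key = pair.left.compareTo(pair.right) <= 0
                    ? Arrays.asList(pair.left, pair.right)
                    : Arrays.asList(pair.right, pair.left);
            kept.putIfAbsent(key, pair);
        }
        return new ArrayList<>(kept.values());
    }

    public static void main(String[] args) {
        List<MatchPair> pairs = new ArrayList<>();
        pairs.add(new MatchPair("Bob Paul", "Robert Paul", 3));
        pairs.add(new MatchPair("Robert Paul", "Bob Paul", 5));
        for (MatchPair pair : removeSymmetric(pairs)) {
            System.out.println(pair.left + " <-> " + pair.right + " (rule " + pair.ruleNo + ")");
        }
    }
}
```

For the two sample pairs, only the pair matched by rule 3 remains in the result.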

When does the BMG process get executed?

There are two jobs during which the BMG process is executed.
1. During the Match Job: the BMG process gets triggered during the match process if we enable the 'BMG on match indicator' property.
2. During the Merge Job: the BMG process always gets executed during the merge job. There is no option to turn it ON or OFF for the merge job.

What is the impact of the BMG process on manually matched records?

The BMG process has no impact on manually matched records. It is only applicable to auto-merge matches, i.e. pairs where AUTOMERGE_IND is 1 in the <BASE_OBJECT>_MTCH table, and the Base Object also needs to be enabled for the BMG process.




How to enable the Base Object for the BMG process?

In order to enable the Base Object for the BMG process, we need to update the BMG_ON_MATCH_IND field in the C_REPOS_TABLE table. If the value of BMG_ON_MATCH_IND is 1, then BMG is ON for the given table; if the value is 0, then BMG is OFF.

Here is a sample SQL statement to update this field:
update C_REPOS_TABLE set BMG_ON_MATCH_IND=1 where table_name='<TABLE_NAME>'

Important note: After making the above change, clear the cache and restart the application server.






Thursday, July 9, 2020

Best Practices for Elastic Search in Informatica MDM

Elastic Search, a search engine based on the Lucene library, is used in Informatica MDM to provide Google-like free-text search as well as fuzzy search similar to the match engine search. In this article, we will go through the best practices to follow in order to implement Elastic Search with an Informatica MDM solution successfully.



Introduction

It is vital to follow best practices while integrating Elastic Search with Informatica MDM. Even a minor misconfiguration may lead to an expensive performance cost. The best practices provided here help to achieve not only better performance but also better search results.

Elastic Search Best Practices

Here are the details of the best practices:

1. Indexing Job Execution
If we enable the Searchable property for Base Object tables, including lookup tables, then we need to run the indexing job for the lookup tables first, followed by the indexing job for the remaining Base Object tables.

2. Indexing Job execution for all tables
If we have configured the Searchable property for parent and child tables, e.g. the Party table, the Party Phone table, etc., then we need to run an indexing job for all of these tables. First run the indexing job for the Party table, and then run the jobs for the child tables.

3. Facets configuration
Facets are used for pre-emptive grouping of records. We need to use a limited number of facet fields, as they have an adverse impact on the performance of the search functionality. We also have to make sure that the fields for which we configure facets have low entropy, i.e. a small set of unique values.
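One simple way to judge whether a field has low entropy is to count its distinct values in a sample of the data. The Java sketch below shows that check; the class name, the sample values, and the threshold of 10 distinct values are assumptions made for the illustration, not Informatica recommendations.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class FacetCandidateSketch {

    // Number of distinct values in a sample of a column
    static int distinctCount(List<String> values) {
        return new HashSet<>(values).size();
    }

    // Hypothetical rule of thumb: a field is a facet candidate when it has few distinct values
    static boolean isFacetCandidate(List<String> values, int maxDistinct) {
        return distinctCount(values) <= maxDistinct;
    }

    public static void main(String[] args) {
        List<String> country = Arrays.asList("US", "IN", "US", "UK", "IN", "US");
        List<String> email = Arrays.asList("a@x.com", "b@x.com", "c@x.com", "d@x.com");
        int maxDistinct = 10; // assumed threshold
        System.out.println("country distinct values: " + distinctCount(country)
                + ", facet candidate: " + isFacetCandidate(country, maxDistinct));
        System.out.println("email distinct values: " + distinctCount(email)
                + ", facet candidate: " + isFacetCandidate(email, maxDistinct));
    }
}
```

On real data a field such as an email address would have nearly as many distinct values as records, which makes it a poor facet, while a field such as country stays small however many records there are.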

4. Unused Business Entities
If there are unused Business Entities with searchable properties, delete them, as they cause performance issues for indexing and load jobs.

5. Index Auto commit property
We need to increase the value of the auto-commit property and tune it to an optimum value for the environment configuration. The property es.index.refresh.interval can be used to set it.

6. Indexing jobs in parallel
We should avoid running indexing jobs in parallel, as that may cause resource exhaustion.

7. Running load jobs in parallel
If we have configured the Searchable property on multiple tables, such as the Party and Address tables, then we should not run the load jobs for these tables in parallel. This is because the indexing job gets executed during the load job, which may lead to resource exhaustion and cause the job to fail.

8. Deleting indexes
The CleanTable API does not delete the indexes; we need to delete them manually if required. To do so, use the curl command to call the Elastic Search APIs that delete indexes. As of now, there is no Informatica API to handle this use case.

9. Limiting the number of searchable fields for Business Entities
There are limits on how many searchable fields we should use for an Elastic Search document. By default, Elastic Search allows 50 nested fields. Apart from this, there is a limit on the amount of data allowed in an Elastic Search REST call. The limit is 104857600 bytes (100 MB). So make sure that only a small number of searchable columns is configured for the Business Entities.
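The size limit is easier to reason about when written as arithmetic: 104857600 is 100 * 1024 * 1024. The Java sketch below checks a request body against that number, assuming the limit is expressed in bytes and applies to the UTF-8 encoded body; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class RequestSizeSketch {

    // 104857600 = 100 * 1024 * 1024; assumed here to be a limit in bytes
    static final long MAX_REQUEST_BYTES = 104_857_600L;

    // Size of the body once encoded as UTF-8, which is what travels over the wire
    static long sizeInBytes(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean fitsLimit(String body) {
        return sizeInBytes(body) <= MAX_REQUEST_BYTES;
    }

    public static void main(String[] args) {
        String body = "{\"firstName\":\"Bob\",\"lastName\":\"Paul\"}";
        System.out.println("Body size in bytes: " + sizeInBytes(body));
        System.out.println("Fits the limit: " + fitsLimit(body));
    }
}
```

Every additional searchable column adds to the document that is sent for indexing, so fewer searchable columns keep the request body further away from this limit.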







Wednesday, July 8, 2020

What are the Types of Data Model?


Are you looking for an article which explains the different types of data models? Are you also interested in knowing which data model to choose for your business? If so, then you have reached the right place. In this article, we will go through the different types of data models that are used in enterprise applications.



What is the Data Model?

Data modeling is the process in which data is analyzed and defined, based on the requirements, in order to support the business processes. The output of the data modeling process is a data model.

What are the Types of Data Model?

There are three types of data models:

  1. Conceptual Data Model: a high-level data model used for static business structures and concepts.
  2. Logical Data Model: this data model elaborates entity types, data attributes, and the relationships between entities.
  3. Physical Data Model: this type of data model provides detailed information such as data types, lengths, relationships, referential integrity, etc.


Selection of a data model

Each data model has its own purpose. Depending on the purpose, we need to decide which model is suitable for the business use case.

For example, if we are planning to explain to business users how business attributes are structured in the application, then the Logical Data Model will be a better choice.

If we are planning to integrate multiple systems and would like to explain how the systems work as a whole, then the Conceptual Data Model will be helpful, as it gives a high-level view of the underlying systems.

If the use case is to develop a database based on the data model, then we need detailed information about each and every component of the data model. In this case, the Physical Data Model is best suited. Normally, developers use the Physical Data Model to configure database objects.






Tuesday, July 7, 2020

How to set up a Task Reminder with Popup using Java on a Windows System

In this article, we will see how to set up a task reminder with a popup message on a Windows system. Some basic understanding of Java and Windows shell scripting is required.





Step 1: Set up the Java project

Use any IDE, e.g. Eclipse, to create a Java project. The code in Step 2 uses Apache POI (the org.apache.poi packages) to read the Excel file, so download the Apache POI jar files with their dependencies and add them to the classpath.



Step 2: Write the Java class

Use the code snippet below to read an Excel file and show a popup message using the JPanel class.


import java.awt.Graphics;
import java.io.FileInputStream;
import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.SwingUtilities;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ReadFile extends JPanel {

    // Column indexes (0-based) used in the sample workbook
    private static final int NAME_COLUMN = 6;
    private static final int DUE_DATE_COLUMN = 7;
    private static final int PHONE_COLUMN = 8;

    // The reminders are read once, not on every repaint of the panel
    private final List<String> reminders = readReminders("D:\\sample.xlsx");

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        int y = 20;
        if (reminders.isEmpty()) {
            g.drawString("No reminder today! Enjoy!!!", 10, y);
            return;
        }
        for (String reminder : reminders) {
            g.drawString(reminder, 10, y);
            y += 20;
        }
    }

    public static void main(String[] args) {
        // Swing components are created on the event dispatch thread
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Customer Reminder!");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.getContentPane().add(new ReadFile());
            frame.setSize(300, 300);
            frame.setResizable(false);
            frame.setVisible(true);
        });
    }

    // Returns one reminder line for every row whose due date is today
    static List<String> readReminders(String path) {
        List<String> result = new ArrayList<>();
        DateFormat dateFormat = new SimpleDateFormat("MM/dd/yyyy");
        String today = dateFormat.format(new Date());
        DataFormatter formatter = new DataFormatter();
        // try-with-resources closes both the stream and the workbook
        try (FileInputStream fis = new FileInputStream(path);
             Workbook wb = new XSSFWorkbook(fis)) {
            Sheet sheet = wb.getSheetAt(0); // first sheet of the workbook
            // Start at row 1; row 0 is assumed to be the header row
            for (int i = 1; i <= sheet.getLastRowNum(); i++) {
                Row row = sheet.getRow(i);
                if (row == null) {
                    continue;
                }
                Cell dueCell = row.getCell(DUE_DATE_COLUMN);
                if (dueCell == null) {
                    continue;
                }
                // Assumes the due date column contains date-formatted cells
                Date dueDate = dueCell.getDateCellValue();
                if (dueDate == null || !today.equals(dateFormat.format(dueDate))) {
                    continue;
                }
                // DataFormatter returns text for string and numeric cells, and "" for a missing cell
                String custName = formatter.formatCellValue(row.getCell(NAME_COLUMN));
                String phone = formatter.formatCellValue(row.getCell(PHONE_COLUMN));
                if (!custName.isEmpty()) {
                    result.add("Call: [ " + custName + " : " + phone + "] ");
                }
            }
        } catch (IOException e) {
            // Also covers a missing file, since FileNotFoundException is an IOException
            e.printStackTrace();
        }
        return result;
    }
}

Step 3: Create a runnable jar file

Use the Eclipse Export option to create a runnable jar file and name it reminder_project.jar (you can provide any name).




Step 4: Create a batch file

Create a CMD file with the content below:

java -jar D:\reminder_project.jar

Step 5: Set up the task using Task Scheduler on the Windows system


Open Task Scheduler on the Windows system. Provide the name of the task, the trigger details (daily, weekly), and the actions.




Provide the CMD file name in the scheduler as the action to run.




