DronaBlog
Thursday, August 20, 2020
Elastic Search - Types of Analyzer in the Elastic Search
Do you know how many types of analyzers are available in Elastic Search? Are you looking for details about all the analyzers that come with Elastic Search? If so, then you have reached the right place. In this article, we will discuss the types of analyzers which are most commonly used in Elastic Search.
What is an Analyzer?
An analyzer is a package which contains three lower-level building blocks: character filters, tokenizers, and token filters, which are used for analyzing the data.
Types of Analyzer
Here is a list of the analyzers that come with Elastic Search -
- Standard Analyzer
- Simple Analyzer
- Whitespace Analyzer
- Stop Analyzer
- Keyword Analyzer
- Pattern Analyzer
- Language Analyzers
- Fingerprint Analyzer
Understanding Analyzers
- Standard Analyzer
In the standard analyzer, the text gets divided into terms on word boundaries. Punctuation is removed and uppercase characters are converted to lowercase. It also supports removing stop words. (Each analyzer below can be tried with the _analyze API, as shown in the sketch after this list.)
e.g.
Input: "This is a sample example, for STANDARD-Analyzer"
Output:[this, is, a, sample, example, for, standard, analyzer]
- Simple Analyzer
With the Simple Analyzer, the text is divided into separate terms whenever a non-letter character appears. Non-letter characters can be numbers, hyphens, spaces, etc. Uppercase characters are converted to lowercase.
Input: "My dog's name is Rocky-Hunter"
Output:[my, dog, s, name, is, rocky, hunter]
- Whitespace Analyzer
The input phrase is divided into terms based on whitespace. It does not lowercase terms.
Input: "Technology-World has articles on ElasticSearch and Artificial-Intelligence etc."
Output:[Technology-World, has, articles, on, ElasticSearch, and, Artificial-Intelligence, etc.]
- Stop Analyzer
A Stop Analyzer is a form of Simple Analyzer where the text is divided into separate terms whenever a non-letter character is encountered. Non-letter characters can be numbers, hyphens, spaces, etc. As in the Simple Analyzer, uppercase characters are converted to lowercase. Additionally, it removes stop words. Assume that the stop word list includes the words 'the', 'is', and 'of'.
Input: "Gone with the wind is one of my favorite books."
Output:[gone, with, wind, one, my, favorite, books]
- Keyword Analyzer
The input phrase is NOT divided into terms; rather, the output token is the same as the input phrase.
Input: "Mount Everest is one of the worlds natural wonders"
Output:[Mount Everest is one of the worlds natural wonders]
- Pattern Analyzer
The pattern analyzer uses a regular expression to split the text into terms. The default regular expression is \W+, which matches all non-word characters. We need to remember that the regular expression matches the term separators in the input phrase, not the terms themselves. Uppercase characters are converted to lowercase by default; stop word removal is also supported, but no stop words are removed by default.
Input: "My daughter's name is Rita and she is 7 years old"
Output:[my, daughter, s, name, is, rita, and, she, is, 7, years, old]
- Language Analyzers
Language-specific analyzers, such as English, French, and Hindi, are provided in Elasticsearch.
Here is a sample keyword from the Hindi language analyzer.
e.g. "keywords": ["उदाहरण"]
- Fingerprint Analyzer
The fingerprint analyzer is used for duplicate detection. The input phrase is converted into lowercase and extended characters are folded into their ASCII equivalents. Duplicate words are removed, the remaining words are sorted, and a single token is created. It also supports removing stop words.
Input: "á is a Spanish accents character"
Output:[a accents character is spanish]
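You can see exactly which tokens any of these analyzers produces by calling the _analyze API. Here is a minimal sketch, assuming an unsecured Elastic Search instance on localhost:9200; substitute your own host, port, and security options:
# Ask Elastic Search to run the standard analyzer on a sample phrase and return the tokens
curl -X POST "http://localhost:9200/_analyze?pretty" -H "Content-Type: application/json" -d '
{
  "analyzer": "standard",
  "text": "This is a sample example, for STANDARD-Analyzer"
}'
The response lists each token, so you can swap in simple, whitespace, stop, keyword, pattern, or fingerprint and compare the results with the outputs shown above.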
Monday, August 10, 2020
Elastic Search Concepts - Cluster, Node, Index, Document, Shard and Replica
Are you looking for detailed information about the various concepts used in Elastic Search? Are you also interested in knowing what a Document, Shard, and Replica are in Elastic Search? If so, then you have reached the right place. In this article, we will understand all the important concepts which are most commonly used in Elastic Search.
A. Elastic Search Cluster
The cluster is a collection of nodes. It has a unique name; if we do not provide any name to the cluster, then it defaults to elasticsearch. We can create clusters specific to each environment, for example a development cluster, a QA cluster, or a production cluster. We can create clusters with more than one node, however, it is totally okay if we have just one node in a cluster. The cluster provides indexing and search capabilities across all the nodes, i.e. when we index or search data, we do not have to worry about which node the data is indexed on or searched from.
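As an illustration, the cluster name, status, and number of nodes can be checked with the cluster health API. This is a minimal sketch assuming an unsecured Elastic Search instance on localhost:9200; adjust the host, port, and security options for your environment.
# Returns cluster_name, status, number_of_nodes, and shard counts
curl -X GET "http://localhost:9200/_cluster/health?pretty"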
B. The node in Elastic Search
A node is a single server that is part of the cluster and stores data. Like the cluster, each node has a unique name. The node provides important capabilities such as search and indexing as part of a cluster. An important thing to remember is that node names are in all lowercase. We can create as many nodes as we want; there is no limit on it. If a cluster has more than one node, then each node contains a subset of the data.
C. Index
So, what is an index? As we know, nodes contain indices, and an index is a collection of similar documents. For example, the documents can be customer information or product information. In short, we create an index for each type of document. The index name is in lowercase and is used for indexing, searching, updating, and deleting documents within the index. We can create any number of indices in a cluster.
D. Category or Type in Elastic Search
Inside each index, we have a type, which is nothing but a category. We can create multiple categories such as Customer, Product, Vendor, Supplier, Broker, etc. Assume that our index name is customer; then we can create categories such as Individual, Organization, Sole Proprietor, etc. Under each category, we can have documents. The type has a name and is associated with a mapping; we create a separate mapping for each type in an index. Here is an additional note about category or type: as we know, Elasticsearch is built on Lucene, and in Lucene there is no concept of type or category. The type is stored as _type in the document metadata, and while searching documents of a particular type, Elasticsearch applies a filter on this field.
E. Mapping in Elastic Search
The mapping describes the fields and their types, e.g. data types such as string, integer, date, geo, etc. It also contains details about how each field will be indexed and stored. In many cases we don't have to create the mapping explicitly; Elastic Search infers it, which is called dynamic mapping.
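As an example, a mapping can be supplied when an index is created. The sketch below assumes Elastic Search 7.x (no mapping types), an unsecured local instance, and a hypothetical customer index with illustrative field names:
# Create a "customer" index with an explicit mapping for three fields
curl -X PUT "http://localhost:9200/customer?pretty" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "properties": {
      "first_name":   { "type": "text" },
      "age":          { "type": "integer" },
      "created_date": { "type": "date" }
    }
  }
}'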
F. Document
The document is the basic unit of information in Elastic Search. A document contains fields as key/value pairs. A value can be of any data type defined in the mapping, such as string, date, or integer. A document could be a single Customer, Product, Vendor, etc. The document is in JSON format and it physically resides in the index which we create. We can store as many documents as we need in a given index.
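For illustration, a document is indexed by sending its JSON body to the index. This minimal sketch reuses the hypothetical customer index and field names from the mapping example above:
# Index (or overwrite) the document with id 1 in the "customer" index
curl -X PUT "http://localhost:9200/customer/_doc/1?pretty" -H "Content-Type: application/json" -d '
{
  "first_name": "Rita",
  "age": 7,
  "created_date": "2020-07-29"
}'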
G. Shard
A shard is a portion of an index. We can divide an index into multiple pieces, i.e. shards, which is helpful if we have a large data set to store on the physical disk. If a single physical disk does not have enough capacity, we can divide the index into multiple pieces. Each shard is a fully functional index in its own right. By default, five shards are created while creating an index, however, we can configure as many shards as we need. In short, shards are created to achieve scalability.
H. Replica
A replica is a copy of a shard. A replica is never located on the same node as its primary shard, so that when one node goes down, another node can be used for recovery. By default, only one replica is created while creating an index. Assume that we have two nodes; in that case, we will have five primary shards and five replica shards spread across the two nodes. So replicas are helpful to achieve high availability. An important thing to note about replicas is that search queries can be executed on all replicas in parallel.
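The number of shards is fixed in the index settings at creation time, while the replica count can be changed later. Here is a sketch with illustrative values, a hypothetical product index, and an unsecured local instance:
# Create a "product" index with 5 primary shards and 1 replica per shard
curl -X PUT "http://localhost:9200/product?pretty" -H "Content-Type: application/json" -d '
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'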
Wednesday, July 29, 2020
Informatica MDM - How to create an Elastic Search certificate to access Elastic Search in a secure way
Step 1: Location of steps execution
We need to execute certificate generation commands from the location below. Hence go to this location
<MDM hub install directory>/hub/server/resources/certificates
Step 2: Execute the command below to convert the Java Key Store (JKS) file to a P12 file. A P12 file contains a digital certificate in the Public Key Cryptography Standards #12 (PKCS #12) format. The P12 file is a portable format for transferring personal private keys and other sensitive information. This file will be used to access Elastic Search APIs such as GET, POST, PUT, etc.
keytool -importkeystore -srckeystore MDM_ESCLIENT_FILE_JKS.keystore -srcstoretype jks -destkeystore MDM_ESCLIENT_FILE_JKS.keystore.p12 -deststoretype pkcs12 -alias esclient -destkeypass changeit
Here, changeit is a password.
Step 3: We need a client key to access Elastic Search. In order to create it, we need to use the P12 file which was created in Step 2. Note that the command below (with the -nocerts and -nodes options) extracts the private key from the P12 file; this key is used together with the client certificate for TLS client authentication when calling Elastic Search. Execute the command below to generate the key file.
openssl pkcs12 -in MDM_ESCLIENT_FILE_JKS.keystore.p12 -out file.key.pem -nocerts -nodes
Step 4: Certificate creation is another important step. Before understanding why we need a .crt file, we need to know a little about the .pfx file. The .pfx file includes both the public and private keys for a given certificate and is normally used for TLS/SSL on a website. The .cer file only has the public key and is used for verifying tokens or client authentication requests. To generate the certificate, run the command below -
openssl pkcs12 -in MDM_ESCLIENT_FILE_JKS.keystore.p12 -out file.crt.pem -clcerts -nokeys
Step 5: Execute the command below to check that Elastic Search is accessible in a secure way. The command below will list all the indices present on the Elastic Search server.
curl -k -E ./file.crt.pem --key ./file.key.pem https://<Elastic Search Server host>:<Port>/_cat/indices
Step 6: This step is optional, but if you are looking for how to make a POST or PUT call to the Elastic Search server using the curl command, then this will be helpful.
First, prepare the request body and save it in a file, e.g. create a file Sample.txt and add any request body you want (a JSON message). A sample is provided below:
{
"index.search.slowlog.threshold.query.debug": "-1ms",
"index.search.slowlog.level":"info"
}
Execute the command below using the Sample.txt file. Here we need to use the index name on which the PUT or POST request will be executed, e.g. 43456-customer is an index name which you can get from Step 5.
curl -d "@Sample.txt" -H "Content-Type: application/json" -X PUT -k -E ./file.crt.pem --key ./file.key.pem https://<Elastic Search Server host>:<Port>/43456-customer/_settings
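To confirm that the settings were applied, you can read them back with a GET on the same _settings endpoint, reusing the certificate and key from the earlier steps (GET is curl's default method, so no -X flag is needed):
curl -k -E ./file.crt.pem --key ./file.key.pem "https://<Elastic Search Server host>:<Port>/43456-customer/_settings?pretty"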
Wednesday, July 22, 2020
Informatica MDM - Validation after install or upgrade
Components to validate
Here is a list of components we need to validate after installation, upgrade, or applying a patch to Informatica MDM:
1. Informatica MDM Hub Validation
2. Informatica Data Director Validation
3. Provisioning Tool Validation
4. Active VOS Validations
1. Informatica MDM Hub Validation
iii) Verify that the cleanse functions are working fine. You can select the Cleanse Functions tool in the Model workbench and execute any cleanse function and make sure it is working properly.
viii) Verify the connectivity to process servers from the MDM hub by selecting the Process Server tool in the Utilities workbench and clicking the test connection option.
ix) Verify that queries and packages are showing data in the view page
2. Informatica Data Director Validation
The validation below needs to be performed if you are using the Data Director with subject areas. You need to deploy the application before you begin the tests. Perform the following upgrade tests that apply to your environment:
A) Validation of Data Director access -
i) Use the Data Director Configuration Manager URL and try to access it. Then access the Informatica Data Director application using the username and password.
B) Validation of Informatica Data Director-
i) Create a search query using fields from the Subject Area and Subject Area child fields, and make sure you are able to create, edit, and delete queries.
ii) Run the queries to perform searches. Perform multiple searches to verify search functionality.
iii) Open a searched record and perform an update operation.
iv) Verify record creation process by creating a new record.
v) Verify that the History and Timeline sections are working fine.
vi) Validate the Matches section; try to add a merge candidate and merge the record.
C) Validation of Tasks in Informatica Data Director -
i) Open task manager in IDD and verify all the tasks are listed.
ii) Verify that opening tasks works fine.
iii) Claim a task to make sure the claim action is working as expected.
iv) If it is an update task, then update the record and make sure the task completes successfully.
v) If it is a merge task, then merge the record and verify that the task gets cleared from the task list.
3. Provisioning Tool Validation
A) Validation of Provisioning Tool Access -
i) Log in to the Provisioning Tool using a username and password.
B) Business Entity, Transformation, View verification, and Task Configuration
i) Verify that all the Business Entities are present in the provisioning tool.
ii) Verify all the transformations between View and Business Entity, Business Entity and View, as well as Business Entity and Business Entity.
iii) Verify all the views.
iv) Verify the task configuration, such as Task Type, Task Triggers, etc.
C) Verify Elastic Search configuration
i) Verify the Elastic Search server configuration under Infrastructure settings.
ii) Verify all layout manager and application configurations.
4. Active VOS Validation
Validate Active VOS for the items below:
i) Verify the status of Active VOS in the Active VOS console.
ii) Verify the Identity Service connection from the AVOS console.
iii) Verify that all the workflows are in a deployed status.
iv) Verify that all the tasks are in a running state.
Friday, July 17, 2020
What is Build Match Group (BMG) in Informatica MDM?
What is the Build Match Group (BMG) Process?
How does the Build Match Group process remove records?
When does the BMG process get executed?
What is the impact of the BMG process on manual match records?
How to enable the Base Object for the BMG process?
Thursday, July 9, 2020
Best Practices for Elastic Search in Informatica MDM
Introduction
Elastic Search Best Practices
3. Facets configuration
Facets are used for pre-emptive grouping of records. We need to use a limited number of facet fields, as facets have an adverse impact on the performance of the search functionality. We also have to make sure the fields for which we configure facets have low entropy, i.e. a small set of unique values.
4. Unused Business Entities
If there are unused Business Entities with searchable properties, delete them, as they will cause performance issues for indexing and load jobs.
5. Index Auto commit property
We need to increase the value of the auto-commit property and keep it at an optimum value based on your environment configuration. The property es.index.refresh.interval can be used to set it, as illustrated below.
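For illustration only, the underlying index refresh interval can also be inspected or changed per index through the Elastic Search _settings API, in the same way as the slow-log example in the certificate post above; the 30s value is just an example, the index name is a placeholder, and the MDM property es.index.refresh.interval remains the supported way to configure this:
curl -d '{ "index.refresh_interval": "30s" }' -H "Content-Type: application/json" -X PUT -k -E ./file.crt.pem --key ./file.key.pem https://<Elastic Search Server host>:<Port>/<index name>/_settings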