DronaBlog
Thursday, August 20, 2020
Elastic Search - Types of Analyzer in the Elastic Search
Do you know how many types of analyzers are available in Elastic Search? Are you looking for details about all the analyzers that come with Elastic Search? If so, then you have reached the right place. In this article, we will discuss the types of analyzers which are most commonly used in Elastic Search.
What is an Analyzer?
An analyzer is a package which contains three lower-level building blocks: character filters, tokenizers, and token filters, which are used for analyzing the data.
Types of Analyzer
Here is a list of the analyzers that come with Elastic Search -
- Standard Analyzer
- Simple Analyzer
- Whitespace Analyzer
- Stop Analyzer
- Keyword Analyzer
- Pattern Analyzer
- Language Analyzers
- Fingerprint Analyzer
Understanding Analyzers
- Standard Analyzer
In the standard analyzer, the text gets divided into terms on word boundaries. Punctuation is removed and uppercase characters are converted to lowercase. It also supports removing stop words. (Each analyzer below can be tried with the _analyze API, as shown in the sketch after this list.)
e.g.
Input: "This is a sample example, for STANDARD-Analyzer"
Output:[this, is, a, sample, example, for, standard, analyzer]
- Simple Analyzer
With the Simple Analyzer, the text is divided into separate terms whenever a non-letter character appears. Non-letter characters can be numbers, hyphens, spaces, etc. Uppercase characters are converted to lowercase.
Input: "My dog's name is Rocky-Hunter"
Output:[my, dog, s, name, is, rocky, hunter]
- Whitespace Analyzer
The input phrase is divided into terms based on whitespace. It does not lowercase terms.
Input: "Technology-World has articles on ElasticSearch and Artificial-Intelligence etc."
Output:[Technology-World, has, articles, on, ElasticSearch, and, Artificial-Intelligence, etc.]
- Stop Analyzer
A Stop Analyzer is a form of Simple Analyzer where the text is divided into separate terms whenever a non-letter character is encountered. Non-letter characters can be numbers, hyphens, spaces, etc. As in the Simple Analyzer, uppercase characters are converted to lowercase. Additionally, it removes stop words. Assume that the stop word list includes the words 'the', 'is', and 'of'.
Input: "Gone with the wind is one of my favorite books."
Output:[gone, with, wind, one, my, favorite, books]
- Keyword Analyzer
The input phrase is NOT divided into terms; rather, the output token is the same as the input phrase.
Input: "Mount Everest is one of the worlds natural wonders"
Output:[Mount Everest is one of the worlds natural wonders]
- Pattern Analyzer
The pattern analyzer uses a regular expression to split the text into terms. The default regular expression is \W+, which matches all non-word characters. We need to remember that the regular expression matches the term separators in the input phrase, not the terms themselves. Uppercase characters are converted to lowercase by default; stop word removal is also supported, but no stop words are removed by default.
Input: "My daughter's name is Rita and she is 7 years old"
Output:[my, daughter, s, name, is, rita, and, she, is, 7, years, old]
- Language Analyzers
Language-specific analyzers, such as English, French, and Hindi, are provided in Elasticsearch.
Here is a sample keyword from the Hindi language analyzer.
e.g. "keywords": ["उदाहरण"]
- Fingerprint Analyzer
The fingerprint analyzer is used for duplicate detection. The input phrase is converted into lowercase and extended characters are folded into their ASCII equivalents. Duplicate words are removed, the remaining words are sorted, and a single token is created. It also supports removing stop words.
Input: "á is a Spanish accents character"
Output:[a accents character is spanish]
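You can see exactly which tokens any of these analyzers produces by calling the _analyze API. Here is a minimal sketch, assuming an unsecured Elastic Search instance on localhost:9200; substitute your own host, port, and security options:
# Ask Elastic Search to run the standard analyzer on a sample phrase and return the tokens
curl -X POST "http://localhost:9200/_analyze?pretty" -H "Content-Type: application/json" -d '
{
  "analyzer": "standard",
  "text": "This is a sample example, for STANDARD-Analyzer"
}'
The response lists each token, so you can swap in simple, whitespace, stop, keyword, pattern, or fingerprint and compare the results with the outputs shown above.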
Monday, August 10, 2020
Elastic Search Concepts - Cluster, Node, Index, Document, Shard and Replica
Are you looking for detailed information about the various concepts used in Elastic Search? Are you also interested in knowing what a Document, Shard, and Replica are in Elastic Search? If so, then you have reached the right place. In this article, we will understand all the important concepts which are most commonly used in Elastic Search.
A. Elastic Search Cluster
The cluster is a collection of nodes. It has a unique name; if we do not provide any name to the cluster, then it defaults to elasticsearch. We can create clusters specific to each environment, for example a development cluster, a QA cluster, or a production cluster. We can create clusters with more than one node, however, it is totally okay if we have just one node in a cluster. The cluster provides indexing and search capabilities across all the nodes, i.e. when we index or search data, we do not have to worry about which node the data is indexed on or searched from.
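As an illustration, the cluster name, status, and number of nodes can be checked with the cluster health API. This is a minimal sketch assuming an unsecured Elastic Search instance on localhost:9200; adjust the host, port, and security options for your environment.
# Returns cluster_name, status, number_of_nodes, and shard counts
curl -X GET "http://localhost:9200/_cluster/health?pretty"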
B. The node in Elastic Search
A node is a single server that is part of the cluster and stores data. Like the cluster, each node has a unique name. The node provides important capabilities such as search and indexing as part of a cluster. An important thing to remember is that node names are in all lowercase. We can create as many nodes as we want; there is no limit on it. If a cluster has more than one node, then each node contains a subset of the data.
C. Index
So, what is an index? As we know, nodes contain indices, and an index is a collection of similar documents. For example, the documents can be customer information or product information. In short, we create an index for each type of document. The index name is in lowercase and is used for indexing, searching, updating, and deleting documents within the index. We can create any number of indices in a cluster.
D. Category or Type in Elastic Search
Inside each index, we have a type, which is nothing but a category. We can create multiple categories such as Customer, Product, Vendor, Supplier, Broker, etc. Assume that our index name is customer; then we can create categories such as Individual, Organization, Sole Proprietor, etc. Under each category, we can have documents. The type has a name and is associated with a mapping; we create a separate mapping for each type in an index. Here is an additional note about category or type: as we know, Elasticsearch is built on Lucene, and in Lucene there is no concept of type or category. The type is stored as _type in the document metadata, and while searching documents of a particular type, Elasticsearch applies a filter on this field.
E. Mapping in Elastic Search
The mapping describes the fields and their types, e.g. data types such as string, integer, date, geo, etc. It also contains details about how each field will be indexed and stored. In many cases we don't have to create the mapping explicitly; Elastic Search infers it, which is called dynamic mapping.
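As an example, a mapping can be supplied when an index is created. The sketch below assumes Elastic Search 7.x (no mapping types), an unsecured local instance, and a hypothetical customer index with illustrative field names:
# Create a "customer" index with an explicit mapping for three fields
curl -X PUT "http://localhost:9200/customer?pretty" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "properties": {
      "first_name":   { "type": "text" },
      "age":          { "type": "integer" },
      "created_date": { "type": "date" }
    }
  }
}'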
F. Document
The document is the basic unit of information in Elastic Search. A document contains fields as key/value pairs. A value can be of any data type defined in the mapping, such as string, date, or integer. A document could be a single Customer, Product, Vendor, etc. The document is in JSON format and it physically resides in the index which we create. We can store as many documents as we need in a given index.
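For illustration, a document is indexed by sending its JSON body to the index. This minimal sketch reuses the hypothetical customer index and field names from the mapping example above:
# Index (or overwrite) the document with id 1 in the "customer" index
curl -X PUT "http://localhost:9200/customer/_doc/1?pretty" -H "Content-Type: application/json" -d '
{
  "first_name": "Rita",
  "age": 7,
  "created_date": "2020-07-29"
}'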
G. Shard
A shard is a portion of an index. We can divide an index into multiple pieces, i.e. shards, which is helpful if we have a large data set to store on the physical disk. If a single physical disk does not have enough capacity, we can divide the index into multiple pieces. Each shard is a fully functional index in its own right. By default, five shards are created while creating an index, however, we can configure as many shards as we need. In short, shards are created to achieve scalability.
H. Replica
A replica is a copy of a shard. A replica is never located on the same node as its primary shard, so that when one node goes down, another node can be used for recovery. By default, only one replica is created while creating an index. Assume that we have two nodes; in that case, we will have five primary shards and five replica shards spread across the two nodes. So replicas are helpful to achieve high availability. An important thing to note about replicas is that search queries can be executed on all replicas in parallel.
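The number of shards is fixed in the index settings at creation time, while the replica count can be changed later. Here is a sketch with illustrative values, a hypothetical product index, and an unsecured local instance:
# Create a "product" index with 5 primary shards and 1 replica per shard
curl -X PUT "http://localhost:9200/product?pretty" -H "Content-Type: application/json" -d '
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'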
Wednesday, July 29, 2020
Informatica MDM - How to create an Elastic Search certificate to access Elastic Search in a secure way
Step 1: Location of steps execution
We need to execute certificate generation commands from the location below. Hence go to this location
<MDM hub install directory>/hub/server/resources/certificates
Step 2: Execute the command below to convert the Java Key Store (JKS) file to a P12 file. A P12 file contains a digital certificate in the Public Key Cryptography Standards #12 (PKCS #12) format. The P12 file is a portable format for transferring personal private keys and other sensitive information. This file will be used to access Elastic Search APIs such as GET, POST, PUT, etc.
keytool -importkeystore -srckeystore MDM_ESCLIENT_FILE_JKS.keystore -srcstoretype jks -destkeystore MDM_ESCLIENT_FILE_JKS.keystore.p12 -deststoretype pkcs12 -alias esclient -destkeypass changeit
Here, changeit is a password.
Step 3: We need a client key to access Elastic Search. In order to create it, we need to use the P12 file which was created in Step 2. Note that the command below (with the -nocerts and -nodes options) extracts the private key from the P12 file; this key is used together with the client certificate for TLS client authentication when calling Elastic Search. Execute the command below to generate the key file.
openssl pkcs12 -in MDM_ESCLIENT_FILE_JKS.keystore.p12 -out file.key.pem -nocerts -nodes
Step 4: Certificate creation is another important step. Before understanding why we need a .crt file, we need to know a little about the .pfx file. The .pfx file includes both the public and private keys for a given certificate and is normally used for TLS/SSL on a website. The .cer file only has the public key and is used for verifying tokens or client authentication requests. To generate the certificate, run the command below -
openssl pkcs12 -in MDM_ESCLIENT_FILE_JKS.keystore.p12 -out file.crt.pem -clcerts -nokeys
Step 5: Execute the command below to check that Elastic Search is accessible in a secure way. The command below will list all the indices present on the Elastic Search server.
curl -k -E ./file.crt.pem --key ./file.key.pem https://<Elastic Search Server host>:<Port>/_cat/indices
Step 6: This step is optional, but if you are looking for how to make a POST or PUT call to the Elastic Search server using the curl command, then this will be helpful.
First, prepare the request body and save it in a file, e.g. create a file Sample.txt and add any request body you want (a JSON message). A sample is provided below:
{
"index.search.slowlog.threshold.query.debug": "-1ms",
"index.search.slowlog.level":"info"
}
Execute the command below using the Sample.txt file. Here we need to use the index name on which the PUT or POST request will be executed, e.g. 43456-customer is an index name which you can get from Step 5.
curl -d "@Sample.txt" -H "Content-Type: application/json" -X PUT -k -E ./file.crt.pem --key ./file.key.pem https://<Elastic Search Server host>:<Port>/43456-customer/_settings
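To confirm that the settings were applied, you can read them back with a GET on the same _settings endpoint, reusing the certificate and key from the earlier steps (GET is curl's default method, so no -X flag is needed):
curl -k -E ./file.crt.pem --key ./file.key.pem "https://<Elastic Search Server host>:<Port>/43456-customer/_settings?pretty"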
Wednesday, July 22, 2020
Informatica MDM - Validation after install or upgrade
Components to validate
Here is a list of components we need to validate after installation, upgrade, or applying a patch to Informatica MDM:
1. Informatica MDM Hub Validation
2. Informatica Data Director Validation
3. Provisioning Tool Validation
4. Active VOS Validations
1. Informatica MDM Hub Validation
iii) Verify that the cleanse functions are working fine. You can select the Cleanse Functions tool in the Model workbench and execute any cleanse function and make sure it is working properly.
viii) Verify the connectivity to process servers from the MDM hub by selecting the Process Server tool in the Utilities workbench and clicking the test connection option.
ix) Verify that queries and packages are showing data in the view page
2. Informatica Data Director Validation
The validation below needs to be performed if you are using the Data Director with subject areas. You need to deploy the application before you begin the tests. Perform the following upgrade tests that apply to your environment:
A) Validation of Data Director access -
i) Use the Data Director Configuration Manager URL and try to access it. Then access the Informatica Data Director application using the username and password.
B) Validation of Informatica Data Director-
i) Create a search query using fields from the Subject Area and Subject Area child fields, and make sure you are able to create, edit, and delete queries.
ii) Run the queries to perform searches. Perform multiple searches to verify search functionality.
iii) Open a searched record and perform an update operation.
iv) Verify record creation process by creating a new record.
v) Verify that the History and Timeline sections are working fine.
vi) Validate the Matches section; try to add a merge candidate and merge the record.
C) Validation of Tasks in Informatica Data Director -
i) Open task manager in IDD and verify all the tasks are listed.
ii) Verify that opening tasks works fine.
iii) Claim a task to make sure the claim action is working as expected.
iv) If it is an update task, then update the record and make sure the task completes successfully.
v) If it is a merge task, then merge the record and verify that the task gets cleared from the task list.
3. Provisioning Tool Validation
A) Validation of Provisioning Tool Access -
i) Log in to the Provisioning Tool using a username and password.
B) Business Entity, Transformation, View verification, and Task Configuration
i) Verify that all the Business Entities are present in the provisioning tool.
ii) Verify all the transformations between View and Business Entity, Business Entity and View, as well as Business Entity and Business Entity.
iii) Verify all the views.
iv) Verify the task configuration, such as Task Type, Task Triggers, etc.
C) Verify Elastic Search configuration
i) Verify the Elastic Search server configuration under Infrastructure settings.
ii) Verify all layout manager and application configurations.
4. Active VOS Validation
Validate Active VOS for the items below:
i) Verify the status of Active VOS in the Active VOS console.
ii) Verify the Identity Service connection from the AVOS console.
iii) Verify that all the workflows are in a deployed status.
iv) Verify that all the tasks are in a running state.
Friday, July 17, 2020
What is Build Match Group (BMG) in Informatica MDM?
What is the Build Match Group (BMG) Process?
How does the Build Match Group process remove records?
When does the BMG process get executed?
What is the impact of the BMG process on manual match records?
How to enable the Base Object for the BMG process?
Thursday, July 9, 2020
Best Practices for Elastic Search in Informatica MDM
Introduction
Elastic Search Best Practices
3. Facets configuration
Facets are used for pre-emptive grouping of records. We need to use a limited number of facet fields, as facets have an adverse impact on the performance of the search functionality. We also have to make sure the fields for which we configure facets have low entropy, i.e. a small set of unique values.
4. Unused Business Entities
If there are unused Business Entities with searchable properties, delete them, as they will cause performance issues for indexing and load jobs.
5. Index Auto commit property
We need to increase the value of the auto-commit property and keep it at an optimum value based on your environment configuration. The property es.index.refresh.interval can be used to set it, as illustrated below.
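For illustration only, the underlying index refresh interval can also be inspected or changed per index through the Elastic Search _settings API, in the same way as the slow-log example in the certificate post above; the 30s value is just an example, the index name is a placeholder, and the MDM property es.index.refresh.interval remains the supported way to configure this:
curl -d '{ "index.refresh_interval": "30s" }' -H "Content-Type: application/json" -X PUT -k -E ./file.crt.pem --key ./file.key.pem https://<Elastic Search Server host>:<Port>/<index name>/_settings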