DronaBlog

Thursday, December 3, 2020

How to prepare for MuleSoft Certified Developer Certification - Part I

Are you preparing for the MuleSoft Certified Developer certification and looking for guidelines and material on how to prepare? If so, you have reached the right place. While preparing for the certification I captured notes, and I am sharing them here so that they benefit anyone else preparing for it. This is the first part of the notes. You can visit the second part of the notes here - How to prepare for MuleSoft Certified Developer Certification - Part II.







Introducing application networks and API-led connectivity

1. Rate of change - the delivery gap (between what the business asks for and what IT can deliver) has increased over time.

2. Central IT / Line of Business (LOB) IT - create reusable assets and make them discoverable and reusable

3. Modern API -  discoverable and accessible through self-service. Productized and designed for ease of consumption, secured, scalable and performance-oriented.

4. API-led connectivity

                a. Uses modern API

                b. Three layers - System APIs, Process APIs, Experience APIs

                c. Responsibility -

                                System APIs -> Central IT (Unlock assets and decentralize)

                                Process APIs -> LoB IT (Discover, reuse System API and compose)

                                Experience APIs -> Developers (Discover, self-serve, reuse, and consume process APIs)

                d. Advantages - reusable, agile, productive, better governance, speed within the same timeline

                e. Application network created using API-led connectivity is a bottom-up approach

5. Center for Enablement (C4E) - a cross-functional team

                Responsibility - promoting consumption of assets in an organization

6. API - Application Programming Interface. It has Input, Output, Operation and Data Types

                Normally referred to as an API specification, a web service (the implementation), or an API proxy (which controls access to the web service, restricting access and usage through an API gateway)

7. Web Service - a method of communication between two software systems

                i) It has three meanings 

                        a. Web Service API  (Define how to interact with Web Service)

                        b. Web Service Interface (Provide structure) 

                        c. Web Service Implementation (Actual code)

                ii) Types - SOAP Based Web Service, RESTful Web Service

                iii) REST Web Service methods - GET, POST, PUT, DELETE etc.

8. A RESTful web service responds with a status code.

                Status codes: 200 - OK (GET, DELETE, PATCH, PUT), 201 - Created (POST), 304 - Not Modified (PATCH, PUT), 400 - Bad Request (all), 401 - Unauthorized (all), 404 - Resource Not Found (all), 500 - Server Error (all)

9. API development lifecycle - API specification (design), simulation (create a prototype and make it available to consumers), validation (output - a validated API specification/contract)

10. System layer - the MuleSoft API-led connectivity layer intended to expose part of the backend without business logic.

11. A MuleSoft application network is used to create reusable APIs and assets designed to be consumed by other business units.

12. Center for Enablement - creates and manages discoverable assets to be consumed by line-of-business developers

13. Modern API - designed first using an API specification for rapid feedback

14. The 'PUT' HTTP method in a RESTful web service is used to replace an existing resource (see the sketch below).
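For illustration, here is a minimal sketch of a PUT request that replaces an existing resource, and the response it might receive (the /customers/42 path, host, and field names are made up for this example; a successful replacement typically returns 200 OK):

    PUT /customers/42 HTTP/1.1
    Host: api.example.com
    Content-Type: application/json

    { "firstName": "Asha", "lastName": "Rao" }

    HTTP/1.1 200 OK
    Content-Type: application/json

    { "id": 42, "firstName": "Asha", "lastName": "Rao" }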

 

Introducing Anypoint Platform

1. Anypoint Platform - design, build, deploy and manage

2. Major components:

Design center - (Rapid development) Design API

Exchange - (Collaboration) Discoverable, accessible through self-service

Management center - (Visibility and control) Security, scalability, performance

3. Anypoint platform is used by

                Specialist, Admin, Ops, DevOps, Ad-hoc integrators, App developers

4. Supported platforms

                On-Premises, Private Cloud, Cloud Service Providers, Hosted By Mulesoft (CloudHub), Hybrid

5. Benefits of API-led connectivity

                Speedy delivery, actionable visibility, secure, future proof, intentional self-service

6. API Specification phase tools - API Designer, API Console and mocking service, Exchange, API Portal, API notebook --> output - Validated API Specification in RAML

7. Build or implementation phase tools -> Anypoint Studio, MUnit

8. API Management Phase tools - API Manager, API Analytics, Runtime Manager, Visualizer

9. Troubleshooting and scaling - Runtime manager, API manager

10. Design center - To create Integration applications, API Specification, and API Fragments

                Flow designer - Web app to connect systems and consume APIs

                API Designer - Web app to design, document, and mock APIs

                Anypoint Studio - IDE to implement APIs and Build integration applications

11. Mule Applications can be created using Flow Designer or Anypoint Studio or writing code (XML)

                Mule Runtime environment decouples point-to-point integration. It also enforces policies for API governance

12. Mule applications accept and process a Mule event through multiple Mule event processors, all of which are plugged together in a flow.

                A flow is the only thing that is executed in a Mule application.

                Flow has three areas - Source, Process area, Error handling

13. Mule CloudHub worker - a dedicated instance of Mule that runs a single application

14. A Mule event is a data structure with the components below:

                Mule Message

                                Attributes - metadata (headers  and parameters)

                                Payload - actual data

                Variables - declared using processors within the flow

15. Flow Designer is used to design and develop a fully functional Mule application in a hosted environment

16. Deployed Flow Designer applications run in a CloudHub worker

17. Anypoint Exchange is used to publish, share, and search APIs

18. API Portals cannot be created using Design Center

Designing APIs

1. API Design approaches - Hand Coding, Apiary (API Blueprint), Swagger (Open API Specification), RAML

2. RAML is used to auto-generate documentation, mock endpoints, and create interfaces for an API specification

3. RAML contains nodes and facets (see the RAML sketch after this list)

                Resources are nodes and start with /

                Facets are special configurations applied to resources

4. RAML code can be modularized using

                Data Types, examples, traits, resource types, overlays, extensions, security, schemas, documentation, annotations and libraries

5. Fragments can be stored

                In files and folders within a project

                In a separate API fragment project in the Design center

                In a separate RAML fragment in Exchange

6. As an anonymous user, we can make calls to an API instance that uses the mocking service, but not to managed APIs.

7. In order to make an API discoverable, we need to publish it to Anypoint Exchange
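As referenced in item 3 above, here is a minimal RAML 1.0 sketch (the API title, the /customers resource, and the Customer fields are hypothetical) showing resource nodes that start with / and method-level facets such as description and responses:

    #%RAML 1.0
    title: Customer API
    version: v1
    types:
      Customer:
        type: object
        properties:
          id: integer
          name: string
    /customers:
      get:
        description: Retrieve all customers
        responses:
          200:
            body:
              application/json:
                type: Customer[]
      /{id}:
        get:
          description: Retrieve a single customer by id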

Building APIs

1. Mule event source initiates the execution of the flow

2. Mule event processors transform, filter, enrich and process the event data

3. Variables which are part of Mule event are referenced by processors

4. Mule flow contains - Source, Process, and Error Handling

                Source - optional

                Process - required

                Error handling - optional

5. By default, data is returned in Java format. The Transform Message component is used to convert Java to JSON using DataWeave (see the DataWeave sketch after this list)

6. A RESTful interface for an application will have listeners for each resource method

7. We can create the interface either manually or generate it from an API definition

8. APIkit is an open-source toolkit that comes with Anypoint Studio and is used to generate an interface based on a RAML API definition.

                Generates the main routing flow and flows for each API resource

                The generated interface can be hooked up to implementation logic

                APIkit creates a separate flow for each HTTP method

                The APIkit router is used to validate requests against the RAML API specification and route them to the API implementation

9. Anypoint Platform uses Git for version control, which internally uses pull, push, and merge operations for code edits.
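As mentioned in item 5 above, the Transform Message component uses DataWeave to convert the default Java payload to JSON. Here is a minimal DataWeave 2.0 sketch (the payload fields id, firstName, and lastName, and the output field names, are hypothetical):

    %dw 2.0
    output application/json
    ---
    {
      customerId: payload.id,
      fullName: payload.firstName ++ " " ++ payload.lastName
    }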








Tuesday, December 1, 2020

Informatica MDM - MDM Installation Topology

Are you planning to install the Informatica MDM hub in a development or production environment, and looking for details about the best possible way to make use of your infrastructure? If so, then you have reached the right place. In this article, we will explore the different Informatica MDM installation topologies.






Introduction

Basically, there are three types of topologies recommended by Informatica. We can use one of them while installing the Informatica MDM hub, based on project needs and the benefits we are looking for. Here is the list of recommended topologies:

a. MDM Topology for Clusters

b. No Cluster - No High Availability

c. No Cluster - High Availability


A. MDM Topology for Clusters

In this type of topology, the Hub Server and Process Server reside on different machines, and these machines are clustered together.


Characteristics:




B. No Cluster - No High Availability

In this topology, the Hub Server and Process Servers are not clustered; hence we will not achieve high availability.



Characteristics:



C. No Cluster - High Availability

In this type of topology, the Hub Server and Process Servers are not clustered; however, an external load balancer can be used to make the MDM system highly available.




Characteristics:







Detailed information about the types of MDM styles is provided here -















Friday, November 13, 2020

Informatica MDM - How to fix an error - ORA-01555: snapshot too old?

While working on Informatica MDM jobs, I came across an issue: ORA-01555: snapshot too old. This error message was reported while running the tokenization job. If you are noticing a similar issue, then this article will help. It provides details about the error message and a solution to fix it.






Error Message:

The detailed error message is as below -

java.sql.BatchUpdateException: ORA-01555: snapshot too old: rollback segment number 11 with name "_SYSSMU11_2399779032$" too small

SIP-16084: Error occurred while verifying the need to tokenize records. Return code 12801, 

Error SQLException During VerifyNeedToStrip :ORA-12801: error signaled in parallel query server P000,
ORA-01555: snapshot too old: rollback segment number 30 with name "_SYSSMU30_2998435469$" too small.
 at com.siperian.common.SipRuntimeException.createNotExternalized(SipRuntimeException.java:74)
 at com.delos.cmx.server.interact.caller.InteractCleanseClient.executeGenerateMatchTokens(InteractCleanseClient.java:460)


Solution:

To fix the issue perform the steps below -

A. Database Issue
First, check whether there is any ongoing database issue. If the database looks good, then perform the steps below.

Step 1: Stop any job running as 'INCOMPLETE'
Step 2: Stop the Application server
Step 3: Verify the undo_retention value by running the query below on the database side
           
             show parameter undo_retention;

Step 4: If the value is lower, increase it to 4000 by executing the command below

            ALTER SYSTEM SET UNDO_RETENTION = 4000;

Step 5: Set the undo tablespace datafiles to auto-extend, as shown in the example below.
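             For example (the datafile path below is only a placeholder; use the actual undo tablespace datafile from your environment):

             ALTER DATABASE DATAFILE '/u01/oradata/ORCL/undotbs01.dbf' AUTOEXTEND ON NEXT 100M MAXSIZE UNLIMITED;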

Step 6: Restart the database servers with a clear cache

Step 7: Drop any T$ tables, if present, in the database

Step 8: Start the MDM servers with a clear cache.






This will fix the issue. I hope this is helpful. You can learn more about job tuning here -





Wednesday, November 11, 2020

How to fix Error - SIP-52054: Failed to create collection name for orsId


Are you looking for how to fix the error SIP-52054: Failed to create collection name for orsId in the MDM hub? Are you also interested in knowing the root cause of this error? If so, then you have reached the right place. In this article, we will focus on this Elasticsearch error in Informatica Master Data Management (MDM).






Error Message:

If you run a SOAP request against Business Entity Services, which internally use Elasticsearch, you may encounter the error below:

SIP-52054: Failed to create collection name for orsId


Detailed Error stack:


[ERROR] com.informatica.mdm.cs.server.CompositeServiceInvoker: SIP-52054: Failed to create collection name for orsId [mdmsbx-CMX_ORS] because of error: Connection refused.
com.informatica.mdm.spi.cs.StepException: SIP-52054: Failed to create collection name for orsId [CMX_ORS] because of error: Connection refused.
 at com.informatica.mdm.cs.steps.SearchCO.invoke(SearchCO.java:337)
 at com.informatica.mdm.cs.server.CompositeServiceInvoker.executeStep(CompositeServiceInvoker.java:426)
 at com.informatica.mdm.cs.server.CompositeServiceInvoker.processService(CompositeServiceInvoker.java:308)
 at com.informatica.mdm.cs.server.CompositeServiceInvoker.executeService(CompositeServiceInvoker.java:385)
 at com.informatica.mdm.cs.server.CompositeServiceInvoker.processService(CompositeServiceInvoker.java:312)
 at com.informatica.mdm.cs.server.CompositeServiceInvoker.process(CompositeServiceInvoker.java:187)
 at com.informatica.mdm.cs.server.CompositeServiceInvoker.invoke(CompositeServiceInvoker.java:118)
 at com.informatica.mdm.cs.server.ejb.CompositeServiceEjbBean.doProcess(CompositeServiceEjbBean.java:53)
 at com.informatica.mdm.cs.server.ejb.CompositeServiceEjbBean.process(CompositeServiceEjbBean.java:37)


How to fix this issue?

In order to fix this issue, perform the steps below -

1. Verify that the MDM hub is accessible. Also, verify the connection to the Process Server from MDM hub -> Utilities -> Process Server

2. Verify that the Elasticsearch server is up and working

3. If the above two checks look good, then make sure Elasticsearch is properly configured in the Provisioning tool.

The location is: Provisioning Tool -> Configuration -> Infrastructure Settings -> ESCluster

Here, make sure the server name is properly configured for Elasticsearch.






Root cause:

The error 'SIP-52054: Failed to create collection name for orsId' normally occurs when the MDM hub tries to make a connection to the Elasticsearch server and the connection is refused. If there is a mismatch in the Elasticsearch server configuration in the Provisioning tool, then we get this error.


Learn more about the provisioning tool here -



Tuesday, August 25, 2020

JBOSS 7.1.0 - Deployment failed with error - org.apache.commons.discovery.DiscoveryException: No implementation defined for org.apache.commons.logging.LogFactory

While deploying a web service on the JBoss 7.1.0 application server, the error - No implementation defined for org.apache.commons.logging.LogFactory - was noticed.

After trying multiple approaches, the issue was finally resolved. Here are the steps performed to fix it:

Resolution:

1. Removed the old axis.jar and placed axis-1.3.jar under the WebContent/WEB-INF/lib location
2. Removed the commons-logging.jar file from the WebContent/WEB-INF/lib location
3. Added jcl-over-slf4j-1.7.7.redhat-3.jar to the WebContent/WEB-INF/lib location

The web service now deploys successfully and the WSDL is accessible.



Error Details:

Here is the error stack.

on Jboss 21:38:14,854 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-2) MSC000001: Failed to start service jboss.deployment.subunit."WSTestEAR.ear"."Sample.war".INSTALL: org.jboss.msc.service.StartException in service jboss.deployment.subunit."WSTestEAR.ear"."Sample.war".INSTALL: WFLYSRV0153: Failed to process phase INSTALL of subdeployment "Sample.war" of deployment "WSTestEAR.ear" at org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:172) at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:2032) at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1955) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.jboss.msc.service.DuplicateServiceException: Service jboss.undertow.deployment.default-server.default-host./Sample.session is already registered at org.jboss.msc.service.ServiceRegistrationImpl.setInstance(ServiceRegistrationImpl.java:158) at org.jboss.msc.service.ServiceControllerImpl.startInstallation(ServiceControllerImpl.java:235) at org.jboss.msc.service.ServiceContainerImpl.install(ServiceContainerImpl.java:768) at org.jboss.msc.service.ServiceTargetImpl.install(ServiceTargetImpl.java:223) at org.jboss.msc.service.ServiceControllerImpl$ChildServiceTarget.install(ServiceControllerImpl.java:2555) at org.jboss.msc.service.ServiceTargetImpl.install(ServiceTargetImpl.java:223) at org.jboss.msc.service.ServiceControllerImpl$ChildServiceTarget.install(ServiceControllerImpl.java:2555) at org.jboss.msc.service.ServiceBuilderImpl.install(ServiceBuilderImpl.java:317) at org.wildfly.extension.undertow.deployment.UndertowDeploymentProcessor.lambda$processDeployment$0(UndertowDeploymentProcessor.java:405) at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) at org.wildfly.extension.undertow.deployment.UndertowDeploymentProcessor.processDeployment(UndertowDeploymentProcessor.java:405) at org.wildfly.extension.undertow.deployment.UndertowDeploymentProcessor.deploy(UndertowDeploymentProcessor.java:190) at org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:165) ... 5 more 21:38:14,914 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 75) MSC000001: Failed to start service jboss.undertow.deployment.default-server.default-host./WSTest: 

org.jboss.msc.service.StartException in service jboss.undertow.deployment.default-server.default-host./WSTest: java.lang.ExceptionInInitializerError at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:84) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) at org.jboss.threads.JBossThread.run(JBossThread.java:320) 

Caused by: java.lang.ExceptionInInitializerError at org.apache.axis.transport.http.AxisServlet.(AxisServlet.java:75) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:264) at org.apache.axis.transport.http.AxisServletBase.class$(AxisServletBase.java:59) at org.apache.axis.transport.http.AxisServletBase.(AxisServletBase.java:58) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.jboss.as.ee.component.ConstructorComponentFactory.create(ConstructorComponentFactory.java:24) at org.jboss.as.ee.component.ComponentInstantiatorInterceptor.processInvocation(ComponentInstantiatorInterceptor.java:67) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:53) at org.jboss.as.ee.component.AroundConstructInterceptorFactory$1.processInvocation(AroundConstructInterceptorFactory.java:26) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ee.concurrent.ConcurrentContextInterceptor.processInvocation(ConcurrentContextInterceptor.java:45) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.ContextClassLoaderInterceptor.processInvocation(ContextClassLoaderInterceptor.java:60) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:53) at org.jboss.as.ee.component.BasicComponent.constructComponentInstance(BasicComponent.java:161) at org.jboss.as.ee.component.BasicComponent.constructComponentInstance(BasicComponent.java:134) at org.jboss.as.ee.component.BasicComponent.createInstance(BasicComponent.java:88) at org.jboss.as.ee.component.ComponentRegistry$ComponentManagedReferenceFactory.getReference(ComponentRegistry.java:149) at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$6.createInstance(UndertowDeploymentInfoService.java:1221) at io.undertow.servlet.core.ManagedServlet$DefaultInstanceStrategy.start(ManagedServlet.java:245) at io.undertow.servlet.core.ManagedServlet.createServlet(ManagedServlet.java:133) at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:565) at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:536) at io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:42) at io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43) at org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create$0(SecurityContextThreadSetupAction.java:105) at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508) at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508) at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508) at 
org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508) at io.undertow.servlet.core.DeploymentManagerImpl.start(DeploymentManagerImpl.java:578) at org.wildfly.extension.undertow.deployment.UndertowDeploymentService.startContext(UndertowDeploymentService.java:100) at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:81) ... 6 more Caused by: 

org.apache.commons.discovery.DiscoveryException: No implementation defined for org.apache.commons.logging.LogFactory at org.apache.commons.discovery.tools.ClassUtils.verifyAncestory(ClassUtils.java:176) at org.apache.commons.discovery.tools.SPInterface.verifyAncestory(SPInterface.java:201) at org.apache.commons.discovery.tools.SPInterface.newInstance(SPInterface.java:195) at org.apache.commons.discovery.tools.DiscoverClass.newInstance(DiscoverClass.java:579) at org.apache.commons.discovery.tools.DiscoverSingleton.find(DiscoverSingleton.java:418) at org.apache.commons.discovery.tools.DiscoverSingleton.find(DiscoverSingleton.java:378) at org.apache.axis.components.logger.LogFactory$1.run(LogFactory.java:45) at java.security.AccessController.doPrivileged(Native Method) at org.apache.axis.components.logger.LogFactory.getLogFactory(LogFactory.java:41) at org.apache.axis.components.logger.LogFactory.(LogFactory.java:33) ... 45 more 21:38:14,918 ERROR [org.jboss.as.controller.management-operation] (DeploymentScanner-threads - 1) WFLYCTL0013: Operation ("full-replace-deployment") failed - address: ([]) - failure description: { "WFLYCTL0080: Failed services" => { "jboss.deployment.subunit.\"WSTestEAR.ear\".\"Sample.war\".INSTALL" => "WFLYSRV0153: Failed to process phase INSTALL of subdeployment \"Sample.war\" of deployment \"WSTestEAR.ear\" Caused by: org.jboss.msc.service.DuplicateServiceException: Service jboss.undertow.deployment.default-server.default-host./Sample.session is already registered", "jboss.undertow.deployment.default-server.default-host./WSTest" => "java.lang.ExceptionInInitializerError Caused by: java.lang.ExceptionInInitializerError Caused by: org.apache.commons.discovery.DiscoveryException: No implementation defined for org.apache.commons.logging.LogFactory" },


Thursday, August 20, 2020

Elastic Search - Types of Analyzer in the Elastic Search

Do you know how many types of analyzers are available in Elasticsearch? Are you looking for details about all the analyzers that come with Elasticsearch? If so, then you have reached the right place. In this article, we will discuss the types of analyzers that are most commonly used in Elasticsearch.




What is an Analyzer?

An analyzer is a package that contains three lower-level building blocks: character filters, tokenizers, and token filters, which together are used to analyze text. You can experiment with any analyzer through the _analyze API, as sketched below.
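For example, the request below is a sketch of how an analyzer can be tested with Elasticsearch's _analyze API (the analyzer name and the sample text are arbitrary):

    POST /_analyze
    {
      "analyzer": "standard",
      "text": "This is a sample example, for STANDARD-Analyzer"
    }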


Types of Analyzer

Here is a list of the analyzers that come with Elasticsearch:

  • Standard Analyzer
  • Simple Analyzer
  • Whitespace Analyzer
  • Stop Analyzer
  • Keyword Analyzer
  • Pattern Analyzer
  • Language Analyzers
  • Fingerprint Analyzer


Understanding Analyzers

  • Standard Analyzer

In the Standard Analyzer, the text is divided into terms at word boundaries. Punctuation is removed and uppercase characters are converted to lowercase. It also supports removing stop words.

e.g.

Input: "This is a sample example, for STANDARD-Analyzer"

Output: [this, is, a, sample, example, for, standard, analyzer]


  • Simple Analyzer

With the Simple Analyzer, the text is divided into separate terms whenever a non-letter character appears. Non-letter characters can be numbers, hyphens, spaces, etc. Uppercase characters are converted to lowercase.

Input: "My dog's name is Rocky-Hunter"

Output: [my, dog, s, name, is, rocky, hunter]


  • Whitespace Analyzer

The input phrase is divided into terms based on whitespace. It does not lowercase terms.

Input: "Technology-World has articles on ElasticSearch and Artificial-Intelligence etc."

Output: [Technology-World, has, articles, on, ElasticSearch, and, Artificial-Intelligence, etc.]


  • Stop Analyzer

The Stop Analyzer is a variant of the Simple Analyzer where the text is divided into separate terms whenever a non-letter character is encountered. Non-letter characters can be numbers, hyphens, spaces, etc. Like the Simple Analyzer, the Stop Analyzer converts uppercase characters to lowercase. Additionally, it removes stop words. Assume that the stop word file includes the words 'the', 'is', and 'of'.

Input: "Gone with the wind is one of my favorite books."

Output: [gone, with, wind, one, my, favorite, books]




  • Keyword Analyzer

The input phrase is NOT divided into terms; rather, the output token is the same as the input phrase.

Input: "Mount Everest is one of the worlds natural wonders"

Output: [Mount Everest is one of the worlds natural wonders]


  • Pattern Analyzer

A regular expression is used in the Pattern Analyzer to split the text into terms. The default regular expression is \W+, which matches all non-word characters. Remember that the regular expression matches the term separators in the input phrase, not the terms themselves. Uppercase characters are converted to lowercase; stop word removal is also supported, although no stop words are removed by default (a custom pattern analyzer configuration is sketched after this list).

Input: "My daughter's name is Rita and she is 7 years old"

Output: [my, daughter, s, name, is, rita, and, she, is, 7, years, old]


  • Language Analyzers

Language-specific analyzers, such as English, French, and Hindi, are provided in Elasticsearch.

Here is a sample keyword from the Hindi language analyzer.

e.g. "keywords": ["उदाहरण"]


  • Fingerprint Analyzer

The Fingerprint Analyzer is used for duplicate detection. The input phrase is converted to lowercase and extended characters are normalized (for example, accented characters are folded to their ASCII equivalents). Duplicate words are removed, the terms are sorted, and a single token is created. It also supports stop word removal.

Input: "á is a Spanish accents character"

Output: [a accents character is spanish] (a single token)
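As referenced in the Pattern Analyzer section above, an analyzer can also be configured per index at index-creation time. Here is a sketch of a custom pattern analyzer (the index name my-index and the analyzer name my_pattern_analyzer are hypothetical):

    PUT /my-index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_pattern_analyzer": {
              "type": "pattern",
              "pattern": "\\W+",
              "lowercase": true
            }
          }
        }
      }
    }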


Learn more about Elastic Search here




Monday, August 10, 2020

Elastic Search Concepts - Cluster, Node, Index, Document, Shard and Replica

Are you looking for detailed information about the various concepts used in Elasticsearch? Are you also interested in knowing what a document, shard, and replica are in Elasticsearch? If so, then you have reached the right place. In this article, we will go through the important concepts that are most commonly used in Elasticsearch.


A. Elastic Search Cluster

A cluster is a collection of nodes and has a unique name. If we do not provide a name, the cluster name defaults to elasticsearch. We can create clusters specific to each environment, for example a development cluster, QA cluster, or production cluster. A cluster can contain more than one node, but it is perfectly fine to have just one node. The cluster provides indexing and search capabilities across all of its nodes, i.e., when we index or search data we do not have to worry about which node the data is indexed on or searched from.


B. The node in Elastic Search

A node is a single server that is part of the cluster and stores data. Like the cluster, a node has a unique name. A node provides the important search and indexing capabilities that it contributes to its cluster. An important thing to remember is that node names are in all lowercase. We can create as many nodes as we want; there is no limit. If a cluster has more than one node, then each node contains a subset of the data.




C. Index

So, what is an index? As we know, nodes contain indices, and an index is a collection of similar documents. For example, the documents can be customer information or product information. In short, we create an index for each type of document. The index name is in lowercase and is used when indexing, searching, updating, and deleting documents within that index. We can create any number of indices in a cluster.


D. Category or Type in Elastic Search

Inside each index we have a type, which is nothing but a category. We can create multiple categories such as Customer, Product, Vendor, Supplier, Broker, etc. Assume that our index name is customer; then we can create categories such as Individual, Organization, Sole Proprietor, etc. Under each category we can have documents. A type has a name and is associated with a mapping; we create a separate mapping for each type in an index. Here is an additional note about category or type: as we know, Elasticsearch is built on Lucene, and in Lucene there is no concept of type or category. The category is stored as the _type field in the metadata, and while searching for documents of a particular type, Elasticsearch applies a filter on this field.




E. Mapping in Elastic Search

A mapping describes the fields in a document and their data types, e.g. string, integer, date, geo, etc. It also contains details about how each field will be indexed and stored. In many cases we do not have to create the mapping explicitly; Elasticsearch derives it automatically, which is called dynamic mapping. A minimal explicit mapping is sketched below.
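Here is a sketch of a minimal explicit mapping (the index name customer and the field names are hypothetical; the request shape assumes Elasticsearch 7.x or later, which no longer uses mapping types):

    PUT /customer
    {
      "mappings": {
        "properties": {
          "name":       { "type": "text" },
          "age":        { "type": "integer" },
          "created_on": { "type": "date" }
        }
      }
    }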


F. Document

A document is the basic unit of information in Elasticsearch. A document contains fields as key/value pairs, where the value can be of any data type defined in the mapping, such as string, date, or integer. A document could represent a single customer, product, vendor, etc. Documents are expressed in JSON and physically reside in the index we create. We can store as many documents as we need in a given index.


G. Shard

A shard is a portion of an index. We can divide an index into multiple pieces, i.e. shards, which is helpful when we have a large data set to store on physical disk. If a single physical disk does not have enough capacity, we can divide the index into multiple pieces. Each shard is a fully functional index in its own right. By default (in older Elasticsearch versions), an index is created with five shards, but we can configure as many shards as we need. In short, shards are created to achieve scalability.




H. Replica

A replica is a copy of a shard. A replica is never allocated on the same node as its primary shard, so that if one node goes down, the other node helps with recovery. By default, while creating an index, we create only one replica per shard. Assume that we have two nodes; in that case, we will have five replica shards and five primary shards spread across the two nodes. So replicas are helpful for achieving high availability. An important thing to note about replicas is that search queries can be executed on all replicas in parallel. An index-creation sketch showing shard and replica settings follows below.
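Here is a sketch of creating an index with explicit shard and replica settings (the index name customer is hypothetical, and the default values differ between Elasticsearch versions):

    PUT /customer
    {
      "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
      }
    }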


