Thursday 9 July 2015

Solr Preprocessing and Indexing in WCS

Scenario

To create a custom preprocessor and index the new attribute ‘BESTSELLER’ for the catalog entry document.

Steps for Preprocessing

1. Navigate to ..\IBM\WCDE_ENT70\search\pre-processConfig\MC_10001\DB2 and create a folder ‘bestseller’. 
2. Create a custom preprocessor with the lines of code below and name the file ‘wc-dataimport-preprocess-fullbuild.xml’. It explodes and flattens the BESTSELLER attribute data into the custom XI_BESTSELLER temporary table. (The file is named ‘wc-dataimport-preprocess-fullbuild.xml’ because the preprocessing script looks for this file when it is executed.)

<?xml version="1.0" encoding="UTF-8"?>

<_config:DIHPreProcessConfig xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../xsd/wc-dataimport-preprocess.xsd ">        
 <_config:data-processing-config processor="com.ibm.commerce.foundation.dataimport.preprocess.StaticAttributeDataPreProcessor" masterCatalogId="10001" batchSize="500">
    <_config:table definition="CREATE TABLE XI_BESTSELLER_0_#lang_tag# (CATENTRY_ID BIGINT NOT NULL, BESTSELLER VARCHAR(10240))" name="XI_BESTSELLER_0_#lang_tag#"/>
                <_config:query sql="SELECT TI_CE.CATENTRY_ID CATENTRY_ID, ATTRVALDESC.STRINGVALUE BESTSELLER
                                                 FROM TI_CATENTRY_0 TI_CE, CATENTRYATTR CATENTRYATTR, ATTRVALDESC ATTRVALDESC
                                                  WHERE
                                                                TI_CE.CATENTRY_ID = CATENTRYATTR.CATENTRY_ID
                                                                AND CATENTRYATTR.ATTR_ID = (SELECT ATTR_ID FROM ATTR WHERE IDENTIFIER = 'BESTSELLER')
                                                                AND CATENTRYATTR.ATTRVAL_ID = ATTRVALDESC.ATTRVAL_ID
                                                                AND ATTRVALDESC.LANGUAGE_ID =?language_id?
                                                 ORDER BY CATENTRY_ID"/>
    <_config:mapping>
      <_config:key queryColumn="CATENTRY_ID" tableColumn="CATENTRY_ID"/>
      <_config:column-mapping>
        <_config:column-column-mapping>
                <_config:column-column queryColumn="BESTSELLER" tableColumn="BESTSELLER" />
        </_config:column-column-mapping>
        </_config:column-mapping>
    </_config:mapping>                  
  </_config:data-processing-config>
 
 </_config:DIHPreProcessConfig>
 
3. Navigate to ..\IBM\WCDE_ENT70\bin and run the preprocessing scripts using the below command –

di-preprocess.bat ..\IBM\WCDE_ENT70\search\pre-processConfig\MC_10001\DB2\bestseller -force true

4. Validation to check if the preprocessing is successfully done –
Query the temp table and check if the table is populated with the column values.
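
The validation query can be sketched as follows. This is a minimal, hypothetical helper (PreprocessCheck is not part of WCS) that builds a row-count statement for the temp table. It assumes that #lang_tag# in the table name is replaced by the numeric language ID, which matches the XI_BESTSELLER_0_1 table joined later in wc-data-config.xml.

```java
// Hypothetical helper, not part of WCS: builds a validation query for a preprocessing temp table.
final class PreprocessCheck {

    // Assumption: #lang_tag# is replaced by the numeric language ID (1 in this walkthrough).
    static String tempTableName(String template, int languageId) {
        return template.replace("#lang_tag#", Integer.toString(languageId));
    }

    // Counts the rows that actually carry a BESTSELLER value.
    static String validationSql(String template, int languageId) {
        return "SELECT COUNT(*) FROM " + tempTableName(template, languageId)
                + " WHERE BESTSELLER IS NOT NULL";
    }

    public static void main(String[] args) {
        System.out.println(validationSql("XI_BESTSELLER_0_#lang_tag#", 1));
    }
}
```

A count of zero would suggest that no catalog entry has a value for the BESTSELLER attribute, or that the preprocessor did not run.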

Steps for Indexing

1. The schema.xml needs to be customized to add the new field from the preprocessing table. Navigate to ..\IBM\WCDE_ENT70\search\solr\home\MC_10001\en_US\CatalogEntry\conf and edit the schema.xml.
Add the following field within the <fields> tag –

<field name="BESTSELLER" type="wc_keywordText" indexed="true" stored="true" multiValued="true"/>

2. Each targetable file (wc-data-config.xml) needs to be modified to pull the BESTSELLER data from the XI table and add it to the index based on the particular store for the Catentry.
Now edit the wc-data-config.xml file and add the following lines –

Go to the query section and inside ‘SELECT’, add XI_BESTSELLER.BESTSELLER BESTSELLER,
Inside ‘FROM CATENTRY’, add LEFT OUTER JOIN XI_BESTSELLER_0_1 XI_BESTSELLER ON (CATENTRY.CATENTRY_ID=XI_BESTSELLER.CATENTRY_ID)

Add the field mapping –

<field column="BESTSELLER" splitBy=";" sourceColName="BESTSELLER"/>

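Here, column refers to the schema field in Solr and sourceColName refers to the table column in the database. The splitBy=";" attribute splits the source value on semicolons, so each part becomes a separate value of the multiValued field.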

3. Navigate to ..\IBM\WCDE_ENT70\bin and run the indexing scripts using the below command –

di-buildindex.bat -masterCatalogId 10001 -indextype CatalogEntry -localename en_US

4. Validation to check if the field is indexed into Solr –
Hit the Solr URL and check if the field is indexed and populated with values.
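
As a rough sketch of that check, the helper below builds a select URL that asks Solr for documents with any BESTSELLER value. The base URL http://localhost:81/solr/ and the core name MC_10001_CatalogEntry_en_US are assumptions for a local toolkit and may differ in your environment. SolrCheckUrl is a hypothetical class name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Solr select URL for a quick manual check in the browser.
final class SolrCheckUrl {

    static String selectUrl(String baseUrl, String core, String query, String fields) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + core + "/select?q=" + encode(query) + "&fl=" + encode(fields);
    }

    // URL-encodes a parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Assumed host, port and core name; adjust to your installation.
        System.out.println(selectUrl("http://localhost:81/solr/",
                "MC_10001_CatalogEntry_en_US", "BESTSELLER:*", "catentry_id,BESTSELLER"));
    }
}
```

If the field was indexed, the response should list documents whose BESTSELLER field is populated.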


Load data into Solr using Data Load utility in WCS

Scenario

We need to load the product details into a new Solr core using the Data Load utility in WCS.

Description

The WCS Data Load utility performs the following functions in a single operation:

1. Reads the data from the input source file.
2. Transforms the source data to WebSphere Commerce business objects.
3. Allocates and resolves WebSphere Commerce business objects to physical data.
4. Loads the physical data into Solr.
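
The four functions above can be pictured as a small pipeline. The sketch below is only an illustration in plain Java and does not use the real Data Load classes. The pipe-delimited input format, the class name DataLoadSketch and the in-memory map standing in for Solr are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative pipeline only; the real utility is driven by the XML configuration files.
final class DataLoadSketch {

    // 1. Read: each input line "id|name" becomes a raw record.
    static List<String[]> read(List<String> lines) {
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            records.add(line.split("\\|", 2));
        }
        return records;
    }

    // 2. Transform: a raw record becomes a business object (here, a field map).
    static Map<String, String> transform(String[] raw) {
        Map<String, String> product = new LinkedHashMap<>();
        product.put("ProductID", raw[0]);
        product.put("ProductName", raw[1]);
        return product;
    }

    // 3. Resolve: decide the key under which the object is stored.
    static String resolveKey(Map<String, String> product) {
        return product.get("ProductID");
    }

    // 4. Load: write the objects into the target (a map standing in for the Solr core).
    static Map<String, Map<String, String>> load(List<String> lines) {
        Map<String, Map<String, String>> index = new LinkedHashMap<>();
        for (String[] raw : read(lines)) {
            Map<String, String> product = transform(raw);
            index.put(resolveKey(product), product);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("UK4-809|Product1");
        lines.add("ED0-234|Product2");
        System.out.println(load(lines));
    }
}
```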

Step 1 – Create a new Solr core

Navigate to ..\IBM\WCDE_ENT70\search\solr\home and edit the solr.xml file. Create a new Solr core by adding the following <core> entry inside the <cores> tag –
<cores>
<core instanceDir="MC_10001\generic\CatalogEntry\Product\" name="MC_10001_CatalogEntry_Product_generic">
</core>

</cores>

Step 2 – Create a folder structure to place the configuration, data and index load files for indexing the Product details

Navigate to ..\IBM\WCDE_ENT70\search\solr\home\MC_10001\generic\CatalogEntry and create “Product” folder.
Now navigate to this folder and then create 3 sub folders:

1. conf – to hold the Solr configuration files
2. data – to store the indexes
3. indexload – to hold the files required to run the dataload utility.
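
If you prefer to script the folder creation, the layout above could be created along these lines. This is a sketch only: CoreFolders is a hypothetical helper, and the demo runs against a temporary directory rather than the real Solr home.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that creates Product\conf, Product\data and Product\indexload.
final class CoreFolders {

    static Path createProductFolders(Path catalogEntryDir) throws IOException {
        Path product = catalogEntryDir.resolve("Product");
        for (String sub : new String[] {"conf", "data", "indexload"}) {
            Files.createDirectories(product.resolve(sub)); // also creates "Product" itself
        }
        return product;
    }

    // Runs the helper against a temporary directory and reports whether all folders exist.
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("wcs-sketch");
            Path product = createProductFolders(root);
            return Files.isDirectory(product.resolve("conf"))
                    && Files.isDirectory(product.resolve("data"))
                    && Files.isDirectory(product.resolve("indexload"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```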

Step 3 – Place the Solr configuration files

Navigate to ..\IBM\WCDE_ENT70\search\solr\home\MC_10001\generic\CatalogEntry\Product\conf and place the following Solr configuration files there: schema.xml, solrconfig.xml, wc-data-config.xml, stopwords.txt and protwords.txt. You can copy these files from the default conf directory at …\IBM\WCDE_ENT70\search\solr\home\default\conf.

Now edit the schema.xml to add the Product details inside the <fields> tag as given below -

<fields>
<!-- Product Details -->
<field name="ProductID" type="string" indexed="true" stored="true" required="true"/>
<field name="ProductName" type="string" indexed="true" stored="true" required="false"/>
<field name="ProductDescription" type="string" indexed="true" stored="true" required="false"/>
 </fields>

And determine the unique key as given below –

<uniqueKey>ProductID</uniqueKey>

Set the “defaultSearchField” as “ProductID” as given below -

<defaultSearchField>ProductID</defaultSearchField>
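
The choice of unique key matters when the same file is loaded more than once: by default, Solr replaces an existing document when a new one arrives with the same uniqueKey value. The sketch below imitates that behaviour with a plain map. UniqueKeySketch is a hypothetical class and does not talk to Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Imitates "last document with the same uniqueKey wins" using an in-memory map.
final class UniqueKeySketch {

    private final Map<String, String> nameByProductId = new LinkedHashMap<>();

    // Adding a document whose ProductID already exists overwrites the earlier one.
    void add(String productId, String productName) {
        nameByProductId.put(productId, productName);
    }

    int size() {
        return nameByProductId.size();
    }

    String nameOf(String productId) {
        return nameByProductId.get(productId);
    }

    public static void main(String[] args) {
        UniqueKeySketch index = new UniqueKeySketch();
        index.add("UK4-809", "Product1");
        index.add("UK4-809", "Product1 (reloaded)");
        System.out.println(index.size() + " document(s), name = " + index.nameOf("UK4-809"));
    }
}
```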

Step 4 – Place the indexload files
Three configuration files and an input source file are required to complete the data loader setup.
The three configuration files are listed below –

1. wc-solrIndex-env.xml
2. wc-solrIndex-xml-loader.xml
3. wc-solrIndex-load.xml

The wc-solrIndex-data.xml is the input source file.

Navigate to ..\IBM\WCDE_ENT70\search\solr\home\MC_10001\generic\CatalogEntry\Product\indexload and place the indexload files (wc-solrIndex-env.xml, wc-solrIndex-load.xml, wc-solrIndex-xml-loader.xml and wc-solrIndex-data.xml).

  
a. wc-solrIndex-env.xml holds the environment settings to connect to the Solr server. Provide the Solr server URL in the configuration property as given below -

<?xml version="1.0" encoding="UTF-8"?>
<_config:DataLoadEnvConfiguration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../../xml/config/xsd/wc-dataload-env.xsd" 
xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">

<!-- database setting for derby in Toolkit
<_config:Database type="derby" name="..\db\mall" schema="APP"/> -->

<!-- database setting for Oracle  
<_config:Database name="<database name>" user="<user>" password="<password>"
port="1521" schema="<schema name>" server="<server>" type="Oracle" dbDriverType="thin" />  -->
<!-- database setting for AIX/DB2 server--> 
<!--
<_config:Database type="<type>" name="<database name>" user="<user>" password="<password>" server="<server>" port="<port>" schema="<schema name>" /> 
-->
<_config:DataWriter className="com.ibm.commerce.foundation.dataimport.dataload.datawriter.SolrDataWriter">
<_config:property name="solrServerURL" value="http://localhost:81/solr/" />
<!-- 
If solr Server security is enabled, the user name, and password can be specified below
or, they can be passed when running dataload utility
Usage:
./dataload.sh ../../dataload/wc-solrindex-load.xml -DsolrServerUser=xxxx -DsolrServerUserPwd=xxxx
<_config:property name="solrServerUser" value="${solrServerUser,}" />
<_config:property name="solrServerUserPwd" value="${solrServerUserPwd,}" />-->
<_config:DataLoadBatchService className="com.ibm.commerce.foundation.dataimport.dataload.batchservice.SolrIndexBatchService"/>
</_config:DataWriter>

</_config:DataLoadEnvConfiguration>


b.    wc-solrIndex-xml-loader.xml holds the mapping between your input xml file and the schema fields. The “idFieldName” value should match the index uniqueKey value as shown below –

<?xml version="1.0"?>
<_config:DataloadBusinessObjectConfiguration xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config" xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../xsd/wc-dataload-businessobject.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
<_config:DataLoader className="com.ibm.commerce.foundation.dataload.BusinessObjectLoader"> 
<_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader"> 
<_config:XmlHandler className="com.ibm.commerce.foundation.dataload.xmlhandler.NVPXmlHandler"/> 
</_config:DataReader> 
<_config:BusinessObjectBuilder className="com.ibm.commerce.foundation.dataload.businessobjectbuilder.MapObjectBuilder"> 
<_config:DataMapping> 
<_config:mapping value="ProductID" xpath="ProductID"/> 
<_config:mapping value="ProductName" xpath="ProductName"/> 
<_config:mapping value="ProductDescription" xpath="ProductDescription"/> 
<_config:mapping value="delete" xpath="" deleteValue="true"/> 
</_config:DataMapping> 
<_config:BusinessObjectMediator className="com.ibm.commerce.foundation.dataimport.dataload.mediator.SolrInputDocumentMediator"> 
<!-- idFieldName value should match the index uniqueKey value -->
<_config:property value="ProductID" name="idFieldName"/>
</_config:BusinessObjectMediator> 
</_config:BusinessObjectBuilder> 
 </_config:DataLoader> 
 </_config:DataloadBusinessObjectConfiguration>

Here xpath holds the field from the input source file and the value holds its corresponding field from the schema file.


c. wc-solrIndex-load.xml is the main XML file that is passed as a parameter to the dataload utility. This file contains information about the environment file, the loader file and the input XML file for the loader.
<?xml version="1.0" encoding="UTF-8"?>
<_config:DataLoadConfiguration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../xsd/wc-dataload.xsd" 
xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">

<_config:DataLoadEnvironment configFile="wc-solrIndex-env.xml" />

    <_config:LoadOrder commitCount="100000" batchSize="100000" maxError="1" dataLoadMode="Replace" >
       <_config:LoadItem name="Product" loadSequence="1.0" businessObjectConfigFile="wc-solrIndex-xml-loader.xml">
<_config:property name="coreName" value="MC_10001_CatalogEntry_Product_generic" />
<_config:DataSourceLocation location="wc-solrIndex-data.xml" />
<!-- if not specified, default values will be used
<_config:property name="connectionTimeout" value="15000"/>
           <_config:property name="soTimeout" value="15000"/>
           <_config:property name="maxRetries" value="1"/>
           <_config:property name="allowCompression" value="true"/>
           <_config:property name="followRedirects" value="false"/>
           <_config:property name="defaultMaxConnectionsPerHost" value="600"/>
           <_config:property name="maxTotalConnections" value="600"/> 
-->      
</_config:LoadItem> 
    </_config:LoadOrder>

</_config:DataLoadConfiguration>

d. wc-solrIndex-data.xml is the input source file that holds the data to be indexed into Solr.

<?xml version="1.0" encoding="UTF-8"?>
<ProductList>
  <Product  ProductID="UK4-809" ProductName="Product1" ProductDescription="XYZ"/>
  <Product  ProductID="ED0-234" ProductName="Product2" ProductDescription="ABC"/>
  <Product  ProductID="AE3-674" ProductName="Product3" ProductDescription="XYZ"/>
</ProductList>
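
To see how the attribute names in the input file line up with the schema fields and the idFieldName, the file can be read with the standard Java XML parser. This is an illustration only and is not how the Data Load utility processes the file internally. InputFileSketch is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Reads <Product .../> elements and collects the three schema fields from their attributes.
final class InputFileSketch {

    private static final String[] FIELDS = {"ProductID", "ProductName", "ProductDescription"};

    static List<Map<String, String>> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList products = doc.getElementsByTagName("Product");
            List<Map<String, String>> result = new ArrayList<>();
            for (int i = 0; i < products.getLength(); i++) {
                Element product = (Element) products.item(i);
                Map<String, String> fields = new LinkedHashMap<>();
                for (String name : FIELDS) {
                    fields.put(name, product.getAttribute(name)); // empty string if absent
                }
                result.add(fields);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ProductList>"
                + "<Product ProductID=\"UK4-809\" ProductName=\"Product1\" ProductDescription=\"XYZ\"/>"
                + "</ProductList>";
        System.out.println(parse(xml));
    }
}
```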

Step 5 – Running the Data Load Utility

Once the data loader setup is ready, restart the WebSphere Application Server. To run the dataload utility, open a command prompt and navigate to the bin folder inside the WCS installation folder (..\IBM\WCDE_INSTALL70\bin).

Type dataload.bat <path of dataload file>

For example:
dataload.bat …\IBM\WCDE_ENT70\search\solr\home\MC_10001\generic\CatalogEntry\Product\indexload\wc-solrIndex-load.xml

Data Load execution starts –



Step 6 – Retrieving Product details from Solr

The Product details can be queried and retrieved from Solr using the SolrJ API. Hit the below URL to retrieve the product list –

Query results –


Step 7 – Debugging the issues

If you encounter any issues while running the data load utility, you can find the log files under the name “Instagram_ERROR_<current_date>_<current_time>.log” generated at …\IBM\WCDE_ENT70\logs





Creating custom request handler in Solr

Scenario
We have two Solr cores, ProductList and StoreLocator. The “ProductList” core stores the details of each product: ProductID, ProductName and ProductDescription. The “StoreLocator” core stores the details of each store: StoreID, ProductID, StoreLocation and ProductAvailability. This core tells us about the stores and the product availability in each store. We need to display the product details and the store location of each product (which come from two different cores) in the Solr response.

Description of the handler

The newly created request handler in Solr, named “/retrieve”, searches for all records in the ProductList core and then iterates through the returned Solr documents to read the ProductID attribute. It then connects to the StoreLocator core and reads the StoreLocation attribute for each product based on this ProductID. Finally, the handler returns all the attributes from the ProductList core along with the store location attribute from the StoreLocator core for each product.
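
The merge described above can be sketched without Solr by treating each core as a list of field maps. This is an in-memory illustration only. Unlike the handler, which queries the StoreLocator core once per product, the sketch groups the store documents by ProductID first. The class name and the store locations are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory imitation of joining ProductList documents with StoreLocator documents.
final class CrossCoreJoinSketch {

    static List<Map<String, Object>> join(List<Map<String, Object>> products,
                                          List<Map<String, Object>> stores) {
        // Group store locations by the shared ProductID attribute.
        Map<String, List<String>> locationsByProduct = new HashMap<>();
        for (Map<String, Object> store : stores) {
            locationsByProduct
                    .computeIfAbsent((String) store.get("ProductID"), k -> new ArrayList<>())
                    .add((String) store.get("StoreLocation"));
        }
        // Copy each product and attach its store locations (possibly none).
        List<Map<String, Object>> merged = new ArrayList<>();
        for (Map<String, Object> product : products) {
            Map<String, Object> doc = new LinkedHashMap<>(product);
            doc.put("StoreLocation", locationsByProduct.getOrDefault(
                    (String) product.get("ProductID"), Collections.<String>emptyList()));
            merged.add(doc);
        }
        return merged;
    }

    // Builds a document from alternating field names and values.
    static Map<String, Object> doc(String... keyValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put(keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // Sample data: the store locations are invented for illustration.
    static String demo() {
        List<Map<String, Object>> products = new ArrayList<>();
        products.add(doc("ProductID", "UK4-809", "ProductName", "Product1"));
        List<Map<String, Object>> stores = new ArrayList<>();
        stores.add(doc("StoreID", "S1", "ProductID", "UK4-809", "StoreLocation", "London"));
        stores.add(doc("StoreID", "S2", "ProductID", "UK4-809", "StoreLocation", "Leeds"));
        stores.add(doc("StoreID", "S3", "ProductID", "ED0-234", "StoreLocation", "York"));
        return join(products, stores).get(0).get("StoreLocation").toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```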

Steps to follow

Step 1 - Create new cores - ProductList and StoreLocator

Navigate to the solr directory at ..\solr-4.3.1\example\solr and add the ProductList and StoreLocator <core> entries to the solr.xml file as shown -

  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
zkClientTimeout="${zkClientTimeout:15000}">
<core name="collection1" instanceDir="collection1" />
<core name="ProductList" instanceDir="ProductList" />
<core name="StoreLocator" instanceDir="StoreLocator" />
 </cores>

Step 2 - Place the configuration files for ProductList core

Copy the existing ‘collection1’ folder at ..\solr-4.3.1\example\solr, paste it into the same directory and rename the copy as ‘ProductList’.

Step 3 - Schema Design for ProductList core

Navigate to conf folder at ..\solr-4.3.1\example\solr\ProductList\conf and add the following lines (in bold) in schema.xml file -
<fields>
<!--Product List -->
<field name="ProductID" type="string" indexed="true" stored="true" required="true"/>
<field name="ProductName" type="string" indexed="true" stored="true" required="false"/>
<field name="ProductDescription" type="string" indexed="true" stored="true" required="false"/>
</fields>

Please note that the unique key is the ProductID here.

Step 4 - Place the configuration files for StoreLocator core

Copy the existing ‘collection1’ folder at ..\solr-4.3.1\example\solr, paste it into the same directory and rename the copy as ‘StoreLocator’.

Step 5 - Schema Design for StoreLocator core

Navigate to conf folder at ..\solr-4.3.1\example\solr\StoreLocator\conf and add the following lines (in bold) in schema.xml file -
<fields>
<!--Store Locator Details -->
<field name="StoreID" type="string" indexed="true" stored="true" required="true"/>
<field name="ProductID" type="string" indexed="true" stored="true" required="false"/>
<field name="StoreLocation" type="string" indexed="true" stored="true" required="false"/>
<field name="ProductAvailability" type="string" indexed="true" stored="true" required="false"/>
</fields>

Please note that the unique key is the StoreID here.

Step 6 - Start the Solr server

Navigate to ..\solr-4.3.1\example and run the command –
java -jar start.jar
Once the server is started, you can see the ProductList and StoreLocator cores listed in the core selector on the Solr Admin page.

Step 7 - Indexing data into both the cores

Index data into ProductList and StoreLocator cores. Please ensure that the ProductID is the common attribute in these cores.

After indexing data into ProductList core –



After indexing data into StoreLocator core –


Step 8 - Configure the Request Handler in both the cores

Configure the request handler in the solrconfig.xml of both the cores. The handler is configured to trigger in response to a ‘../solr/retrieve’ request.

<requestHandler name="/retrieve" class="com.custom.solr.handlers.SolrJSearcher">
</requestHandler>

Step 9 - Create a custom subclass of RequestHandlerBase and override the handleRequestBody(SolrQueryRequest, SolrQueryResponse) method

package com.custom.solr.handlers;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

/**
 * Title : SolrJSearcher Description : Searches for all entries in the ProductList core and gets the ProductID entries.
 * Makes a connection with the StoreLocator core and, for every ProductID retrieved in the above step, queries and
 * gets the corresponding StoreLocation. It returns the product details and the store location
 * of each product in the Solr response.
 *
 * Revision History ----------------
 *
 * @version 1.0
 * @author 5/22/15 sofia l {None.} Written Code
 */
public class SolrJSearcher extends RequestHandlerBase {
    private final HttpSolrServer serverProductList;
    private final HttpSolrServer serverStoreLocator;
    static org.apache.log4j.Logger logger = org.apache.log4j.Logger.getLogger(SolrJSearcher.class.getName());

    public SolrJSearcher() {
        serverProductList = new HttpSolrServer("http://localhost:8983/solr/ProductList");
        serverStoreLocator = new HttpSolrServer("http://localhost:8983/solr/StoreLocator");
    }

    @Override
    public String getDescription() {
        return "SolrJSearcher";
    }

    @Override
    public String getSource() {
        return null;
    }

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse resp) throws Exception {
        // Search for all records in the ProductList core.
        // No "rows" parameter is set, so only Solr's default page of results is returned.
        ModifiableSolrParams qparamsProductID = new ModifiableSolrParams();
        qparamsProductID.add("q", "*:*");
        QueryResponse qresProductID = serverProductList.query(qparamsProductID);
        SolrDocumentList resultsProductID = qresProductID.getResults();

        // Iterate over the documents actually returned. Looping up to getNumFound() can
        // run past the end of the list, because numFound may exceed the returned rows.
        for (SolrDocument solrDocumentforProductID : resultsProductID) {
            String productID = (String) solrDocumentforProductID.getFieldValue("ProductID");

            // Look up the stores for this product; escape the ID so it is treated as a literal value
            ModifiableSolrParams qparamsStoreLocation = new ModifiableSolrParams();
            qparamsStoreLocation.add("q", "ProductID:" + ClientUtils.escapeQueryChars(productID));
            QueryResponse qresStoreLocation = serverStoreLocator.query(qparamsStoreLocation);

            // Attach every matching store location to the product document
            for (SolrDocument solrDocumentforStoreLocation : qresStoreLocation.getResults()) {
                String storeLocation = (String) solrDocumentforStoreLocation.getFieldValue("StoreLocation");
                solrDocumentforProductID.addField("StoreLocation", storeLocation);
            }
        }

        // Add the merged documents to the response once, even when no products were found
        resp.add("Results", resultsProductID);
    }
}

Now compile the code and generate the jar file “SolrJSearcher.jar”.

Step 10 - Place the library files

Place all the required jar files (solr-solrj-4.3.1.jar, solr-core-4.3.1.jar and log4j-1.2.16.jar), including SolrJSearcher.jar, inside a directory named ‘handlerlib’ at the Solr home. Specify the path of the directory in the solrconfig.xml of both the cores as given below –

<lib dir="C:\..\..\..\solr-4.3.1\solr-4.3.1\example\solr\handlerlib" regex=".*\.jar" />
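
The regex attribute decides which files in the directory are picked up. The sketch below shows which file names the pattern .*\.jar accepts, assuming the pattern has to match the whole file name. LibRegexSketch is a hypothetical class used only for illustration.

```java
import java.util.regex.Pattern;

// Checks file names against the pattern used in the <lib> directive.
final class LibRegexSketch {

    private static final Pattern JAR = Pattern.compile(".*\\.jar");

    // Assumption: the whole file name must match the pattern.
    static boolean isPickedUp(String fileName) {
        return JAR.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"SolrJSearcher.jar", "log4j-1.2.16.jar", "readme.txt"}) {
            System.out.println(name + " -> " + isPickedUp(name));
        }
    }
}
```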

Step 11 - Restart the Solr server and hit the below URL to access the custom request handler