Thursday, 9 July 2015

Load data into Solr using Data Load utility in WCS

Scenario

We need to load product details into a new Solr core using the Data Load utility in WebSphere Commerce (WCS).

Description

The WCS Data Load utility performs the following functions in a single operation:

1. Reads the data from the input source file.
2. Transforms the source data to WebSphere Commerce business objects.
3. Allocates and resolves WebSphere Commerce business objects to physical data.
4. Loads the physical data into Solr.

Step 1 – Create a new Solr core

Navigate to ..\IBM\WCDE_ENT70\search\solr\home and edit the solr.xml file. Create a new Solr core by adding the following <core> element inside the <cores> tag:
<cores>
<core instanceDir="MC_10001\generic\CatalogEntry\Product\" name="MC_10001_CatalogEntry_Product_generic">
</core>

</cores>
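Before restarting the server, it can help to confirm that the new <core> entry is actually registered in solr.xml. The following is a minimal Java sketch under that assumption: SolrXmlCores is a hypothetical helper (not part of WCS or Solr) that uses the JDK's built-in DOM parser to find a core by its name attribute and return its instanceDir.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: looks up a core declared in solr.xml.
class SolrXmlCores {
    /** Returns the instanceDir of the named core, or null if solr.xml does not declare it. */
    static String instanceDirOf(String solrXml, String coreName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(solrXml)));
        NodeList cores = doc.getElementsByTagName("core");
        for (int i = 0; i < cores.getLength(); i++) {
            Element core = (Element) cores.item(i);
            if (coreName.equals(core.getAttribute("name"))) {
                return core.getAttribute("instanceDir");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to solr.xml; args[1]: core name, e.g. MC_10001_CatalogEntry_Product_generic
        String dir = instanceDirOf(Files.readString(Path.of(args[0])), args[1]);
        System.out.println(dir == null ? "Core not found" : "instanceDir = " + dir);
    }
}
```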

Step 2 – Create a folder structure to place the configuration, data and index load files for indexing the Product details

Navigate to ..\IBM\WCDE_ENT70\search\solr\home\MC_10001\generic\CatalogEntry and create a folder named “Product”.
Inside this folder, create three subfolders:

1. conf – to hold the Solr configuration files
2. data – to store the indexes
3. indexload – to hold the files required to run the dataload utility
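The folder layout from this step can also be created with a few lines of Java. This sketch assumes you pass it the ...\CatalogEntry folder from above; ProductCoreFolders is a hypothetical class name. Files.createDirectories is used because it also creates missing parent folders and does not fail if a folder already exists.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: creates Product\conf, Product\data and Product\indexload.
class ProductCoreFolders {
    static final List<String> SUBFOLDERS = List.of("conf", "data", "indexload");

    /** Creates <catalogEntryDir>/Product and its subfolders, and returns the Product folder. */
    static Path create(Path catalogEntryDir) throws IOException {
        Path product = catalogEntryDir.resolve("Product");
        for (String name : SUBFOLDERS) {
            // Also creates Product itself; no error if the folder already exists
            Files.createDirectories(product.resolve(name));
        }
        return product;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: your ..\search\solr\home\MC_10001\generic\CatalogEntry folder
        System.out.println("Created " + create(Path.of(args[0])));
    }
}
```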

Step 3 – Place the Solr configuration files

Navigate to ..\IBM\WCDE_ENT70\search\solr\home\MC_10001\generic\CatalogEntry\Product\conf and place the following Solr configuration files there: schema.xml, solrconfig.xml, wc-data-config.xml, stopwords.txt and protwords.txt. You can copy these files from the default conf directory at …\IBM\WCDE_ENT70\search\solr\home\default\conf.

Next, edit schema.xml and add the Product fields inside the <fields> tag, as shown below:

<fields>
<!-- Product Details -->
<field name="ProductID" type="string" indexed="true" stored="true" required="true"/>
<field name="ProductName" type="string" indexed="true" stored="true" required="false"/>
<field name="ProductDescription" type="string" indexed="true" stored="true" required="false"/>
</fields>

Then define the unique key as shown below (a Solr schema declares only one <uniqueKey>, so replace any existing one):

<uniqueKey>ProductID</uniqueKey>

Set defaultSearchField to ProductID, as shown below:

<defaultSearchField>ProductID</defaultSearchField>
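Because ProductID is used both as the uniqueKey and as the defaultSearchField, a typo in either setting is easy to miss. The sketch below is a hypothetical check (SchemaKeyCheck is an illustrative name) that parses schema.xml and reports when either setting is missing, repeated, or names a field that no <field> element declares. It assumes, as in this post, that both settings are present.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical check: uniqueKey and defaultSearchField must name a declared <field>.
class SchemaKeyCheck {
    static List<String> check(String schemaXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaXml)));

        // Collect the names of all <field> elements (dynamicField/copyField are different tags)
        Set<String> fields = new HashSet<>();
        NodeList fieldNodes = doc.getElementsByTagName("field");
        for (int i = 0; i < fieldNodes.getLength(); i++) {
            fields.add(((Element) fieldNodes.item(i)).getAttribute("name"));
        }

        List<String> problems = new ArrayList<>();
        for (String setting : List.of("uniqueKey", "defaultSearchField")) {
            NodeList found = doc.getElementsByTagName(setting);
            if (found.getLength() != 1) {
                problems.add(setting + " should appear once, found " + found.getLength());
                continue;
            }
            String value = found.item(0).getTextContent().trim();
            if (!fields.contains(value)) {
                problems.add(setting + " names undeclared field " + value);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to Product\conf\schema.xml
        List<String> problems = check(Files.readString(Path.of(args[0])));
        System.out.println(problems.isEmpty() ? "OK" : String.join("\n", problems));
    }
}
```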

Step 4 – Place the indexload files
Three configuration files and one input source file are required to complete the data load setup.
The three configuration files are:

1. wc-solrIndex-env.xml
2. wc-solrIndex-xml-loader.xml
3. wc-solrIndex-load.xml

The input source file is wc-solrIndex-data.xml.

Navigate to ..\IBM\WCDE_ENT70\search\solr\home\MC_10001\generic\CatalogEntry\Product\indexload and place all four files there (wc-solrIndex-env.xml, wc-solrIndex-load.xml, wc-solrIndex-xml-loader.xml and wc-solrIndex-data.xml).

  
a. wc-solrIndex-env.xml holds the environment settings used to connect to the Solr server. Set the Solr server URL in the solrServerURL property, as shown below:

<?xml version="1.0" encoding="UTF-8"?>
<_config:DataLoadEnvConfiguration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../../xml/config/xsd/wc-dataload-env.xsd" 
xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">

<!-- database setting for derby in Toolkit
<_config:Database type="derby" name="..\db\mall" schema="APP"/> -->

<!-- database setting for Oracle  
<_config:Database name="<database name>" user="<user>" password="<password>"
port="1521" schema="<schema name>" server="<server>" type="Oracle" dbDriverType="thin" />  -->
<!-- database setting for AIX/DB2 server--> 
<!--
<_config:Database type="<type>" name="<database name>" user="<user>" password="<password>" server="<server>" port="<port>" schema="<schema name>" /> 
-->
<_config:DataWriter className="com.ibm.commerce.foundation.dataimport.dataload.datawriter.SolrDataWriter">
<_config:property name="solrServerURL" value="http://localhost:81/solr/" />
<!-- 
If solr Server security is enabled, the user name, and password can be specified below
or, they can be passed when running dataload utility
Usage:
./dataload.sh ../../dataload/wc-solrindex-load.xml -DsolrServerUser=xxxx -DsolrServerUserPwd=xxxx
<_config:property name="solrServerUser" value="${solrServerUser,}" />
<_config:property name="solrServerUserPwd" value="${solrServerUserPwd,}" />-->
<_config:DataLoadBatchService className="com.ibm.commerce.foundation.dataimport.dataload.batchservice.SolrIndexBatchService"/>
</_config:DataWriter>

</_config:DataLoadEnvConfiguration>
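If the data load cannot reach Solr, the first thing to verify is the solrServerURL value. The following hypothetical helper (EnvSolrUrl is an illustrative name) reads wc-solrIndex-env.xml with a non-namespace-aware DOM parser, so elements are matched by their prefixed names such as _config:property, and returns the URL as a java.net.URI. Properties that are still inside <!-- --> comments, like solrServerUser above, are not elements and are therefore ignored.

```java
import java.io.StringReader;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts and sanity-checks solrServerURL from wc-solrIndex-env.xml.
class EnvSolrUrl {
    static URI solrServerUri(String envXml) throws Exception {
        // Default factory is not namespace-aware, so "_config:property" matches the tag as written
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(envXml)));
        NodeList props = doc.getElementsByTagName("_config:property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if ("solrServerURL".equals(prop.getAttribute("name"))) {
                URI uri = URI.create(prop.getAttribute("value"));
                if (!"http".equals(uri.getScheme()) && !"https".equals(uri.getScheme())) {
                    throw new IllegalStateException("solrServerURL is not an HTTP URL: " + uri);
                }
                return uri;
            }
        }
        throw new IllegalStateException("No solrServerURL property found");
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to wc-solrIndex-env.xml
        URI uri = solrServerUri(Files.readString(Path.of(args[0])));
        System.out.println("Host " + uri.getHost() + ", port " + uri.getPort() + ", path " + uri.getPath());
    }
}
```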


b. wc-solrIndex-xml-loader.xml holds the mapping between the input XML file and the schema fields. The idFieldName property value must match the uniqueKey defined in schema.xml, as shown below:

<?xml version="1.0"?>
<_config:DataloadBusinessObjectConfiguration xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config" xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../xsd/wc-dataload-businessobject.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
<_config:DataLoader className="com.ibm.commerce.foundation.dataload.BusinessObjectLoader"> 
<_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader"> 
<_config:XmlHandler className="com.ibm.commerce.foundation.dataload.xmlhandler.NVPXmlHandler"/> 
</_config:DataReader> 
<_config:BusinessObjectBuilder className="com.ibm.commerce.foundation.dataload.businessobjectbuilder.MapObjectBuilder"> 
<_config:DataMapping> 
<_config:mapping value="ProductID" xpath="ProductID"/> 
<_config:mapping value="ProductName" xpath="ProductName"/> 
<_config:mapping value="ProductDescription" xpath="ProductDescription"/> 
<_config:mapping value="delete" xpath="" deleteValue="true"/> 
</_config:DataMapping> 
<_config:BusinessObjectMediator className="com.ibm.commerce.foundation.dataimport.dataload.mediator.SolrInputDocumentMediator"> 
<!-- idFieldName value should match the index uniqueKey value -->
<_config:property value="ProductID" name="idFieldName"/>
</_config:BusinessObjectMediator> 
</_config:BusinessObjectBuilder> 
 </_config:DataLoader> 
 </_config:DataloadBusinessObjectConfiguration>

Here, xpath identifies the field in the input source file, and value names the corresponding field in schema.xml.
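Since every mapped value must name a schema field and idFieldName must equal the uniqueKey, the two files can be cross-checked. This is a hypothetical sketch (LoaderMappingCheck is an illustrative name, not a WCS class) that compares the mappings in wc-solrIndex-xml-loader.xml with the field names declared in Step 3. It assumes the mapping that carries deleteValue is a control flag rather than a schema field, based only on the sample above.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical cross-check between wc-solrIndex-xml-loader.xml and schema.xml.
class LoaderMappingCheck {
    static List<String> check(String loaderXml, Set<String> schemaFields, String uniqueKey) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(loaderXml)));
        List<String> problems = new ArrayList<>();

        NodeList mappings = doc.getElementsByTagName("_config:mapping");
        for (int i = 0; i < mappings.getLength(); i++) {
            Element mapping = (Element) mappings.item(i);
            // Assumption: the entry carrying deleteValue is a control flag, not a schema field
            if (mapping.hasAttribute("deleteValue")) {
                continue;
            }
            String field = mapping.getAttribute("value");
            if (!schemaFields.contains(field)) {
                problems.add("mapping value " + field + " is not declared in schema.xml");
            }
        }

        String idFieldName = null;
        NodeList props = doc.getElementsByTagName("_config:property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if ("idFieldName".equals(prop.getAttribute("name"))) {
                idFieldName = prop.getAttribute("value");
            }
        }
        if (!uniqueKey.equals(idFieldName)) {
            problems.add("idFieldName " + idFieldName + " does not match uniqueKey " + uniqueKey);
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to wc-solrIndex-xml-loader.xml; field names as declared in Step 3
        Set<String> fields = Set.of("ProductID", "ProductName", "ProductDescription");
        check(Files.readString(Path.of(args[0])), fields, "ProductID").forEach(System.out::println);
    }
}
```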


c. wc-solrIndex-load.xml is the main configuration file passed as a parameter to the dataload utility. It references the environment file, the loader (business object) configuration file, and the input XML file for the loader.
<?xml version="1.0" encoding="UTF-8"?>
<_config:DataLoadConfiguration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../xsd/wc-dataload.xsd" 
xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">

<_config:DataLoadEnvironment configFile="wc-solrIndex-env.xml" />

    <_config:LoadOrder commitCount="100000" batchSize="100000" maxError="1" dataLoadMode="Replace" >
       <_config:LoadItem name="Product" loadSequence="1.0" businessObjectConfigFile="wc-solrIndex-xml-loader.xml">
<_config:property name="coreName" value="MC_10001_CatalogEntry_Product_generic" />
<_config:DataSourceLocation location="wc-solrIndex-data.xml" />
<!-- if not specified, default values will be used
<_config:property name="connectionTimeout" value="15000"/>
           <_config:property name="soTimeout" value="15000"/>
           <_config:property name="maxRetries" value="1"/>
           <_config:property name="allowCompression" value="true"/>
           <_config:property name="followRedirects" value="false"/>
           <_config:property name="defaultMaxConnectionsPerHost" value="600"/>
           <_config:property name="maxTotalConnections" value="600"/> 
-->      
</_config:LoadItem> 
    </_config:LoadOrder>

</_config:DataLoadConfiguration>
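The load file only points to the other files, so a wrong file name surfaces only when the utility runs. The sketch below is a hypothetical pre-flight check (LoadFileCheck is an illustrative name) that reads configFile, businessObjectConfigFile and location from wc-solrIndex-load.xml and reports any referenced file that does not exist. It assumes relative names resolve against the folder that holds wc-solrIndex-load.xml, which matches the layout used in this post.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical pre-flight check for the files referenced by wc-solrIndex-load.xml.
class LoadFileCheck {
    // Element name -> attribute holding a file name, as used in the load file above
    static final Map<String, String> REFERENCES = Map.of(
            "_config:DataLoadEnvironment", "configFile",
            "_config:LoadItem", "businessObjectConfigFile",
            "_config:DataSourceLocation", "location");

    /** Returns referenced files that do not exist (assumes paths relative to the load file's folder). */
    static List<Path> missingReferences(Path loadFile) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(loadFile.toFile());
        Path dir = loadFile.toAbsolutePath().getParent();
        List<Path> missing = new ArrayList<>();
        for (Map.Entry<String, String> ref : REFERENCES.entrySet()) {
            NodeList nodes = doc.getElementsByTagName(ref.getKey());
            for (int i = 0; i < nodes.getLength(); i++) {
                Path target = dir.resolve(((Element) nodes.item(i)).getAttribute(ref.getValue()));
                if (!Files.isRegularFile(target)) {
                    missing.add(target);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to wc-solrIndex-load.xml
        List<Path> missing = missingReferences(Path.of(args[0]));
        System.out.println(missing.isEmpty() ? "All referenced files found" : "Missing: " + missing);
    }
}
```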

d. wc-solrIndex-data.xml is the input source file that holds the data to be indexed into Solr.

<?xml version="1.0" encoding="UTF-8"?>
<ProductList>
  <Product  ProductID="UK4-809" ProductName="Product1" ProductDescription="XYZ"/>
  <Product  ProductID="ED0-234" ProductName="Product2" ProductDescription="ABC"/>
  <Product  ProductID="AE3-674" ProductName="Product3" ProductDescription="XYZ"/>
</ProductList>
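ProductID is the uniqueKey and is declared with required="true", so every <Product> needs one, and Solr replaces an earlier document when a later one has the same uniqueKey value. A small hypothetical check (ProductDataCheck is an illustrative name) can catch missing or duplicate IDs in the input file before loading. It assumes the attribute-style records shown above.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical check: every <Product> has a ProductID, and no ProductID repeats.
class ProductDataCheck {
    static List<String> findProblems(String dataXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(dataXml)));
        NodeList products = doc.getElementsByTagName("Product"); // exact match, so not ProductList
        Set<String> seen = new HashSet<>();
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < products.getLength(); i++) {
            String id = ((Element) products.item(i)).getAttribute("ProductID"); // "" if absent
            if (id.isEmpty()) {
                problems.add("Product #" + (i + 1) + " has no ProductID");
            } else if (!seen.add(id)) {
                problems.add("Duplicate ProductID " + id);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to wc-solrIndex-data.xml
        List<String> issues = findProblems(Files.readString(Path.of(args[0])));
        System.out.println(issues.isEmpty() ? "OK" : String.join("\n", issues));
    }
}
```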

Step 5 – Running the Data Load Utility

Once the data load setup is ready, restart the WebSphere Application Server so that Solr picks up the new core defined in solr.xml. To run the dataload utility, open a command prompt and navigate to the bin folder inside the WCS installation folder (..\IBM\WCDE_INSTALL70\bin).

Run dataload.bat with the path of the load configuration file as its argument:

dataload.bat <path of wc-solrIndex-load.xml>

For example:

dataload.bat …\IBM\WCDE_ENT70\search\solr\home\MC_10001\generic\CatalogEntry\Product\indexload\wc-solrIndex-load.xml
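If you want to script this step, the same command can be launched from Java with ProcessBuilder. This is a hypothetical sketch (RunDataLoad is an illustrative name): it assumes Windows, runs dataload.bat through cmd /c from the bin folder, and takes the bin folder and the full path of wc-solrIndex-load.xml as arguments rather than hard-coding them.

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical wrapper around the manual step: cmd /c dataload.bat <load file>
class RunDataLoad {
    /** Builds the Windows command line used to run dataload.bat with the given load file. */
    static List<String> command(Path binDir, Path loadFile) {
        return List.of("cmd", "/c", binDir.resolve("dataload.bat").toString(), loadFile.toString());
    }

    public static void main(String[] args) throws Exception {
        // args[0]: the WCS bin folder; args[1]: full path to wc-solrIndex-load.xml
        Path binDir = Path.of(args[0]);
        Process process = new ProcessBuilder(command(binDir, Path.of(args[1])))
                .directory(binDir.toFile())  // run from the bin folder, as in the manual steps
                .inheritIO()                 // show the utility's console output here
                .start();
        System.out.println("dataload.bat exited with code " + process.waitFor());
    }
}
```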

The data load execution then starts.



Step 6 – Retrieving Product details from Solr

The Product details can be queried and retrieved from Solr using the SolrJ API, or over HTTP. Open the URL below in a browser to retrieve the product list:

Query results:
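The exact URL from the original screenshot is not reproduced here, so the following is only an assumed example. It builds a query against the standard Solr /select request handler of the new core, combining solrServerURL from wc-solrIndex-env.xml with coreName from wc-solrIndex-load.xml, and fetches it over plain HTTP with Java 11's java.net.http.HttpClient rather than SolrJ. ProductQuery is a hypothetical class name, and the sketch assumes the solrconfig.xml copied in Step 3 exposes /select.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical query helper for the new core (plain HTTP, not SolrJ).
class ProductQuery {
    /** Builds <solrServerURL><coreName>/select?q=<query>&rows=<rows>, URL-encoding the query. */
    static String selectUrl(String solrServerUrl, String coreName, String query, int rows) {
        String base = solrServerUrl.endsWith("/") ? solrServerUrl : solrServerUrl + "/";
        return base + coreName + "/select?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8) + "&rows=" + rows;
    }

    public static void main(String[] args) throws Exception {
        // Values from wc-solrIndex-env.xml (solrServerURL) and wc-solrIndex-load.xml (coreName)
        String url = selectUrl("http://localhost:81/solr/", "MC_10001_CatalogEntry_Product_generic", "*:*", 10);
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body()); // format depends on the Solr version's default writer
    }
}
```

A field query such as ProductID:UK4-809 can be passed as the query argument instead of *:* to look up a single product.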


Step 7 – Debugging the issues

If you encounter any issues while running the data load utility, check the log files named “Instagram_ERROR_<current_date>_<current_time>.log”, which are generated in …\IBM\WCDE_ENT70\logs.
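When several runs have produced logs, the newest one is usually the relevant one. This hypothetical helper (LatestDataLoadLog is an illustrative name) lists the logs folder, keeps *.log files whose names start with a given prefix, and prints the most recently modified one. The prefix is passed as an argument, so the file-name pattern from this step is not hard-coded; ISO-8859-1 is used for reading only because it decodes any byte sequence without errors.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper: finds the newest matching data load log file.
class LatestDataLoadLog {
    /** Returns the most recently modified *.log file in logDir whose name starts with prefix. */
    static Optional<Path> latest(Path logDir, String prefix) throws IOException {
        try (Stream<Path> files = Files.list(logDir)) {  // closes the directory stream when done
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith(prefix) && name.endsWith(".log");
                    })
                    .max(Comparator.comparingLong(p -> p.toFile().lastModified()));
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the logs folder from Step 7; args[1]: the file-name prefix to look for
        Optional<Path> log = latest(Path.of(args[0]), args[1]);
        if (log.isEmpty()) {
            System.out.println("No matching log file found");
            return;
        }
        System.out.println("Latest log: " + log.get());
        Files.readAllLines(log.get(), StandardCharsets.ISO_8859_1).forEach(System.out::println);
    }
}
```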




