Monday, April 28, 2014 - Architectural Overview

Shoppertom is an e-commerce comparison site. To understand its architecture, we should first understand its business requirements:

1. Collect data from multiple websites daily (using crawling and API methods)
2. Store data in a highly efficient, queryable format so it can later be displayed
3. Be largely self-maintaining (the team maintaining Shoppertom is small and cannot spend a lot of time on maintenance)
4. Display web pages to users
5. Collect images from websites, create thumbnails from them and serve them to users

To meet the above business requirements, I am using the following tools and technologies:

  1. A heavily modified version of Nutch 1.6 is used to collect the data from the different stores.
    The main reasons for using Nutch are:
    1. Nutch is a mature environment
    2. Nutch has built-in support for plugins
    3. Nutch has built-in support for robots.txt, which is critical for respecting the law and common practice regarding crawling
  2. A MySQL database is used to host all the website crawling metadata and some of the data collected by crawling
  3. Solr is used to host the main crawling data
    1. Faceting is used to support the web search categories
    2. Extended DisMax (eDisMax) is used to support complex query syntax and to return similar results from Solr
  4. A combination of Python scripts and Jenkins is used to run the Nutch jobs and track their progress.
  5. ASP.NET MVC 4 is used to return the web pages displayed to the users
    1. I chose ASP.NET MVC because a simple implementation is easy to build and deploy, though the TCO is higher.
  6. Entity Framework and Hibernate are used to query data from the MySQL database (from C# and Java respectively)
  7. Image collection is done in two ways:
    1. Some images are collected automatically using multiple Nutch plugins
    2. Some images are collected by a CGI script

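To make the Solr item above a little more concrete, here is a minimal sketch of how a search request combining eDisMax and faceting could be built. It is an illustration only, not Shoppertom's actual code: the host, the core name `products`, and the field names `title`, `description` and `category` are all assumptions. The query parameters themselves (`defType`, `qf`, `facet`, `facet.field`) are standard Solr parameters.

```python
from urllib.parse import urlencode


def build_search_url(base_url, user_query, facet_field="category", rows=10):
    """Build a Solr select URL using eDisMax with a single facet field.

    The field names used here are hypothetical; a real schema will differ.
    """
    params = [
        ("q", user_query),
        ("defType", "edismax"),           # use the Extended DisMax query parser
        ("qf", "title^2 description"),    # assumed fields; title is boosted
        ("facet", "true"),                # turn faceting on
        ("facet.field", facet_field),     # count results per category value
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance and core name.
url = build_search_url(
    "http://localhost:8983/solr/products/select", "red running shoes"
)
print(url)
```

The facet counts returned for the assumed `category` field are the kind of data that could drive category links on a search page, while `qf` controls which fields the user's free-text query is matched against.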
In future posts I will elaborate more on some of these technologies (while still trying to hide some of the IP).

