Price Analyser
        It is an information retrieval and extraction tool. In our case the information is the price and other features of a phone, as listed by different suppliers.
        The tool can also be used to compare two suppliers for the same phone.
        Brief description of its working:
                TurboGears 2.0 forms are used to specify which web pages to retrieve and what to extract from each page.
                These values are saved in the database.
                
                Scrapy then uses the XPaths specified as parameters in the forms to capture information about the phones and save it in the database.

                Google charting APIs are used for comparison and graphical display.

                The software also provides querying and searching (a basic search engine). Lucene is used to embed this feature
                in our application.



Detailed description:
        Forms part:
                VIEW    
                path to files : home/gaurav/code/TG2TEST/src/wiki20/templates
                codedata_forms.html : Gives the links to the individual forms
                input_form.html : Form for a particular supplier; the supplier depends on the link chosen in codedata_forms.html

                MODEL   
                path to files : home/gaurav/code/TG2TEST/src/wiki20/widgets
                babuchak_form.py : Structure of the form for babuchak; contains the parameters that need to be taken as input from the user
                indiaplaza_form.py : Structure of the form for indiaplaza; contains the parameters that need to be taken as input from the user
                infibeam_form.py : Structure of the form for infibeam; contains the parameters that need to be taken as input from the user
                mobilestore_form.py : Structure of the form for mobilestore; contains the parameters that need to be taken as input from the user
                naaptol_form.py : Structure of the form for naaptol; contains the parameters that need to be taken as input from the user
                univercell_form.py : Structure of the form for univercell; contains the parameters that need to be taken as input from the user

                CONTROLLER
                path to files : home/gaurav/code/TG2TEST/src/wiki20/controllers
                root.py : All the processing is done here. Input is taken here and, on submit, saved in the database.
                There are two types of methods:
                1) form_infibeam : Builds the form by calling the template and passing it the values present in the DB as default values
                2) save_infibeam : Takes the values given by the user, stores them in the DB and then shows the template again for further input.
                The same template is used for both types of methods.
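
                The form_/save_ pair can be sketched without the framework as below. This is only an illustration: the class name, the in-memory store and the field names are assumptions, and the real root.py uses TurboGears controllers and the database instead.

```python
# Framework-free sketch of the form_/save_ method pair; not the project's code.
class FormController:
    """Mimics the form_<supplier> / save_<supplier> pair for one supplier."""

    def __init__(self, store):
        # `store` stands in for the database table holding form parameters.
        self.store = store

    def form_infibeam(self):
        # Build the template context, using stored values as defaults.
        fields = ("xpath1", "xpath2", "xpath3")  # hypothetical field names
        return {name: self.store.get("infibeam_" + name, "") for name in fields}

    def save_infibeam(self, **submitted):
        # Persist the submitted values, then show the same form again.
        for name, value in submitted.items():
            self.store["infibeam_" + name] = value
        return self.form_infibeam()


controller = FormController(store={})
context = controller.save_infibeam(xpath2="//h1/text()")
```

                Because save_infibeam returns the result of form_infibeam, both methods can render through the same template.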
        Scrapy: 
        Individual spiders are written for each supplier. Some suppliers also need extra scripts because the number of pages holding their information
        is variable. These are infibeam, indiaplaza and mobilestore. The scripts are placed in home/gaurav/code and are named runinfibeam, runindiaplaza and 
        runmobstore respectively.

        ******************************************INFIBEAM**********************************************************************************

        Documentation for script runinfibeam.py
        This is the script called by consetup.py
        First it runs the spider for infibeam dynamically (i.e. determining the number of pages at run time)
        Then it generates the csv file
        @param  path to the folder in which the spider projects reside (e.g. :/home/gaurav/code); it must start with pathsep
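
        The csv step could look roughly like the sketch below. The column names are borrowed from the infibeam_data table; the function name and the sample row are assumptions, and the real script may write different columns.

```python
import csv
import io


def phones_to_csv(rows, out):
    """Write (name, shown_price, final_price) rows to a CSV stream."""
    writer = csv.writer(out)
    writer.writerow(["name", "shown_price", "final_price"])
    writer.writerows(rows)


# Sample data for illustration only.
buffer = io.StringIO()
phones_to_csv([("Fly E300", 2500, 2600)], buffer)
csv_text = buffer.getvalue()
```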

        Documentation for class infi_spider
        This spider collects the information for the individual phones
        and stores it in table datastore_datadefinition_infibeam_data
        
        Documentation for class infibeam_data
        It represents database table for infibeam, it stores
        name = name of the phone
        shown_price = price offered by infibeam
        final_price = price which one has to pay, 
        final_price = shown_price + taxes + ship-price

        Documentation for method add_infiphone 
        This method is used to add a phone in infibeam's table

        Documentation for method get_all_infibeam_data 
        This method is used to retrieve all the phones in infibeam's table
        
        Documentation of various parameters     
        Xpath1 = Gives the section for an individual phone
        Xpath2 = Gives the name of an individual phone
        Xpath3 = Gives the quoted price of an individual phone
        vatplustax = Used to get the final price from the quoted price
        Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
        URL = The start url for the supplier; crawling starts from this page
        homepage = Homepage for the supplier
        referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
        domainname = Name by which this spider is known outside; passed as an argument when calling this spider
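
        How these parameters fit together can be sketched as below. The sketch uses the standard library's ElementTree in place of Scrapy's selectors, so it only handles well-formed markup and a subset of XPath. The sample page, the XPath values and the treatment of vatplustax as a flat amount added to the quoted price are all assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up listing page, for illustration only.
PAGE = """<html><body>
  <div class="phone"><h2>Fly E300</h2><span class="price">Rs 2,500</span></div>
  <div class="phone"><h2>Nokia 1100</h2><span class="price">Rs 1,299</span></div>
</body></html>"""

# Hypothetical parameter values, as they might be entered in the form.
XPATH1 = ".//div[@class='phone']"   # section for each phone
XPATH2 = "h2"                       # name, relative to the section
XPATH3 = "span[@class='price']"     # quoted price, relative to the section
REMOVELIST = [",", "Rs"]
VATPLUSTAX = 100                    # assumed to be a flat amount


def clean_price(text, removelist):
    """Strip every token in removelist and return the price as an int."""
    for token in removelist:
        text = text.replace(token, "")
    return int(text.strip())


def extract_phones(page):
    """Return (name, shown_price, final_price) for each phone section."""
    root = ET.fromstring(page)
    phones = []
    for section in root.findall(XPATH1):
        name = section.findtext(XPATH2).strip()
        shown = clean_price(section.findtext(XPATH3), REMOVELIST)
        phones.append((name, shown, shown + VATPLUSTAX))
    return phones


phones = extract_phones(PAGE)
```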

      **************************************************************************************************************************************


      ******************************************INDIAPLAZA**********************************************************************************
        
        Documentation for script runindiaplaza.py
        This is the script called by consetup.py
        First it runs the spiders for indiaplaza dynamically (i.e. determining the number of pages at run time)
        Then it generates the csv file
        @param  path to the folder in which the spider projects reside (e.g. :/home/gaurav/code); it must start with pathsep
        
        Documentation for class indiaplaza_spider
        This spider collects the urls for the individual phones
        and stores them in table datastore_datadefinition_indiaplaza_data.

        Documentation for class indiaplaza_extra
        This spider collects all the information for the individual phones
        and stores it in table datastore_datadefinition_indiaplaza_items.

        Documentation for class indiaplaza_data
        It represents database table for indiaplaza, it stores
        v_name = name of the vendor
        v_site = url of the vendor

        Documentation for class indiaplaza_items
        It represents database table for indiaplaza, it stores
        p_name = name of the phone
        p_shown_price = price offered by indiaplaza
        p_final_price = price which one has to pay, 
        p_final_price = p_shown_price + taxes + ship-price
        p_guaranteeinfo = duration of the guarantee and whether the guarantee is from the vendor or the manufacturer
        p_shipinfo = how much time shipping would take
        
        Documentation for method add_ipbasic 
        This method is used to add a url for phone in indiaplaza's table
        
        Documentation for method get_all_ipbasic 
        This method is used to retrieve all phone-urls in indiaplaza's table

        Documentation for method add_ipextra 
        This method is used to add a phone in indiaplaza's table

        Documentation for method get_all_indiaplaza_phones 
        This method is used to retrieve all the phones in indiaplaza's table

        Documentation of various parameters     
        Xpath1 = Gives the section for an individual phone
        Xpath2 = Gives the name of an individual phone
        Xpath3 = Gives the url of an individual phone
        Url1 = Used to get the full url for individual phones
        Xpath4 = Gives the name of an individual phone
        Xpath5 = Gives the quoted-price of an individual phone
        Xpath6 = Gives the ship-price of an individual phone
        Xpath7 = Gives the ship-price of an individual phone, if it is not available from xpath6
        Xpath8 = Gives the guarantee-info of an individual phone
        Xpath9 = Gives the guarantee-info of an individual phone, if it is not available from xpath8
        Xpath10 = Gives the ship-info of an individual phone
        Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
        URL = The start url for the supplier; crawling starts from this page
        homepage = Homepage for the supplier
        referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
        domainname = Name by which the first spider is known outside; passed as an argument when calling this spider
        domainname1 = Name by which the second spider is known outside; passed as an argument when calling this spider
        var1 = Used to check for free shipping
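
        The fallback pairs (xpath6/xpath7, xpath8/xpath9) and the free-shipping check can be sketched as below. ElementTree stands in for Scrapy's selectors; the sample markup, the XPath values and the idea that var1 holds a marker word such as 'free' are assumptions.

```python
import xml.etree.ElementTree as ET


def first_text(node, xpaths):
    """Return the first non-empty text found by trying each XPath in order."""
    for xpath in xpaths:
        text = node.findtext(xpath)
        if text and text.strip():
            return text.strip()
    return None


def ship_price(text, free_marker, removelist):
    """Return 0 when the text announces free shipping, else the cleaned amount."""
    if free_marker.lower() in text.lower():
        return 0
    for token in removelist:
        text = text.replace(token, "")
    return int(text.strip())


# Made-up item page: the primary ship-price element is missing.
ITEM = ET.fromstring(
    "<div><p class='ship-alt'>Rs 50</p><p class='warranty'>1 year</p></div>"
)
XPATH6 = "p[@class='ship']"       # hypothetical primary ship-price path
XPATH7 = "p[@class='ship-alt']"   # hypothetical fallback path
VAR1 = "free"                     # assumed marker text for free shipping

found = first_text(ITEM, [XPATH6, XPATH7])
```

        Trying the paths in order keeps the spider code identical whether one or several XPaths are configured for a field.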
        
      **************************************************************************************************************************************    
        
      ******************************************MOBILESTORE**********************************************************************************
        
        Documentation for script runmobstore.py
        This is the script called by consetup.py
        First it runs the spider for mobilestore dynamically (i.e. determining the number of pages at run time)
        Then it generates the csv file
        @param  path to the folder in which the spider projects reside (e.g. :/home/gaurav/code); it must start with pathsep

        Documentation for class mobilestore_spider0
        This spider collects the information for the individual phones
        and stores it in table datastore_datadefinition_themobilestorephones_new
        
        Documentation for class themobilestorephones_new
        It represents database table for themobilestore, it stores
        name = name of the phone
        shown_price = price offered by themobilestore
        final_price = price which one has to pay, 
        final_price = shown_price + taxes + ship-price
        extra_info = whether phone can be bought or not

        Documentation for method add_new_mobstorephone_new 
        This method is used to add a phone in themobilestore's table

        Documentation for method get_allmobstorephone_new 
        This method is used to retrieve all the phones in themobilestore's table

        Documentation of various parameters     
        Xpath3 = Gives the name of an individual phone
        Xpath4 = Gives the price of an individual phone
        Xpath5 = Gives the name of an individual phone, if it is not available from xpath3
        Xpath6 = Gives the name of an individual phone, if it is not available from xpath3 and xpath5
        Xpath7 = Used to check whether the phone can be bought or not
        Xpath8 = Used to check that the item is a mobile phone
        url1 and url2 = Used for building the actual start urls
        homepage = Homepage for the supplier
        referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
        domainname0 = Name by which this spider is known outside; passed as an argument when calling this spider
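
        Setting the referer amounts to sending a Referer header with each request. A minimal sketch with the standard library is shown below; Scrapy's Request takes a headers argument in a similar way. Both URLs are placeholders, and the exact referer string used by the spiders is an assumption.

```python
import urllib.request


def build_request(url, referer):
    """Create a request that carries an explicit Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})


request = build_request(
    "http://www.example.com/mobiles",   # placeholder start URL
    "http://www.google.com/search",     # assumed referer value
)
```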
        
        **************************************************************************************************************************************  
        
        ******************************************UNIVERCELL**********************************************************************************

        Documentation for class vendor_links
        This spider collects the urls for the individual vendors
        and stores them in table datastore_datadefinition_univercell_data.

        Documentation for class univercell_price
        This spider collects the information for the individual phones
        and stores it in table datastore_datadefinition_univercell_items
        
        Documentation for class univercell_data
        It represents database table for univercell, it stores
        v_name = name of the vendor
        v_site = url of the vendor

        Documentation for class univercell_items
        It represents database table for univercell, it stores
        p_title = name of the phone
        p_shown_price = price offered by univercell
        p_final_price = price which one has to pay, 
        p_final_price = p_shown_price + taxes + ship-price

        Documentation for method add_univervendor 
        This method is used to add a vendor in univercell's table
        
        Documentation for method get_all_univervendor 
        This method is used to retrieve all the vendors in univercell's table

        Documentation for method add_univerphone 
        This method is used to add a phone in univercell's table

        Documentation for method get_all_univercell_phones 
        This method is used to retrieve all the phones in univercell's table

        Documentation of various parameters     
        Xpath1 = Gives the section for an individual vendor
        Xpath2 = Gives the name of an individual vendor
        Xpath3 = Gives the url of an individual vendor
        Url1 = Used to get the full url for individual vendors
        var1, var2, var3 and var4 are used to get the proper url
        Xpath4 = Gives the section for an individual phone
        Xpath5 = Gives the name of an individual phone
        Xpath6 = Gives the quoted-price of an individual phone
        vatplustax = Gives the final_price of an individual phone when added to the quoted-price
        Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
        URL = The start url for the supplier; crawling starts from this page
        homepage = Homepage for the supplier
        referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
        domainname = Name by which the first spider is known outside; passed as an argument when calling this spider
        domainname1 = Name by which the second spider is known outside; passed as an argument when calling this spider

        **************************************************************************************************************************************  
        
        ******************************************BABUCHAK**********************************************************************************

        Documentation for class babuchak1
        This spider collects the urls for the individual vendors
        and stores them in table datastore_datadefinition_babuchak_urls.
        
        Documentation for class babuchak2
        This spider collects the urls for the individual phones
        and stores them in table datastore_datadefinition_babuchak_phoneurls.

        Documentation for class babuchak3
        This spider collects the information for the individual phones
        and stores it in table datastore_datadefinition_babuchak_phones.

        Documentation for class babuchak_urls
        It represents database table for babuchak, it stores
        url = url for the vendors
        no_pages = number of pages for individual vendor

        Documentation for class babuchak_phoneurls
        It represents database table for babuchak, it stores
        url = url for the individual phones

        Documentation for class babuchak_phones
        It represents database table for babuchak, it stores
        name = name of the phone
        shown_price = price offered by babuchak
        final_price = price which one has to pay, 
        final_price = shown_price + taxes + ship-price

        Documentation for method add_babuchakurl 
        This method is used to add a url for vendor in babuchak's table

        Documentation for method get_allbabuchakurls 
        This method is used to retrieve all the vendor-urls in babuchak's table
        
        Documentation for method add_babuchakphoneurl 
        This method is used to add a url for phone in babuchak's table

        Documentation for method get_allbabuchakphoneurls 
        This method is used to retrieve all the phone-urls in babuchak's table

        Documentation for method add_babuchakphone 
        This method is used to add a phone in babuchak's table

        Documentation for method get_allbabuchakphones 
        This method is used to retrieve all the phones in babuchak's table

        Documentation of various parameters     
        Xpath1 = Gives the section for an individual vendor
        Xpath2 = Gives the number of pages for an individual vendor
        Xpath3 = Gives the url of an individual vendor
        Url1 = Used to get the full url for individual vendors
        Xpath4 = Gives the url of an individual phone
        Url2 = Used to get the full url for individual phones
        Xpath5 = Gives the name of an individual phone
        Xpath6 = Gives the quoted-price of an individual phone
        Xpath7 = Gives the final_price of an individual phone
        Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
        URL = The start url for the supplier; crawling starts from this page
        homepage = Homepage for the supplier
        referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
        domainname = Name by which the first spider is known outside; passed as an argument when calling this spider
        domainname1 = Name by which the second spider is known outside; passed as an argument when calling this spider
        domainname2 = Name by which the third spider is known outside; passed as an argument when calling this spider
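
        Getting a full url from an extracted link can be sketched as below, assuming Url1 and Url2 hold a base address against which the extracted links are resolved. The base address and the links are placeholders; the real spiders may simply concatenate strings.

```python
from urllib.parse import urljoin

URL1 = "http://www.example.com/"   # placeholder for the Url1 parameter


def full_urls(base, hrefs):
    """Resolve each extracted href against the base URL."""
    return [urljoin(base, href) for href in hrefs]


vendor_urls = full_urls(URL1, ["/mobiles/nokia.html", "mobiles/fly.html"])
```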

        **************************************************************************************************************************************  
        
        ******************************************NAAPTOL**********************************************************************************
                
        Documentation for class naaptol_spider
        This spider collects the urls for the individual phones
        and stores them in table datastore_datadefinition_naaptol_urls.

        Documentation for class naaptol_price
        The urls collected by the previous spider for naaptol.com
        are redirected when the data for individual phones is fetched.
        Some redirect to the form "http://www.naaptol.com/features/10417-Fly-E300.html"
        while others redirect to the form "http://www.naaptol.com/price/10417-Fly-E300.html".
        To make data extraction uniform, this spider accomplishes 2 tasks.
        First, for the urls containing 'features', it collects the information for the 
        individual phones and stores it in table datastore_datadefinition_naaptol_phones.
        Second, for the urls containing 'price', a new url with 'price' replaced by
        'features' is framed and stored in the table datastore_datadefinition_morenaaptol_urls.
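
        The two tasks can be sketched as below. The values of chklist2 and part follow the parameter list for this supplier; the function names and the choice to match 'price' as a whole path segment are assumptions.

```python
CHKLIST2 = ["price"]   # what needs to be replaced (chklist2 parameter)
PART = "features"      # what it is replaced with (part parameter)


def to_features_url(url):
    """Return the 'features' form of a URL, or the URL unchanged."""
    for word in CHKLIST2:
        marker = "/" + word + "/"
        if marker in url:
            return url.replace(marker, "/" + PART + "/", 1)
    return url


def route(final_url):
    """Decide what to do with a URL after redirection."""
    if "/" + PART + "/" in final_url:
        return ("scrape", final_url)               # extract phone data now
    return ("queue", to_features_url(final_url))   # store for the next spider
```

        Here 'scrape' corresponds to storing the phone in datastore_datadefinition_naaptol_phones and 'queue' to storing the rewritten url in datastore_datadefinition_morenaaptol_urls.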

        Documentation for class naaptol_price
        This spider collects the information for the individual phones and stores it in table 
        datastore_datadefinition_naaptol_phones

        Documentation for class naaptolurls
        It represents database table for naaptol, it stores
        url = url of the phones, which we got from sitemap.xml

        Documentation for class naaptolurls
        It represents database table for naaptol, it stores
        url = url of the phones; these are the urls which were redirected 
        and contained 'price', with 'price' replaced by 'features' before storing
        
        Documentation for class naaptolphones
        It represents database table for naaptol, it stores
        name = name of the phone, 
        range = price range for each phone
        range is in one of the 3 forms, i.e
        range = a to b
        range = a
        range = a(approx)
        here a,b are integers
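
        A parser for the three range forms might look like the sketch below. The function name and the (low, high, approximate) return shape are assumptions; the source only states the three textual forms.

```python
import re

# Matches 'a to b', 'a' and 'a(approx)' where a and b are integers.
RANGE_RE = re.compile(r"^\s*(\d+)\s*(?:to\s*(\d+)|\(approx\))?\s*$")


def parse_range(text):
    """Parse a price range into (low, high, approximate)."""
    match = RANGE_RE.match(text)
    if match is None:
        raise ValueError("unrecognised price range: %r" % text)
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    approximate = text.strip().endswith("(approx)")
    return low, high, approximate
```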

        Documentation for class ntonlinesp
        It represents database table for naaptol, it stores
        nid = id of the phone in naaptolphones 
        name = name of the onlinesupplier, 
        price = price offered by the supplier for the phone

        Documentation for class ntofflinesp
        It represents database table for naaptol, it stores
        nid = id of the phone in naaptolphones 
        name = name of the offlinesupplier, 
        price = price offered by the supplier for the phone

        Documentation for method add_naaptolurl 
        This method is used to add a url for phone in naaptol's table
        These are taken from sitemap.xml

        Documentation for method get_allnaaptolurls 
        This method is used to retrieve all the url for phones in naaptol's table

        Documentation for method add_morenaaptolurl 
        This method is used to add a url for phone in naaptol's table
        These are the urls generated by replacing 'price' with 'features'
        which were redirected

        Documentation for method get_allmorenaaptolurls 
        This method is used to retrieve all the extra urls for phones in naaptol's table

        Documentation for method add_new_naaptolphone 
        This method is used to add a phone in naaptol's table

        Documentation for method get_naaptolphone 
        This method is used to retrieve a phone in naaptol's table
        given its name and range

        Documentation for method get_allnaaptolphones 
        This method is used to retrieve all the phones in naaptol's table

        Documentation for method add_new_ntonlinesp 
        This method is used to add an online-supplier for a particular phone
        in naaptol's table

        Documentation for method get_allntonlinesp 
        This method is used to retrieve all the online-suppliers in naaptol's table

        Documentation for method get_allntonlinespbynid 
        This method is used to retrieve all the online-suppliers for a particular phone 
        in naaptol's table given the id of phone in naaptolphones

        Documentation for method add_new_ntofflinesp 
        This method is used to add an offline-supplier for a particular phone
        in naaptol's table

        Documentation for method get_allntofflinesp 
        This method is used to retrieve all the offline-suppliers in naaptol's table

        Documentation for method get_allntolinespbynid 
        This method is used to retrieve all the offline-suppliers for a particular phone 
        in naaptol's table given the id of phone in naaptolphones

        Documentation of various parameters     
        Xpath1 = Gives the url of an individual phone
        chklist1 = elements in chk_list are specific to this site, used for determining valid sites
        Xpath2 = Gives the price-range of an individual phone
        Xpath3 = Gives the price-range of an individual phone, if it cannot be retrieved from xpath2
        Xpath4 = Gives the number of onlinesellers for a particular phone
        Xpath5 = Gives the price of a particular phone offered by onlinesellers
        Xpath6 and Xpath7 = Give the name of onlinesellers for a particular phone 
        Xpath8 = Gives the number of offlinesellers for a particular phone
        Xpath9 = Gives the price of a particular phone offered by offlinesellers
        Xpath10 = Gives the name of offlinesellers for a particular phone
        Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
        chklist2 = contains what needs to be replaced; presently it contains 'price'
        part = contains 'features'
        URL = The start url for the supplier; crawling starts from this page
        homepage = Homepage for the supplier
        referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
        domainname = Name by which the first spider is known outside; passed as an argument when calling this spider
        domainname1 = Name by which the second spider is known outside; passed as an argument when calling this spider
        domainname2 = Name by which the third spider is known outside; passed as an argument when calling this spider

        **************************************************************************************************************************************  
        
        ******************************************SUPPLIERS**********************************************************************************
        
        Documentation for class suppliers
        It represents database table for suppliers, it stores
        name = name of the supplier
        site = url of the supplier
        last_crawled = date of the last run for this supplier
        This table spans all the suppliers in our database
        
        Documentation for method add_suppliers 
        This method is used to add a supplier in supplier's table

        Documentation for method get_all_suppliers 
        This method is used to retrieve all the suppliers from the supplier's table

        Documentation for method get_suppId 
        This method is used to retrieve id of the supplier given his name
        from the supplier's table

        Documentation for method get_supp_byId 
        This method is used to retrieve a supplier given his id
        from the supplier's table

        Documentation for method get_suppbyName 
        This method is used to retrieve a supplier given his name
        from the supplier's table

        Documentation for method get_suppbySite 
        This method is used to retrieve a supplier given his url
        from the supplier's table

        **************************************************************************************************************************************  

        ******************************************MODELS**********************************************************************************

        Documentation for class models
        It represents database table for models, it stores
        brand = name of the brand for a particular phone
        model = name of the model for a particular phone
        This table spans all the phones-models in our database

        Documentation for method add_models 
        This method is used to add a model in model's table
        
        Documentation for method get_all_models 
        This method is used to retrieve all the models from the model's table

        Documentation for method get_modId 
        This method is used to retrieve the id of a model
        given its name from the model's table

        Documentation for method get_modId 
        This method is used to retrieve a model given its id from the model's table

        Documentation for method get_modId 
        This method is used to retrieve a model given its name from the model's table

        Documentation for method get_modId 
        This method is used to retrieve a model given its brand-name from the model's table

        **************************************************************************************************************************************  
        
        ******************************************PRICES**********************************************************************************
        
        Documentation for class prices
        It represents database table for prices, it stores
        supplier_id = id of the supplier who is selling this phone, from suppliers table
        mobile_id = id of the model of this phone, from models table
        quoted_price = price of the phone as offered by the supplier 
        final_price = price one has to pay to buy this phone, i.e.
        it includes vat, tax and shipping charges
        extra_info = extra-info about this phone
        This table spans all the phones in our database

        Documentation for method add_prices 
        This method is used to add all the info for a phone in prices's table

        Documentation for method get_all_prices 
        This method is used to retrieve all the info for all the phones in prices's table

        Documentation for method get_prbyId 
        This method is used to retrieve all the info for a phone given its id in prices's table

        Documentation for method get_prbySid 
        This method is used to retrieve all the info for a phone in prices's table 
        given its supplier-id

        Documentation for method get_prbySid 
        This method is used to retrieve all the info for a phone in prices's table 
        given its model-id

        Documentation for method get_prbySid 
        This method is used to retrieve all the info for a phone in prices's table 
        given its model-name
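
        How the suppliers, models and prices tables relate can be sketched with sqlite3 as below. The column names follow the descriptions in this file; the SQL types, the sample rows, the placeholder sites and the helper prices_for_model are assumptions, and the project's own data layer may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT, site TEXT);
CREATE TABLE models (id INTEGER PRIMARY KEY, brand TEXT, model TEXT);
CREATE TABLE prices (
    id INTEGER PRIMARY KEY,
    supplier_id INTEGER REFERENCES suppliers(id),
    mobile_id INTEGER REFERENCES models(id),
    quoted_price INTEGER,
    final_price INTEGER,
    extra_info TEXT
);
""")

# Sample rows for illustration only.
conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?)", [
    (1, "infibeam", "http://infibeam.example"),
    (2, "univercell", "http://univercell.example"),
])
conn.execute("INSERT INTO models VALUES (1, 'Fly', 'E300')")
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 2500, 2600, ""),
    (2, 2, 1, 2450, 2650, ""),
])


def prices_for_model(conn, brand, model):
    """Return (supplier, quoted_price, final_price) rows, cheapest first."""
    return conn.execute(
        """
        SELECT s.name, p.quoted_price, p.final_price
        FROM prices AS p
        JOIN suppliers AS s ON s.id = p.supplier_id
        JOIN models AS m ON m.id = p.mobile_id
        WHERE m.brand = ? AND m.model = ?
        ORDER BY p.final_price
        """,
        (brand, model),
    ).fetchall()


comparison = prices_for_model(conn, "Fly", "E300")
```

        Ordering by final_price rather than quoted_price compares what the buyer actually pays.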

        **************************************************************************************************************************************  

        ******************************************GUARANTEE and SHIPINFO**********************************************************************
        
        Documentation for class guarantee_info
        It represents database table for guarantee_info, it stores
        mid = id of the phone in models table
        guaranteeinfo = duration of the guarantee and whether the guarantee is from the vendor or the manufacturer
        shipinfo = how much time shipping would take.
        This table spans all the phones in our database

        Documentation for method add_gs_info 
        This method is used to add guarantee and ship info for a phone

        Documentation for method get_all_gs_info 
        This method is used to retrieve guarantee and ship info for all the phones

        Documentation for method get_gs_bymid
        This method is used to retrieve guarantee and ship info for a phone
        given its phone-id

        **************************************************************************************************************************************  
        
        ******************************************extra_vars**********************************************************************************

        Documentation for class extra_vars
        It represents database table for extra_vars, it stores
        var = name of the variable
        val = value of the variable
        desc = description of the variable
        For some suppliers the number of pages to be crawled is not fixed.
        For them variables are created, and the number of pages to be crawled
        is determined dynamically from the value of the variable

        Documentation for method set_extra_vars  
        This method is used to add a variable for a particular supplier.
        Some suppliers have no fixed count of pages on which they have data,
        so this function is used to set variables to deal with that.

        Documentation for method get_extra_vars  
        This method is used to retrieve a variable's value given its name

        **************************************************************************************************************************************  

        ******************************************CRAWL**********************************************************************************

        Documentation for class crawl
        It represents database table for crawl, it stores
        crawled_date = date of crawling
        On each new crawl a new entry is made   
        Reason for creating this table is for retaining past data for comparison

        Documentation for method add_newcrawler 
        This method is used to add a new crawlid in crawl's table
        
        Documentation for method get_latestcrawler 
        This method is used to get the latest crawler from the crawl's table

        Documentation for method get_latestcrawlerid 
        This method is used to get the latest crawler-id from the crawl's table 
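
        A minimal sqlite3 sketch of the crawl table follows. The method names match the ones documented here, but their signatures, the auto-incrementing id and the sample dates are assumptions.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crawl (id INTEGER PRIMARY KEY AUTOINCREMENT, crawled_date TEXT)"
)


def add_newcrawler(conn, crawled_date):
    """Record a new crawl and return its id."""
    cursor = conn.execute(
        "INSERT INTO crawl (crawled_date) VALUES (?)", (crawled_date.isoformat(),)
    )
    return cursor.lastrowid


def get_latestcrawlerid(conn):
    """Return the id of the most recent crawl, or None if there is none."""
    return conn.execute("SELECT MAX(id) FROM crawl").fetchone()[0]


first = add_newcrawler(conn, date(2010, 6, 1))    # sample dates
second = add_newcrawler(conn, date(2010, 6, 8))
```

        Rows from earlier crawls are never overwritten, so data tagged with an older crawl id stays available for comparison.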

        **************************************************************************************************************************************  

        ******************************************OTHERS**************************************************************************************
        Documentation for class Datahelper
        This class contains various methods to access the database tables

        Documentation for method init
        Before using all the tables described in this module, one has to call this method

        Documentation for method initxy
        It calls a method init() so that when one needs to access the helper methods of 
        this class to access database, the database tables are in the scope

        **************************************************************************************************************************************

        ******************************************code_words**********************************************************************************

        Documentation for class code_words
        It represents database table for code_words used for parameters taken from forms, it stores
        key = name of the parameter
        value = the value of the parameter
        description = information about the parameter and the purpose for which it is stored

        Documentation for method init
        Before using all the tables described in this module, one has to call this method

        Documentation for method initialize_table
        It calls a method init() so that when one needs to access the helper methods of 
        this class to access database, the database tables are in the scope
        
        Documentation for method set_code_word 
        This method is used to set a code_word through input from form
        
        Documentation for method get_code_word_byId 
        This method is used to retrieve a code_word given its id

        Documentation for method get_code_word 
        This method is used to retrieve the value of a code_word given its key i.e name

        Documentation for method get_allcode_words 
        This method is used to retrieve all the code_words given the key i.e name

        Documentation for method get_code_word_byKey 
        This method is used to retrieve the code_word given its key i.e name
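
        The code_words table behaves like a key-value store for form parameters. A sqlite3 sketch is given below; the signatures, the UNIQUE constraint on key and the overwrite-on-save behaviour are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE code_words ("
    "id INTEGER PRIMARY KEY, key TEXT UNIQUE, value TEXT, description TEXT)"
)


def set_code_word(conn, key, value, description=""):
    """Insert the parameter, or overwrite it if the key already exists."""
    updated = conn.execute(
        "UPDATE code_words SET value = ?, description = ? WHERE key = ?",
        (value, description, key),
    ).rowcount
    if updated == 0:
        conn.execute(
            "INSERT INTO code_words (key, value, description) VALUES (?, ?, ?)",
            (key, value, description),
        )


def get_code_word(conn, key):
    """Return the value stored for a key, or None if the key is unknown."""
    row = conn.execute(
        "SELECT value FROM code_words WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical key name and values.
set_code_word(conn, "infibeam_xpath1", "//div", "section for each phone")
set_code_word(conn, "infibeam_xpath1", "//li", "section for each phone")
```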

        
        **************************************************************************************************************************************