Price Analyser

It is basically an information retrieval and extraction tool. In our case the information is the price and other features of a phone from different suppliers.
The tool can also be used to compare two suppliers for the same phone.

Brief description of its working:

With the help of TurboGears 2.0, forms are prepared for specifying the parameters that determine which web page we want to retrieve
and what we want to retrieve from that web page. These values are dumped into the database.
Now, using Scrapy and the XPaths specified as parameters in the forms, various information about the phones is captured and dumped into the database.
Google charting APIs are used for comparison and graphical viewing pleasure.
The software also provides querying and searching functionality (a basic search engine functionality). Lucene is used for embedding this feature
into our application.

Detailed description:

Forms part:

VIEW
path to files : home/gaurav/code/TG2TEST/src/wiki20/templates
codedata_forms.html : Gives us the links to the individual forms
input_form.html : Form for a particular supplier; the supplier depends on the link chosen in codedata_forms.html

MODEL
path to files : home/gaurav/code/TG2TEST/src/wiki20/widgets
babuchak_form.py : Structure of the form for babuchak, contains the parameters that need to be taken as input from the user
indiaplaza_form.py : Structure of the form for indiaplaza, contains the parameters that need to be taken as input from the user
infibeam_form.py : Structure of the form for infibeam, contains the parameters that need to be taken as input from the user
mobilestore_form.py : Structure of the form for mobilestore, contains the parameters that need to be taken as input from the user
naaptol_form.py : Structure of the form for naaptol, contains the parameters that need to be taken as input from the user
univercell_form.py : Structure of the form for univercell, contains the parameters that need to be taken as input from the user

CONTROLLER
path to files : home/gaurav/code/TG2TEST/src/wiki20/controllers
root.py : All the processing is done here. Input is taken here; on submit, this input is saved in the database.
There are two types of methods:
1) form_infibeam : To create the structure, it calls the template and passes it the values present in the DB as default values
2) save_infibeam : Takes the values given by the user, stores them in the DB and then shows the template for taking input again.
The same template is used for both types of methods.

Scrapy:

Here individual spiders are written for each supplier. For some suppliers extra scripts are also needed because the number of pages they have for information
is variable. These include infibeam, indiaplaza and mobilestore. These scripts are placed in home/gaurav/code and are named runinfibeam, runindiaplaza and
runmobstore respectively.

****************************************INFIBEAM****************************************

Documentation for script runinfibeam.py
This is the script called by consetup.py
First it will run the spider for infibeam dynamically (i.e. for determining the number of pages)
Then it will generate the csv file
@param path to the folder in which spider-projects reside (:/home/gaurav/code) but start with pathsep

Documentation for class infi_spider
This spider collects the information for the individual phones
and stores it in the table datastore_datadefinition_infibeam_data

Documentation for class infibeam_data
It represents the database table for infibeam; it stores
name = name of the phone
shown_price = price offered by infibeam
final_price = price which one has to pay,
final_price = shown_price + taxes + ship-price

Documentation for method add_infiphone
This method is used to add a phone to infibeam's table

Documentation for method get_all_infibeam_data
This method is used to retrieve all the phones in infibeam's table

Documentation of various parameters
Xpath1 = Gives us the section for an individual phone
Xpath2 = Gives us the name of an individual phone
Xpath3 = Gives us the quoted price of an individual phone
vatplustax = Used to get the final price from the quoted price
Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
URL = The start url for the supplier; from this page we start crawling
homepage = Homepage of the supplier
referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
domainname = Name by which this spider is known outside, passed as an argument when calling this spider

******************************************************************************************

****************************************INDIAPLAZA****************************************

Documentation for script runindiaplaza.py
This is the script called by consetup.py
First it will run the spiders for indiaplaza dynamically (i.e. for determining the number of pages)
Then it will generate the csv file
@param path to the folder in which spider-projects reside (:/home/gaurav/code) but start with pathsep

Documentation for class indiaplaza_spider
This spider collects the urls of the individual phones
and stores them in the table datastore_datadefinition_indiaplaza_data.

Documentation for class indiaplaza_extra
This spider collects all the information for the individual phones
and stores it in the table datastore_datadefinition_indiaplaza_items.

Documentation for class indiaplaza_data
It represents a database table for indiaplaza; it stores
v_name = name of the vendor
v_site = url of the vendor

Documentation for class indiaplaza_items
It represents a database table for indiaplaza; it stores
p_name = name of the phone
p_shown_price = price offered by indiaplaza
p_final_price = price which one has to pay,
p_final_price = p_shown_price + taxes + ship-price
p_guaranteeinfo = duration of the guarantee and whether the guarantee is from the vendor or the manufacturer
p_shipinfo = how much time shipping will take

Documentation for method add_ipbasic
This method is used to add a url for a phone to indiaplaza's table

Documentation for method get_all_ipbasic
This method is used to retrieve all the phone-urls in indiaplaza's table

Documentation for method add_ipextra
This method is used to add a phone to indiaplaza's table

Documentation for method get_all_indiaplaza_phones
This method is used to retrieve all the phones in indiaplaza's table

Documentation of various parameters
Xpath1 = Gives us the section for an individual phone
Xpath2 = Gives us the name of an individual phone
Xpath3 = Gives us the url of an individual phone
Url1 = Used to get the full url for individual phones
Xpath4 = Gives us the name of an individual phone
Xpath5 = Gives us the quoted-price of an individual phone
Xpath6 = Gives us the ship-price of an individual phone
Xpath7 = Gives us the ship-price of an individual phone, if it cannot be obtained from Xpath6
Xpath8 = Gives us the guarantee-info of an individual phone
Xpath9 = Gives us the guarantee-info of an individual phone, if it cannot be obtained from Xpath8
Xpath10 = Gives us the ship-info of an individual phone
Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
URL = The start url for the supplier; from this page we start crawling
homepage = Homepage of the supplier
referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
domainname = Name by which the first spider is known outside, passed as an argument when calling this spider
domainname1 = Name by which the second spider is known outside, passed as an argument when calling this spider
var1 = Used to check for free shipping

******************************************************************************************

****************************************MOBILESTORE****************************************

Documentation for script runmobstore.py
This is the script called by consetup.py
First it will run the spider for mobilestore dynamically (i.e. for determining the number of pages)
Then it will generate the csv file
@param path to the folder in which spider-projects reside (:/home/gaurav/code) but start with pathsep

Documentation for class mobilestore_spider0
This spider collects the information for the individual phones
and stores it in the table datastore_datadefinition_themobilestorephones_new

Documentation for class themobilestorephones_new
It represents the database table for themobilestore; it stores
name = name of the phone
shown_price = price offered by themobilestore
final_price = price which one has to pay,
final_price = shown_price + taxes + ship-price
extra_info = whether the phone can be bought or not

Documentation for method add_new_mobstorephone_new
This method is used to add a phone to themobilestore's table

Documentation for method get_allmobstorephone_new
This method is used to retrieve all the phones in themobilestore's table

Documentation of various parameters
Xpath3 = Gives us the name of an individual phone
Xpath4 = Gives us the price of an individual phone
Xpath5 = Gives us the name of an individual phone, if it cannot be obtained from Xpath3
Xpath6 = Gives us the name of an individual phone, if it cannot be obtained from Xpath3 and Xpath5
Xpath7 = Used to check whether the phone can be bought or not
Xpath8 = Used to check that the item is a mobile phone
url1 and url2 = Used for getting the actual start urls
homepage = Homepage of the supplier
referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
domainname0 = Name by which this spider is known outside, passed as an argument when calling this spider

******************************************************************************************

****************************************UNIVERCELL****************************************

Documentation for class vendor_links
This spider collects the urls of the individual vendors
and stores them in the table datastore_datadefinition_univercell_data.

Documentation for class univercell_price
This spider collects the information for the individual phones
and stores it in the table datastore_datadefinition_univercell_items

Documentation for class univercell_data
It represents a database table for univercell; it stores
v_name = name of the vendor
v_site = url of the vendor

Documentation for class univercell_items
It represents a database table for univercell; it stores
p_title = name of the phone
p_shown_price = price offered by univercell
p_final_price = price which one has to pay,
p_final_price = p_shown_price + taxes + ship-price

Documentation for method add_univervendor
This method is used to add a vendor to univercell's table

Documentation for method get_all_univervendor
This method is used to retrieve all the vendors in univercell's table

Documentation for method add_univerphone
This method is used to add a phone to univercell's table

Documentation for method get_all_univercell_phones
This method is used to retrieve all the phones in univercell's table

Documentation of various parameters
Xpath1 = Gives us the section for individual vendors
Xpath2 = Gives us the name of individual vendors
Xpath3 = Gives us the url of individual vendors
Url1 = Used to get the full url for individual vendors
var1, var2, var3 and var4 = Used to get the proper url
Xpath4 = Gives us the section for an individual phone
Xpath5 = Gives us the name of an individual phone
Xpath6 = Gives us the quoted-price of an individual phone
vatplustax = Gives us the final_price of an individual phone when added to the quoted-price
Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
URL = The start url for the supplier; from this page we start crawling
homepage = Homepage of the supplier
referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
domainname = Name by which the first spider is known outside, passed as an argument when calling this spider
domainname1 = Name by which the second spider is known outside, passed as an argument when calling this spider
Url1 = Used to get the full url for an individual vendor

******************************************************************************************

****************************************BABUCHAK****************************************

Documentation for class babuchak1
This spider collects the urls of the individual vendors
and stores them in the table datastore_datadefinition_babuchak_urls.

Documentation for class babuchak2
This spider collects the urls of the individual phones
and stores them in the table datastore_datadefinition_babuchak_phoneurls.

Documentation for class babuchak3
This spider collects the information for the individual phones
and stores it in the table datastore_datadefinition_babuchak_phones.

Documentation for class babuchak_urls
It represents a database table for babuchak; it stores
url = url of the vendors
no_pages = number of pages for an individual vendor

Documentation for class babuchak_phoneurls
It represents a database table for babuchak; it stores
url = url of the individual phones

Documentation for class babuchak_phones
It represents a database table for babuchak; it stores
name = name of the phone
shown_price = price offered by babuchak
final_price = price which one has to pay,
final_price = shown_price + taxes + ship-price

Documentation for method add_babuchakurl
This method is used to add a url for a vendor to babuchak's table

Documentation for method get_allbabuchakurls
This method is used to retrieve all the vendor-urls in babuchak's table

Documentation for method add_babuchakphoneurl
This method is used to add a url for a phone to babuchak's table

Documentation for method get_allbabuchakphoneurls
This method is used to retrieve all the phone-urls in babuchak's table

Documentation for method add_babuchakphone
This method is used to add a phone to babuchak's table

Documentation for method get_allbabuchakphones
This method is used to retrieve all the phones in babuchak's table

Documentation of various parameters
Xpath1 = Gives us the section for individual vendors
Xpath2 = Gives us the number of pages for individual vendors
Xpath3 = Gives us the url of individual vendors
Url1 = Used to get the full url for individual vendors
Xpath4 = Gives us the url of an individual phone
Url2 = Used to get the full url for individual vendors
Xpath5 = Gives us the name of an individual phone
Xpath6 = Gives us the quoted-price of an individual phone
Xpath7 = Gives us the final_price of an individual phone
Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
URL = The start url for the supplier; from this page we start crawling
homepage = Homepage of the supplier
referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
domainname = Name by which the first spider is known outside, passed as an argument when calling this spider
domainname1 = Name by which the second spider is known outside, passed as an argument when calling this spider
domainname2 = Name by which the third spider is known outside, passed as an argument when calling this spider

******************************************************************************************

****************************************NAAPTOL****************************************

Documentation for class naaptol_spider
This spider collects the urls of the individual phones
and stores them in the table datastore_datadefinition_naaptol_urls.

Documentation for class naaptol_price
The urls collected by the previous spider for naaptol.com
are redirected when the data for individual phones is fetched.
Some are of the form "http://www.naaptol.com/features/10417-Fly-E300.html"
while others are of the form "http://www.naaptol.com/price/10417-Fly-E300.html".
So, to make data extraction symmetric, this spider accomplishes 2 tasks.
First, for the urls containing 'features', it collects the information for the
individual phones and stores it in the table datastore_datadefinition_naaptol_phones.
Second, for the ones containing 'price' in the url, a new url having 'price' replaced
with 'features' is framed and stored in the table datastore_datadefinition_morenaaptol_urls.

Documentation for class naaptol_price
This spider collects the information for the individual phones and stores it in the table
datastore_datadefinition_naaptol_phones

Documentation for class naaptolurls
It represents a database table for naaptol; it stores
url = url of the phones, which we got from sitemap.xml

Documentation for class naaptolurls
It represents a database table for naaptol; it stores
url = url of the phones; here the urls are the ones which were redirected
and contained 'price', but before storing, 'price' is replaced by 'features'

Documentation for class naaptolphones
It represents a database table for naaptol; it stores
name = name of the phone,
range = price range for each phone
range is in one of 3 forms, i.e.
range = a to b
range = a
range = a(approx)
here a and b are integers

Documentation for class ntonlinesp
It represents a database table for naaptol; it stores
nid = id of the phone in naaptolphones
name = name of the online supplier,
price = price offered by the supplier for the phone

Documentation for class ntofflinesp
It represents a database table for naaptol; it stores
nid = id of the phone in naaptolphones
name = name of the offline supplier,
price = price offered by the supplier for the phone

Documentation for method add_naaptolurl
This method is used to add a url for a phone to naaptol's table
These urls are taken from sitemap.xml

Documentation for method get_allnaaptolurls
This method is used to retrieve all the urls for phones in naaptol's table

Documentation for method add_morenaaptolurl
This method is used to add a url for a phone to naaptol's table
These are the urls which were redirected,
generated by replacing 'price' with 'features'

Documentation for method get_allmorenaaptolurls
This method is used to retrieve all the extra urls for phones in naaptol's table

Documentation for method add_new_naaptolphone
This method is used to add a phone to naaptol's table

Documentation for method get_naaptolphone
This method is used to retrieve a phone in naaptol's table
given its name and range

Documentation for method get_allnaaptolphones
This method is used to retrieve all the phones in naaptol's table

Documentation for method add_new_ntonlinesp
This method is used to add an online supplier for a particular phone
to naaptol's table

Documentation for method get_allntonlinesp
This method is used to retrieve all the online suppliers in naaptol's table

Documentation for method get_allntonlinespbynid
This method is used to retrieve all the online suppliers for a particular phone
in naaptol's table, given the id of the phone in naaptolphones

Documentation for method add_new_ntofflinesp
This method is used to add an offline supplier for a particular phone
to naaptol's table

Documentation for method get_allntofflinesp
This method is used to retrieve all the offline suppliers in naaptol's table

Documentation for method get_allntolinespbynid
This method is used to retrieve all the offline suppliers for a particular phone
in naaptol's table, given the id of the phone in naaptolphones

Documentation of various parameters
Xpath1 = Gives us the url of individual phones
chklist1 = The elements in chk_list are specific to this site, for determining valid sites
Xpath2 = Gives us the price-range of an individual phone
Xpath3 = Gives us the price-range of an individual phone, if it cannot be retrieved from Xpath2
Xpath4 = Gives us the number of online sellers for a particular phone
Xpath5 = Gives us the price offered by online sellers for a particular phone
Xpath6 and Xpath7 = Give us the names of the online sellers for a particular phone
Xpath8 = Gives us the number of offline sellers for a particular phone
Xpath9 = Gives us the price offered by offline sellers for a particular phone
Xpath10 = Gives us the names of the offline sellers for a particular phone
Removelist = Used to filter the prices so that they become integers, e.g. remove ',' or 'Rs'
chklist2 = Contains what needs to be replaced; presently it contains 'price'
part = Contains 'features'
URL = The start url for the supplier; from this page we start crawling
homepage = Homepage of the supplier
referer = For some suppliers the spiders were unable to fetch the data without setting this field; I set it to google/search
domainname = Name by which the first spider is known outside, passed as an argument when calling this spider
domainname1 = Name by which the second spider is known outside, passed as an argument when calling this spider
domainname2 = Name by which the third spider is known outside, passed as an argument when calling this spider

******************************************************************************************

****************************************SUPPLIERS****************************************

Documentation for class suppliers
It represents the database table for suppliers; it stores
name = name of the supplier
site = url of the supplier
last_crawled = date of the last run for this supplier
This table spans all the suppliers in our database

Documentation for method add_suppliers
This method is used to add a supplier to the suppliers table

Documentation for method get_all_suppliers
This method is used to retrieve all the suppliers from the suppliers table

Documentation for method get_suppId
This method is used to retrieve the id of a supplier, given its name,
from the suppliers table

Documentation for method get_supp_byId
This method is used to retrieve a supplier, given its id,
from the suppliers table

Documentation for method get_suppbyName
This method is used to retrieve a supplier, given its name,
from the suppliers table

Documentation for method get_suppbySite
This method is used to retrieve a supplier, given its url,
from the suppliers table

******************************************************************************************

****************************************MODELS****************************************

Documentation for class models
It represents the database table for models; it stores
brand = name of the brand for a particular phone
model = name of the model for a particular phone
This table spans all the phone models in our database

Documentation for method add_models
This method is used to add a model to the models table

Documentation for method get_all_models
This method is used to retrieve all the models from the models table

Documentation for method get_modId
This method is used to retrieve the id of a model,
given its name, from the models table

Documentation for method get_modId
This method is used to retrieve a model, given its id, from the models table

Documentation for method get_modId
This method is used to retrieve a model, given its name, from the models table

Documentation for method get_modId
This method is used to retrieve a model, given its brand-name, from the models table

******************************************************************************************

****************************************PRICES****************************************

Documentation for class prices
It represents the database table for prices; it stores
supplier_id = id of the supplier who is selling this phone, from the suppliers table
mobile_id = id of the model of this phone, from the models table
quoted_price = price of the phone as offered by the supplier
final_price = price one has to pay to buy this phone, i.e.
it includes vat, tax and shipping charges
extra_info = extra info about this phone
This table spans all the phones in our database

Documentation for method add_prices
This method is used to add all the info for a phone to the prices table

Documentation for method get_all_prices
This method is used to retrieve all the info for all the phones in the prices table

Documentation for method get_prbyId
This method is used to retrieve all the info for a phone in the prices table, given its id

Documentation for method get_prbySid
This method is used to retrieve all the info for a phone in the prices table,
given its supplier-id

Documentation for method get_prbySid
This method is used to retrieve all the info for a phone in the prices table,
given its model-id

Documentation for method get_prbySid
This method is used to retrieve all the info for a phone in the prices table,
given its model-name

******************************************************************************************

****************************************GUARANTEE and SHIPINFO****************************************

Documentation for class guarantee_info
It represents the database table for guarantee_info; it stores
mid = id of the phone in the models table
guaranteeinfo = duration of the guarantee and whether the guarantee is from the vendor or the manufacturer
shipinfo = how much time shipping will take
This table spans all the phones in our database

Documentation for method add_gs_info
This method is used to add guarantee and ship info for a phone

Documentation for method get_all_gs_info
This method is used to retrieve guarantee and ship info for all the phones

Documentation for method get_gs_bymid
This method is used to retrieve guarantee and ship info for a phone,
given its phone-id

******************************************************************************************

****************************************extra_vars****************************************

Documentation for class extra_vars
It represents the database table for extra_vars; it stores
var = name of the variable
val = value of the variable
desc = description of the variable
For some suppliers the number of pages to be crawled is not fixed;
for them variables are created, and based on the value of the variable
the number of pages to be crawled is determined dynamically

Documentation for method set_extra_vars
This method is used to add a variable for a particular supplier.
Some suppliers have no fixed count of pages on which they have data,
so this function is used to set variables to deal with that.

Documentation for method get_extra_vars
This method is used to retrieve a variable's value, given its name

******************************************************************************************

****************************************CRAWL****************************************

Documentation for class crawl
It represents the database table for crawl; it stores
crawled_date = date of crawling
On each new crawl a new entry is made
This table was created to retain past data for comparison

Documentation for method add_newcrawler
This method is used to add a new crawl id to the crawl table

Documentation for method get_latestcrawler
This method is used to get the latest crawler from the crawl table

Documentation for method get_latestcrawlerid
This method is used to get the latest crawler-id from the crawl table

******************************************************************************************

****************************************OTHERS****************************************

Documentation for class Datahelper
This class contains various methods to access the database tables

Documentation for method init
Before using any of the tables described in this module, one has to call this method

Documentation for method initxy
It calls the method init() so that when one needs the helper methods of
this class to access the database, the database tables are in scope

******************************************************************************************

****************************************code_words****************************************

Documentation for class code_words
It represents the database table for code_words, used for the parameters taken from the forms; it stores
key = name of the parameter
value = the value of the parameter
description = information about the parameter, i.e. for what purpose it is stored

Documentation for method init
Before using any of the tables described in this module, one has to call this method

Documentation for method initialize_table
It calls the method init() so that when one needs the helper methods of
this class to access the database, the database tables are in scope

Documentation for method set_code_word
This method is used to set a code_word through input from a form

Documentation for method get_code_word_byId
This method is used to retrieve a code_word, given its id

Documentation for method get_code_word
This method is used to retrieve the value of a code_word, given its key, i.e. its name

Documentation for method get_allcode_words
This method is used to retrieve all the code_words, given the key, i.e. the name

Documentation for method get_code_word_byKey
This method is used to retrieve the code_word, given its key, i.e. its name

******************************************************************************************
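
Example (illustrative sketch only):

The parameters documented above (Removelist, vatplustax, chklist2, part) are stored as code_words and drive how the spiders turn scraped text into prices and urls. The sketch below shows one way this could look. It is not the project's code: the dictionary CODE_WORDS, the helper names lookup_code_word, clean_price, final_price and to_features_url, and all sample values are assumptions made for illustration. In particular it assumes that Removelist is a list of substrings and that vatplustax is a flat amount added to the quoted price.

```python
# Hypothetical stand-in for the code_words table (key -> value).
# In the real tool these values come from the forms; the ones here are made up.
CODE_WORDS = {
    "removelist": ["Rs", ","],  # assumption: a list of substrings to strip
    "vatplustax": 150,          # assumption: a flat amount added to the quoted price
    "chklist2": "price",        # what needs to be replaced in a naaptol url
    "part": "features",         # what it is replaced with
}


def lookup_code_word(key):
    """Return the value stored for a key (a dict lookup instead of a DB query)."""
    return CODE_WORDS[key]


def clean_price(raw_price, removelist):
    """Remove every substring in removelist and convert what is left to an int."""
    for token in removelist:
        raw_price = raw_price.replace(token, "")
    return int(raw_price.strip())


def final_price(shown_price, vatplustax=0, ship_price=0):
    """final_price = shown_price + taxes + ship-price, as in the table descriptions."""
    return shown_price + vatplustax + ship_price


def to_features_url(url, old, new):
    """Rewrite a '/price/' url into its '/features/' form; other urls are unchanged."""
    return url.replace("/%s/" % old, "/%s/" % new, 1)


if __name__ == "__main__":
    shown = clean_price("Rs 12,499", lookup_code_word("removelist"))
    print(shown)
    print(final_price(shown, lookup_code_word("vatplustax")))
    print(to_features_url("http://www.naaptol.com/price/10417-Fly-E300.html",
                          lookup_code_word("chklist2"), lookup_code_word("part")))
```

Keeping these values in a key/value table, rather than in the spiders themselves, means that a change in a supplier's page layout or pricing text can be handled by editing a form field instead of the spider code.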