To run on a fresh machine, follow these steps:

1) Install Python.
2) Install the external packages elixir, sql, turbogears and lucene. The other modules the code imports (os, sys, subprocess, smtplib, email, urllib) are part of the Python standard library and need no separate installation.
3) Install Eclipse Galileo.
4) Copy the folder named 'code' onto your machine.
5) Set PYTHONPATH in Eclipse.
6) Start the MySQL server and open a client session with the following commands:
     sudo /path-to-mysql/mysql.server start
     mysql -u root
7) Create a database named 'phonecrawler'.
9) Run the script test.py using the command:
     python /path-to-test.py/test.py /path-to-test.py
10) Optionally, change the crawling interval between two pages for a spider by modifying the settings file for that spider. For example, for infibeam the file is "/path-to-all-the-projects/infibeamScrapy/src/demo/settings.py". Modify the variable DOWNLOAD_DELAY; its unit is seconds.

To take a dump of the database, use the following command:
     /path-to-mysqldump/mysqldump -u root phonecrawler > ~/file.sql

Dependencies

All the projects and scripts need to be placed in a separate folder, and the path to that folder needs to be given as the input parameter.

Before starting the application (i.e. running the script), a database named phonecrawler needs to be created.

Known issues

If you make a separate script for any other spider, as I did for infibeam (runinfibeam.py), and the spider imports any external libraries, then the PYTHONPATH to them must be set in that script.

The TurboGears logo needs to be removed from the forms; this requires modifying the template.

Some hints about the parameters should be shown in the forms.
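
For the first known issue, one way to set the path inside a runner script is to extend sys.path before importing the spider. The following is only a sketch, not the contents of runinfibeam.py: the helper name add_library_paths is hypothetical, and the directory "infibeamScrapy/src" is assumed from the settings path quoted earlier.

```python
import os
import sys


def add_library_paths(base_dir, relative_dirs):
    """Prepend each existing directory under base_dir to sys.path.

    Returns the list of absolute paths that were actually added.
    Hypothetical helper, shown for illustration only.
    """
    added = []
    for rel in relative_dirs:
        path = os.path.abspath(os.path.join(base_dir, rel))
        # Skip directories that do not exist or are already on the path.
        if os.path.isdir(path) and path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added


if __name__ == "__main__":
    # The projects folder is taken from the first argument, as with test.py.
    projects_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    # Assumed library location; adjust to wherever the spider's imports live.
    print(add_library_paths(projects_dir, ["infibeamScrapy/src"]))
```

Calling the helper at the top of the script, before any spider imports, affects only that process, so nothing needs to be configured in Eclipse or in the shell environment.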