| 1336 |
rajveer |
To run on a fresh machine, follow these steps:

1) Install Python.
2) Install the external packages elixir, sql, turbogears and lucene. The modules os, sys, subprocess, smtplib, email and urllib are part of the Python standard library and need no separate installation.
3) Install Eclipse Galileo.
4) Copy the folder named 'code' onto your machine.
5) Set PYTHONPATH in Eclipse.
6) Start the MySQL server and open a client session as root with the following commands:
    sudo /path-to-mysql/mysql.server start
    mysql -u root
7) Create a database named 'phonecrawler'.
9) Run the script test.py using the command:
    python /path-to-test.py/test.py /path-to-test.py
10) The crawling interval between two pages can be changed for a spider by modifying the settings file for that spider. For example, for infibeam
the file is "/path-to-all-the-projects/infibeamScrapy/src/demo/settings.py".
Modify the variable "DOWNLOAD_DELAY"; its unit is seconds.
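The crawl interval change described above might look like the fragment below. This is only a sketch: it assumes the spiders are Scrapy projects (as the folder name infibeamScrapy suggests), where DOWNLOAD_DELAY is a standard setting, and the value 2 is an illustrative choice, not the project's actual value.

```python
# settings.py (fragment) for one spider, e.g. infibeam.
# Number of seconds to wait between two consecutive page downloads.
# The value below is an example only; pick whatever interval suits the site.
DOWNLOAD_DELAY = 2
```

A larger value slows the crawl down and puts less load on the crawled site.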

To take a dump of the database, use the following command:
    /path-to-mysqldump/mysqldump -u root phonecrawler > ~/file.sql
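If the dump needs to be taken from a Python script rather than from the shell, something along the following lines could work. It is a rough sketch, not part of the project: the function names build_dump_command and dump_database are hypothetical, and the path to mysqldump still has to be supplied by the caller.

```python
import os
import subprocess


def build_dump_command(mysqldump_path, database="phonecrawler", user="root"):
    """Return the argument list equivalent to the mysqldump command line."""
    return [mysqldump_path, "-u", user, database]


def dump_database(mysqldump_path, output_file, database="phonecrawler", user="root"):
    """Run mysqldump and write the SQL dump to output_file (hypothetical helper)."""
    # Expand "~" ourselves, because no shell is involved here.
    target = os.path.expanduser(output_file)
    with open(target, "wb") as out:
        # check_call raises CalledProcessError if mysqldump exits with an error.
        subprocess.check_call(
            build_dump_command(mysqldump_path, database, user), stdout=out
        )
    return target
```

For example, dump_database("/path-to-mysqldump/mysqldump", "~/file.sql") would mirror the shell command, with the redirection replaced by the stdout argument.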

Dependencies

All the projects and scripts need to be placed in a separate folder, and the path to that folder needs to be given as an input parameter.
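How test.py actually reads this parameter is not shown here. As a sketch of the idea, a script could validate the folder argument as follows; the function name projects_root_from_args and the usage message are hypothetical.

```python
import os
import sys


def projects_root_from_args(argv):
    """Return the absolute path of the projects folder given as first argument."""
    if len(argv) < 2:
        raise ValueError("usage: python test.py /path-to-all-the-projects")
    root = os.path.abspath(os.path.expanduser(argv[1]))
    if not os.path.isdir(root):
        raise ValueError("not a directory: %s" % root)
    return root
```

A script would typically call projects_root_from_args(sys.argv) once at start-up and build all other paths relative to the returned folder.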

Before starting the application, i.e. running the script, a database named 'phonecrawler' needs to be created.
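The database can be created by hand in the mysql client. As an alternative sketch, the same step could be scripted as below. It assumes the mysql client is on the PATH and that root needs no password, as in the commands used elsewhere in this file; the helper names and the use of IF NOT EXISTS are choices made for this example.

```python
import subprocess


def build_create_db_command(database="phonecrawler", user="root", mysql="mysql"):
    """Return the mysql client invocation that creates the database."""
    # IF NOT EXISTS makes the step safe to repeat on a machine that is already set up.
    statement = "CREATE DATABASE IF NOT EXISTS `%s`" % database
    return [mysql, "-u", user, "-e", statement]


def ensure_database(database="phonecrawler", user="root", mysql="mysql"):
    """Create the database if it is missing (hypothetical helper)."""
    # Raises CalledProcessError if the client reports a failure.
    subprocess.check_call(build_create_db_command(database, user, mysql))
```

Calling ensure_database() before running the script would cover this prerequisite, provided the MySQL server is already running.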


Known issues
If you write a separate script for another spider, as I did for infibeam (runinfibeam.py), and the spider imports any external libraries,
then the PYTHONPATH to them must be set in the script.
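One possible way to set the path inside such a script is sketched below. How runinfibeam.py actually does it is not shown here, so the helper name add_to_python_path is hypothetical. The sketch updates both sys.path (for imports in the same process) and the PYTHONPATH environment variable (for any child process the script may start).

```python
import os
import sys


def add_to_python_path(*directories):
    """Make the given directories importable here and in child processes."""
    paths = [os.path.abspath(d) for d in directories]

    # Imports in the current process consult sys.path.
    for path in paths:
        if path not in sys.path:
            sys.path.insert(0, path)

    # Child processes inherit PYTHONPATH from the environment.
    existing = os.environ.get("PYTHONPATH", "")
    parts = [p for p in existing.split(os.pathsep) if p]
    for path in paths:
        if path not in parts:
            parts.insert(0, path)
    os.environ["PYTHONPATH"] = os.pathsep.join(parts)
```

The call has to come before the spider or its libraries are imported, for example as the first statement of the run script.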

The TurboGears logo needs to be removed from the forms; this requires modifying the template.

Some hints about the parameters should be shown in the forms.