Building a Spider Crawl Analysis Page in WordPress

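This spider crawl page is genuinely fun. I first saw it on the 千丝海阁 blog and set one up on my own blog with almost no changes. The original author covered it in two separate posts, so this post walks through the whole implementation in one place.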

To keep functions.php from getting bloated, I put the code for the spider crawl analysis page in its own PHP file, mylogs.php, in the current theme's directory.

Generating the site log

The first step is to generate the site log. Put the following code in mylogs.php:

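<?php
// Append one line per request to mylogs.txt.
make_log_file();
function make_log_file(){
	// Log file in the WordPress root (ABSPATH), so the path does not depend on the working directory.
	$filename = ABSPATH . 'mylogs.txt';
	$uri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
	// Skip rc-ajax comment requests and wp-cron requests.
	if (strpos($uri, 'rc-ajax') !== false || strpos($uri, 'wp-cron.php') !== false) {
		return;
	}
	// Request time as mdHis in UTC+8 (gmdate() keeps the offset independent of the server timezone).
	$now = $_SERVER['REQUEST_TIME'] + 3600*8;
	$word = gmdate('mdHis', $now) . " ";
	// Requested page
	$word .= $uri . " ";
	// Protocol
	$word .= $_SERVER['SERVER_PROTOCOL'] . " ";
	// Method, POST or GET
	$word .= $_SERVER['REQUEST_METHOD'] . " ";
	// Browser information
	$word .= getbrowser() . " ";
	// Query string
	$word .= "[" . (isset($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : '') . "] ";
	// Referring page (absent on direct visits)
	$word .= (isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '') . " ";
	// Client IP
	$word .= getIP() . " ";
	$word .= "\n";
	// The log holds one day only: append while the first four bytes (md) match today,
	// otherwise start over. A missing file is created on the first request.
	$day = gmdate('md', $now);
	$first = file_exists($filename) ? file_get_contents($filename, false, null, 0, 4) : '';
	$flags = ($first === $day) ? (FILE_APPEND | LOCK_EX) : LOCK_EX;
	file_put_contents($filename, $word, $flags);
}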
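// Get the client IP address (stock snippet from the web). The forwarded headers are client-supplied and can be spoofed.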
function getIP() //get ip address
    {
        if (getenv('HTTP_CLIENT_IP')) 
        {
            $ip = getenv('HTTP_CLIENT_IP');
        } 
        else if (getenv('HTTP_X_FORWARDED_FOR')) 
        {
            $ip = getenv('HTTP_X_FORWARDED_FOR');
        } 
        else if (getenv('REMOTE_ADDR')) 
        {
            $ip = getenv('REMOTE_ADDR');
        } 
        else 
        {
            $ip = $_SERVER['REMOTE_ADDR'];
        }
        return $ip;
    }
// Get the browser name and version; mobile and tablet user agents are not covered yet.
function getbrowser()
{
	$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
	// ereg() was removed in PHP 7 and causes a fatal error there, so preg_match() is used instead.
	// Patterns are checked in the original order; a later match overrides an earlier one.
	$patterns = array(
		'Chrome'            => '/Chrome\/([\d.]+)/',
		'Firefox'           => '/Firefox\/([\d.]+)/',
		'Opera'             => '/Opera[ \/]([\d.]+)/',
		'Internet Explorer' => '/MSIE ([\d.]+)/',
	);
	// Fall back to the raw user agent; this is how spider signatures end up in the log.
	$browseinfo = $agent;
	if (strpos($agent, 'Mozilla') !== false) {
		foreach ($patterns as $name => $pattern) {
			if (preg_match($pattern, $agent, $m)) {
				$browseinfo = $name . ' ' . $m[1];
			}
		}
	}
	return $browseinfo;
}
?>

It automatically creates the site log in the site root, for example https://www.tennfy.com/mylogs.txt.
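
Each line that make_log_file() writes is a space-separated record: timestamp, URI, protocol, method, browser string, query string in square brackets, referrer, and IP. As a rough illustration of that layout, the sketch below splits one line back into its fields. The function name parse_log_line() and the sample line are hypothetical and not part of the original mylogs.php, and the pattern assumes the query string and referrer contain no spaces.

```php
<?php
// Hypothetical helper (not in the original mylogs.php): split one log line into fields.
function parse_log_line($line) {
    // The first four fields contain no spaces; the browser string may, so it is
    // matched greedily up to the bracketed query string near the end of the line.
    $pattern = '/^(\d{10}) (\S+) (\S+) (\S+) (.*) \[(\S*)\] (\S*) (\S+) $/';
    if (!preg_match($pattern, rtrim($line, "\r\n"), $m)) {
        return null; // the line does not follow the expected layout
    }
    return array(
        'time'     => $m[1], // mdHis, e.g. 0305142233
        'uri'      => $m[2],
        'protocol' => $m[3],
        'method'   => $m[4],
        'browser'  => $m[5],
        'query'    => $m[6],
        'referer'  => $m[7], // empty when the request had no referrer
        'ip'       => $m[8],
    );
}

// Made-up sample line; 192.0.2.1 is a documentation-only address.
$sample = "0305142233 /about/ HTTP/1.1 GET Mozilla/5.0 (compatible; Googlebot/2.1; "
        . "+http://www.google.com/bot.html) []  192.0.2.1 \n";
$fields = parse_log_line($sample);
echo $fields['time'], ' ', $fields['uri'], ' ', $fields['ip'], "\n";
```

Because the browser field can itself contain spaces, a plain explode(' ', $line) would not be enough, which is why the sketch anchors on the bracketed query string instead.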

Log analysis code

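The code below reads the mylogs.txt file generated above, uses regular expressions to match each spider's signature, counts the hits, and displays the result. It registers the shortcode spiderlogs, which accepts one parameter, text. The default, yes, produces a text summary plus a pie chart; set text to no to show only the pie chart. This code also goes into mylogs.php.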
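
As a standalone illustration of the matching step (not something to paste into mylogs.php), the sketch below counts the log lines from one day that contain a given spider signature. The helper name count_spider_hits() and the sample log are hypothetical. Unlike the full listing, which passes pre-escaped regex fragments such as "Googlebot\/", this version takes a plain string and escapes it with preg_quote().

```php
<?php
// Hypothetical helper: count the log lines from a given day (md) whose text
// contains a spider signature such as "Googlebot/".
function count_spider_hits($contents, $day, $signature) {
    // preg_quote() escapes regex metacharacters in the inputs, including "/" and ".".
    // The "m" flag lets "^" match at the start of every line.
    $pattern = '/^' . preg_quote($day, '/') . '\d{6} .*' . preg_quote($signature, '/') . '/m';
    return (int) preg_match_all($pattern, $contents);
}

// Made-up log contents in the format written by make_log_file().
$log = "0305010203 / HTTP/1.1 GET Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) []  192.0.2.1 \n"
     . "0305020304 /a HTTP/1.1 GET Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html) []  192.0.2.2 \n"
     . "0304235959 / HTTP/1.1 GET Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) []  192.0.2.1 \n";

echo "Googlebot on 0305: ", count_spider_hits($log, '0305', 'Googlebot/'), "\n";
echo "Baiduspider on 0305: ", count_spider_hits($log, '0305', 'Baiduspider/'), "\n";
```

The full listing follows.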

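<?php
// Display the spider log (shortcode handler).
function get_spider_log($atts) {
	extract(shortcode_atts(array(
    'text' => 'yes'),$atts));
	// Read the log straight from disk instead of fetching it over HTTP via site_url().
	$logfile = ABSPATH . 'mylogs.txt';
	$contents = is_readable($logfile) ? file_get_contents($logfile) : "";
	$str = "";
	// Log timestamps are written in UTC+8, so compute today's date the same way.
	$showtime = gmdate("md", time() + 3600*8);
	if($text == "yes") {
		$str.= "Spider crawl records for today:";
		$str.= "<div style='background-color:#33A1C9;color:white;text-align:center;'>Spiders commonly seen in China.</div>";
	}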
	$mytmp = array();
	//google
	$google = 0;
	if($text == "yes")
		$str.= "<a href=http://www.google.com/bot.html target=_blank>Google Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"Googlebot\/",$text);
	$google += $mytmp[0];
	$str.= $mytmp[1];
	$mytmp = show_spider_result($showtime,$contents,"Googlebot-Image\/",$text);
	$google += $mytmp[0];
	$str.= $mytmp[1];
	$mytmp = show_spider_result($showtime,$contents,"Googlebot-Mobile\/",$text);
	$google += $mytmp[0];
	$str.= $mytmp[1];
	$mytmp = show_spider_result($showtime,$contents,"Feedfetcher-Google",$text);
	$google += $mytmp[0];
	$str.= $mytmp[1];
 
	// baidu
	$baidu = 0;
	if($text == "yes")
		$str.= "<br><a href=http://www.baidu.com/search/spider.html target=_blank>Baidu Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"Baiduspider\/",$text);
	$baidu += $mytmp[0];
	$str.= $mytmp[1];
	$mytmp = show_spider_result($showtime,$contents,"Baiduspider-image",$text);
	$baidu += $mytmp[0];
	$str.= $mytmp[1];
 
	//bing
	$bing = 0;
	if($text == "yes")
		$str.= "<br><a href=http://www.bing.com/bingbot.htm target=_blank>bingbot Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"bingbot\/",$text);
	$bing += $mytmp[0];
	$str.= $mytmp[1];
	$mytmp = show_spider_result($showtime,$contents,"msnbot-media\/",$text);
	$bing += $mytmp[0];
	$str.= $mytmp[1];
 
	//sogou
	$sogou = 0;
	if($text == "yes")
		$str.= "<br><a href=http://www.sogou.com/docs/help/webmasters.htm#07 target=_blank>Sogou Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"Sogou web spider\/",$text);
	$sogou += $mytmp[0];
	$str.= $mytmp[1];
 
	//soso
	$soso = 0;
	if($text == "yes")
		$str.= "<br><a href=http://help.soso.com/webspider.htm target=_blank>Soso Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"Sosospider\/",$text);
	$soso += $mytmp[0];
	$str.= $mytmp[1];
 
	if($text == "yes")
		$str.= "<div style='background-color:#FA8072;color:white;text-align:center;'>The spiders below are junk crawlers and can be blocked.</div>";
	//jike
	$else = 0;
	if($text == "yes")
		$str.= "<a href=http://shoulu.jike.com/spider.html target=_blank>Jike Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"JikeSpider",$text);
	$else += $mytmp[0];
	$str.= $mytmp[1];
 
	//easou
	if($text == "yes")
		$str.= "<br><a href=http://www.easou.com/search/spider.html target=_blank>Easou Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"EasouSpider",$text);
	$else += $mytmp[0];
	$str.= $mytmp[1];
 
	//yisou
	if($text == "yes")
		$str.= "<br>YisouSpider:";
	$mytmp = show_spider_result($showtime,$contents,"YisouSpider",$text);
	$else += $mytmp[0];
	$str.= $mytmp[1];
 
	if($text == "yes")
		$str.= "<br><a href=http://yandex.com/bots target=_blank>YandexBot Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"YandexBot\/",$text);
	$else += $mytmp[0];
	$str.= $mytmp[1];
 
	if($text == "yes")
		$str.= "<br><a href=http://go.mail.ru/help/robots target=_blank>Mail.RU Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"Mail.RU_Bot\/",$text);
	$else += $mytmp[0];
	$str.= $mytmp[1];
 
	if($text == "yes")
		$str.= "<br><a href=http://www.acoon.de/robot.asp target=_blank>AcoonBot Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"AcoonBot\/",$text);
	$else += $mytmp[0];
	$str.= $mytmp[1];
 
	if($text == "yes")
		$str.= "<br><a href=http://www.exabot.com/go/robot target=_blank>Exabot Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"Exabot\/",$text);
	$else += $mytmp[0];
	$str.= $mytmp[1];
 
	if($text == "yes")
		$str.= "<br><a href=http://www.seoprofiler.com/bot target=_blank>spbot Spider</a>: ";
	$mytmp = show_spider_result($showtime,$contents,"spbot\/",$text);
	$else += $mytmp[0];
	$str.= $mytmp[1];
 
	$str.= draw_canvas($google,$baidu,$bing,$sogou,$soso,$else);
	return $str;
}
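function show_spider_result($time,$contents,$str,$text){
	// $count[0]: number of matching log lines; $count[1]: HTML summary (empty unless $text is "yes").
	$count = array(0, "");
	// Match today's lines ("<md><His> /<uri> ... <spider signature>").
	// $str is a regex fragment in which the caller has already escaped "/".
	$count[0] = (int) preg_match_all("/".$time."\d*\s\/\S*\s.*".$str."/",$contents,$mymatches);
	if($text == "yes") {
		// Strip the escaped slash so the name displays cleanly.
		$str = str_replace("\\/","",$str);
		$count[1].= "<br> Spider type => ".$str.": hits = ".$count[0];
		if($count[0] >0) {
			// Characters 4-9 of the last matching line are its His timestamp.
			$tmp = substr($mymatches[0][$count[0]-1],4,6);
			$tmp = substr($tmp,0,2) .":" . substr($tmp,2,2) .":" .substr($tmp,4,2) ;
			$count[1].= " Last crawl time: ". $tmp;
		}
	}
	return $count;
}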
function draw_canvas($google,$baidu,$bing,$sogou,$soso,$else){
	$tmp = $google + $baidu + $bing + $sogou + $soso + $else;
	if($tmp == 0) {
		return "<br><br>Not enough data to draw the chart.<br><br>";
	}
	// Share of each engine as a percentage of all spider hits.
	$google2 = $google*100/$tmp;
	$baidu2 = $baidu*100/$tmp;
	$bing2 = $bing*100/$tmp;
	$sogou2 = $sogou*100/$tmp;
	$soso2 = $soso*100/$tmp;
	$else2 = $else*100/$tmp;
	$str = "<br><div style='border-top: 1px solid #e6e6e6;'><br>
	<div style='float:left;width:150px;border-width:1px;border-style:groove;padding:15px;'><b>Spider crawl chart:</b><br>";
	$str.= "Date: " . gmdate("Y-m-d", time() + 3600*8);
	$str.= "<br>Total spider hits: ". $tmp . "<br>";
	$str.= "<li><span style='color:#33A1C9;'>google: ". $google ." hits (". intval($google2) ."%)</span></li>";
	$str.= "<li><span style='color:#0033ff;'>baidu: ". $baidu ." hits (". intval($baidu2) ."%)</span></li>";
	$str.= "<li><span style='color:#872657;'>bing: ". $bing ." hits (". intval($bing2) ."%)</span></li>";
	$str.= "<li><span style='color:#FF9912;'>sogou: ". $sogou ." hits (". intval($sogou2) ."%)</span></li>";
	$str.= "<li><span style='color:#FF6347;'>soso: ". $soso ." hits (". intval($soso2) ."%)</span></li>";
	// "else" takes the remainder so that the displayed percentages add up to 100.
	$str.= "<li><span style='color:#55aa00;'>else: ". $else ." hits (". (100 - intval($google2) - intval($baidu2) - intval($bing2) - intval($sogou2) - intval($soso2)) ."%)</span></li></div>";
	// Pie chart from the Google Image Charts endpoint. Google has since retired that service, so the image may not load.
	$str.=	"<img src = 'http://chart.apis.google.com/chart?cht=p3&chco=33A1C9,0033ff,872657,FF9912,FF6347,55aa00&chd=t:".$google2 .",".$baidu2.",".$bing2.",".$sogou2.",".$soso2.",".$else2."&chs=400x200&chl=google|baidu|bing|sogou|soso|else' /></div><br>";
	return $str;
}
add_shortcode('spiderlogs','get_spider_log');
?>
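
draw_canvas() truncates each share with intval() and gives the remainder to the last category so the figures add up to 100, then hands the numbers to an external Google chart URL. As one possible stand-in for that image, the sketch below repeats the percentage arithmetic and renders plain HTML bars. Both function names are hypothetical and the markup is only an assumption about what might fit a theme; it is not the original author's method.

```php
<?php
// Hypothetical helper: integer percentages that always sum to 100.
// Every category is truncated with intval(); the last one takes the remainder.
function spider_percentages(array $counts) {
    $total = array_sum($counts);
    if ($total == 0) {
        return array(); // nothing to chart
    }
    $names = array_keys($counts);
    $last = end($names);
    $percent = array();
    $used = 0;
    foreach ($counts as $name => $n) {
        if ($name === $last) {
            $percent[$name] = 100 - $used;
        } else {
            $percent[$name] = intval($n * 100 / $total);
            $used += $percent[$name];
        }
    }
    return $percent;
}

// Hypothetical helper: one horizontal bar per category, sized by its percentage.
function spider_bars(array $percent) {
    $html = '';
    foreach ($percent as $name => $p) {
        $html .= "<div style='background:#33A1C9;color:white;white-space:nowrap;width:" . $p . "%'>"
               . htmlspecialchars($name) . " " . $p . "%</div>";
    }
    return $html;
}

$percent = spider_percentages(array('google' => 5, 'baidu' => 3, 'else' => 1));
echo spider_bars($percent), "\n";
```

The bars need no external request, at the cost of being less compact than a pie chart.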

Creating the spider crawl analysis page

Include mylogs.php from functions.php:

include_once(get_template_directory() . '/mylogs.php');

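In the WordPress admin, create a page (I named mine 蜘蛛爬行, "Spider Crawl") and call the shortcode [ spiderlogs ] in its content. (When you actually use it, remove the spaces inside the brackets.)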
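
The text parameter is set the same way, for example [ spiderlogs text="no" ] in the page content (again without the spaces). If the chart is wanted in a theme template rather than in a page, WordPress's do_shortcode() can run the shortcode from PHP. A minimal sketch, assuming mylogs.php is already included; the wrapper name render_spider_chart_only() is hypothetical:

```php
<?php
// Hypothetical template helper: render only the chart by passing text="no".
function render_spider_chart_only() {
    // do_shortcode() exists only inside WordPress; return an empty string elsewhere.
    if (!function_exists('do_shortcode')) {
        return '';
    }
    return do_shortcode('[spiderlogs text="no"]');
}

echo render_spider_chart_only();
```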

Summary

Demo page: https://www.tennfy.com/mylogs

You can download the files I prepared: http://pan.baidu.com/s/1jGgiKui

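References:
WordPress技巧:生成网站访问日志 (WordPress tips: generating a site access log)
WordPress技巧:蜘蛛爬行分析页面的实现 (WordPress tips: implementing a spider crawl analysis page)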

This article comes from the TENNFY blog. Please credit the source and include the link when reposting.

Permalink: https://www.tennfy.com/1524.html

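15 comments

  1. 极雪 says:

    After I added this, the home page went blank...

  2. 银基网 says:

    First visit after the New Year holiday; showing my support.

  3. CY's BLOG says:

    Looks familiar.

    I remember wanting to build one of these myself. Then I figured it wasn't much use and dropped the idea.

  4. 追梦 says:

    I downloaded your files and followed your tutorial, but the page comes up blank.

  5. 李阳博客 says:

    Why is the page blank when I followed the tutorial step by step?

  6. 黑色网魂 says:

    I followed the tutorial step by step too, and the page is blank.

  7. 丶小蓝丶 says:

    Why am I getting a 502 error?
    Is the resource usage really that high???
    Everything is at the defaults. My site URL is above if you want to take a look.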
