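Some things in this world are so badass that plenty of badass people get killed by their own badassery.
Let's start with a page: http://www.skxox.com/xxinfo_127691.html
It should display fine in a browser, with no garbled text.
Now view its source and you will find this line: <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
So you, being the badass you are, confidently conclude that this page is encoded in gb2312...
Now try forcing the browser to display the page as GB2312: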
涓?鍛ㄩ挗閾佽涓氫俊鎭嫨瑕?
WTF, what on earth is this supposed to be......
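So all those articles on 博客園 (cnblogs) about auto-detecting a page's encoding from its markup are lying to you...
Let's see what a packet capture tool says: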
HTTP/1.1 200 OK
Date: Thu, 21 Apr 2011 07:36:27 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
X-Powered-By: UrlRewriter.NET 1.7.0
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 41750
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
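The charset in the Content-Type response header above (utf-8) is the page's real encoding. Browsers give the HTTP header priority over the <meta> tag, which is why the page displays correctly despite the bogus gb2312 declaration.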
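To see where that garbage comes from, here is a minimal sketch that encodes a string as UTF-8 and then decodes the bytes with the wrong charset, which is roughly what the browser does when forced to GB2312. The class and method names (MojibakeDemo, DecodeUtf8BytesAs) are made up for illustration, and the sketch assumes .NET Core 3.0 or later, where legacy code pages must be registered first.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: encode text as UTF-8, then decode those bytes
    // using a different (wrong) charset to reproduce the garbling.
    public static string DecodeUtf8BytesAs(string text, string wrongCharset)
    {
        // Assumption: running on .NET Core / .NET 5+, where gb2312 is only
        // available after registering the code pages provider.
        // On .NET Framework this line is unnecessary and can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(wrongCharset).GetString(utf8Bytes);
    }
}
```

Calling MojibakeDemo.DecodeUtf8BytesAs with Chinese text and "gb2312" returns garbage of the same flavor as the line shown earlier, while passing "utf-8" returns the text intact. Plain ASCII survives either way, which is why English-only pages hide this kind of bug.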
So please, I'm begging you, stop parsing the charset out of the page markup.
Change the statement that gets the encoding to:
string c = response.ContentType.Replace("text/html;", "").Replace("charset=", "").Trim();
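That one-liner only works when the header is exactly of the form "text/html; charset=xxx". As a hedged sketch of a more tolerant variant, the helper below splits the header on semicolons and looks for a charset parameter, falling back to a caller-supplied default when none is present. CharsetSniffer, GetCharset and the fallback parameter are hypothetical names, not part of the original code.

```csharp
using System;

public static class CharsetSniffer
{
    // Hypothetical helper: extract the charset parameter from a Content-Type
    // header value such as "text/html; charset=utf-8".
    public static string GetCharset(string contentType, string fallback)
    {
        if (string.IsNullOrEmpty(contentType))
        {
            return fallback;
        }
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                // Drop the "charset=" prefix and any surrounding quotes.
                string value = p.Substring("charset=".Length).Trim().Trim('"', '\'');
                return value.Length > 0 ? value : fallback;
            }
        }
        // No charset parameter in the header: use the caller's default.
        return fallback;
    }
}
```

With this, CharsetSniffer.GetCharset(response.ContentType, "utf-8") could stand in for the Replace chain. HttpWebResponse also exposes a CharacterSet property that reads the same header, which may be enough if you do not need a custom fallback.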
The whole lump of code:
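// Requires: using System.IO; using System.Net;
/// <summary>
/// Fetches the HTML source of a remote URL.
/// </summary>
/// <param name="url">URL of the page to fetch</param>
/// <param name="ucoid">Encoding name to force; pass null or empty to use the charset from the response header</param>
/// <returns>The HTML source, or an empty string on failure</returns>
public static string GetHtml(string url, string ucoid)
{
    HttpWebRequest request = null;
    HttpWebResponse response = null;
    StreamReader reader = null;
    try
    {
        request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "www.svnhost.cn";
        request.Timeout = 20000; // milliseconds
        request.AllowAutoRedirect = true;
        response = (HttpWebResponse)request.GetResponse();
        // Only read successful responses that declare less than 1 MB
        // (ContentLength is -1 when the server sends no Content-Length header).
        if (response.StatusCode == HttpStatusCode.OK && response.ContentLength < 1024 * 1024)
        {
            // Take the charset from the Content-Type response header, not from the <meta> tag.
            // Assumes a header of the form "text/html; charset=xxx".
            string c = response.ContentType.Replace("text/html;", "").Replace("charset=", "").Trim();
            if (string.IsNullOrEmpty(ucoid))
            {
                ucoid = c;
            }
            reader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.GetEncoding(ucoid));
            return reader.ReadToEnd();
        }
    }
    catch
    {
        // Network and encoding errors are swallowed on purpose; the caller gets an empty string.
    }
    finally
    {
        if (reader != null)
        {
            reader.Close();
        }
        if (response != null)
        {
            response.Close();
        }
    }
    return string.Empty;
}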
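So web designers, man, you just can't win with them.... In a past life they were all broken-winged angels of crap that fell into a septic tank...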