[Rpm] "Polish" your own search engine, myso, in 300 lines of code (sample source code download)

Category: C/C++ -> C Author: koslong Date: 2011-03-12 13:43:18
 

Keywords:
myso, search engine, C language, socket, proxy, CSP, eybuild

Features Overview:
A. This example implements two lookups: "Wubi code & Pinyin query" and "IP address and domain name location".
B. Supported OS platforms: Linux / Unix, Windows.
C. On Windows, it automatically detects the IE proxy settings, to adapt to however the machine connects to the Internet.
D. It calls the system's socket-related C API directly.
E. It supports automatic DNS resolution.


1. The problems:
To "polish" a search engine of our own, we have to answer these questions:
(1) We need a data source. Where does the data come from?
(2) We need dynamic pages to handle user requests. Which dynamic web technology should we use?
(3) How do we improve the engine's concurrent performance while reducing the amount of code and the memory used at run time?
(4) Where do we put the server, and how do we make the engine work across a wider range of connections and platforms?
That is: how do we support both direct and proxy connections, while also supporting Windows, Unix, and other OS platforms?
2. Analysis and approach:
1. The data can be fetched dynamically from the Internet. This reduces the work of maintaining a data source
and also keeps the data current; this example takes its data from such a source.
2. Many dynamic web technologies are available, such as CGI (CSP / eybuild), PHP, ASP, JSP, ...
3. Considering concurrent performance and code size, we can write a CGI program in C using CSP.
CSP / eybuild does not need any script interpreter installed and can run in as little as 64K of dynamic memory;
C is also supported on a wide range of platforms and is easy to use.
4. Because CSP is based directly on C, it ports between Unix and Windows and can call any system API
directly, which makes it well integrated and powerful. Writing the proxy connection function is easy (see the sample source code).

3. How the "polishing" works:

The following diagram shows the entire processing flow:

  O        +---------+              +------------+              +---------+
 -|-  ---> | Browser | --- HTTP --> |    MySo    | --- HTTP --> |  DATA   |
 / \       |   PC    | <-- HTTP --- | WEB Server | <-- HTTP --- | Server  |
(man)      |         |              |   (CGI)    |              |         |
           +---------+              +------------+              +---------+

--1-------------2-------------------------3--------------------------4------
(1) On the far left is the user, who enters data into a form in the browser.
(2) Next is the browser, which submits the user's request over HTTP to the web server hosting myso.cgi.
(3) The web server runs myso.cgi. myso.cgi then connects to the remote data server (which provides the search capability);
after receiving the query results, myso.cgi parses them and rebuilds its own result page.
(4) The data server is any server that provides a search function; it is itself a search engine.
Here we borrow its data to "polish" a search engine of our own.

With the analysis above, the whole process should now be clear.

Some may say that this is "hotlinking". Yes, we must admit that it is, but this example also gets around
anti-hotlinking defenses and uses other techniques (see below). That is the intent behind the word "polish" in the title.

4. Analysis of the source code:
(1) Notation and APIs used in the source code:
1) <% %> - embeds C code (i.e. CSP statements) in the HTML template file.
2) @g - marks a block of global items, such as C functions, embedded in the HTML template file.
3) @b - marks a block that is executed when the CSP page begins.
4) G () - gets the value the user entered in the submitted form; it is related to the getParameter () function.
5) For other APIs, refer to the relevant manuals.

(2) The source can be divided into three parts:
1) The html part (lines 1-71), which is the main HTML template.
2) The @b part (lines 73-165), which is executed first when a request is submitted to the page, i.e. "begin".
3) The @g part (lines 167-328), which defines the page's functions globally,
so that they can be called from anywhere in the program, i.e. "global".

(3) The @b part (lines 73-165):
1) Lines 74-87 list the C header files the code needs; note that it is @include, not #include.
2) Lines 88-94 define the local C variables.
3) Lines 96-123 handle the Wubi code and Pinyin query.
4) Lines 128-164 handle the IP address and domain name query; the flow is similar to item 3).
5) Lines 97-98 check that the submitted data is not empty and that the connection to the remote data server succeeded.
6) Lines 103-110 build the HTTP POST request by hand to submit the query to the remote server.
7) Lines 112-116 send the HTTP request and receive the response into a buffer.
8) Lines 117-121 extract the data we need from the response, which is an HTML file.
9) Lines 160-164: if the connection failed, build an error message from the error type and store it in errmsg.
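
The hand-built POST request from item 6) can also be written with a bounded snprintf, so that an oversized query cannot overrun the request buffer. The following is only a sketch under stated assumptions: the function name build_post_request, the path parameter, and the choice of HTTP/1.0 are illustrative and do not come from the original listing, which does not show the request path.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build a minimal HTTP/1.0 POST request into out[0..outlen).
 * host, path and body are supplied by the caller; the path is a
 * placeholder (assumption), since the original listing does not show it.
 * Returns the request length, or -1 if it does not fit in the buffer.
 */
int build_post_request(char *out, size_t outlen,
                       const char *host, const char *path,
                       const char *body)
{
    int n = snprintf(out, outlen,
                     "POST %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Content-Type: application/x-www-form-urlencoded\r\n"
                     "Content-Length: %lu\r\n"
                     "\r\n"
                     "%s",
                     path, host, (unsigned long)strlen(body), body);

    if (n < 0 || (size_t)n >= outlen)
        return -1;              /* encoding error or truncated output */
    return n;
}
```

HTTP/1.0 is used here because the server then normally closes the connection after the response, which suits a receive loop that reads until the peer closes.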

(4) The html part (lines 1-71):
1) Lines 56-60 embed a C if/else statement via <% %>:
if there was an error, the error message is printed at this point; otherwise the query result is printed here.
2) Line 69, <% //=buff %>, is for debugging and has been commented out with //.
Remove the // to print the entire response from the data server.
3) The rest needs no further explanation.
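
The result handling relies on two small ideas: find the wanted fragment between two markers with strstr, then print it in place with "%.*s" and a length, without copying. A minimal sketch follows; the names find_slice and print_slice are hypothetical, and the sketch returns NULL when either marker is missing.

```c
#include <stdio.h>
#include <string.h>

/*
 * Locate the text that starts at start_marker and runs up to (but not
 * including) end_marker. On success, returns a pointer to the start
 * marker and stores the slice length in *len. Returns NULL, with
 * *len set to 0, when either marker is missing.
 */
const char *find_slice(const char *buf, const char *start_marker,
                       const char *end_marker, int *len)
{
    const char *start = strstr(buf, start_marker);
    const char *end;

    *len = 0;
    if (start == NULL)
        return NULL;
    end = strstr(start + strlen(start_marker), end_marker);
    if (end == NULL)
        return NULL;
    *len = (int)(end - start);
    return start;
}

/* Print the slice without copying it: "%.*s" takes the length as an int. */
void print_slice(const char *start, int len)
{
    printf("%.*s\n", len, start);
}
```

Because the slice is never copied, the response buffer must stay alive until the page has been printed.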

(5) The @g part (lines 167-328)
1) Six functions are defined here:
int SocketInit ();
int get_proxy (char * proxy_ip, short * port);
char * make_error (char * errmsg);
int connect_query_host (char * hostname, char * errmsg);
int send_http_req (int sock, char * http_req);
int recv_http_req (int sock, char * http_req, int maxlen);
2) The first two are Windows-only.
SocketInit () initializes the socket library.
get_proxy () reads IE's proxy settings from the registry.
3) connect_query_host () connects to the remote data server.
When a proxy is configured, it first tries to connect through the proxy.
4) send_http_req () and recv_http_req () send the HTTP request and receive the response.
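
The core of get_proxy () is splitting a "host:port" string into its two parts. The sketch below shows that step on its own, with bounds and range checks; split_host_port is a hypothetical helper, not part of the original source, and it assumes the simple "host:port" form only (as far as I know, the registry value can also hold per-protocol entries, which this sketch does not handle).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split a "host:port" string such as "10.0.0.1:8080" into its parts.
 * host receives at most hostlen-1 characters plus a terminating NUL.
 * Returns 0 on success, or -1 if the colon is missing, the host part is
 * empty or too long, or the port is not a number in 1..65535.
 */
int split_host_port(const char *s, char *host, size_t hostlen,
                    unsigned short *port)
{
    const char *colon = strchr(s, ':');
    char *end = NULL;
    size_t n;
    long p;

    if (colon == NULL || colon == s)
        return -1;                  /* no colon, or empty host */
    n = (size_t)(colon - s);
    if (n >= hostlen)
        return -1;                  /* host would not fit */
    p = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || p < 1 || p > 65535)
        return -1;                  /* not a valid port number */
    memcpy(host, s, n);
    host[n] = '\0';
    *port = (unsigned short)p;
    return 0;
}
```

Using an unsigned short for the port avoids the sign problem that a plain short has for ports above 32767.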

(6) Attached source code

001 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
002 "
003 <!--
004 Source From:
005 Supported platforms: Unix / Windows
006 Build on Unix: make clean all
007 If you make changes, please notify the author: [email protected]
008 -->
009 <html>
010 <head>
011 <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
012 <title>Wubi coding queries</title>
013 <link href="/css/tq.css" rel="stylesheet" type="text/css">
014 <script language="javascript" src="<%=romPrefix(NULL)%>/js/myso.js"></script>
015 </head>
016 <style><!--
017 body, td, a, p {font-family: arial, sans-serif; font-size: 14px}
018 --></style>
019 <body>
020 <p>
021 <center><img src="<%=romPrefix(NULL)%>/img/myso.jpg"></center>
022 <table width="430" border="0" align="center" cellspacing="0">
023 <tr>
024 <td colspan="3"> </td>
025 </tr>
026 <tr>
027 <td colspan="3"><strong>Wubi coding & Pinyin query</strong></td>
028 </tr>
029 <form name="form1" method="post" action="<%=thisCgiPrefix()%>" onsubmit="return check()">
030 <tr>
031 <td width=135>Please enter the characters:</td>
032 <td><input name="querykey" type="text" size="20" value=""></td>
033 <td><input type="submit" name="wubiquery" value="Start Search"></td>
034 </tr>
035 </form>
036 <tr>
037 <td colspan="3"> </td>
038 </tr>
039 <tr>
040 <td colspan="3"><strong>IP Address Location / Domain Search</strong></td>
041 </tr>
042 <form name="ipform" method=post action="<%=thisCgiPrefix()%>" onsubmit="return checkIP();">
043 <tr>
044 <td>Enter the IP or domain name:</td>
045 <td><input type="text" name="ip" size="20">
046 <td><input type="submit" name="ipquery" value=" ">
047 <input type="hidden" name="action" value="2">
048 </tr>
049 </form>
050 <tr>
051 <td colspan="3"> </td>
052 </tr>
053 </table>
054 <table width="430" border="0" align="center" cellspacing="0">
055 <tr>
056 <td><font color=red><% if (!isblankstr(errmsg)) /* report error */
057         print("<b>query failed:</b> %s", errmsg);
058     else
059         print("%.*s", qlen, qstart); /* output query result */
060 %></font>
061 </td>
062 </tr>
063 <tr><td> </td></tr>
064 <tr><td><center><a href="" target=_blank>source download</a> |
065 <a href="[email protected]" target=_blank>Contact the author</a> |
066 <a href="" target=_blank>More Information</a></center></td></tr>
067 <tr><td><center>(From:)</center></td></tr>
068 </table>
069 <xmp><% //=buff %></xmp>
070 </body>
071 </html>
072
073 <% @b
074 @include <undef.h>
075 @ifdef WIN32
076 @include <winsock2.h>
077 @define close closesocket
078 @else
079 @include <unistd.h>
080 @include <errno.h>
081 @include <sys/types.h>
082 @include <sys/socket.h>
083 @include <netinet/in.h>
084 @include <arpa/inet.h>
085 @include <netdb.h>
086 @endif /* WIN32 */
087 @include <ebdef.h>
088 int sock = 0;
089 char buff[4096] = "";
090 int maxlen = sizeof(buff);
091 char *qstart = ""; /* query result start address */
092 int qlen = 0; /* query result length */
093 char errmsg[256] = "";
094 int ret = OK;
095
096 /* wubi query */
097 if (!isblankstr(G("querykey")) &&
098     (sock = connect_query_host("qq.ip138.com", errmsg)) > 0)
099 {
100     char req_buf[1024] = "";
101     char query[256] = "";
102
103     /* make query and http header */
104     sprintf(query, "querykey=%s", urlEncode(G("querykey")));
105     sprintf(req_buf, "POST \r\n"
106         "Content-Type: application/x-www-form-urlencoded\r\n"
107         "Content-Length: %d\r\n"
108         "Host: qq.ip138.com\r\n"
109         "\r\n"
110         "%s", (int)strlen(query), query);
111
112     /* send ==> receive ==> parse result */
113     if ((ret = send_http_req(sock, req_buf)) > 0)
114     {
115         if ((ret = recv_http_req(sock, buff, sizeof(buff))) > 0)
116         {
117             /* separate result */
118             if (NULL != (qstart = strstr(buff, "<p align=\"center\">\r\n")))
119             {
120                 qlen = strstr(qstart, "</p>") - qstart;
121             }
122         }
123     }
124
125     close(sock);
126 }
127
128 /* ip query */
129 if (!isblankstr(G("ip")) &&
130     (sock = connect_query_host("www.ip138.com", errmsg)) > 0)
131 {
132     char req_buf[1024] = "";
133     char query[256] = "";
134
135     /* make query and http header */
136     sprintf(query, "ip=%s&action=2", G("ip"));
137     sprintf(req_buf, "POST \r\n"
138         "Referer: \r\n"
139         "Content-Type: application/x-www-form-urlencoded\r\n"
140         "Content-Length: %d\r\n"
141         "Host: www.ip138.com\r\n"
142         "\r\n"
143         "%s", (int)strlen(query), query);
144
145     /* send ==> receive ==> parse result */
146     if ((ret = send_http_req(sock, req_buf)) > 0)
147     {
148         if ((ret = recv_http_req(sock, buff, sizeof(buff))) > 0)
149         {
150             if (NULL != (qstart = strstr(buff, "<ul class=\"ul1\">")))
151             {
152                 qlen = strstr(qstart, "</td>") - qstart;
153             }
154         }
155     }
156
157     close(sock);
158 }
159
160 /* make error message */
161 if (sock < 0 || ret < 0)
162 {
163     make_error(errmsg);
164 }
165 %>
166
167 <% @g
168
169 #ifdef WIN32
170 char *make_error(char *errmsg)
171 {
172     int errcode;
173     LPVOID lpMsgBuf;
174
175     if (OK == (errcode = GetLastError()))
176         return "Ready";
177
178     FormatMessage(
179         FORMAT_MESSAGE_ALLOCATE_BUFFER |
180         FORMAT_MESSAGE_FROM_SYSTEM |
181         FORMAT_MESSAGE_IGNORE_INSERTS,
182         NULL,
183         errcode,
184         MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), // Default language
185         (LPTSTR) &lpMsgBuf,
186         0,
187         NULL
188     );
189
190     // Process any inserts in lpMsgBuf.
191     sprintf(errmsg, "%d:%s", errcode, lpMsgBuf);
192
193     // Free the buffer.
194     LocalFree(lpMsgBuf);
195
196     return errmsg;
197 }
198
199 int SocketInit()
200 {
201
202     WSADATA wsaData;
203
204     if (WSAStartup(0x101, &wsaData)) {
205         fprintf(stderr, "Could not initialize WinSock\n");
206         return ERROR;
207     }
208
209     if (0x101 != wsaData.wVersion) {
210         fprintf(stderr, "Version %x not supported\n", wsaData.wVersion);
211         return ERROR;
212     }
213
214     return OK;
215 }
216
217 /* try to get proxy from windows registry */
218 int get_proxy(char *proxy_ip, short *port)
219 {
220     HKEY hkey;
221     DWORD d = sizeof(char);
222     char *path = "Software\\Microsoft\\Windows\\CurrentVersion\\Internet Settings";
223     unsigned long type;
224     unsigned char data[256] = "";
225     unsigned long dlen = sizeof(data);
226     char *pstr = NULL;
227
228     proxy_ip[0] = '\0';
229     if (ERROR_SUCCESS == RegOpenKeyEx(HKEY_CURRENT_USER, path, 0, KEY_READ, &hkey))
230     {
231         if (ERROR_SUCCESS != RegQueryValueEx(
232             hkey, "ProxyServer", 0, &type, (unsigned char *) data, &dlen))
233         {
234             RegCloseKey(hkey);
235             return -1;
236         }
237
238         if (NULL != (pstr = strchr((char *) data, ':')))
239         {
240             sprintf(proxy_ip, "%.*s", (int)(pstr - (char *) data), data);
241             *port = (short) atoi(pstr + 1);
242         }
243
244         RegCloseKey(hkey);
245         return 0;
246     }
247
248     return -1;
249 }
250
251 #else /* WIN32 */
252 /* for unix */
253 char *make_error(char *errmsg)
254 {
255     strcpy(errmsg, strerror(errno));
256
257     return errmsg;
258 }
259 #endif /* WIN32 */
260
261
262 int connect_query_host(char *hostname, char *errmsg)
263 {
264     int sock = 0;
265     struct sockaddr_in sa;
266     char dstip[20] = "";
267     short port = 0;
268
269     memset(&sa, 0, sizeof(sa)); /* set up sa on every platform, not only WIN32 */
270     sa.sin_family = AF_INET;
271
272 #ifdef WIN32
273     if (SocketInit() < 0)
274         return ERROR;
275
276     /* by proxy */
277     if (!get_proxy(dstip, &port) && !isblankstr(dstip))
278     {
279         sa.sin_port = htons(port);
280         sa.sin_addr.s_addr = inet_addr(dstip);
281     }
282     /* by domain name */
283     else
284 #endif /* WIN32 */
285     {
286         struct hostent *ht = gethostbyname(hostname);
287
288         /* parse domain error */
289         if (NULL == ht)
290             return ERROR;
291
292         sa.sin_port = htons(80);
293         sa.sin_addr.s_addr = (*(struct in_addr *) ht->h_addr_list[0]).s_addr;
294     }
295
296     if ((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0)
297         return ERROR;
298
299     if (connect(sock, (struct sockaddr *) &sa, sizeof(struct sockaddr_in)) < 0)
300         return ERROR;
301
302     return sock;
303 }
304
305
306 int send_http_req(int sock, char *http_req)
307 {
308     int len;
309
310     len = send(sock, http_req, strlen(http_req), 0);
311
312     return len;
313 }
314
315
316 int recv_http_req(int sock, char *http_req, int maxlen)
317 {
318     int len;
319     int dlen = 0;
320
321     while (dlen < maxlen - 1 && (len = recv(sock, http_req + dlen, maxlen - dlen - 1, 0)) > 0)
322     {
323         dlen += len;
324     }
325
326     return dlen;
327 }
328 %>

Appendix: Related references
1. CSP / eybuild main site:
2. "EyBuild Chinese manual" download address:
3. "CSP / eybuild API reference", read online:
4. CSP / eybuild API function list, read online:
5. "CSP / eybuild FAQ (Frequently Asked Questions)" download address:

~ END ~
2007.1.24