Ñò hí˜Mc@sdddkZddkZddkZddkZddkZddkZddkZddkZddkl Z l Z l Z ddk l Z ddk Z dZeiedƒZeiedƒZeiddd d d d ƒZZee_ee_e ƒZd Ze ƒZe d;ƒd„Zed„Zd„Zd„Zd„Zd„Zd„Z ddd„Z!de"d„Z#d„Z$d„Z%d„Z&ee"eee"d„Z'd „Z(d!„Z)d"„Z*d#„Z+d$„Z,d%„Z-x:e.eƒD],Z/e d&eƒZ0e0i1eƒe0i2ƒqðWe3d'jo3e4ei5ƒdjo e-ƒne4ei5ƒd(joBei5dd)jo e(ƒqRei5dd*jo e#ƒqRei5di6d+ƒoŒei5dd+joe!d,ƒq¢ei5dd-i7d.ƒZ8e4e8ƒd(jo%e!e9e8dƒe9e8d/ƒƒq¢e!e9e8d/ƒƒqRei5dd0jo e ƒqRei5dd1jo eƒqRei5dd2jo e,ƒqRn­e4ei5ƒd3jo–ei5dd4jo e-ƒqRd5ei5d(joYei5d(i7d5ƒZ:xFe.e9e:d/ƒe9e:dƒdƒD]Z/ei;e/ƒq-WqRei5d(i6d6ƒoeei5d(d(ƒqRei5d(i6d7ƒoeei5d(d(ƒqRei5d(i6d8ƒoeei5d(d(ƒqRei5d(i6d9ƒoeei5d(d(ƒqRd.ei5d(joe#ei5d(ƒqRe'e9ei5d(ƒd:eƒnei<ƒndS(<iÿÿÿÿN(tThreadtLockt stack_size(tQueues/var/www/simplecd.olds/verycd.sqlite3.dbs/lock.sqlite3.dbtdbtsimplecdtusertroottpasswdtguess8ii€i cCs®titdƒ}t|_titdƒ}t|_tiddddddƒ}tid d tƒx>to6t i ƒ}t |d |d |d |ƒt i ƒqlWdS(Ns/verycd.sqlite3.dbs/lock.sqlite3.dbRRRRRR shttp://www.verycd.comt needlogintconntdbltstatdb( tsqlite3tconnecttpathtstrt text_factorytMySQLdbtdownloadt httpfetchtTruetqtgettfetcht task_done(R R R ttopic((s /var/www/simplecd.old/fetchvc.pyt thread_fetch"s   c Cstd}t|dƒid|dƒd|}dGHti|ƒ}tidtiƒi|ƒ}g}|o'tidtiƒi|ƒ}|GHn|GH|o<x9|D]-}t|dƒi|dƒt i |ƒq«Wn|o,|o%x"|D]}t |d t ƒqõWnd S( s#search verycd, fetch search resultss /search.logtas s%http://www.verycd.com/search/folders/sfetching search results ...s /topics/(\d+)s /search/folders/(.*?\?start=\d+)t,tfullN( RtopentwriteRRtretcompiletDOTALLtfindallRtputtsearchtFalse( tkeywordRt searchlogturltresttopicstlinksRtkey((s /var/www/simplecd.old/fetchvc.pyR'.s(   c Csd}dGHti|ƒ}tidtiƒi|ƒiƒ}tidtiƒi|ƒ}g}|D] }||qg~}d}|d7}xW|d D]K}dG|d Gd GHti |d ƒ|d |d |d |d f7}q˜W|d 7}t t ddƒi |ƒdS(s. read verycd hot res and keep update very day shttp://www.verycd.com/sfetching homepage ...s热门资æº.*?s2]*>(《.*?》)[^<]*s.

æ¯æ—¥çƒ­é—¨èµ„æº

s i sfetching hot topicis...s is
s/static/hot.htmltwN( RRR"R#R$R'tgroupR%RR&R RR!(R+thomethotzonethott_[1]txthtmlR((s /var/www/simplecd.old/fetchvc.pyR4Fs$!  ' c Csðd|jo:g}|idƒD]}|t|ƒq!~\}}nt|ƒ}}tiddtƒxt||dƒD]j}d|}ti|dtƒ}tidtiƒi |ƒ}|dGHx|D]} t i | ƒqÑWq~WdS( s fetch normal res that need logint-shttp://www.verycd.comR is,http://www.verycd.com/orz/page%d?stat=normals /topics/(\d+)iN( tsplittintRRRtrangeR"R#R$R%RR&( tpagesR5R6tftttpageR+tidxtidstid((s /var/www/simplecd.old/fetchvc.pytnormalWs :  c Csðd|jo:g}|idƒD]}|t|ƒq!~\}}nt|ƒ}}tiddtƒxt||dƒD]j}d|}ti|dtƒ}tidtiƒi |ƒ}|dGHx|D]} t i | ƒqÑWq~WdS( s!fetch request res that need loginR8shttp://www.verycd.comR is-http://www.verycd.com/orz/page%d?stat=requests /topics/(\d+)iN( R9R:RRRR;R"R#R$R%RR&( R<R5R6R=R>R?R+R@RARB((s /var/www/simplecd.old/fetchvc.pytrequestfs :  c CsÝd|jo:g}|idƒD]}|t|ƒq!~\}}nt|ƒ}}xt||dƒD]j}d|}ti|dtƒ}tidtiƒi |ƒ}|dGHx|D]} t i | ƒq¾WqkWdS(s!fetch request res that need loginR8is)http://www.verycd.com/orz/page%d?stat=allR s /topics/(\d+)iN( R9R:R;RRRR"R#R$R%RR&( R<R5R6R=R>R?R+R@RARB((s /var/www/simplecd.old/fetchvc.pytallus :  cCs€d}dGHti|ƒ}tidtiƒi|ƒ}t|ƒ}|GHtiti ƒƒ}x|D]}t i |ƒqeWdS(s. read verycd feed and keep update very 30 min shttp://www.verycd.com/sto/feedsfetching feed ...s /topics/(\d+)N( RRR"R#R$R%tsetttimetmktimetgmtimeRR&(R+tfeedsRAtnowRB((s /var/www/simplecd.old/fetchvc.pytfeed„s i ic CsÝd}xÐt||dƒD]»}dG|GdGH|t|ƒ}ti|dtƒ}tidtiƒi|ƒ}|o|d}nqtidtiƒi|ƒ}t |ƒ}|GHx|D]}t i |ƒq¾WqWdS( Nshttp://www.verycd.com/sto/pageis fetching lists...R s"topic-list"(.*?)"pnav"is /topics/(\d+)( R;RRRRR"R#R$R%RFRR&( tnumtoffturlbasetiR+R,tres2R-R((s /var/www/simplecd.old/fetchvc.pytupdate–s   s1-maxc Csd}|djoFd}ti|ƒiƒ}ttidƒi|ƒidƒƒ}n|idƒ}t|dƒ}|ddjo@ti|ƒiƒ}ttidƒi|ƒidƒƒ}nt|dƒ}dG|Gd G|Gd GHxŠt ||dƒD]u}|d |d }d G|Gd GHt i |ƒ}tidti ƒi |ƒ} | GHx| D]} ti| ƒq`WqWdS(Nshttp://www.verycd.com/archives/s1-maxisarchives/(\d+)R8itmaxsfetching list fromttos...s%05ds.htmls fetching froms topics/(\d+)/(turllibturlopentreadR:R"R#R'R1R9R;RRR$R%RR&( trantdebugROtm1R,tm2tmRPR+RARB((s /var/www/simplecd.old/fetchvc.pytfetchall¨s, ++ cCs:tii|ƒ}tii|ƒpti|ƒndS(N(tosRtdirnametexiststmakedirs(R=td((s /var/www/simplecd.old/fetchvc.pyt ensure_dirÀscCs“y…tt|ƒdƒ}tt|ƒddƒ}tt|ƒdƒ}td|||f}tii|ƒoti|ƒnWnnXdS(Ni idiès/idcache/%s/%s/%s.html(RtlongRR^R`tremove(RBtl1tl2tl3t cachefile((s /var/www/simplecd.old/fetchvc.pyt clear_idcacheÅsc Cs%|pdSx|D] }yúdG|GHtidƒid|ƒ}|iddƒ}ttdƒttd|dƒttd|d|d d !fƒtd |d|d d !f|}tii|ƒ ptii|ƒdjo)t |d ƒi t i |ƒi ƒƒnWqqXqWdS( Nt ___cachings http://[^/]*tt/s /imgcache/1s/imgcache/%s/1is/imgcache/%s/%s/1iis/imgcache/%s/%s/R0(R"R#tsubtreplaceRcRR^R`tgetsizeR R!RURVRW(R.tlR=((s /var/www/simplecd.old/fetchvc.pyt cache_imageÑs  ##--c&-Cs³ dG|GdGHd}|t|ƒ}d}xAtdƒD]3} y ti|dtd|ƒ}PWq6q6q6Xq6Wtidtiƒi|ƒ} | pB|djp d |jo d GHdSd G|Gd GHt ||ƒSn| d } tidtiƒi| ƒ} | o| d } ndSyÞtidtiƒi | ƒi dƒ} tidtiƒi | ƒi dƒ} tidtiƒi d| ƒi ƒ} tidtiƒi| ƒd }tidtiƒi| ƒ}|p"tidtiƒi| ƒ}nt|d ƒ}|d iddƒ|d <|diddƒ|dNsfetching topics...shttp://www.verycd.com/topics/RlitreportR s ]*>(.*?)s&requestIcon"[^>]*>\s*]*>(.*?)(.*?)s<.*?>sJæ—¶é—´.*?.*?date-time.*?>(.*?).*?date-time.*?>(.*?)s<align:top;">分类.*?.*?>.*?>(.*?).*?>(.*?)s<align:top;">分类.*?.*?>\s*(.*?)\s+(.*?)\s*s s=ed2k="([^"]*)" (subtitle_[^=]*="[^"]*"[^>]*)[^<]*>([^<]*)sed2k="([^"]*)"[^>]*>([^<]*)iÿÿÿÿs/iptcomContents">(.*?)s+src="(http://image-\d*\.verycd\.com/[^"]*)"s<(/?OBJECT.*?)>s[\1]s<(/?PARAM.*?)>s<(/?EMBED.*?)>s <(img .*?)>s
s s&.*?;t s\n\s+s \[(img .*?)\]s<\1>
s\[(/?OBJECT.*?)\]s<\1>s\[(/?PARAM.*?)\]s\[(/?EMBED.*?)\]s(image-\d*)\.verycd\.coms\1.app-base.coms'http://stat.verycd.com/counters/folder/Rms \'(\d+)\'sselect * from t1 where id=%ssŒinsert into t1 (id,comments,hits,score,title,brief,category1,updtime,status,vcpv) values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)s+update t1 set vcpv=%s,status=%s where id=%ss0]*>(.*?)s(replace into lock values (?,?,?,?,?,?,?)t`s4href="(http://www\.verycd\.com/search/files/.*?rel)"s'javascript:generateUrl\('start',(\d+)\)s &start=%ss ed2k://.*?\|/t|is=replace into verycd values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)(,RR;RRRR"R#R$R%RR'R1RntstriptlistRotextendtlenRet ExceptiontIRrturllib2RVRWR:tcursortexecuteR]tclosetcommittMtg_mutextacquireRdtreleasetjoinRFtsortedR9tdbfindtdbinserttdbupdateRj(&RBR RYR R R ROR+R,t_tabstractttitletstatustbrieftpubtimetcategoryted2ktnewed2kRPtcontenttwhattimglinkstvcpvtstaturltsttc2R5R6trtownertclted2kstrted2kpagetstartststartted2ksttriestc((s /var/www/simplecd.old/fetchvc.pyRãs4     '''""%" !!!!!!!!&  ' 4     3          + +&&  8  cCs1tiƒ}|idƒtiƒ|iƒdS(Nscreate table verycd( verycdid integer primary key, title text, status text, brief text, pubtime text, updtime text, category1 text, category2 text, ed2k text, content text )(R RR€R‚R(R¥((s /var/www/simplecd.old/fetchvc.pytdbcreate«s    c Cs©|iƒ} d} x|| djonyF| id|||||d|d|d|d||df ƒPWq| d7} tidƒqqXqW|iƒ| iƒdS(Nii s0insert into verycd values(?,?,?,?,?,?,?,?,?,?,?)iRli(RR€RGtsleepR‚R( RBRŽRRR‘R’R“R•R R¥R¤((s /var/www/simplecd.old/fetchvc.pyRмs   (    c Cs¦d} |iƒ} xy| djokyC| id||||d|d|d|d|||f ƒPWq| d7} tidƒqqXqW|iƒ| iƒdS(Niis„update verycd set title=?,status=?,brief=?,pubtime=?, updtime=?,category1=?,category2=?,ed2k=?,content=? where verycdid=?i(RR€RGR§R‚R( RBRŽRRR‘R’R“R•R R¤R¥((s /var/www/simplecd.old/fetchvc.pyR‹Ìs   %    cCsT|iƒ}|id|fƒ|iƒx$|D]}d|jotStSq0WdS(Ns%select 1 from verycd where verycdid=?i(RR€RRR((RBR R¥R6((s /var/www/simplecd.old/fetchvc.pyR‰Ýs   cCsDtiƒ}|idƒx$|D]}x|D] }|GHq-Wq WdS(Nsselect * from verycd(R RR€(R¥R6ty((s /var/www/simplecd.old/fetchvc.pytdblistçs  cCs dGHdS(NsÃUsage: python fetchvc.py createdb python fetchvc.py fetchall python fetchvc.py fetch 1-1611 #fetch archive list python fetchvc.py fetch 5633~5684 #fetch topics python fetchvc.py fetch 5633 #fetch a topic python fetchvc.py fetch q=keyword python fetchvc.py list #list the database python fetchvc.py feed #run every 30 min to keep up-to-date python fetchvc.py hot python fetchvc.py update #update first 20 pages, run on a daily basis((((s /var/www/simplecd.old/fetchvc.pytusageîs ttargett__main__itcreatedbR]RRiiR8iRLR4RyiRt~sq=sn=sr=sa=RYi(=RUR~R"RRGR^tsysRt threadingRRRRRRRR R tmysqldbR RRRtMAXCR„RRR'R4RCRDRERLRRR(R]RcRjRrRR¦RŠR‹R‰R©RªR;RPR>t setDaemonR¢t__name__R{targvt startswithR9RXR:R\R&R‡(((s /var/www/simplecd.old/fetchvc.pyt s¦     "             È           %   (!