Ñò …%¾Mc@s9ddkZddkZddkZddkZddkZddkZddkZddkZddkZddk l Z ddk l Z l Z ddklZddkZdZeiedƒZeiedƒZeidd d d d d ƒZZee_ee_eƒZdZe ƒZd„Zed„Zd„Zd„Zd„Z d„Z!d„Z"ddd„Z#de$d„Z%d„Z&d„Z'd„Z(ee$eee$edd„Z*d„Z+d „Z,d!„Z-d"„Z.d#„Z/d$„Z0e1d%jo3e2ei3ƒdjo e0ƒne2ei3ƒd&joBei3dd'jo e+ƒq'ei3dd(jo e%ƒq'ei3di4d)ƒoŒei3dd)joe#d*ƒqwei3dd+i5d,ƒZ6e2e6ƒd&jo%e#e7e6dƒe7e6d-ƒƒqwe#e7e6d-ƒƒq'ei3dd.jo e"ƒq'ei3dd/jo eƒq'ei3dd0jo e/ƒq'n­e2ei3ƒd1jo–ei3dd2jo e0ƒq'd3ei3d&joYei3d&i5d3ƒZ8xFe9e7e8d-ƒe7e8dƒdƒD]Z:ei;e:ƒqWq'ei3d&i4d4ƒoeei3d&d&ƒq'ei3d&i4d5ƒoeei3d&d&ƒq'ei3d&i4d6ƒoe ei3d&d&ƒq'ei3d&i4d7ƒoe!ei3d&d&ƒq'd,ei3d&joe%ei3d&ƒq'e*e7ei3d&ƒd8eƒnei<ƒndS(9iÿÿÿÿN(tmd5(tThreadtLock(tQueues/var/www/simplecd.olds/verycd.sqlite3.dbs/lock.sqlite3.dbtusertroottpasswdtguess8tdbtsimplecdicCs›titdƒ}t|_titdƒ}t|_tiddddddƒ}x>to6tiƒ}t |d |d |d |ƒti ƒqYWdS( Ns/verycd.sqlite3.dbs/lock.sqlite3.dbRRRRRR tconntdbltstatdb( tsqlite3tconnecttpathtstrt text_factorytMySQLdbtTruetqtgettfetcht task_done(R R R ttopic((s%/var/www/simplecd.old/fetchvc_noth.pyt thread_fetch#s   c Cstd}t|dƒid|dƒd|}dGHti|ƒ}tidtiƒi|ƒ}g}|o'tidtiƒi|ƒ}|GHn|GH|o<x9|D]-}t|dƒi|dƒt i |ƒq«Wn|o,|o%x"|D]}t |d t ƒqõWnd S( s#search verycd, fetch search resultss /search.logtas s%http://www.verycd.com/search/folders/sfetching search results ...s /topics/(\d+)s /search/folders/(.*?\?start=\d+)t,tfullN( Rtopentwritetdownloadt httpfetchtretcompiletDOTALLtfindallRtputtsearchtFalse( tkeywordRt searchlogturltresttopicstlinksRtkey((s%/var/www/simplecd.old/fetchvc_noth.pyR&.s(   cCsÁd}dGHti|ƒ}tidtiƒi|ƒiƒ}tidtiƒi|ƒ}d}x>|D]6}dG|dGdGHti |dƒ|d |7}qiWt t d d ƒi |ƒd S( s. read verycd hot res and keep update very day shttp://www.verycd.com/sfetching homepage ...s热门资æº.*?s2]*>(《.*?》)[^<]*s.

æ¯æ—¥çƒ­é—¨èµ„æº

sfetching hot topicis...s5 %s  s/static/hot.htmltwN( RR R!R"R#R&tgroupR$RR%RRR(R*thomethotzonethotthtmlR((s%/var/www/simplecd.old/fetchvc_noth.pyR3Fs$c CsÝd|jo:g}|idƒD]}|t|ƒq!~\}}nt|ƒ}}xt||dƒD]j}d|}ti|dtƒ}tidtiƒi |ƒ}|dGHx|D]} t i | ƒq¾WqkWdS(s fetch normal res that need logint-is,http://www.verycd.com/orz/page%d?stat=normalt needlogins /topics/(\d+)iN( tsplittinttrangeRR RR!R"R#R$RR%( tpagest_[1]txtftttpageR*tidxtidstid((s%/var/www/simplecd.old/fetchvc_noth.pytnormalTs :  c CsÝd|jo:g}|idƒD]}|t|ƒq!~\}}nt|ƒ}}xt||dƒD]j}d|}ti|dtƒ}tidtiƒi |ƒ}|dGHx|D]} t i | ƒq¾WqkWdS(s!fetch request res that need loginR5is-http://www.verycd.com/orz/page%d?stat=requestR6s /topics/(\d+)iN( R7R8R9RR RR!R"R#R$RR%( R:R;R<R=R>R?R*R@RARB((s%/var/www/simplecd.old/fetchvc_noth.pytrequestbs :  c CsÝd|jo:g}|idƒD]}|t|ƒq!~\}}nt|ƒ}}xt||dƒD]j}d|}ti|dtƒ}tidtiƒi |ƒ}|dGHx|D]} t i | ƒq¾WqkWdS(s!fetch request res that need loginR5is)http://www.verycd.com/orz/page%d?stat=allR6s /topics/(\d+)iN( R7R8R9RR RR!R"R#R$RR%( R:R;R<R=R>R?R*R@RARB((s%/var/www/simplecd.old/fetchvc_noth.pytallps :  cCs€d}dGHti|ƒ}tidtiƒi|ƒ}t|ƒ}|GHtiti ƒƒ}x|D]}t i |ƒqeWdS(s. read verycd feed and keep update very 30 min shttp://www.verycd.com/sto/feedsfetching feed ...s /topics/(\d+)N( RR R!R"R#R$tsetttimetmktimetgmtimeRR%(R*tfeedsRAtnowRB((s%/var/www/simplecd.old/fetchvc_noth.pytfeeds i ic CsÝd}xÐt||dƒD]»}dG|GdGH|t|ƒ}ti|dtƒ}tidtiƒi|ƒ}|o|d}nqtidtiƒi|ƒ}t |ƒ}|GHx|D]}t i |ƒq¾WqWdS( Ns#http://www.verycd.com/sto/~all/pageis fetching lists...R6s"topic-list"(.*?)"pnav"is /topics/(\d+)( R9RRR RR!R"R#R$RFRR%( tnumtoffturlbasetiR*R+tres2R,R((s%/var/www/simplecd.old/fetchvc_noth.pytupdate‘s   s1-maxc Csd}|djoFd}ti|ƒiƒ}ttidƒi|ƒidƒƒ}n|idƒ}t|dƒ}|ddjo@ti|ƒiƒ}ttidƒi|ƒidƒƒ}nt|dƒ}dG|Gd G|Gd GHxŠt ||dƒD]u}|d |d }d G|Gd GHt i |ƒ}tidti ƒi |ƒ} | GHx| D]} ti| ƒq`WqWdS(Nshttp://www.verycd.com/archives/s1-maxisarchives/(\d+)R5itmaxsfetching list fromttos...s%05ds.htmls fetching froms topics/(\d+)/(turllibturlopentreadR8R!R"R&R0R7R9RR R#R$RR%( trantdebugROtm1R+tm2tmRPR*RARB((s%/var/www/simplecd.old/fetchvc_noth.pytfetchall£s, ++ cCs:tii|ƒ}tii|ƒpti|ƒndS(N(tosRtdirnametexiststmakedirs(R=td((s%/var/www/simplecd.old/fetchvc_noth.pyt ensure_dir»scCsÎyÀtt|ƒdƒ}tt|ƒddƒ}tt|ƒdƒ}td|||f}tii|ƒoti|ƒntidgddƒ}|it dt|ƒƒi ƒƒWnnXdS( Ni idiès/idcache/%s/%s/%s.htmls127.0.0.1:11211RYitidsum( RtlongRR^R`tremovetmemcachetClienttdeleteRt hexdigest(RBtl1tl2tl3t cachefiletmc((s%/var/www/simplecd.old/fetchvc_noth.pyt clear_idcacheÀs'c Cs|pdSx |D]}yôdG|GHtidƒid|ƒ}|iddƒ}ttdƒttd|dƒttd|d|d d !fƒtd |d|d d !f|}tii|ƒ ptii|ƒdjo#t |d ƒi t i |ƒƒnWqqXqWdS( Nt ___cachings http://[^/]*tt/s /imgcache/1s/imgcache/%s/1is/imgcache/%s/%s/1iis/imgcache/%s/%s/R/( R!R"tsubtreplaceRcRR^R`tgetsizeRRRR (R-tlR=((s%/var/www/simplecd.old/fetchvc_noth.pyt cache_imageÎs  ##-'c(-Csò dG|GdGHd}|t|ƒ} d} xAtdƒD]3} y ti| dtd|ƒ} PWq6q6q6Xq6Wtidtiƒi| ƒ} | pB| djp d | jo d GHdSd G|Gd GHt ||ƒSn| d } tidtiƒi| ƒ} | o| d } ndSyÿtidtiƒi | ƒi dƒ}tidtiƒi | ƒi dƒ}tidtiƒi d|ƒi ƒ}tidtiƒi| ƒd }|ot|ƒ}||dNsfetching topics...shttp://www.verycd.com/topics/RritreportR6s ]*>(.*?)s&requestIcon"[^>]*>\s*]*>(.*?)(.*?)s<.*?>sJæ—¶é—´.*?.*?date-time.*?>(.*?).*?date-time.*?>(.*?)s<align:top;">分类.*?.*?>.*?>(.*?).*?>(.*?)s<align:top;">分类.*?.*?>\s*(.*?)\s+(.*?)\s*s s=ed2k="([^"]*)" (subtitle_[^=]*="[^"]*"[^>]*)[^<]*>([^<]*)sed2k="([^"]*)"[^>]*>([^<]*)iÿÿÿÿs/iptcomContents">(.*?)s+src="(http://image-\d*\.verycd\.com/[^"]*)"s<(/?OBJECT.*?)>s[\1]s<(/?PARAM.*?)>s<(/?EMBED.*?)>s <(img .*?)>s
s s&.*?;t s\n\s+s \[(img .*?)\]s<\1>
s\[(/?OBJECT.*?)\]s<\1>s\[(/?PARAM.*?)\]s\[(/?EMBED.*?)\]s(image-\d*)\.verycd\.coms\1.app-base.coms'http://stat.verycd.com/counters/folder/Rss \'(\d+)\'sselect * from t1 where id=%ssŒinsert into t1 (id,comments,hits,score,title,brief,category1,updtime,status,vcpv) values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)s+update t1 set vcpv=%s,status=%s where id=%ss0]*>(.*?)s(replace into lock values (?,?,?,?,?,?,?)t`s4href="(http://www\.verycd\.com/search/files/.*?rel)"s'javascript:generateUrl\('start',(\d+)\)s &start=%ss ed2k://.*?\|/t|is=replace into verycd values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)(-RR9RR RR!R"R#R$RR&R0RttstriptlistRutextendtlenRft ExceptiontIRxtcursorturllib2RVRWR8texecuteR]tclosetcommittMtg_mutextacquireRetreleasetjoinRFtsortedR7tdbfindtdbinserttdbupdateRptmysqldb((RBR RYR R R6tcachetupdtimeROR*R+t_tabstractttitletstatustbrieftpubtimetcategoryted2ktnewed2kRPtcontenttwhattimglinkstvcpvtc2tstaturltstR;R<trtownertclted2kstrted2kpagetstartststartted2ksttriestc((s%/var/www/simplecd.old/fetchvc_noth.pyRás<     '''" "%" !!!!!!!! & ' 4     3          ++ &"  8  cCs1tiƒ}|idƒtiƒ|iƒdS(Nscreate table verycd( verycdid integer primary key, title text, status text, brief text, pubtime text, updtime text, category1 text, category2 text, ed2k text, content text )(R R„R†RˆR‡(R®((s%/var/www/simplecd.old/fetchvc_noth.pytdbcreate§s    c Cs©|iƒ} d} x|| djonyF| id|||||d|d|d|d||df ƒPWq| d7} tidƒqqXqW| iƒ|iƒdS(Niis0insert into verycd values(?,?,?,?,?,?,?,?,?,?,?)iRr(R„R†RGtsleepR‡Rˆ( RBR—R˜R™RšR›RœRžR R®R­((s%/var/www/simplecd.old/fetchvc_noth.pyR¸s   (    c Cs¦d} |iƒ} xy| djokyC| id||||d|d|d|d|||f ƒPWq| d7} tidƒqqXqW| iƒ|iƒdS(Niis„update verycd set title=?,status=?,brief=?,pubtime=?, updtime=?,category1=?,category2=?,ed2k=?,content=? where verycdid=?i(R„R†RGR°R‡Rˆ( RBR—R˜R™RšR›RœRžR R­R®((s%/var/www/simplecd.old/fetchvc_noth.pyR‘Ès   %    cCsT|iƒ}|id|fƒ|iƒx$|D]}d|jotStSq0WdS(Ns%select 1 from verycd where verycdid=?i(R„R†R‡RR'(RBR R®R<((s%/var/www/simplecd.old/fetchvc_noth.pyRÙs   cCsDtiƒ}|idƒx$|D]}x|D] }|GHq-Wq WdS(Nsselect * from verycd(R R„R†(R®R<ty((s%/var/www/simplecd.old/fetchvc_noth.pytdblistãs  cCs dGHdS(NsÃUsage: python fetchvc.py createdb python fetchvc.py fetchall python fetchvc.py fetch 1-1611 #fetch archive list python fetchvc.py fetch 5633~5684 #fetch topics python fetchvc.py fetch 5633 #fetch a topic python fetchvc.py fetch q=keyword python fetchvc.py list #list the database python fetchvc.py feed #run every 30 min to keep up-to-date python fetchvc.py hot python fetchvc.py update #update first 20 pages, run on a daily basis((((s%/var/www/simplecd.old/fetchvc_noth.pytusageês t__main__itcreatedbR]RRiiR5iRLR3RiRt~sq=sn=sr=sa=RY(=RUR…R!R RGR^tsysRRgthashlibRt threadingRRRRRRR R R’R RRRtMAXCRŠRRR&R3RCRDRERLRRR'R]RcRpRxtNoneRR¯RR‘RR²R³t__name__Rtargvt startswithR7RXR8R\R9RPR%R(((s%/var/www/simplecd.old/fetchvc_noth.pyt sž      "             Æ         %   (!