COMP9024 (21T0): Assignment
Simple Graph structure-based Search Engine

[The specification may change. A notice on the class web page will be posted after each revision, so please check the class notice board frequently.]

Change log: No entry as yet!

Objectives
- To implement a simple search engine based on the well-known PageRank algorithm (simplified).
- To give you further practice with C and advanced data structures (BST and Graph ADTs).

Admin
Marks: 30 marks towards the total course marks. Part-A (10 marks), Part-B (10 marks), Part-C (10 marks).
Due: 10am Monday 01 Feb 2021.
Late Penalty: 10% marks per day off the ceiling. The last day to submit this assignment is 10am Wednesday 03 Feb 2021, with the late penalty applied.
Submit: Read the instructions in the “Submission” section below.

Aim
In this assignment, your task is to implement a simple search engine using the well-known PageRank algorithm (simplified for this assignment, of course!). You should start by reading the Wikipedia entries on the topic. Later I will also discuss these topics in the lecture.
PageRank (read up to the section “Damping factor”)

The main focus of this assignment is to build a graph structure, calculate PageRanks and rank pages based on these values. You don't need to spend time crawling, collecting and parsing web links for this assignment. You will be provided with a collection of “web pages” containing the required information in an easy-to-use format. Each page has two sections:
- Section-1 contains urls representing outgoing links. Urls are separated by one or more blanks and may span multiple lines.
- Section-2 contains selected words extracted from the url. Words are separated by one or more spaces and may span multiple lines.

Hint: You can assume that the maximum length of a line is 1000 characters. You need to use dynamic data structure(s) to handle words within a file and across files; there is no need to know the maximum number of words beforehand.
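One way to satisfy this hint is a growable array of strings whose capacity doubles on demand. The following is a minimal C sketch under that assumption; the names WordList, wordlist_init, wordlist_add and wordlist_free are hypothetical and not part of any provided file.

```c
#include <stdlib.h>
#include <string.h>

/* A growable list of words (hypothetical type for this sketch).
   Capacity doubles whenever the array is full, so the maximum
   number of words never has to be known in advance. */
typedef struct {
    char **items;   /* heap-allocated copies of the words */
    size_t len;     /* number of words stored */
    size_t cap;     /* number of slots allocated */
} WordList;

void wordlist_init(WordList *wl) {
    wl->items = NULL;
    wl->len = 0;
    wl->cap = 0;
}

/* Appends a private copy of word. Returns 0 on success, -1 if out of memory. */
int wordlist_add(WordList *wl, const char *word) {
    if (wl->len == wl->cap) {
        size_t newcap = wl->cap ? 2 * wl->cap : 8;
        char **grown = realloc(wl->items, newcap * sizeof *grown);
        if (grown == NULL) return -1;
        wl->items = grown;
        wl->cap = newcap;
    }
    size_t size = strlen(word) + 1;          /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL) return -1;
    memcpy(copy, word, size);
    wl->items[wl->len++] = copy;
    return 0;
}

/* Frees every stored word and the array itself, leaving an empty list. */
void wordlist_free(WordList *wl) {
    for (size_t i = 0; i < wl->len; i++) free(wl->items[i]);
    free(wl->items);
    wordlist_init(wl);
}
```

Typical use is to call wordlist_init once, wordlist_add for every token read (for example, tokens read with fscanf using a "%1000s" conversion into a 1001-byte buffer), and wordlist_free when done. A BST keyed on the word, one of the ADTs named in the objectives, would additionally keep words sorted and free of duplicates, which is what Part-B's output needs.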
Example file url31.txt

    #start Section-1
    url2 url34 url1 url26
    url52 url21 url74 url6 url82
    #end Section-1
    #start Section-2
    Mars has long been the subject of human interest. Early telescopic observations
    revealed color changes on the surface that were attributed to seasonal
    vegetation and apparent linear features were ascribed to intelligent design.
    #end Section-2

Your tasks in summary:
- Calculate PageRanks: You need to create a graph structure that represents the hyperlink structure of the given collection of “web pages” and, for each page (node in your graph), calculate its PageRank value and other graph properties.
- Inverted Index: You need to create an “inverted index” that provides a list of pages for every word in the given collection of pages.
- Search Engine: Your search engine will use the given inverted index to find pages where the query term(s) appear and rank these pages using their PageRank values (see below for more details).

How to Get Started
Hints and Sample files
Hints on “How to Implement Assignment” will be discussed in the lecture.
Sample files for How to Get Started (ass-getting-started.zip) will be discussed in the lecture.
Sample1.zip

Additional files
You can submit additional supporting files, *.c and *.h, for this assignment. For example, you may implement your graph ADT in files graph.c and graph.h and submit these two files along with the other required files mentioned below.

Part-A: Calculate PageRanks
You need to write a program in the file pagerank.c that reads data from a given collection of pages listed in the file collection.txt and builds a graph structure using an Adjacency Matrix or Adjacency List representation. Using the algorithm described below, calculate the PageRank of every url in collection.txt. In this file, urls are separated by one or more spaces and/or newline characters. Add the suffix .txt to a url to obtain the file name of the corresponding “web page”. For example, the file url24.txt contains the required information for url24.
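Reading the outgoing links of one page can be sketched as follows. This is a minimal sketch assuming an adjacency-matrix representation and an array urls holding every url read from collection.txt; page_filename and read_outlinks are hypothetical helpers. As the algorithm below requires, self-loops and repeated (parallel) links are ignored; skipping links to urls that are not in collection.txt is an additional assumption.

```c
#include <stdio.h>
#include <string.h>

/* Builds the page file name for a url, e.g. "url24" -> "url24.txt". */
void page_filename(const char *url, char *buf, size_t size) {
    snprintf(buf, size, "%s.txt", url);
}

/* Returns the index of url in urls[0..n-1], or -1 if it is not present. */
static int find_url(char **urls, int n, const char *url) {
    for (int i = 0; i < n; i++)
        if (strcmp(urls[i], url) == 0) return i;
    return -1;
}

/* Scans Section-1 of an open page file and sets row[j] = 1 for each
   outgoing link to urls[j]. Self-loops (links to 'self'), repeated links
   and urls outside the collection are ignored. Returns the out-degree,
   i.e. the number of distinct links recorded. */
int read_outlinks(FILE *fp, const char *self, char **urls, int n, int *row) {
    char tok[1001];                 /* lines are at most 1000 characters */
    int in_section = 0, degree = 0;

    for (int j = 0; j < n; j++) row[j] = 0;
    while (fscanf(fp, "%1000s", tok) == 1) {
        if (strcmp(tok, "#start") == 0) {
            /* the marker is two tokens: "#start" then the section name */
            if (fscanf(fp, "%1000s", tok) == 1 && strcmp(tok, "Section-1") == 0)
                in_section = 1;
        } else if (strcmp(tok, "#end") == 0) {
            if (in_section) break;  /* Section-1 finished; ignore the rest */
        } else if (in_section && strcmp(tok, self) != 0) {
            int j = find_url(urls, n, tok);
            if (j >= 0 && !row[j]) {
                row[j] = 1;
                degree++;
            }
        }
    }
    return degree;
}
```

A caller would fopen the name produced by page_filename for each url, pass the corresponding matrix row, and keep the return value as that page's out-degree for pagerankList.txt.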
Example file collection.txt

    url25 url31 url2
    url102 url78
    url32 url98 url33

Simplified PageRank Algorithm (for this assignment)

    PageRank(d, diffPR, maxIterations)
        Read “web pages” from the collection in file “collection.txt”
        and build a graph structure using Adjacency List Representation
        N = number of urls in the collection
        For each url p_i in the collection
            $PR(p_i, 0) = \frac{1}{N}$
        End For
        iteration = 0;
        diff = diffPR;   // to enter the following loop
        While (iteration < maxIterations AND diff >= diffPR)
            iteration++;
            $PR(p_i, t+1) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j, t)}{L(p_j)}$   (for every url p_i)
            $\mathit{diff} = \sum_{i=1}^{N} \left| PR(p_i, t+1) - PR(p_i, t) \right|$
        End While

where $M(p_i)$ is the set of pages (urls) containing a link to $p_i$ (ignore self-loops and parallel edges), $L(p_j)$ is the out-degree of $p_j$, and $t$ corresponds to the value of “iteration”.

Your program in pagerank.c will take three arguments, d (damping factor), diffPR (difference in PageRank sum) and maxIterations (maximum iterations), and, using the algorithm described in this section, calculate the PageRank of every url. For example:

    % pagerank 0.85 0.00001 1000

Your program should output a list of urls in descending order of PageRank value (use the format string “%.7f”) to a file named pagerankList.txt. For each url, the list should also include its out-degree (number of outgoing links) along with its PageRank value. The values on each line should be comma-separated. For example, pagerankList.txt may contain the following:

Example file pagerankList.txt

    url31, 3, 0.2623546
    url21, 1, 0.1843112
    url34, 6, 0.1576851
    url22, 4, 0.1520093
    url32, 6, 0.0925755
    url23, 4, 0.0776758
    url11, 3, 0.0733884

Sample Files for Part-A
You can download the following three sample files, each with its expected pagerankList.txt file. Use the format string “%.7f” to output PageRank values. Please note that your PageRank values might differ slightly from those provided in these samples, due to the way you carry out the calculations. However, make sure that your values match the expected values to at least the first 6 decimal places. For example, if an expected value is 0.1843112, your value could be 0.184311x, where x could be any digit.
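The PageRank pseudocode above can be translated into C roughly as follows. This is a minimal sketch, not a reference solution: it assumes the graph is an n x n adjacency matrix stored row-major in a flat int array, where adj[j*n + i] != 0 means page j links to page i, with self-loops and parallel edges already removed (a list representation works equally well); the function name and signature are hypothetical. The values for iteration t+1 are computed into a separate array so that every update uses only the values from iteration t.

```c
#include <math.h>
#include <stdlib.h>

/* Simplified PageRank on an n x n adjacency matrix stored row-major:
   adj[j*n + i] != 0 means page j links to page i (self-loops and
   parallel edges are assumed to be removed already).
   On return pr[i] holds PR(p_i) after the last iteration.
   Returns the number of iterations performed, or -1 on allocation failure. */
int pagerank(const int *adj, int n, double d, double diffPR,
             int maxIterations, double *pr) {
    int *outdeg = calloc(n, sizeof *outdeg);     /* L(p_j) */
    double *next = malloc(n * sizeof *next);     /* PR(., t+1) */
    if (outdeg == NULL || next == NULL) {
        free(outdeg);
        free(next);
        return -1;
    }
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++)
            if (adj[j * n + i]) outdeg[j]++;
        pr[j] = 1.0 / n;                         /* PR(p_j, 0) = 1/N */
    }

    int iteration = 0;
    double diff = diffPR;                        /* to enter the loop */
    while (iteration < maxIterations && diff >= diffPR) {
        iteration++;
        diff = 0.0;
        for (int i = 0; i < n; i++) {
            double sum = 0.0;                    /* over p_j in M(p_i) */
            for (int j = 0; j < n; j++)
                if (adj[j * n + i]) sum += pr[j] / outdeg[j];
            next[i] = (1.0 - d) / n + d * sum;
            diff += fabs(next[i] - pr[i]);
        }
        for (int i = 0; i < n; i++) pr[i] = next[i];   /* t <- t+1 */
    }
    free(outdeg);
    free(next);
    return iteration;
}
```

Because every page in M(p_i) has at least one outgoing link, dividing by its out-degree is always safe. Sorting the page indices by descending pr and writing each line with fprintf and a "%s, %d, %.7f\n" format (url, out-degree, value) would then give the pagerankList.txt layout shown above.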
All the Part-A sample files (ex1, ex2, ex3) were generated using the following command:

    % pagerank 0.85 0.00001 1000

Part-B: Inverted Index
You need to write a program in the file inverted.c that reads data from a given collection of pages in collection.txt and generates an “inverted index” that provides a sorted list (set) of urls for every word in the collection. Before inserting words into your index, you need to “normalise” them by:
- removing leading and trailing spaces,
- converting all characters to lowercase,
- removing the following punctuation marks if they appear at the end of a word: '.' (dot), ',' (comma), ';' (semicolon), '?' (question mark).

In each sorted list (set), duplicate urls are not allowed. Your program should output this “inverted index” to a file named invertedIndex.txt, with one line per word. Words should be in ascending alphabetical order, and each word's list of urls should also be in ascending alphabetical order.

Example file invertedIndex.txt

    design url2 url25 url31 url61
    mars url101 url25 url31
    vegetation url31 url61

Note: for this part, on each line of your output file, a word and its urls must be separated by one (or more) spaces. The testing program will ignore additional spaces.

Part-C: Search Engine
Write a simple search engine in the file searchPagerank.c that, given search terms (words) as command-line arguments, finds pages containing one or more of the search terms and outputs (to stdout) the top 30 pages, ordered first by the number of search terms found (descending) and then, within each group, by PageRank (descending). If fewer than 30 pages match, output all of them.

Your program must use the data available in the two files invertedIndex.txt and pagerankList.txt, and must derive its results from them. We will test this program independently of your solutions for Part-A and Part-B.

Note: for this part, each line in “invertedIndex.txt” contains a word and the corresponding urls, separated by one (or more) spaces.
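The Part-C ordering rule (more matched search terms first, then higher PageRank within each group, and at most 30 results) can be sketched with qsort from the C standard library. In this minimal sketch the Result struct, cmp_result and rank_results are hypothetical names; loading the match counts from invertedIndex.txt and the PageRank values from pagerankList.txt is omitted, and the final strcmp tie-break is an added assumption for deterministic output that the specification does not require.

```c
#include <stdlib.h>
#include <string.h>

/* One candidate page for a query (hypothetical type for this sketch). */
typedef struct {
    const char *url;
    int matches;      /* number of distinct search terms found on the page */
    double pagerank;  /* PageRank value taken from pagerankList.txt */
} Result;

/* qsort comparator: more matched terms first; within a group,
   higher PageRank first; finally url order, for deterministic output. */
static int cmp_result(const void *a, const void *b) {
    const Result *x = a;
    const Result *y = b;
    if (x->matches != y->matches) return y->matches - x->matches;
    if (x->pagerank > y->pagerank) return -1;
    if (x->pagerank < y->pagerank) return 1;
    return strcmp(x->url, y->url);
}

/* Sorts results in place; returns how many to print (at most 30). */
int rank_results(Result *results, int n) {
    qsort(results, n, sizeof *results, cmp_result);
    return n < 30 ? n : 30;
}
```

After the call, the first k entries (k being the return value) are the urls to print, one per line.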
Your program for Part-C needs to be able to handle lines in this format in invertedIndex.txt; please see the sample program provided, “exTkns.c”. Each line in “pagerankList.txt” contains a url, its out-degree and its PageRank. To simplify your task, you can assume that these fields are separated by ", " (a comma followed by one space).

Example:

    % searchPagerank mars design
    url31
    url25

Submission
Additional files: You can submit additional supporting files, *.c and *.h, for this assignment. IMPORTANT: Make sure that your additional files (*.c) DO NOT have a "main" function. For example, you may implement your graph ADT in files graph.c and graph.h and submit these two files along with the other required files listed below. However, make sure that these files do not have a "main" function.

Below I explain how we will test your submission; hopefully this will answer all of your questions.

You need to submit the following files, along with your supporting files (*.c and *.h):
- pagerank.c
- inverted.c
- searchPagerank.c

Now, say we want to mark your pagerank.c program. The auto-marking program will take all your supporting files (the other *.h and *.c files) along with pagerank.c, and execute the following command to generate an executable called, say, pagerank. Note that the other two files from the above list (inverted.c and searchPagerank.c) will be removed from the directory:

    % gcc -Wall -lm -std=c11 *.c -o pagerank

So we will not use your Makefile (if any). The above command compiles your supporting files and the file being tested (say pagerank.c) into object files, links these object files and generates the executable (pagerank in the above example). Again, please make sure that you DO NOT have a main function in your supporting files (the other *.c files you submit).

We will use a similar approach to generate the other two executables (for inverted.c and searchPagerank.c).

How to Submit
Instructions on how to submit your assignment will be available later.

Plagiarism
You are allowed to use code from the course material (for example, code available as part of the labs, lectures and tutorials). If you use code from the course material, please clearly acknowledge it by including a comment in your file. If you have questions about the assignment, ask your tutor.
Your program must be entirely your own work. Plagiarism detection software compares all submissions pairwise (including submissions for similar projects in previous years, if applicable), and serious penalties will be applied, particularly in the case of repeat offences.

Do not copy from others; do not allow anyone to see your code, not even after the deadline.

Please refer to the online sources to help you understand what plagiarism is and how it is dealt with at UNSW:
- Plagiarism and Academic Integrity
- UNSW Plagiarism Procedure

Before submitting any work, you should read and understand the subsection named Plagiarism in the course outline. We regard unacknowledged copying of material, in whole or in part, as an extremely serious offence. For further information, see the course outline.

— end —


