

January 13, 2021

COMP6714 Review

Wei Wang, weiw AT
School of Computer Science and Engineering
University of New South Wales
November 11, 2020

Course Logistics

- THE formula:

$$
\text{mark} = \begin{cases}
0.25 \cdot (\text{ass1} + \text{proj1}) + 0.50 \cdot \text{exam}, & \text{if exam} \ge 40, \\
39\ \text{FL}, & \text{otherwise.}
\end{cases}
$$

- Exam date: 2 Dec (Wed), afternoon; the exact time is to be announced.
- Pre-exam consultations:
  - TBA
  - TBA
- Sample exam papers will be released soon.
- Feedback: via the course survey, or private messages to me on the forum.

(1) The final exam mark is important: you must achieve at least 40!
(2) The supplementary exam is only for those who cannot attend the final exam.
(3) Apply for UNSW Special Consideration (SC) with sufficient evidence; the SC team will make the final decision.

About the Final Exam

- Time: 10 minutes reading time + 2-hour open-book exam + 15 minutes for scanning, uploading, and submission.
- It is very important that you know how to scan, upload, and submit. Practise beforehand! We will run a practice session before the exam.
- The exam is designed to test your understanding of, and familiarity with, the core content of the course.
- Total: 100 marks (8 questions).
- The questions are similar to those in the assignment.

Special Note on the Final Exam

- We trust that every student will uphold academic integrity.
- Any misconduct in the final exam will have severe consequences.

About the Final Exam (continued)

- Read the instructions carefully.
- You can answer the questions in any order.
- Some of the "advanced" methods/algorithms/systems are not required unless explicitly mentioned here.

Tip: Write down intermediate steps, so that we can give you partial marks even if your final answer is wrong.

Disclaimer: We will go through the main content of each lecture. However, this review is by no means exhaustive.

Boolean Model

- incidence vector
- semantics of the query model (AND/OR/NOT and other operators, e.g., /k, /S)
- inverted index, positional inverted index
- query processing methods for basic and advanced Boolean queries (including phrase queries, queries with the /S operator, etc.)
- query optimization methods (list merge order, skip pointers)
- Not required: next-word index

Preprocessing

- typical preprocessing steps: tokenization, stopword removal, stemming/lemmatization

Index Construction

- Why do we need dedicated algorithms to build the index?
- BSBI: blocked sort-based indexing
- SPIMI: single-pass in-memory indexing
- Dynamic indexing: immediate merge, no merge, logarithmic merge

Vector Space Model

- What is ranked retrieval, and why do we need it?
- raw and normalized tf, idf
- cosine similarity
- tf-idf variants (using SMART notation), e.g., lnc.ltc
- basic query processing methods: document-at-a-time vs. term-at-a-time
- exact and approximate query optimization methods (heap-based top-k algorithm, MaxScore and WAND algorithms, etc.)
- Not required: query processing methods based on advanced or tiered inverted indexes (e.g., high/low lists, impact-ordered lists, etc.)

Evaluation

- Existing methods for preparing the benchmark dataset, queries, and ground truth
- For unranked results: precision, recall, F-measure
- For ranked results: precision-recall graph, 11-point interpolated precision, MAP, etc.
- Not required: NDCG, the Kappa (κ) measure for inter-judge (dis)agreement

Probabilistic Model and Language Model

- Probability ranking principle (intuitively, how to rank documents and when to stop)
- derivation of the ranking formula of the probabilistic model
- the BM25 method
- query-likelihood unigram language model with Jelinek-Mercer smoothing

Web Search Basics

- Differences between Web search and information retrieval
- Estimating the relative sizes of two search engines
- Near-duplicate detection: the shingling method
- Not required: the SimHash method

Crawling

- Understand the requirements and the current architecture of crawlers (e.g., the Mercator architecture)
- Not required: optimization for age, finding content blocks, etc.
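For the Boolean Model's query optimization topic, here is a minimal sketch of intersecting two sorted postings lists with skip pointers. It assumes skips sit at every ⌊√L⌋-th position of a list of length L (a common heuristic) and simulates them with index arithmetic rather than real pointers; `has_skip` and `intersect_with_skips` are hypothetical names.

```python
import math

def has_skip(pos, interval, length):
    """Skips are assumed to sit at every multiple of `interval`."""
    return pos % interval == 0 and pos + interval < length

def intersect_with_skips(p1, p2):
    """Intersect two ascending docID lists, following a skip only when
    its target does not overshoot the current docID on the other list."""
    s1 = max(1, math.isqrt(len(p1)))  # roughly sqrt(L) evenly spaced skips
    s2 = max(1, math.isqrt(len(p2)))
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            if has_skip(i, s1, len(p1)) and p1[i + s1] <= p2[j]:
                # Everything jumped over is < p2[j], so no match is lost.
                while has_skip(i, s1, len(p1)) and p1[i + s1] <= p2[j]:
                    i += s1
            else:
                i += 1
        else:
            if has_skip(j, s2, len(p2)) and p2[j + s2] <= p1[i]:
                while has_skip(j, s2, len(p2)) and p2[j + s2] <= p1[i]:
                    j += s2
            else:
                j += 1
    return answer
```

For example, `intersect_with_skips([2, 4, 8, 16, 19, 23, 28, 43], [1, 2, 3, 5, 8, 41, 51, 60, 71])` gives `[2, 8]`. More skips mean shorter jumps (more chances to skip, but more comparisons); the √L spacing is a compromise between the two.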
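For the Index Construction topic, a toy in-memory sketch of SPIMI may help: terms go straight into a per-block dictionary (no global termIDs), terms are sorted only when a block is written, and the blocks are then merged. Real SPIMI writes each block to disk and merges block files; here blocks stay in memory, and the block-size limit `max_postings` is a hypothetical stand-in for "memory is full".

```python
import heapq
from collections import defaultdict

def spimi_invert(token_stream, max_postings):
    """Yield blocks as sorted lists of (term, postings) pairs.

    token_stream yields (term, doc_id) pairs in increasing doc_id order.
    A block is emitted once it holds max_postings postings.
    """
    block = defaultdict(list)  # term -> postings list, built on the fly
    count = 0
    for term, doc_id in token_stream:
        postings = block[term]
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)
            count += 1
        if count >= max_postings:
            yield sorted(block.items())  # sort terms only when writing a block
            block, count = defaultdict(list), 0
    if block:
        yield sorted(block.items())

def merge_blocks(blocks):
    """k-way merge of the sorted blocks into one term -> postings index."""
    index = {}
    # key= compares terms only; ties keep block order, so docIDs stay sorted.
    for term, postings in heapq.merge(*blocks, key=lambda pair: pair[0]):
        merged = index.setdefault(term, [])
        for doc_id in postings:
            if not merged or merged[-1] != doc_id:  # a doc may straddle blocks
                merged.append(doc_id)
    return index
```

The point of the exercise: each block fits in memory and is written sequentially, and the final merge reads every block once, which is why a dedicated algorithm beats naive in-memory inversion for large collections.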
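To make the lnc.ltc variant from the Vector Space Model topic concrete, here is a small sketch. Documents get logarithmic tf, no idf, and cosine normalization (lnc); queries get logarithmic tf, idf = log10(N/df), and cosine normalization (ltc); the score is the dot product of the two vectors. The function names and the toy documents are hypothetical, and the top-k step simply uses `heapq.nlargest`.

```python
import heapq
import math
from collections import Counter

def log_tf(tf):
    """SMART 'l': 1 + log10(tf) for tf > 0, else 0."""
    return 1 + math.log10(tf) if tf > 0 else 0.0

def cosine_normalize(weights):
    """SMART 'c': divide every weight by the vector's Euclidean length."""
    length = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / length for t, w in weights.items()} if length else weights

def lnc(doc_tokens):
    counts = Counter(doc_tokens)
    return cosine_normalize({t: log_tf(tf) for t, tf in counts.items()})

def ltc(query_tokens, df, n_docs):
    counts = Counter(query_tokens)
    # Query terms that never occur in the collection are dropped.
    weights = {t: log_tf(tf) * math.log10(n_docs / df[t])
               for t, tf in counts.items() if df.get(t)}
    return cosine_normalize(weights)

def top_k(query_tokens, docs, k):
    """Return the k (doc_id, score) pairs with the highest lnc.ltc score."""
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    q = ltc(query_tokens, df, len(docs))
    scores = {}
    for doc_id, tokens in docs.items():
        d = lnc(tokens)
        scores[doc_id] = sum(w * d.get(t, 0.0) for t, w in q.items())
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

With `docs = {"d1": "car insurance auto insurance".split(), "d2": "best car".split(), "d3": "auto repair".split()}`, the query `["car", "insurance"]` ranks d1 first, then d2, then d3 (score 0). Note that only the query side carries idf in lnc.ltc, so idf is computed once per query term.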
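For the Evaluation topic, the following sketch computes the unranked measures (precision, recall, F-measure) and the ranked ones (average precision, MAP, 11-point interpolated precision). It assumes binary relevance, a non-empty relevant set, and rankings given as lists of docIDs; the function names are hypothetical.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Unranked evaluation of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    if p + r == 0:
        return p, r, 0.0
    f = (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
    return p, r, f

def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k holding a relevant doc;
    relevant documents that are never retrieved contribute 0."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: average of AP over (ranking, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def eleven_point_interpolated(ranking, relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0, where
    p_interp(r) = max precision at any rank whose recall is >= r.
    Assumes `relevant` is non-empty."""
    relevant = set(relevant)
    points, hits = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))  # (recall, precision)
    return [max((p for r, p in points if r >= level / 10), default=0.0)
            for level in range(11)]
```

For example, with ranking `["d1", "d2", "d3", "d4", "d5"]` and relevant set `{"d1", "d3", "d6"}`, AP is (1/1 + 2/3) / 3 ≈ 0.556: d6 is never retrieved, so it pulls the average down.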
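For the BM25 method, a minimal scoring sketch follows. It assumes the simple form in which idf = log(N/df) (base 10 here; the base only rescales scores), ignores query-term frequency, and uses the commonly quoted defaults k1 = 1.2 and b = 0.75; `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Okapi BM25 with idf = log10(N / df); query-term frequency is ignored.

    docs maps doc_id -> list of tokens.
    """
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        # k1 * ((1 - b) + b * L_d / L_ave): the document-length normalizer
        length_norm = k1 * ((1 - b) + b * len(tokens) / avg_len)
        score = 0.0
        for t in set(query_tokens):
            if tf[t] == 0:
                continue
            idf = math.log10(n_docs / df[t])
            score += idf * (k1 + 1) * tf[t] / (length_norm + tf[t])
        scores[doc_id] = score
    return scores
```

The tf component saturates: as tf grows, `(k1 + 1) * tf / (length_norm + tf)` approaches k1 + 1, so k1 controls how quickly extra occurrences stop helping, while b controls how strongly long documents are penalized (b = 0 disables length normalization).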
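For the query-likelihood unigram language model with Jelinek-Mercer smoothing, the sketch below mixes the document model and the collection model with weight λ on the document model, P(t | d) = λ · tf(t, d) / |d| + (1 − λ) · cf(t) / |C|, and sums log-probabilities to avoid underflow. The default λ = 0.5 and the function name are assumptions.

```python
import math
from collections import Counter

def jm_query_log_likelihood(query_tokens, doc_tokens, collection_tokens, lam=0.5):
    """log P(q | d) under a unigram LM with Jelinek-Mercer smoothing."""
    doc_tf = Counter(doc_tokens)
    coll_tf = Counter(collection_tokens)
    log_p = 0.0
    for t in query_tokens:
        p = (lam * doc_tf[t] / len(doc_tokens)
             + (1 - lam) * coll_tf[t] / len(collection_tokens))
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        log_p += math.log(p)
    return log_p
```

With two 8-word documents d1 = "xyzzy reports a profit but revenue is down" and d2 = "quorus narrows quarter loss but revenue decreases further", the query "revenue down" gives P(q | d1) = 1/8 × 3/32 = 3/256 and P(q | d2) = 1/8 × 1/32 = 1/256. Smoothing is what keeps d2's score non-zero even though it never mentions "down".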
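For near-duplicate detection with shingling, here is a sketch that builds word k-shingles and compares documents by the Jaccard coefficient of their shingle sets. It computes the Jaccard coefficient exactly; at Web scale it is estimated from min-hash sketches instead. The shingle size k = 4 and the 0.8 threshold are assumed values, and the function names are hypothetical.

```python
def shingles(text, k=4):
    """Set of word k-shingles (contiguous k-token sequences) of a document."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(doc1, doc2, k=4, threshold=0.8):
    """Flag two documents whose shingle sets overlap strongly.
    The threshold is an illustrative choice, not a standard value."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold
```

For instance, "a rose is a rose is a rose" has only 3 distinct 4-shingles, because repeated shingles collapse in the set.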
Link Analysis

- The PageRank algorithm: theory and practice
- Not required: topic-specific/personalized PageRank

Thanks and Good Luck!
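For the Link Analysis topic, here is a minimal power-iteration sketch of PageRank. It assumes the teleportation formulation: with probability alpha the surfer jumps to a uniformly random page, and a page with no out-links (a dead end) always teleports. The graph format (a dict of out-link lists), alpha = 0.15, and the stopping tolerance are assumptions.

```python
def pagerank(out_links, alpha=0.15, tol=1e-10, max_iter=1000):
    """Power iteration on the teleport-adjusted transition matrix.

    out_links maps each node to the list of nodes it links to.
    Returns a dict node -> PageRank score; the scores sum to 1.
    """
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Mass spread uniformly: the teleport share of every page with
        # out-links, plus all of the mass sitting on dead ends.
        uniform = sum(alpha * rank[u] if out_links.get(u) else rank[u]
                      for u in nodes)
        new = {v: uniform / n for v in nodes}
        # The remaining (1 - alpha) share follows out-links evenly.
        for u in nodes:
            targets = out_links.get(u)
            if targets:
                share = (1 - alpha) * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

On a directed 3-cycle a → b → c → a, every page ends up with score 1/3. Teleportation makes the Markov chain irreducible and aperiodic, which is what guarantees that the iteration converges to a unique steady state.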


Author admin
