
4/28/2016 Miami University ISA 496 Final Report

Understanding the LexisNexis Customer Experience
The Dream Team: MIAMI UNIVERSITY ISA 496 Client Project

Table of Contents
I. Introduction
II. Project Approach and Outcomes
III. Methods
   A. Data Retrieval
   B. Data Understanding
   C. Data Preparation
   D. Data Modeling and Analysis
   E. Data Visualization
IV. Insights and Client Delivery
V. Concluding Remarks
VI. Appendix

ISA 496: LexisNexis Final Report

I. INTRODUCTION

As students in the ISA 496 practicum course, we provide analytics consulting to business clients, working through and solving data-driven analytical problems. This semester, we were tasked with completing a client project for LexisNexis, a leading global provider of content-enabled workflow solutions designed specifically for professionals in the legal, risk management, corporate, government, law enforcement, accounting, and academic markets [1]. Our project centers on the customer feedback generated by LexisNexis products and how to understand it. The process of obtaining and storing this data is not currently standardized, leaving room for manual error. LexisNexis has asked us to help them better understand customer sentiment related to their products, specifically Lexis Advance. Our understanding of the project requirements and our client's expectations has shaped the main goal of our analysis: to develop an automated process methodology for LexisNexis that will help them better manage customer feedback.
This process will allow LexisNexis to identify and extract subjective information in order to derive valuable business insights. Our team is interested in how net promoter score (NPS) relates to the customer feedback data at hand. Through our deliverables, LexisNexis will be able to better understand their customers, enabling them to make informed decisions in the future.

II. PROJECT APPROACH & OUTCOMES

We understand that there is tremendous value to be gained from better utilizing customer data. In order to fully satisfy LexisNexis's needs, we have taken note of the specific questions they asked us to consider:

- Who uses the products? What do users like/dislike about the product? What is their overall sentiment regarding LexisNexis?
- What drives user evaluation through Net Promoter Score (NPS)?
- What aspects of the product should we improve? What aspects of the product should we test better?

Our goal is to answer these questions by completing four objectives:
1. A categorization or "binning" of the text data to better identify the top customer concerns.
2. A visualization or dashboard to gauge customer service and sentiment.
3. An analysis or model relating the customer feedback data to the NPS.
4. An implementable process that will help LexisNexis assess changes over time, show correlation between data streams, as well as correlation between usage and evaluation.

In order to complete these objectives, our team utilized the following process methodology to structure our project.

Figure 1. Dream Team Project Process Plan

Please refer to Appendix A for descriptions of all tools and software packages utilized in the development of our solution.

[1] "Solutions for Professionals Who Shape the World." Welcome to LexisNexis. RELX Group, 2016. Web. 03 Mar. 2016.
[Figure 1: Data Retrieval → Data Understanding → Data Preparation → Data Analysis & Modeling → Data Visualization → Insights & Client Delivery]

III. METHODS

1. Data Retrieval

Upon project kickoff, LexisNexis provided a series of datasets for project use, described in the following table:

Table 1. Description of Data Received from LexisNexis

Data                            | File Type | No. of Observations | Source
Call Topics                     | .xlsx     | 277,988             | Spectrum: The LexisNexis Customer Support Desktop
NL Feedback Emails              | .msg      | 3,607               | Customer emails to LexisNexis service representatives with comments through product webpages
Net Promoter Score Information  | .xlsx     | 15,262              | NPS questionnaire responses
VOC & Summary Information       | .xlsx     | 1,405               | Voice of Customer survey
LA (Lexis Advance) Feedback     | .xlsx     | 11,510              | Summarized call topics

After receiving this data, we set out to understand the methods currently used to collect it. Understanding how the data was created and collected lets us identify possible solutions and reduce bottlenecks in the processes; by identifying issues with current collection, we can suggest resolutions at the source.

We first looked at the Call Topics dataset. Calls are split into categories at their initiation, including account care, legal, news and financial, and technical support, among others. We were given a look at the Spectrum tool currently used to collect information from these calls, which we were informed is a constantly evolving system. Example screenshots of this tool are provided in Appendix B, Figure B-2. Because of the proactive move toward a new tool, our proposed changes to the collection process focus on the retrieval of email data and NPS data rather than call topics.

The NPS questionnaire is currently sent out weekly with a response rate of 3.5% [2].
For the most recent week of data provided, approximately 70 responses were received, suggesting that about 2,000 NPS questionnaires are sent weekly to select customers. A screenshot of the NPS questionnaire is provided in Appendix B, Figure B-3.

[2] "The return rate to NPS emails varies by customer segments, but on average it is approximately 3.5%." Jim Robinson, Google Groups discussion: NPS Customer Survey Questionnaire Desired. Feb. 11, 2016.

We saw the greatest opportunity for improvement in the collection of the NL Feedback emails. These emails are generated through a drop-down on each product's homepage, which offers a section for comments alongside an optional name and email address. A screenshot of a feedback email section is provided in Appendix B, Figure B-4. Because this section consists entirely of manual input, each response was difficult to get into a usable format for analysis. The product is identified by the product page from which the email is prompted, but we believe that offering customers standardized drop-downs for the issues they are experiencing could be very beneficial. We derived candidate categories from the LDA topics found through our analysis and discuss these solutions further in the Insights & Client Delivery section.

2. Data Understanding

To understand the datasets, we explored them with different software packages. For the Call Topics and NPS Information datasets, we used JMP to analyze the distributions of key variables (included in Appendix C, Figures C-1 and C-2), generate usable information from the data, and identify areas requiring further preparation. These distributions reveal that the majority of customer call topics come from the Call Type "Access Product/Service," the CSR Splits "Legal Research Tasks" and "Technical Support," and the market "Law Firm associated fields." Most responses to the NPS questionnaire come from Small Law customers. While most responders are considered detractors, the overwhelming majority say they definitely will continue using Lexis Advance.

When exploring the NL Feedback Emails dataset, we encountered difficulties due to the file format. In addition, a portion of the emails were automatic replies and email chains with customer service representatives; removing these was important for improving the quality of the dataset.

We determined that the VOC & Summary Information and LA Feedback datasets were too summarized to be useful in accomplishing our objectives, so we did not use them in the development of any models or processes.

3. Data Preparation

Each of the datasets provided required different tools to cleanse, transform, and manipulate it into useful information. These tools are listed in Appendix A, Table A-1. The goal of this cleansing and transformation was to get the data into a format compatible with R in order to complete analysis, create visualizations, and generate findings regarding the problem at hand. The NPS and Call Topics datasets were easy to clean in Excel, after which both were imported into R for analysis. Appendix D, Tables D-1 through D-3 provide lookup tables of recoded variables in the Call Topics data. The feedback emails were converted from Microsoft Outlook .msg files to raw text with Total Mail Converter, then run through a Python script to put the data into the proper format for R.
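A formatting script of this kind might look like the sketch below. This is an illustration only: the report does not include the team's actual script, and the assumed file layout (one converted email per .txt file, header lines before a blank line, then the body) and the `emails_to_csv` helper name are hypothetical.

```python
import csv
import pathlib

def emails_to_csv(txt_dir, out_csv):
    """Collect converted .txt email bodies into one CSV that R can read.
    Assumes (hypothetically) one converted email per .txt file, with
    header lines such as 'Subject: ...' before a blank line, then the body."""
    rows = []
    for path in sorted(pathlib.Path(txt_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        header, _, body = text.partition("\n\n")
        subject = ""
        for line in header.splitlines():
            if line.lower().startswith("subject:"):
                subject = line.split(":", 1)[1].strip()
        body = " ".join(body.split())  # flatten newlines: one row per email
        if body.lower().startswith("automatic reply"):
            continue  # drop auto-replies, as the report describes
        rows.append({"file": path.name, "subject": subject, "comment": body})
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "subject", "comment"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The resulting one-comment-per-row CSV is the kind of flat structure R's text-mining packages expect.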
Further preparation and processing in R is included as part of our analysis section.

4. Data Analysis & Modeling

Once the data were prepared and compatible with R, we completed all analysis using R and its packages. To accomplish our project objectives, we used a combination of text mining, sentiment analysis, and topic modeling methods.

For text mining, we used R's "tm" package on all three datasets, placing all of the open-ended text into a document term matrix. A document term matrix is a mathematical matrix that describes the frequency of terms occurring in a collection of documents: the rows correspond to documents in the collection and the columns correspond to terms. A more detailed explanation of how we constructed the document term matrix is located in Appendix E, List E-1.

As part of objective two, we wanted to identify the most prevalent emotions customers feel toward LexisNexis's products [3]. We conducted this analysis within RStudio using the "syuzhet" package to calculate the percentage of each emotion in the text. The NRC sentiment function within "syuzhet" implements Saif Mohammad's emotion lexicon. According to Mohammad, "the NRC emotion lexicon is a list of words and their associations with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust)" [4].

For further analysis of the NPS data and customer sentiment, we conducted sentiment analysis using the "stringr" package in R to apply a sentiment function frequently used and studied by Richard T. Watson, a professor at the University of Georgia [5]. Sentiment analysis is a popular and simple text mining method of measuring aggregate feeling. The function matched the words within the document term matrix against a dictionary of positive words (score = +1) and negative words (score = -1).
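These two steps can be sketched in miniature. The team's implementation used R's tm and stringr packages; the Python sketch below is an illustration only, and the tiny word lists are placeholders, not the published lexicon the report cites.

```python
import re
from collections import Counter

def doc_term_matrix(docs):
    """Rows = documents, columns = vocabulary terms, cells = term counts."""
    tokenized = [re.findall(r"[a-z']+", d.lower()) for d in docs]
    vocab = sorted(set(t for doc in tokenized for t in doc))
    counts = [Counter(doc) for doc in tokenized]
    return vocab, [[c[term] for term in vocab] for c in counts]

# Placeholder dictionaries; the real analysis used a published opinion lexicon.
POSITIVE = {"easy", "useful", "powerful", "quick", "quality"}
NEGATIVE = {"difficult", "complicated", "frustrating", "error"}

def sentiment_score(vocab, row):
    """Sum of positive-word counts minus sum of negative-word counts."""
    pos = sum(n for term, n in zip(vocab, row) if term in POSITIVE)
    neg = sum(n for term, n in zip(vocab, row) if term in NEGATIVE)
    return pos - neg
```

For example, "Lexis Advance is easy and useful" scores +2 and "Search is difficult and frustrating" scores -2 under these placeholder lists.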
The goal of this method is to calculate a sentiment score (the sum of positive words minus the sum of negative words), which can be aggregated at multiple levels of the NPS data.

Although one goal of this project was to understand the sentiment of all customers, our team refrained from conducting sentiment analysis on the NL Feedback Emails dataset. The reasoning was that this dataset is likely to skew neutral or negative: the email section on LexisNexis's product pages is presented as an outlet for customer service issues, so positive feedback is unlikely.

To provide valuable analysis for LexisNexis, we instead analyzed the NL Feedback Emails dataset using Latent Dirichlet Allocation (LDA), a form of topic modeling. Diane J. Hu of the University of California, San Diego describes "Latent Dirichlet Allocation (LDA) [as] an unsupervised, statistical approach to document modeling that discovers latent semantic topics in large collections of text" [6]. Topic modeling allows LexisNexis to categorize and bin (in accordance with objective 1) their product-related text issues in an interactive and adaptable way. We completed this analysis using R's "lda" package.

[3] Saif, M. (n.d.). NRC Word-Emotion Association Lexicon. Retrieved April 21, 2016, from http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
[4] Saif, M. (n.d.). NRC Word-Emotion Association Lexicon. Retrieved April 21, 2016, from http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
[5] Watson, R. T. (1995). Data management: an organizational perspective. John Wiley & Sons, Inc.
[6] Hu, D. J. (2009). Latent Dirichlet allocation for text, images, and music. University of California, San Diego. Retrieved April 26, 2013.

Topic modeling, specifically LDA, is typically evaluated by measuring the performance of a training model on secondary, validation data.
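As a toy illustration of how such a model is fitted (the team's analysis used R's lda package; this standalone Python sketch is an assumption-laden miniature, with made-up documents and hyperparameters), a two-topic LDA can be fitted by collapsed Gibbs sampling:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=500, seed=7):
    """Collapsed Gibbs sampling for LDA: repeatedly resample each token's
    topic from P(topic | all other assignments), then read off counts."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]          # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                           # tokens per topic
    z = []                                        # topic assignment per token
    for d, doc in enumerate(docs):                # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k); ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                       # remove current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    top_words = [sorted(nkw[t], key=nkw[t].get, reverse=True)[:3]
                 for t in range(n_topics)]
    return top_words, ndk
```

On a handful of short comments, the sampler's topic-word counts tend to separate co-occurring vocabulary (for example, search-related vs. access-error words) into different topics.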
This process is iterative and can be implemented using a technique referred to as Gibbs sampling, which is effective with probabilistic models such as LDA. Gibbs sampling allows us to sample from a probability distribution without having to compute difficult integrals. Further information regarding the implementation of Gibbs sampling for LDA evaluation can be found in the accompanying citation [7]. The sentiment analysis and topic modeling methods we used are made even more impactful when visualized using techniques like word clouds and "LDAvis", which are discussed in greater depth in the next section.

[7] A gentle introduction to topic modeling using R [Web log post]. (2015, September 29). Retrieved March 3, 2016, from https://eight2late.wordpress.com/2015/09/29/a-gentle-introduction-to-topic-modeling-using-r/#comments

5. Data Visualization

Visualizations enable LexisNexis to grasp difficult analysis methodologies in order to improve decision-making. We created a multitude of visualizations to make sense of the large datasets provided. Accompanying the report, we have provided the R script necessary to produce an interactive dashboard. This combines all visualizations into one environment, allowing LexisNexis to easily and effectively connect and understand our visualizations and removing the silos between the separate datasets. Through data visualization, we set out to answer the following questions prompted by LexisNexis: Who uses the products? What do users like/dislike about the product? What is their overall sentiment regarding LexisNexis?

NPS Sentiment Analysis

We used the NPS dataset to begin answering these questions. As previously mentioned, NPS ranks users into three groups: Detractors, Passives, or Promoters. It is a way of measuring the willingness of customers to recommend a product or service to others and a general way to gauge overall customer satisfaction with and loyalty toward a company. By manipulating
the NPS data from the LexisNexis survey in R, we were able to create word clouds as a measure of sentiment within groups. The detractor word cloud shows that those responders describe Lexis Advance as difficult to search within, dislike the results, and often make comparisons to Westlaw, a product from Thomson Reuters. Responders categorized as passive, on the other hand, describe Lexis Advance as complicated and are frustrated with the results of their searches. Responders who fell into the promoter category described Lexis Advance as easy to use, useful, powerful, quick, up-to-date, and a quality tool.

Figures 2-4. Word Clouds for NPS Categories (Detractor, Passive, Promoter)

NPS Emotions Analysis

We were also interested in visually gauging the emotional sentiment within each group. To reiterate, this analysis is based on Saif Mohammad's emotion lexicon and implemented using R's "syuzhet" package. The results show that the most prevalent emotion among detractors and passives is fear, while the overwhelming emotion for promoters is trust.

Figure 5. Emotional Sentiment for Detractors
Figure 6. Emotional Sentiment for Passives
Figure 7. Emotional Sentiment for Promoters

Mean Sentiment by NPS Category, Segment, and Firm Size

Figure 8 shows the mean sentiment score for each NPS category. As one would hypothesize, the mean sentiment score for detractors is the most negative, while the mean sentiment for promoters is the most positive, with a score of 5.17.

Figure 8. Mean Sentiment Score by NPS Category

In Figure 9, we measured the mean sentiment score by user segment.
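Aggregating sentiment at a grouping level such as NPS category or segment amounts to a group-wise mean. A minimal Python sketch (the rows and scores below are made up for illustration; the report's actual scores came from the dictionary-matching step in R):

```python
from collections import defaultdict

def mean_by_group(records, group_key, value_key):
    """Group-wise mean, e.g. mean sentiment score per NPS category."""
    totals = defaultdict(lambda: [0.0, 0])
    for rec in records:
        acc = totals[rec[group_key]]
        acc[0] += rec[value_key]  # running sum
        acc[1] += 1               # running count
    return {g: s / n for g, (s, n) in totals.items()}

# Hypothetical survey rows, for illustration only.
responses = [
    {"category": "Promoter", "sentiment": 6},
    {"category": "Promoter", "sentiment": 4},
    {"category": "Detractor", "sentiment": -3},
    {"category": "Passive", "sentiment": 1},
]
means = mean_by_group(responses, "category", "sentiment")
# e.g. means["Promoter"] is (6 + 4) / 2 = 5.0
```

The same helper applies unchanged when grouping by segment or firm size instead of NPS category.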
According to the NPS questionnaire responses, there are 8 segments: Unknown, BIS (Business Insight Solution), Corporate Large, Corporate Small, Large Law, Small Law, Federal Government, and State/Local Government. According to this visualization, Small Law firms have the lowest mean sentiment score. BIS (Business Insight Solution) users have the highest mean sentiment score and therefore have more positive feelings about Lexis Advance than the other segments. However, the highest average sentiment score is only about two, suggesting that no segment has strongly positive feelings toward Lexis Advance.

Figure 9. Mean Sentiment Score by Segment

In Figure 10, we evaluated mean sentiment score by firm size. We found that firms with 1,500-1,999 employees have the highest mean sentiment score, meaning that they have more positive feelings about Lexis Advance than larger or smaller firms.

Figure 10. Mean Sentiment Score by Firm Size

Another important question we sought to answer was: What drives user evaluation through Net Promoter Score (NPS)?

Predictive Modeling with NPS Data

Through data visualization and modeling, we found that Continue to Use (whether and to what degree a customer plans to continue using LexisNexis products) and NPS Rating are highly correlated; evidence can be seen in Appendix F, Table F-1. On average, users who say they definitely will continue to use Lexis Advance have an NPS rating of 9.83, while users who say they definitely will not have a rating of 1.53.

We developed a decision tree in SAS Enterprise Miner modeling the NPS user evaluation data, excluding the Continue to Use variable, in order to find other variables that might drive user evaluation.
A full picture of the model is included in Appendix F, Figure F-1. We found that when Continue to Use is excluded, the most significant variables explaining the categorization of responders as Promoter (+1), Passive (0), or Detractor (-1) are whether the responder has Decision Authority and the extent to which the responder uses Lexis Advance (the variable Advance Usage). On average, an individual who uses LexisNexis products fewer than three times in 30 days and has some decision authority over the products they use is about 79% likely to be categorized as a detractor or passive. Without those decision tree splits, there is a 70% likelihood of classification as a passive or detractor.

Additional NPS Data Visualizations

The relationship between the segment to which a customer belongs and the variable Continue to Use also suggests what drives user evaluation. This graph presents an interesting story: while the majority of all users plan to continue using Lexis Advance, customers from the Small Law segment are the most likely to discontinue use.

Figure 11. Relationship between Segment and Continued Use

The bar charts below share more information regarding NPS user evaluation. The first bar chart shows the relationship between NPS category and segment. To reiterate, the NPS questionnaire identifies 8 segments: Unknown, BIS (Business Insight Solution), Corporate Large, Corporate Small, Large Law, Small Law, Federal Government, and State/Local Government. We found that the segments that most often promoted Lexis Advance were Large Corporations, Large Law Firms, and Federal Government agencies. The segments containing the most detractors included Small Law firms and State/Local Government customers.
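Tabulating NPS-category shares within each segment, as in a chart of this kind, is a simple cross-tabulation. A Python sketch (segment labels and counts below are made up for illustration, not the survey's actual figures):

```python
from collections import Counter

def category_shares(records):
    """Percentage of each NPS category within each segment."""
    counts = Counter((r["segment"], r["category"]) for r in records)
    totals = Counter(r["segment"] for r in records)
    return {(seg, cat): round(100 * n / totals[seg], 1)
            for (seg, cat), n in counts.items()}

# Hypothetical survey rows, for illustration only.
rows = [
    {"segment": "Small Law", "category": "Detractor"},
    {"segment": "Small Law", "category": "Detractor"},
    {"segment": "Small Law", "category": "Promoter"},
    {"segment": "Large Law", "category": "Promoter"},
]
shares = category_shares(rows)
# e.g. shares[("Small Law", "Detractor")] is 2 of 3 responses, i.e. 66.7
```

Normalizing within each segment (rather than reporting raw counts) is what makes segments of very different sizes comparable in the bar chart.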
These findings reveal an association between smaller, more locally focused customers and dissatisfaction with Lexis Advance.

Figure 12. Relationship between NPS Category and Segment

The second bar chart shows the relationship between NPS category and firm size by percentage. According to this plot, NPS ranking is associated with the size of the firm: firms with fewer than fifty employees have more detractors than promoters, while firms with 1,000 or more employees have more promoters.

Figure 13. Relationship between NPS Category and Firm Size

The final question we worked to answer was: What aspects of the product should we improve and which aspects should we test better?

Call Topics Analysis

We created visualizations with the Call Topics dataset as well as the NL Feedback Emails dataset. Our first insight comes from Call Topics, which reveals that the top three most common call reasons for all three products are formulating searches, troubleshooting/access issues, and how to use a feature/function. It is clear that an area to address is the customer's ability to find the information they need; for a search engine, that functionality is paramount.

Table 2. Top Ten Call Reasons by Product

NL Feedback Email Visualization

Our text analysis of the NL Feedback Emails data was also significant in helping identify areas of improvement in LexisNexis's products. Figure 14 gives an initial look at the emails prior to more in-depth topic modeling analysis. The bar plot reveals that the most comments and complaints from these emails are associated with the FullDocView, Home, and Search page names. We therefore suggest that more attention be placed on improving the ease of navigation within LexisNexis's products.

Figure 14. Number of Comments & Complaints by Page Name

NL Feedback Emails Topic Modeling

To represent the results of our LDA topic modeling, we utilized "LDAvis", an R package that strives to make a fitted topic model interpretable and understandable. Since LDA finds "hidden topics", it is necessary to develop a visualization that discovers and articulates the found topics. Through "LDAvis", we can visualize questions about a topic model such as the meaning of a topic, the prevalence of a topic, and the relationships between the words contained within a topic.

Figure 15 shows a static rendering of the web-based LDA visualization for Lexis Advance using R's "LDAvis" functionality. This visualization gives LexisNexis the ability to categorize and bin customers' open-ended comments, revealing areas of top customer concern (see objective 1). Accompanying this report are the R scripts necessary to produce these interactive visualizations for all LexisNexis products. As the user navigates the interactive visualization, he or she can view the top 30 most relevant terms for each topic. From there, the user applies judgment to identify the hidden topic's meaning. For Lexis Advance, the topics found through LDA modeling are contained in Table 3.

Table 3. Description of Lexis Advance LDA Topic Categories

Topic | Frequent Terms               | Topic Description
1     | Time, Frustrating, Work      | User interaction with product at work
2     | Footnotes, Citation, Format  | Search content citations and references
3     | Error, Issues, Tried         | Troubleshooting and access issues
4     | Search, Filter, View         | Search-related functions
5     | Appellate, Court, Statute    | Legal matters and case law

An annotated version of Figure 15, containing a breakdown of the components of this visualization and a detailed description of each element, is located in Appendix F, Figure F-3. Our goal was to stop at a number of topics that were clearly differentiated (no overlap with one another) and reasonably interpretable. Examples of other topic-number outputs are included in Appendix F, Figure F-4.

Figure 15.
Lexis Advance LDA Visualization

IV. INSIGHTS & CLIENT DELIVERY

What does this all mean? After conducting extensive analysis and creating many visualizations, we set out to identify the most valuable ways for LexisNexis to use our findings. Below are actionable insights and recommendations addressing the areas where LexisNexis can most improve its products and customer relationships.

Table 4. Insights and Recommendations

Insight: Data collection is manual, open-ended, and has low response rates (especially NPS).
Recommendations:
- Add drop-down options for NL Feedback Email comments, with pre-filled choices that align with the 5 LDA topics and/or the most prominent call topic categories.
- Tailor each NPS survey question to all NPS categories. Instead of binary responses, use ordinal -1, 0, +1 responses (refer to Appendix F, Figure F-8 for an example).
- Incentivize users to take the survey to improve the current 3.5% response rate.
- Bolster web chat capabilities to better address immediate problems.
Why? Better collection of data = better analysis = more effective solutions.

Insight: Mean sentiment scores are higher among larger firms, and small firms have some of the highest rates of detractors. Current pricing favors large firms with economies of scale [8].
Recommendations:
- Continue to market heavily to larger firms with potentially larger contracts and approximately 41% operational income growth.
- Offer pricing options and discounts for small firms, which are more likely to detract.
- Prevent further detracting scores: companies whose NPS is below 60 see a greater decline in growth from detractors (-54%) than increase in growth from promoters [9].
Why? Customers who are detractors are 5x more likely to attrite.

Insight: Detractors in the NPS data are apt to compare LexisNexis to Westlaw.
Recommendations:
- Offer those detractors more easily accessible outlets to express concerns with LexisNexis products.
- Maintain communication with detractors and stress the strengths of LexisNexis over Westlaw.
Why? Customers who are detractors are 5x more likely to attrite.

Insight: Customers struggle to formulate searches and to access the documents they want, when they want them.
Recommendations:
- As with the product feedback comments, bolster web chat capabilities to better address immediate problems.
Why? For a technology platform, ease of use is a key determinant of actual use [10].

[8] http://www.lexisnexis.com/terms/21/pricing/
[9] Eastman, D. (n.d.). The ROI of NPS: How a Focus on Customer Loyalty Delivers Financial Gains. Retrieved April 21, 2016.

To provide the most value in LexisNexis's customer journey and build on the insights we have gathered, we have provided the following deliverables:

1. A categorization or "binning" of the text data to better identify the top customer concerns. Through topic modeling and sentiment analysis in R, we identified the top 10 reasons for inbound customer service calls by product according to the Call Topics dataset. The top three were consistent across all product lines: 1) formulating searches, 2) troubleshooting access, and 3) how to use features/functions. In the email dataset, LDA modeling identified five distinguishable topics of concern for Lexis Advance, described in Table 3.
The five topics cover customers' interaction with the product at work, requests for assistance when searching content citations and references, troubleshooting and access issues, comments on search-related functions, and comments regarding legal matters and case law. All supporting R scripts and LDA visualizations give LexisNexis the ability to reproduce this analysis and keep these categories in mind when conducting future analyses.

2. A visualization or dashboard to gauge customer service and sentiment. Using Shiny Dashboard, R's interactive web application framework, we merged the visualizations contained within this report into one interactive dashboard. We have provided all supporting R script as an attachment to a web copy of the report, or by request.

3. An analysis or model relating the customer feedback data to the NPS. Through the development of a decision tree model predicting NPS categorization, our team identified the main drivers of user evaluation: Decision Authority (involvement in which product the customer uses), Usage in the Last 30 Days, and Browser Type. The extremely high correlation between the variable Continue to Use and the composition of questions within the NPS survey made it difficult to relate other customer datasets to NPS. In Table 4 and Figure 16, we provide a potential solution to improve the collection of NPS data for ease of analysis in the future.

4. An implementable process that will help LexisNexis assess changes over time, show correlation between data streams, as well as correlation between usage and evaluation.

[10] Technology Acceptance Model. https://en.wikipedia.org/wiki/Technology_acceptance_model. Retrieved April 27, 2016.
Finally, Figure 16 contains a process we believe can enable LexisNexis to prevent customer attrition, improve product development, and increase returns from a growing number of promoters.

Figure 16. Flow Map of Process to Manage Customer Feedback

V. CONCLUDING REMARKS

We hope that our analysis of the three primary customer data streams has provided LexisNexis with a better understanding of the customer sentiment related to their products. Utilizing this data in a more automated and standardized way will allow LexisNexis to continue to be a market leader and further distance themselves from competitors. LexisNexis believes that, in the right hands, the information and technology they provide can enable people to change the world. Giving LexisNexis the tools, insights, and recommendations to utilize the sentiment of those people will enable them to deliver on that mission.

We thank the LexisNexis team for their time and effort in working with Miami University. Please refer to the accompanying materials, including our appendix, presentation, and all code necessary to conduct the analysis and visualizations behind our solutions. If questions regarding our solution and its deployment arise, please do not hesitate to contact team representative Lauren Curtis at [email protected].

APPENDIX

The following appendix contains supporting documentation, examples, and/or additional visualizations not included in the body of the report. Figure and table numbering aligns with the section each appendix supports.
Appendix A: Project Approach and Outcomes

TABLE A-1. Tools Utilized in Project

Tool                          Description
JMP Pro                       Interactive software for desktop statistical discovery11
R                             Free software environment for statistical computing and graphics12
Total Mail Converter          Converts emails (MSG, EML) to PDF, TXT, DOC, PST in batch via user interface or command line; ~$60 for a license13
Microsoft Excel and Power BI  Excel 2013 add-in for data preparation and visual analysis14
Python                        A programming language that lets you work quickly and integrate systems effectively15
Microsoft Visio               Tool for process mapping and workflow visualization16
SAS Enterprise Miner          Solution to create accurate predictive and descriptive models on large volumes of data from different sources across the organization17

11 http://www.jmp.com/en_us/software/jmp-pro.html
12 R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
13 Total Mail Converter (2016). Computer software. Coolutils. URL https://www.coolutils.com/TotalMailConverter.
14 https://powerbi.microsoft.com/en-us/
15 Python (2016). Computer software. Vers. 3.5.1. Python Software Foundation. URL https://www.python.org/.
16 https://products.office.com/en-us/Visio/flowchart-software
17 http://www.sas.com/en_ph/software/analytics/enterprise-miner.html

Appendix B: Data Retrieval

FIGURE B-2. Example Spectrum Tool
FIGURE B-3. Example NPS Questionnaire Questions
FIGURE B-4. Feedback Email Data Collection

Appendix C: Data Understanding

FIGURE C-1. Distribution of Key Variables in the Call Topics Dataset
FIGURE C-2.
Distribution of Key Variables in the NPS Dataset

Appendix D: Data Preparation

TABLES D-1: D-3. Index of Recoding for Call Types, CSR Splits, and Market in Data Preparation

Table D-1. Call Types
Code  Call Type
1     Access Product/Service
2     Account Mgt
3     Consultant
4     Doc Delivery
5     Search
6     Usability
7     Unknown

Table D-2. CSR Splits
Code  CSR Split  Description
1     LEGL       Legal Research
2     LNC        Old System Category
3     NF         News & Financial
4     NONC       NA
5     OPER       Access
6     TECH       Technical Support
7     Unknown    NA

Table D-3. Market
Code  Market
1     Academic associated fields
2     Bar
3     Consumers
4     Corporate associated fields
5     External
6     Government
7     Internal
8     Law firm associated fields
9     N/A
10    Unknown

Appendix E: Data Analysis & Modeling

List E-1. Steps to Creating a Document Term Matrix
 Read in CSVs to create a data frame (essentially a data table, similar to an Excel sheet)
 Created a corpus from the Comments column of the data frame
 Pre-processed the corpus
o removed punctuation and numbers
o removed stop words
o converted text to lowercase
o stemmed words to their root word
 Created a document term matrix (DTM) of the remaining words
o removed any documents left with no words as a result of pre-processing

Appendix F: Data Visualization

FIGURE F-1. NPS Decision Tree Model

FIGURE F-2. Decision Tree Cumulative Lift
The above cumulative lift chart shows that the decision tree model created to predict NPS is only roughly 16% more accurate than random guessing. This supports the recommendation that the available inputs and the structure of the questionnaire do not support model building as well as they could.

TABLE F-1. Average NPS Rating by Whether Customers Will Continue Use of LexisNexis

Figure F-3. Lexis Advance LDA Visualization
1. Left Panel: Inter-Topic Distance Map ("Topic Landscape")
 Provides a sense of topic similarity by approximating the distances between topics
 Inter-topic distances are calculated using the Jensen-Shannon divergence
 Scaling of the set of inter-topic distances defaults to Principal Components
 Inter-topic distances mapped to principal components allow the user to analyze correlations between topics
 The size of each circle represents term frequency (token frequency)

2. Right Panel: Top 30 Most Salient Terms
 Chuang's key terms
i. These quantities measure how much information a term conveys about topics by computing the Kullback-Leibler divergence between the term and the marginal distribution of topics (distinctiveness), optionally weighted by the term's overall frequency (saliency)
 Relevance: a method and measure to rank terms within topics and interpret topics
i. A compromise between the probability of the word given the topic and that same probability divided by the overall frequency of the word
ii. Captures how strongly a given term (w) applies to a topic (t) and how frequently it is found in t
iii. The most relevant terms are displayed in the bar chart on the right side of the visualization
 By comparing the widths of the red and gray bars for a given term, users can quickly understand whether a term is highly relevant to the selected topic because of its lift (a high ratio of red to gray) or its probability (absolute width of red)
i. Red = relevance to the selected topic
ii. Gray = overall term frequency in the text corpus

3. Top Panel Slider: Relevance Metric
 Adjusts lambda in the relevance metric calculation
 Lambda addresses the compromise presented in the relevance metric
i. Lambda = 1: rank words solely on the width of the red bar (favors common words)
ii.
Lambda = 0: rank words solely on the ratio of red to gray (favors rare words)
 A recent study suggests an ideal lambda value of approximately 0.6

Figure F-4. Lexis Advance LDA Visualization: Four Topics (Top) and Six Topics (Bottom)

Figures F-5: F-7. Classification Count by Topic for Lexis Advance (F-5), NewLexis (F-6), and Research (F-7)

Figure F-8. Example Reformatted NPS Questionnaire Answers
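The lambda trade-off described for the top panel slider is the LDAvis relevance metric of Sievert and Shirley: relevance(w, t | lambda) = lambda * log p(w|t) + (1 - lambda) * log(p(w|t) / p(w)), where the second term is the log lift. A minimal sketch (the toy probabilities below are invented purely for illustration):

```python
import math

def relevance(p_w_given_t: float, p_w: float, lam: float) -> float:
    """LDAvis relevance: lam * log p(w|t) + (1 - lam) * log lift,
    where lift = p(w|t) / p(w)."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)

# Toy example: a rare but topic-specific word vs. a common, unspecific word.
# At lam = 0 only lift matters, so the rare, topic-specific word ranks first:
assert relevance(0.01, 0.001, lam=0.0) > relevance(0.05, 0.04, lam=0.0)
# At lam = 1 only in-topic probability matters, so the common word ranks first:
assert relevance(0.05, 0.04, lam=1.0) > relevance(0.01, 0.001, lam=1.0)
```

The suggested lambda of 0.6 sits between these extremes, which is why it tends to surface terms that are both frequent within a topic and distinctive to it.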
