Data Mining CSC533 Study guides, Class notes & Summaries

Looking for the best study guides, study notes and summaries about Data Mining CSC533? On this page you'll find 8 study documents about Data Mining CSC533.

All 8 results

Hands-On Exercise 6-1: Outlier Detection with Titanic dataset
  • Exam (elaborations) • 7 pages • 2024
  • In this hands-on exercise, you will learn how to use quantiles to detect outliers in data (the Titanic training dataset). Related DM book chapters/sections: Section 2.2.2, Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range. Related hands-on exercises: Exercise 1-2, Apache Spark and Basic Statistics. Finish the assignments shown below. Submit a Word document (...
  • $10.49
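As a rough illustration of the quantile idea named in the listing above, the sketch below flags outliers with the interquartile range. It is plain Python rather than the Spark code the exercise presumably uses. The 1.5 × IQR fence is a common convention assumed here (the exercise may use a different cutoff), and the fare-like values are made up for illustration, not taken from the Titanic dataset.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical fare-like values (an assumption, not real Titanic data).
fares = [7.25, 7.9, 8.05, 13.0, 26.0, 30.0, 512.0]
print(iqr_outliers(fares))  # prints [512.0]
```

Raising `k` widens the fences and flags fewer points; lowering it flags more.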
Hands-On Experiment 5-2: Clustering with Spark - Part II
  • Exam (elaborations) • 5 pages • 2024
  • $10.49
Hands-On Experiment 5-1: Clustering with Spark
  • Exam (elaborations) • 4 pages • 2024
  • In this hands-on exercise, you will learn: how to use the k-means clustering algorithm in Apache Spark; how to handle data and features for clustering; training and prediction for clustering; and evaluation for clustering. Related DM book chapters/sections: Section 10.1, Cluster Analysis; Section 10.2, Partitioning Methods; Section 10.2.1, k-Means: A Centroid-Based Technique. Submit a Word document (or PDF) with answers/expl...
  • $10.49
Hands-On Experiment 4-2: Classification with Titanic dataset
  • Exam (elaborations) • 4 pages • 2024
  • 2.2.1 (20 pts) Assignment 1: Index the Gender values. We have learned how to index values using StringIndexer in previous hands-on exercises. Write code to index the gender values: 1. Import a class. 2. Define an indexer (input column: Gender; output column: IndexedGender). 3. Train and transform. Take a screenshot of your code running and its output, using the show(5) function. 3 Building a Model 3.1 Training and T...
  • $10.49
Hands-On Experiment 4-1: Classification with Spark
  • Exam (elaborations) • 7 pages • 2024
  • In this hands-on exercise, you will learn: the Decision Tree classifier in Apache Spark; how to handle data, features, and training and testing data; training and testing; and evaluation. Related DM book chapters/sections: Section 8.1, Basic Concepts; Section 8.2, Decision Tree. The DataFrame-based Spark ML API is newer and much easier to use, but some features are missing: the evaluator for DataFrames provides only limited metrics. Th...
  • $10.49
Hands-On Experiment 3-2: Frequent Pattern Mining with Spark - Part II
  • Exam (elaborations) • 4 pages • 2024
  • 1.3 Create DataFrames. You can create your DataFrames using Assignment 1 1. Write Spark code to read the following data. (a) Read only the following four tables, which will be used for this exercise: i. orders, ii. products, iii. departments, iv. order_products_train. (b) Make sure that you read the headers as well: i. Each CSV file of the dataset has a header line. ii. You can achieve this behavior by Assignment ...
  • $10.49
Hands-On Experiment 3-1: Frequent Pattern Mining with Spark
  • Exam (elaborations) • 6 pages • 2024
  • 2.4 Let’s practice answering some exercise questions. Q1: List the 3 most frequent itemsets of size 1. Q2: Given support >= 30%, show the itemsets and counts for the candidate itemsets of size 2. Q3: Colby is purchased most frequently with what other product? Q4: What is the confidence for the rule American → Cheddar? 3 Submission: Find frequent patterns using FPGrowth from a real-world grocery store dataset. Please read the related news article “Kroger Knows Your Shopping Patterns B...
  • $10.49
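The practice questions in the listing above turn on itemset counts and rule confidence, where confidence(A → B) = count(A and B together) / count(A). The sketch below shows those two quantities in plain Python. It uses brute-force counting rather than FPGrowth or Spark, and the baskets are hypothetical, so the numbers are not answers to the exercise.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (an assumption, not the exercise's dataset).
baskets = [
    {"American", "Cheddar", "Colby"},
    {"American", "Cheddar"},
    {"American", "Colby"},
    {"Cheddar", "Colby"},
    {"American"},
]


def itemset_counts(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for items in transactions:
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    return counts


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for items in transactions if a <= items)
    n_both = sum(1 for items in transactions if both <= items)
    return n_both / n_a if n_a else 0.0


print(itemset_counts(baskets, 1).most_common(3))
print(itemset_counts(baskets, 2))
print(confidence(baskets, ["American"], ["Cheddar"]))  # prints 0.5
```

Dividing a count by `len(baskets)` gives the support as a fraction, which is what a threshold such as "support >= 30%" is compared against.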
Hands-On Experiment 2-2: Data Warehousing with Hive
  • Exam (elaborations) • 78 pages • 2024
  • Objectives: In this hands-on exercise, you will 1. practice PySpark SQL for data analytics; 2. use enhanced aggregation to emulate SQL concepts like GROUPING SETS, ROLLUP, and CUBE in PySpark; 3. analyze the driver risk factor; 4. analyze data using data warehousing/OLAP functions in Hive. Q1. (35 pts) Modify/rewrite the grouping-set query in the example with ROLLUP (let’s call it the rollup-query). Run it, check the results, and explain the differences. – Replace the GROUPING SETS ...
  • $10.49