M5 Assign 2

Use a decision tree to find the relationship between the dependent variable Y and the independent variables X0 and X1 in the data file below. Upload a screenshot of the decision tree to the discussion.
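A minimal sketch of how such a tree might be fitted with scikit-learn. The data file is not reproduced here, so the frame below is synthetic stand-in data that only borrows the column names X0, X1, and Y; treating Y as a class label and capping the depth at 3 are assumptions as well.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the assignment's data file (assumption: Y is a class label).
rng = np.random.default_rng(0)
df = pd.DataFrame({"X0": rng.uniform(0, 10, 200), "X1": rng.uniform(0, 10, 200)})
df["Y"] = ((df["X0"] > 5) & (df["X1"] > 3)).astype(int)

# Fit a shallow tree so the learned splits stay readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(df[["X0", "X1"]], df["Y"])

# Text rendering of the split rules learned from X0 and X1.
rules = export_text(tree, feature_names=["X0", "X1"])
print(rules)
```

If Y turns out to be continuous in the real file, DecisionTreeRegressor would be the closer fit. sklearn.tree.plot_tree draws the same tree as a figure that can be captured for the screenshot.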

M6 Assign 1

(1) Select the columns Goal, students_reached, and funding_status and create a new data-frame from them. (1 point)

(2) Randomly split the data into train and test data-frames in a 75:25 ratio. (1 point)

(3) Using K-means, cluster the train data-frame into two clusters. Use only the Goal and students_reached columns (the independent variables) for clustering. (4 points)

(4) Plot the scatter plots before and after clustering. (2 points)

(5) Use the predict() function to predict cluster labels for the test data-frame. (2 points)
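Steps (1) to (5) can be sketched as below with pandas, scikit-learn, and matplotlib. The project data is not included here, so the raw frame is synthetic and its values are made up; the random seed, n_init value, and the extra other_column are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project data (assumption: both features are numeric).
rng = np.random.default_rng(42)
raw = pd.DataFrame({
    "Goal": rng.uniform(100, 5000, 200),
    "students_reached": rng.integers(10, 500, 200),
    "funding_status": rng.choice(["completed", "expired"], 200),
    "other_column": rng.normal(size=200),
})

# (1) Keep only the three requested columns.
df = raw[["Goal", "students_reached", "funding_status"]].copy()

# (2) Random 75:25 split.
train, test = train_test_split(df, test_size=0.25, random_state=42)

# (3) Two clusters, fitted on the independent variables only.
features = ["Goal", "students_reached"]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
train_labels = kmeans.fit_predict(train[features])

# (4) Scatter plots before and after clustering.
fig, (ax_before, ax_after) = plt.subplots(1, 2, figsize=(10, 4))
ax_before.scatter(train["Goal"], train["students_reached"])
ax_before.set_title("Before clustering")
ax_after.scatter(train["Goal"], train["students_reached"], c=train_labels)
ax_after.set_title("After clustering")
for ax in (ax_before, ax_after):
    ax.set_xlabel("Goal")
    ax.set_ylabel("students_reached")

# (5) Assign each test row to the nearest learned centroid.
test_labels = kmeans.predict(test[features])
```

Because Goal spans a much larger range than students_reached, it tends to dominate the distance calculation; standardizing the two columns first is a common option, though the assignment does not ask for it.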

Upload one Jupyter Notebook file for assignment submission.

M8 Assign 1

Q1. Collect tweets for the keyword ‘GoFundMe’. Store the following columns in a pandas data-frame (3 points).

(1) created_at

(2) id

(3) text

Q2. Store the pandas data-frame in an Excel file without the index (2 points).

Q3. Repeat Q1 and Q2 using ‘UMSL’ as the search keyword (5 points).

Note: (1) You should write separate code for Q3; (2) The number of tweets is whatever Twitter returns from a single call of the following code: tweets = api.search("keyword"). Generally, it varies between 15 and 25.

Submit one Jupyter notebook file and two Excel files.
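A hedged sketch of Q1 and Q2. Calling Twitter needs credentials, so the tweets list below holds made-up stand-in objects that only mimic the created_at, id, and text attributes of a search result; the helper names tweets_to_frame and save_without_index are hypothetical. In Tweepy 4 and later, the call named api.search above is exposed as api.search_tweets.

```python
from datetime import datetime
from types import SimpleNamespace

import pandas as pd

# Stand-ins for the objects returned by api.search("GoFundMe"); all values are made up.
tweets = [
    SimpleNamespace(created_at=datetime(2021, 11, 1, 12, 0), id=1001,
                    text="first example tweet"),
    SimpleNamespace(created_at=datetime(2021, 11, 1, 12, 5), id=1002,
                    text="second example tweet"),
]


def tweets_to_frame(tweets):
    """Collect created_at, id, and text from each tweet into a data-frame."""
    rows = [{"created_at": t.created_at, "id": t.id, "text": t.text} for t in tweets]
    return pd.DataFrame(rows, columns=["created_at", "id", "text"])


def save_without_index(df, path):
    """Write the frame to an Excel file, leaving out the index column."""
    out = df.copy()
    # Excel cannot store timezone-aware datetimes, so write the timestamps as text.
    out["created_at"] = out["created_at"].astype(str)
    out.to_excel(path, index=False)


df = tweets_to_frame(tweets)
print(df)
```

Calling save_without_index(df, "gofundme.xlsx") would then produce the file for Q2; pandas needs an Excel writer such as openpyxl installed for that step. The same two helpers can be reused with the ‘UMSL’ results for Q3.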

M8 Assign 2

Collect tweets for the keyword ‘Donorschoose’ and store them in a pandas data-frame with the following columns:

(1) tweet_text (2) tweet_id (3) retweet_count and (4) place.

Upload a screenshot of the data-frame. You can either upload the screenshot directly as an image or paste it into an MS Word file and upload that file.

(Note: The number of tweets is whatever Twitter returns from a single call of the following code: tweets = api.search("Donorschoose"). Generally, it varies between 15 and 25.)
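One detail worth illustrating here is that place is empty for tweets without a location tag. The sketch below uses made-up stand-in objects rather than live search results, and it assumes that a place, when present, exposes a full_name attribute.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-ins for api.search("Donorschoose") results; all values are made up.
tweets = [
    SimpleNamespace(text="example tweet with a location", id=101, retweet_count=3,
                    place=SimpleNamespace(full_name="St. Louis, MO")),
    SimpleNamespace(text="example tweet without a location", id=102, retweet_count=0,
                    place=None),
]

rows = [
    {
        "tweet_text": t.text,
        "tweet_id": t.id,
        "retweet_count": t.retweet_count,
        # place is None when the tweet carries no location tag
        "place": t.place.full_name if t.place is not None else None,
    }
    for t in tweets
]
df = pd.DataFrame(rows, columns=["tweet_text", "tweet_id", "retweet_count", "place"])
print(df)
```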

M10 Assign 1

Collect at least 100 tweets and perform the following steps:

Linear Model (4 points / 1 point each)

1 (a). Extract retweet_count and followers_count from tweets.

1 (b). Perform a train-test split in a 70:30 ratio, with 70% of the data stored in the train data-frame and the remaining 30% in the test data-frame.

1 (c). Build a linear model on the train data-frame using retweet_count as the dependent variable and followers_count as the independent variable.

1 (d). Predict retweet_count on the test data-frame.
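Steps 1 (a) to 1 (d) might look like the following with scikit-learn. The tweet data here is synthetic (120 made-up rows standing in for the collected tweets), and the random seed and the predicted_retweet_count column name are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# (a) Synthetic stand-in for the two fields extracted from the collected tweets.
rng = np.random.default_rng(1)
followers = rng.integers(0, 10_000, 120)
retweets = np.round(0.002 * followers + rng.uniform(0, 3, 120)).astype(int)
df = pd.DataFrame({"followers_count": followers, "retweet_count": retweets})

# (b) 70:30 split into train and test data-frames.
train, test = train_test_split(df, test_size=0.30, random_state=1)

# (c) Linear model: retweet_count explained by followers_count.
model = LinearRegression()
model.fit(train[["followers_count"]], train["retweet_count"])

# (d) Predict retweet_count for the held-out rows.
test = test.assign(predicted_retweet_count=model.predict(test[["followers_count"]]))
print(test.head())
```

The double brackets in train[["followers_count"]] keep the predictor two-dimensional, which is the shape scikit-learn estimators expect for their feature input.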

Decision Tree Model (6 points / 1 point each)

2 (a). Extract retweet_count and followers_count, then transform retweet_count to 1 if retweet_count > 0 and 0 otherwise.

2 (b). Perform a train-test split in a 70:30 ratio, with 70% of the data stored in the train data-frame and the remaining 30% in the test data-frame.

2 (c). Build a decision tree model on the train data-frame using retweet_count as the dependent variable and followers_count as the independent variable.

2 (d). Predict retweet_count on the test data-frame.

2 (e). Show the model accuracy.

2 (f). Show the confusion matrix.
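A sketch of steps 2 (a) to 2 (f), again on synthetic stand-in data rather than real tweets. The depth limit, the random seed, and the way the made-up retweet counts are generated are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the collected tweets (values are made up).
rng = np.random.default_rng(2)
followers = rng.integers(0, 10_000, 120)
retweets = rng.poisson(followers / 4000)
df = pd.DataFrame({"followers_count": followers, "retweet_count": retweets})

# (a) Binarize the target: 1 if the tweet was retweeted at least once, else 0.
df["retweet_count"] = (df["retweet_count"] > 0).astype(int)

# (b) 70:30 split into train and test data-frames.
train, test = train_test_split(df, test_size=0.30, random_state=2)

# (c) Decision tree on the train data-frame.
clf = DecisionTreeClassifier(max_depth=3, random_state=2)
clf.fit(train[["followers_count"]], train["retweet_count"])

# (d) Predict the binarized retweet_count on the test data-frame.
pred = clf.predict(test[["followers_count"]])

# (e) Share of test rows predicted correctly.
acc = accuracy_score(test["retweet_count"], pred)

# (f) Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(test["retweet_count"], pred, labels=[0, 1])

print(f"accuracy: {acc:.2f}")
print(cm)
```

Passing labels=[0, 1] keeps the matrix 2 x 2 even if the test split happens to contain only one class.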

M10 Assign 2

Collect at least 100 tweets and perform the following steps:

1. Transform retweet_count to 1 if retweet_count>0 and 0 otherwise.

2. Build a decision tree model with retweet_count as the dependent variable and followers_count as the independent variable.

3. Upload a decision tree plot using either matplotlib or Graphviz.
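For the matplotlib route, scikit-learn's plot_tree is one way to draw the fitted tree. The sketch below uses synthetic stand-in data instead of real tweets; the depth limit, figure size, and class labels are assumptions chosen to keep the plot legible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Synthetic stand-in for the collected tweets (values are made up).
rng = np.random.default_rng(3)
followers = rng.integers(0, 10_000, 120).reshape(-1, 1)

# Step 1: binarized target, 1 if retweeted at least once and 0 otherwise.
retweeted = (rng.poisson(followers.ravel() / 4000) > 0).astype(int)

# Step 2: decision tree with followers_count as the only predictor.
clf = DecisionTreeClassifier(max_depth=2, random_state=3)
clf.fit(followers, retweeted)

# Step 3: draw the tree; each box shows the split rule, impurity, and class counts.
fig, ax = plt.subplots(figsize=(8, 5))
nodes = plot_tree(
    clf,
    feature_names=["followers_count"],
    class_names=["not retweeted", "retweeted"],
    filled=True,
    ax=ax,
)
```

In a notebook the figure renders inline; fig.savefig("decision_tree.png") would write it to an image file for upload.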

M11 Assign 1

(1) Follow EC2 tutorial_v2_10_29_2021.pdf and create a VM using the AWS EC2 service. Document each step with screenshots and a description in a Microsoft (MS) Word file (5 points).

(2) Download and install the Anaconda software on the VM and run a Jupyter notebook to print “Hello World.” Document each step with screenshots and a description in an MS Word file (5 points).

Submit one MS Word file for the assignment.

After completing the assignment, terminate the VM as follows: (1) select the VM by ticking its checkbox; (2) from the “Instance state” dropdown list, select “Terminate Instance”.