Artificial Intelligence for Business

Machine Learning

Welcome to exploring Machine Learning (ML), a key driver in the modern business and technology innovation era. This course is situated at the intersection of computer science, statistics, mathematics, and optimization, forming the foundation of modern Artificial Intelligence. ML has emerged as a critical tool, revolutionizing various industries. The emergence of powerful algorithms, combined with recent growth in computational power and availability of massive amounts of data, has enabled vast and varied ML applications, from predictive analytics in finance to advanced customer insights in retail and from optimizing complex supply chains to developing cutting-edge models like Large Language Models (LLMs), e.g., ChatGPT, and self-driving cars.

Throughout this course, we will delve into both the theoretical aspects of ML and its practical applications in real-world business scenarios. You will learn how ML models process and analyze large datasets to extract meaningful insights, inform strategic decisions, and generate business value across different sectors. Our curriculum covers various topics, including traditional ML techniques and the nuances of neural networks. By the end of this course, you will be equipped with the knowledge and skills to leverage ML in solving complex business challenges, ready to thrive in a data-driven business environment.

Student Feedback: [2025]

Announcements

Course starts on March 25, 2025. Look forward to seeing you in class!

A. Course logistics

We designed this course with specific policies to foster a productive and engaging educational atmosphere. We encourage you to carefully review these guidelines, which are fundamental to our collective learning experience.  

A.1. Teaching modality: This course is conducted exclusively in person. To enhance the quality of our interactive discussions, there will be no live streaming or recording of the lectures. Moreover, to foster a dynamic and participatory classroom, we are committed to providing a safe and inclusive space that encourages every student to engage in discussions confidently (within our classroom environment). In light of these goals, we consciously opt against recording our sessions.

A.1.2. Recording policy: To uphold the privacy and intellectual property rights of everyone in our course, we strictly forbid any form of digital recording during our sessions, including video, audio, screenshots, and photographs. Recording is prohibited under any circumstances, and storing recorded material from our lectures can present significant liability.

A.1.3 Attendance Policy: Regular attendance is essential for success, as sessions build on prior material and require active participation. Students must attend all classes and arrive on time.

  • Absences: Absences should be rare and limited to legitimate reasons (e.g., illness, emergencies, religious observances). Notify the instructor promptly, ideally before class, and submit a plan to complete missed work within one week, subject to approval. More than one unexcused absence requires a meeting with the instructor to continue attending. Excessive absences (over 10% of sessions) will automatically result in a failing grade (F).
  • Punctuality: Arriving over 10 minutes late counts as a partial absence; three late arrivals equal one absence.
  • Accommodations: Students with ongoing challenges (e.g., disabilities, family obligations) should contact the instructor early to discuss accommodations, per NYU Academic Affairs policies.

A.2. Communication with the Course Staff: We highly encourage you to visit us during our open-door office hours for any queries or discussions, as this is the most effective way to communicate with the course staff.

A.2.1. Email communication: Timely and detailed email responses may not always be possible. To ensure your questions are addressed efficiently and thoroughly, we strongly encourage you to bring them to office hours or ask during class when appropriate. While email can be used for brief matters, it is not the preferred channel for discussing course content.

A.2.2. Office hours: To view our available time slots, please refer to the right column of our course website.

A.2.3. Confidential matters: If you need to discuss confidential matters related to course content or logistics, please schedule an appointment with one of the course staff members. Please also refer to the 'Exceptional circumstances' section below, which offers guidance on how to approach such conversations. Our discussions will be strictly limited to course-related topics and logistical concerns; to maintain the professional and focused nature of our interactions, we will respectfully decline to engage in conversations extending beyond these subjects.

A.2.4. Required notice: Please plan and provide us with sufficient notice for your requests. It is essential to understand that we may be unable to address immediate needs arising from last-minute planning. Your proactive planning dramatically enhances our ability to manage your queries effectively.

A.2.5. Exceptional circumstances policy: If you have a confidential matter to discuss with the course staff, please be aware that the course staff cannot assess personal issues or special requests due to exceptional circumstances, such as illness, family issues, etc. When pursuing such matters or seeking an exemption, your initial point of contact should be the Academic Affairs Office. It is necessary to provide them with the relevant documentation about your situation. The office will thoroughly review your case, considering the specifics of your circumstances. Following their evaluation, the Academic Affairs Office will communicate directly with the course staff, informing us whether an exception is warranted and offering guidance on the appropriate actions. This process ensures that all requests are considered fairly and consistently, per the university's policies and procedures. For detailed information and guidance, students are encouraged to refer to the university's policy for the undergraduate program here. This policy provides comprehensive guidelines for managing exceptional circumstances within the university framework, ensuring transparency and fairness in all decisions.

A.2.6. Instruction language: English is the mandatory language of instruction at NYU campuses, except for specific language courses. This policy ensures consistency and accessibility for all students across lectures, office hours, and all written communication. If somebody poses a question in a language other than English, the instructor or teaching assistant must ask them to rephrase it in English, with responses given exclusively in English. This approach fosters a more inclusive learning environment, ensuring all students can understand both the question and the answer regardless of their linguistic background. Even in cases where all class or meeting participants share the same non-English language, the policy still applies. While it may seem convenient to use a local language in monocultural interactions, it is essential to consider the goal of thriving in a broader global academic and professional environment.

B. Course contents

This course offers an introductory exploration of Machine Learning (ML), a pivotal technology in Artificial Intelligence that's reshaping how businesses operate and innovate. Focused on both the theoretical foundations and practical applications of ML, we'll examine how these powerful tools can analyze vast datasets to drive decision-making and solve real-world business problems. Covering a range of topics from traditional ML techniques to the latest in neural networks, the course prepares students to apply these concepts across various business domains, equipping them with the skills needed for a data-driven professional landscape.

B.1. Learning outcomes

At the end of this course, we expect that you will be able to:

  • Distinguish the foundational principles and the purpose of the different categories of ML models.
  • Identify opportunities to use ML models in practical applications and select the appropriate methods to model data, extract insights, illuminate structure, and make predictions.
  • Implement models and algorithms at all levels of the ML pipeline, i.e., cleaning, sampling, and preprocessing data, and running models in Python.
  • Evaluate the performance of ML models using statistical techniques.
  • Develop a mature view of the impact of ML in society and reason about its ethical implications.

The techniques you learn in this course apply to numerous business problems and serve as the foundation for further study in any application area you pursue.

B.1.1 Topics

This course will address selected ML techniques and actively engage students with the following tentative list of topics:

  • Supervised Learning: regression and classification
  • Model selection and assessment
  • Tree-based methods
  • Maximum Likelihood Estimation
  • Deep Neural Networks
  • Unsupervised Learning: PCA and dimensionality reduction, Clustering and Expectation Maximization
  • ML and society (Ethics, Fairness, etc.)

B.2. Prerequisites

This course assumes no previous knowledge of ML. If you have significant ML experience, there is no need to take this class.

Formal: (i) Introduction to Computer Programming and (ii) Calculus

We will also draw on basic concepts from the following courses, which we will quickly review in class: (i) Linear Algebra, (ii) Probability, (iii) Multivariate Calculus, and (iv) Algorithms.

B.2.1 Standing requirement: Sophomores and up

B.3. Textbook

We will assign readings and exercises from the following book:

  • An Introduction to Statistical Learning, Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, 2nd Edition, Springer, 2021 (e-book available for free)

Given the novelty of many topics, current textbooks may not comprehensively address them. Instead, we will curate our materials from various online sources, including notes from other courses, publicly accessible assignments, books, and lecture notes. We will ensure you have access to these resources either through direct copies or links.

Additionally, we will assign readings from selected textbooks, some of which are freely available for download online. These texts have been chosen for their relevance and depth of information, aligning closely with our course content.

  • Deep Learning, Ian Goodfellow and Yoshua Bengio and Aaron Courville, MIT Press, 2016 (link)
  • Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, and Alex Smola (e-book available for free)
  • Artificial Intelligence: A Modern Approach, by Stuart J. Russell and Peter Norvig, 2010, Prentice-Hall, ISBN: 9780132071482
  • Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, by Foster Provost and Tom Fawcett, 2013, O'Reilly Media, Incorporated (e-book available for free via NYU Library)

B.4. Lecture slides

The course staff will upload the slide deck for each lecture shortly before the session begins. Please be aware that these slides serve primarily as visual aids to facilitate class discussions and should not be considered a substitute for the assigned readings. They complement the material rather than summarize it. As such, a thorough grasp of the course content will require engagement with the complete assigned readings.

B.5. Mathematics 

At its core, AI builds on algorithms founded on fundamental mathematical concepts. This course requires familiarity with basic mathematical principles, including linearity, conditional probability, Bayes' rule, multivariate derivatives, asymptotic notation, and statistical tests. In our first lecture, we will provide a math self-assessment to gauge your understanding and offer resources for mastering these essential concepts. While theorem proofs are not a course requirement, a solid grasp of the basic mathematical notation and programming concepts related to these topics is crucial for actively engaging in and comprehending the course discussions.
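As a small taste of the expected fluency with conditional probability, here is a hedged sketch of Bayes' rule in Python. The scenario and all numbers (a 1% prevalence, 90% sensitivity, 5% false-positive rate) are made up purely for illustration:

```python
# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)
# Hypothetical diagnostic-test numbers, for illustration only.
prior = 0.01           # P(condition)
sensitivity = 0.90     # P(positive | condition)
false_positive = 0.05  # P(positive | no condition)

# Law of total probability: overall chance of a positive result
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Posterior: chance of having the condition given a positive test
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))  # roughly 0.154
```

Note how a positive result from a fairly accurate test still leaves the posterior below 16% because the prior is so small; this is exactly the kind of reasoning the math self-assessment will probe.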

B.6. Software

  • The course requires access to the Python 3 programming language environment. We recommend installing Miniconda.
  • We recommend Overleaf (free for NYU affiliates) for typing math. This platform uses LaTeX, which is easy to use (see a good reference guide here).

C. Coursework

Please note that this is a 2-credit course that spans over seven weeks instead of the standard 14 weeks. Despite the condensed timeline, the weekly workload for this course is comparable to a regular 4-credit course. This course's rigorous nature necessitates allocating sufficient time and focused effort to ensure that you stay on track and meet the learning objectives within the designated timeframe.

C.1. Homework policy

  • Late assignments will have their grades divided by 2 for each late day. For example, submitting the day following the deadline halves your grade, two days divides your grade by 4, and so on.
  • In unexpected circumstances where you must miss a deadline, we will drop one assignment with the lowest grade. No other exception will be allowed.
  • We will grade Homework sets based on your ability to demonstrate the applicability of the concepts you learned to solve the problems. In open questions, when you show knowledge and exercise sound judgment to arrive at a solution, we will assign full credit to that answer, even if the solution is not entirely correct.
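To make the late-day arithmetic concrete, here is a small illustrative helper (a sketch for your own planning, not an official grading tool) that applies the halving rule described above:

```python
def late_grade(raw_grade: float, days_late: int) -> float:
    """Divide the raw grade by 2 for each day past the deadline."""
    return raw_grade / (2 ** days_late)

print(late_grade(90, 0))  # 90.0 (on time)
print(late_grade(90, 1))  # 45.0 (one day late)
print(late_grade(90, 2))  # 22.5 (two days late)
```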

C.2. Generative AI in coursework: Generative AI is a remarkable tool in modern education, offering personalized interactions with human knowledge. Recognizing its potential to enhance learning, we encourage students to utilize Generative AI resources as an aid for homework assignments and for preparing student lecture content. Still, you must fully understand what you submit and write your work in your own words. There are, however, two important exceptions to the use of Generative AI in this course:

  1. Final Exam: The Final Exam is a traditional, closed-book assessment within the classroom setting. This format ensures a fair evaluation of individual knowledge and understanding, independent of external AI assistance or other sources.
  2. Student Lectures: While you may use Generative AI in the preparation phase, you must personally deliver the student lecture presentation. The use of AI avatars or voices for the presentation is not permitted. This policy aims to develop and assess your presentation skills and your ability to communicate complex ideas effectively.

In all uses of Generative AI, students must adhere to academic integrity principles, ensuring that all work submitted is their own and that any AI-generated content or ideas are appropriately credited. 

C.3. Academic integrity: We aim to foster a fair and supportive learning environment at NYU Shanghai. Upholding the highest standards of academic integrity is paramount in this course. The university will take any breach of academic integrity seriously, and any violations may result in severe consequences. Academic misconduct, such as plagiarism, failing to cite sources properly, or submitting work produced by others as your own, are considered serious violations. We emphasize the importance of originality and proper attribution in all your academic endeavors, as these are fundamental to the principles of intellectual honesty and scholarly practice.

C.3.1. Honor Pledge: In some cases, we may require students to sign the honor pledge and submit it with assignments and exams (in those cases, the course staff will only grade submissions with an honor pledge signature). For convenience, we will share a pledge template here: "I affirm that I will not give or receive any unauthorized help on this academic activity and that all work I submit is my own."

C.4. Collaboration policy

  • You may discuss the problem sets with other students in the class. Still, you must understand what you submit fully and write your solutions separately in your own words.
  • We will check for plagiarism using powerful algorithms that are hard to fool (please don't try).
  • As a general academic principle, students must name any sources or collaborators with whom they discussed problems.

C.5. Learning Management System: Homework sets and assignments will be posted and announced on Brightspace.

C.6. Learning assessment

  • Exams: We will have a closed-book in-class Final Exam, which you will have 75 minutes to complete, and we will allow one two-sided cheat sheet (size 8.5x11"). The material we test on the exams is cumulative, and the Final Exam includes everything discussed during the course.
  • Problem sets: We will offer weekly problem sets that include coding and a written component.
  • Student Lecture: See the "Student Lecture" section below.
  • Participation: An evaluation of how much you have contributed to the lectures.

C.6.1. Grading

  • Weekly Problem sets (20%)
  • Final Exam (35%)
  • Student Lecture (30%)
  • Participation (15%)
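The weighted average implied by this breakdown can be sketched as follows; the component scores here are hypothetical and only the weights come from the list above:

```python
# Weights from the grading breakdown above
weights = {
    "problem_sets": 0.20,
    "final_exam": 0.35,
    "student_lecture": 0.30,
    "participation": 0.15,
}

# Hypothetical component scores on a 0-100 scale (for illustration only)
scores = {
    "problem_sets": 90,
    "final_exam": 80,
    "student_lecture": 85,
    "participation": 100,
}

final_grade = sum(weights[k] * scores[k] for k in weights)
print(round(final_grade, 1))  # approximately 86.5 for these sample scores
```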

C.6.2 Makeup Exam Policy: This course does not provide a makeup exam. If you cannot attend the Final Exam and have an approved exception from the Academic Affairs office, you may apply to receive an 'Incomplete' grade. An 'Incomplete' is appropriate only when a student has completed all but a single requirement or a small amount of coursework and, for good reason, is unable to submit the remaining coursework by the end of the semester. Academic Affairs will review the request to make sure that it meets these criteria; they will not approve requests to make up a significant amount of coursework. This policy allows you to complete the course requirements during the subsequent course offering the following year. Considering this policy is essential, especially for those needing the course grade for graduation purposes.

C.6.3 Regrade requests: Ensuring fair grading is a cornerstone of our course. If you feel that your assignment or exam requires a re-evaluation, we ask that you visit us during office hours to discuss your concerns. A course staff member will document and present your points at our course staff meeting for a collective review. We commit to providing you with a response within 7-10 days following a comprehensive discussion of your request. Please be aware that we will meticulously reassess your entire submission when a regrade is requested; as a result, your final grade could either increase or decrease. If you wish to appeal the grade further due to grade miscalculations or misapplication of syllabus, area, or university policies, a formal written appeal should be submitted to the Assistant Dean for Academic Affairs.

C.6.4. Issues about performance: If you need help learning the material, contact us as soon as possible to discuss what is not working and how we can assist. It is challenging to address questions about performance at the end of the semester or after final grades are submitted. So, if you are worried about your learning, seek help early.

C.6.5. Pass/Fail Option: If you select the binary grade option (when available), you must (i) achieve at least 63% performance in the entire course and (ii) earn at least 41% of the grade on every assignment, exam, and project in order to receive a passing grade.
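The two conditions can be checked mechanically; the sketch below is an illustration of the rule (the function name and sample numbers are hypothetical), where each entry of component_pcts is a component score as a percentage of its maximum:

```python
def passes(overall_pct: float, component_pcts: list) -> bool:
    """Binary-grade rule: at least 63% overall AND at least 41%
    on every individual assignment, exam, and project."""
    return overall_pct >= 63 and all(p >= 41 for p in component_pcts)

print(passes(70, [55, 80, 45]))   # True: both conditions met
print(passes(70, [38, 95, 95]))   # False: one component below 41%
print(passes(60, [80, 80, 80]))   # False: overall below 63%
```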

D. In-class code of conduct

D.1. Diversity Statement: While we acknowledge individual identities and conditions, they do not influence interactions, learning, or performance evaluation in this class, and they do not affect the course staff's judgments. We acknowledge our imperfections while fully committing to the work, inside and outside our classrooms, of building and sustaining a campus community that increasingly embraces these core values.

D.2. Electronic device policy

  • Students must turn off and put away cell/mobile/smartphones for the duration of the class (this is NYU Shanghai's policy)
  • Laptop screens can be distracting to students sitting behind you. If you open programs unrelated to the class, e.g., email, social media, games, videos, etc., please sit at the back of the classroom.

D.3. Instructor Goals: My primary aim is to impact student learning and growth positively. To achieve this, I strive to:

  • Cultivate critical and original thinking, laying a foundation for lifelong learning through engaging lectures, thought-provoking discussions, and relevant assignments.
  • Inspire students to lead lives of discovery, satisfaction, and impactful contributions.
  • Share my passion for the subject, igniting a similar enthusiasm among students.
  • Guide students towards mastery of the subject area, extending mentorship beyond the classroom.
  • Continuously develop new and stimulating course materials, assignments, and curricular initiatives.
  • Seamlessly integrate technology in the course to augment learning experiences.

The dates below are tentative. Depending on how the lectures develop, some topics may be covered earlier or later. We will populate the list as we progress.

Week 1
  • 3/25: Introduction to the course. Reading (required): James-Witten-Hastie-Tibshirani, Ch. 1 (Introduction). Math prep resources: deeplearning.ai course "Mathematics for AI"; Goodfellow-Bengio-Courville, Ch. 2-3; "Mathematics for Machine Learning" by Garrett Thomas (Berkeley). Talk: "Predictive Learning and the Future of AI" by Yann LeCun at NYU Shanghai. Assignment: Homework 0 (written).
  • 3/27: Supervised Learning: Regression, Model Selection. Reading (required): James-Witten-Hastie-Tibshirani, Ch. 2.1, 3.1-3.2 (Linear Regression).

Week 2
  • 4/1: No class: Spring Recess (April 1 - April 5) and Qingming Holiday.
  • 4/3: No class: Spring Recess (April 1 - April 5) and Qingming Holiday.

Week 3
  • 4/8: MSEs, Bias-Variance tradeoff, interpretability. Reading (required): James-Witten-Hastie-Tibshirani, Ch. 2.2-2.2.2, 2.3 (Bias-Variance).
  • 4/10: KNN, Cross-validation. Reading (required): James-Witten-Hastie-Tibshirani, Ch. 3.5, 5.1 (KNN, Cross-validation). Assignments: Homework 1 (coding); Homework 1 (written).

Week 4
  • 4/15: Data issues, dimensionality reduction, Ridge. Reading (required): James-Witten-Hastie-Tibshirani, Ch. 3.3-3.6, 6.2 (Data issues, Shrinkage Methods).
  • 4/17: The Lasso. Reading (required): James-Witten-Hastie-Tibshirani, Ch. 6.2 (Shrinkage Methods).

Week 5
  • 4/22: Supervised Learning: Classification, Bayes Classifier, K-Nearest Neighbors; Perceptron introduction. Reading: James-Witten-Hastie-Tibshirani, Ch. 2.2.3, 4.1-4.2 (Classification).
  • 4/24: Perceptron, Logistic Regression. Reading: James-Witten-Hastie-Tibshirani, Ch. 4.3 (Logistic Regression). Assignments: Homework 2 (coding); Homework 2 (written).

Week 6
  • 4/29: Perceptron, Logistic Regression.
  • 5/1: Model evaluation metrics. Reading: James-Witten-Hastie-Tibshirani, Ch. 4.4 and Ch. 10 (Deep Learning); Goodfellow-Bengio-Courville, Ch. 6-8; Russell-Norvig, Ch. 21.1-2; http://neuralnetworksanddeeplearning.com/.

Week 7
  • 5/6: Deep Learning, maximum likelihood estimation, gradient descent.
  • 5/8: Deep Learning: loss functions, hidden layers, activation functions. Assignments: Homework 3 (written); Homework 3 (coding).

Week 8
  • 5/13: Deep Learning: backpropagation, universal approximation theorem.
  • 5/15: Final Exam; Student Lecture Deadline.

 

Student Lecture

Deadline: See Calendar or Brightspace.

Why: This project aims to expand your familiarity with business applications of Machine Learning (ML). The project will also exercise your ability to communicate what you have learned clearly to a broad audience, a critical skill for job interviews or technical meetings.

What: Your task is to study a business application of Machine Learning, ideally, one that you are particularly interested in, e.g., because it is related to your major, or you have worked on a project in which ML is applicable, or there is a dataset/problem you care about, etc. There are hundreds of ML applications in many different domains.

How: Your task is to produce a short lecture that teaches the application you have studied. The format of your presentation is a 5 to 7-minute video. You may optionally include a set of lecture notes, slides, etc. The critical goal is to keep the lecture self-contained and informative for viewers interested in that application, including those unfamiliar with your major. Your work is not meant to be a summary or a recitation of the paper you read. Instead, the goal is to share your views of the method, especially aspects that are not readily apparent on a quick first reading of the paper. You should help your audience read between the lines and think critically about that piece of work, synthesizing the underlying ideas and your perspectives. You may assume that the viewers of your lecture have taken this course.

In your lecture, you will address the following points:

    • A comprehensive description of the domain and the problem you are studying (i.e., a student who is not familiar with your major should be able to understand your explanation. For example, if you're a finance major studying an application to "option valuation", you need to assume that some students who are not in finance may not know what "options" are.)
    • Insights: What are the properties of the problem that make it a good candidate for being solved with Machine Learning? Why do you expect that ML would work? Why is ML better than other methods in this context?
    • Modeling: Describe the components of the ML formulation, e.g., what are the modeling assumptions? Are they realistic?
    • Algorithm: What algorithms are used to solve the problem? How is training done?
    • Results and potential directions: What are the results? Are they significant and impactful? Do they improve on traditional approaches or baselines (and why)? What are the weaknesses of the methods? What are some promising further questions, and how could they be pursued?

The following short document may serve as a reference for critical reading of a scientific paper or technical report: https://www.eecs.harvard.edu/~michaelm/postscripts/ReadPaper.pdf

Submission and privacy: Submit your work on Brightspace in mp4 format. Be careful with the size of your file, as Brightspace has trouble uploading very large files. You may request that we keep your work confidential; otherwise, we will add it to the video library of our course website after a thorough review by the instructor.

The course staff is here to help. Feel free to contact the instructor or the TA to discuss your lecture.

Video examples

See our Video Gallery section

Data sources

If you're interested in running your own experiments, here are a few data sources you can browse to search for a dataset:

Chinese Data Sets:

The NYU Shanghai Center for Data Science has created a portal for Chinese data sets: Chinese Datasets Archive

We encourage you to utilize our office hours for any queries or discussions. This approach is the most effective way to communicate with course staff, especially given the challenge of promptly responding to the high volume of emails. Office hours allow us to address your concerns quickly and comprehensively.

Please use the form provided below if you need to book a 15-minute office hour slot, schedule an appointment for confidential discussions about course content or logistics, or if you have a brief question, which we can quickly resolve via email. For any special requests related to exceptions to our syllabus rules or deadlines, refer to the 'Exceptional circumstances' section of the syllabus. This section offers clear guidelines on how to proceed with such requests.






    Homework 8


    Machine Learning!

    In this project you will build a neural network to classify digits, and more!

    Introduction

    This project will be an introduction to machine learning.

    The code for this project contains the following files, available as a zip archive.

    Files you'll edit:
    models.py: Perceptron and neural network models for a variety of applications
    Files you should read but NOT edit:
    nn.py: Neural network mini-library
    Files you will not edit:
    autograder.py: Project autograder
    backend.py: Backend code for various machine learning tasks
    data: Datasets for digit classification and language identification
    submission_autograder.py: Submission autograder (generates tokens for submission)

    Files to Edit and Submit: You will fill in portions of models.py during the assignment. Please do not change the other files in this distribution.

    Evaluation: Your code will be autograded for technical correctness. Please do not change the names of any provided functions or classes within the code, or you will wreak havoc on the autograder. However, the correctness of your implementation -- not the autograder's judgements -- will be the final judge of your score. If necessary, we will review and grade assignments individually to ensure that you receive due credit for your work.

    Academic Dishonesty: We will be checking your code against other submissions in the class for logical redundancy. If you copy someone else's code and submit it with minor changes, we will know. These cheat detectors are quite hard to fool, so please don't try. We trust you all to submit your own work only; please don't let us down. If you do, we will pursue the strongest consequences available to us.

    Proper Dataset Use: Part of your score for this project will depend on how well the models you train perform on the test set included with the autograder. We do not provide any APIs for you to access the test set directly. Any attempts to bypass this separation or to use the testing data during training will be considered cheating.

    Getting Help: You are not alone! If you find yourself stuck on something, contact the course staff for help. Office hours, section, and the discussion forum are there for your support; please use them. If you can't make our office hours, let us know and we will schedule more. We want these projects to be rewarding and instructional, not frustrating and demoralizing. But, we don't know when or how to help unless you ask.

    Discussion: Please be careful not to post spoilers.


    Installation

    For this project, you will need to install the following two libraries: numpy and matplotlib.

    You will not be using these libraries directly, but they are required in order to run the provided code and autograder.

    To test that everything has been installed, run:

    python autograder.py --check-dependencies

    If numpy and matplotlib are installed correctly, you should see a window pop up where a line segment spins in a circle:


    Provided Code (Part I)

    For this project, you have been provided with a neural network mini-library (nn.py) and a collection of datasets (backend.py).

    The library in nn.py defines a collection of Node objects. Each node represents a real number or a matrix of real numbers. Operations on Node objects are optimized to work faster than using Python's built-in types (such as lists).

    Here are a few of the provided node types:

    • nn.Constant represents a matrix (2D array) of floating point numbers. It is typically used to represent input features or target outputs/labels. Instances of this type will be provided to you by other functions in the API; you will not need to construct them directly.
    • nn.Parameter represents a trainable parameter of a perceptron or neural network
    • nn.DotProduct computes a dot product between its inputs

    Additional provided functions:

    • nn.as_scalar can extract a Python floating-point number from a node.

    When training a perceptron or neural network, you will be passed a dataset object. You can retrieve batches of training examples by calling dataset.iterate_once(batch_size):

    for x, y in dataset.iterate_once(batch_size):
        ...

    For example, let's extract a batch of size 1 (i.e. a single training example) from the perceptron training data:

    >>> batch_size = 1
    >>> for x, y in dataset.iterate_once(batch_size):
    ...     print(x)
    ...     print(y)
    ...     break
    ...
    
    

The input features x and the correct label y are provided in the form of nn.Constant nodes. The shape of x will be batch_size × num_features, and the shape of y is batch_size × num_outputs. Here is an example of computing a dot product of x with itself, first as a node and then as a Python number.

    >>> nn.DotProduct(x, x)
    
    >>> nn.as_scalar(nn.DotProduct(x, x))
    1.9756581717465536
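As a plain-numpy analogue of the two calls above (the feature values here are made up, not the project's actual data), the dot product of a 1 × num_features row with itself is just the sum of its squared entries:

```python
import numpy as np

# Hypothetical 1 x 3 feature row; nn.DotProduct(x, x) followed by
# nn.as_scalar computes the equivalent of the scalar below.
x = np.array([[0.3, -1.2, 0.5]])
scalar = (x @ x.T).item()   # 0.3**2 + (-1.2)**2 + 0.5**2
print(scalar)               # ≈ 1.78
```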

    Question 1 (24 points): Perceptron

    Before starting this part, be sure you have numpy and matplotlib installed!

    In this part, you will implement a binary perceptron. Your task will be to complete the implementation of the PerceptronModel class in models.py.

For the perceptron, the output labels will be either 1 or −1, meaning that data points (x, y) from the dataset will have y be a nn.Constant node that contains either 1 or −1 as its entries.

We have already initialized the perceptron weights self.w to be a 1 × dimensions parameter node. The provided code will include a bias feature inside x when needed, so you will not need a separate parameter for the bias.

    Your tasks are to:

    • Implement the run(self, x) method. This should compute the dot product of the stored weight vector and the given input, returning an nn.DotProduct object.
• Implement get_prediction(self, x), which should return 1 if the dot product is non-negative or −1 otherwise. You should use nn.as_scalar to convert a scalar Node into a Python floating-point number.
    • Write the train(self) method. This should repeatedly loop over the data set and make updates on examples that are misclassified. Use the update method of the nn.Parameter class to update the weights. When an entire pass over the data set is completed without making any mistakes, 100% training accuracy has been achieved, and training can terminate.

    In this project, the only way to change the value of a parameter is by calling parameter.update(direction, multiplier), which will perform the update to the weights: weights←weights+direction⋅multiplier

    The direction argument is a Node with the same shape as the parameter, and the multiplier argument is a Python scalar.
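To build intuition for this update rule, here is a standalone plain-numpy sketch of the classic perceptron algorithm on a made-up, linearly separable toy dataset. In the project itself you would instead call self.w.update with a suitable direction node and multiplier; this illustration just unrolls the same arithmetic:

```python
import numpy as np

# Made-up, linearly separable toy data; the last column plays the
# role of a bias feature, as the project's provided code does for x.
X = np.array([[ 1.0,  2.0, 1.0],
              [ 2.0, -1.0, 1.0],
              [-1.0, -2.0, 1.0],
              [-2.0,  1.0, 1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(3)
converged = False
while not converged:
    converged = True                       # assume a clean pass...
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w >= 0 else -1
        if pred != yi:
            # weights <- weights + direction * multiplier,
            # here with direction = xi and multiplier = yi
            w = w + xi * yi
            converged = False              # ...until a mistake is made

preds = [1 if xi @ w >= 0 else -1 for xi in X]
print(preds)  # [1, 1, -1, -1]: 100% training accuracy, so training stops
```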

    To test your implementation, run the autograder:

    python autograder.py -q q1

    Note: the autograder should take at most 20 seconds or so to run for a correct implementation. If the autograder is taking forever to run, your code probably has a bug.


    Neural Network Tips

    In the remaining parts of the project, you will implement the following models:

    • Q2: Regression
    • Q3: Handwritten Digit Classification
    • Q4: Language Identification

    Building Neural Nets

Throughout the applications portion of the project, you'll use the framework provided in nn.py to create neural networks to solve a variety of machine learning problems. A simple neural network has layers, where each layer performs a linear operation (just like a perceptron). Layers are separated by a non-linearity, which allows the network to approximate general functions. We'll use the ReLU operation for our non-linearity, defined as relu(x)=max(x,0). For example, a simple two-layer neural network for mapping an input row vector x to an output vector f(x) would be given by the function:

f(x) = relu(x ⋅ W1 + b1) ⋅ W2 + b2

where we have parameter matrices W1 and W2 and parameter vectors b1 and b2 to learn during gradient descent. W1 will be an i × h matrix, where i is the dimension of our input vectors x, and h is the hidden layer size. b1 will be a size h vector. We are free to choose any value we want for the hidden size (we will just need to make sure the dimensions of the other matrices and vectors agree so that we can perform the operations). Using a larger hidden size will usually make the network more powerful (able to fit more training data), but can make the network harder to train (since it adds more parameters to all the matrices and vectors we need to learn), or can lead to overfitting on the training data.

We can also create deeper networks by adding more layers, for example a three-layer net:

f(x) = relu(relu(x ⋅ W1 + b1) ⋅ W2 + b2) ⋅ W3 + b3

    Note on Batching

    For efficiency, you will be required to process whole batches of data at once rather than a single example at a time. This means that instead of a single input row vector x with size i, you will be presented with a batch of inputs represented as a b×i matrix X. We provide an example for linear regression to demonstrate how a linear layer can be implemented in the batched setting.
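The two-layer form described above can be sketched in plain numpy in the batched setting. The shapes below (input dim i=3, hidden size h=5, output dim o=2, batch size b=4) are arbitrary illustrations, not the project's actual dimensions:

```python
import numpy as np

def relu(z):
    # element-wise ReLU, as nn.ReLU does
    return np.maximum(z, 0.0)

# Illustrative dimensions and randomly initialized parameters
rng = np.random.default_rng(0)
i, h, o, b = 3, 5, 2, 4
W1, b1 = rng.standard_normal((i, h)), np.zeros((1, h))
W2, b2 = rng.standard_normal((h, o)), np.zeros((1, o))

def f(X):
    # f(X) = relu(X W1 + b1) W2 + b2, applied to a whole b x i batch at once;
    # adding the 1 x h bias row to a b x h matrix broadcasts over the batch
    return relu(X @ W1 + b1) @ W2 + b2

X = rng.standard_normal((b, i))
print(f(X).shape)   # (4, 2): one output row per example in the batch
```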

    Note on Randomness

    The parameters of your neural network will be randomly initialized, and data in some tasks will be presented in shuffled order. Due to this randomness, it's possible that you will still occasionally fail some tasks even with a strong architecture -- this is the problem of local optima! This should happen very rarely, though -- if when testing your code you fail the autograder twice in a row for a question, you should explore other architectures.

    Practical tips

    Designing neural nets can take some trial and error. Here are some tips to help you along the way:

    • Be systematic. Keep a log of every architecture you've tried, what the hyperparameters (layer sizes, learning rate, etc.) were, and what the resulting performance was. As you try more things, you can start seeing patterns about which parameters matter. If you find a bug in your code, be sure to cross out past results that are invalid due to the bug.
    • Start with a shallow network (just two layers, i.e. one non-linearity). Deeper networks have exponentially more hyperparameter combinations, and getting even a single one wrong can ruin your performance. Use the small network to find a good learning rate and layer size; afterwards you can consider adding more layers of similar size.
    • If your learning rate is wrong, none of your other hyperparameter choices matter. You can take a state-of-the-art model from a research paper, and change the learning rate such that it performs no better than random.
    • Smaller batches require lower learning rates. When experimenting with different batch sizes, be aware that the best learning rate may be different depending on the batch size.
    • Making the network too wide generally doesn't hurt accuracy too much. If you keep making the network wider accuracy will gradually decline, but computation time will increase quadratically in the layer size -- you're likely to give up due to excessive slowness long before the accuracy falls too much. The full autograder for all parts of the project takes 2-12 minutes to run with staff solutions; if your code is taking much longer you should check it for efficiency.
    • If your model is returning Infinity or NaN, your learning rate is probably too high for your current architecture.
    • Recommended values for your hyperparameters:
      • Hidden layer sizes: between 10 and 400
      • Batch size: between 1 and the size of the dataset. For Q2 and Q3, we require that total size of the dataset be evenly divisible by the batch size.
      • Learning rate: between 0.001 and 1.0
      • Number of hidden layers: between 1 and 3

    Provided Code (Part II)

    Here is a full list of nodes available in nn.py. You will make use of these in the remaining parts of the assignment:

    • nn.Constant represents a matrix (2D array) of floating point numbers. It is typically used to represent input features or target outputs/labels. Instances of this type will be provided to you by other functions in the API; you will not need to construct them directly
    • nn.Parameter represents a trainable parameter of a perceptron or neural network. All parameters must be 2-dimensional.
      • Usage: nn.Parameter(n, m) constructs a parameter with shape n×m
• nn.Add adds matrices element-wise
  • Usage: nn.Add(x, y) accepts two nodes of shape batch_size × num_features and constructs a node that also has shape batch_size × num_features
• nn.AddBias adds a bias vector to each feature vector
  • Usage: nn.AddBias(features, bias) accepts features of shape batch_size × num_features and bias of shape 1 × num_features, and constructs a node that has shape batch_size × num_features.
• nn.Linear applies a linear transformation (matrix multiplication) to the input
  • Usage: nn.Linear(features, weights) accepts features of shape batch_size × num_input_features and weights of shape num_input_features × num_output_features, and constructs a node that has shape batch_size × num_output_features.
    • nn.ReLU applies the element-wise Rectified Linear Unit nonlinearity relu(x)=max(x,0). This nonlinearity replaces all negative entries in its input with zeros.
      • Usage: nn.ReLU(features), which returns a node with the same shape as the input.
• nn.SquareLoss computes a batched square loss, used for regression problems
  • Usage: nn.SquareLoss(a, b), where a and b both have shape batch_size × num_outputs.
• nn.SoftmaxLoss computes a batched softmax loss, used for classification problems
  • Usage: nn.SoftmaxLoss(logits, labels), where logits and labels both have shape batch_size × num_classes. The term "logits" refers to scores produced by a model, where each entry can be an arbitrary real number. The labels, however, must be non-negative and have each row sum to 1. Be sure not to swap the order of the arguments!
    • Do not use nn.DotProduct for any model other than the perceptron
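As a sketch of what a batched softmax loss computes, the standard formulation is the mean over the batch of the cross-entropy between softmax(logits) and the target label distribution. The function below mirrors that description in plain numpy (the exact internals of nn.SoftmaxLoss may differ):

```python
import numpy as np

def softmax_loss(logits, labels):
    # subtract the row-wise max for numerical stability, then take
    # log-softmax and average the cross-entropy over the batch
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1).mean()

logits = np.array([[2.0, 0.5, -1.0]])   # arbitrary real-valued scores
labels = np.array([[1.0, 0.0, 0.0]])    # non-negative, each row sums to 1
loss = softmax_loss(logits, labels)
print(loss)  # ≈ 0.2413
```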

    The following methods are available in nn.py:

    • nn.gradients computes gradients of a loss with respect to provided parameters.
      • Usage: nn.gradients(loss, [parameter_1, parameter_2, ..., parameter_n]) will return a list [gradient_1, gradient_2, ..., gradient_n], where each element is an nn.Constant containing the gradient of the loss with respect to a parameter.
    • nn.as_scalar can extract a Python floating-point number from a loss node. This can be useful to determine when to stop training.
      • Usage: nn.as_scalar(node), where node is either a loss node or has shape 1×1.

    The datasets provided also have two additional methods:

• dataset.iterate_forever(batch_size) yields an infinite sequence of batches of examples.
    • dataset.get_validation_accuracy() returns the accuracy of your model on the validation set. This can be useful to determine when to stop training.

    Example: Linear Regression

As an example of how the neural network framework works, let's fit a line to a set of data points. We'll start with four points of training data constructed using the function y=7x0+8x1+3. In batched form, our data is:

    Suppose the data is provided to us in the form of nn.Constant nodes:

    >>> x
    
    >>> y
    

Let's construct and train a model of the form f(x)=x0⋅m0+x1⋅m1+b. If done correctly, we should be able to learn that m0=7, m1=8, and b=3.

    First, we create our trainable parameters. In matrix form, these are:

    Which corresponds to the following code:

    m = nn.Parameter(2, 1)
    b = nn.Parameter(1, 1)

    Printing them gives:

    >>> m
    
    >>> b
    

    Next, we compute our model's predictions for y:

    xm = nn.Linear(x, m)
    predicted_y = nn.AddBias(xm, b)

Our goal is to have the predicted y-values match the provided data. In linear regression we do this by minimizing the square loss:

L(m, b) = 1/(2N) ⋅ Σ(x,y) (y − f(x))²

    We construct a loss node:

    loss = nn.SquareLoss(predicted_y, y)

    In our framework, we provide a method that will return the gradients of the loss with respect to the parameters:

    grad_wrt_m, grad_wrt_b = nn.gradients(loss, [m, b])

    Printing the nodes used gives:

    >>> xm
    
    >>> predicted_y
    
    >>> loss
    
    >>> grad_wrt_m
    
    >>> grad_wrt_b
    

    We can then use the update method to update our parameters. Here is an update for m, assuming we have already initialized a multiplier variable based on a suitable learning rate of our choosing:

    m.update(grad_wrt_m, multiplier)

    If we also include an update for b and add a loop to repeatedly perform gradient updates, we will have the full training procedure for linear regression.
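The full procedure can be written out in plain numpy, with the gradients of the square loss computed by hand rather than by nn.gradients. The 50 training inputs below are randomly generated stand-ins, not the four points from the example above:

```python
import numpy as np

# Randomly generated training data from y = 7*x0 + 8*x1 + 3
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
Y = X @ np.array([[7.0], [8.0]]) + 3.0

m = np.zeros((2, 1))                       # plays the role of nn.Parameter(2, 1)
b = np.zeros((1, 1))                       # plays the role of nn.Parameter(1, 1)
learning_rate = 0.1
for _ in range(2000):
    predicted_y = X @ m + b                # nn.Linear followed by nn.AddBias
    err = predicted_y - Y
    grad_m = X.T @ err / len(X)            # gradient of the halved mean square loss
    grad_b = err.mean(axis=0, keepdims=True)
    # equivalent of parameter.update(gradient, -learning_rate)
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(m.ravel(), b.ravel())  # approaches [7, 8] and [3]
```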


    Question 2 (24 points): Non-linear Regression

For this question, you will train a neural network to approximate sin(x) over [−2π, 2π].

    You will need to complete the implementation of the RegressionModel class in models.py. For this problem, a relatively simple architecture should suffice. Use nn.SquareLoss as your loss.

    Your tasks are to:

• Implement RegressionModel.run to return a batch_size × 1 node that represents your model's prediction.
    • Implement RegressionModel.get_loss to return a loss for given inputs and target outputs.
    • Implement RegressionModel.train, which should train your model using gradient-based updates.

    There is only a single dataset split for this task, i.e. there is only training data and no validation data or test set. Your implementation will receive full points if it gets a loss of 0.02 or better, averaged across all examples in the dataset. You may use the training loss to determine when to stop training (use nn.as_scalar to convert a loss node to a Python number).

    python autograder.py -q q2

    Question 3 (24 points): Digit Classification

    For this question, you will train a network to classify handwritten digits from the MNIST dataset.

    Each digit is of size 28×28 pixels, the values of which are stored in a 784-dimensional vector of floating point numbers. Each output we provide is a 10-dimensional vector which has zeros in all positions, except for a one in the position corresponding to the correct class of the digit.
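For instance, a digit whose correct class is 3 would come with the one-hot target vector below (a made-up example of the output format just described):

```python
import numpy as np

# 10-dimensional one-hot target: zeros everywhere except a one at index 3
label = np.zeros(10)
label[3] = 1.0
print(label.tolist())  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```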

Complete the implementation of the DigitClassificationModel class in models.py. The return value from DigitClassificationModel.run() should be a batch_size × 10 node containing scores, where higher scores indicate a higher probability of a digit belonging to a particular class (0-9). You should use nn.SoftmaxLoss as your loss.

    For both this question and Q4, in addition to training data there is also validation data and a test set. You can use dataset.get_validation_accuracy() to compute validation accuracy for your model, which can be useful when deciding whether to stop training. The test set will be used by the autograder.

    To receive points for this question, your model should achieve an accuracy of at least 97% on the test set. For reference, our staff implementation consistently achieves an accuracy of 98% on the validation data after training for around 5 epochs.

    To test your implementation, run the autograder:

    python autograder.py -q q3

    Question 4 (28 points): Language Identification

    Language identification is the task of figuring out, given a piece of text, what language the text is written in. For example, your browser might be able to detect if you've visited a page in a foreign language and offer to translate it for you. Here is an example from Chrome (which uses a neural network to implement this feature):

[Image: translation suggestion in Chrome]

In this project, we're going to build a smaller neural network model that identifies the language of one word at a time. Our dataset consists of words in five languages, as shown in the table below:

Word        Language
discussed   English
eternidad   Spanish
itseänne    Finnish
paleis      Dutch
mieszkać    Polish

Different words consist of different numbers of letters, so our model needs to have an architecture that can handle variable-length inputs. Instead of a single input x (like in the previous questions), we'll have a separate input for each character in the word: x0, x1, …, xL−1, where L is the length of the word. We'll start by applying a network f_initial that is just like the feed-forward networks in the previous problems. It accepts its input x0 and computes an output vector h1 of dimensionality d: h1 = f_initial(x0)

Next, we'll combine the output of the previous step with the next letter in the word, generating a vector summary of the first two letters of the word. To do this, we'll apply a sub-network that accepts a letter and outputs a hidden state, but that now also depends on the previous hidden state h1. We denote this sub-network as f: h2 = f(h1, x1)

This pattern continues for all letters in the input word, where the hidden state at each step summarizes all the letters the network has processed thus far: h3 = f(h2, x2)

Throughout these computations, the function f(⋅,⋅) is the same piece of neural network and uses the same trainable parameters; f_initial will also share some of the same parameters as f(⋅,⋅). In this way, the parameters used when processing words of different lengths are all shared. You can implement this using a for loop over the provided inputs xs, where each iteration of the loop computes either f_initial or f.

    The technique described above is called a Recurrent Neural Network (RNN). A schematic diagram of the RNN is shown below:

    Here, an RNN is used to encode the word "cat" into a fixed-size vector h3.

    After the RNN has processed the full length of the input, it has encoded the arbitrary-length input word into a fixed-size vector hL, where L is the length of the word. This vector summary of the input word can now be fed through additional output layers to generate classification scores for the word's language identity.
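This recurrence can be sketched in plain numpy. The alphabet size, hidden size, and random weights below are made-up placeholders; in the project you would build f_initial and f out of nn.Linear, nn.Add, and nn.ReLU nodes instead:

```python
import numpy as np

rng = np.random.default_rng(0)
num_chars, d = 47, 8                       # placeholder alphabet and hidden sizes
W = rng.standard_normal((num_chars, d)) * 0.1
W_hidden = rng.standard_normal((d, d)) * 0.1

def relu(z):
    return np.maximum(z, 0.0)

def encode(xs):
    """xs: list of (batch_size, num_chars) one-hot matrices, one per letter."""
    h = relu(xs[0] @ W)                    # h1 = f_initial(x0)
    for x in xs[1:]:
        h = relu(x @ W + h @ W_hidden)     # h_{i+1} = f(h_i, x_i)
    return h                               # (batch_size, d) summary of the word

# A batch of two three-letter words: one one-hot matrix per letter position
batch = [np.eye(num_chars)[[3, 7]],
         np.eye(num_chars)[[0, 1]],
         np.eye(num_chars)[[19, 19]]]
print(encode(batch).shape)  # (2, 8): a fixed-size vector per word
```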

    Batching

    Although the above equations are in terms of a single word, in practice you must use batches of words for efficiency reasons. For simplicity, our code in the project ensures that all words within a single batch have the same length. In batched form, a hidden state hi is replaced with the matrix Hi of dimensionality batch_size×d.

    Design Tips

    The design of the recurrent function f(h,x) is the primary challenge for this task. Here are some tips:

• Start with a feed-forward architecture f_initial(x) of your choice, as long as it has at least one non-linearity.
• You should use the following method of constructing f(h, x) given f_initial(x). The first layer of f_initial will begin by multiplying the vector x0 by some weight matrix W to produce z0 = x0 ⋅ W. For subsequent letters, you should replace this computation with zi = xi ⋅ W + hi ⋅ W_hidden using an nn.Add operation. In other words, you should replace a computation of the form z = nn.Linear(x, W) with a computation of the form z = nn.Add(nn.Linear(x, W), nn.Linear(h, W_hidden)).
    • If done correctly, the resulting function will be non-linear in both x and h
    • The hidden size d should be sufficiently large
    • Start with a shallow network for f, and figure out good values for the hidden size and learning rate before you make the network deeper. If you start with a deep network right away you will have exponentially more hyperparameter combinations, and getting any single hyperparameter wrong can cause your performance to suffer dramatically.

    Your task

    Complete the implementation of the LanguageIDModel class.

    To receive full points on this problem, your architecture should be able to achieve an accuracy of at least 81% on the test set.

    To test your implementation, run the autograder:

    python autograder.py -q q4

    Disclaimer: this dataset was generated using automated text processing. It may contain errors. It has also not been filtered for profanity. However, our reference implementation can still correctly classify over 89% of the validation set despite the limitations of the data. Our reference implementation takes 10-20 epochs to train.


    Submission

    Please submit models.py to HW8 (coding) on gradescope.

This is an introductory graduate-level machine learning course for electrical and computer engineering students and covers fundamental algorithms in machine learning. No prior machine learning experience is required. If you have significant ML experience, there is no need to take this class. In particular, students may NOT enroll in this class if they have taken any one of CSE-GY 6923 (Intro grad ML), EE-UY 4423 (Intro UG ML), EL-GY 9133 (Advanced ML). Students with ML experience are encouraged to take graduate-level Probability (EL-GY 6303) in the Fall and advanced ML (Tandon remotely) in the Spring.

    There will be a significant programming component to this course and class/homework exercises.

Teaching modality: in-person ONLY. There will be NO streaming (Zoom) and NO recording of the lectures. The course is designed for active discussion in class.

    Recording policy

It is strictly prohibited to record video, audio, or screenshots, or to photograph the sessions, partially or entirely. The lectures will contain student contributions, voices, opinions, images, etc. We will create a protected and safe environment where students feel free and comfortable contributing to the sessions. Students' participation is confidential within our group and must not be recorded, distributed, or stored on the Internet. It is a liability to store any recorded or photographed material from the lectures of this course on your device. Violators will be prosecuted within the legal jurisdiction applicable to the regions this course spans.

    Communication with the Course Staff

    The best way to communicate with the course staff is to come to our office hours, because our inboxes are often in a state of overflow. We offer 4+ hours of support per week. If you think you need to send us an email, please follow the guidelines below.

    Email Policy

Students are required to communicate through the contact form of our course website. All course staff members will receive your e-mail with a course and priority flag, which will help us notice your email amid the daily torrent of email we receive. Please allow at least 48 hours for a reply; we do not reply to emails over the weekend (Shanghai time), so please plan accordingly. Emails sent to our personal email addresses will not be considered. If you have a private, personal matter to discuss with the course staff, see the "Exceptional Circumstances Policy" section below.

    Exceptional Circumstances Policy

The course staff will not engage in conversations related to personal matters, exceptional circumstances, or requests. If you need to discuss such matters or need to request an exception, you will be required to contact and send the appropriate documentation to the Coordinator of Student Advocacy, Compliance and Student Affairs listed below (depending on your home department). Their office will consider your case and thoroughly investigate your situation. The office will then contact the course staff, indicating whether the exception is justified, and communicate a decision regarding the best course of action.

    Tandon students: If you are experiencing an illness or any other situation that might affect your academic performance in a class, please email Deanna Rayment, Coordinator of Student Advocacy, Compliance and Student Affairs, NYU Tandon. Deanna can reach out to your instructors on your behalf when warranted. The academic policies for Tandon can be found here.

Courant students: Students whose last name begins with the letters A-M have Betty Tsang, btsang@cs.nyu.edu, as their advisor. Students whose last name begins with N-Z have James Paguyo, paguyo@cs.nyu.edu. Their phone numbers can be found here. The academic policies for Courant can be found here.

    All other students should contact Weikai (William) Chen, the Coordinator of Student Advocacy, Compliance and Student Affairs at NYU Shanghai.

    Learning outcomes

    At the end of this course, we expect that you will be able to

    • Distinguish the foundational principles and the purpose of the different categories of ML models.
    • Identify opportunities to use ML models in practical applications and select the appropriate ML methods to model data, extract insights, illuminate structure, and make predictions.
• Implement models and algorithms at all levels of the ML pipeline (i.e., cleaning, sampling, and preprocessing data, and running models) in Python.
    • Evaluate the performance of ML models using statistical techniques.
    • Develop a mature view of the impact of ML in society and reason about its ethical implications.
• Apply the techniques you learn in this course to a wide variety of artificial intelligence problems, using them as a foundation for further study in any application area you choose to pursue.

    Topics

    This course will address selected ML techniques and actively engage students with the following tentative list of topics:

    • Reinforcement learning
    • Supervised Learning: regression and classification
    • Model selection and assessment
    • Tree-based methods
    • Maximum Likelihood Estimation
    • Deep Neural Networks
• Unsupervised Learning: PCA and dimensionality reduction, clustering and Expectation Maximization
    • ML and society (Ethics, Fairness, etc.)

    Prerequisites

    Formal: Calculus, Probability and Statistics, Linear Algebra, and Python programming. No previous knowledge of ML is required.

    We will also draw basic concepts from the following courses:  (i) Multivariate Calculus,  (ii) Algorithms, (iii) Data Structures

The course will require students to learn advanced programming paradigms, such as Object-Oriented Programming and data manipulation with Pandas, which we will review during the recitations.

    Textbook

There is no textbook to purchase. Many of the topics we discuss are new and not sufficiently addressed in existing textbooks. We will draw material from different online sources, such as assorted notes from other courses, publicly available assignments, books, and lecture notes (which we will provide).

    We are going to assign readings from the following textbooks, which are publicly available for free download on the Web:

    • An Introduction to Statistical Learning, James, Witten, Hastie and Tibshirani (link)
    • The Elements of Statistical Learning, Hastie, Tibshirani, Friedman (link)
• Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto (link)
    • Deep Learning, Ian Goodfellow and Yoshua Bengio and Aaron Courville, MIT Press, 2016 (link)

    Other useful references are:

Mathematics: ML is all about algorithms implementing heuristics, and the heuristics are inspired by mathematics. This course will require you to be fluent in basic concepts such as linearity, conditional probability, Bayes' rule, multivariate derivatives, asymptotic notation, statistical tests, etc. While we will not require you to prove theorems, you need to understand the basic math and programming concepts and notation of the prerequisite courses to be able to follow the discussion. A math and coding self-assessment will be distributed in the first lecture.

    Software

    • The course requires access to the Python 3 programming language environment. We recommend installing Anaconda.
• We recommend Overleaf (free for NYU affiliates) for typing your solutions. It's based on LaTeX, which is easy to use and great for typesetting math. They have a good reference guide here.

• To help you develop your programming skills, I am planning to sponsor a DataCamp subscription for each student for the duration of the course (expiration TBD). There you can take 100+ courses by expert instructors on topics such as importing data, data visualization, and machine learning, and learn faster through immediate and personalized feedback on every exercise.

    Homework policy

    • Homework sets will be released on Tuesday at 9pm, with deadlines on Monday 9pm (Shanghai Time).
    • Late assignments will not be accepted.
• To accommodate unexpected circumstances in which you need to miss a deadline, we will drop the one assignment with the lowest grade. No other exceptions will be allowed.
    • Homework sets will be graded based on your ability to connect the concepts we discussed in class to solve the problems.

Academic integrity: We aim to build a fair, inclusive, and conducive environment for learning. NYU Shanghai expects you to maintain the utmost level of academic integrity in this course. Any violation of the code of academic integrity will be prosecuted and can lead to failing the course, among other severe penalties. In particular, plagiarism, the appropriation of ideas without proper citation of the original source, and presenting text or code produced by others when we expect to receive the products of your own work represent severe instances of academic misconduct.

    Honor Pledge

We require all students to sign the honor pledge and submit it together with every assignment and exam (submissions without a signed honor pledge will not be graded). For your convenience, we will add the pledge to every written assignment header, but you still need to submit one for the coding assignments.

    "I affirm that I will not give or receive any unauthorized help on this academic activity, and that all work I submit is my own.''

    Collaboration policy

    • You may discuss the problem sets with other students in the class, but the actual writing up of the solutions must be done separately in your own words. We will check for plagiarism using powerful algorithms that are hard to fool (please don't try).
• Students must name any collaborators they discussed problems with at the top of each assignment (list them separately at the top of each problem).

    NYU Classes (Learning Management System)

We will post homework sets, assignments, and announcements on NYU Classes.

    Gradescope

    You will be submitting your homework solutions through Gradescope. Our course code is 5VJ8XE.

    Learning assessment

• Exams: We will have 3 closed-book in-class exams. You will have 60 minutes to complete each exam, and we will allow one two-sided cheat sheet (size 8.5x11"). The material we test on the exams is cumulative, and the final exam includes everything discussed during the semester.
    • Problem sets: We will offer weekly problem sets that include a coding and a written component.
    • Blog Post: All students will create a written Blog Post that shares opinions and insights about several applications of the concepts we discuss in class.
    • Project: See "Project"

    Grading

    • Weekly Problem sets (20%)
    • 3 exams (50%)
    • 1 Blog post (5%)
    • 1 Project (25%)

    Make up Exam Policy

There will be no make-up exams, except for the final exam. If you have an approved request for an exception, you may miss one of the exams, and we will replace your missed exam grade with the grade of your final exam, which is cumulative and tests everything discussed during the course. If you miss more than one exam, you will automatically fail the course.

    Regrade requests

Fair grading is essential. If you believe your assignment/exam needs to be re-graded, you must come to one of our office hours slots and explain to one of our course staff members why we should review your grade. The course staff member will take notes and bring your request to our course staff meeting. We will get back to you after a thorough discussion of your request. Please expect a response within 7-10 days. Because we will examine your entire submission in detail, your grade can go up or down as a result of a regrade request.

    Questions about performance

If you are struggling in the class, contact me as soon as possible so that we can discuss what is not working and how we can work together to ensure you are mastering the material. It is difficult to address questions about performance at the end of the semester, or after final grades are submitted: by Tandon policy, no extra credit or makeup work can be used to improve a student's grade once the semester closes. So if you are worried about your grade, seek help from me early.

    Pass/Fail Option

    If your department allows you to elect a pass/fail grade, to receive a passing grade you will need to (i) achieve at least 63% performance in the entire course and (ii) receive at least 51% in every assignment, exam, and project.
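
    The pass/fail rule above can be stated as a simple check. The sketch below is illustrative only (the function name and example values are hypothetical), assuming all scores are on a 0-100 scale.

    ```python
    # Sketch of the pass/fail rule: (i) at least 63% in the entire course and
    # (ii) at least 51% in every assignment, exam, and project.
    def passes(overall_pct, component_pcts):
        """Return True iff both pass/fail conditions hold."""
        return overall_pct >= 63 and all(p >= 51 for p in component_pcts)

    print(passes(70, [80, 55, 60, 90]))  # True
    print(passes(70, [80, 45, 60, 90]))  # False: one component below 51%
    ```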

    Electronic device policy

    • Cell/mobile/smart phones must be turned off and put away for the duration of the class (this is NYU Shanghai’s policy)
    • Laptop screens can be distracting to students sitting behind you. If you open programs that are not related to the class, e.g., e-mail, social media, games, videos, etc., please sit at the back of the auditorium.

    Auditing

    Auditing is not allowed in this course.

    Diversity Statement

    We teach all students in an equitable and inclusive manner. We respect individual demographic characteristics, including but not limited to nationality, ethnicity, race, gender, age, and religion. For learning and performance evaluation, these individual identities are immaterial and do not influence the course staff's judgments. We will promote diversity, equity, and inclusion not only because diversity fuels excellence and innovation, but also because we want to pursue justice. We acknowledge our imperfections while we also fully commit to the work, inside and outside of our classrooms, of building and sustaining a campus community that increasingly embraces these core values. Each of us is responsible for creating a safer, more inclusive environment.

    Blog Post

    For this assignment, you will choose an application or theoretical work in any area of Machine Learning and write a blog post about it based on a critical reading of one or more research papers. You may choose any area of ML that you're interested in. Ideally, the post should have at least 1600 words. The goal of the blog post is to familiarize yourself with a piece of previous work in the field. This writing assignment also serves as a literature review that may inform the proposal and development of your class project. However, you are not required to write about the same topic you will develop for your project.

    Follow the document below as a reference for critical reading of a scientific paper.

    https://www.eecs.harvard.edu/~michaelm/postscripts/ReadPaper.pdf

    The post is meant to inform potential readers about your views about the paper, especially about aspects that are not readily available in a quick, first reading of the paper. You should help your reader to read between the lines and think critically about this piece of research.

    Submission and privacy: Submit your work on Gradescope in PDF format. You will need to include an honor pledge at the top of your document (submissions without it will not be graded). Here is the template:


    Honor Pledge: “I affirm that I will not give or receive any unauthorized help on this academic activity, and that all the work I submitted is my own.”

    Signature:


    Your write-up is going to be kept confidential and is not going to be posted anywhere. Later on, we will discuss the opportunity to post your work on the class website, with your consent, after a thorough review by the instructor. However, this is optional.

    Your post should address the following points:

      • What is the main technical content of the papers?
      • Why is it interesting in relation to the material of the course?
      • What are the weaknesses of the papers, and how could they be improved?
      • What are some promising further research questions in the direction of the papers, and how could they be pursued?

    Blog posts should not just be summaries of the papers you read; most of your text should be focused on synthesis of the underlying ideas, and your own perspective on the papers. Blog posts should be done individually (i.e. not in groups).

    An important goal is to keep your write-up self-contained so that it informs readers interested in that application, including those who are not familiar with your area.

    The outline below is a suggestion to produce a high quality post:

      • A comprehensive description of the domain and the problem you are studying (i.e., a student who is not familiar with your area should be able to understand your explanation. For example, if you're a finance student writing about an application to "option valuation", you need to assume that some students who are not in finance don't know what "options" are.)
      • Insights: What are the properties of the problem that make it a good candidate for being solved with the technique you're studying? Why do you expect that approach to work?
      • Modeling: Describe the components of the ML formulation. What are the modeling assumptions? Are they realistic?
      • Algorithm/Methods: What algorithms or methods are used to solve the problem?
      • Results and potential directions: What are the results? Are they significant and impactful? Do they improve on traditional approaches or baselines (and why)? What are the weaknesses of the methods? What are some promising further questions and how could they be pursued?

    Here are a few examples of blog posts based on a single paper:

    Example 1: Blog post: Reinforcement Learning for fighting Covid   Paper: RL for fighting Covid (research paper)

    Example 2: Blog post: Artwork Personalization at Netflix Paper: News article recommendation

    Deadline: 3/29, 2021, Submit your materials via Gradescope.

    Goal: This assignment is meant to be a self-education exercise on an active area of ML. Your blog post may be novel and interesting to a broader audience of readers, but this is not required. We are going to grade your write-up based on how much you have understood the problem you're writing about.

    Note: When you read research papers, you will encounter several concepts you're not familiar with. This is true at every level of education (MS, PhD, etc.). For example, you may encounter Deep Reinforcement Learning in the papers you read, even though we haven't talked about it yet. The job of a researcher is to follow the references and either try to understand the concept or abstract what it does in the paper in a way that still allows you to understand the bigger picture.

    Reinforcement Learning: Because we have seen a great deal of Reinforcement Learning so far in the class, I will post some references below. But, again, feel free to choose any method or application in ML. There are hundreds of RL applications, in many different domains. This video talks about naturally applying RL in the real world:

    Business and Finance
    Computer/Data Science and Engineering

    This page lists many more applications in a number of different domains, such as education, health care, business management, etc.

    The course staff is here to help. Feel free to contact the instructor or the TA to discuss your topics.

    For the final project, you are going to produce a 5-minute video where you present how you applied the techniques introduced in this course to some data set of your choosing. The goal of the project is to prepare you to apply state-of-the-art methods to an application. The video you will produce is going to be posted on the course website, and you may use it as a demonstration of your work for future employers as part of your portfolio.
    Your first task is to pick a project topic, find a relevant dataset, and write a project proposal. You are to work in teams of 1-3 students. You may apply any techniques to just about any domain. The best class projects come from students working on topics that they're excited about. So, pick something that you can get excited and passionate about! Be brave rather than timid, and do feel free to propose ambitious things that you're excited about (but be mindful of the limited time frame to complete the project).
    If you're already working on a research project that ML might be applicable to, then working out how to apply ML techniques to it will often make a very good project topic. Similarly, if you currently work in industry and have an application on which ML might help, that could also make a great project.
    Summary
    Teams of 1-3 students.  Be creative – think of new problems that you can tackle using the techniques you have learned.
    • Scope: ~40 hours/person
    • There are three deliverables:
      • Project proposal + Presentation: due 4/12
      • Project checkpoint - Milestone: due 4/26
      • Final video: due 5/15
    More details of the project deliverables below.
    Project proposal + Presentation

    The proposal consists of a 1-3 page document that answers the following questions:

    • What is the problem you are solving? Has it been addressed before?
    • What data will you use (how will you get it)?
    • What work do you plan to do to complete the project?
    • Which algorithms/techniques/models do you plan to use or develop? Be as specific as you can.
    • How will you evaluate your method? How will you test it? How will you measure success?
    • What do you expect to submit/accomplish by the end of the semester?
    • Include a list of references, e.g., research papers, websites, etc.

    The projects should be original, i.e., not previously done by anyone (including you), and you should observe the academic integrity policy of this course.

    Submit your proposal in PDF format on Gradescope.

    Proposal presentation

    Each group will prepare a short video, 3-5 minutes long, to be submitted in mp4 format, to present their proposal. The course staff will provide feedback based on your presentation and report.

    Project Milestone

    The project milestone consists of a 2-3 page document and is meant for the course staff to check in with you regarding your progress. Think of this as a draft of your final project, but without your major results. After this step you will have two weeks to complete the project.

    • We expect that you have completed 40% of the project
    • Provide a complete picture of your project even if certain key parts have not yet been implemented/solved.
    • Include the parts of your project which have been completed so far, such as:
      • Revised introduction of your problem
      • Review of the relevant prior work
      • Description of the data collection process
      • Description or plots of any initial findings or summary statistics from your dataset
      • Description of any background necessary for your problem
      • Formal description of any important methods used
      • Description of general difficulties with your problem which bear elaboration
    • Make sure to outline the parts which have not yet been completed so that it is clear what you plan to do for the final version.

    Submit your milestone in PDF format on Gradescope.

    Final project format

    Video presentation, at most 5 minutes, to be submitted in mp4 format, along with the complete code you used to develop your project. Optionally, you may also submit an accompanying document (report) and links to publicly available datasets (make sure you have permission to share) for replication purposes.

    Submission instructions: You will submit your materials on NYU Classes, one submission per group. Before submitting your work, find an available "Project Team" slot and add your team members to it. Then submit your materials as a group.

    Academic honesty: We will actively look for plagiarism in your code submissions. You may use code written by others, but please make sure to acknowledge the source. We aim to build a fair, inclusive, and conducive environment for learning. NYU Shanghai expects you to maintain the utmost level of academic integrity in this course. The university will prosecute any violation of the code of academic integrity, which can lead to failing the course and other severe penalties. In particular, plagiarism, the appropriation of ideas without proper citation of the original source, or presenting text or code produced by others when we expect to receive the products of your own work, represents a severe instance of academic misconduct.

    Honor Pledge: We require all students to include and sign the following honor pledge and submit it together with the final project deliverables:

    "I affirm that I all the work I submitted was performed by my team, and any work that has not been done by our team is explicitly acknowledged.'' Students' signatures:

    Evaluation

    Projects will be evaluated based on:

    • The technical quality of the work: Does the technical material make sense? Are the methods tried reasonable? Are the proposed applications clever and interesting? Do the authors convey novel insights about the problem?
    • Significance: Did the authors choose an interesting or a "real" problem to work on, or only a small "toy" problem that can be solved with a trivial application of a Python library function? Is this work likely to be useful and/or have impact?
    • The novelty of the work, completeness of the solution, and the clarity of the presentation.

    In order for the course staff to assign individual scores, we ask you to include, at the end of each report, a brief summary of the individual contributions of each team member, in the format outlined below:

    Example:
    ----------------
    Wei: Plotted graphs during data analysis, collected the data, preliminary data analysis, problem formulation, report writing
    Chris: Proposed the algorithm, Coding up the algorithm, running tests, tabulating final results
    ---------------

    Video examples: 

    See our Video Gallery section

    Data sources

    You are free to find any dataset you're interested in (as long as it allows you to apply a technique you learned in the course). Projects based on data sets emanating from PRC sources are highly desirable. Here are a few data sources you can browse to search for a dataset:
