Python data science

Hamish Gibbs

Recap

  • Great job!
  • We only have 4 days to go from introductory to advanced Python concepts.
    • Plus: programming tools like VSCode and git!
  • Classes and functions can be abstract, but they are the building blocks of what we will do today.
    • Hopefully today will feel more familiar to people who have used R!

Today: Python data science

  • Introduction to Python data science tools.
  • Introduction to a basic data science workflow.
  • This afternoon: collaborating on a data science project.

Tomorrow

  • Data science “challenge”
  • Predicting the nightly price of AirBnBs in London
  • See the Guidelines here.

Data science

  • Definition of data science:
    • “Extracting meaningful insights from data.”
  • “Meaningful” is the important word.
    • Use the tools of programming / statistics to create meaning from your data.
  • Usually, there is no “right” answer, just “better” and “worse” answers.
    • You exercise a lot of judgement.

Data science workflow

  • Data science is not just machine learning.
    • Most data science work is:
      • Data preparation
      • Data transformation
      • Method selection
        • Statistics / machine learning
      • Communicating results
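The four steps above can be sketched in a few lines. This is only an illustration, not the workflow we will follow in the tutorials: the table is made up, the column names `bedrooms` and `price` are hypothetical, and a linear regression stands in for whatever method you would actually choose.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# 1. Data preparation: a tiny made-up table (hypothetical values, not real data).
raw = pd.DataFrame({
    "bedrooms": [1, 2, 2, 3, None],
    "price": [80.0, 120.0, 130.0, 170.0, 90.0],
})
clean = raw.dropna()  # drop the row with a missing value

# 2. Data transformation: derive a new column from existing ones.
clean = clean.assign(price_per_bedroom=clean["price"] / clean["bedrooms"])

# 3. Method selection: here, a simple linear regression of price on bedrooms.
model = LinearRegression().fit(clean[["bedrooms"]], clean["price"])

# 4. Communicating results: report the finding in plain language.
print(f"Each extra bedroom is associated with about {model.coef_[0]:.0f} more in price.")
```

Note that only step 3 involves a model. Everything else is preparing the data and explaining what was found.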

Python data science tools

  • Today, we will learn about the most popular Python data science “stack”.

Python data science tools

  • Tomorrow, we will use this “stack” for our data science project:
    • Exploratory analysis and data transformation
    • Regression model fitting and evaluation
    • Visualizing results

R equivalents

Diving deeper

Tutorial #1: Selecting data

Tutorial #2: More selecting data

Data: Tutorials 1 and 2

  • Tutorials #1 and #2 come from the pandas-cookbook.
  • Go to the /data folder in the GitHub repository (link above).
  • Download the 311-service-requests.csv file and store it on your computer.
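Once the file is downloaded, loading it might look like the sketch below. The `data/` location relative to your working directory is an assumption, so point `DATA_PATH` at wherever you saved the file. `load_requests` is a hypothetical helper name, not something taken from the pandas-cookbook.

```python
from pathlib import Path

import pandas as pd

# Assumed location: adjust to wherever you stored the downloaded file.
DATA_PATH = Path("data") / "311-service-requests.csv"


def load_requests(path=DATA_PATH):
    """Read the service-requests CSV into a DataFrame (hypothetical helper)."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Download the CSV first; nothing found at {path}")
    # low_memory=False makes pandas read the whole file before inferring
    # column types, which avoids mixed-type warnings on large files.
    return pd.read_csv(path, low_memory=False)


if DATA_PATH.exists():
    requests = load_requests()
    print(requests.shape)   # (number of rows, number of columns)
    print(requests.head())  # first five rows
else:
    print(f"{DATA_PATH} not found yet; download it before starting the tutorials.")
```

If the script reports that the file is missing, check which folder your terminal is in before moving the file.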

Packages: Tutorials 1 and 2

  • Install the required packages from the terminal in VSCode:

    pip install pandas matplotlib scikit-learn
  • Trouble installing? Tell me!
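To confirm that the install worked, a quick check like the one below can help. It is a sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8), and `installed_versions` is a hypothetical helper name. Note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`.

```python
from importlib import metadata

# Names exactly as passed to pip (scikit-learn is imported as `sklearn`).
REQUIRED = ["pandas", "matplotlib", "scikit-learn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` usually means `pip` installed into a different Python environment than the one VSCode is running.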