M2 (Mandatory) Assignment

DSBA 2022 - M2: (Mandatory) Assignment

Introduction

Now it is time to bring most of these steps together and apply them to a challenging setting. We expect you to get frustrated and learn from it 😅. Don’t give up too early.

You get a real-world dataset from a job-search platform.

https://github.com/aaubs/ds-master/raw/main/data/Job_search.zip

There are 4 dataframes in this dataset.

  • jobs.csv: detailed data about the job listings. title, description, location, etc.
  • user_work_interest.csv: self-identified interests of the users
  • user_past_experience.csv: previous work experience of the platform users
  • user_job_views.csv: “traffic data” of users looking at jobs (implicit expression of interest)
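
A minimal sketch for getting the four files into memory, assuming the archive from the link above has already been downloaded locally. The function name `load_tables`, the UTF-8 encoding, and the possibility that the files sit in a subfolder of the archive are assumptions, not facts about the dataset. It uses only the standard library; in a notebook you would more likely pass the opened files to a dataframe library.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath

# File names as listed in the assignment text.
EXPECTED = (
    "jobs.csv",
    "user_work_interest.csv",
    "user_past_experience.csv",
    "user_job_views.csv",
)


def load_tables(zip_path):
    """Return {file name: list of row dicts} for the expected CSV files.

    Hypothetical helper: matches on the base name so it also works if the
    archive stores the files inside a folder.
    """
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            name = PurePosixPath(member).name
            if name in EXPECTED:
                with archive.open(member) as raw:
                    # Assumption: the CSV files are UTF-8 encoded.
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    tables[name] = list(csv.DictReader(text))
    missing = set(EXPECTED) - set(tables)
    if missing:
        raise FileNotFoundError(f"not found in archive: {sorted(missing)}")
    return tables
```

Calling `load_tables("Job_search.zip")` returns one list of rows per file, which makes it easy to check the actual column names before deciding how to join the tables.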

Task description

  1. Data augmentation: Use NLP techniques to create categories for the different positions (these are only free text at the moment) and create a consistent industry column. Impute (predict) the education requirements for all jobs. You can use different strategies to create larger labels for the free text, including (but not limited to):
  • Hand-label several jobs and use these to predict the labels of the remaining jobs.
  • Run topic modeling on the job descriptions and label each job with its most strongly associated topic.
  • Do something else… (e.g. cluster job descriptions by TF-IDF-weighted DTM…)
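
One way the hand-labeling strategy could look is sketched below: a few labeled "seed" descriptions, TF-IDF weighting, and each unlabeled job receiving the label of its most similar seed by cosine similarity. This is only an illustration under assumptions: the function names, the toy texts, and the nearest-seed rule are made up here, and a real notebook would probably use a library vectorizer and a proper classifier instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict token -> weight) per text."""
    counts = [Counter(tokenize(t)) for t in texts]
    doc_freq = Counter(tok for c in counts for tok in c)
    n = len(texts)
    # Smoothed IDF, so tokens that occur in every text keep a small weight.
    idf = {tok: math.log((1 + n) / (1 + df)) + 1 for tok, df in doc_freq.items()}
    return [{tok: tf * idf[tok] for tok, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_labels(texts, seed_labels):
    """seed_labels maps text index -> hand label; returns a label per text.

    Unlabeled texts get the label of the most similar seed. If a text shares
    no token with any seed, it falls back to the first seed, so such cases
    should be inspected by hand.
    """
    vectors = tfidf_vectors(texts)
    labels = []
    for i, vec in enumerate(vectors):
        if i in seed_labels:
            labels.append(seed_labels[i])
        else:
            best = max(seed_labels, key=lambda s: cosine(vec, vectors[s]))
            labels.append(seed_labels[best])
    return labels


# Toy descriptions (invented for illustration, not taken from jobs.csv).
texts = [
    "Registered nurse for hospital ward, patient care",
    "Python developer building backend software services",
    "Night shift nurse, patient care in clinic",
    "Senior software developer, Python and cloud services",
]
seeds = {0: "healthcare", 1: "it"}
labels = propagate_labels(texts, seeds)
```

The same vectors could also feed a clustering step; the nearest-seed rule is used here only because it keeps the sketch short.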
  2. Use a 2-mode network approach based on the “traffic data” to create job recommendations. Example:
    • create a network user_id - job_id
    • project it onto user_id
    • recommend to user_i what user_j also looked at if they are close to each other, e.g. 1 jump away (and the job is in the same city, state, fits user_i past experience, fits user_j interests…)
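
The example steps above can be sketched without any graph library, assuming the traffic data reduces to (user_id, job_id) pairs. The function name `recommend`, the scoring rule (sum of shared-view counts), and the toy ids are assumptions for illustration. The location, experience, and interest filters are left out because the relevant column names are not given here.

```python
from collections import Counter, defaultdict


def recommend(views, user, top_n=5):
    """Rank unseen jobs for `user` from (user_id, job_id) view pairs."""
    # Build the 2-mode (bipartite) network as two adjacency maps.
    jobs_by_user = defaultdict(set)
    users_by_job = defaultdict(set)
    for u, j in views:
        jobs_by_user[u].add(j)
        users_by_job[j].add(u)

    seen = jobs_by_user.get(user, set())

    # Projection onto users, restricted to this user's direct neighbours:
    # the edge weight is the number of jobs both users viewed.
    neighbour_weight = Counter()
    for job in seen:
        for other in users_by_job[job]:
            if other != user:
                neighbour_weight[other] += 1

    # Score each unseen job by the summed weight of neighbours who viewed it.
    scores = Counter()
    for other, weight in neighbour_weight.items():
        for job in jobs_by_user[other] - seen:
            scores[job] += weight

    # Highest score first; ties are broken by job id for reproducibility.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [job for job, _ in ranked[:top_n]]


# Toy view data (invented ids, not from user_job_views.csv).
views = [
    ("u1", "j1"), ("u1", "j2"),
    ("u2", "j1"), ("u2", "j2"), ("u2", "j3"),
    ("u3", "j2"), ("u3", "j4"),
    ("u4", "j5"),
]
print(recommend(views, "u1"))  # prints ['j3', 'j4']
```

Here u2 shares two viewed jobs with u1 and u3 shares one, so j3 outranks j4. With a graph library such as networkx, the same projection is available through its bipartite module; the dictionary version is used only to keep the sketch dependency-free.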

Computational Notebook

The notebook targets a machine-learning literate audience. Here you can go deeper into the technical details and method considerations. Provide thorough documentation of the whole process, the used methods. Describe the intuition behind the selected and used methods, justify choices made, and interpret results.

Please provide the notebook as:

  • a PDF (main upload, unzipped)
  • an IPython notebook (additional, zipped)

Finally

  • Submission deadline is 26.10, 23:59 - on peergrade (class code: N8A46K)
  • Peer feedback deadline is 1.11, 23:59 - Provide constructive comments as you would like to receive them
  • In case of trouble/issues/questions, please write on Teams.

Solutions