Chapter 2 Data sources

Our group collected the data together from https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t. The dataset contains the current job postings available on the City of New York’s official jobs site: http://www.nyc.gov/html/careers/html/search/search.shtml. It includes both internal postings, which are available to city employees, and external postings, which are available to the general public.

We chose this dataset because it comes from the City of New York’s official website, which makes it reliable, representative, and up to date. It can therefore show useful information and trends about job postings in NYC.

This dataset has 2838 rows and 30 columns. Each row is one job posting, and each column holds one kind of information about that posting. The 30 variables are:

Variable (Data Type) Description
Job ID (num) Job Opening ID.
Agency (chr) Name of the City agency where the vacancy exists.
Posting Type (chr) Internal or External job posting. Internal postings are available to city employees; external postings are available to the general public.
#Of Positions (num) Number of vacancies to be filled.
Business Title (chr) Business title
Civil Service Title (chr) Civil Service title
Title Classification (chr) All Civil Service titles are grouped into four categories: Competitive
Title Code No (chr) Civil Service title Code
Level (chr) Civil Service title level
Job Category (chr)
Full-Time/Part-Time indicator (chr) Full time or Part time Job
Career Level (chr) Career Level Class
Salary Range From (num) Lower bound of the proposed salary range. (Can be 0)
Salary Range To (num) Upper bound of the proposed salary range. (Can be 0)
Salary Frequency (chr) Proposed salary frequency
Work Location (chr) Agency location.
Division/Work Unit (chr) Department/Division within the hiring agency.
Job Description (chr) Description of the job responsibilities for this position.
Minimum Qual Requirements (chr) Minimum qualifications required for position.
Preferred Skills (chr) Preferred skills for this position.
Additional Information (chr) Additional information provided by the hiring agency.
To Apply (chr) Instructions on how to apply for this position.
Hours/Shift (chr) Working hours and shift information.
Work Location 1 (chr) Specific work location for this opening.
Recruitment Contact (logi) Recruitment contact information.
Residency Requirement (chr) Residency requirements for this position.
Posting Date (chr) Date that the position was posted.
Post Until (chr) Date Posting will be removed. (Blank means post until filled)
Posting Updated (chr) Last Modification Date.
Process Date (chr) Dataset created date.
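
The checks implied by the table above (row and column counts, salaries that can be 0, and a blank “Post Until” meaning the posting stays open until filled) can be sketched in a few lines. This is only an illustration in Python using the standard library; it is not the code our analysis used. The sample rows are made up for the example and are not taken from the real dataset, and the helper names `load_rows` and `summarize` are hypothetical.

```python
import csv
import io

# Made-up sample rows for illustration only; NOT taken from the real dataset.
# Only a few of the 30 columns are included.
SAMPLE_CSV = """Job ID,Agency,Salary Range From,Salary Range To,Post Until
1001,AGENCY A,50000,70000,
1002,AGENCY B,0,0,01/31/2020
1003,AGENCY A,20,25,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows):
    """Count rows, columns, zero lower-bound salaries, and postings with a blank 'Post Until'."""
    n_cols = len(rows[0]) if rows else 0
    zero_salary = sum(1 for r in rows if float(r["Salary Range From"]) == 0)
    open_ended = sum(1 for r in rows if r["Post Until"].strip() == "")
    return {
        "rows": len(rows),
        "columns": n_cols,
        "zero_salary": zero_salary,
        "open_ended": open_ended,
    }


rows = load_rows(SAMPLE_CSV)
summary = summarize(rows)
print(summary)
```

Run against the full export, the same summary should report the 2838 rows and 30 columns described above, assuming the file is read with its header row intact.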

This data has two problems. First, some values are blank. Second, although the variable “Career Level” is listed as type chr, when one of our group members worked with it, it was treated as type num and its values could not be read. As a workaround, we created a new column, under a new name, to store its values.
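
A minimal sketch of both steps, counting blank values per column and copying “Career Level” into a new text column, is shown here in Python with the standard library. It illustrates the idea only and is not the code we actually ran. The sample rows are made up, and the names `blank_counts`, `add_text_copy`, and `career_level_text` are hypothetical.

```python
import csv
import io

# Made-up sample rows for illustration only; NOT taken from the real dataset.
SAMPLE_CSV = """Job ID,Career Level,Preferred Skills
1001,Experienced (non-manager),
1002,,Excel
1003,Manager,
"""


def blank_counts(rows):
    """Return the number of blank (empty or whitespace-only) values in each column."""
    counts = {col: 0 for col in rows[0]} if rows else {}
    for row in rows:
        for col, val in row.items():
            if val is None or val.strip() == "":
                counts[col] += 1
    return counts


def add_text_copy(rows, source, target):
    """Store the values of `source` in a new column `target`, always as text."""
    for row in rows:
        row[target] = str(row[source] or "")
    return rows


# csv.DictReader keeps every value as a string, so nothing is coerced to a number.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
blanks = blank_counts(rows)
rows = add_text_copy(rows, "Career Level", "career_level_text")
print(blanks)
print([r["career_level_text"] for r in rows])
```

Keeping the copy as text avoids any automatic type guessing, which appears to be what caused the mismatch between chr and num described above.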