Visualizing Income Factors using D3.js

Income Evaluation Visualization

Understanding the visualizations

Here, I used D3 to create two interactive charts that show relationships between variables in the income evaluation dataset. The first chart is a scatter plot of age and hours per week in which the size of each dot represents capital gain and its color represents income: blue dots are people whose income is greater than 50K, and orange dots are people whose income is less than or equal to 50K. The second chart is a bar chart that shows the relationship between education and capital gain. The two charts are linked through a mouseover event: when a user hovers over an element in one chart, the corresponding elements in the other chart are highlighted.
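The encodings and the hover link described above can be sketched in plain JavaScript, independent of the drawing code. This is only an illustration under stated assumptions: the field names (age, hoursPerWeek, capitalGain, education, income), the sample rows, the pixel bounds, and the choice of education as the linking key are all hypothetical and not taken from the original page.

```javascript
// Hypothetical sample rows; the field names are assumptions for illustration.
const rows = [
  { age: 39, hoursPerWeek: 40, capitalGain: 2174, education: "Bachelors", income: ">50K" },
  { age: 50, hoursPerWeek: 13, capitalGain: 0, education: "HS-grad", income: "<=50K" },
  { age: 28, hoursPerWeek: 45, capitalGain: 15024, education: "Bachelors", income: "<=50K" },
];

// Color encoding from the text: blue for income > 50K, orange otherwise.
function incomeColor(row) {
  return row.income === ">50K" ? "blue" : "orange";
}

// Map capital gain to a circle radius with a square-root curve, so that
// circle area (rather than radius) roughly tracks the gain.
// minR and maxR are hypothetical pixel bounds.
function makeRadiusScale(maxGain, minR = 2, maxR = 20) {
  if (maxGain <= 0) return () => minR; // avoid dividing by zero
  return (gain) => {
    const clamped = Math.min(Math.max(gain, 0), maxGain);
    return minR + (maxR - minR) * Math.sqrt(clamped / maxGain);
  };
}

// Linking rule (assumed): a dot is highlighted when it shares the
// education level of the hovered bar, and vice versa.
function isLinked(row, hoveredEducation) {
  return row.education === hoveredEducation;
}

const maxGain = Math.max(...rows.map((r) => r.capitalGain));
const radius = makeRadiusScale(maxGain);

for (const row of rows) {
  console.log(row.education, incomeColor(row), radius(row.capitalGain).toFixed(1));
}

// With D3 loaded in the page, the rule could be attached to the bars roughly
// like this (D3 v6+ passes (event, datum) to listeners; "bars" and "dots"
// are hypothetical selections, "highlight" a hypothetical CSS class):
//
//   bars.on("mouseover", (event, d) =>
//     dots.classed("highlight", (r) => isLinked(r, d.education)));
//   bars.on("mouseout", () => dots.classed("highlight", false));
```

Keeping the linking rule in a small pure function makes it easy to check the filter on a few rows before wiring it to mouse events.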

Observations

From a cursory look at the scatter plot, large capital gains (large circles) are concentrated between 30 and 70 hours per week and between ages 30 and 60. Above 80 hours per week and below 30 hours per week, capital gains are mostly small (small circles); the same applies to ages above 70. In the second visualization, the Preschool bar is entirely orange, which indicates that everyone in that group earns 50K or less. By contrast, people with qualifications such as Bachelors, Doctorate, HS-grad, and College are the most likely to earn above 50K and show the highest capital gains.

I also observed that some of the columns have missing data. For example, “native_country” has 583 missing values, “workclass” has 1836, and “occupation” has 1843. I ignored these missing values because those columns were not used in my visualizations.

One of the challenges I encountered was loading the dataset from my local directory in the web browser. I had to override the default settings of the Chrome browser to allow access. Alternatively, I set up a simple Python HTTP server, which let me view the HTML file and its directory from a browser without overriding the browser’s default settings.

Another challenge was deciding which features to use in the two visualizations and how to link them interactively. It took a number of tries to settle on the features, create the interactive link between the two visualizations, and correctly filter the data based on the hovered element. I used the D3 documentation and some online tutorials as my main resources for understanding the D3 library and finding the correct syntax for certain operations.
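The missing-value counts and the per-education comparison described in the observations above could be reproduced with a few lines of plain JavaScript once the rows are loaded. The following is a minimal sketch, not the code behind the charts: the sample rows are made up, and treating "?" (or an empty string) as the missing-value marker is an assumption about how the dataset encodes gaps.

```javascript
// Hypothetical sample rows; the real dataset has many more rows and columns.
const sample = [
  { workclass: "Private", occupation: "Sales", native_country: "United-States", education: "Bachelors", income: ">50K" },
  { workclass: "?", occupation: "?", native_country: "United-States", education: "Preschool", income: "<=50K" },
  { workclass: "Private", occupation: "Tech-support", native_country: "?", education: "Bachelors", income: "<=50K" },
];

// Assumption: a value counts as missing when it is absent, empty, or "?".
function isMissing(value) {
  if (value === undefined || value === null) return true;
  const text = String(value).trim();
  return text === "" || text === "?";
}

// Count missing values for each of the given columns.
function countMissing(rows, columns) {
  const counts = {};
  for (const col of columns) counts[col] = 0;
  for (const row of rows) {
    for (const col of columns) {
      if (isMissing(row[col])) counts[col] += 1;
    }
  }
  return counts;
}

// Fraction of rows with income > 50K for each education level.
function shareAbove50K(rows) {
  const tallies = new Map();
  for (const row of rows) {
    const tally = tallies.get(row.education) || { total: 0, above: 0 };
    tally.total += 1;
    if (String(row.income).trim() === ">50K") tally.above += 1;
    tallies.set(row.education, tally);
  }
  const shares = {};
  for (const [education, tally] of tallies) {
    shares[education] = tally.above / tally.total;
  }
  return shares;
}

console.log(countMissing(sample, ["native_country", "workclass", "occupation"]));
console.log(shareAbove50K(sample));
```

A share of 0 for an education level corresponds to an all-orange bar, such as the Preschool bar noted above.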

Resources used:

  • D3.js documentation: https://github.com/d3/d3
  • D3 Scatter Plot Example: https://bl.ocks.org/d3noob/6f082f0e3b820b6bf68b78f2f7786084
  • D3 Bar Chart Example: https://bl.ocks.org/d3noob/8952219
  • D3 Mouseover Event: https://www.d3-graph-gallery.com/graph/interactivity_hover.html
  • D3 Scale Functions: https://github.com/d3/d3-scale