Data Mining and Data Science: Exploration Using Data Visualization

Mark Beaumont MD

January 12, 2022

Mark Beaumont and Preeti Joshi

11/04/2019

Data provides the quantitative evidence that can be translated into information. That information, in turn, is analyzed to draw inferences and direct the strategic growth of firms. Significant amounts of data (referred to as “big data”) are being generated due to increased connectivity across digital platforms in our industrial environment. Many agree that data is the new oil [1].

The complexity of large datasets, with their multiple sources, interconnectivity and constant updates, makes it challenging to realize their full potential and derive useful insights. In addition, these large datasets cannot always be “read” by individuals who may not be well-versed in the art of data analysis. Consequently, data “visualization” has emerged as a way to understand complex datasets and quickly infer insights. This is both an art and a science, and it forms the basis for data-driven decision making. By using familiar concepts such as charts, graphs, and maps, data visualization tools provide an efficient way to identify and understand trends, outliers, and patterns in large datasets [2]. They do this by curating data into a format amenable to understanding and interpretation. A good visualization tells a story, removes the noise from data and highlights the useful and transformative information. The better you can convey your points visually, the better you can leverage that information [3].

For this assignment we have chosen to analyze the visualization presented at http://atlas.media.mit.edu/en/visualize/tree_map/hs92/export/bgr/all/show/2014/ 

What does the visualization do? What data are you exploring? Where does the data come from and who owns it?

The Observatory of Economic Complexity (OEC) is a data visualization tool that allows users to track trade information on countries. The tool is interactive and lets a user choose among a variety of visualization modes that present country information in isolation or as a network representing trade. The project was created in 2010 at the MIT Media Lab Macro Connections group, now Collective Learning.

The data depicted in these interactive graphs provide information about products that have been manufactured by specific countries. The data are presented in a treemap format. Treemapping is a method for displaying hierarchical (tree-structured) data as a set of nested figures, usually rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node’s rectangle has an area proportional to a specified dimension of the data. Often the leaf nodes are colored to show a separate dimension of the data [4].
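The nesting rule described above can be illustrated with a minimal sketch of the "slice-and-dice" layout, the simplest treemap algorithm. This is not necessarily the layout the OEC uses; the product names and values are hypothetical placeholders rather than OEC data, and the `Node` class and `layout` function are names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Node:
    name: str
    value: float = 0.0  # only meaningful for leaf nodes
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        """Value of a leaf, or the summed value of all leaves below a branch."""
        if not self.children:
            return self.value
        return sum(child.total() for child in self.children)


def layout(node: Node, x: float, y: float, w: float, h: float,
           depth: int = 0, out: Optional[Dict[str, Rect]] = None) -> Dict[str, Rect]:
    """Slice-and-dice layout: give each child a strip of its parent's
    rectangle sized by its share of the parent's total, alternating the
    split direction at each level of the hierarchy."""
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    total = node.total()
    if not node.children or total == 0:
        return out
    offset = 0.0  # fraction of the parent already allocated
    for child in node.children:
        share = child.total() / total
        if depth % 2 == 0:  # even depth: split along the x axis
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:               # odd depth: split along the y axis
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical two-level export hierarchy (illustrative values only).
exports = Node("exports", children=[
    Node("machinery", children=[Node("engines", 30), Node("pumps", 10)]),
    Node("textiles", children=[Node("shirts", 40), Node("coats", 20)]),
])

rects = layout(exports, 0.0, 0.0, 100.0, 100.0)
for name, (rx, ry, rw, rh) in rects.items():
    print(f"{name}: x={rx:.1f} y={ry:.1f} w={rw:.1f} h={rh:.1f} area={rw * rh:.0f}")
```

In this sketch each leaf's area works out to its value times a constant, which is the proportionality property described above. Production treemaps generally use layouts that keep rectangles closer to square, since long thin strips are harder to compare.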

The dataset details, by country and year, the products that are imported into and exported from a country, along with the monetary value of the products and their relative proportion of GDP. The data also list the bilateral movement of goods and services. Additional features include the option to view the information in a stacked format, which allows one to see all of the exported and imported products on one graph over time. The OEC also provides options to view imports and exports on geographic maps and as ring graphs, line graphs and scatter plots. When the color and size dimensions of a treemap are correlated in some way with the tree structure, a user can identify patterns that would otherwise be difficult to spot. An added advantage of treemaps is their efficient use of space. As a result, they can easily display thousands of items on the screen simultaneously.

What are the limits of the visualization? What questions do you wish it had answered that it didn’t?

A main disadvantage of the tool is the difficulty of obtaining an accurate view of the entire dataset for each country. Inferences have to be made about the data structure for each product. If the focus of the analysis is to understand how the product information is organized in a hierarchical tree, then a treemap may not be the most effective means of presenting the data, because it does not show the hierarchical levels as clearly as other charts that visualize hierarchical data. Another disadvantage is the difficulty of accurately comparing the relative sizes of the rectangles and the data they depict. Comparing areas is less accurate and less effective than comparing lengths.

How can you imagine extending this visualization to incorporate new or different data to add more interest to the visualization?

Additional ideas that could be incorporated into the visualization include: 

  • Adding a layer represented by “color gradients” that can show the percent distribution for each of the specific descriptors within an industry.
  • Swapping the dependent and independent variables, i.e., using industry as the fixed variable and then displaying the countries associated with trade in that industry.
  • Showing, for every country, the top 10 trade connections for imports and exports.
  • The addition of references to primary data sources.
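Two of the ideas above, fixing the industry and listing countries, and showing a country's top trade connections, amount to regrouping the same bilateral records. The sketch below shows one way this could be done. It assumes a hypothetical record layout of (exporter, importer, product, value) and uses placeholder data, not the OEC's actual schema or figures. The function names are likewise invented for illustration.

```python
from collections import defaultdict

# Hypothetical bilateral trade records: (exporter, importer, product, value).
# Placeholder codes and values only; not taken from the OEC dataset.
records = [
    ("A", "B", "P1", 50.0),
    ("A", "C", "P1", 20.0),
    ("A", "B", "P2", 30.0),
    ("A", "D", "P2", 10.0),
    ("B", "A", "P1", 5.0),
]


def top_partners(records, exporter, n=10):
    """Return the n largest export destinations of one country by total value."""
    totals = defaultdict(float)
    for exp, imp, _product, value in records:
        if exp == exporter:
            totals[imp] += value
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def exporters_by_product(records, product):
    """Fix the product (industry) and total the export value per country."""
    totals = defaultdict(float)
    for exp, _imp, prod, value in records:
        if prod == product:
            totals[exp] += value
    return dict(totals)


print(top_partners(records, "A", n=2))
print(exporters_by_product(records, "P1"))
```

The same grouping applied to importers instead of exporters would give the top import connections.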

 

Is this data stock or flow data? If stock, could it be converted to flow? What would need to happen?

The data presented in the visualization are “flow data”: each figure is the value of goods traded over the course of a year, and the dataset is updated once a year.

What is the most interesting insight you gained from this visualization?

The visualization depicts the interconnectivity of trade and how the global economy has expanded this network over time. In particular, it shows which industries have contributed to a country's increased trade and, subsequently, to increased GDP for its trade partners. This, in turn, indicates which industries have been profitable over time, where countries can strategically invest, and which industries are in the early stages of growth and so have potential for future growth.

These trends are particularly important to note because the global economy is now connected across several digital platforms that are disrupting traditional trading channels. As a result, growth predictions based on historic data should be interpreted with caution: accounting for the disruption that digital platforms have caused to economic growth will require additional datasets to be generated and integrated with the existing data.

 

References

  1. “The world’s most valuable resource is no longer oil, but data” (The Economist, May 6, 2017); https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data
  2. “Data visualization beginner’s guide: a definition, examples, and learning resources”; https://www.tableau.com/learn/articles/data-visualization
  3. “Data Visualization”; https://en.wikipedia.org/wiki/Data_visualization
  4. “Treemapping”; https://en.wikipedia.org/wiki/Treemapping
  5. “Observatory of Economic Complexity”; http://atlas.media.mit.edu/en/visualize/tree_map/hs92/export/bgr/all/show/2014/

Exhibit 1

Exhibit 2

Exhibit 3

Exhibit 4


Exhibit 5

Exhibit 6