The social cartogram project
Have you seen a cartogram? Cartograms help visualize geographic data like no other tool. The actual sizes of geographical units are rescaled in proportion to some other value, such as a social or economic statistic. The magic is that even though the sizes change, the units roughly retain their shape and position, at least with respect to their neighbors.
See an example below (click here for more excellent examples).
This is a population cartogram. See how big India and China have become!
Very neat indeed, and it surely beats looking at Excel spreadsheets!
This blog has long been frustrated by the lack of data in Indian social policy. This lack of data prevents anyone from having a serious discussion on social issues. For the past week I have set aside an hour each day to collect and tabulate publicly available social data for India. The result is a massive spreadsheet, available here.
The input data consists of :
- NSSO Data : 86th issue of Sarvekshana, the journal of the National Sample Survey Organization (April 2001 – Sep 2001), published by the Ministry of Statistics and Programme Implementation, Government of India, New Delhi. Specifically pages 109-110. The website is at http://mospi.nic.in/mospi_nsso_rept_pubn.htm
- 2001 Census Data : Published by the Census Commissioner, Government of India, New Delhi. The website is at http://www.censusindia.net/
- Wikipedia / other Internet sources : The only piece of data used from Wikipedia was the land area of Indian states. The website is at http://en.wikipedia.org/wiki/List_of_states_of_India_by_area
Procedure : Making the data sheet
The first step was to tabulate the data from these diverse sources into a single spreadsheet.
- The 2001 census contains overall population data for every state and UT.
- The 2001 census also tabulates SC and ST populations for all states and UTs.
- The NSSO data covers only the major Indian states (Punjab, Haryana, Raj, UP, Bihar, Assam, West Bengal, Orissa, MP, Guj, Mah, AP, Karnataka, Kerala, and TN). So out of 28 states and 7 UTs, NSSO has data for 15 major states and no UTs.
- The NSSO data does not account for the new states of Uttaranchal, Jharkhand and Chattisgarh.
- The NSSO data gives the share of OBCs separately for rural and urban areas, so we need to combine the two into a single statewide figure. For example, if 10% of the rural population and 20% of the urban population of state X is OBC, we need the statewide rural and urban population counts to weight the two percentages and arrive at a final number. These counts can be found in the 2001 census.
- Using the above data, we can arrive at statewise ST/SC/OBC/FC percentages for the 15 states.
- For the remaining states and UTs, we do not have NSSO data. We can't do much about them.
- Some simple spreadsheet calculations give us this data sheet.
- For states that did not have NSSO data on OBC and FC (Others), we assumed the national average so that holes would not appear on the cartogram. This approach is debatable: the northeast states do not have many castes in the OBC group, and the figure for J&K may also be too high. Still, using the national average is better than guessing wildly for the missing states and UTs.
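The rural/urban combination and the national-average fill described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual spreadsheet formulas: the function names, the population figures for "state X" and the fallback value are all made up for the example.

```python
def combined_share(rural_pct, urban_pct, rural_pop, urban_pop):
    """Population-weighted statewide percentage from rural and urban percentages."""
    total = rural_pop + urban_pop
    if total <= 0:
        raise ValueError("total population must be positive")
    return (rural_pct * rural_pop + urban_pct * urban_pop) / total


def fill_missing(shares, fallback):
    """Replace missing (None) entries with a fallback such as the national average."""
    return {state: fallback if value is None else value
            for state, value in shares.items()}


# Hypothetical state X: 10% of rural and 20% of urban residents are OBC.
# The population counts are invented; the real ones come from the 2001 census.
state_x = combined_share(10.0, 20.0, rural_pop=750_000, urban_pop=250_000)

# Hypothetical table: state Y has no NSSO figure, so it gets the fallback.
# The fallback of 30.0 is a placeholder, not the real national average.
obc_share = fill_missing({"X": state_x, "Y": None}, fallback=30.0)
```

Note that the weights are head counts, not percentages. A plain average of 10% and 20% would give 15%, which overstates the result whenever the rural population is the larger of the two.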
Now what? We went through all that trouble only to produce the most unreadable output imaginable. It is too boring to even think about trying to understand it.
So, enter the cartogram.
Procedure : Making the cartogram
I had major trouble with this before figuring it out with some help.
- Step 1 : We need a GIS map of India with the states marked. This data usually comes in the form of a SHP file. I just googled for one and found it here http://www.vdstech.com/map_data.htm
- The problem with that file was that the data was old. The states of Jharkhand, Chattisgarh and Uttaranchal were missing. I tried searching for more recent files, but came up empty-handed.
- Since the SHP file did not have the three states mentioned above, we have to combine their data with that of the original states. So Uttaranchal data was added to UP, Jharkhand to Bihar, and Chattisgarh to MP. This may be problematic to a minor extent because Orissa also contributed some areas to Jharkhand.
- The next step was to find a tool capable of generating the cartogram. After trying a bunch of tools, I decided to use Mapresso http://www.mapresso.com/
- Before Mapresso can be used, the data has to be entered into the SHP file. This was the most frustrating and time-consuming task. I finally ended up using a tool called GeoDa https://www.geoda.uiuc.edu/
- Using the GeoDa table editor, I entered all the data from the spreadsheet. This was painful because I copy-pasted every single value. I am sure there is a better way to input data. This took a few days because I only blog an hour or so at a time.
- Finally, I had a SHP file with data. The next step was to convert it into a so-called PSC file that Mapresso understands.
- After much trouble with the tool shp2psc, I was able to create the input files for Mapresso.
- The next steps were easy. I generated the cartograms with 300-500 iterations of whatever algorithm Mapresso uses. The more iterations, the finer the map looks.
- After each map was generated, I captured it with Print Screen and pasted it into Paint.
- This was hard work. I decided to add a graphic to point to this blog (more hits please!).
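The step of folding the three new states back into their parent states can also be scripted rather than done by hand. Here is a minimal sketch in Python, assuming the data sheet is held as one row of raw head counts per state. The function name, the column names and the numbers are invented for the example; only the state-to-parent mapping comes from the procedure above.

```python
# Parent state for each state that is missing from the older SHP file.
PARENT = {
    "Uttaranchal": "Uttar Pradesh",
    "Jharkhand": "Bihar",
    "Chattisgarh": "Madhya Pradesh",
}


def fold_into_parents(counts, parent=PARENT):
    """Add each new state's counts, column by column, to its parent state's row."""
    merged = {}
    for state, row in counts.items():
        target = parent.get(state, state)  # states without a parent stay as-is
        bucket = merged.setdefault(target, {})
        for column, value in row.items():
            bucket[column] = bucket.get(column, 0) + value
    return merged


# Made-up counts, purely for illustration.
counts = {
    "Bihar": {"population": 100, "sc": 10},
    "Jharkhand": {"population": 50, "sc": 5},
    "Kerala": {"population": 30, "sc": 3},
}
merged = fold_into_parents(counts)
sc_pct_bihar = 100.0 * merged["Bihar"]["sc"] / merged["Bihar"]["population"]
```

The sketch sums head counts and only then works out a percentage. Adding or averaging the percentages of the two states directly would give the wrong answer unless both had the same population.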
Feel free to use the cartograms or data whichever way you choose. Just mention the original source of these cartograms. You know where you found them.
If you find any problems with the data or methodology, please leave a comment here.
Enjoy! See the separate post with all the cartograms.