Monday, September 8, 2008

Suggested Projects

  • The list below is only indicative.
  • Students are free to choose any other project, modify the topics, or use data sources other than those given below.
  • In fact, students are encouraged to do their own digging before finalizing a project. Make sure your team is convinced that something counter-intuitive (!!), non-trivial and useful can be unearthed.


1) A Joke of a Project

What can you find by Collaborative Filtering of User Ratings for Jokes? Available: 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users, collected between April 1999 and May 2003.

Dataset: http://goldberg.berkeley.edu/jester-data/
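
To get a feel for what collaborative filtering does, here is a minimal sketch of one common variant (user-based, with cosine similarity). The ratings, user ids and joke ids below are made up for illustration and are not taken from the Jester files; the function names are hypothetical as well.

```python
from math import sqrt

# Toy ratings on the -10.00 to +10.00 scale. Made-up values, not real data.
# A joke missing from a user's dict means the user has not rated it.
ratings = {
    "u1": {"j1": 8.0, "j2": -3.0, "j3": 5.0},
    "u2": {"j1": 7.0, "j2": -4.0, "j3": 6.0, "j4": 9.0},
    "u3": {"j1": -6.0, "j2": 5.0, "j3": -2.0, "j4": -8.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the jokes both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[j] * b[j] for j in common)
    norm_a = sqrt(sum(a[j] ** 2 for j in common))
    norm_b = sqrt(sum(b[j] ** 2 for j in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, joke):
    """Similarity-weighted average of the other users' ratings for `joke`.

    Returns None when nobody comparable has rated the joke.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or joke not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[joke]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None


# u1 agrees with u2 and disagrees with u3, so the guess for j4 comes out positive.
print(predict(ratings, "u1", "j4"))
```

A dissimilar user gets a negative weight here, so a joke that u3 hated counts in its favour for u1. Whether that is a sensible assumption for humour is exactly the kind of thing a project could test.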



2) Open Source Software Development

Study of the Growth Dynamics of OSS

Study of the Social Networks behind OSS Development

Clustering of OSS projects - What are the Salient types of differences, especially with respect to Proprietary Software Development?

Have a look at related work here: http://www.nd.edu/~oss/Papers/papers.html

Dataset available on demand: http://www.nd.edu/~oss/Data/data.html


3) Does Governance Matter?

Governance consists of the traditions and institutions by which authority in a country is exercised. This includes the process by which governments are selected, monitored and replaced; the capacity of the government to effectively formulate and implement sound policies; and the respect of citizens and the state for the institutions that govern economic and social interactions among them. Can you uncover insights on the aspects of governance essential for economic growth, for human development, etc.?

Dataset: http://info.worldbank.org/governance/wgi/index.asp


4) Telecom Regulation

Can you uncover some fundamental insights relating the regulatory governance of telcos to performance-related parameters of the industry?

Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699152~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html


5) Patterns in Wikipedia

Identifying Patterns in Editing and Article Creation. Is there a method to the madness of article creation and edits on Wikipedia?

Dataset: http://en.wikipedia.org/wiki/Wikipedia:Database_download


6) The Internet CD Database

Patterns in CD Releases (artists, releases, tracks etc.)

Patterns in User Generated Content Creation

Datasets: http://www.freedb.org/en/download__database.10.html

http://musicbrainz.org/doc/Database


7) IIMC Course Selection

Is there a Pattern in the Selection of Courses across batches? What inferences can be drawn about the types and behaviour of students?

Dataset: Extract from PGP Office


8) Course Selection, CGPA and Career

How does Course Selection affect CGPA? Does CGPA have a bearing on placement, career and success?

Dataset: Extract from PGP Office, Alumni Cell


9) Library Data Mining

Patterns in book issuance. Build a predictive model for future issuance.

Dataset: Contact IIMC Library


10) IPmsger

Simple Frequency Analysis. Co-Occurrence of certain Words.

Text Mining of Log: Is there a method to the IPmsg madness? Find Patterns and Insights on Campus Chat.

Dataset: IPMsger LogFile of past years.

You can use log-analyzers. See http://www.hypernews.org/HyperNews/get/www/log-analyzers.html
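
As a starting point for the frequency and co-occurrence analysis, here is a minimal sketch using only the Python standard library. The chat lines are made up and the real log format is not specified here, so the parsing step that turns a log file into a list of message strings is left as an assumption.

```python
from collections import Counter
from itertools import combinations
import re

# Made-up chat lines standing in for parsed log messages.
messages = [
    "anyone up for chai at the canteen",
    "chai and maggi at the canteen tonight",
    "exam tomorrow no chai tonight",
]


def tokenize(line):
    """Lower-case the line and split it into simple word tokens."""
    return re.findall(r"[a-z']+", line.lower())


def word_frequencies(lines):
    """Count how often each word appears across all messages."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def cooccurrences(lines):
    """Count unordered word pairs that appear in the same message."""
    pairs = Counter()
    for line in lines:
        words = sorted(set(tokenize(line)))  # sorted so each pair has one canonical order
        pairs.update(combinations(words, 2))
    return pairs


print(word_frequencies(messages).most_common(3))
print(cooccurrences(messages).most_common(3))
```

On a real log you would probably want to drop very common words ("at", "the", "and") before counting pairs, otherwise they dominate the top of the list.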


11) Entrepreneurship

What combination of factors leads to an entrepreneurial culture?

The 2007 World Bank Group Entrepreneurship Survey measures entrepreneurial activity in 84 developing and industrial countries over the period 2003-2005.

Dataset: http://www.ifc.org/ifcext/sme.nsf/Content/Entrepreneurship+Database


12) Financial Services

Finance for All? Find insights on Policies and Pitfalls in Expanding Access for the benefit of the many.

Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:21546633~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html


13) Financial Structure

Construction of financial structure indicators to measure whether a country's banks are larger, more active, and more efficient than its stock markets. These indicators can then be used to investigate the empirical link between the legal, regulatory, and policy environment and indicators of financial structure. They can also be used to analyze the implications of financial structure for economic growth.

Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20696167~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html


14) Bank Regulation and Supervision

Analysis of the impact of bank regulation on various dimensions of bank performance. Study the factors that determine the decisions countries make on the orientation of the regulatory environment, and draw policy conclusions.

Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20345037~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html


15) Economic Growth and Environmental Quality

Analysis of Linkages between growth and environmental quality

Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699819~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html


16) Commonalities in Controversial Pages

Can you find patterns in Most Frequently Edited Pages on Wikipedia over time?

What is the relationship between Page Views and Page Edits on Wikipedia?

Dataset: http://en.wikipedia.org/wiki/Wikipedia:Most_frequently_edited_pages

17) Small States, Small Problems?

Patterns in different Nations' Problems and their relation to Size.

Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699094~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html


18) Data Mining of Electricity Regulation Dataset

How do the different variables in electricity regulatory governance impact performance?

Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699165~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html


19) Fiscal Policy and Economic Growth

Investigating Interrelationships between the two.

Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699114~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html


20) Data Mining in Sports

Does past performance predict future performance? Using statistical data about the past, is it possible to build a model of predictive value?

Dataset: Play Football Manager 2008, use the SAV file created mid-season and predict the rest of the season!


21) Facebook

Interaction Patterns in Online Social Networks

Dataset: Dummy data can be obtained at: http://developers.facebook.com/fbopen/


22) Retail

Patterns in Buyer Behaviour.

Dataset: Get Data from Pantaloon, Big Bazaar, Monginis, etc. They would probably be willing, if sensitive information is blacked out.


23) Stock Markets

Are Closing Values telling us something valuable? Use Cluster Analysis to find stocks that move together. Warning: Successful Completion of this Project could cause you to become a gazillionaire and risk dropping out of the course.

Dataset: Dig around for SENSEX/NIFTY backtesting data.
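
One simple way to make "stocks that move together" concrete is sketched below. It computes daily returns from closing values, measures the Pearson correlation between each pair of stocks, and puts two stocks in the same group when their correlation crosses a threshold. This is a crude threshold-based grouping (connected components on a correlation graph) standing in for a proper cluster analysis; the tickers, prices and the 0.8 threshold are all made up for illustration.

```python
from math import sqrt

# Made-up closing prices for hypothetical tickers. Not real market data.
closes = {
    "AAA": [100, 102, 101, 105, 107],
    "BBB": [50, 51, 50.5, 52.5, 53.5],
    "CCC": [200, 198, 201, 197, 196],
}


def returns(prices):
    """Simple day-over-day returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def co_moving_groups(closes, threshold=0.8):
    """Group tickers whose return correlation is at least `threshold`."""
    rets = {ticker: returns(prices) for ticker, prices in closes.items()}
    parent = {ticker: ticker for ticker in closes}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    tickers = sorted(closes)
    for i, a in enumerate(tickers):
        for b in tickers[i + 1:]:
            if pearson(rets[a], rets[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for t in tickers:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


print(co_moving_groups(closes))
```

In this toy data BBB is just AAA at half the price, so the two land in one group while CCC stays on its own. With real SENSEX/NIFTY data the interesting questions are how to pick the threshold and whether the groups stay stable over time.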


24) Vandal Detection

Wikipedia accepts edits even from anonymous editors. Can you devise a model to identify the Vandal Edits automatically? (Have a look at http://www.research.ibm.com/visual/projects/history_flow/ for ideas)

Dataset: http://en.wikipedia.org/wiki/Wikipedia:Database_download


25) Website Optimization

Experiment with methods for predicting the next Web page a user will access

Dataset: Log Data from any website administrator. One possible source could be ISG.
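
One baseline worth trying first is a first-order Markov model: count which page most often follows each page, and predict that one. The sketch below assumes the log has already been split into per-visitor sessions (ordered lists of page paths); the sessions and paths are made up, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up sessions. A real server log would need to be parsed and split by visitor first.
sessions = [
    ["/home", "/courses", "/projects"],
    ["/home", "/courses", "/faculty"],
    ["/home", "/courses", "/projects"],
    ["/home", "/library"],
]


def build_transitions(sessions):
    """Count page-to-next-page transitions across all sessions."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the page most often seen after `page`, or None if it was never followed."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


transitions = build_transitions(sessions)
print(predict_next(transitions, "/home"))
print(predict_next(transitions, "/courses"))
```

Any fancier method (longer history, clustering users first, and so on) should at least beat this baseline on held-out sessions.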


26) News Mining

Mining Live News Data Streams for Patterns

Dataset: Any of the numerous news feeds.


27) Seeker behaviour on the Internet

Is there a pattern to the topics sought out by seekers of information on the Internet?

Dataset: Search Volumes Data from compete.com, alexa, etc. Detailed data from WikiStats about page views on Wikipedia: http://dammit.lt/wikistats/


28) The Role of Macroeconomic Factors in Growth

Is growth really negatively associated with inflation, large budget deficits, and distorted foreign exchange markets? Investigations and Insights required!

Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699104~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html


29) Usage of UNIX

What are the trends in the way people use UNIX?

Dataset: http://pages.cpsc.ucalgary.ca/~saul/wiki/pmwiki.php/Resources/DataSets


30) Usage of Web Browsers

How do users use web browsers? Are there identifiable patterns across clusters of users?

Dataset: http://pages.cpsc.ucalgary.ca/~saul/wiki/pmwiki.php/Resources/DataSets


31) Health Risk Analysis of Adolescents in India  

---- Live Project ----

What are the major health risks facing adolescents? Which psychographic, demographic and socio-cultural-economic factors can they be traced back to? Are there any trends visible?

Dataset: Survey in progress at a Hospital in Mumbai, Data available on demand.


Feel free to dig around for more. Interesting data is lying around in the unlikeliest places :)



You can start off by having a look here: http://kdd.ics.uci.edu/

Some very nice references are to be found here:

http://delicious.com/pskomoroch/dataset

http://www.datawrangling.com/some-datasets-available-on-the-web


Explore the datasets available.

Think. Ponder. Mull. 

Gaze intensely into the distance, hand on chin. 

Make sure you have a solid Rationale for what you plan to do, in your Project Proposal Document. 







