- The list below is only indicative.
- Students are free to choose any other project, or to modify the topics or use alternative data sources to those given below.
- In fact, students are encouraged to do their own digging before finalizing a project. Make sure your team is convinced that something counter-intuitive (!!), non-trivial and useful can be unearthed.
1 ) A Joke of a Project
What can you find by Collaborative Filtering of User Ratings for Jokes? Available: 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users, collected between April 1999 and May 2003.
Dataset: http://goldberg.berkeley.edu/jester-data/
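To get a feel for the collaborative filtering idea, here is a minimal sketch of user-based filtering. It assumes the ratings have already been loaded into a users-by-jokes matrix with None for unrated jokes; the matrix values are made up, and cosine similarity over co-rated jokes is only one of several reasonable similarity choices.

```python
import math

# Hypothetical toy matrix: rows are users, columns are jokes.
# Ratings lie in [-10, +10]; None marks a joke the user has not rated.
ratings = [
    [7.5, -3.0, None, 8.0],
    [6.0, -2.5, 4.0, 9.0],
    [-8.0, 5.0, -6.0, None],
    [7.0, None, 3.5, 7.5],
]

def similarity(u, v):
    """Cosine similarity computed only over jokes both users have rated."""
    common = [j for j in range(len(u)) if u[j] is not None and v[j] is not None]
    if not common:
        return 0.0
    dot = sum(u[j] * v[j] for j in common)
    nu = math.sqrt(sum(u[j] ** 2 for j in common))
    nv = math.sqrt(sum(v[j] ** 2 for j in common))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(ratings, user, joke):
    """Similarity-weighted average of other users' ratings for one joke."""
    num = den = 0.0
    for other, row in enumerate(ratings):
        if other == user or row[joke] is None:
            continue
        s = similarity(ratings[user], row)
        num += s * row[joke]
        den += abs(s)
    return num / den if den else 0.0

# Predict how user 0 would rate joke 2, which they have not rated.
print(round(predict(ratings, 0, 2), 2))
```

Dividing by the sum of absolute similarities keeps every prediction inside the -10 to +10 rating range.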
2 )
Study of the Growth Dynamics of
Study of the Social Networks behind
Clustering of
Dataset available on demand: http://www.nd.edu/~oss/Data/data.html
3 ) Does Governance Matter?
Dataset: http://info.worldbank.org/governance/wgi/index.asp
4 ) Telecom Regulation
Can you uncover some fundamental insights relating regulatory governance of telcos to performance related parameters of the industry?
Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699152~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html
5 ) Patterns in Wikipedia:
Identifying Patterns in Editing and Article Creation. Is there a method to the madness of article creation and edits on Wikipedia?
Dataset: http://en.wikipedia.org/wiki/Wikipedia:Database_download
6) The Internet CD Database
Patterns in CD Releases (artists, releases, tracks etc.)
Patterns in User Generated Content Creation
Datasets: http://www.freedb.org/en/download__database.10.html
7 ) IIMC Course Selection:
Is there a Pattern in the Selection of Courses across batches? What inferences can be drawn about type and behaviour of students?
Dataset: Extract from PGP Office
8 ) Course Selection, CGPA and Career:
How does Course Selection affect CGPA? Does CGPA have a bearing on placement, career and success?
Dataset: Extract from PGP Office, Alumni Cell
9 ) Library Data Mining
Patterns in book issuance. Build a predictive model for future issuance.
Dataset: Contact IIMC Library
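As a simple baseline for the predictive model, assuming issuance records can be aggregated into monthly counts per title (or per subject shelf), simple exponential smoothing gives a one-step-ahead forecast. The counts and the smoothing factor alpha = 0.5 below are illustrative assumptions, not library data.

```python
# Hypothetical monthly issue counts for one title.
monthly_issues = [12, 15, 9, 20, 18, 22, 17]

def exponential_smoothing(series, alpha=0.5):
    """Simple exponential smoothing; returns the forecast for the next month.

    alpha close to 1 trusts recent months; alpha close to 0 trusts history.
    """
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

print(round(exponential_smoothing(monthly_issues), 2))  # → 18.2
```

A model like this is only a benchmark: a more interesting project would ask whether term calendars, exam weeks or course registrations predict issuance better than the series' own history.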
10) IPmsger
Simple Frequency Analysis. Co-occurrence of certain Words.
Text Mining of Log: Is there method to the IPmsg madness? Find Patterns and Insights on Campus Chat.
Dataset: IPMsger log files of past years.
You can use log analyzers; see http://www.hypernews.org/HyperNews/get/www/log-analyzers.html
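A minimal sketch of the frequency and co-occurrence counts, assuming each log line has already been reduced to the text of one message (the actual IPMsger log format will need its own parsing step first). The sample messages and the stop-word list are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample: each entry is the text of one chat message.
log_lines = [
    "anyone up for dinner at the mess",
    "dinner at nine, mess is closing early",
    "quiz tomorrow? notes anyone",
    "notes for the quiz are on the server",
]

# Hypothetical stop-word list; a real chat log needs a much longer one.
STOP_WORDS = {"at", "the", "is", "for", "are", "on", "up"}

def tokens(line):
    """Lower-cased words of a message, with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", line.lower()) if w not in STOP_WORDS]

word_freq = Counter()
pair_freq = Counter()
for line in log_lines:
    words = tokens(line)
    word_freq.update(words)
    # Count each unordered pair of distinct words once per message.
    pair_freq.update(combinations(sorted(set(words)), 2))

print(word_freq.most_common(3))
print(pair_freq.most_common(3))
```

Raw pair counts favour common words; normalizing them (for example by pointwise mutual information) is a natural next step when looking for genuinely associated terms.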
What combination of factors leads to an entrepreneurial culture?
The 2007 World Bank Group Entrepreneurship Survey measures entrepreneurial activity in 84 developing and industrial countries over the period 2003-2005.
Dataset: http://www.ifc.org/ifcext/sme.nsf/Content/Entrepreneurship+Database
12) Financial Services
Finance for All? Find insights on Policies and Pitfalls in Expanding Access for the benefit of the many.
Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:21546633~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html
13 ) Financial Structure
Construction of financial structure indicators to measure whether a country's banks are larger, more active, and more efficient than its stock markets. These indicators can then be used to investigate the empirical link between the legal, regulatory, and policy environment and indicators of financial structure. They can also be used to analyze the implications of financial structure for economic growth.
Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20696167~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html
14) Bank Regulation and Supervision
Analysis of the impact of bank regulation on various dimensions of bank performance. Study the factors that determine the choices countries make about the orientation of their regulatory environment, and draw policy conclusions.
Analysis of Linkages between growth and environment quality
Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699819~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html
Can you find patterns in Most Frequently Edited Pages on Wikipedia over time?
What is the relationship between Page Views and Page Edits on Wikipedia?
Dataset: http://en.wikipedia.org/wiki/Wikipedia:Most_frequently_edited_pages
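A first pass at the views-versus-edits question could be a correlation computed on a log scale, since both quantities are heavily skewed. The article names and figures below are invented placeholders; real numbers would come from the edit list above and from page-view statistics.

```python
import math

# Hypothetical monthly figures per article: (page views, edits).
articles = {
    "Article A": (1_200_000, 950),
    "Article B": (300_000, 410),
    "Article C": (45_000, 60),
    "Article D": (8_000, 35),
    "Article E": (2_500, 4),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Compare on a log10 scale so a few very popular pages do not dominate.
views = [math.log10(v) for v, _ in articles.values()]
edits = [math.log10(e) for _, e in articles.values()]
print(f"log-log Pearson r = {pearson(views, edits):.2f}")
```

A strong correlation would only be a starting point; the more interesting question is which pages sit far off the trend line.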
Patterns in the problems faced by different nations, and their relationship with country size.
Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699094~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html
18) Data Mining of Electricity Regulation Dataset
How do the different variables in electricity regulatory governance impact performance?
Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699165~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html
19 ) Fiscal Policy and Economic Growth
Investigating the interrelationships between the two.
Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699114~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html
20) Data Mining in Sports
Does past performance predict future performance? Using statistical data about the past, is it possible to build a model of predictive value?
Dataset: Play Football Manager 2008, use the SAV file created mid-season, and predict the rest of the season!
21) Facebook
Interaction Patterns in Online Social Networks
Dataset: Dummy data is available at: http://developers.facebook.com/fbopen/
22) Retail
Patterns in Buyer Behaviour.
Dataset: Get data from Pantaloon, Big Bazaar, Monginis, etc. They would probably be willing to share it if sensitive information is blacked out.
Are Closing Values telling us something valuable? Use Cluster Analysis to find stocks that move together. Warning: Successful Completion of this Project could cause you to become a gazillionaire and risk dropping out of the course.
Dataset: Dig around for SENSEX/NIFTY backtesting data.
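A minimal sketch of the clustering idea, assuming aligned daily closing prices per ticker: compute daily returns, correlate them pairwise, and link any two stocks whose correlation exceeds a cut-off. Grouping the linked stocks this way is single-linkage clustering at that cut-off. The tickers, prices and the 0.8 threshold are hypothetical.

```python
import math

# Hypothetical closing prices over the same six trading days.
closes = {
    "BANK_A": [100, 102, 101, 104, 106, 105],
    "BANK_B": [50, 51, 50.4, 52.1, 53.0, 52.6],
    "IT_A": [200, 198, 203, 201, 199, 204],
    "IT_B": [80, 79.3, 81.1, 80.2, 79.5, 81.6],
}

def returns(prices):
    """Simple daily returns from consecutive closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def corr(xs, ys):
    """Pearson correlation of two equal-length return series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def cluster(closes, threshold=0.8):
    """Link stocks whose return correlation exceeds `threshold`, then
    return the connected groups (single linkage at that cut-off)."""
    rets = {t: returns(p) for t, p in closes.items()}
    tickers = list(rets)
    parent = {t: t for t in tickers}  # union-find forest

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for i, a in enumerate(tickers):
        for b in tickers[i + 1:]:
            if corr(rets[a], rets[b]) > threshold:
                parent[find(a)] = find(b)
    groups = {}
    for t in tickers:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())

print(cluster(closes))
```

Clustering returns rather than raw prices matters: two stocks that both trend upward can look correlated in price levels while moving independently day to day.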
24) Vandal Detection
Wikipedia accepts edits even from anonymous editors. Can you devise a model to identify the Vandal Edits automatically? (Have a look at http://www.research.ibm.com/visual/projects/history_flow/ for ideas)
Dataset: http://en.wikipedia.org/wiki/Wikipedia:Database_download
Experiment with methods for predicting the next Web page a user will access
Dataset: Log Data from any website administrator. One possible source could be ISG.
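One common baseline for next-page prediction is a first-order Markov model: count how often each page is followed by each other page across sessions, then predict the most frequent successor. This sketch assumes the server log has already been split into per-user sessions; the session data is invented.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream: each session is the ordered list of pages one user visited.
sessions = [
    ["/home", "/courses", "/courses/dm", "/projects"],
    ["/home", "/courses", "/courses/dm"],
    ["/home", "/news"],
    ["/courses", "/courses/dm", "/projects"],
]

def train(sessions):
    """Count page -> next-page transitions (a first-order Markov model)."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, page):
    """Most frequent successor of `page`, or None if it was never followed by anything."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

model = train(sessions)
print(predict_next(model, "/home"))        # → /courses
print(predict_next(model, "/courses/dm"))  # → /projects
```

Higher-order models (conditioning on the last two or three pages) are an obvious experiment, at the cost of much sparser counts.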
26) News Mining
Mining Live News Data Streams for Patterns
Dataset: Any of the numerous news feeds.
27) Seeker behaviour on the Internet
Is there a pattern to the topics sought out by seekers of information on the Internet?
Dataset: Search Volumes Data from compete.com, alexa, etc. Detailed data from WikiStats about page views on Wikipedia: http://dammit.lt/wikistats/
28) The Role of Macroeconomic Factors in Growth
Is growth really negatively associated with inflation, large budget deficits, and distorted foreign exchange markets? Investigations and Insights required!
Dataset: http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTRESEARCH/0,,contentMDK:20699104~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html
29) Usage of UNIX
What are the trends in the way people use UNIX?
Dataset: http://pages.cpsc.ucalgary.ca/~saul/wiki/pmwiki.php/Resources/DataSets
30) Usage of Web Browsers
How do users use web browsers? Are there identifiable patterns across clusters of users?
Dataset: http://pages.cpsc.ucalgary.ca/~saul/wiki/pmwiki.php/Resources/DataSets
31) Health Risk Analysis of Adolescents in India
---- Live Project ------
What are the major health risks facing adolescents? To which psychographic, demographic and socio-cultural-economic factors can they be traced? Are there any trends visible?
Dataset: Survey in progress at a Hospital in Mumbai, Data available on demand.
Feel free to dig around for more. Interesting data is lying around in the unlikeliest places :)
You can start off by having a look here: http://kdd.ics.uci.edu/
A very nice reference is to be found here:
Explore the datasets available.
Think. Ponder. Mull.
Gaze intensely into the distance, hand on chin.
Make sure you have a solid Rationale for what you plan to do, in your Project Proposal Document.