That is not to say you will be done and dusted with the project in Six Minutes Flat (unless you are in any way related to this fellow).
The objective of the Six Minute Dream is to get your vision, your idea for the prospective project, out in class and to get feedback from your peers and professors.
While you are busy contemplating (gazing pensively into the far distance etc., as mentioned before) what project to take up, keep these two points in mind at all times:
Point a)
It should be useful, i.e. something non-trivial, valuable and, even better, counter-intuitive should emerge from your findings. For instance, hours and days of crunching terabytes of financial data on a monster of a data mining package should yield something more substantial than the glorious result "Companies go bankrupt because of lack of money".
Point b)
It should be feasible, i.e. within the bounds of possibility to complete in one term.
Good quality data must be available. (For instance, while "Finding Patterns in Classified Top Secret MOSSAD internal communications about Covert Operations" might seem an interesting project, it won't do well at point b. Unless, of course, you are this guy.)
Every team is expected to come up with a One Page Writeup about their dream project, upload it to this blog and present/discuss the same in class on 1st October. This presentation will be a 6-minute communication of your idea to the class and profs. Think of it as a Pre-Proposal for the Project.
No powerpoint slides for class presentation, just coherent ideas.
Guidelines for the One Page Writeup:
Mention:
1) Project Name, Team Number (you can go crazy and give your team a name!) + Team Members
2) The Problem Statement
3) Data Source (preferably the type of data available)
4) The Benefit / Utility : Who will potentially benefit from the insights mined? How?
5) Expected Outcomes: Gut feeling about what you expect might be the findings.
It is not expected that you know for sure which tool or technique you will use (of course, if you have some idea, feel free to mention it!). What is important is that you have a fair idea of the problem that you plan to solve. Know thy pain-point!
Submission Procedure:
Post this One-Page Write Up on the Blog:
Log In and Create a New Post with the Title
"TeamNo_ProjectName_6MinDream".
Copy-Paste the contents of the OnePage Doc to the Post and Press Publish Post.
(In case of any difficulties with blog access, you can e-mail me the one-page doc. Please name the file "TeamNo_ProjectName_6MinDream.doc".)
Timelines:
29 September, 09:59:59 - One Page Writeup up on the Blog
1 October - Presentation and Discussion of your idea in Class
This is the list of team co-ordinators for the BITT-1 projects.
Please get in touch with them, form your team (max 4 members per team, including the co-ordinator) and add your names to this list (edit this post after logging in to the blog).
(~ If you haven't yet, email me at myshkinonline (at) gmail (dot) com with your name and reg no to get an invite and editing rights).
After your team has been formed, think of the project you would like to take on.
Feel free to add it in to "Project Name" as you go along.
Your team will then come and present this (the "dream project"), informally, in class (Prof will confirm a date in class - most likely Monday).
~Happy Digging!
Note: I've learnt there have been some problems editing this post. The problem is that I need to grant authorship rights (to create posts) as well as admin rights (to edit other people's posts).
In case you have been sent an invite and still can't edit this post, accept the invite, log in and hang on. I'll be granting admin rights shortly.
There are some truly wonderful resources available on the www. "Why do people share knowledge? Why do experts in the field spend their time and effort making great tutorials and primers, to be given away for free?" is a real question. However, while wiser people are busy solving the mysteries of human motivation, you can get smart on data mining at places like these:
Here's a comprehensive list of DM blogs ("comprehensive listings" and aggregators are in the race to be the "most meta" among all metas, aggregating, and aggregating aggregators, and so on and so forth... BUT, never mind all that :) ):
Data Mining is an Adventure. There is tremendous pleasure in discovering something that is not apparent or, even better, in correcting and establishing as bunk something that initially seemed apparent and so-called "Common-Sense". For some inspiration, I would recommend you read (and most of you probably have already!) S. Levitt's "Freakonomics": http://en.wikipedia.org/wiki/Steven_Levitt http://pricetheory.uchicago.edu/levitt/home.html
Students are free to choose any other project, to modify the topics, or to use data sources other than those given below.
In fact, students are recommended to do their own digging before finalizing on a project. Make sure your team is convinced that something counter-intuitive (!!), non-trivial and useful can be unearthed.
1 ) A Joke of a Project
What can you find by Collaborative Filtering of User Ratings for Jokes? Available: 4.1 Million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users: collected between April 1999 - May 2003.
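To get a feel for what collaborative filtering on such ratings involves, here is a minimal sketch of one common variant (user-based filtering with cosine similarity). The user ids, joke ids and rating values are made up for illustration and are not taken from the actual dataset; only the -10.00 to +10.00 scale comes from the description above. A real project would need to deal with sparsity, scale and evaluation, which this toy ignores.

```python
from math import sqrt

# Hypothetical toy ratings on the -10.00 to +10.00 scale.
# User ids ("u1"...) and joke ids ("j1"...) are invented for this sketch.
ratings = {
    "u1": {"j1": 8.0, "j2": -3.0, "j3": 5.0},
    "u2": {"j1": 7.0, "j2": -4.0, "j3": 6.0, "j4": 9.0},
    "u3": {"j1": -6.0, "j2": 8.0, "j4": -7.0},
}


def cosine_similarity(a, b):
    """Cosine similarity over the jokes both users rated; 0.0 if none."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[j] * b[j] for j in common)
    norm_a = sqrt(sum(a[j] ** 2 for j in common))
    norm_b = sqrt(sum(b[j] ** 2 for j in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, joke):
    """Similarity-weighted average of other users' ratings for `joke`.

    Returns None when nobody comparable has rated the joke.
    """
    num = 0.0
    den = 0.0
    for other, their in ratings.items():
        if other == user or joke not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[joke]
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    # u1 has not rated j4; estimate it from u2 (similar taste) and u3 (opposite taste).
    print(predict(ratings, "u1", "j4"))
```

Note that a user with opposite taste still contributes: a negative similarity flips the sign of their rating, so u3 disliking a joke counts as weak evidence that u1 will like it.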
Study of the Social Networks behind OSS Development
Clustering of OSS projects: what are the salient types of differences, especially with respect to proprietary software development? Have a look at related work here: http://www.nd.edu/~oss/Papers/papers.html
Dataset available on demand: http://www.nd.edu/~oss/Data/data.html
3 ) Does Governance Matter?
Governance consists of the traditions and institutions by which authority in a country is exercised. This includes the process by which governments are selected, monitored and replaced; the capacity of the government to effectively formulate and implement sound policies; and the respect of citizens and the state for the institutions that govern economic and social interactions among them. Can you uncover insights on the aspects of governance essential for economic growth, for human development, etc?
Construction of financial structure indicators to measure whether a country's banks are larger, more active, and more efficient than its stock markets. These indicators can then be used to investigate the empirical link between the legal, regulatory, and policy environment and indicators of financial structure. They can also be used to analyze the implications of financial structure for economic growth.
Analysis of the impact of bank regulation on various dimensions of bank performance. Study the factors that determine the decisions countries make on the orientation of the regulatory environment, and draw policy conclusions.
Does past performance predict future performance? Using statistical data about the past, is it possible to build a model of predictive value?
Dataset: Play Football Manager 2008, use the SAV file created mid-season and predict the rest of the season!
21) Facebook
Interaction Patterns in Online Social Networks
Dataset: Dummy data can be obtained at: http://developers.facebook.com/fbopen/
22) Retail
Patterns in Buyer Behaviour.
Dataset: Get Data from Pantaloon, Big Bazaar, Monginis, etc. They would probably be willing, if sensitive information is blacked out.
23) Stock Markets
Are Closing Values telling us something valuable? With Cluster Analysis find out stocks that move together. Warning: Successful Completion of this Project could cause you to become a gazillionaire and risk dropping out of the course.
Dataset: Dig around for SENSEX/NIFTY backtesting data.
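As a rough illustration of "stocks that move together", the sketch below computes daily returns from closing values, measures the Pearson correlation between every pair, and links stocks whose correlation clears a threshold. This threshold grouping is a deliberately simple stand-in for a proper cluster analysis, and the tickers, prices and the 0.8 threshold are all invented for the example, not real SENSEX/NIFTY data.

```python
from math import sqrt

# Hypothetical closing prices: made-up tickers and values, not real market data.
closes = {
    "AAA": [100, 102, 101, 105, 107],
    "BBB": [50, 51, 50.5, 52.5, 53.5],
    "CCC": [200, 198, 199, 195, 193],
}


def returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if either is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def group_by_correlation(closes, threshold=0.8):
    """Link stocks whose return correlation is >= threshold (single linkage)."""
    rets = {ticker: returns(prices) for ticker, prices in closes.items()}
    groups = []
    for ticker in sorted(rets):
        # Find every existing group containing at least one strongly correlated stock.
        linked = [
            g for g in groups
            if any(pearson(rets[ticker], rets[other]) >= threshold for other in g)
        ]
        merged = {ticker}
        for g in linked:
            merged |= g
            groups.remove(g)
        groups.append(merged)
    return groups


if __name__ == "__main__":
    for group in group_by_correlation(closes):
        print(sorted(group))
```

Working on returns rather than raw closing values matters: two stocks at very different price levels can still move together, which is what the correlation of returns picks up.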
24) Vandal Detection
Wikipedia accepts edits even from anonymous editors. Can you devise a model to identify the Vandal Edits automatically? (Have a look at http://www.research.ibm.com/visual/projects/history_flow/ for ideas)
What are the major health risks facing adolescents? Which psychographic, demographic and socio-cultural-economic factors can they be traced back to? Are there any trends visible?
Dataset: Survey in progress at a Hospital in Mumbai, Data available on demand.
Feel free to dig around for more. Interesting data is lying around in the unlikeliest places :)
You can start off by having a look here: http://kdd.ics.uci.edu/
This is the official Project Blog for the Business Intelligence Tools and Techniques Course at IIM Calcutta. If you have taken this course and haven't yet become a contributor, shoot a mail to myshkinonline (at) gmail (dot) com for an invite.