I'm looking into the database design for a data warehouse kind of project which involves a large number of inserts daily. The data archives will be further used to generate reports. I will have a list of users (for example a user set of 2 million), for which I need to monitor daily social networking activities associated with them.
For example, let there be a set of 100 users say U1, U2, ..., U100.
I need to insert their daily status count into my database.
Consider the total status count obtained for a user U1 for period June 30 - July 6, is as follows
June 30 - 99 July 1 - 100 July 2 - 102 July 3 - 102 July 4 - 105 July 5 - 105 July 6 - 107
The database should keep daily status count of each user, like for user U1:
July 1- 1 (100-99) July 2- 2 (102-100) July 3- 0 (102-102) July 4- 3 (105-102) July 5- 0 (105-105) July 6- 2 (107-105)
Similarly the database should hold daily details of the full set of users.
And on a later phase I envision taking aggregate reports out of these data, like total points scored on each day, week, month, etc; and to compare it with older data.
I need to start things from scratch. I am experienced with PHP as a server side script and MySQL. I am confused on the database side. Since I need to process about a million insertions daily, what are all the things that should be taken care of?
I am confused on how to design a MySQL database in this regard. Which storage engine to use and which design patterns should be followed, keeping in mind the data could later used effectively with aggregate functions?
Currently I envision the DB design with one table storing all the user id's with a foreign key and separate status count table for each day.
Does MySQL fit my requirement? 2 million or more DB operations are done every day. How are the server and other things to be considered in this case?
Insertion queries should be capable of inserting 1-2 million inserts every day. (We don't have update here.)
1.Sum of statuses for the whole set of users.
2.Sum of statuses for set of users under a geographic location.
3.Comparing status counts with days/weeks/months.
-->I believe some kind of indexes are needed in this case, but I read indexes could slow down insertion.
-->Also I have heard MyISAM would be a better choice than InnoDB considering speed aspects.