Hoping someone could help me with this...
I'm fairly new to tennis trading (been trading only 2 months) and have found a few good sites for ratings/statistics. However, with the season now finished, I think it's a good time to look into setting up my own database, as this would allow me to run my own queries.
I've found point by point data online for all ATP matches over the last few years which comes in the following format:
SSRSRRSRRR;RSSSRS;RRRR;SRRSSS;SSRRSRSS;SRRRR;SRSRSS;SSRSS;RRRSSSSRRSRSSRRSRR.RSSRRSSRSRSS;
'S' represents a point won by the server, 'R' a point won by the receiver, ';' the end of a game and '.' the end of a set. The server in the first game is always Player1, as named in the point by point spreadsheet I have; the serve then passes to Player2 in the second game, and so on.
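As a rough illustration of the format described above, here is a minimal Python sketch that splits one match string into sets and games and attaches the server to each game. The function name `parse_match` and the player labels are made up for this example. It assumes the serve simply alternates every game, including across the set boundary, and that the string contains no tiebreaks, which would need their own handling.

```python
def parse_match(points, first_server="Player1", other="Player2"):
    """Split a point-by-point string into sets, each a list of (server, game) pairs.

    Assumption: serve alternates every game and there are no tiebreaks.
    """
    players = (first_server, other)
    sets = []
    game_no = 0  # running game count across the whole match
    for set_str in points.split("."):      # '.' ends a set
        games = []
        for game in set_str.split(";"):    # ';' ends a game
            if not game:
                continue                   # skip empty pieces left by trailing ';'
            games.append((players[game_no % 2], game))
            game_no += 1
        if games:
            sets.append(games)
    return sets


sample = ("SSRSRRSRRR;RSSSRS;RRRR;SRRSSS;SSRRSRSS;SRRRR;SRSRSS;SSRSS;"
          "RRRSSSSRRSRSSRRSRR.RSSRRSSRSRSS;")
for set_no, games in enumerate(parse_match(sample), start=1):
    for server, game in games:
        print(set_no, server, game)
```

Each (set, server, game) row printed here is roughly what one record in a games table could look like, so the raw strings would not need to be stored in any other shape.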
Does anybody have any suggestions as to where I would start with this data? In the long run I would want to run queries based on sets/games/points. For example, given a situation where a player is a set and a break up, how often would they fall at least 2 points behind on serve? Would the data need to be totally reformatted to carry out this type of query?
Obviously I don't expect a step by step guide, as I'm guessing this is a long-winded process, but anyone's input would be greatly appreciated.
Creating Tennis Database - Start Up
-
- Posts: 3140
- Joined: Sun Jan 31, 2010 8:06 pm
I think a lot of where to start will depend on what you already know computerwise, what you can do and what you want from the data.
The data's in a regular format, so certain things have been made easy for you to deduce: the last letter of each game, set or match tells you whether the server or the receiver won it. A lot of what you want seems to be simple pattern recognition, so it may be worth stripping your data down to basics such as the winners of each game, set and match rather than each point.
-
- Posts: 3
- Joined: Fri Oct 27, 2017 1:32 pm
See attached raw data file.
With regards to what I know computer-wise, I mainly use Excel. I've recently been looking to further my knowledge of databases, and have been having a play around with Access.
So looking at the spreadsheet, it already gives the final set scores and match winner. From the point data, I can also determine the winner of each game like you said simply by the last letter once splitting data out. Doing this for all matches and grouping the data I can calculate statistics on the data as a whole for match/set/game for each player. An example of this would be calculating average percentage holds/breaks.
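The hold/break calculation described above could be sketched in Python along these lines. The function name `hold_stats` and the `(player1, player2, points)` input shape are assumptions made for the example, not something taken from the spreadsheet. It also assumes every game in the string is complete and that there are no tiebreaks.

```python
from collections import defaultdict


def hold_stats(matches):
    """Return {player: fraction of service games held}.

    matches: iterable of (player1, player2, points_string), player1 serving first.
    Assumption: serve alternates every game; no tiebreaks or unfinished games.
    """
    served = defaultdict(int)
    held = defaultdict(int)
    for p1, p2, points in matches:
        players = (p1, p2)
        # Treat set ends like game ends, then drop the empty pieces.
        games = [g for g in points.replace(".", ";").split(";") if g]
        for i, game in enumerate(games):
            server = players[i % 2]
            served[server] += 1
            if game[-1] == "S":        # last point won by the server = hold
                held[server] += 1
    return {p: held[p] / served[p] for p in served}


sample = ("SSRSRRSRRR;RSSSRS;RRRR;SRRSSS;SSRRSRSS;SRRRR;SRSRSS;SSRSS;"
          "RRRSSSSRRSRSSRRSRR.RSSRRSSRSRSS;")
print(hold_stats([("Player1", "Player2", sample)]))
```

The break percentage for a player is then one minus the opponent's hold percentage over the same matches.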
The part which I'm struggling with is the patterns within the games. For example, if I wanted to find out the average break point percentage won for a given player, how would I go about this? A break point can happen at 0-40, 15-40, 30-40 or 40-A, which is four different scorelines. Then chuck in the number of possible combinations of points to get to those scores, and it becomes impossible to define a single sequence. So I'm guessing there is another way around this?
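One possible way around listing every sequence is to keep a running count of points for each side and work out the score before each point from those counts. A minimal Python sketch follows; `score_labels` is a made-up name, and it assumes standard games (no tiebreaks) with valid data.

```python
def score_labels(game):
    """Return a list of (score_before_point, point_winner) for one game.

    Scores are written server-first, e.g. '30-40'. Assumes a standard
    (non-tiebreak) game with no points recorded after the game is over.
    """
    names = ["0", "15", "30", "40"]
    s = r = 0                      # points won so far by server / receiver
    labels = []
    for point in game:
        if s >= 3 and r >= 3:      # deuce territory
            if s == r:
                label = "40-40"
            elif s > r:
                label = "A-40"
            else:
                label = "40-A"
        else:
            label = f"{names[s]}-{names[r]}"
        labels.append((label, point))
        if point == "S":
            s += 1
        else:
            r += 1
    return labels


print(score_labels("SRRRR"))
```

With the score attached to every point, break points are simply the rows where the label is 0-40, 15-40, 30-40 or 40-A, however the game got there.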
-
- Posts: 3140
- Joined: Sun Jan 31, 2010 8:06 pm
A break point can only occur once the receiver has won at least 3 points in the game (3 R's, not necessarily in a row) and is ahead of the server; the last letter then tells you whether the serve was broken. So the points from the third R onwards contain your break point data. Working backwards from the eventual game winner should give you all the A-40, 40-40 etc. scores.
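That rule can be checked with a simple counter rather than by matching letter patterns. Below is a rough Python sketch (the name `break_points` is invented for the example). It treats a point as a break point when the receiver already has at least 3 points and is ahead, and assumes complete, standard games with no tiebreaks.

```python
def break_points(game):
    """Return (break_points_faced, break_points_converted) for one service game."""
    s = r = 0      # points won so far by server / receiver
    faced = 0
    for point in game:
        if r >= 3 and r > s:       # receiver is one point from winning the game
            faced += 1
        if point == "S":
            s += 1
        else:
            r += 1
    # A game that ends on an 'R' was broken, so exactly one chance was converted.
    converted = 1 if faced and game[-1] == "R" else 0
    return faced, converted


for g in ["RRRR", "SSRSS", "RSSRRSSRSRSS", "RRRSSSSRRSRSSRRSRR"]:
    print(g, break_points(g))
```

Summing faced and converted over all of a player's return games gives the break point conversion percentage; summing over their service games gives break points saved.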
I was going to say something similar. If you look at the game matrix on Tennis Trader you have all possible scores on there and they can be reached via those paths. I would tend to look at the market from a problem-solving perspective. So it really depends on what you are trying to solve.