Football Data (CSV, JSON) - UPDATED 16/08/17

Football, Soccer - whatever you call it. It is the beautiful game.
welshboy06
Posts: 148
Joined: Wed Mar 01, 2017 2:06 pm

Sun Aug 13, 2017 12:33 pm

doovd wrote:
Sun Aug 13, 2017 12:28 pm
I think the nature of the data lends itself more to json as there are often one to many relationships (e.g. game has many goals). Thanks for this!
Yes, which is exactly why I chose JSON. Plus there's a neat library for Python called jsonpickle, which lets me read and write my Python objects directly to a JSON file. Much smaller and less overhead than a database or even a CSV.

I believe the above JSON file has a slight error in some of the games' dates (it points to a Python object instead of showing the actual date). I've corrected this and will be uploading the fixed version soon, as well as a CSV copy of the BPL league.
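
For anyone who hits the same date problem, here is a minimal sketch of one way around it. It uses the standard-library json module rather than jsonpickle, and the field names and team names are made up for illustration; the real file's layout may differ.

```python
import json
from datetime import date

# Hypothetical record layout; the real file's field names may differ.
game = {
    "league": "BPL",
    "season": "2016-2017",
    "game_date": date(2016, 8, 13),
    "home_team": "Home FC",
    "away_team": "Away FC",
}


def encode_date(obj):
    """Turn date objects into ISO strings; reject anything else."""
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Cannot serialise {type(obj).__name__}")


def decode_game(record):
    """Convert the ISO date string back into a date after loading."""
    record = dict(record)
    record["game_date"] = date.fromisoformat(record["game_date"])
    return record


# The date is written as plain text instead of a reference to an object.
text = json.dumps(game, default=encode_date)
restored = decode_game(json.loads(text))
print(text)
```

Storing the date as an ISO string also keeps the file readable in a text editor and sortable once it is in a spreadsheet.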

Then I'll move on to scraping the other leagues.

Cheers,
Adam

spreadbetting
Posts: 889
Joined: Sun Jan 31, 2010 8:06 pm

Sun Aug 13, 2017 1:51 pm

Does it convert well to a MySQL database?

Tenable
Posts: 16
Joined: Sat Jul 16, 2016 4:04 pm

Sun Aug 13, 2017 2:15 pm

welshboy06 wrote:
Sun Aug 13, 2017 9:45 am
Hi All,

I've decided to start collecting football data, mainly because I have an interest in the sport and also because of a certain jonnyg throwing stats around in a very hard-to-read format.

The data is quite simple and was just scraped from HKJC.

The data is in JSON format, but you should be able to convert it to CSV and import it into Excel. I chose JSON as it reduces data duplication, is structured and can be read reasonably well by the human eye. It also works really well with Python (which I'm using to scrape and also analyse the data).
What I've noticed from his posts is that jonnyg is years behind the curve when it comes to data and analysis compared to most people, and especially the users of this forum. Something that has taken him 5 years and 10 hours a day of intensive manual typing, you have just done in a day or two, and in far greater detail. I'm guessing most of it was even automated while you were having your Sunday lunch :lol:

With the amount of readily available data for every sport that can be downloaded, scraped from a number of sources or even collected in real time, I'm lost as to why anyone would still be sitting doing this manually and collecting so little in the scheme of things.

jonnyg
Posts: 691
Joined: Wed Jan 18, 2017 8:11 pm

Sun Aug 13, 2017 2:49 pm

welshboy06 wrote:
Sun Aug 13, 2017 9:45 am
Hi All,

I've decided to start collecting football data, mainly because I have an interest in the sport and also because of a certain jonnyg throwing stats around in a very hard-to-read format.

The data is quite simple and was just scraped from HKJC.

The data is in JSON format, but you should be able to convert it to CSV and import it into Excel. I chose JSON as it reduces data duplication, is structured and can be read reasonably well by the human eye. It also works really well with Python (which I'm using to scrape and also analyse the data).
The data contains the following info.
League
Season
Game Date
Home Team
Away Team
And also Goals and Red cards (Player, Time and Team)
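
To illustrate the JSON-to-CSV conversion mentioned above, here is a rough sketch that flattens the one-to-many game/goal structure into one CSV row per goal. The key names ("goals", "minute" and so on), the teams and the players are assumptions made for the example, not the actual layout of the file.

```python
import csv
import io

# Hypothetical structure: one record per game with a nested list of goals.
games = [
    {
        "league": "BPL",
        "season": "2016-2017",
        "game_date": "2016-08-13",
        "home_team": "Home FC",
        "away_team": "Away FC",
        "goals": [
            {"player": "Player A", "minute": 8, "team": "Home FC"},
            {"player": "Player B", "minute": 47, "team": "Away FC"},
        ],
        "red_cards": [],
    },
]

GAME_FIELDS = ["league", "season", "game_date", "home_team", "away_team"]
GOAL_FIELDS = ["player", "minute", "team"]


def goals_to_rows(games):
    """Yield one flat row per goal, repeating the game columns each time."""
    for game in games:
        for goal in game["goals"]:
            row = {name: game[name] for name in GAME_FIELDS}
            row.update({name: goal[name] for name in GOAL_FIELDS})
            yield row


def write_csv(games, out):
    """Write the flattened goal rows to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=GAME_FIELDS + GOAL_FIELDS)
    writer.writeheader()
    writer.writerows(goals_to_rows(games))


buffer = io.StringIO()
write_csv(games, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Note the trade-off: the game columns are repeated on every goal row (the duplication JSON avoids), and a 0-0 game produces no rows at all, so a separate games sheet would be needed to keep those.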


At the moment I've only gotten around to scraping the BPL seasons that have all goal data on HKJC (04-05 to 16-17). I may also add corners; however, that would be the total for the game, not timings.

More Leagues and seasons will be coming soon, but I'm pretty busy with work etc atm.

The file is a .txt file stored in the zip below (couldn't upload .txt directly) and is less than 2 MB! So it should be easy to read and manipulate on anyone's setup.

Just to note: The data is all scraped from HKJC, so if there are any errors it would be down to the data they provided.

BPL.zip

Now for one of the main reasons I did this: @jonnyg asked me a question in another thread...
when you say easy ?

how easy ?





Well, the answer to that question is 3.5 goals per game (total goals / total games).

Since the 2011-2012 season, for games where the home team scored first and the first goal was scored on or before the 8th minute:
Total Games: 84
Total Goals: 294
Average Goals: 3.5
Min Goals: 1
Max Goals: 9
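
As a sketch of how a summary like this could be produced from the JSON data: the field names, the season string format ("2011-2012") and integer goal minutes are all assumptions here, and the sample games are made up, so this shows the shape of the query rather than the script actually used.

```python
def first_goal(game):
    """Return the earliest goal of a game, or None for a 0-0."""
    goals = game["goals"]
    return min(goals, key=lambda g: g["minute"]) if goals else None


def home_scored_first_by(game, cutoff=8):
    """True if the home team opened the scoring on or before `cutoff`."""
    goal = first_goal(game)
    return (goal is not None
            and goal["team"] == game["home_team"]
            and goal["minute"] <= cutoff)


def summarise(games, first_season="2011-2012", cutoff=8):
    """Goal totals for qualifying games from `first_season` onwards."""
    # Comparing seasons as strings assumes a fixed "YYYY-YYYY" format.
    totals = [len(g["goals"]) for g in games
              if g["season"] >= first_season and home_scored_first_by(g, cutoff)]
    if not totals:
        return None
    return {
        "games": len(totals),
        "goals": sum(totals),
        "average": sum(totals) / len(totals),
        "min": min(totals),
        "max": max(totals),
    }


def make_game(season, goals):
    """Build a made-up game; `goals` is a list of (minute, side) pairs."""
    teams = {"H": "Home FC", "A": "Away FC"}
    return {"season": season, "home_team": "Home FC", "away_team": "Away FC",
            "goals": [{"minute": m, "team": teams[s]} for m, s in goals]}


sample = [
    make_game("2011-2012", [(5, "H"), (60, "A"), (75, "H")]),  # qualifies
    make_game("2016-2017", [(8, "H")]),                        # qualifies
    make_game("2016-2017", [(3, "A"), (20, "H")]),             # away scored first
    make_game("2010-2011", [(2, "H")]),                        # season too early
    make_game("2012-2013", []),                                # 0-0
]
summary = summarise(sample)
print(summary)
```

Changing `<=` to `==` in `home_scored_first_by` would answer the narrower "exactly on 8 minutes" version of the question, which gives a much smaller set of games.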

Cheers,
Adam

the question was rather different


"can you tell me for example what is the average goal production since 2011-2012 in games where the home team in the PL opened the scoring on 8 minutes ?"

jonnyg
Posts: 691
Joined: Wed Jan 18, 2017 8:11 pm

Sun Aug 13, 2017 3:00 pm

PL 2017-2018 in games where the home team opened the scoring on 8 minutes or before > exactly 8 minutes will be in bold

4-3 3-3

2016-2017

3-1 2-1 4-0 5-0 1-0 4-2 4-0 2-0 3-1 4-2 1-2 3-1 3-4 1-1 6-3 1-0 1-1 1-0 2-2 1-4 1-3 2-2 4-2 1-0 4-0 2-1 3-2 3-1 4-0 6-1 2-0 3-0 1-2 4-2 2-4 3-1 2-0 2-1 1-1

28-5-6 (home wins-draws-away wins) < average goal production = 3.85
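
The 2016-2017 figures can be checked mechanically. A small sketch that parses the scorelines listed above and recomputes the record and the average (the function name is just for illustration):

```python
# The 2016-2017 scorelines exactly as listed in the post.
SCORES_2016_17 = (
    "3-1 2-1 4-0 5-0 1-0 4-2 4-0 2-0 3-1 4-2 1-2 3-1 3-4 1-1 6-3 1-0 1-1 "
    "1-0 2-2 1-4 1-3 2-2 4-2 1-0 4-0 2-1 3-2 3-1 4-0 6-1 2-0 3-0 1-2 4-2 "
    "2-4 3-1 2-0 2-1 1-1"
)


def summarise_scores(text):
    """Parse 'home-away' scorelines and return record plus goal average."""
    results = [tuple(int(n) for n in score.split("-")) for score in text.split()]
    home_wins = sum(1 for h, a in results if h > a)
    draws = sum(1 for h, a in results if h == a)
    away_wins = sum(1 for h, a in results if h < a)
    goals = sum(h + a for h, a in results)
    return {
        "games": len(results),
        "record": (home_wins, draws, away_wins),
        "goals": goals,
        "average": round(goals / len(results), 2),
    }


stats = summarise_scores(SCORES_2016_17)
print(stats)
# {'games': 39, 'record': (28, 5, 6), 'goals': 150, 'average': 3.85}
```

So the list holds 39 games and 150 goals, which agrees with the 28-5-6 record and the 3.85 average quoted.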

2015-2016

4-0 2-2 3-1 1-1 2-1 2-2 4-1 3-1 4-0 3-0 2-0 2-0 1-0 5-1 2-1 2-2 3-0 2-1 3-0 3-1 2-1 2-0 3-1 1-3 1-5 5-1 3-0 3-2 2-0

2014-2015

2013-2014

2012-2013

2011-2012


will double check the average goal production data at the end
Last edited by jonnyg on Sun Aug 13, 2017 3:29 pm, edited 7 times in total.

welshboy06
Posts: 148
Joined: Wed Mar 01, 2017 2:06 pm

Sun Aug 13, 2017 3:11 pm

spreadbetting wrote:
Sun Aug 13, 2017 1:51 pm
Does it convert well to a MySQL database?
I've not tried it myself, but I don't see why not.
All the fields are labelled, so it should be easy enough to map to tables. You'd probably need to write a quick Python script to spit out the relevant SQL.
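
A rough sketch of what such a script might look like. It uses the standard-library sqlite3 module as a stand-in for MySQL so it runs anywhere, and the table layout, column names and sample record are assumptions for illustration only.

```python
import sqlite3

# Hypothetical structure: one record per game with a nested list of goals.
games = [
    {
        "league": "BPL",
        "season": "2016-2017",
        "game_date": "2016-08-13",
        "home_team": "Home FC",
        "away_team": "Away FC",
        "goals": [
            {"player": "Player A", "minute": 8, "team": "Home FC"},
            {"player": "Player B", "minute": 47, "team": "Away FC"},
        ],
    },
]

SCHEMA = """
CREATE TABLE IF NOT EXISTS games (
    id INTEGER PRIMARY KEY,
    league TEXT, season TEXT, game_date TEXT,
    home_team TEXT, away_team TEXT
);
CREATE TABLE IF NOT EXISTS goals (
    id INTEGER PRIMARY KEY,
    game_id INTEGER REFERENCES games(id),
    player TEXT, minute INTEGER, team TEXT
);
"""


def load_games(conn, games):
    """Insert each game, then its goals keyed by the new game id."""
    conn.executescript(SCHEMA)
    for game in games:
        cur = conn.execute(
            "INSERT INTO games (league, season, game_date, home_team, away_team) "
            "VALUES (?, ?, ?, ?, ?)",
            (game["league"], game["season"], game["game_date"],
             game["home_team"], game["away_team"]),
        )
        game_id = cur.lastrowid  # links the goal rows back to this game
        conn.executemany(
            "INSERT INTO goals (game_id, player, minute, team) VALUES (?, ?, ?, ?)",
            [(game_id, g["player"], g["minute"], g["team"]) for g in game["goals"]],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_games(conn, games)
goal_count = conn.execute("SELECT COUNT(*) FROM goals").fetchone()[0]
print(goal_count)
```

The one-to-many relationship becomes a games table and a goals table joined on game_id; red cards would get a third table in the same way. For MySQL itself the SQL is much the same, although the common Python drivers use %s rather than ? as the placeholder and the auto-increment key is declared differently.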

welshboy06
Posts: 148
Joined: Wed Mar 01, 2017 2:06 pm

Sun Aug 13, 2017 3:13 pm

jonnyg wrote:
Sun Aug 13, 2017 3:00 pm
PL 2017-2018 in games where the home team opened the scoring on 8 minutes or before > exactly 8 minutes will be in bold

4-3 3-3

2016-2017
I don't understand what you're asking.
But I've made the data available to the forum, so I'm sure you could use it to prove your own theories.

spreadbetting
Posts: 889
Joined: Sun Jan 31, 2010 8:06 pm

Sun Aug 13, 2017 3:14 pm

Thanks, I'll give it a try after racing, and thanks again for putting it up.

Dallas
Posts: 5596
Joined: Sun Aug 09, 2015 10:57 pm

Sun Aug 13, 2017 3:16 pm

jonnyg wrote:
Sun Aug 13, 2017 2:49 pm
"can you tell me for example what is the average goal production since 2011-2012 in games where the home team in the PL opened the scoring on 8 minutes ?
Since 2011-2012 = 3.73
11-12 = 3.28
12-13 = 4.0
13-14 = 4.25
14-15 = 4.25
15-16 = 3.66
16-17 = 3.37

jonnyg
Posts: 691
Joined: Wed Jan 18, 2017 8:11 pm

Sun Aug 13, 2017 3:31 pm

There were many more games than 84 in the PL since 2011-2012 where the home team opened the scoring on or before 8 minutes :!:
