Gathering data - Betfair only provides so much

HRacing · Wed May 24, 2017 10:14 pm

welshboy06 wrote: ↑
Wed May 24, 2017 7:23 pm

LinusP wrote: ↑
Wed May 24, 2017 7:07 pm

HRacing wrote: ↑
Wed May 24, 2017 5:09 pm
I did a quick search (no idea what the certifcates are ) and only found something very detailed. Is there a quick explanation or is it a long process? If it is leave it Liam as iv asked for more than enough advice from you this week!!
Yeh creating certificates is probably the hardest part of using the API, especially on windows!

https://github.com/liampauling/betfairl ... wiki/Setup

You will need an appKey as well, the delayed one is free but next to useless, its £299 for full access now
Yeah I'm glad I had the sense to apply before they started charging, even though I didn't have much use for it then!

Cheers for the help anyway lads, may have to revert to oldschool excel vba but that can be effective in its own way

welshboy06 · Wed May 24, 2017 10:38 pm

HRacing wrote: ↑
Wed May 24, 2017 10:14 pm

welshboy06 wrote: ↑
Wed May 24, 2017 7:23 pm

LinusP wrote: ↑
Wed May 24, 2017 7:07 pm

Yeh creating certificates is probably the hardest part of using the API, especially on windows!

https://github.com/liampauling/betfairl ... wiki/Setup

You will need an appKey as well, the delayed one is free but next to useless, its £299 for full access now
Yeah I'm glad I had the sense to apply before they started charging, even though I didn't have much use for it then!
Cheers for the help anyway lads, may have to revert to oldschool excel vba but that can be effective in its own way

Well to he honest, I'm not 100% sure about betfairs rules but I don't run my python data collecting app if I'm trading. Just in case. So I'm going to start collecting data in excel as well as try out some automation.
So excel isn't all bad

marksmeets302 · Thu May 25, 2017 9:06 am

I use a MySQL box on AWS for order and some market data but like you faced scalability issues when storing MarketBooks so I now zip up the raw json and store in s3 (AWS) for back testing / processing later.

Liam, have a look at mongodb. It's perfect for storing json objects: you read them in from betfair and without parsing just push them to the db. It's amazingly fast and seems to compress the data on its own. Just checked, my database holds 23 million objects and is now 330GB in size. Not much in this day and age. When backtesting you read the json objects again, but this time from the database and continue as you normally do.

welshboy06 · Thu May 25, 2017 1:28 pm

marksmeets302 wrote: ↑
Thu May 25, 2017 9:06 am

I use a MySQL box on AWS for order and some market data but like you faced scalability issues when storing MarketBooks so I now zip up the raw json and store in s3 (AWS) for back testing / processing later.
Liam, have a look at mongodb. It's perfect for storing json objects: you read them in from betfair and without parsing just push them to the db. It's amazingly fast and seems to compress the data on its own. Just checked, my database holds 23 million objects and is now 330GB in size. Not much in this day and age. When backtesting you read the json objects again, but this time from the database and continue as you normally do.

I've heard of mongodb and briefly looked at it. I'll take another look on the weekend though.
Is it an ORM or will I need to write complex sql queries to join tables and get the data into a pandas dataframe?
Or do you use something other than pandas?

Cheers

LinusP · Fri May 26, 2017 6:40 am

I've used mongo before, it seems to have a bad reputation when it comes to scalability though, do you store the full book or just the streaming update?

@Welshboy, its a NoSQL db so joins are considered harmful, not sure how tricky it would be to extract certain columns out into pandas from it.

marksmeets302 · Fri May 26, 2017 9:32 am

Full book, I haven't moved to streaming yet.

welshboy06 · Fri May 26, 2017 1:30 pm

LinusP wrote: ↑
Fri May 26, 2017 6:40 am
I've used mongo before, it seems to have a bad reputation when it comes to scalability though, do you store the full book or just the streaming update?

@Welshboy, its a NoSQL db so joins are considered harmful, not sure how tricky it would be to extract certain columns out into pandas from it.

Okay so I'm guessing you don't save your data in any sort of relational way? Just as json files which contain all the data you'd need to do your reporting?

I started planning a few tables and relationships out. Just to see if I could limit data redundancy and hopefully file size.

My current setup is to keep all the current race data in ram and write the json to a file at the end of a race. Im thinking I should have a file per race, then load the ones I want to analyse into pandas and take a look.

LinusP · Fri May 26, 2017 2:45 pm

I had speed issues when dumping json to a file so switched to dumping and writing per line and then zipping at the end of the race. I am going to wait for betfair to release their historical data and then switch to collecting just streaming data matching there format so I can use both with the same franework.

proffs1 · Mon Sep 18, 2017 5:41 pm

LinusP wrote: ↑
Mon May 08, 2017 6:26 am
If you have experience in python use my library, betfairlightweight:

https://github.com/liampauling/betfairlightweight

In order to use the API you need an app key, I recommend requesting one (delayed is free) you can then automate you account operations. But I think you will find the race card endpoint handy as it provides an interface to scraping the timeform data betfair displays on the website and does not require an app key.
Code: Select all
pip install betfairlightweight
Code: Select all
from betfairlightweight import APIClient

trading = APIClient(username='test', password='test', app_key='test')

trading.race_card.login()
race_card = trading.race_card.get_race_card(market_ids=['1.1234456'])

Hi Linus,

Amazing library. I'm not very familiar with Python and have been struggling a little bit but now I've managed to get the historical data into a mysql database with your python code.

Is it possible to also get data from "Market_definition" in a historical stream?

I currently use the following code:

Code: Select all

    def on_process(self, market_books):
        with open('output.txt', 'a') as output:
            for market_book in market_books:
                for runner in market_book.runners:
                    output.write('%s,%s,%s,%s,%s,%s,%s,%s\n' % (
                        market_book.publish_time, market_book.number_of_active_runners, market_book.market_id, market_book.status, market_book.inplay,
                        runner.selection_id, runner.total_matched, runner.last_price_traded or '',
                    ))

But would also like to get the sort_priority from MarketDefinitionRunner. I tried several things, but i suppose it would look something like this:

Code: Select all

    def on_process(self, market_books, market_definition):
        with open('output.txt', 'a') as output:
            for market_book in market_books:
                for runner in market_book.runners:
                   for runner in market_definition.runners:
                        output.write('%s,%s,%s,%s,%s,%s,%s,%s\n' % (
                            market_book.publish_time, market_definition_runner.sort_priority, market_book.number_of_active_runners, market_book.market_id, market_book.status, market_book.inplay,
                            runner.selection_id, runner.total_matched, runner.last_price_traded or '',
                        ))

Is that possible? If so, could you please give me some guidance?

Thank you very much in advance,

Proffs1

LinusP · Tue Sep 19, 2017 6:18 am

Glad you got it working, the runners are stored in a list so you have to either create a lookup or loop through. Using the original code I would do the following:

Code: Select all

runner_dict = {runner.selection_id: runner for runner in market_book.market_definition.runners}

If you put that at the top under the for market book loop you can then do the following:

Code: Select all

for runner in market_book.runners:
    runner_def = runner_dict.get(runner.selection_id)
    sort = runner_def.sort_priority

proffs1 · Tue Sep 19, 2017 10:30 pm

LinusP wrote: ↑
Tue Sep 19, 2017 6:18 am
Glad you got it working, the runners are stored in a list so you have to either create a lookup or loop through. Using the original code I would do the following:
Code: Select all
runner_dict = {runner.selection_id: runner for runner in market_book.market_definition.runners}
If you put that at the top under the for market book loop you can then do the following:
Code: Select all
for runner in market_book.runners:
    runner_def = runner_dict.get(runner.selection_id)
    sort = runner_def.sort_priority 

Wow, thank you very much. It works!

1clutch1 · Thu Dec 21, 2017 1:04 am

Hi There,

Is there any chance you can post the whole code?

I tried to do the same thing but got an error.

I'm not sure what sort variable does as my interpreter is saying its unused?
Thanks,

Code: Select all


    def on_process(self, market_books):
        with open('output.txt', 'a') as output:
            for market_book in market_books:
                runner_dict = {runner.selection_id: runner for runner in market_book.market_definition.runners}
                for runner in market_book.runners:
                    runner_def = runner_dict.get(runner.selection_id)
                    sort = runner_def.sort_priority

                   for runner in market_book.runners:
                    output.write('%s,%s,%s,%s,%s,%s,%s,%s,%s\n' % (
                    market_book.publish_time, market_definition_runner.sort_priority, market_book.number_of_active_runners, market_book.market_id, market_book.status, market_book.inplay,
                    runner.selection_id, runner.total_matched, runner.last_price_traded or '',
                    ))

LinusP · Thu Dec 21, 2017 7:31 am

What error are you getting? Looks like your indentation is off, python can be fussy.

1clutch1 · Sun Dec 31, 2017 9:05 pm

I checked the indentation and its fine. It's prob just the way I posted it in. I think I need to edit the base resource.py file to add the call into a dictionary

runner_def = market_def.runners_dict.get((runner.selection_id , runner.handicap, runner.event_name))
AttributeError: 'RunnerBook' object has no attribute 'event_name'

that's the error I'm getting with the following code

Code: Select all

from betfairlightweight import APIClient
from betfairlightweight.streaming import StreamListener, MarketStream
import os

class HistoricalStream(MarketStream):
    def __init__(self, listener):
        super(HistoricalStream, self).__init__(listener)
        print('Time,MarketId,Status,Inplay,sortPriority,runnerName,LastPriceTraded\n')



    def on_process(self, market_books):
        for market_book in market_books:
            for runner in market_book.runners:
                market_def = market_book.market_definition

                runner_def = market_def.runners_dict.get((runner.selection_id , runner.handicap, runner.event_name))
                print(runner_def.name , runner_def.handicap, runner.selection_id, runner.last_price_traded,
                      runner.event_name)


class HistoricalListener(StreamListener):
    def _add_stream(self, unique_id, stream_type):
        if stream_type == 'marketSubscription':
            return HistoricalStream(self)


apiclient = APIClient('aa', 'bb', 'cc')

stream = apiclient.streaming.create_historical_stream(
    directory='/Users/mac/PycharmProjects/xbot/sample2',
    listener=HistoricalListener(max_latency=1e100))
stream.start(async=False)

LinusP · Mon Jan 01, 2018 10:16 am

Event name is in the market definition:

Code: Select all

market_book.market_definition.event_name

Gathering data - Betfair only provides so much

Login • Register