How much data before eliminating markets?

eightbo · Mon Feb 11, 2019 6:46 pm

Created a bot for the 's

I'm letting it trade uninterrupted for a while to see how it gets on.

What's a sensible number of markets to enable meaningful evaluation of things like:
- Tuesday vs. Sunday
- Morning vs. Evening
- Newcastle vs. Doncaster
- Open Races vs. Handicaps

Cheers

LinusP · Mon Feb 11, 2019 7:24 pm

I think it depends on the strategy and, to be honest, I don't think any of the variables you have listed are valid for filtering markets. I know a lot of people do filter on venue / race type, but unless you have a reason why a strategy should or shouldn't work there, you are just overfitting.

The issue is that it is so easy to look at the P&L and start filtering; before you know it you have a "profitable" strategy, but 99% of the time you are just overfitting.
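To illustrate how filtering on P&L can manufacture a "profitable" strategy, here is a minimal sketch. Everything in it is an assumption made for illustration: the venue names, the 20 venues, the 150 bets per venue, and the even-money coin-flip bets with zero edge.

```python
import random

def simulate_venue_pnl(n_venues=20, bets_per_venue=150, seed=1):
    """Simulate a strategy with NO edge and return P&L per (hypothetical) venue."""
    rng = random.Random(seed)
    pnl = {}
    for v in range(n_venues):
        # Each bet is a fair even-money flip: +1 or -1 unit, expected value 0.
        pnl[f"venue_{v}"] = sum(rng.choice((1, -1)) for _ in range(bets_per_venue))
    return pnl

pnl = simulate_venue_pnl()
total = sum(pnl.values())

# "Filter" after the fact: keep only venues that happened to finish in profit.
winners = {venue: p for venue, p in pnl.items() if p > 0}
filtered_total = sum(winners.values())

print(f"all {len(pnl)} venues: {total:+d} units")
print(f"only the {len(winners)} 'profitable' venues: {filtered_total:+d} units")
```

The filtered total can never be worse than the unfiltered one, even though every venue has the same zero edge by construction, which is why the filtered figure says nothing about future results.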

An example I had recently: I was finding that different courses required slightly different parameters for optimum profit. After a lot of trial and error I worked out that it was actually down to another (course-specific) variable. Factoring this in, I was able to continue betting on every course with the same parameters. With this all being a numbers game, the more you can turn over the better!

ruthlessimon · Mon Feb 11, 2019 7:30 pm

The generic response will be 3+ months in sample, then 1 month (20-30%) out of sample. If that passes (which is a tough test to pass!), it goes to practice mode (a debatable step), then live with small stakes.
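A rough sketch of that in-sample / out-of-sample split, assuming the market records are already sorted oldest first. The function name, the 25% hold-out and the figure of 30 markets a day over 120 days are illustrative assumptions, not anyone's actual setup.

```python
def chronological_split(markets, out_fraction=0.25):
    """Split time-ordered records into (in_sample, out_of_sample).

    The most recent `out_fraction` of records is held back, so nothing
    tuned on the in-sample part has "seen" the out-of-sample part.
    """
    if not 0 < out_fraction < 1:
        raise ValueError("out_fraction must be between 0 and 1")
    cut = int(len(markets) * (1 - out_fraction))
    return markets[:cut], markets[cut:]

# Hypothetical data: market ids standing in for 30 markets/day over 120 days.
markets = list(range(30 * 120))
in_sample, out_sample = chronological_split(markets, out_fraction=0.25)

print(len(in_sample), "markets to develop on")
print(len(out_sample), "markets held back for the final test")
```

Splitting by time rather than at random matters here: a random split would let later markets leak into the data the parameters were tuned on.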

As a side, I'm uncomfortable with discrete variables being used for optimisation - because I personally like seeing how "minor changes" affect profitability. We can't do that if we use days/race type/course (although arguably we can, if they can be logically grouped)

ShaunWhite · Mon Feb 11, 2019 8:03 pm

This is one of those questions where if anyone here had a decent stats qualification, they could just give you a formula. I've seen them but I don't understand them.

It depends on the strike rate really. To explain what I mean, take an extreme case: if your bot backs 100/1 shots at 120/1 then sampling 300 markets won't tell you anything. If it's a coin flip, then 300 is a reasonable start. I'd guess a sample which includes 300 of the less likely outcome would be a fair number.
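One textbook-style way to put rough numbers on this is to ask how many unit-stake back bets it takes before the expected average profit sits about two standard errors above zero. This is only a sketch under stated assumptions: independent bets, a known true win probability, no commission, "100/1" and "120/1" read as decimal odds 101 and 121, and a hypothetical 2.1 price on the coin flip. The function name is made up for the example.

```python
import math

def bets_needed(true_prob, decimal_odds, z=1.96):
    """Approximate number of unit-stake back bets needed for the mean
    profit per bet to be z standard errors above zero."""
    mean = true_prob * decimal_odds - 1.0            # expected profit per bet
    if mean <= 0:
        raise ValueError("no positive edge at these odds")
    # Profit is decimal_odds * Bernoulli(p) - 1, so its variance is:
    variance = true_prob * (1.0 - true_prob) * decimal_odds ** 2
    return math.ceil(z * z * variance / mean ** 2)

# Backing a true 100/1 shot (p = 1/101) at 120/1 (decimal 121).
longshot = bets_needed(1 / 101, 121.0)

# Backing a fair coin flip at an assumed price of 2.1.
coinflip = bets_needed(0.5, 2.1)

print("longshot bets needed:", longshot)
print("coin-flip bets needed:", coinflip)
```

Under these assumptions the longshot case needs on the order of 14,000 bets against roughly 1,700 for the coin flip, and both figures move a lot with the size of the assumed edge.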

BUT...I have to concur with Linus and Simon about the dangers of overfitting. On the positive side you're gathering data using real money (I guess) so it's as reliable as you can get, assuming you're not using microstakes and encountering rounding issues with commission etc.

eightbo · Mon Feb 11, 2019 8:43 pm

LinusP wrote: ↑
Mon Feb 11, 2019 7:24 pm
I think it depends on the strategy and, to be honest, I don't think any of the variables you have listed are valid for filtering markets. I know a lot of people do filter on venue / race type, but unless you have a reason why a strategy should or shouldn't work there, you are just overfitting.

The issue is that it is so easy to look at the P&L and start filtering; before you know it you have a "profitable" strategy, but 99% of the time you are just overfitting.

An example I had recently: I was finding that different courses required slightly different parameters for optimum profit. After a lot of trial and error I worked out that it was actually down to another (course-specific) variable. Factoring this in, I was able to continue betting on every course with the same parameters. With this all being a numbers game, the more you can turn over the better!

Your advice is appreciated, Linus. I can't currently think of any reason to filter out markets; however, does comparing courses not highlight room for optimisation, as you mention? Also, if I feel that tweaking a parameter may be beneficial, am I expected to run multiple 'beta' instances of BA in practice mode alongside my live account and compare, or is there some better way of going about things?

eightbo · Mon Feb 11, 2019 8:48 pm

ruthlessimon wrote: ↑
Mon Feb 11, 2019 7:30 pm
The generic response will be 3+ months in sample, then 1 month (20-30%) out of sample. If that passes (which is a tough test to pass!), it goes to practice mode (a debatable step), then live with small stakes.

As a side, I'm uncomfortable with discrete variables being used for optimisation - because I personally like seeing how "minor changes" affect profitability. We can't do that if we use days/race type/course (although arguably we can, if they can be logically grouped)

Thank you. The automation is currently firing in around 30 markets / day if that helps.

eightbo · Mon Feb 11, 2019 8:50 pm

For some backstory I was sat on the loo today considering how the more you flip a coin, the closer your results will become to the true expectancy (0.5). I figure if your automation performs poorly in one market, this will become apparent as the number of markets traded rises. Or is that not relevant here...?
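The coin-flip intuition can be checked with a short simulation. This is a sketch only: the seed and the flip counts are arbitrary choices, and the standard error formula assumes independent flips of a fair coin.

```python
import math
import random

def heads_fraction(n_flips, seed=0):
    """Fraction of heads in n_flips simulated fair coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

def standard_error(n_flips, p=0.5):
    """Typical size of the gap between observed and true heads rate."""
    return math.sqrt(p * (1 - p) / n_flips)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} flips: heads = {heads_fraction(n):.4f}, "
          f"standard error = {standard_error(n):.4f}")
```

The standard error shrinks with the square root of the number of flips, so 100 times more flips only buys 10 times more precision.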

foxwood · Mon Feb 11, 2019 9:27 pm

eightbo wrote: ↑
Mon Feb 11, 2019 8:50 pm
the more you flip a coin, the closer your results will become to the true expectancy (0.5)

Mmmm - assuming a fair coin returning a purely random result, the EV of an infinite number of flips is generally accepted as being 50%

However, that does not mean (say) 300 heads in a row won't happen - presuming reversion to EV and that a tail must occur is what Gambler's Fallacy is all about https://en.wikipedia.org/wiki/Gambler%27s_fallacy

Current thinking (eg Ole Peters et al) is that these rare events do seem to occur more frequently than implied by odds etc and that means some people will be stuck on the downside of the event and unable to recover their position in their lifetime (eg 1930's, 2008 etc crashes)

Same goes for trading - no matter what the stats say has happened in the past the whole strategy may collapse with the next race if the strategy is based on selective random outcomes. If the strategy has a solid reason to work that you can define and understand then you have a genuine edge and should clean up until things change and the edge disappears - eg starting position at track x wins more than odds imply - edge wiped out if the rails get moved and the angles flattened / bookies wise up and shorten their odds (if they didn't know already!) / etc

If only it were so simple as it seems in the loo

ruthlessimon · Mon Feb 11, 2019 9:48 pm

eightbo wrote: ↑
Mon Feb 11, 2019 8:43 pm
Also, if I feel that tweaking a parameter may be beneficial, am I expected to run multiple 'beta' instances of BA in practice mode alongside my live account and compare or is there some better way of going about things?

This is why you should heed the following from Xitian - because you've just spotted why Peter's incorrect (or his position needs clarifying; cos I still don't get his viewpoint)

I can't think of anything worse, than to realise 6mths in; a new variable is needed - which can't be tested because we went live too early - & the data is dirty/biased to our original thought process - meaning it cannot be reused. Far better to have a malleable trading/backtesting sandbox, with (ideally, although impractical/difficult) every variable

xitian wrote: ↑
Tue Nov 20, 2018 1:15 pm
Writing a backtesting simulation system might take a couple months depending how much time you spend on it? Imagine how many years you could use it in future though, and how many ideas you can trial. Just make sure you keep some out of sample data, and make sure you know what assumptions you’re making when you backtest/simulate.

eightbo · Mon Feb 11, 2019 10:46 pm

foxwood wrote: ↑
Mon Feb 11, 2019 9:27 pm
Mmmm - assuming a fair coin returning a purely random result, the EV of an infinite number of flips is generally accepted as being 50%

However, that does not mean (say) 300 heads in a row won't happen - presuming reversion to EV and that a tail must occur is what Gambler's Fallacy is all about https://en.wikipedia.org/wiki/Gambler%27s_fallacy

That's right. Interestingly though, if you've already flipped 100k times, such a streak would only shift your observed heads rate by about 0.15 percentage points.
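The arithmetic behind that figure, as a quick sketch. It assumes the first 100k flips came out at exactly 50,000 heads, which the post does not state.

```python
def heads_rate_after_streak(prior_flips, prior_heads, streak):
    """Observed heads rate after adding a run of `streak` straight heads."""
    return (prior_heads + streak) / (prior_flips + streak)

before = 50_000 / 100_000                      # assumed: exactly half were heads
after = heads_rate_after_streak(100_000, 50_000, 300)
shift = after - before

print(f"before: {before:.4%}  after: {after:.4%}  shift: {shift:.4%}")
```

The same 300-head streak arriving after only 1,000 flips would move the observed rate far more, which is the point about early results being dominated by randomness.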

It seems sensible to assume your results have incurred a high level of randomness initially, but surely you only need to go so far until you're probably close enough to start justifying changes. If things seem drastically wrong in a particular area, I imagine I'll adjust much sooner than if they only seem moderately wrong.

I think we can all agree it would be too soon to tweak things after 1 market, and ineffective to tweak things after 50k markets. Ultimately the chips will fall where they may but you must admit it's entertaining to wonder where might be optimal...

ruthlessimon · Mon Feb 11, 2019 11:09 pm

eightbo wrote: ↑
Mon Feb 11, 2019 10:46 pm
I think we can all agree it would be too soon to tweak things after 1 market

Debatable; because "exemplar markets" can yield edges we haven't even contemplated

A 50tick lay loss on a single race (trading the current strategy); only to realise in hindsight the market had already drifted 50ticks - "I wonder what would've happened, long term, if we had backed that pattern?"

spreadbetting · Mon Feb 11, 2019 11:36 pm

eightbo wrote: ↑
Mon Feb 11, 2019 6:46 pm
Created a bot for the 's

I'm letting it trade uninterrupted for a while to see how it gets on.

What's a sensible number of markets to enable meaningful evaluation of things like:
- Tuesday vs. Sunday
- Morning vs. Evening
- Newcastle vs. Doncaster
- Open Races vs. Handicaps

Cheers

Seems to me you'll be backfitting your data no matter how many markets you let it run with those types of variables.

eightbo · Mon Feb 11, 2019 11:39 pm

ruthlessimon wrote: ↑
Mon Feb 11, 2019 11:09 pm

eightbo wrote: ↑
Mon Feb 11, 2019 10:46 pm
I think we can all agree it would be too soon to tweak things after 1 market

Debatable; because "exemplar markets" can yield edges we haven't even contemplated

A 50tick lay loss on a single race (trading the current strategy); only to realise in hindsight the market had already drifted 50ticks - "I wonder what would've happened, long term, if we had backed that pattern?"

That's too deep even for me

p.s. @MemphisFlash I see you. Let's get down to business I know that youuuuuuuuuuuuuuu, you've got what I neeeeeeeeeeed

eightbo · Mon Feb 11, 2019 11:51 pm

spreadbetting wrote: ↑
Mon Feb 11, 2019 11:36 pm
Seems to me you'll be backfitting your data no matter how many markets you let it run with those types of variables.

Hi there. Thank you for your input. LinusP has already mentioned this.
Kindly ignore these variables and replace them in your head with those you deem useful. At what point is it sensible to begin adjusting, in your opinion?

ruthlessimon · Tue Feb 12, 2019 1:14 am

eightbo wrote: ↑
Mon Feb 11, 2019 11:39 pm
That's too deep even for me

Too deep??

Come on 8; we're not on the Geek's forum any more - this is real trader talk
