Hi
I've been back-testing a strategy and it is showing a profit over the ~3300 races I've tested so far. However, the equity curve is "choppy" to say the least. If I filter out a particular set of courses it looks a lot better (see graph).
The problem I have is that I only knew which courses to filter out AFTER running the back-test. Is this legit? I can't help but think I've fallen into some mathematical/statistical trap by doing this and am fooling myself by using after-the-event information.
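The standard guard against exactly this trap is to pick the filter on one slice of the races and judge it only on a slice it never saw. A minimal sketch (made-up data and hypothetical course names, not the actual strategy) of choosing the excluded courses in-sample and scoring them out-of-sample:

```python
import random

random.seed(42)

# Hypothetical per-race records: (course, profit_or_loss) - made-up data.
courses = ["Wolverhampton", "Ascot", "York", "Newcastle"]
races = [(random.choice(courses), random.gauss(0, 1)) for _ in range(3300)]

# Split: first two thirds in-sample, last third held out.
split = 2 * len(races) // 3
in_sample, held_out = races[:split], races[split:]

# Choose the filter ONLY on the in-sample races: drop losing courses.
course_pl = {}
for course, pl in in_sample:
    course_pl[course] = course_pl.get(course, 0.0) + pl
excluded = {c for c, total in course_pl.items() if total < 0}

# Judge the filter ONLY on the held-out races it never influenced.
held_out_pl = sum(pl for c, pl in held_out if c not in excluded)
print(f"excluded: {excluded}, held-out P/L: {held_out_pl:.2f}")
```

If the filter still helps on the held-out third, it is far more likely to be a real effect than a back-fitted one.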
Any feedback much appreciated.
Cheers
Am I fooling myself?
-
- Posts: 3140
- Joined: Sun Jan 31, 2010 8:06 pm
When you're backtesting a system there'll always be an element of backfitting, so you'll always be able to boost the bottom line by filtering out certain variables. If you can figure out a reasonable explanation as to why removing those courses increases your profits, then you've probably found a worthwhile filter to add.
None of us know what you're doing so we can't second-guess it for you, but like gazuty says, it's horses for courses: certain courses will always play out better for different strategies because they have short run-ins etc.
The only thing I would add is; if you do amend your criteria, don't dismiss it forever.
I determined many years ago (based on a large sample set) that Wolverhampton (amongst a few others) was a negative course for me, so I set the bots to miss these out.
Much later I tried these courses again, only to find that Wolverhampton turned in some massive results. My only regret was that I hadn't tested them frequently, albeit for smaller stakes.
Regards
Peter
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
PeterLe wrote: ↑Sat Sep 15, 2018 2:55 pm
I determined many years ago (based on a large sample set) that Wolverhampton (amongst a few others) was a negative course for me, so I set the bots to miss these out. Much later I tried these courses again, only to find that Wolverhampton turned in some massive results. My only regret was that I hadn't tested them frequently, albeit for smaller stakes.

Isn't that a bit of a contradiction Pete? (i.e. basically your original sample wasn't big enough)
Out of interest, was the sample size different (smaller) when you tested Wolves a second time? Or was this retested on the full data set (i.e. combined with the data that suggested Wolves wasn't very good)?
& did you find a reason for the change in fortune @ Wolves?
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
Is there data missing from the "everything" line?
Whenever I filter a variable (i.e.course), I always see a drop in frequency - so I'm interested in how you've got them to both match
ruthlessimon wrote: ↑Sat Sep 15, 2018 4:51 pm
Isn't that a bit of a contradiction Pete? (i.e. basically your original sample wasn't big enough) Out of interest, was the sample size different (smaller) when you tested Wolves a second time? Or was this retested on the full data set (i.e. combined with the data that suggested Wolves wasn't very good)?

Hi Simon, no, not a contradiction; the initial sample set was high, not low.
I understand what you are saying though: if I had carried on for longer it would have turned a corner.
At the time there were a lot of track-side traders at Wolves (in the adjacent hotel) who were cleaning up; I don't recall when their advantage stopped though.
I'm just in my 11th year on Betfair, and I probably missed Wolves out for a couple of years midway through. At the moment I still trade on Wolves (all on auto), so the sample size is still ongoing.
Regards
Peter
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
Fair enough, yeah, that makes sense.
It's a question I've always had regarding my own data: whether it's worth weighting the newer data, to spot changes like Wolves quicker, because a full data set will be slow to change. But balancing that against recency bias is a total nightmare!
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
Here's an old strategy of mine - unfortunately (unlike SB & Peter), I couldn't think up a reason for the outperformance, so this got put on the back burner
I filtered it by the price of the fav (only 1, 2, 3 - hence why they won't add up to the full set)
I probably could've/should've equally started a similar thread (both in the same boat!)
- northbound
- Posts: 737
- Joined: Mon Mar 20, 2017 11:22 pm
I experienced something similar lately.
I've been playing with greyhound racing data and found a couple of straight backing strategies which were profitable every single month since May. At the same time, every single month they were unprofitable at Newcastle.
So, you might be onto something there. Not sure how this is going to pan out long term though, both for your strategy and mine. Also, something which has been profitable so far has no guarantee of being profitable in the future: market participants might change over time.
ruthlessimon wrote: ↑Sat Sep 15, 2018 5:14 pm
Is there data missing from the "everything" line? Whenever I filter a variable (i.e. course), I always see a drop in frequency - so I'm interested in how you've got them to both match

Simon, there's no data missing. I have a row for each race in Excel, and one of the columns is the profit/loss for the race. I create a new column which is the 'everything' cumulative profit/loss. When I 'filter' I don't use the Excel filter; instead I have another column with a formula that drives the profit/loss for races at the filtered courses to zero. Then I can simply plot both columns on the same chart.
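The zero-out-and-cumulate trick described above translates directly to pandas, for anyone working outside Excel. A minimal sketch with hypothetical column names and made-up figures:

```python
import pandas as pd

# Hypothetical per-race results: one row per race, profit/loss in 'pl'.
races = pd.DataFrame({
    "course": ["Wolverhampton", "Ascot", "Ascot", "Wolverhampton", "York"],
    "pl": [-2.0, 1.5, 0.8, -1.1, 2.2],
})

excluded = {"Wolverhampton"}  # courses to filter out

# 'Everything' curve: cumulative P/L over all races.
races["cum_all"] = races["pl"].cumsum()

# Filtered curve: drive excluded courses' P/L to zero, then cumulate.
# Both columns keep one row per race, so the two curves share the same
# x-axis and can be plotted together - same idea as the Excel formula.
races["cum_filtered"] = (
    races["pl"].where(~races["course"].isin(excluded), 0.0).cumsum()
)
```

Keeping the zeroed rows (rather than dropping them) is what makes the two lines line up race-for-race, which answers the frequency question above.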
Thanks to all for the feedback - much appreciated.
I'm continuing to run the back-test on more races - I want to run against a full year. The problem is my back-testing software is painfully slow so the information I seek is going to take some time to obtain... and I'm impatient
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
How long is a long time? I run my big backtests overnight. My 'full' test takes about 6 hours and is just about finishing when I get up again.
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
ShaunWhite wrote: ↑Sun Sep 16, 2018 5:36 pm
How long is a long time? I run my big backtests overnight. My 'full' test takes about 6 hours and is just about finishing when I get up again.

6hrs!?! lol - I moan if it takes anything longer than a couple of minutes
ruthlessimon wrote: ↑Sun Sep 16, 2018 5:45 pm
6hrs!?! lol - I moan if it takes anything longer than a couple of minutes

Lol
I kicked my back-test off at 01:30 on Friday. The 3300 races was a snapshot roughly 24 hours later.
Please tell me how you're doing it so quickly!!
I suspect it's the amount of data I'm processing that makes the difference. I log full market depth from 30 minutes out until the market is suspended, including all the in-play data. My back-test then involves replaying all that data and simulating the bet placement and matching.
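For readers unfamiliar with that style of back-test, the core loop is just replaying recorded ticks in order and checking whether a simulated order would have matched. A heavily simplified sketch (the tick format and matching rule here are hypothetical illustrations, not the poster's actual system, which also tracks depth and queue position):

```python
# Hypothetical recorded stream: (seconds_elapsed, best_back_price).
ticks = [(0.0, 3.10), (1.0, 3.20), (2.0, 3.25), (3.0, 3.15)]

def simulate_back_bet(ticks, target_price):
    """Replay the ticks in order; 'match' a simulated back bet at the
    first tick whose best back price reaches the target.
    Returns the matched price, or None if never matched."""
    for ts, back in ticks:
        if back >= target_price:
            return back
    return None

matched = simulate_back_bet(ticks, 3.25)
```

Replaying every tick of full depth for thousands of races is why this approach is thorough but slow compared with filtering a pre-aggregated spreadsheet.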
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
30 mins out + in-play! Blimey, yeah, that'd be a lot of data.
If I'm looking at a "specific group" - I will initially refine my full dataset (i.e. only Hcaps) - straight away that reduces the workload on Excel
But generally, I'll be working on 3mth samples, with only the top 4 runners - this usually (max) equals between 10,000 - 20,000 rows, 600 columns (5mins price, 5mins vol)
For me personally, the majority of speed issues seem to be related to inefficient formulas