Hi
I've been back-testing a strategy and it is showing a profit over the ~3300 races I've tested so far. However, the equity curve is "choppy" to say the least. If I filter out a particular set of courses it looks a lot better (see graph).
The problem I have is that I only knew which courses to filter out AFTER running the back-test. Is this legit? I can't help but think I've fallen into some mathematical/statistical trap by doing this and am fooling myself by using after-the-event information.
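The standard guard against exactly this trap is to pick the filter on one slice of the races and judge it only on a slice it never saw. A minimal sketch (made-up data and hypothetical course names, not the actual strategy) of choosing the excluded courses in-sample and scoring them out-of-sample:

```python
import random

random.seed(42)

# Hypothetical per-race records: (course, profit_or_loss) - made-up data.
courses = ["Wolverhampton", "Ascot", "York", "Newcastle"]
races = [(random.choice(courses), random.gauss(0, 1)) for _ in range(3300)]

# Split: first two thirds in-sample, last third held out.
split = 2 * len(races) // 3
in_sample, held_out = races[:split], races[split:]

# Choose the filter ONLY on the in-sample races: drop losing courses.
course_pl = {}
for course, pl in in_sample:
    course_pl[course] = course_pl.get(course, 0.0) + pl
excluded = {c for c, total in course_pl.items() if total < 0}

# Judge the filter ONLY on the held-out races it never influenced.
held_out_pl = sum(pl for c, pl in held_out if c not in excluded)
print(f"excluded: {excluded}, held-out P/L: {held_out_pl:.2f}")
```

If the filter still helps on the held-out third, it is far more likely to be a real effect than a back-fitted one.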
Any feedback much appreciated.
Cheers
Am I fooling myself?
-
- Posts: 3140
- Joined: Sun Jan 31, 2010 8:06 pm
When you're backtesting a system there'll always be an element of backfitting, so you'll always be able to boost the bottom line by filtering out certain variables. If you can figure out a reasonable explanation as to why removing those courses increases your profits, then you've probably found a worthwhile filter to add.
None of us know what you're doing so we can't second-guess it for you, but like gazuty says, it's horses for courses: certain courses will always play out better for different strategies because they have short run-ins etc.
The only thing I would add is; if you do amend your criteria, don't dismiss it forever.
I determined many years ago (based on a large sample set) that Wolverhampton (amongst a few others) was a negative course for me, so I set the bots to miss these out.
Much later I tried these courses again, only to find that Wolverhampton turned in some massive results. My only regret was that I hadn't tested them frequently, albeit for smaller stakes.
Regards
Peter
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
PeterLe wrote: ↑Sat Sep 15, 2018 2:55 pm
I determined many years ago (based on a large sample set) that Wolverhampton (amongst a few others) was a negative course for me, so I set the bots to miss these out. Much later I tried these courses again, only to find that Wolverhampton turned in some massive results. My only regret was that I hadn't tested them frequently, albeit for smaller stakes.

Isn't that a bit of a contradiction Pete? (i.e. basically your original sample wasn't big enough)
Out of interest, was the sample size different (smaller) when you tested Wolves a second time? Or was this retested on the full data set (i.e. combined with the data that suggested Wolves wasn't very good)?
& did you find a reason for the change in fortune @ Wolves?
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
Is there data missing from the "everything" line?
Whenever I filter a variable (i.e.course), I always see a drop in frequency - so I'm interested in how you've got them to both match
ruthlessimon wrote: ↑Sat Sep 15, 2018 4:51 pm
Isn't that a bit of a contradiction Pete? (i.e. basically your original sample wasn't big enough) Out of interest, was the sample size different (smaller) when you tested Wolves a second time? Or was this retested on the full data set (i.e. combined with the data that suggested Wolves wasn't very good)?

Hi Simon, no, not a contradiction; the initial sample set was high, not low.
I understand what you are saying though: if I had carried on for longer it would have turned a corner.
At the time there were a lot of track-side traders at Wolves (in the adjacent hotel) who were cleaning up; I don't recall when their advantage stopped though.
I'm just in my 11th year on Betfair, and I probably missed Wolves out for a couple of years midway through. At the moment I still trade on Wolves (all on auto), so the sample size is still ongoing.
Regards
Peter
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
Fair enough, yeah, that makes sense.
It's a question I've always had regarding my own data: whether it's worth weighting the newer data, to spot changes like Wolves quicker, because a full data set will be slow to change. But balancing that against recency bias is a total nightmare!
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
Here's an old strategy of mine - unfortunately (unlike SB & Peter), I couldn't think up a reason for the outperformance, so this got put on the back burner
I filtered it by the price of the fav (only 1, 2, 3 - hence why they won't add up to the full set)
I probably could've/should've equally started a similar thread (both in the same boat!)
- northbound
- Posts: 737
- Joined: Mon Mar 20, 2017 11:22 pm
I experienced something similar lately.
I've been playing with greyhound racing data and found a couple of straight backing strategies which were profitable every single month since May. At the same time, every single month they were unprofitable at Newcastle.
So, you might be onto something there. Not sure how this is going to pan out long term though, both for your strategy and mine. Also, something which has been profitable so far has no guarantee of being profitable in the future: market participants might change over time.
ruthlessimon wrote: ↑Sat Sep 15, 2018 5:14 pm
Is there data missing from the "everything" line? Whenever I filter a variable (i.e. course), I always see a drop in frequency - so I'm interested in how you've got them to both match

Simon, there's no data missing. I have a row for each race in Excel, and one of the columns is the profit/loss for the race. I create a new column which is the 'everything' cumulative profit/loss. When I 'filter' I don't use the Excel filter; instead I have another column with a formula that drives the profit/loss for races at the filtered courses to zero. Then I can simply plot both columns on the same chart.
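The zero-out-and-cumulate trick described above translates directly to pandas, for anyone working outside Excel. A minimal sketch with hypothetical column names and made-up figures:

```python
import pandas as pd

# Hypothetical per-race results: one row per race, profit/loss in 'pl'.
races = pd.DataFrame({
    "course": ["Wolverhampton", "Ascot", "Ascot", "Wolverhampton", "York"],
    "pl": [-2.0, 1.5, 0.8, -1.1, 2.2],
})

excluded = {"Wolverhampton"}  # courses to filter out

# 'Everything' curve: cumulative P/L over all races.
races["cum_all"] = races["pl"].cumsum()

# Filtered curve: drive excluded courses' P/L to zero, then cumulate.
# Both columns keep one row per race, so the two curves share the same
# x-axis and can be plotted together - same idea as the Excel formula.
races["cum_filtered"] = (
    races["pl"].where(~races["course"].isin(excluded), 0.0).cumsum()
)
```

Keeping the zeroed rows (rather than dropping them) is what makes the two lines line up race-for-race, which answers the frequency question above.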
Thanks to all for the feedback - much appreciated.
I'm continuing to run the back-test on more races - I want to run against a full year. The problem is my back-testing software is painfully slow so the information I seek is going to take some time to obtain... and I'm impatient
- ShaunWhite
- Posts: 9731
- Joined: Sat Sep 03, 2016 3:42 am
How long is a long time? I run my big backtests overnight. My 'full' test takes about 6 hours and is just about finishing when I get up again.
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
ShaunWhite wrote: ↑Sun Sep 16, 2018 5:36 pm
How long is a long time? I run my big backtests overnight. My 'full' test takes about 6 hours and is just about finishing when I get up again.

6hrs!?! lol - I moan if it takes anything longer than a couple of minutes
ruthlessimon wrote: ↑Sun Sep 16, 2018 5:45 pm
6hrs!?! lol - I moan if it takes anything longer than a couple of minutes

Lol
I kicked my back-test off at 01:30 on Friday. The 3300 races was a snapshot roughly 24 hours later.
Please tell me how you're doing it so quickly!!
I suspect it's the amount of data I'm processing that makes the difference. I log full market depth from 30 minutes out until the market is suspended, including all the in-play data. My back-test then involves replaying all that data and simulating the bet placement and matching.
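For readers unfamiliar with that style of back-test, the core loop is just replaying recorded ticks in order and checking whether a simulated order would have matched. A heavily simplified sketch (the tick format and matching rule here are hypothetical illustrations, not the poster's actual system, which also tracks depth and queue position):

```python
# Hypothetical recorded stream: (seconds_elapsed, best_back_price).
ticks = [(0.0, 3.10), (1.0, 3.20), (2.0, 3.25), (3.0, 3.15)]

def simulate_back_bet(ticks, target_price):
    """Replay the ticks in order; 'match' a simulated back bet at the
    first tick whose best back price reaches the target.
    Returns the matched price, or None if never matched."""
    for ts, back in ticks:
        if back >= target_price:
            return back
    return None

matched = simulate_back_bet(ticks, 3.25)
```

Replaying every tick of full depth for thousands of races is why this approach is thorough but slow compared with filtering a pre-aggregated spreadsheet.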
- ruthlessimon
- Posts: 2094
- Joined: Wed Mar 23, 2016 3:54 pm
30 mins out + in-play! Blimey, yeah, that'd be a lot of data.
If I'm looking at a "specific group" - I will initially refine my full dataset (i.e. only Hcaps) - straight away that reduces the workload on Excel
But generally, I'll be working on 3mth samples, with only the top 4 runners - this usually (max) equals between 10,000 - 20,000 rows, 600 columns (5mins price, 5mins vol)
For me personally, the majority of speed issues seem to be related to inefficient formulas