Chat GPT & Generative AI tools

A place to discuss anything.
Kai
Posts: 6230
Joined: Tue Jan 20, 2015 12:21 pm

Anyone know where to order one of these? https://www.youtube.com/shorts/Tjss7uFrYmg

Asking for a friend, cheers
paspuggie48
Posts: 640
Joined: Thu Jun 20, 2013 9:22 am
Location: South-West

Euler wrote:
Thu Feb 15, 2024 9:17 pm
Just watched a presentation on Sora, it's amazing:

https://openai.com/sora
I saw a tweet about Sora and have to admit I thought the video it was showing was real people walking down a street in Japan.
I couldn't believe it was AI-generated, it was so realistic.
Incredible!
sniffer66
Posts: 1681
Joined: Thu May 02, 2019 8:37 am

Just playing around, but GPT4 with Advanced Data Analysis is a bit of a game changer.
What would have taken me an age to code was done in minutes.

The code snippet you've shared outlines a process for loading a pre-trained neural network model, preparing a dataset, making predictions, and calculating the profit and loss (P&L) of a betting strategy based on those predictions. Here's a breakdown of the key steps in the script and what they accomplish:

Import Libraries: Essential libraries for data manipulation, machine learning model handling, and metrics calculation are imported.

Load the Trained Model: The script loads a previously trained and saved neural network model using joblib.load.

Prepare the Dataset:
- The dataset is loaded into a pandas DataFrame.
- Rows with missing values are removed to ensure data quality.
- The 'Date of Race' column is converted to datetime format and, together with the 'Time' column, used to create a unique 'Race ID' for each race.
- The 'Betfair SP' column is converted to numeric, with non-numeric values coerced to NaN and subsequently dropped.
- Categorical variables are encoded using label encoding to transform them into a format suitable for machine learning models.
- The dataset is split into features (X) and the target variable (y), with the target variable being binary (1 for a win, 0 otherwise).

Feature Scaling: The features are scaled using StandardScaler to normalize the data, a crucial step before feeding them into many machine learning algorithms, including neural networks.

Make Predictions:
- The scaled features are used to make predictions with the loaded model.
- The model's accuracy and a classification report are printed, providing insight into the model's performance on the dataset.

Calculate P&L:
- The script calculates the total profit and loss of a betting strategy in which a bet is placed on the horse with the highest predicted probability of winning in each race, provided its Betfair Starting Price (BSP) is greater than 2.0.
- The script iterates through each race, selects the top-rated horse if its BSP is above the threshold, and calculates the P&L based on the outcome.

Print Results: Finally, the total P&L and the number of horses backed according to the strategy are printed, giving an overview of the strategy's financial outcome.

This script provides a comprehensive approach to evaluating a betting strategy using machine learning predictions.
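
For anyone curious, the P&L step described above can be sketched in a few lines of pandas. This is only an illustration: the actual script isn't shown in the post, and the column names ('Race ID', 'p_win', 'Betfair SP', 'Win') are assumptions based on the description.

```python
import pandas as pd

def race_pnl(df, min_bsp=2.0, stake=1.0):
    """Back the horse with the highest predicted win probability in each
    race, but only if its Betfair SP exceeds min_bsp.
    Returns (total P&L, number of horses backed)."""
    pnl, backed = 0.0, 0
    for _, race in df.groupby("Race ID"):
        pick = race.loc[race["p_win"].idxmax()]
        if pick["Betfair SP"] > min_bsp:
            backed += 1
            # A winner pays (BSP - 1) * stake; a loser costs the stake
            pnl += (pick["Betfair SP"] - 1) * stake if pick["Win"] == 1 else -stake
    return pnl, backed

# Tiny worked example: two races, one winning pick and one losing pick
df = pd.DataFrame({
    "Race ID":    ["r1", "r1", "r2", "r2"],
    "p_win":      [0.6, 0.4, 0.7, 0.3],
    "Betfair SP": [3.0, 5.0, 4.0, 6.0],
    "Win":        [1, 0, 0, 1],
})
pnl, backed = race_pnl(df)
print(pnl, backed)  # → 1.0 2 (the r1 pick wins +2.0, the r2 pick loses -1.0)
```

With real data you'd first fill df["p_win"] from model.predict_proba on the scaled features, as the breakdown above describes.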
jimibt
Posts: 3675
Joined: Mon Nov 30, 2015 6:42 pm
Location: Narnia

sniffer66 wrote:
Mon Feb 26, 2024 11:46 am
Just playing around, but GPT4 with Advanced Data Analysis is a bit of a game changer
What would have taken me an age to code was done in minutes
...

This script provides a comprehensive approach to evaluating a betting strategy using machine learning predictions
Stu - yeah, these tools are amazing when used in combo with an objective view on what your features should be. I've worked extensively in this area, both with work-based issues in the past and in the forex markets as we speak.

Key takeaways for me regarding features:

1. Try to use features that fall neatly into boxed ranges. So for example, if using back or lay odds as a feature, use them as a book %, rather than straight decimal odds.
2. Other range-based metrics (in terms of forex) I use: "RsiValue", "AtrValue", "ADXValue", "AdxrValue", "DiPlusValue", "Vwma", "Tdfi".
3. At all costs, avoid using simple boolean values in features, as the range is just too narrow to allow for comprehensive transformation.
4. Carefully review the Transformer that you train against your model. There are a ton to choose from. For my uses, I tend to use KMeansTrainer, as I'm looking for data clustering.
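
Point 1 in practice: converting decimal prices into implied book percentages is a one-liner. The odds below are made up purely for illustration.

```python
def implied_pct(decimal_odds):
    """Implied probability (book %) of a decimal price, e.g. 4.0 -> 25.0."""
    return 100.0 / decimal_odds

# Hypothetical back prices for a six-runner race
back_prices = [2.5, 4.0, 5.0, 10.0, 16.0, 20.0]
book = [implied_pct(p) for p in back_prices]
print([round(b, 2) for b in book])  # → [40.0, 25.0, 20.0, 10.0, 6.25, 5.0]
print(f"over-round: {sum(book):.2f}%")  # → over-round: 106.25%
```

The book % values all live in a bounded [0, 100] range, whereas raw decimal odds have a long, unbounded tail - which is the point being made above.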

This of course is the tip of the rabbitBerg (or is it an iceHole :D), and selecting timeframes and other timescaled sequencing also really affects the quality of the predictions.

Not sure how you plan on applying it, but in forex I typically use 4 timeframes simultaneously (m5, m30, h4, d1) to ATTEMPT to arrive at a confluence on the labels on all 4. Let's just say that I'm having varying (and at times lacklustre) success. But you definitely start to see improvements as you determine the features that represent your goals better. :geek:

If I'm honest, I spend FAR too much time analysing this stuff and as such it can become a never-ending quest - there is a middle ground!! :lol:

I can say tho that this space has become a lot richer since the days when I tried modelling on horse racing 3-4 years ago - such is the speed of progress in this field.

Be interested to hear how your progress goes!!
jimibt
Posts: 3675
Joined: Mon Nov 30, 2015 6:42 pm
Location: Narnia

FYI - thought I'd ask ChatGPT about KMeansTrainer, rather than describe it myself. The reason I use this one mostly is due to pattern recognition:

>>>>>
The KMeansTrainer in ML.NET is primarily used for clustering tasks. Let's dive into the details:

Clustering Task:
- K-means is a popular clustering algorithm that groups data points into clusters based on their similarity.
- The goal is to minimize the within-cluster sum of squared distances.
- It's commonly used when you want to discover natural groupings or patterns within your data.
- Unlike supervised learning, clustering doesn't require labeled data; it identifies structure purely from the input features.

KMeansTrainer Characteristics:
- Input features: the data fed to the KMeansTrainer should have a Single data type (no label column needed).
- Output columns:
  - Score: the distances of each data point to all clusters' centroids.
  - PredictedLabel: the index of the closest cluster predicted by the model.

Initialization Methods for Cluster Centers:
K-means requires initial cluster centers. Three options are available:
- Random initialization: simple, but may lead to suboptimal results.
- K-means++: an improved initialization algorithm that guarantees a solution competitive with the optimal K-means solution.
- K-means||: a parallel method that reduces the number of passes needed for good initialization (the default method).

Scoring Function:
- The Score column contains the squared L2-norm (Euclidean) distance of input vectors to each cluster's centroid.
- The PredictedLabel is the index of the closest cluster.

Example Usage:
- You can concatenate your input features into a single Features column.
- Then use the KMeansTrainer to train the model using the k-means++ clustering algorithm.

Remember, K-means clustering is a powerful tool for understanding patterns in your data, especially when you don't have labeled examples.
>>>>>
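
The trainer above is ML.NET's, but the underlying algorithm is small enough to sketch in plain Python. This is a toy version only - the real trainer adds k-means++/k-means|| initialisation and proper scoring - but it shows the assign/update loop that minimises the within-cluster sum of squared distances:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive random initialisation
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, labels

# Two obvious blobs of 2-D points
pts = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
cents, labels = kmeans(pts, 2)
print(labels)  # points in the same blob end up sharing a label
```

The PredictedLabel / Score outputs described above correspond to the assignment step here: the label is the index of the nearest centroid, and the score is the squared distance to each centroid.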
sniffer66
Posts: 1681
Joined: Thu May 02, 2019 8:37 am

Cheers Jim

I do have some boolean yes/no values in a few of my features, so I'll look into those. I have a pretty rich dataset for previous horse races and about 25 features I'm using per horse. Scaling them all to a similar range seems to work OK though.

As a test I'm just trying to ID the winning runner, using the dataset, and then calculating a P&L based on BSP. I've used all 2023 data at 60k rows to train on, and then applied the best model after hyperparameter tuning to my 2024 data. Results look pretty decent so far, but it's obviously all theoretical.

First time I've really played around with modelling properly, so just testing things out at the moment. My main stumbling block was getting the preprocessing correct, but I think I've nailed that now.
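
The train-on-2023, evaluate-on-2024 approach described above amounts to a time-based split rather than a random one, so the model is always scored on races it could not have seen. A minimal sketch with toy dates (the 'Date of Race' column name is assumed from earlier in the thread):

```python
import pandas as pd

# Hypothetical frame standing in for the full 60k-row dataset
df = pd.DataFrame({
    "Date of Race": pd.to_datetime(
        ["2023-03-01", "2023-07-15", "2023-11-02", "2024-01-20", "2024-02-10"]),
    "Win": [1, 0, 0, 1, 0],
})

# Fit and hyperparameter-tune on 2023 only...
train = df[df["Date of Race"].dt.year == 2023]
# ...then evaluate on strictly later, held-out 2024 races
test = df[df["Date of Race"].dt.year == 2024]

print(len(train), len(test))  # → 3 2
```

The key property is that every training date precedes every test date, which a random train/test split would not guarantee.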
jimibt
Posts: 3675
Joined: Mon Nov 30, 2015 6:42 pm
Location: Narnia

sniffer66 wrote:
Mon Feb 26, 2024 2:14 pm
...
Was going to explain about booleans in my own words, but yet again... GPT :):

>>>>>
In ML.NET, when using the KMeansTrainer for clustering, it's generally recommended to avoid using boolean features directly. Let me explain why and provide some guidance on suitable data types.

Boolean Features:
- Boolean features (true/false or 0/1) are not ideal for K-means clustering because they represent categorical information rather than continuous values.
- K-means relies on distance-based calculations, such as Euclidean distance, to group data points. Boolean features don't have meaningful distances between them.
- If you use boolean features directly, the algorithm may not perform well, as it treats them as continuous values.

Best Types of Data for K-means:
K-means works best with continuous numerical features, which have meaningful distances and allow the algorithm to find natural clusters. Suitable data types include:
- Single (float): the most common type for continuous numeric features.
- Double: similar to Single, but with higher precision.
- Integers: if your data has discrete values (e.g. age, income), consider using integer features.
- Numeric representations of categorical features: if you have categorical data (e.g. product categories), convert it to a numeric representation (e.g. one-hot encoding) before using it in K-means.

Feature Preprocessing:
Before applying K-means, preprocess your features:
- Normalize continuous features to zero mean and unit variance, so that all features contribute equally to the clustering.
- Scale features to a similar range, e.g. Min-Max scaling to map them to [0, 1].
- Handle missing values appropriately (e.g. impute with the mean or median).
>>>>>
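
A minimal pandas sketch of the encoding and scaling advice above. The 'blinkers'/'going'/'weight' columns are hypothetical, just to show the mechanics:

```python
import pandas as pd

# Toy frame with a YES/NO column, a multi-class categorical, and a numeric
df = pd.DataFrame({
    "blinkers": ["YES", "NO", "NO", "YES"],
    "going":    ["Good", "Soft", "Good", "Heavy"],
    "weight":   [130.0, 126.0, 128.0, 133.0],
})

# One-hot encode the categoricals instead of feeding raw booleans/labels
encoded = pd.get_dummies(df, columns=["blinkers", "going"], dtype=float)

# Z-score normalise the continuous column (zero mean, unit variance)...
encoded["weight_z"] = (df["weight"] - df["weight"].mean()) / df["weight"].std()

# ...or min-max scale it into [0, 1]
encoded["weight_mm"] = (df["weight"] - df["weight"].min()) / (
    df["weight"].max() - df["weight"].min())

print(encoded.columns.tolist())
```

A YES/NO cell becomes a pair of 0/1 indicator columns (blinkers_YES, blinkers_NO), which is exactly the one-hot transformation the text recommends over raw booleans.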
sniffer66
Posts: 1681
Joined: Thu May 02, 2019 8:37 am

jimibt wrote:
Mon Feb 26, 2024 2:19 pm
...

Bit tough when you have a YES/NO cell and preprocessing will only map it to binary. I'll test results with those features removed and compare.
Capture.JPG
jimibt
Posts: 3675
Joined: Mon Nov 30, 2015 6:42 pm
Location: Narnia

sniffer66 wrote:
Mon Feb 26, 2024 2:26 pm

Bit tough when you have a YES/NO cell, and preprocessing will only work to binary. I'll test results with those features removed and compare

Capture.JPG
In truth, it will depend on your trainer choice. KMeans/clustering patterns are maybe where this is more specific.
Fugazi
Posts: 306
Joined: Wed Jan 10, 2024 7:20 pm

jimibt wrote:
Mon Feb 26, 2024 1:47 pm
...
You guys are speaking absolute gibberish to me.

Serious question though - how did you learn about it? Mathematicians? Computer scientists? Computer programmers? Or just self taught online over the years?
sniffer66
Posts: 1681
Joined: Thu May 02, 2019 8:37 am

Fugazi wrote:
Mon Feb 26, 2024 2:53 pm
...
You guys are speaking absolute gibberish to me.

Serious question though - how did you learn about it? Mathematicians ? Computer scientists ? Computer programmers? Or just self taught online over the years ?

I'm completely self taught on the coding side, so I make a good few mistakes. I was in IT support prior to taking early retirement, so I have a background, but not in a developer role. I did some minor coding, but very basic stuff. The great thing is GPT 4 (monthly sub) will do most of the work for you. Just upload your data, normalise it, ask it for what you need, and it will produce the code.
The trick is in the asking; it's taken me a few days to learn to ask the right questions/requests.
Fugazi
Posts: 306
Joined: Wed Jan 10, 2024 7:20 pm

sniffer66 wrote:
Mon Feb 26, 2024 4:26 pm
...
I'm completely self taught on the coding side, so make a good few mistakes. I was in IT support prior to taking early retirement, so have a background, but not in a developer role. I did some minor coding but very basic stuff. The great thing is GPT 4 (monthly sub) will do most of the work for you. Just upload your data, normalise it, and ask it for what you need, and it will produce the code.
The trick is in the asking, it's taken me a few days to learn to ask the right questions/requests
Yeah, I'm good with ChatGPT 4, though I find that having zero coding knowledge means I sometimes don't know the right questions to ask to get it out of a negative loop of answers that don't go anywhere.
sionascaig
Posts: 1074
Joined: Fri Nov 20, 2015 9:38 am

https://www.bbc.co.uk/news/technology-68412620

Interesting article re Gemini - in order to remove "bias" in training data Google have introduced bias in responses and ended up with absurdities...

"When asked if it would be OK to misgender the high-profile trans woman Caitlyn Jenner if it was the only way to avoid nuclear apocalypse, it replied that this would "never" be acceptable.

Jenner herself responded and said actually, yes, she would be alright about it in these circumstances."
Archangel
Posts: 1990
Joined: Thu Jun 27, 2013 3:03 pm
Location: Polo Lounge, Beverly Hills Hotel

The only real barrier to AI taking over the world will be bandwidth. Each iteration of AI will probably require an order of magnitude more computing power. There aren't enough computers in the world right now. But OpenAI want $7 trillion to manufacture chips ;)
Fugazi
Posts: 306
Joined: Wed Jan 10, 2024 7:20 pm

Archangel wrote:
Wed Feb 28, 2024 10:37 am
The only real barrier to AI taking over the world will be bandwidth. Each iteration of AI will probably require computing power by an order of magnitude. There arent enough computers in the world right now. But OpenAI want 7 Trillion to manufacture chips ;)
We used to play games on floppy disks.

They will make the space.