80+ Best ChatGPT Prompts for Data Science

Data science is a constantly evolving field, and staying ahead requires continuous learning and exploration. Thankfully, with the advent of cutting-edge AI technology like ChatGPT, data scientists now have a powerful tool at their disposal. In this blog post, we will explore the best ChatGPT prompts for data science tasks, from building machine learning models to code optimization and everything in between. Whether you’re a seasoned data scientist looking to enhance your skills or a curious beginner eager to dive into the world of data science, these prompts will challenge and inspire you. Get ready to unleash the full potential of ChatGPT in your data science journey!

  • I want you to act as a data scientist and code for me. I have a dataset of [describe dataset]. Please build a machine learning model that predicts [target variable].
  • I want you to act as an automatic machine learning (AutoML) bot using TPOT for me. I am working on a model that predicts […]. Please write Python code to find the best classification model with the highest AUC score on the test set.
  • I want you to act as a data scientist and code for me. I have a dataset of [describe dataset]. Please write code for data visualization and exploration.
  • I want you to act as a coder in Python. I have a dataset [name] with columns [name]. [Describe graph requirements]
  • I want you to act as a software developer. Please help me improve the time complexity of the code below. [Insert code]
  • I want you to act as a code translator. Can you please convert the following code from Python to R? [Insert code]
  • I want you to act as a data scientist and explain the model’s results. I have trained an XGBoost model with the scikit-learn API and I would like to explain the output using a series of plots with SHAP. Please write the code.
  • I want you to act as a data scientist and code for me. I have a time series dataset of [describe dataset]. Please perform a time series decomposition and plot the components.
  • I want you to act as a data scientist and code for me. I have a time series dataset of [describe dataset]. Please help me build an ARIMA model to forecast the data.
  • I want you to act as a deep learning expert. Please write code to create a simple neural network with TensorFlow for [describe task].
  • I want you to act as a deep learning expert. I have a dataset [describe dataset]. Please write code to perform transfer learning using a pretrained model from TensorFlow Hub.
  • I want you to act as a natural language processing expert. I have a text dataset [describe dataset]. Please help me extract named entities using spaCy.
  • I want you to act as a data scientist and code for me. I have several datasets with different structures [describe datasets]. Please help me combine them into a single dataset for analysis.
  • I want you to act as a data privacy expert. What are some privacy-preserving techniques we can use in data science projects?
  • I want you to act as a big data expert. I have a dataset [describe dataset]. Please help me build a machine learning model using Apache Spark.
  • I want you to act as a data science education expert. What are the best courses and resources for learning data science?
  • I want you to act as a Python code generator and create a function that will do [task].
  • I want you to act as a Python script writer and write a program that will scrape [data source] data from a website.
  • I want you to act as a Python developer and write a module that will calculate [metric] using [dataset].
  • I want you to act as a data scientist and detect [anomalies] in the [network traffic] of [organization] using [machine learning] algorithms.
  • I want you to act as a security analyst and identify [intrusions] in the [system logs] of [server] using [anomaly detection] techniques.
  • I want you to act as a fraud analyst and detect [fraudulent transactions] in the [financial data] of [company] using [statistical analysis] methods.
  • I want you to act as an automatic machine learning (AutoML) bot using TPOT for me. I am working on a model that predicts […]. Please write Python code to find the best classification model with the highest AUC score on the test set.
  • I want you to act as an AutoML system and generate Python code to build a machine learning pipeline that optimizes [metric] on [dataset].
  • I want you to act as an ML engineer and create an AutoML script that tunes [hyperparameters] to achieve the best performance on [dataset].
  • I want you to act as a data scientist and use Auto-sklearn to automatically build a classification model that predicts [target variable] based on [features] features.
  • I want you to act as a data scientist and code for me. I have a dataset of [describe dataset]. Please build a machine learning model that predicts [target variable].
  • I want you to act as a data scientist and train a classification model to predict [target variable] based on the [features] in [dataset].
  • I want you to act as a machine learning engineer and build a classification model that can classify [label] based on [features] features.
  • I want you to act as a deep learning specialist and train a convolutional neural network to classify [object] using [image format] images.
  • I want you to act as a software developer. I would like to compare the efficiency of two algorithms that perform the same task in Python. Please write code that helps me run an experiment that can be repeated 5 times. Please output the runtime and other summary statistics of the experiment. [Insert functions]
  • I want you to act as a performance tester and compare the speed of [function1] and [function2] when processing [input data] in [Python script].
  • I want you to act as a data scientist and compare the speed of different [machine learning algorithms] on [dataset] using the [timeit] module.
  • I want you to act as a speed optimizer and compare the speed of different [Python libraries] for [task] in [code snippet].
  • I want you to act as a data scientist. I need to create a NumPy array with the shape (x, y, z). Please initialize the array with random values.
  • I want you to act as a data scientist and create a 1D NumPy array of [length] that contains [values].
  • I want you to act as a Python developer and create a 2D NumPy array of shape [row, column] that represents the [matrix] in [dataset].
  • I want you to act as a machine learning expert and create a random 3D NumPy array of shape [batch_size, height, width] that simulates [image data].
  • I want you to act as a data scientist and cluster the [customers] in [dataset] into [n] groups based on their [purchase history].
  • I want you to act as a machine learning expert and develop a [clustering model] that groups the [documents] in [dataset] based on their [content].
  • I want you to act as a data analyst and visualize the [clusters] in [dataset] using [dimensionality reduction] techniques.
  • I want you to act as a data scientist and reduce the [dimensionality] of the [image data] in [dataset] using the [principal component analysis] technique.
  • I want you to act as a data scientist and provide a step-by-step guide on how to perform [t-SNE] for my dataset.
  • I want you to act as a data scientist and explain the difference between [PCA] and [LDA] and how they can be used for [dimensionality reduction] in my dataset.
  • I want you to act as a data scientist and code for me. I have trained a [model name]. Please write the code to tune the hyperparameters.
  • I want you to act as a hyperparameter tuner and optimize the [hyperparameter] of a [algorithm] algorithm to achieve the highest [metric] on [dataset].
  • I want you to act as a machine learning expert and use Optuna to perform a Bayesian optimization of [hyperparameters] for a [model] on [dataset].
  • I want you to act as a data scientist and perform a random search of [hyperparameters] for a [algorithm] algorithm to achieve the best [metric] on [dataset].
  • I want you to act as a data analyst and preprocess the [raw data] in [dataset] by removing [duplicate records] and [missing values].
  • I want you to act as a data engineer and preprocess the [time-series data] in [dataset] by resampling it to a [lower or higher frequency].
  • I want you to act as a data scientist and preprocess the [text data] in [dataset] by [tokenizing] it and removing [stop words] and [punctuation marks].
  • I want you to act as a data scientist and code for me. I have a dataset of [describe dataset]. Please write code for data visualization and exploration.
  • I want you to act as a data analyst and generate a visualization that shows the distribution of [feature] in [dataset].
  • I want you to act as a data scientist and generate summary statistics of [feature] in [dataset].
  • I want you to act as a data explorer and clean [dataset] by removing missing values, duplicates, and outliers.
  • I want you to act as a fake data generator. I need a dataset that has x rows and y columns: [insert column names]
  • I want you to act as a data generator and create a synthetic dataset with [number of features] features and [number of instances] instances.
  • I want you to act as a data scientist and generate a time series dataset with [seasonality] seasonality and [trend] trend.
  • I want you to act as a data simulation expert and generate a dataset that simulates [process] with [parameters] parameters.
  • I want you to act as a coder. I have trained a machine learning model on an imbalanced dataset. The target variable is the column [Insert column name]. In Python, how do I oversample and/or undersample my data?
  • I want you to act as a data scientist and use SMOTE to oversample the minority class of [imbalanced dataset] for a classification task.
  • I want you to act as a machine learning expert and use stratified sampling to balance the distribution of [target variable] in [dataset].
  • I want you to act as a data engineer and apply random undersampling to address the class imbalance in [imbalanced dataset] for training a model.
  • I want you to act as a machine learning expert and build a [text classification model] that classifies [customer feedback] in [dataset] as positive or negative.
  • I want you to act as a data scientist and analyze the [sentiment] of the [reviews] in [dataset] using [natural language processing] techniques.
  • I want you to act as a language model researcher and develop a [language model] that can generate [text data] similar to the [training data].
  • I want you to act as a data scientist and develop a [content-based recommender system] that suggests [articles] based on [user interests].
  • I want you to act as a machine learning expert and build a [collaborative filtering model] that recommends [products] to [customers] based on their [purchase history].
  • I want you to act as a data analyst and evaluate the [accuracy] of the [recommendations] generated by the [recommender system] in [dataset].
  • I want you to act as a data scientist and code for me. I have a time series dataset [describe dataset]. Please build a machine learning model that predicts [target variable]. Please use [time range] as train and [time range] as validation.
  • I want you to act as a time series expert and build a recurrent neural network that predicts [target variable] based on [time series data].
  • I want you to act as a data scientist and train a seasonal ARIMA model to forecast [variable] in [time series data] using [forecast horizon] forecast periods.
  • I want you to act as a machine learning engineer and train a long short-term memory network that detects [event] in [sensor data].
  • I want you to act as a coder in Python. I have a dataset [name] with columns [name]. [Describe graph requirements]
  • I want you to act as a data visualization expert and create a [type of plot] that shows the relationship between [variable1] and [variable2] in [dataset].
  • I want you to act as a data scientist and create a [type of plot] that displays the distribution of [variable] in [dataset] and compares it across different [categorical variable].
  • I want you to act as a data analyst and create a [type of plot] that shows the trend of [variable] over time in [dataset].
  • I want you to act as a coder. I have a folder of images. [Describe how files are organised in directory] [Describe how you want images to be printed]
  • I want you to act as a data scientist and explain the model’s results. I have trained a [library name] model and I would like to explain the output using LIME. Please write the code.
  • I want you to act as a machine learning specialist and use LIME to explain how a [model] made a prediction for a specific instance in [dataset].
  • I want you to act as a data scientist and use LIME to identify the important features that contributed to the prediction of [target variable] for [model] on [dataset].
  • I want you to act as a model explainer and use LIME to explain how a [model] handles the interaction between [features] in [dataset].
  • I want you to act as a data scientist and explain the model’s results. I have trained an XGBoost model with the scikit-learn API and I would like to explain the output using a series of plots with SHAP. Please write the code.
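
The prompts above are templates: the bracketed parts are meant to be replaced with your own details before you send them. As a minimal sketch of how that substitution could be scripted, the Python helper below fills each placeholder from a dictionary. The function name fill_prompt, the regular expression, and the sample values (including the file name subscriptions.csv) are illustrative assumptions, not part of the original prompt list.

```python
import re
from typing import Mapping

# Matches a bracketed placeholder such as [dataset] or [target variable].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_prompt(template: str, values: Mapping[str, str]) -> str:
    """Replace each [placeholder] in the template with its supplied value.

    Raises KeyError when a placeholder has no value, so a half-filled
    prompt is not sent by accident.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder [{name}]")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# One of the clustering prompts from the list, used as the template.
template = (
    "I want you to act as a data scientist and cluster the [customers] "
    "in [dataset] into [n] groups based on their [purchase history]."
)

# Hypothetical values, chosen only to demonstrate the helper.
prompt = fill_prompt(
    template,
    {
        "customers": "subscribers",
        "dataset": "subscriptions.csv",
        "n": "4",
        "purchase history": "monthly spend",
    },
)
print(prompt)
```

Failing loudly on a missing value is a design choice in this sketch: a prompt that still contains a literal [dataset] tends to produce a generic answer, so it is safer to catch the gap before the prompt is used.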

Benefits of using ChatGPT as a data science tool

Using ChatGPT as a data science tool offers several benefits that can enhance the data exploration process and provide valuable insights. Here are some key advantages of utilizing ChatGPT in data science:

  • Interactivity: ChatGPT enables interactive conversations, allowing data scientists to have a dynamic and iterative exploration process. It can quickly respond to queries and provide instant feedback, enhancing the speed of analysis.
  • Intuitive Language: ChatGPT understands natural language, making it easy to communicate complex data science concepts and queries in a conversational manner. This simplifies the interaction and reduces the amount of code you have to write by hand for routine questions.
  • Exploratory Data Analysis (EDA): ChatGPT can assist in EDA tasks by producing, or writing the code to produce, descriptive statistics, visualizations, and summaries of datasets. It can help identify patterns, outliers, and correlations within the data, giving you an overview to start your analysis from.
  • Ideation and Hypothesis Generation: ChatGPT can generate ideas and hypotheses based on the data. It can propose potential relationships or insights that data scientists may not have considered, stimulating creative thinking and generating new research directions.
  • Documentation and Collaboration: ChatGPT can be used as a documentation tool during the analysis process. It can summarize findings, record experimental setups, and keep track of insights. This feature facilitates collaboration as it allows data scientists to share knowledge and workflows with team members.
  • Contextual Understanding: ChatGPT keeps track of context within a conversation, so its responses can build on the ongoing analysis. It can refer back to previous questions and answers, leading to more meaningful and contextually relevant discussions.
  • Quick Prototyping: ChatGPT can be used to rapidly prototype and test ideas without the need for extensive coding or implementation. This enables data scientists to experiment and validate hypotheses quickly, saving time and effort in the early stages of the analysis.

Overall, incorporating ChatGPT into data science workflows can enhance the exploration process, improve efficiency, and promote creative thinking in generating insights from data.

Using ChatGPT to conduct exploratory data analysis

When it comes to conducting exploratory data analysis (EDA), ChatGPT can be a valuable tool for data scientists. By discussing your data with ChatGPT, you can uncover patterns that may not be immediately apparent. Here are some ways that ChatGPT can be used in EDA:

  • Data Summarization: ChatGPT can help in summarizing large datasets by providing statistical measures such as mean, median, mode, and standard deviation. It can also write the code for histograms and box plots that visualize the distribution of the data.
  • Data Cleansing: ChatGPT can assist in identifying missing values, outliers, or inconsistencies in your dataset. It can suggest appropriate methods for handling missing data and help in identifying potential errors or anomalies.
  • Feature Selection: ChatGPT can provide insights into the relevance and importance of different features in your dataset. When you discuss the variables with ChatGPT, it can suggest which features are most likely to have a significant impact on your analysis.
  • Pattern Recognition: ChatGPT can help in identifying patterns and trends in your data. By asking questions and discussing the data with ChatGPT, you can gain a deeper understanding of relationships between variables and uncover hidden insights.
  • Outlier Detection: ChatGPT can help in identifying outliers in your dataset. When you discuss the data with ChatGPT, it can suggest potential outliers based on statistical measures such as the z-score or interquartile range.
  • Data Visualization: ChatGPT can generate visualizations to help in data exploration. You can ask it to plot specific variables or generate charts and graphs based on your requirements.

It is important to note that ChatGPT should be used as a complementary tool in data exploration and not as a replacement for traditional statistical analysis. It can provide valuable insights and suggestions, but it is still essential to validate and verify the findings using proper statistical techniques and domain knowledge.
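
One way to verify an outlier that ChatGPT points out is to compute the statistics yourself. The sketch below, which uses only Python's standard statistics module, reproduces the summary measures and the two outlier rules mentioned above (z-score and interquartile range). The sample data, the function names, and the thresholds of 3.0 and 1.5 are assumptions chosen for illustration; they are common conventions, not fixed rules.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample: daily order counts with one suspicious spike.
orders = [12, 15, 14, 13, 16, 15, 14, 13, 15, 14, 95]

# Summary measures of the kind listed under "Data Summarization".
print("mean:", round(statistics.fmean(orders), 2))
print("median:", statistics.median(orders))
print("mode:", statistics.mode(orders))
print("stdev:", round(statistics.stdev(orders), 2))

# Both rules should agree on clear-cut cases like the spike above.
print("z-score outliers:", zscore_outliers(orders))
print("IQR outliers:", iqr_outliers(orders))
```

The two rules can disagree on borderline points. In small samples a single extreme value inflates the standard deviation, which makes the z-score rule less sensitive, so the interquartile range check is a useful second opinion.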

Techniques and insights generated by ChatGPT in data exploration

When it comes to data exploration, ChatGPT can suggest techniques and help surface insights. Here are some ways in which ChatGPT can contribute to your data science endeavors:

  • Generating summary statistics: ChatGPT can quickly summarize key statistics from your dataset, such as mean, median, standard deviation, and more. This can give you a high-level understanding of your data distribution without having to write complex code.
  • Identifying outliers: When you ask ChatGPT to help identify outliers in your dataset, it can point out data points that deviate significantly from the norm. This can be useful in identifying potential errors or anomalies in your data that may require further investigation.
  • Recommending visualization techniques: When you’re unsure which visualization technique would best represent your dataset, ChatGPT can offer suggestions based on your specific data characteristics. It can recommend different chart types, such as scatter plots, line graphs, or histograms, to effectively communicate your data insights.
  • Guiding feature selection: Feature selection is a crucial step in building machine learning models. With ChatGPT, you can discuss your dataset’s features and the problem you’re trying to solve, and it can provide suggestions on which features are most important for your specific task.
  • Exploring correlations: ChatGPT can assist in identifying potential correlations between different variables in your dataset. When you ask questions about the variables of interest, it can help you explore relationships and identify patterns that may inform your analysis.
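
To make the last point concrete, here is a hedged sketch of a correlation check you could run on whatever relationships ChatGPT suggests. It computes the Pearson correlation coefficient for every pair of columns in plain Python. The column names and numbers are made up for illustration, and pearson is a hypothetical helper written for this example, not a function from any particular library.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Made-up columns, used only to exercise the function.
columns = {
    "ad_spend": [10, 20, 30, 40, 50],
    "visits": [110, 205, 290, 410, 495],
    "returns": [9, 7, 6, 4, 2],
}

# Compute r for every pair of columns, then list the strongest first.
pairs = [
    (a, b, pearson(columns[a], columns[b]))
    for a, b in combinations(columns, 2)
]
for a, b, r in sorted(pairs, key=lambda p: abs(p[2]), reverse=True):
    print(f"{a} vs {b}: r = {r:+.3f}")
```

Pearson's r only measures linear association, so a value near zero does not rule out a nonlinear pattern, and a high value does not by itself show that one variable causes the other.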
