Operating System
Windows
Version Information
N/A
Steps to reproduce
Folder structure:
/
└── my_data
    ├── _delta_log
    ├── part-0000-xxx.parquet
    └── part-0001-xxx.parquet
Relative path configured for the dataset (dataset type: File, created with the Azure ML v1 APIs): my_data/*.parquet
from azureml.core import Workspace, Dataset
import pandas as pd

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, "Dataset_name")
# download() returns the local paths of every file matched by my_data/*.parquet
downloaded_parquets = dataset.download()
df_list = []
for file in downloaded_parquets:
    df_list.append(pd.read_parquet(file, engine='pyarrow'))
df = pd.concat(df_list)
df.head(6)
Expected behavior
id name
3 USA
1 FR
2 UK
5 SPAIN
Actual behavior
id name
1 FR
4 INDIA
5 Crude
2 UK
3 USA
5 SPAIN
The output contains every row that was ever written to the table, not only the rows in the most recent version.
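For context, a likely explanation is that `my_data/*.parquet` matches every Parquet file in the folder, including files that later commits in `_delta_log` have logically removed. The sketch below is not an Azure ML API; it is a rough illustration, using only the standard library, of how the JSON commits in `_delta_log` could be replayed to find the files that belong to the latest version. The function name `active_parquet_files` is hypothetical, and the sketch assumes the log has no checkpoints and that file paths in the log are plain relative paths.

```python
import json
from pathlib import Path


def active_parquet_files(table_dir):
    """Return the data files that make up the latest Delta table version.

    Assumptions (not verified against the reporter's table): the log holds
    only JSON commit files (no checkpoints), and paths are plain relative
    paths without URL encoding.
    """
    log_dir = Path(table_dir) / "_delta_log"
    active = set()
    # Commit files are named with zero-padded version numbers,
    # so sorting by name replays them in version order.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return sorted(Path(table_dir) / p for p in active)


# Possible usage with pandas (requires the table folder, including
# _delta_log, to be available locally):
# df = pd.concat(pd.read_parquet(f) for f in active_parquet_files("my_data"))
```

Note that the dataset path in this report only matches `*.parquet`, so `_delta_log` would probably have to be included in the download for this to work. A Delta-aware reader, for example `DeltaTable(path).to_pandas()` from the `deltalake` package, may be a simpler option if it is available in the environment.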
Additional information
No response