Data Ingestion
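- pd.read_csv("XYZ.csv") -- read a comma-separated file; the first line is used as the header.
- pd.read_table("XYZ.csv", sep=",") -- same result with read_table and an explicit separator.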
- pd.read_table("XYZ.csv", sep=",", header=None) -- the file has no header line; pandas assigns default integer column labels (0, 1, 2, ...).
- pd.read_table("XYZ.csv", sep=",", names=['a', 'b', 'c', 'd', 'e']) -- provide the column names yourself.
- pd.read_table("XYZ.csv", sep=",", names=['a', 'b', 'c', 'd', 'e'], index_col="a") -- use one column as the row labels; index_col must be one of the given names.
- pd.read_table("XYZ.csv", sep=",", names=['a', 'b', 'c', 'd', 'e'], index_col=["a", "b"]) -- use two columns as hierarchical row labels.
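The header, names, and index_col options above can be tried without a file on disk. This is a minimal sketch: the CSV text is made up for illustration and stands in for "XYZ.csv", and it assumes the file has no header line.

```python
import io

import pandas as pd

# Hypothetical file contents standing in for "XYZ.csv" (no header line).
csv_text = "1,2,3,4,foo\n5,6,7,8,bar\n"

# header=None: pandas labels the columns 0, 1, 2, ...
df_default = pd.read_csv(io.StringIO(csv_text), header=None)

# names=...: supply the column labels yourself.
df_named = pd.read_csv(io.StringIO(csv_text), names=["a", "b", "c", "d", "e"])

# index_col=...: promote one of the named columns to the row index.
df_indexed = pd.read_csv(
    io.StringIO(csv_text), names=["a", "b", "c", "d", "e"], index_col="e"
)

print(df_default.columns.tolist())
print(df_named.columns.tolist())
print(df_indexed.index.tolist())
```

read_csv is used here because it defaults to sep=","; read_table with sep="," should behave the same way.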
Check NULL values in a DataFrame
- X.isnull() -- returns a boolean DataFrame that is True wherever a value is missing.
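A small sketch of the null check, assuming X is a DataFrame; the column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value per column.
X = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": ["a", "b", None]})

# Element-wise mask: True where the value is missing.
mask = X.isnull()

# Summing the mask counts the missing values per column.
null_counts = X.isnull().sum()

print(mask)
print(null_counts)
```

Summing the boolean mask is a common way to get a per-column count of missing values.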
Read particular values as NULL (NaN) from a file
- pd.read_csv("XYZ.csv", sep=",", na_values=["d","e"]) -- "d" and "e" are read as NULL in every column.
- pd.read_csv("XYZ.csv", sep=",", na_values={"Col1": ["d","e"], "Col2": ["a"]}) -- "d" and "e" in "Col1" and "a" in "Col2" are read as NULL.
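The difference between the list form and the per-column dict form of na_values is easier to see on a tiny input. The CSV text below is hypothetical and only reuses the column names from the bullets above.

```python
import io

import pandas as pd

# Hypothetical contents standing in for "XYZ.csv".
csv_text = "Col1,Col2\nd,a\ne,x\nz,d\n"

# List form: "d" and "e" become NaN in every column.
df_all = pd.read_csv(io.StringIO(csv_text), na_values=["d", "e"])

# Dict form: each column gets its own set of NaN markers.
df_per_col = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"Col1": ["d", "e"], "Col2": ["a"]},
)

print(df_all)
print(df_per_col)
```

With the dict form, the "d" in Col2 is kept as an ordinary value because "d" is only declared as a NaN marker for Col1.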
Reading large files
- pd.read_csv("XYZ.csv", sep=",", na_values=["d","e"], skiprows=[3,5]) -- skips the lines at 0-based positions 3 and 5 of the file (the header line is position 0).
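A quick sketch of how a skiprows list is interpreted, using made-up single-column data. The positions count physical lines in the file from 0, header line included.

```python
import io

import pandas as pd

# Hypothetical contents: line 0 is the header, lines 1-5 hold the data.
csv_text = "n\n10\n20\n30\n40\n50\n"

# Skip file lines 3 and 5 (0-based), which hold the values 30 and 50.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[3, 5])

print(df["n"].tolist())
```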
Limiting the number of rows displayed or read
- pd.options.display.max_rows = 10 -- limits how many rows are displayed, not how many are read.
- pd.read_csv("XYZ.csv", sep=",", na_values=["d","e"], skiprows=[3,5]) -- all rows are still read; when the DataFrame has more than 10 rows, only the first 5 and last 5 are shown when it is printed.
- pd.read_csv("XYZ.csv", sep=",", na_values=["d","e"], nrows=5) -- only the first 5 data rows are read.
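The display option and nrows are easy to confuse, so here is a hedged sketch that contrasts them on generated data (the 100-row input is an assumption made for the example).

```python
import io

import pandas as pd

# Hypothetical 100-row file with a single column "n".
csv_text = "n\n" + "\n".join(str(i) for i in range(100)) + "\n"

# Display option only: affects printing, not parsing.
pd.options.display.max_rows = 10

df_full = pd.read_csv(io.StringIO(csv_text))           # all 100 rows are loaded
df_head = pd.read_csv(io.StringIO(csv_text), nrows=5)  # only 5 rows are loaded

print(len(df_full), len(df_head))
print(df_full)  # printed in truncated form: head and tail only
```

Note that setting pd.options.display.max_rows changes a global setting for the rest of the session.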
Reading a file in chunks
- fileChunk = pd.read_csv("XYZ.csv", sep=",", na_values=["d","e"], chunksize=5) -- returns an iterator; every chunk is a DataFrame of up to 5 rows.
- for temp_chunk in fileChunk:
-     print(temp_chunk)
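Chunked reading is mostly useful for aggregating over a file that does not fit in memory. The sketch below uses a made-up 12-row input and sums a column one chunk at a time.

```python
import io

import pandas as pd

# Hypothetical 12-row file with a single column "n".
csv_text = "n\n" + "\n".join(str(i) for i in range(12)) + "\n"

# chunksize=5 yields DataFrames of up to 5 rows each.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=5)

chunk_sizes = []
total = 0
for chunk in reader:
    chunk_sizes.append(len(chunk))  # rows in this chunk
    total += int(chunk["n"].sum())  # running aggregate across chunks

print(chunk_sizes, total)
```

The last chunk is shorter than the others whenever the row count is not a multiple of chunksize.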
Writing to a CSV file
- X.to_csv("XYZ.csv") -- NaN values are written as empty strings.
- X.to_csv("XYZ.csv", na_rep="NULL") -- NaN values are written as NULL.
- X.to_csv("XYZ.csv", na_rep="NULL", index=False, header=False) -- NaN values are written as NULL; row labels and column headers are not written.
- X.to_csv("XYZ.csv", na_rep="NULL", index=False, columns=["col1", "col2"]) -- only the two listed columns are written.
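The writing options above can be inspected without creating a file: when to_csv is called without a path it returns the CSV text as a string. The DataFrame here is hypothetical, including the extra column "col3".

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value.
X = pd.DataFrame({"col1": [1.0, np.nan], "col2": ["a", "b"], "col3": [10, 20]})

# No path argument: to_csv returns the CSV text instead of writing a file.
default_csv = X.to_csv()
null_csv = X.to_csv(na_rep="NULL")
bare_csv = X.to_csv(na_rep="NULL", index=False, header=False)
subset_csv = X.to_csv(na_rep="NULL", index=False, columns=["col1", "col2"])

print(default_csv)
print(null_csv)
print(bare_csv)
print(subset_csv)
```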
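Reading JSON, HTML, and pickle files
- pd.read_json("iris.json") -- reads a JSON file.
- tables = pd.read_html("*.html") -- returns a list of DataFrames, one per table found; pass a single file path or URL.
- X.to_pickle('file') -- stores DataFrame X in a pickle file.
- Y = pd.read_pickle('file') -- reads it back from the pickle file.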
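A minimal round-trip sketch for the JSON and pickle readers. The JSON text and the temporary file name are made up for illustration. read_html is left out because it needs an HTML parser such as lxml to be installed.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical JSON records standing in for a file such as "iris.json".
json_text = '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]'
X = pd.read_json(io.StringIO(json_text))

# Write the DataFrame to a pickle file in a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl")  # hypothetical file name
    X.to_pickle(path)
    Y = pd.read_pickle(path)

print(X.equals(Y))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.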