I am new to Python and getting to grips with pandas. I am trying to perform a simple import CSV, filter, write CSV workflow, but the filter seems to be dropping rows of data compared to my Access query.
I am importing via the command below:
Costs1516 = pd.read_csv('C:......../1b Data MFF adjusted.csv')
Following the import I get a warning (DtypeWarning) that the service code column contains data of multiple types (some values are numerical codes, others are purely text), but the import seems to assign the column dtype object, which I thought would just treat them both as strings and all would be fine…
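For reference, here is a minimal sketch (with made-up data, not my real file) of what I suspect that warning implies: an object column can hold the integer 110 in some rows and the string '110' in others, and an equality test against '110' only matches the string rows.

```python
import pandas as pd

# Hypothetical illustration: an object column holding a mix of
# Python ints and strings, as can happen when read_csv infers
# types chunk by chunk on a large mixed-type column.
df = pd.DataFrame({'Service code': [110, '110', 'XYZ']})

mask = df['Service code'] == '110'
print(mask.tolist())  # [False, True, False] -- the integer row is missed
```
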
I want the output DataFrame to have the same structure as the imported data (Costs1516), but only to include rows where ‘Service code’ = ‘110’.
I have pulled the following SQL from Access which seems to do the job well, and returns 136k rows:
SELECT [1b Data MFF adjusted].*, [1b Data MFF adjusted].[Service code] FROM [1b Data MFF adjusted] WHERE ((([1b Data MFF adjusted].[Service code])="110"));
My pandas equivalent is below but only returns 99k records:
Costs1516Ortho = Costs1516.loc[Costs1516['Service code'] == '110']
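One of the approaches I was planning to test (shown here on made-up data, since I can't share the real file) is forcing the column to string, either at import time or just before the comparison, so that numeric and text codes compare consistently:

```python
import io
import pandas as pd

# Stand-in for the real CSV file.
csv = io.StringIO("Service code,Cost\n110,5.0\n110,6.0\nXYZ,7.0\n")

# Option 1: force the column to string at import time.
costs = pd.read_csv(csv, dtype={'Service code': str})
ortho = costs.loc[costs['Service code'] == '110']
print(len(ortho))  # 2

# Option 2: normalise an already-loaded object column before comparing,
# which also guards against stray whitespace:
# ortho = costs.loc[costs['Service code'].astype(str).str.strip() == '110']
```
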
I have compared the two outputs and I can’t see any reason why pandas is excluding some lines and including others. I’m really stuck; any suggested areas to look at or approaches to test gratefully received.