Selecting Pandas DataFrame Rows Based On Conditions


I am new to Python and getting to grips with pandas. I am trying to perform a simple import CSV, filter, write CSV workflow, but the filter seems to be dropping rows of data compared to my Access query.

I am importing via the command below:

Costs1516 = pd.read_csv('C:......../1b Data MFF adjusted.csv')

Following the import I get a data warning that the service code column contains data of multiple types (some are numerical codes, others are purely text). The import seems to assign the column dtype object, which I thought would just treat them both as strings, so all would be fine.
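For context, a minimal sketch (with hypothetical data, not the actual file) of forcing the column to be read as strings at import time, which is one way to sidestep the mixed-type warning:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV: a code column mixing
# numeric-looking and text codes.
csv = io.StringIO("Service code,Cost\n110,5.0\nXYZ,7.5\n110,2.5\n")

# dtype=str for that column keeps every code as a string, so later
# comparisons like == '110' behave consistently.
df = pd.read_csv(csv, dtype={'Service code': str})
print(df['Service code'].tolist())  # ['110', 'XYZ', '110']
```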

I want the output DataFrame to have the same structure as the imported data (Costs1516), but only to include rows where 'Service code' = '110'.

I have pulled the following SQL from Access which seems to do the job well, and returns 136k rows:

SELECT [1b Data MFF adjusted].*, [1b Data MFF adjusted].[Service code]
FROM [1b Data MFF adjusted]
WHERE ((([1b Data MFF adjusted].[Service code])="110"));

My pandas equivalent is below but only returns 99k records:

Costs1516Ortho = Costs1516.loc[Costs1516['Service code'] == '110']

I have compared the two outputs and I can't see any reason why pandas is excluding some lines and including others. I'm really stuck; any suggested areas to look at or approaches to test would be gratefully received.
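One hypothesis worth testing (a sketch with made-up data, not the actual file): with mixed types in the column, some '110' values may have been parsed as the integer 110, which the string comparison `== '110'` silently skips, while Access's comparison matches both.

```python
import pandas as pd

# Hypothetical column mixing int and str codes, as the DtypeWarning suggests.
df = pd.DataFrame({'Service code': [110, '110', '111']})

# String-only comparison: misses the integer row.
mask_str = df['Service code'] == '110'
print(mask_str.sum())  # 1

# Normalising everything to a stripped string first catches both.
mask_all = df['Service code'].astype(str).str.strip() == '110'
print(mask_all.sum())  # 2
```

If the real data behaves the same way, the `astype(str)` version should bring the pandas row count back in line with the Access query.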


