First, the OP mixed up the rows and columns of the data frame.
The actual output keeps only the labels that appear in both data frames, and the only common one is 'y'.
The OP read y as a row label. However, y is a column name.
df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]}, index=['a', 'b', 'c', 'd', 'e'])
This is an easy mistake to make, because in the dictionary form x and y look like two rows.
If you create df1 from a list of lists instead, it is more intuitive:
df1 = pd.DataFrame([[1,3], [2,4], [3,5], [4,6], [5,7]], index=['a', 'b', 'c', 'd', 'e'], columns=["x", "y"])
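To convince yourself that both constructions describe the same frame, you can compare them directly. This is only a sketch: the names labels, from_dict and from_rows are made up here, and the y values in the dictionary form are taken from the list-of-lists version.

```python
import pandas as pd

labels = ['a', 'b', 'c', 'd', 'e']

# Dictionary form: each key becomes a column name, not a row label.
from_dict = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]}, index=labels)

# List-of-lists form: each inner list is one row.
from_rows = pd.DataFrame([[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]],
                         index=labels, columns=["x", "y"])

print(from_dict.equals(from_rows))  # True
print(list(from_dict.columns))      # ['x', 'y']
print(list(from_dict.index))        # ['a', 'b', 'c', 'd', 'e']
```

So x and y sit on the column axis, while a to e are the row labels.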
So, back to the problem: concat is short for concatenate (to link together in a series or chain [source]). Running concat along axis 0 means linking two objects along axis 0.
         1
         1   <-- series 1
         1
^  ^  ^
|  |  |                1
c  a  a                1
o  l  x                1
n  o  i   gives you    2
c  n  s                2
a  g  0                2
t  |  |
|  V  V
v
         2
         2   <-- series 2
         2
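The diagram above can be reproduced with a minimal sketch. The names s1, s2, stacked and side_by_side are made up for illustration; the values mirror the two series in the picture.

```python
import pandas as pd

# Two small series, matching "series 1" and "series 2" in the diagram.
s1 = pd.Series([1, 1, 1])
s2 = pd.Series([2, 2, 2])

# Concatenating along axis 0 links the objects top to bottom.
stacked = pd.concat([s1, s2], axis=0)
print(stacked.tolist())    # [1, 1, 1, 2, 2, 2]

# For contrast, axis 1 places them side by side as two columns.
side_by_side = pd.concat([s1, s2], axis=1)
print(side_by_side.shape)  # (3, 2)
```

Note that the stacked result keeps the original index labels (0, 1, 2, 0, 1, 2) unless you pass ignore_index=True.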
So ... I think you get the feeling now. What about the sum function in pandas? What does sum(axis=0) mean?
Suppose the data looks like this:
1 2
1 2
1 2
Summation along axis 0 ... maybe you can guess. Yes!!
^  ^  ^
|  |  |
s  a  a
u  l  x
m  o  i   gives you two values 3 6 !
|  n  s
v  g  0
   |  |
   V  V
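As a quick check of the picture above, here is a small sketch using the same 3x2 data. The variable names col_totals and row_totals are only illustrative.

```python
import pandas as pd

# Three rows, two columns, as in the example data.
df = pd.DataFrame([[1, 2], [1, 2], [1, 2]])

# Summing along axis 0 walks down the rows, so you get one total per column.
col_totals = df.sum(axis=0)
print(col_totals.tolist())  # [3, 6]

# Summing along axis 1 walks across the columns, so you get one total per row.
row_totals = df.sum(axis=1)
print(row_totals.tolist())  # [3, 3, 3]
```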
How about dropna? Suppose you have this data
1    2    NaN
NaN  3    5
2    4    6
and you want to keep only
2
3
4
The documentation says: "Return object with labels on given axis omitted where alternately any or all of the data are missing."
Should you use dropna(axis=0) or dropna(axis=1)? Think about it, then try it with
df = pd.DataFrame([[1, 2, np.nan], [np.nan, 3, 5], [2, 4, 6]])
Hint: think about the word "along".
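Once you have made your guess, a small sketch like the following runs both calls so you can compare them. The results dictionary is just a convenience introduced here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [np.nan, 3, 5], [2, 4, 6]])

# Run dropna once per axis and keep the results for comparison.
results = {axis: df.dropna(axis=axis) for axis in (0, 1)}

for axis, result in results.items():
    print(f"dropna(axis={axis}):")
    print(result)
    print()
```

Check which of the two results contains exactly the values 2, 3, 4.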