

Anyone know Python or Anaconda?




What's the question?

 

I've got about 1,200 different categories, which I've put into 1,200 dataframes (not sure if that's a good way to go about this, but it's what I did).

 

Example:

 

LastName   FirstName
CD         Teacher
Zoogs      Bob
Damodred   Moiraine
Damodred   Alastair
Zoogs      Larry
Zoogs      Bob

 

where LastName is the category and I want to do "stuff" to the first names. By stuff I mean lots of different operations. There's a different dataframe for each last name. I want to do the same "stuff" to the first names for every LastName. I just can't figure out how to make a loop that will go through multiple dataframes.

 

In basic English the loop would be:

 

For each last name:

Do these 30 fancy things to the first names
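A minimal pandas sketch of that loop. It assumes the per-last-name dataframes are kept in one dict keyed by last name, not in 1,200 separately named variables. `do_fancy_things` is a hypothetical stand-in for the 30 operations; here it only strips and title-cases the first names. Because the dict key already records the last name, the LastName column isn't needed inside each piece.

```python
import pandas as pd

# Hypothetical stand-in for the "30 fancy things"; swap in the real steps.
def do_fancy_things(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()  # leave the input dataframe untouched
    out["FirstName"] = out["FirstName"].str.strip().str.title()
    return out

# Assumption: one dataframe per last name, stored in a dict keyed by last name.
frames = {
    "Zoogs": pd.DataFrame({"FirstName": ["bob", "Larry ", "Bob"]}),
    "Damodred": pd.DataFrame({"FirstName": ["moiraine", "Alastair"]}),
}

# "For each last name: do these fancy things to the first names"
results = {}
for last_name, df in frames.items():
    results[last_name] = do_fancy_things(df)

print(results["Zoogs"]["FirstName"].tolist())
```

Adding a new last name means adding a dict entry. The loop itself never changes.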

 

 

 

Not sure why I only just thought of this, but once I've split them into dataframes I could delete the LastName column. That might make things simpler for me. I started learning Python a week ago. There are about 3 other columns in the dataset.


Bob Zoogs, that's me.

 

Do you use R, by any chance? "Dataframes" sounds rather R-like.

 

It *sounds* like which category it is determines what exactly you'll be doing to them. I don't completely see why there's one dataframe per category, rather than simply having (for example) the LastName variable be a category.

 

Apologies in advance: if this is a Python syntax question, I probably can't answer it. Generally speaking, I guess I'd keep a list of all the dataframes (in Python, everything's technically a reference, right? So a list of them shouldn't be too costly?) and iterate through the list. In R you'd probably use something from the apply family, like lapply. Pseudocode-wise:

 

for (i = 0; i < length(dfList); i++) {
  // lastNames[i] is the category (last name) of dfList[i]
  dfList[i] = doTheFancyThings(dfList[i], lastNames[i]);
}
Where:

doTheFancyThings(df, fancyThingType) {
  // if the thirty things depend on the category...
  switch (fancyThingType) {
    case 'Zoogs':
      return doAwesomeThings(df);
    case 'Damodred':
      return doDredfulThings(df);
    default:
      return banTeach(df);
  }
}
...although I think you basically have that part already, so I'm not sure this helps :) Sorry if that was super basic. I should say I barely know Python. I looked up looping over dataframes and saw something about pandas (though I think everything I found was about looping through *one* dataframe). Heh. Seems like a fun language! :D
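In Python, a switch like the pseudocode above is often written as a dict that maps each category to a function, with `.get()` supplying the default case. This is only a sketch. `do_awesome_things`, `do_dreadful_things` and `ban_teach` are hypothetical placeholders that echo the pseudocode's names, and each one just tags the rows so you can see which branch ran.

```python
import pandas as pd

# Hypothetical per-category handlers echoing the pseudocode's names.
def do_awesome_things(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(Action="awesome")

def do_dreadful_things(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(Action="dreadful")

def ban_teach(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(Action="banned")

# Category -> handler. Categories not listed here fall through to the default.
HANDLERS = {
    "Zoogs": do_awesome_things,
    "Damodred": do_dreadful_things,
}

def do_the_fancy_things(df: pd.DataFrame, fancy_thing_type: str) -> pd.DataFrame:
    handler = HANDLERS.get(fancy_thing_type, ban_teach)  # the "default:" case
    return handler(df)

frames = {
    "Zoogs": pd.DataFrame({"FirstName": ["Bob", "Larry", "Bob"]}),
    "Damodred": pd.DataFrame({"FirstName": ["Moiraine", "Alastair"]}),
    "CD": pd.DataFrame({"FirstName": ["Teacher"]}),
}

# Loop over every dataframe, passing its last name in as the category.
processed = {name: do_the_fancy_things(df, name) for name, df in frames.items()}
print(processed["CD"])
```

If all 1,200 last names get the same treatment, drop the dict and call one function in the loop.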

 

R should be pretty well suited to something like this. The usual approach is split-apply-combine, but that assumes you have everything (including the LastName field) in *one* dataframe. Make LastName a factor, and then I think you can tapply your way through it. I'm not super familiar with dplyr, but I'm sure it would provide an even easier grammar for the operation.
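pandas does split-apply-combine with `groupby`, which is roughly its counterpart to the R factor/tapply approach. Here is a sketch that uses the example rows from the first post in one dataframe. The two operations (distinct first names per last name, and a per-row count) are illustrative stand-ins, not the actual "fancy things".

```python
import pandas as pd

# Everything in ONE dataframe; a categorical column is roughly an R factor.
people = pd.DataFrame({
    "LastName": ["CD", "Zoogs", "Damodred", "Damodred", "Zoogs", "Zoogs"],
    "FirstName": ["Teacher", "Bob", "Moiraine", "Alastair", "Larry", "Bob"],
})
people["LastName"] = people["LastName"].astype("category")

grouped = people.groupby("LastName", observed=True)["FirstName"]

# Split by LastName, apply a reduction to each group, combine: one value per group.
distinct_first = grouped.nunique()
print(distinct_first)

# transform() keeps the original row shape. This gives each row the number of
# (non-null) first names that share its last name.
people["FamilySize"] = grouped.transform("count")
print(people)
```

Whether this fits in memory at your data size is a separate question. The per-dataframe loop is the fallback if it doesn't.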

Yes, I know R. Not super well, but better than Python. The things I need to do have to be done in Python, though.

 

The last names need to be separate dataframes because of memory. From what I've read, my dataset will be too big to keep together while I'm doing these things to it, so I'm going to merge the pieces back together at the end. Also, I already know how to do the fancy things to the names individually. What I need is a loop that goes through all of them, instead of typing out all 1,200 names. Anyhow... I will probably pester the people of Stack Overflow again.
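Here is one way to handle the "merge them back together at the end" step, assuming the pieces sit in a dict keyed by last name and the LastName column has been dropped from each one. `pd.concat` accepts a dict, and its keys become an outer index level that can be turned back into a LastName column. `do_fancy_things` is a hypothetical placeholder. This sketch doesn't address peak memory: the concat briefly holds all processed pieces at once.

```python
import pandas as pd

def do_fancy_things(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical placeholder for the real per-name steps.
    return df.assign(FirstInitial=df["FirstName"].str[0])

# Assumption: pieces are stored in a dict keyed by last name, and the
# LastName column has already been dropped from each piece.
pieces = {
    "Zoogs": pd.DataFrame({"FirstName": ["Bob", "Larry", "Bob"]}),
    "Damodred": pd.DataFrame({"FirstName": ["Moiraine", "Alastair"]}),
}

# Loop over every piece without naming any of them individually.
processed = {last_name: do_fancy_things(df) for last_name, df in pieces.items()}

# Merge back together. The dict keys become the outer index level "LastName",
# which reset_index moves back into a regular column.
combined = (
    pd.concat(processed, names=["LastName", "row"])
    .reset_index(level="LastName")
    .reset_index(drop=True)
)
print(combined)
```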

Oh, wow, that's interesting. I'm not sure how memory issues work in Python. Please keep us posted; I'll be curious to see the solution!

 

Also, does this help?

http://stackoverflow.com/questions/36601956/how-can-i-iterate-through-multiple-dataframes-to-select-a-column-in-each-in-pyth

 

They have

for name in dfList:
    ...  # loop body: whatever you want to do to each item

Or, in pandas, groupby: http://pandas.pydata.org/pandas-docs/stable/groupby.html

 

(I can't tell whether memory will be a factor for you there. If you already have the data split out manually, couldn't you put the pieces in a list and loop over it?)
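To get the pieces into a list or dict without 1,200 hand-named variables, one option is to let `groupby` do the split. This sketch assumes the split starts from a single combined dataframe, which only works if that frame fits in memory. If the pieces were instead saved as separate files, you could fill the same dict by looping over the file names.

```python
import pandas as pd

people = pd.DataFrame({
    "LastName": ["CD", "Zoogs", "Damodred", "Damodred", "Zoogs", "Zoogs"],
    "FirstName": ["Teacher", "Bob", "Moiraine", "Alastair", "Larry", "Bob"],
})

# One entry per last name, built in a loop; LastName is dropped from each piece
# because the dict key already records it.
frames = {
    last_name: group.drop(columns="LastName").reset_index(drop=True)
    for last_name, group in people.groupby("LastName")
}

# The same pieces as a plain list of dataframes, if a list is preferred.
df_list = list(frames.values())

for df in df_list:
    print(len(df))  # stand-in for the real per-dataframe work
```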

I've actually been to that post, but since Python is so new to me, it doesn't make a lot of sense and I can't really translate it into what I want to do. I'm using pandas.
